I2Lab blog


Mon, 19 Oct 2009

Blog moved
The I2Lab website was moved to a new server a few weeks ago, but the new user page and this blog didn't get moved... The blog is now back; the new user form will follow shortly, perhaps with some improvements.

posted at: 15:36 | path: /news | permanent link to this entry

Sat, 26 Sep 2009

power outage?
There appears to have been a power event some time around 7:30pm Friday. Some nodes (with jobs on them) were rebooted. Check your jobs.

posted at: 00:29 | path: /news | permanent link to this entry

Fri, 21 Aug 2009

SGE scratch resource limit
There have been some updates to the local SGE hints page, so it might be worth re-reading it.

Also, I have added the scratch resource request. For example, if your job requires at least 40G of space in /state/partition1, add the following option to your SGE command line:

   -l scratch=40G
Your job will wait in the queue until a node with at least this much space is available.
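
As a rough sketch of how the option fits into a full submission, the snippet below builds a qsub command line with the scratch request and prints it for inspection. The job script name myjob.sh is a hypothetical placeholder, and the 40G value is just the example figure from above; adjust both for your own job.

```shell
#!/bin/sh
# Hypothetical job script name (an assumption for illustration); use your own.
JOB_SCRIPT=myjob.sh

# Scratch space the job needs in /state/partition1 (example value from the post).
SCRATCH=40G

# Build the submission command and print it so it can be checked first.
SUBMIT_CMD="qsub -l scratch=$SCRATCH $JOB_SCRIPT"
echo "$SUBMIT_CMD"

# To actually submit on the cluster, run the printed command.
```

SGE also reads options from lines in the job script that begin with `#$`, so `#$ -l scratch=40G` near the top of the script should have the same effect as passing the option on the command line.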

posted at: 00:52 | path: /news | permanent link to this entry

Tue, 11 Aug 2009

Systems down...
A few systems in hilbert shut down this afternoon due to a short temperature event. Some jobs may need to be restarted.

posted at: 15:28 | path: /news | permanent link to this entry

Sun, 26 Jul 2009

Lightning storm...
A brief power flicker caused by this afternoon's lightning storm took out a very small number of nodes. Check your jobs...

posted at: 17:21 | path: /news | permanent link to this entry

Wed, 15 Jul 2009

Lightning storm
A bad lightning storm may have taken out a few nodes tonight. Check your jobs and make sure they finished rather than aborted.

posted at: 22:53 | path: /news | permanent link to this entry

Fri, 10 Jul 2009

HEC power failure
Power failed for the HEC building and all servers went down. All jobs will need to be restarted.

posted at: 11:20 | path: /news | permanent link to this entry

Mon, 29 Jun 2009

Temperature event
There was a temperature event around 4am this morning that caused a number of machines in euler and hilbert to shut down automatically. Some jobs were likely lost as a result and will need to be restarted.

posted at: 08:35 | path: /news | permanent link to this entry

Wed, 20 May 2009

First major lightning storm of the season
Tonight's storm took out nearly all the clusters. Nearly all jobs that were running were aborted. Jobs that were pending in the queue should have started by now.

posted at: 23:48 | path: /news | permanent link to this entry

Thu, 07 May 2009

All hilbert jobs have been aborted.
Hilbert's home disk ran out of space on the morning of May 7. Since jobs running at that time were probably corrupted, all jobs have been aborted.

As a reminder, the home disk on hilbert is not very large, and IS NOT TO BE USED FOR LONG TERM STORAGE. Please move all files you don't immediately need for the execution of your jobs onto newton.

Users not following this policy will find all of their files moved for them.

posted at: 03:23 | path: /news | permanent link to this entry