Blog moved
The I2Lab website was moved to a new server a few weeks ago, but the
new user page and this blog didn't get moved... The blog is now back,
and the new user form will be back shortly, perhaps with some improvements.
posted at: 15:36 | path: /news | permanent link to this entry
Power outage?
There appears to have been a power event at around 7:30pm on Friday.
Some nodes, including some that were running jobs, were rebooted. Check your jobs.
posted at: 00:29 | path: /news | permanent link to this entry
SGE scratch resource limit
There have been some updates to the local SGE hints page, so it might be worth re-reading it.
Also, I have added a scratch resource request. For example, if your job requires at least 40G of space in /state/partition1, add the following option to your SGE command line:

    -l scratch=40G

Your job will then wait for a node with at least this much space.
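If you also want a job to double-check its scratch space when it starts, something like the following could work. This is only a minimal sketch: the helper names `parse_size` and `has_scratch` are hypothetical, and it assumes the local `scratch` request uses the standard Grid Engine size suffixes (upper-case K/M/G are powers of 1024, lower-case k/m/g are powers of 1000). The scheduler itself decides placement from however the `scratch` resource is configured here, not from this code.

```python
import shutil

# Assumed size suffixes, following the Grid Engine memory-specifier convention:
# upper-case letters are powers of 1024, lower-case letters powers of 1000.
MULTIPLIERS = {
    "K": 1024, "M": 1024**2, "G": 1024**3,
    "k": 1000, "m": 1000**2, "g": 1000**3,
}


def parse_size(spec: str) -> int:
    """Convert a size such as "40G" or "500m" into a number of bytes (hypothetical helper)."""
    spec = spec.strip()
    if spec and spec[-1] in MULTIPLIERS:
        return int(float(spec[:-1]) * MULTIPLIERS[spec[-1]])
    return int(spec)  # a bare number is taken as bytes


def has_scratch(path: str, request: str) -> bool:
    """Return True if the filesystem holding `path` has at least `request` free."""
    return shutil.disk_usage(path).free >= parse_size(request)
```

On a compute node, a job could call `has_scratch("/state/partition1", "40G")` before writing large files and exit early if it returns False, rather than failing partway through.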
Systems down...
A few systems in hilbert shut down this afternoon due to a short
temperature event. Some jobs may need to be restarted.
posted at: 15:28 | path: /news | permanent link to this entry
Lightning storm...
A brief power flicker caused by this afternoon's lightning storm
took out a very small number of nodes. Check your jobs...
posted at: 17:21 | path: /news | permanent link to this entry
Lightning storm
A bad lightning storm may have taken out a few nodes tonight.
Check your jobs and make sure they finished rather than aborting.
posted at: 22:53 | path: /news | permanent link to this entry
HEC power failure
Power failed in the HEC building and all servers went down.
All jobs will need to be restarted.
posted at: 11:20 | path: /news | permanent link to this entry
Temperature event
There was a temperature event at around 4am this morning that caused a number
of machines in euler and hilbert to shut down automatically.
Some jobs were likely lost as a result and will need to be restarted.
posted at: 08:35 | path: /news | permanent link to this entry
First major lightning storm of the season
Tonight's storm took out nearly all the clusters.
Nearly all jobs that were running were aborted.
Jobs that were pending in the queue should have started by now.
posted at: 23:48 | path: /news | permanent link to this entry
All hilbert jobs have been aborted.
Hilbert's home disk ran out of space on the morning of May 7.
Since jobs running at that time were probably corrupted, all jobs
have been aborted.
As a reminder, the home disk on hilbert is not very large, and IS NOT TO BE USED FOR LONG TERM STORAGE. Please move all files you don't immediately need for the execution of your jobs onto newton.
Users not following this policy will find all of their files moved for them.
posted at: 03:23 | path: /news | permanent link to this entry
