I2Lab Policies

Note that these policies may change without notice. Major policy changes that require users to change how they use the systems will be posted on the I2Lab blog and, when appropriate, emailed to affected users and/or posted in the login message of the day.

Usage priorities

The priorities in using the I2Lab systems are:
  1. Research groups that contribute hardware, funds, or personnel to the I2Lab.
  2. High-impact funded research: research that produces results published in top journals in the field.
  3. Funded research.
  4. Research with a high potential to be funded.
  5. Other research.
Job scheduling policies will be set so that users within each group have an equal chance of getting CPU time.

Batch queues

Users need to familiarize themselves with the batch queue system (BQS) and use it to submit their jobs. Jobs not using the BQS will be killed without warning or notice.

The current batch queue system installed is Sun GridEngine.

Batch queues will be adjusted to optimize fairness and system usage, and to allow hardware maintenance as necessary.

Currently, there are two main queues:

Finer-grained queues may be added if usage justifies it. Any machine may be used for the short queue. The number of machines dedicated to the short queue will be adjusted based on usage.

We strongly discourage using a single CPU of a parallel machine. Single-CPU jobs use resources that would be more effectively used by other users' parallel jobs. Users needing computing cycles but unwilling to parallelize their applications should make other arrangements to farm the unused computing cycles of systems belonging to their colleagues. A small number of single-CPU jobs will be tolerated, but only if they are not blocking resources needed by parallel jobs. Array jobs are preferable to multiple single-CPU jobs.

We do not currently prefer any particular method of parallelization. For example, for shared memory jobs, pthreads or OpenMP could be used. MPI can be used to make jobs more flexible and able to use more than one node. Embarrassingly parallel jobs are also acceptable, but should be structured as an array job to minimize job scheduling overhead.
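
As a rough illustration of structuring an embarrassingly parallel workload as an array job, the sketch below picks the inputs that one array task should process. It assumes the job was submitted as a Grid Engine array job with tasks numbered from 1, so that each task sees its number in the SGE_TASK_ID environment variable. The function name, the input file names, and the fallback to task 1 when run by hand are hypothetical choices made for this example.

```python
import os


def inputs_for_task(all_inputs, n_tasks, environ=None):
    """Return the inputs that one array task should process.

    Assumes tasks are numbered 1..n_tasks (e.g. submitted with `qsub -t 1-N`)
    and that Grid Engine exports the task number as SGE_TASK_ID.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("SGE_TASK_ID", "")
    # Outside an array job the variable is missing or non-numeric;
    # fall back to task 1 so the script can still be run by hand.
    task_id = int(raw) if raw.isdigit() else 1
    if not 1 <= task_id <= n_tasks:
        raise ValueError(f"task id {task_id} outside 1..{n_tasks}")
    # Round-robin split: task k takes items k-1, k-1+n_tasks, ...
    return all_inputs[task_id - 1::n_tasks]


if __name__ == "__main__":
    files = [f"input_{i:03d}.dat" for i in range(10)]  # hypothetical inputs
    print(inputs_for_task(files, n_tasks=4))
```

With this layout a single submission covers every input, and the scheduler handles one array job rather than many separate single-CPU jobs.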

Power management

Power management is currently implemented on the cluster. Idle nodes with no jobs on them will be automatically powered down. A few idle nodes may be left on to allow fast response for short jobs.

Nodes will be powered back on to match the needs of the batch queue system. Clusters with a large number of nodes that have gone unused for a long time will be deeply powered down, which requires manual intervention to power back on.

Scratch disk space

Each compute node (excluding the head node) has local disk space that is faster than the networked home directory served from the head node. If such "scratch space" is useful in speeding up a compute job, users are welcome to use it. On all Rocks-based clusters, this space is in /state/partition1/ on each compute node; it is writable by all users and not shared with other nodes. Please follow these rules when using this space:
  1. Create a directory in this space using your user name, and create any local files you need within that directory or subdirectories of that directory.
  2. Be considerate of others, and be careful to not completely fill the space.
  3. This area IS NOT BACKED UP and could disappear at any moment. Ideally, this will not happen while your job is running, but if a machine crashes due to disk failure, both your job and the data would be lost. If a machine requires a disk change, no effort will be made to recover data in the scratch area.
  4. Users may not permanently store any files in the scratch space without special arrangement.
  5. If you do not have jobs running on the machine, your scratch area may be purged without notice or backup. It is your job's responsibility to copy critical result data back to your home directory when it completes. (Let us know if you need help making this work.)
  6. Preferably, your job will also delete its scratch directory when it is done.
If any of these policies are a problem for your job, please let us know, and we may consider making an exception.
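
The rules above can be sketched as a small helper that a job script might wrap around its work. This is only a sketch under stated assumptions: the helper name scratch_dir, the "job_" prefix, and the choice to copy results back only when the job body finishes without an error are all hypothetical; the /state/partition1 default comes from the policy text. It needs Python 3.8 or later for dirs_exist_ok.

```python
import contextlib
import getpass
import os
import shutil
import tempfile


@contextlib.contextmanager
def scratch_dir(results_dest, scratch_root="/state/partition1", user=None):
    """Yield a private scratch directory; copy it home and delete it afterwards.

    Hypothetical helper, not part of any cluster software.
    """
    user = user or getpass.getuser()
    # Rule 1: keep everything under a directory named after the user.
    user_root = os.path.join(scratch_root, user)
    os.makedirs(user_root, exist_ok=True)
    # One subdirectory per job so concurrent jobs do not collide.
    workdir = tempfile.mkdtemp(prefix="job_", dir=user_root)
    try:
        yield workdir
        # Rule 5: copy results back to the home directory when the job completes.
        # This line is skipped if the job body raised an exception.
        shutil.copytree(workdir, results_dest, dirs_exist_ok=True)
    finally:
        # Rule 6: always remove the job's scratch directory.
        shutil.rmtree(workdir, ignore_errors=True)


# Example use inside a job script (paths are placeholders):
#
#     with scratch_dir(os.path.expanduser("~/results/run1")) as work:
#         ...write intermediate and result files under `work`...
```

Copying only on success is one possible design; a job that needs partial results after a failure would move the copy step into the finally block instead.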

Home directory space

Limited home directory space is available on the head node for each cluster. This data will not be actively backed up, but it is backed by a redundant RAID array, so no data should be lost in the event of a single disk failure. Every attempt will be made to protect and recover any data short of actually backing it up.

Each user will be allowed a maximum of 100G of space per head node. Exceptions can be made on request, but the disk space is limited and not easily expandable. Disk quotas are implemented and enforced. You can view your current disk quota with the command:

  quota -vs
You should get a warning at login if you are over your soft limit. Once the soft limit is exceeded, you have one week to get back below it. When that time expires, or if you exceed your hard limit, you will not be able to write any more data to the disk, and editing files will cause data loss.

Note that users must comply with the data retention policy, and not store large data files on the head nodes long term. Users violating this may get their files archived and their disk quota reduced.

Data retention and backup

Cluster head nodes should not be used for permanent data storage. Long term data should be moved to newton when it is no longer active, but still needed for future reference. The rsync -a command is convenient for copying data sets between machines; it can be used to quickly update directories that are already partially copied, and can be used to merge similar directories. Read the man page or ask for assistance for more information.
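
For scripted transfers, the rsync -a invocation described above could be wrapped roughly as follows. This is a sketch, not a supported tool: the function names and the example destination path are hypothetical, and only the standard -a and --dry-run options of rsync are used.

```python
import subprocess


def rsync_command(src_dir, dest, dry_run=False):
    """Build an `rsync -a` command that copies the contents of src_dir to dest."""
    cmd = ["rsync", "-a"]
    if dry_run:
        cmd.append("--dry-run")  # trial run: nothing is changed on either side
    # A trailing slash on the source means "the contents of this directory",
    # so re-running the command updates dest in place instead of nesting a copy.
    cmd += [src_dir.rstrip("/") + "/", dest]
    return cmd


def archive(src_dir, dest, dry_run=False):
    """Run the transfer; raises CalledProcessError if rsync exits with an error."""
    subprocess.run(rsync_command(src_dir, dest, dry_run), check=True)


if __name__ == "__main__":
    # Hypothetical paths; prints the command without running it.
    print(" ".join(rsync_command("/home/alice/run1", "newton:archive/run1/")))
```

Because rsync only transfers what differs, running the same command again after an interrupted copy picks up where the previous run left off.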

Space on newton will be less limited, but you may need to ask for more if you run out. Newton also will not be backed up, but the RAID arrays on newton have a somewhat higher level of redundancy.

Currently, not all users have newton accounts, but any user can get one by asking.

Data older than 6 months left on cluster head nodes will be archived and removed without warning, but with notice after the fact. Accounts that allow the grace period to expire after exceeding the soft quota limit may be archived before the 6 months expire. If accounts are archived, most likely the entire account will be archived (including current files) and all the files will be removed from the cluster head node. Archived files will be restored on newton only by specific request. If space on head nodes becomes short, some users may be asked to reduce space use or move files to newton. Accounts found storing data long term on head nodes may find their disk quota reduced below the default limits without warning.
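
To see which of your files might fall under the 6-month rule, something like the following sketch could be run on a head node. It rests on assumptions the policy does not state: age is measured here by each file's last-modification time, and "6 months" is approximated as 182 days. The function name is hypothetical.

```python
import os
import time

# Assumption: "6 months" approximated as 182 days.
SIX_MONTHS_SECONDS = 182 * 24 * 3600


def stale_files(root, max_age=SIX_MONTHS_SECONDS, now=None):
    """Return a sorted list of files under root not modified within max_age seconds."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symbolic links are judged by their own timestamp.
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if now - mtime > max_age:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    for path in stale_files(os.path.expanduser("~")):
        print(path)
```

The resulting list is a starting point for deciding what to move to newton; it does not reproduce whatever criteria the administrators actually apply.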

Event notification

Occasionally, things happen to the cluster. When we expect them, we will notify cluster users in advance of major events via the cluster mailing list. Users should be automatically subscribed when their account is created.

Events that impact current and future use of the cluster may also be put in the message of the day (viewed at each login) for a short period. Versions of these messages will also be posted in the blog.

Also, occasionally unexpected events occur (such as frequent power outages during Florida's lightning season). To keep mailing list traffic low, the mailing list will not usually be notified of such unplanned events. However, all events will be posted in the I2Lab blog, currently at http://www.i2lab.ucf.edu/blog. You can view the blog either as a web page or as an RSS feed. Ask for assistance if you would like help finding an appropriate RSS reader.

Events will be posted in the blog as soon as reasonable (usually as soon as we know about planned events, and within an hour to half a day after we find out about unplanned events).

Note that while efforts will be made to prevent it while jobs are running, compute nodes may be rebooted without warning. Head nodes will only be rebooted when absolutely necessary. Most (but not all) head nodes are on backup power; most compute nodes are not. Jobs using the batch queue system usually survive a head node reboot.

Software installation

If you are the only user that will use a particular software package, you are welcome to install it in your home directory yourself.

If other users might use the software, or you need assistance installing it, you can ask for it to be installed in a system directory for you. Some system directories can be distributed to each compute node to improve performance (software must be in an RPM package to be distributed). If multiple users install the same large software package, they may be asked to switch to a shared version in a system directory.

Please note that the I2Lab cannot be responsible for adherence to software licenses unless the software was bought by the I2Lab. Installation support will be limited for software requiring license agreements to be signed.

If you are unsure if a software package is already installed, please ask. A partial list can be found at:

Note that many packages in the Rocks Bio-Roll and other common parallel software applications may be installed already.

Node sharing

Direct access to compute nodes is strongly discouraged. Jobs should be submitted through the batch queue system (currently Sun GridEngine) which also implements fair sharing policies. Jobs run outside of the batch queue system will be viewed as attempts to bypass the fair sharing policies and will be terminated without prejudice.

However, sometimes it is helpful to access a node your job is already running on for debugging purposes. As long as this access is not abused, it will remain open.

Users are also strongly discouraged from leaving open shells or idle processes on compute nodes when they are not currently running a job. Such idle shells will likely be automatically closed without notice.

If you have problems with another user's job disrupting or slowing your job down, please notify me.

Contacts

For contact information, or to ask questions, contact anyone on the I2Lab support page. If you are curious how the fairness policy is currently implemented, you may contact me to discuss it. The fairness policy will be adjusted based on usage patterns of each cluster, and will not be detailed here at this time as it is tweaked frequently. (Details can be partially discovered by examining SGE configuration.)