When downtime is required, users are notified by email, and the system "Message-of-the-Day" posts downtime schedules and other announcements.
Occasionally, some batch jobs may have to be removed during downtime. Each user whose jobs are removed is sent a separate email.
Job Scheduling
Two job scheduling systems are in use. The SGI systems use the LSF scheduler from Platform Computing (now part of IBM), and the Mercury cluster uses PBS Torque/Maui from Adaptive Computing. All work is submitted through these scheduling systems so that resources are managed effectively.
LSF summary
LSF is a workload management system. It uses a set of configurable queues and dispatches each job according to the job's resource requirements and the availability of system resources.
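The Python sketch below shows how a job states its resource requirements to LSF on the `bsub` command line: `-q` selects the queue, `-n` the number of processors, `-W` the run limit in `[hour:]minute` form, `-J` the job name, and `-o` the output file (LSF replaces `%J` with the job ID). The helper name `build_bsub_command`, the queue name `normal`, and the program `./my_model` are hypothetical placeholders; running `bqueues` on Zeus lists the queues actually configured there.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_bsub_command(
    command: Sequence[str],
    queue: str,
    ncpus: int = 1,
    run_limit_minutes: Optional[int] = None,
    job_name: Optional[str] = None,
    output_file: str = "%J.out",
) -> List[str]:
    """Build (but do not run) a bsub command that states the job's resource needs."""
    args = ["bsub", "-q", queue, "-n", str(ncpus)]
    if run_limit_minutes is not None:
        # bsub -W expects [hour:]minute, so 90 minutes becomes "1:30".
        hours, minutes = divmod(run_limit_minutes, 60)
        args += ["-W", f"{hours}:{minutes:02d}"]
    if job_name is not None:
        args += ["-J", job_name]
    # LSF substitutes the job ID for %J in the output file name.
    args += ["-o", output_file]
    # The job's own command and arguments come last.
    return args + list(command)


if __name__ == "__main__":
    # Hypothetical queue and program names; check `bqueues` for real queues.
    cmd = build_bsub_command(
        ["./my_model", "input.dat"],
        queue="normal",
        ncpus=4,
        run_limit_minutes=90,
        job_name="model_run",
    )
    print(shlex.join(cmd))
    # Submit only where LSF is installed (bsub found on PATH).
    if shutil.which("bsub"):
        subprocess.run(cmd, check=True)
```

Building the argument list separately from submitting it makes it easy to print and check the command before anything reaches the scheduler; the script submits only when `bsub` is found on the `PATH`.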
A limit of 30 CPU minutes applies to every process started interactively during a login session; when an interactive process reaches this limit, the system kills it. Jobs run through LSF are not subject to this limit; their limits are set by the definition of the LSF queue they run in.
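If the interactive limit is enforced as an ordinary per-process CPU-time resource limit (the value `ulimit -t` reports, in seconds), a process can read it with Python's Unix-only `resource` module, as in the sketch below. That enforcement mechanism is an assumption: the site may impose the 30-minute limit some other way, in which case this check may not reflect it. Under the assumption, the function would return 30.0 (1800 seconds) in an interactive session; the helper name `cpu_limit_minutes` is hypothetical.

```python
import resource
from typing import Optional


def cpu_limit_minutes() -> Optional[float]:
    """Return the soft CPU-time limit for this process in minutes, or None if unlimited."""
    # Assumption: the interactive limit is enforced as RLIMIT_CPU (what `ulimit -t` shows).
    soft, _hard = resource.getrlimit(resource.RLIMIT_CPU)
    if soft == resource.RLIM_INFINITY:
        return None
    # RLIMIT_CPU is measured in seconds of CPU time.
    return soft / 60


if __name__ == "__main__":
    limit = cpu_limit_minutes()
    if limit is None:
        print("No CPU-time limit is set for processes started from this session.")
    else:
        print(f"Processes started from this session may use {limit:g} CPU minutes.")
```

Work expected to need more CPU time than this limit belongs in an LSF job rather than an interactive session.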
Useful LSF files: the directory /usr/local/lsf/ on Zeus contains several Adobe Acrobat (PDF) files that you can download and print for your own use:
- lsf_qrefcard_6.0.pdf | A quick reference card for normal LSF commands
- lsf_using_6.0.pdf | The user guide for LSF
- running_jobs.pdf | A tutorial on running jobs with LSF
PBS Torque/Maui
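PBS Torque is an open-source resource manager derived from the original OpenPBS project and is a standard in a large number of HPC environments. Maui is an open-source job scheduler and the predecessor of Adaptive Computing's commercial Moab product. Together, Torque (which tracks nodes and launches jobs) and Maui (which decides when and where each job runs) manage scheduling and resource allocation across the entire cluster.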
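A comparable Python sketch for the Mercury cluster builds a Torque `qsub` command line. Resource requests go in a single `-l` option (`nodes=N:ppn=P` for the number of nodes and processors per node, `walltime=HH:MM:SS` for the run limit), `-q` selects a queue, and `-N` names the job; the batch script is the final argument. The helper `build_qsub_command`, the script name `run.pbs`, and the resource values are hypothetical, and queue names on Mercury are site-specific.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def build_qsub_command(
    script: str,
    nodes: int,
    ppn: int,
    walltime_minutes: int,
    queue: Optional[str] = None,
    job_name: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) a Torque qsub command with its resource request."""
    hours, minutes = divmod(walltime_minutes, 60)
    # All resources go in one -l list: nodes, processors per node, and walltime.
    resources = f"nodes={nodes}:ppn={ppn},walltime={hours:02d}:{minutes:02d}:00"
    args = ["qsub", "-l", resources]
    if queue is not None:
        args += ["-q", queue]
    if job_name is not None:
        args += ["-N", job_name]
    # The batch script to run is the final argument.
    args.append(script)
    return args


if __name__ == "__main__":
    # Hypothetical script name and sizes; Mercury's queues are site-specific.
    cmd = build_qsub_command("run.pbs", nodes=2, ppn=8, walltime_minutes=90, job_name="model_run")
    print(shlex.join(cmd))
    # Submit only where Torque is installed (qsub found on PATH).
    if shutil.which("qsub"):
        subprocess.run(cmd, check=True)
```

After submission, `qstat` shows the job as Torque sees it, and Maui's `showq` shows where it sits in the scheduling queue.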