HPC Downtime and Job Scheduling

When downtime is required, users are notified by email. The system "Message-of-the-Day" also posts downtime schedules along with other information.

Occasionally, a number of batch jobs may have to be removed during downtime. A separate email is sent to each user whose job is removed.

Job Scheduling

Two job scheduling systems are in use. The SGI systems use the Platform Computing (IBM) LSF scheduler, and the Mercury cluster uses PBS Torque/Maui from Adaptive Computing. All work is handled through these scheduling systems so that resources are managed effectively.

LSF summary

LSF is a workload management system. It uses a set of configurable queues and runs each job based on the job's resource requirements and the availability of system resources.

A limit of 30 CPU minutes has been placed on all processes that are started interactively during a session. When an interactive task reaches this limit, it is killed by the system. Jobs run via LSF are not affected by this limit; they are controlled by the LSF queue definition.
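Because long-running work has to go through an LSF queue rather than an interactive session, it can help to see what a batch submission looks like. The following is a minimal sketch, not a site-specific recipe: it only builds the argument list for the standard `bsub` command and does not submit anything. The helper name `build_bsub_command`, the queue name `normal`, the job name, and the program `./my_simulation` are all hypothetical; check the queue names actually defined on the system before using it.

```python
import shlex


def build_bsub_command(command, queue, job_name, slots=1,
                       run_limit_minutes=60, output_file="job.%J.out"):
    """Return the bsub argument list for a batch job (nothing is submitted).

    Hypothetical helper for illustration only. The flags used are standard
    bsub options: -q (queue), -J (job name), -n (number of slots),
    -W (run limit as hours:minutes), -o (output file, %J = job ID).
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    if run_limit_minutes < 1:
        raise ValueError("run_limit_minutes must be at least 1")

    # bsub -W accepts [hour:]minute, so convert total minutes to H:MM.
    hours, minutes = divmod(run_limit_minutes, 60)

    return [
        "bsub",
        "-q", queue,
        "-J", job_name,
        "-n", str(slots),
        "-W", f"{hours}:{minutes:02d}",
        "-o", output_file,
        command,
    ]


if __name__ == "__main__":
    # "normal" is an assumed queue name; it may not exist on a given system.
    args = build_bsub_command("./my_simulation", queue="normal",
                              job_name="demo", slots=4,
                              run_limit_minutes=90)
    # Print the command line for review instead of running it.
    print(shlex.join(args))
```

Printing the command first makes it easy to review the queue and run limit before submitting the job by hand.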

Useful LSF files: on Zeus, the directory /usr/local/lsf/ contains several Adobe Acrobat (PDF) files that can be downloaded and printed for your use:

  • lsf_qrefcard_6.0.pdf | A quick reference card for normal LSF commands
  • lsf_using_6.0.pdf | The user guide for LSF
  • running_jobs.pdf | A tutorial on running jobs with LSF

PBS Torque/Maui

PBS Torque is the open source version of PBS Pro and is widely used in HPC environments. The Maui scheduler is the open source version of Adaptive's MOAB product. Together, Torque (the resource manager) and Maui (the scheduler) manage resource allocation and scheduling across the entire cluster.
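As a rough illustration of how a job is handed to Torque, the sketch below builds the argument list for the standard `qsub` command without submitting it. The helper name `build_qsub_command`, the queue name `batch`, the job name, and the script `run_job.sh` are hypothetical, and the resource values are placeholders rather than limits configured on the Mercury cluster.

```python
import re


def build_qsub_command(script_path, queue, job_name, nodes=1,
                       procs_per_node=1, walltime="01:00:00"):
    """Return the qsub argument list for a Torque job (nothing is submitted).

    Hypothetical helper for illustration only. The flags used are standard
    qsub options: -q (queue), -N (job name), -l (resource list).
    """
    if nodes < 1 or procs_per_node < 1:
        raise ValueError("nodes and procs_per_node must be at least 1")
    # Accept walltime only in the HH:MM:SS form used in this sketch.
    if not re.fullmatch(r"\d+:[0-5]\d:[0-5]\d", walltime):
        raise ValueError("walltime must look like HH:MM:SS")

    # Torque resource list: node count, processors per node, wall-clock limit.
    resources = f"nodes={nodes}:ppn={procs_per_node},walltime={walltime}"

    return ["qsub", "-q", queue, "-N", job_name, "-l", resources, script_path]


if __name__ == "__main__":
    # "batch" is an assumed queue name; it may not exist on a given system.
    print(" ".join(build_qsub_command("run_job.sh", queue="batch",
                                      job_name="demo", nodes=2,
                                      procs_per_node=8,
                                      walltime="02:30:00")))
```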

Details

Article ID: 67088
Created: Tue 6/4/19 12:32 PM
Modified: Tue 11/15/22 10:23 AM
Service Owner: Enterprise Systems & Operations

Related Services / Offerings (1)

Administered by ITCS and managed by the Departments of Chemistry and Biology, the high performance computing (HPC) lab offers ECU researchers high-end computational power to apply to complex computing tasks.