Linux Cluster (rcluster)
Running Jobs on the rcluster
- Using the Batch Queues
- Batch Queues on the rcluster
- LSF Usage Information
- Submitting a Batch Job to the Queue
- Checking the Status of Jobs
- Files Created at Job Start
- Canceling/Removing a Job
- Receiving an Email when Job Terminates
- Runchaining Jobs
- Running an Interactive Job
Using the Batch Queues
Jobs of over ten (10) minutes duration must be submitted to the queues, not run on the login node (rcluster.rcc.uga.edu). Processes that use too much CPU or RAM on the headnode may be terminated by RCC staff, or automatically, in order to keep the cluster running properly. Graphical front ends to programs, programming tools, etc. will not be terminated.
The queueing system being used on the rcluster is Platform LSF.
Batch Queues on the rcluster
The rcluster compute nodes have 2 processors each. Some of the nodes have 2 single-core, some have 2 dual-core, and some have 2 quad-core processors, which means these nodes behave as though they had 2, 4, and 8 CPUs each, respectively. Because the rcluster is composed of both AMD Opterons and Intel quad-core Xeons, code compiled on one type of node might not be optimized to run on the other type. For more information, please refer to Code Compilation on the rcluster.
The queue names specify the type of node to which the job will be submitted and the maximum runtime of the job. In general, the queue names fall into the following categories:
Queue names beginning with | Submit jobs to | Allowed users
s | single-core Opterons | all
d | dual-core Opterons | all
r | either single- or dual-core Opterons | all
q | quad-core Xeons | all
iob-s | single-core Opterons | IOB members
iob-q | quad-core Xeons | IOB members
stat-d | dual-core Opterons | Statistics dept
- Multi-threaded jobs submitted to single-core nodes (that is, queue names starting with "s" or "iob-s") can have up to 2 threads, those submitted to dual-core nodes (that is, queue names starting with "d" or "stat-d") can have up to 4 threads, and those submitted to quad-core nodes (that is, queue names starting with "q" or "iob-q") can have up to 8 threads.
- A job might have slightly different performance on single-core and dual-core processors. Therefore, for better load balance, we recommend that parallel MPI jobs be sent to the queues that target specifically either single-core machines or dual-core machines, and not to the queues whose names start with "r".
The batch queue can be used for serial jobs (that is, jobs that require only one CPU core) and for parallel jobs. The form of a queue name indicates how many processing cores (CPU cores) it is limited to and the run time limit. For example, the queue r4-24h has a limit of 4 CPU cores and 24 hours of run time per core. However, by default, a job submitted to this queue will only have 1 CPU core assigned to it. To request more cores, please refer to Submitting a Batch Job to the Queue below. To submit a job to the resource, first determine your CPU core number and time requirements. This will determine which queue you need.
Here are more examples of queue names:
Queue name | Description
r1-24h | One CPU core, maximum run time of 24h, sends job to either single-core or dual-core Opterons.
r1-96h | One CPU core, maximum run time of 96h, sends job to either single-core or dual-core Opterons.
r1-10d | One CPU core, maximum run time of 10 days, sends job to either single-core or dual-core Opterons.
r4-24h | Up to four CPU cores, maximum run time of 24h per core, sends job to either single-core or dual-core Opterons.
s4-24h | Up to four CPU cores, maximum run time of 24h per core, sends job to single-core Opterons.
d4-24h | Up to four CPU cores, maximum run time of 24h per core, sends job to dual-core Opterons.
q1-24h | One CPU core, maximum run time of 24h, sends job to quad-core Xeons.
iob-s16-10d | Up to 16 CPU cores, maximum run time of 10 days, sends job to single-core Opterons. For IOB's associate members only.
iob-s32-10d | Up to 32 CPU cores, maximum run time of 10 days, sends job to single-core Opterons. For IOB's full members only.
iob-q32-10d | Up to 32 CPU cores, maximum run time of 10 days, sends job to quad-core Xeons. For IOB's full members only.
stat-d16-10d | Up to 16 CPU cores, maximum run time of 10 days, sends job to dual-core Opterons. For Statistics Dept. members only.
For a list of all valid queue names, please use the command queuenames from an rcluster shell prompt.
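The naming convention described above can be illustrated with a small POSIX shell sketch that splits a queue name into its node-type prefix, core limit, and run time limit. The function name describe_queue is hypothetical, and the pattern it assumes (prefix, core count, a dash, then a number followed by h or d) is inferred from the example names listed here, not from an official specification.

```shell
#!/bin/sh
# Split a queue name such as r4-24h or iob-s16-10d into its parts.
# Assumed pattern: <prefix><cores>-<number><h|d>, inferred from the examples.
describe_queue() {
    q_name=$1
    q_prefix=$(printf '%s\n' "$q_name" |
        sed -n 's/^\(.*[a-z]\)[0-9][0-9]*-[0-9][0-9]*[hd]$/\1/p')
    q_cores=$(printf '%s\n' "$q_name" |
        sed -n 's/^.*[a-z]\([0-9][0-9]*\)-[0-9][0-9]*[hd]$/\1/p')
    q_limit=$(printf '%s\n' "$q_name" |
        sed -n 's/^.*[a-z][0-9][0-9]*-\([0-9][0-9]*[hd]\)$/\1/p')
    if [ -z "$q_prefix" ]; then
        # The name did not match the assumed pattern.
        echo "unrecognized queue name: $q_name" >&2
        return 1
    fi
    echo "prefix=$q_prefix cores=$q_cores limit=$q_limit"
}

# Example:
#   describe_queue iob-s16-10d
#   prints: prefix=iob-s cores=16 limit=10d
```

The prefix can then be looked up in the first table (for example, iob-s means single-core Opterons, IOB members only).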
We recommend that users checkpoint their codes whenever possible to avoid losing valuable compute time if the system goes down before a job is completed. A long job that can be checkpointed can be run as a sequence of shorter jobs, which can be automatically submitted to the queue as described below in the Runchaining Jobs section. If you cannot fit your job within the established processor and runtime limits, please let us know.
LSF Usage Information
These are the common LSF commands:
Command | Purpose
bsub | Submit a job to the queue
bkill | Cancel a queued or running job
bhold | Place a queued job on hold
bjobs | Check the status of queued and running jobs
bqueues | List all valid queue names
The preferred way to submit a batch job to the queue is to use the bsub command to submit a job submission shell script. The syntax of the bsub command is:
bsub -n nprocs -q queuename -o stdout -e stderr ./shellscriptname
where
nprocs is the number of CPU cores (not required for serial jobs)
queuename is the name of the batch queue
stdout is the name of the file where the standard output is stored
stderr is the name of the file where the standard error is stored
shellscriptname is the name of the job submission shell script file
Examples:
1. To submit a serial job with script sub.sh to the r1-24h batch queue and have the standard output and error go to test.jobid.out and test.jobid.err, respectively, use
bsub -q r1-24h -o test.%J.out -e test.%J.err ./sub.sh
2. To submit a parallel job (which uses 4 CPU cores) with script subp.sh to the r4-24h batch queue and have the standard output and error go to test.jobid.out and test.jobid.err, respectively, use
bsub -n 4 -q r4-24h -o test.%J.out -e test.%J.err ./subp.sh
IMPORTANT NOTES:
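1. If you do not specify files for the standard output and error (that is, if you omit -o test.%J.out -e test.%J.err in the submission command), the standard output and error of the job will be sent to you by email.
2. The job submission script must be executable. To make sub.sh executable, use
chmod u+x sub.sh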
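As a rough illustration of how the pieces of the bsub syntax fit together, the following POSIX shell sketch assembles the submission command for a given queue, script, and core count, and prints it instead of running it. The function name bsub_cmd is hypothetical; the options it emits are only the ones shown in the examples above, and test is just the file name prefix used there.

```shell
#!/bin/sh
# Print (do not run) a bsub command line.
# Usage: bsub_cmd QUEUE SCRIPT [NPROCS]   (NPROCS defaults to 1)
bsub_cmd() {
    b_queue=$1
    b_script=$2
    b_nprocs=${3:-1}
    # Remind the user if the script is missing or not yet executable.
    [ -x "$b_script" ] ||
        echo "note: run 'chmod u+x $b_script' before submitting" >&2
    if [ "$b_nprocs" -gt 1 ]; then
        printf '%s\n' "bsub -n $b_nprocs -q $b_queue -o test.%J.out -e test.%J.err ./$b_script"
    else
        # -n is not required for serial jobs.
        printf '%s\n' "bsub -q $b_queue -o test.%J.out -e test.%J.err ./$b_script"
    fi
}

# Examples:
#   bsub_cmd r1-24h sub.sh
#   prints: bsub -q r1-24h -o test.%J.out -e test.%J.err ./sub.sh
#   bsub_cmd r4-24h subp.sh 4
#   prints: bsub -n 4 -q r4-24h -o test.%J.out -e test.%J.err ./subp.sh
```

Printing the command first makes it easy to check the queue name and core count before the job is actually submitted.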
Examples of job submission shell scripts (sub.sh):
In the examples below, the executable is called myprog and it reads its input parameters from standard input. The input parameters are in a file called myin and the output data will be stored in a file called myout. The working_directory is the path to your working directory (e.g., it could be /home/labname/username/subdir or /scratch/username/subdir).
To run a serial job:
#!/bin/csh
cd working_directory
time ./myprog < myin > myout
To run a parallel MPI job using 4 processors (csh shell):
#!/bin/csh
cd working_directory
echo $LSB_HOSTS
cat /dev/null > mlist.$$
foreach variable ($LSB_HOSTS)
    echo $variable >> mlist.$$
end
mpirun -np 4 -machinefile mlist.$$ ./myprog < myin > myout
rm -f mlist.$$
To run a parallel MPI job using 4 processors (bash shell):
#!/bin/bash
cd working_directory
echo $LSB_HOSTS
cat /dev/null > mlist.$$
for variable in $LSB_HOSTS; do
    echo $variable >> mlist.$$
done
mpirun -np 4 -machinefile mlist.$$ ./myprog < myin > myout
rm -f mlist.$$
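The MPI scripts above hard-code the number of processes (4). As a possible variation, the sketch below writes the machine file from LSB_HOSTS and reports how many entries it wrote, so that the same script could be reused with a different -n value at submission. The function name write_machinefile is hypothetical, and the sketch assumes, as the scripts above do, that LSB_HOSTS holds one space-separated host name per assigned CPU core.

```shell
#!/bin/sh
# Write one host name per line from $LSB_HOSTS into the file named by $1,
# then print the number of entries written.
write_machinefile() {
    m_file=$1
    : > "$m_file"                    # create or truncate the file
    for m_host in $LSB_HOSTS; do     # unquoted on purpose: split on spaces
        echo "$m_host" >> "$m_file"
    done
    wc -l < "$m_file" | tr -d ' '    # strip padding some wc versions add
}

# Inside a job script this could be used as follows (not run here):
#   nprocs=$(write_machinefile mlist.$$)
#   mpirun -np "$nprocs" -machinefile mlist.$$ ./myprog < myin > myout
#   rm -f mlist.$$
```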
To run a parallel OpenMP job using 2 threads:
#!/bin/csh
cd working_directory
setenv OMP_NUM_THREADS 2
./myprog < myin > myout
NOTE: Do NOT put the job into the background with a '&' in the shell script. This will confuse the queueing system.
The file myin in the examples above is only necessary if your program requires standard input data, and the file myout is only necessary if you want the standard output data (if any) to be stored in a separate file instead of the standard output file of the batch job (test.jobid.out in the example above). If your program does not require one or both of these files, remove the corresponding redirection ( < myin and/or > myout ) from the last line of the scripts above.
MORE IMPORTANT NOTES:
1. MPI jobs executed with mpirun have to use the -machinefile option as shown in the examples above; otherwise your MPI job will not use the processors assigned to it by the queueing system. With the scripts above, a file called mlist.xxxxx (where xxxxx is the process ID of the script) containing a list of processors assigned to your job will be generated when your job starts running, and it will be deleted when your job is done. The processors used for your job will be listed in the stdout.
2. When running threaded applications, please add the bsub option -R "span[hosts=1]" to ensure that all processors assigned to your job (up to 2 when running on single-core machines, up to 4 on dual-core machines, and up to 8 on quad-core machines) are on the same machine. Without this bsub option, LSF might assign processors on different machines to your job.
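To make the thread limits and the -R "span[hosts=1]" option concrete, here is a hedged POSIX shell sketch that checks a requested thread count against the per-node limit implied by the queue name and prints the matching bsub command. The function names max_threads and threaded_bsub_cmd are hypothetical. Treating "r" queues as limited to 2 threads is a conservative assumption of this sketch (such jobs may land on a single-core node), not a documented rule.

```shell
#!/bin/sh
# Print the per-node thread limit implied by a queue name.
# Department-specific prefixes are tested first because "stat-d" also
# begins with the letter s.
max_threads() {
    case $1 in
        iob-s*)  echo 2 ;;
        iob-q*)  echo 8 ;;
        stat-d*) echo 4 ;;
        s*)      echo 2 ;;
        d*)      echo 4 ;;
        q*)      echo 8 ;;
        r*)      echo 2 ;;   # assumption: may run on a single-core node
        *)       return 1 ;; # unknown queue name
    esac
}

# Print (do not run) the bsub command for a threaded job.
# Usage: threaded_bsub_cmd QUEUE THREADS SCRIPT
threaded_bsub_cmd() {
    t_queue=$1
    t_threads=$2
    t_script=$3
    t_limit=$(max_threads "$t_queue") || return 1
    if [ "$t_threads" -gt "$t_limit" ]; then
        echo "$t_queue allows at most $t_limit threads per node" >&2
        return 1
    fi
    printf '%s\n' "bsub -n $t_threads -q $t_queue -R \"span[hosts=1]\" -o test.%J.out -e test.%J.err ./$t_script"
}

# Example:
#   threaded_bsub_cmd d4-24h 4 omp.sh
#   prints: bsub -n 4 -q d4-24h -R "span[hosts=1]" -o test.%J.out -e test.%J.err ./omp.sh
```

In a bash job script, the equivalent of the csh line setenv OMP_NUM_THREADS 2 is export OMP_NUM_THREADS=2; the value should match the number passed to -n.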
Checking the Status of Jobs
Use the bjobs command to check the status of jobs:
bjobs [-u username] [-l] [jobid]
where username is the user whose jobs you want to check and jobid is the JOBID of a specific job. The -l option gives long output, with detailed information about the job(s).
For example:
bjobs -u all | shows all the jobs in the pool
bjobs -u johndoe | shows all jobs for user johndoe
bjobs -l 10407 | gives detailed information about the job with JOBID 10407
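When many jobs are queued, a summary can be easier to read than the full listing. The sketch below counts jobs per status from a bjobs listing read on standard input. The function name count_by_status is hypothetical, and the sketch assumes the default bjobs layout: a header line followed by one line per job, with the status (for example RUN or PEND) in the third column. Listings that wrap a job over several lines would need extra handling.

```shell
#!/bin/sh
# Summarize a bjobs listing read from standard input: print one line per
# job status with the number of jobs in that status, sorted by status.
count_by_status() {
    awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Typical use on the cluster (not run here):
#   bjobs -u johndoe | count_by_status
```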
Files Created at Job Start
If you submit your job with the -o mystdout -e mystderr options,
then the files mystdout and mystderr will be
created when your job starts running, unless they already exist.
In the latter case, the stdout and stderr of the job will be
appended to the corresponding files. If you would like to have the jobid number
incorporated into the stdout and stderr file names, use the special character %J
in these file names.
If the -o and -e options
are not specified at job submission, the stdout and stderr of
the job will be sent to you by email to your rcluster account and rcluster
will automatically forward it to the email address that you listed
when you requested your rcluster account (for example, your ugamail or departmental account). The sender of the email is
LSF. You might want to check whether your email server flags such messages
as spam and filters them out. To ensure that this does not happen, you might
want to whitelist messages sent by LSF.
Canceling/Removing a Job
Use the bkill command to cancel/remove a job from the job pool:
bkill [-u username] jobid [jobid]
For example:
bkill 10408 | cancels your job with JOBID 10408
bkill 10408 10409 | cancels your jobs with JOBIDs 10408 and 10409
bkill -u your_user_id | cancels all jobs you have in the queue
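To cancel a job from a script, its JOBID has to be captured at submission time. The sketch below extracts the JOBID from the confirmation message that bsub prints. The function name extract_jobid is hypothetical, and the sketch assumes the message has the usual LSF form "Job <10408> is submitted to queue <r1-24h>."; if your installation prints a different message, the pattern would need adjusting.

```shell
#!/bin/sh
# Read bsub's confirmation message from standard input and print the JOBID.
# Prints nothing if the input does not look like a submission message.
extract_jobid() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# On the cluster (not run here):
#   jobid=$(bsub -q r1-24h -o test.%J.out -e test.%J.err ./sub.sh | extract_jobid)
#   bjobs -l "$jobid"     # inspect the job
#   bkill "$jobid"        # cancel it if needed
```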
Receiving an Email when Job Terminates
When you submit a batch job with bsub without the -o and -e options, you will receive the standard output and standard error of the job by email when the job terminates (whether it completes successfully or not). You can add the bsub option -N to have the standard output of the LSF job (not of the application) sent to you when the job terminates. The standard output of the application and the standard error of the job can still be written to files specified by the -o and -e options, respectively. For example:
bsub -n 4 -q r4-24h -o out.%J -e err.%J -N ./sub.sh
The 4 processor job running on the r4-24h queue will write the standard output of the application in the file out.jobid, write the standard error of the job in the file err.jobid, and it will send the standard output of the batch job (exit code, CPU time used, node used, etc) to the user's preferred email address.
Runchaining Jobs
We have found that a common need is to run the same job over and over.
For instance, when a computation needs a large number of iterations,
each run performs a limited number of them and then writes to a data set
the information needed to restart the job. When the job is restarted,
it reads the restart information and continues where the previous
execution left off.
To have one job automatically submit the next one once it finishes,
you can add the following lines at the end of your job submission
script:
bsub -n nprocs -q queuename -o stdout -e stderr ./next_script_name
exit
Example: sub1.sh
In the examples below we assume that the executable myprog does not require any standard input. The working directory is assumed to be /home/labname/username/subdirectory.
#!/bin/csh
cd /home/labname/username/subdirectory
time ./myprog
bsub -q r1-24h -o sub.%J.out -e sub.%J.err ./sub2.sh
exit
Parallel job using csh (tcsh):
#!/bin/csh
cd /home/labname/username/subdirectory
echo $LSB_HOSTS
cat /dev/null > mlist.$$
foreach variable ($LSB_HOSTS)
    echo $variable >> mlist.$$
end
mpirun -np 4 -machinefile mlist.$$ ./myprog
rm -f mlist.$$
bsub -n 4 -q r4-24h -o sub.%J.out -e sub.%J.err ./sub2.sh
exit
First the script sub1.sh is submitted to the queue. Once it finishes running, it automatically submits script sub2.sh to the queue. This script can in turn submit sub3.sh to the queue when it completes, and so on. For this procedure, the user can prepare a sequence of scripts, which will then be submitted one at a time to the queue and run in sequence. Alternatively, the script sub1.sh can resubmit itself back to the queue once it finishes running. This would create an "infinite loop", a situation that is not recommended. To break the infinite loop, the user can set some termination rules for the job resubmission process.
Example of a termination rule:
One way to break out of an infinite job resubmission loop is to have the code generate a file when the program finally "converges" (or when it completes a predetermined number of steps, for example). Let us call this file finalresults.txt. The job submission script sub.sh checks whether the file finalresults.txt exists. If it does not, then the script sub.sh is submitted to the queue again, otherwise the script simply exits and the resubmission chain is terminated. A simple script sub.sh that accomplishes this is the following:
Serial job using csh (tcsh):
#!/bin/csh
cd /home/labname/username/subdirectory
time ./myprogram
if ( ! -e finalresults.txt ) then
    bsub -q r1-24h -o mystdout -e mystderr ./sub.sh
endif
exit
Serial job using ksh (bash):
#!/bin/ksh
cd /home/labname/username/subdirectory
time ./myprogram
if [ ! -f finalresults.txt ]
then
    bsub -q r1-24h -o mystdout -e mystderr ./sub.sh
fi
exit
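If the program never produces finalresults.txt (for example, because it fails to converge), the scripts above would keep resubmitting indefinitely. One possible safeguard, sketched here in POSIX shell, is to also cap the total number of runs. The function name should_resubmit, the counter file chain.count, and the variable MAX_RUNS are all hypothetical names chosen for this sketch.

```shell
#!/bin/sh
# Upper bound on the total number of runs in the chain (default 10).
MAX_RUNS=${MAX_RUNS:-10}

# Return 0 (resubmit) only if finalresults.txt does not exist and the
# chain, counting the run that just finished, has made fewer than
# MAX_RUNS runs. The run count is kept in the file chain.count in the
# current directory.
should_resubmit() {
    if [ -f finalresults.txt ]; then
        return 1                     # the program has finished its work
    fi
    r_runs=0
    if [ -f chain.count ]; then
        r_runs=$(cat chain.count)
    fi
    r_runs=$((r_runs + 1))
    echo "$r_runs" > chain.count     # record the run that just finished
    [ "$r_runs" -lt "$MAX_RUNS" ]
}

# At the end of sub.sh, after the program has run (not run here):
#   if should_resubmit; then
#       bsub -q r1-24h -o mystdout -e mystderr ./sub.sh
#   fi
```

Remove chain.count before starting a new chain; otherwise the count from the previous chain carries over.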
Running an Interactive Job
We have set aside the following nodes for interactive jobs:
- One dual-processor dual-core Opteron node (4 CPU cores) called inter1.
- One dual-processor quad-core Xeon node (8 CPU cores) called inter2.
These nodes are not part of the queueing system. To access these nodes, first log in to rcluster.rcc.uga.edu and from there use ssh to connect to inter1 or to inter2.
For example, to connect to inter1:
rcluster> ssh inter1
Your prompt on inter1 or inter2 will not read inter1 or inter2; instead it will read, for example, compute-2-12 or a similar name.
A single CPU executable (a.out) can be run on inter1 and inter2 as follows:
compute-2-12> ./a.out
Or run the code using 'nohup' in order to be able to log out without
interrupting the running job:
compute-2-12> nohup ./a.out &
To run a parallel MPI job interactively, first you need to create a file (for example, call it host.list) with the word 'inter1' (without the quotes) in it (or 'inter2' if you are running it on inter2 instead), repeated multiple times in a column (the number of times that it is repeated should be equal to the number of CPU cores you want to use for the MPI program). That is, the contents of host.list will be for example
inter1
inter1
inter1
inter1
Put this file (host.list) in your working directory and then run the MPI program a.out as follows (e.g. using 4 CPU cores):
compute-2-12> mpirun -np 4 -machinefile host.list ./a.out
or, using nohup:
compute-2-12> nohup mpirun -np 4 -machinefile host.list ./a.out &
Because inter1 has a total of 4 CPU cores, users should not run parallel jobs that use more than 4 CPU cores or threads on it. Similarly, inter2 has a total of 8 CPU cores and parallel jobs on it should not use more than 8 CPU cores or threads.
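Instead of typing the host.list file by hand, it can be generated. The following POSIX shell sketch prints a host name the requested number of times, one per line; the function name make_hostlist is hypothetical.

```shell
#!/bin/sh
# Print HOST on NCORES separate lines.
# Usage: make_hostlist HOST NCORES
make_hostlist() {
    h_host=$1
    h_ncores=$2
    h_i=0
    while [ "$h_i" -lt "$h_ncores" ]; do
        echo "$h_host"
        h_i=$((h_i + 1))
    done
}

# Example, for a 4-core MPI run on inter1 (not run here):
#   make_hostlist inter1 4 > host.list
#   mpirun -np 4 -machinefile host.list ./a.out
```

Keep NCORES at or below 4 on inter1 and at or below 8 on inter2, in line with the core counts of those nodes.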
These nodes should only be used for short jobs (for example, for code compilation and debugging purposes) and for those that cannot be run on the batch queueing system (for example, if the job requires an X windows front-end). The load of the node can be monitored using top or w.

