`infocube` in this case. Any of the following commands, while on rasor or any other TJ CSL computer, will allow you to connect to `infocube`; you will be placed into your Cluster home directory.
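As a sketch, connecting typically looks like the following (the exact hostname and username are assumptions here; substitute whatever the connection commands for your account actually use):

```shell
# Replace "yourusername" with your CSL account name.
# The bare hostname "infocube" is an assumption; a fully
# qualified domain name may be required from some machines.
ssh yourusername@infocube
```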
`infocube` is a virtual machine and does not have nearly the resources that the entire Cluster does, so do not run programs directly on `infocube`. Instead, you want to tell Slurm to launch a job on one of the Cluster's partitions, such as `compute` or `gpu`. All nodes that are in the `gpu` partition are in the `compute` partition, but not vice versa. Nodes in the `gpu` partition have GPUs installed which can be accessed through Slurm.
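To target a specific partition, you can use Slurm's standard `-p`/`--partition` flag. A sketch, using the `salloc` workflow described later in this document (the program path is a placeholder):

```shell
# Request 4 cores specifically on nodes in the gpu partition.
salloc -p gpu -n 4 [your program]
```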
To see the state of the Cluster's nodes and partitions, run `sinfo`. You should get a table similar to this one:
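Since the original table did not survive, here is an illustrative example in `sinfo`'s standard output format. The node names and counts are made up; they are chosen to match the 12-node, two-partition setup described in this document:

```shell
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up   infinite      4  alloc node[01-04]
compute*     up   infinite      8   idle node[05-12]
gpu          up   infinite      2   idle node[11-12]
```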
- `idle` means that that block of nodes is not currently in use, and will be immediately allocated to any job that requests resources.
- `alloc` means that the node is busy and will not be available for any other jobs until the current job is complete.
- `mix` means that some of the cores within the node are allocated and others are free. Because this is annoying, it is good etiquette to allocate your jobs in multiples of full nodes (24 cores).
- `down` means that the node cannot currently be used.
- A state suffixed with `*` means those nodes cannot have jobs scheduled on them. For example, if a node is marked `idle*`, no jobs will be scheduled onto it, even though it is marked as idle.
To see the jobs currently in the queue, run `squeue`. You should see a table like this:
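The original table did not survive, so here is an illustrative example in `squeue`'s standard output format. The usernames, job names, and times are made up; the job IDs and node counts match the scenario described below:

```shell
$ squeue
 JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
   882   compute   my_sim    alice  R  12:34     12 node[01-12]
   884   compute  my_sim2      bob PD   0:00      6 (Resources)
```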
`ST` stands for state. The two common states are `R`, which means the job is currently running, and `PD`, which stands for pending. If the job is running, the rightmost column displays which nodes the job is running on. If the job is pending, the rightmost column displays why the job is not yet running. In this example, job 884 is waiting for six nodes' worth of resources because job 882 is running on all 12 of the available nodes.
Compile your programs on `infocube` (the login node explained in the section above), so that your code is compiled in an environment similar to the one where it will be run. The login node should have all the necessary tools to do so, such as gcc, g++, and mpicc/mpicxx. WARNING: compiling your program on a workstation or any other computer that is not part of the cluster and then transferring the generated executable over to the cluster WILL NOT WORK. This is called CROSS COMPILATION and it WILL NOT WORK. This is not a challenge; it is a statement of fact.
`salloc` has many options (run `man salloc` to see them all), but the only one you need for most uses is `-n [number]`, which specifies how many cores you want to allocate. You can also specify a command simply by placing it after all the command-line options (ex: `salloc -n 4 echo "hello world"`). This is currently the suggested way to run MPI jobs on the cluster. To run MPI jobs, first you must load the mpi module, as stated above (`module load mpi`). After that, simply run `salloc -n [number of cores] mpiexec [your program]`. Unfortunately, the displayed name of this job is, by default, just "mpiexec", which is not helpful for anyone. To give it a name, pass salloc (NOT mpirun) the `-J [name]` (`--job-name`) option.
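Putting the above together, a full MPI launch might look like this (a sketch; the job name and program path are placeholders, and `-J` is salloc's standard job-name flag):

```shell
# Load the MPI toolchain first, as described above.
module load mpi

# Allocate two full nodes (48 cores at 24 cores/node, per the
# etiquette note above) and run the program under mpiexec.
salloc -J my_mpi_job -n 48 mpiexec ./my_program
```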
To run a single (non-MPI) program through Slurm, use `srun -n (processes) (path_to_program)`, where `(processes)` is the number of instances of your program that you want to run, and `(path_to_program)` is, you guessed it, the path to the program you want to run. If your program is an MPI program, you should not use `srun`; instead, use the `salloc` method described above.
You can check on the status of your jobs, including completed ones, with `sacct`. Any output of your program will be printed to the console. For more resource options, run `man srun` or use the official Slurm documentation.
`sbatch` allows you to create batch files which specify a job and the resources required for it, and to submit that directly to Slurm instead of passing all the options to `srun`. Here's an example script; assume you save it as `test.sh`.
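The original script did not survive extraction, so here is a minimal sketch matching the description below, built from standard `#SBATCH` directives; `(path_to_program)` is a placeholder for your own executable:

```shell
#!/bin/bash
#SBATCH --job-name=test         # display name shown in squeue
#SBATCH --ntasks=4              # launch 4 tasks
#SBATCH --time=00:30:00         # limit maximum execution time to 30 minutes
#SBATCH --ntasks-per-node=2     # no more than two tasks on any one node

# srun launches all 4 tasks across the allocated nodes.
srun (path_to_program)
```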
Submitting it with `sbatch test.sh` would tell Slurm to launch the program at `(path_to_program)`, launch 4 tasks, limit the maximum execution time to 30 minutes, and require that no more than two tasks run on any one system. Here are some other examples: https://www.hpc2n.umu.se/batchsystem/examples_scripts.