The old login node, infosphere, has been decommissioned and replaced with the new login node: infocube. Running ssh infosphere may result in a weird SSH error, but it is safe to ignore it. Running ssh infocube bypasses this SSH error altogether.

To use the Cluster, you must first connect to the login node, infocube in this case. Any of the following commands, run from ras or a TJ CSL computer, will allow you to connect to infocube:
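For example (the <username> form is only needed if your Cluster username differs from the account you are currently logged in with):

    ssh infocube
    ssh <username>@infocube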
Once connected to infocube, you will be placed into your Cluster home directory (/cluster/<username>). infocube is a virtual machine and does not have anywhere near the resources of the entire Cluster, so do not run programs directly on infocube. Instead, you want to tell Slurm to launch a job.

The Cluster is split into two partitions: compute and gpu
. All nodes that are in the gpu
partition are in the compute
partition, but not vice versa. Nodes in the gpu
partition have GPUs installed, which can be accessed through Slurm.
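Any of the job-launching commands described below (salloc, srun, sbatch) accept Slurm's standard -p/--partition flag, so if your job needs a GPU you can target the gpu partition explicitly. For example (the task count and the (path_to_program) placeholder are illustrative):

    srun -p gpu -n 1 (path_to_program)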
To see the current state of the Cluster, run sinfo. You should get a table similar to this one:
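(The partition layout, node names, and counts below are hypothetical; your output will differ.)

    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    compute      up   infinite      8  alloc node[1-8]
    compute      up   infinite      3   idle node[9-11]
    compute      up   infinite      1  down* node12
    gpu          up   infinite      3   idle node[9-11]
    gpu          up   infinite      1  down* node12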
- idle means that that block of nodes is not currently in use and will be immediately allocated to any job that requests resources.
- alloc means that the node is busy and will not be available for any other jobs until the current job is complete.
- mix means that some of the cores within the node are allocated and others are free. Because this is annoying, it is good etiquette to allocate your jobs in multiples of full nodes (24 cores).
- down means that the node cannot currently be used.
- If a state has a * after it, that means those nodes cannot have jobs scheduled on them (useless nodes). For example, if a node is marked idle*, no jobs will be scheduled onto that node, even though it is marked as idle.

To view the jobs that are currently queued or running, run squeue. You should see a table like this:
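(Job IDs 882 and 884 match the example discussed next; the usernames, job names, and node names are hypothetical.)

    JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
      884   compute  mpi_sim2    user2 PD       0:00      6 (Resources)
      882   compute  mpi_sim1    user1  R    2:15:03     12 node[1-12]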
ST stands for state. The two common states are R, which means the job is currently running, and PD, which stands for pending. If the job is running, the rightmost column displays which nodes the job is running on. If the job is pending, the rightmost column displays why the job is not yet running. In this example, job 884 is waiting for six nodes' worth of resources because job 882 is running on all 12 of the available nodes.

You should compile your programs on infocube
(the login node explained in the section above), so that your code is compiled in a similar environment to where it will be run. The login node should have all the necessary tools to do so, such as gcc, g++, and mpicc/mpicxx.
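For example, a small MPI program could be compiled on infocube like this (hello.c is a placeholder name; module load mpi is the same module-loading step used in the salloc section below):

    module load mpi
    mpicc -O2 -o hello hello.c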
WARNING: compiling your program on a workstation or any other computer that is not part of the Cluster and then transferring the generated executable over to the Cluster WILL NOT WORK. This is called CROSS COMPILATION and it WILL NOT WORK. This is not a challenge, it is a statement of fact.

salloc has many command-line options (run man salloc to see them all), but the only one you need for most uses is -n [number],
which specifies how many cores you want to allocate. You can also specify a command simply by placing it after all command line options (ex: salloc -n 4 echo "hello world"
). This is currently the suggested way to run MPI jobs on the cluster. To run MPI jobs, first you must load the mpi module, as stated above (module load mpi
). After that, simply run salloc -n [number of cores] mpiexec [your program]
. Unfortunately, the displayed name of this job is, by default, just "mpiexec", which is not helpful for anyone. To give it a name, pass the --job-name=[name] option to salloc (NOT to mpiexec).
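Putting that together, a complete MPI launch might look like this (the program name hello and the core count are placeholders; 48 cores is two full 24-core nodes, in keeping with the etiquette note above):

    module load mpi
    salloc -n 48 --job-name=hello mpiexec ./hello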
Another way to launch a job is srun. The basic syntax is srun -n (processes) (path_to_program), where (processes) is the number of instances of your program that you want to run, and (path_to_program) is, you guessed it, the path to the program you want to run.
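For example, to run 24 copies (one full node's worth) of a program and then look up its accounting record afterwards (./my_program is a placeholder and, as noted next, should not be an MPI program):

    srun -n 24 ./my_program
    sacct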
If your program is an MPI program, you should not use srun, and instead use the salloc method described above. You can check the status of your current and past jobs with sacct. You will receive any output of your program to the console. For more resource options, run man srun or use the official Slurm documentation.
sbatch allows you to create batch files which specify a job and the resources required for the job, and submit that directly to Slurm instead of passing all the options to srun. Here's an example script; assume you save it as test.sh:
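A sketch of what test.sh might contain, matching the description below ((path_to_program) is the same placeholder used above; the #SBATCH lines are standard Slurm directives):

    #!/bin/bash
    #SBATCH --ntasks=4             # launch 4 tasks
    #SBATCH --time=00:30:00        # limit the maximum execution time to 30 minutes
    #SBATCH --ntasks-per-node=2    # no more than two tasks on any one node

    srun (path_to_program)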
To submit the job, run sbatch test.sh. This would tell Slurm to launch the program at (path_to_program), and to launch 4 tasks, limit the maximum execution time to 30 minutes, and require that no more than two tasks run on any one node. Here are some other examples: https://www.hpc2n.umu.se/batchsystem/examples_scripts