The cluster was purchased by the Computer Systems Lab to serve the Parallel Computing and Computer Vision classes, but it is available for use by all TJ students and staff. Several senior research labs have expressed interest in using the cluster's resources for their own work. Academic jobs run on the cluster receive priority when resources are allocated, but non-academic jobs are also accepted as long as they abide by the FCPS Acceptable Use Policy (Regulation 6410).
The full cluster consists of 12 HPC nodes, 40 Borg nodes, and 1 dedicated GPU node (zoidberg), occupying almost 3 full racks in the server room. The CSL obtained the 40-node Borg cluster from NASA through an educational grant. The Borg nodes are named consecutively borg[1-40], and the HPC nodes are named hpc[1-12]. The login node is infocube.
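Because every node name is a prefix followed by a number, a list of node names is easy to generate, for example to loop over nodes in a script. The node_names helper below is a hypothetical sketch, not a tool installed on the cluster. On a machine with Slurm installed, scontrol show hostnames 'borg[1-40]' expands the same bracket notation.

```shell
#!/bin/sh
# node_names PREFIX FIRST LAST (hypothetical helper)
# Prints PREFIX followed by each number from FIRST to LAST inclusive, one per line,
# e.g. "node_names borg 1 40" prints borg1 through borg40.
node_names() {
    prefix="$1"
    i="$2"
    last="$3"
    while [ "$i" -le "$last" ]; do
        echo "${prefix}${i}"
        i=$((i + 1))
    done
}
```

For instance, node_names hpc 1 12 lists hpc1 through hpc12.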
The login node and all cluster nodes use a file system that is separate from the ras2/workstation AFS directories. Instead, all user files are stored in CephFS under the /cluster directory; for example, the cluster files for 2021jdoe would be in /cluster/2021jdoe. When you log into infocube or any cluster node, your cluster directory will be /cluster/<username>. Your cluster files are also accessible on ras2, under the same directory.
On ras2, however, /cluster is not your default directory, and its contents are separate from the files in your default directory. You may use mv or another utility to move files back and forth between anywhere on ras and your /cluster directory. This is currently unavailable on workstations: if you wish to copy files from a workstation to your /cluster directory, use scp to copy them from the workstation to the target cluster node.
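The two cases above can be sketched as follows, assuming the /cluster/<username> layout. The copy_to_cluster function and the CLUSTER_ROOT override are hypothetical names introduced only for this illustration (the override exists so the function can be tried outside the cluster). The scp line is left as a comment because its host name and username are placeholders, not confirmed connection details.

```shell
#!/bin/sh
# copy_to_cluster SRC USERNAME (hypothetical helper)
# On a machine where /cluster is mounted, such as ras2, recursively copies SRC
# (a file or directory) into CLUSTER_ROOT/USERNAME. CLUSTER_ROOT defaults to /cluster.
# Replace cp with mv if you want to move the files rather than copy them.
copy_to_cluster() {
    dest="${CLUSTER_ROOT:-/cluster}/$2"
    if [ ! -d "$dest" ]; then
        echo "copy_to_cluster: $dest does not exist" >&2
        return 1
    fi
    cp -R -- "$1" "$dest/"
}

# From a workstation, where /cluster is not mounted, copy over the network instead.
# The host (infocube) and username below are placeholders; substitute your own
# username and the cluster node you are targeting:
#   scp -r myproject 2021jdoe@infocube:/cluster/2021jdoe/
```

On ras2, copy_to_cluster ~/myproject 2021jdoe would place the project in /cluster/2021jdoe/myproject.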
In the context of the cluster, it is safe to assume that terms such as "default directory", "home directory", "homedir", and "cluster directory" all refer to the directory at /cluster/<username>.
Slurm (the Simple Linux Utility for Resource Management) is the utility used for job control and submission. Users log in to infocube and run a few simple commands specifying what they want to run, how many resources it should have, its priority, and other optional arguments. Slurm then allocates cluster resources for them and provides job accounting, so users know the status of their jobs. More information is available in our Slurm docs.
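As a minimal sketch of what a submission looks like, the batch script below asks for one task with one CPU core for at most five minutes and prints the node it ran on. The job name, resource amounts, and output file name are arbitrary placeholders, and no partition is requested, so whatever default the cluster's Slurm configuration sets is assumed. The options are standard sbatch options, but site-specific requirements may apply, so check our Slurm docs.

```shell
#!/bin/bash
# Minimal Slurm batch script (sketch). sbatch reads the #SBATCH lines as options;
# when the file is run directly with bash they are ordinary comments.
# Job name shown by squeue (placeholder):
#SBATCH --job-name=hello
# One task using one CPU core:
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Wall-clock limit, HH:MM:SS:
#SBATCH --time=00:05:00
# Output file; %j is replaced by the job ID:
#SBATCH --output=hello-%j.out

# Slurm sets SLURM_JOB_ID inside a job; outside one we fall back to "none".
job_id="${SLURM_JOB_ID:-none}"
echo "job ${job_id} running on $(uname -n)"
```

Saved as hello.sh, it would be submitted from infocube with sbatch hello.sh. While it is queued or running, squeue -u $USER shows its status, and its output is written to hello-<jobid>.out in the directory it was submitted from.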