Cluster Introduction

Purpose

The cluster was purchased by the Computer Systems Lab to serve the Parallel Computing and Computer Vision classes, but it is available for use by all TJ students and staff. Several senior research labs have expressed interest in using the cluster's resources for their own purposes. Academic jobs run on the cluster receive priority allocation of resources, but non-academic jobs are accepted as well, as long as they abide by the FCPS Acceptable Use Policy (Regulation 6410).

Specifications

The full cluster consists of 12 HPC cluster nodes, 40 Borg nodes, and 1 dedicated GPU node (zoidberg). This setup occupies almost 3 full racks in the server room. The CSL obtained the 40-node Borg cluster from NASA through an educational grant. The Borg nodes are named borg[1-40] consecutively and the HPC nodes are named hpc[1-12]. The login node is infocube.
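To work on the cluster, first connect to the login node over SSH. A minimal sketch (the bare hostname infocube is assumed to resolve from CSL machines; from elsewhere you may need a fully qualified hostname, and the username shown is an example):

    # Log in to the cluster login node (replace 2021jdoe with your own username)
    ssh 2021jdoe@infocube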

File Management

The login node and all cluster nodes use a file system that is separate from the ras1/ras2/workstation AFS directories. Instead, all user files are stored in CephFS under the directory /cluster. For example, the cluster files for 2021jdoe would be in the directory /cluster/2021jdoe.

When you log into infocube or any cluster node, your cluster directory will be /cluster/<username>. In addition to the cluster nodes, you can access your cluster files on ras1/ras2 under the same path.

Note: on ras, your /cluster directory is not your default directory and is separate from your default directory files. You may use cp, mv, or another utility to move files back and forth between anywhere on ras and your /cluster directory. This feature is currently unavailable on workstations; if you wish to copy files from a workstation to your /cluster directory, use sftp or scp to transfer them to the target cluster node, as sketched below.
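A rough sketch of both approaches (the username 2021jdoe and the file names are illustrative, and infocube is used as the target node):

    # On ras1/ras2: copy a file from your AFS home directory into your cluster directory
    cp ~/project/input.dat /cluster/2021jdoe/

    # From a workstation: transfer a file to your cluster directory over SSH
    scp project/input.dat 2021jdoe@infocube:/cluster/2021jdoe/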

It is safe to assume that, in the context of the cluster, the terms "default directory", "home directory", "homedir", "cluster directory", etc. all refer to the directory at /cluster/<username>.

Slurm

Slurm (the Simple Linux Utility for Resource Management) is the utility used for job submission and control. Users log in to infocube and run a few simple commands specifying what they want to run, how many resources the job should have, its priority, and other optional arguments; Slurm then takes care of allocating cluster resources and provides job accounting so users know the status of their jobs. More information is available in our Slurm docs.
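As a minimal, illustrative sketch (the script name, resource values, and program being run are assumptions, not site defaults), a batch job can be described in a script and handed to Slurm with sbatch:

    #!/bin/bash
    # hello.sbatch -- a tiny batch script; the resource requests here are examples only
    #SBATCH --job-name=hello
    #SBATCH --ntasks=4
    #SBATCH --time=00:05:00
    srun hostname    # run hostname once per allocated task

    # Submit the script and check on the job from infocube
    sbatch hello.sbatch
    squeue -u 2021jdoe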