Cluster Introduction
Purpose
The cluster was purchased by the Computer Systems Lab to serve the Parallel Computing and Computer Vision classes, but is available for use by all TJ students and staff. Several senior research labs have expressed interest in using the cluster's resources for their own purposes. Academic jobs run on the cluster will receive priority allocation of resources, but non-academic jobs are accepted as well, as long as they abide by the FCPS Acceptable Use Policy (Regulation 6410).
Specifications
The full cluster consists of 12 HPC cluster nodes, 40 Borg nodes, and 1 dedicated GPU node (zoidberg). This setup occupies almost 3 full racks in the server room. The CSL obtained the 40-node Borg cluster from NASA through an educational grant. The Borg nodes are named borg[1-40] consecutively and the HPC nodes are named hpc[1-12]. The login node is infocube.
Cluster Nodes
The recommended method of running jobs on the cluster is through Slurm. All nodes on the cluster are directly accessible through ssh as well as Slurm. However, due to unfinished maintenance caused by COVID-19, some nodes are inaccessible indefinitely.
Available Cluster Nodes
Format: <prefix>[sequence] (e.g. borg[1-3] means borg1, borg2, and borg3 are available).
borg[1-3], borg[6-31], borg40
hpc[1-6], hpc[8-12], zoidberg
snowy
Unavailable Cluster Nodes
borg[4-5], borg[32-38]
hpc7
duke
An always up-to-date version of this list can be viewed by running sinfo on infocube or an available node.
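For example, a minimal check from a shell on infocube might look like the following (both flags are standard Slurm options; the partitions and node states reported will reflect the cluster's state at the time):

    # summary view, grouped by partition and node state
    sinfo

    # one line per node, handy for checking a specific borg or hpc node
    sinfo -N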
SSH Access
You can directly access any available cluster node through ssh. To ssh into a cluster node:
1. ssh into remote.tjhsst.edu using your TJCSL username and password: ssh 2021abagali1@remote.tjhsst.edu
2. ssh into a cluster node: ssh borg1 (or any available node)
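Putting the two steps together, a session might look like this (2021jdoe is a placeholder; substitute your own TJCSL username and whichever available node you want):

    # from your own machine: log in to the login server with your TJCSL credentials
    ssh 2021jdoe@remote.tjhsst.edu

    # then, from the login server: hop to any available cluster node
    ssh borg1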
File Management
The login node and all cluster nodes use a file system that is separate from the ras1/ras2/workstation AFS directories. Instead, all user files are stored in CephFS under the directory /cluster. For example, the cluster files for 2021jdoe would be in the directory /cluster/2021jdoe.
When you log into infocube or any cluster node, your cluster directory will be /cluster/<username>. In addition to the cluster nodes, you can access your cluster files on ras1/ras2 under the same directory.
Note: on ras your /cluster directory is not your default directory and is separate from your default directory files. You may use cp, mv, or another utility to move files back and forth from anywhere on ras to your /cluster directory. This feature is currently unavailable on workstations. If you wish to copy files from a workstation over to your /cluster directory, you may use sftp or scp to copy your files from the workstation over to the target cluster node.
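As a quick sketch (the file and directory names are illustrative, and 2021jdoe stands in for your own username):

    # on ras: copy a file from your AFS home directory into your cluster directory
    cp ~/project/data.txt /cluster/2021jdoe/

    # on a workstation: copy a file to your cluster directory via an available cluster node
    scp ~/project/data.txt 2021jdoe@borg1:/cluster/2021jdoe/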
It is safe to assume that the terms "default directory", "home directory", "homedir", "cluster directory", etc., when used in the context of the Cluster, refer to the directory at /cluster/<username>.
Slurm
Slurm (the Simple Linux Utility for Resource Management) is the utility used for job control and submission. Users log in to infocube and run a few simple commands specifying what they want to run, how many resources it should have, its priority, and other optional arguments; Slurm takes care of allocating cluster resources for them and provides job accounting so users know the status of their jobs. More information is available in our Slurm docs.
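As a minimal, hedged sketch of a batch submission (the job name, resource counts, and program name are placeholders rather than cluster-specific defaults; see the Slurm docs for the options supported here):

    #!/bin/bash
    #SBATCH --job-name=example       # name shown in job accounting
    #SBATCH --ntasks=4               # number of tasks to allocate
    #SBATCH --time=00:10:00          # wall-clock time limit
    #SBATCH --output=example.%j.out  # output file (%j expands to the job ID)

    srun ./my_program                # run the program on the allocated resources

Saved as, say, example.sbatch, the job would be submitted from infocube with sbatch example.sbatch, and its status could be checked with squeue.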