Documentation

Obtaining an Account

To gain access to one of the Cy-Tera resources you must first apply for access. Please see here for more details on how to apply.
After your application has been accepted you should request an account on the allocated HPC resource(s) by completing the Request an account form. As “Project ID” enter the project code (e.g. lspre100s1) corresponding to your application submission. For educational access projects, enter “Educational Access” as the “Project ID”. You will then be contacted by the user support team, to whom you will provide your public key.

How to generate a Public/Private key pair

Mac OS X/Linux/Unix Keypairs

Create a public and private key pair using the following command:
$ssh-keygen -t rsa

You will be prompted for a filename in which to save the private key. You should press Enter to use the default name: /home/username/.ssh/id_rsa
You will then be prompted for a password to protect your private key, which you will also be asked to confirm. ALWAYS use such a password to protect your private key. ssh-keygen will then create a private and a public key. The public key will have a .pub extension (e.g. id_rsa.pub).

Note: It is important to always attach a password to your ssh keys – this will be the password which allows access to use these keys.
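
Once the key pair has been generated, you can display the contents of your public key in order to copy it and send it to the user support team, e.g.:

$cat ~/.ssh/id_rsa.pub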

Windows Keypairs

One can generate a keypair on Windows using PuTTY. More specifically, PuTTYgen will have to be installed, and this can be downloaded here.
Then you should follow the steps explained below:

  1. Click on Generate.
  2. Run your mouse over the blank area to generate some randomness.
  3. Save your key pair when done (first save your public key and then save your private key).

Note: It is important to always attach a password to your ssh keys – this will be the password which allows access to use these keys.

Accessing the clusters

To access a cluster you must have access to your private key. You must also know the hostnames of the available clusters, which are as follows:

  • Cy-Tera: login.cytera.cyi.ac.cy
  • Euclid: euclid.cyi.ac.cy
  • Prometheus: prometheus.cyi.ac.cy
  • Phi: phi.cytera.cyi.ac.cy
  • Post-processing node: post02.cyi.ac.cy

Accessing the clusters using Mac OS X/Linux/Unix

If your private key is stored in your .ssh folder in your home directory:
$ssh username@hostname

If the key is stored in another directory or on a USB drive:
$ssh -i /key_path/id_rsa username@hostname

Should you encounter a problem with your key permissions, make sure the private key is readable only by you:
$chmod 600 ~/.ssh/id_rsa
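
For example, to connect to the Cy-Tera login node with the key in its default location (replace username with your own account name):

$ssh username@login.cytera.cyi.ac.cy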

Accessing the clusters using Windows

With Windows, one can access the system using PuTTY. To download PuTTY click here.
Then, you should follow the steps explained below:

  1. Browse for and select your private key file.
  2. Enter login information.

Transferring files

The two most common commands for data transfers over SSH are:

  • scp: for the full transfer of files and directories (works well only for single files or directories of small size)
  • rsync: a software application which synchronizes files and directories from one location to another while minimizing data transfer, as only the outdated or missing elements are transferred (practically required for lengthy, complex transfers, which are more likely to be interrupted midway).
    Of the two, rsync should normally be preferred as it is more generic; note that both ensure a secure transfer of the data within an encrypted tunnel.

There are also other alternative ways to transfer files such as bbcp. Before you can use the bbcp utility though, it must be installed on both the local and remote systems.

Windows users can transfer files to and from the clusters using WinSCP.

Using scp

scp or secure copy is probably the easiest of all the methods. The basic syntax is as follows:

$scp [-r] source_path destination_path

where the -r option copies the directories recursively.

Transfer from your local machine to the remote cluster home directory

Let’s assume you have a local directory ~/code/myfiles you want to transfer to the cluster, in your remote home directory:

$scp -r ~/code/myfiles username@hostname:

To transfer the local directory ~/code/myfiles to the directory ~/project:

$scp -r ~/code/myfiles username@hostname:project/.

Transfer from the remote cluster to your local machine

Let’s assume you want to transfer back the directory ~/project/code/myfiles to your local machine:

$scp -r username@hostname:project/code/myfiles /path/to/local/directory

See the scp(1) man page for more details.

Using rsync

An alternative to scp is rsync, which has the advantage of transferring only the files which differ between the source and the destination. This feature is often referred to as fast incremental file transfer. The typical syntax of rsync is similar to that of scp:

$rsync -rv source_path destination_path

where the -v option enables the verbose mode and the -r option copies the directories recursively.

Transfer from your local machine to the remote cluster home directory

Coming back to the previous examples, let’s assume you have a local directory ~/code/myfiles you want to transfer to the cluster, in your remote home directory:

$rsync -rv ~/code/myfiles username@hostname:

To transfer the local directory ~/code/myfiles to the directory ~/project:

$rsync -rv ~/code/myfiles username@hostname:project/.

Transfer from the remote cluster to your local machine

Let’s assume you want to transfer back the directory ~/project/code/myfiles to your local machine:

$rsync -rv username@hostname:project/code/myfiles /path/to/local/directory

See the rsync(1) man page for more details.

Using WinSCP

WinSCP is a popular Secure File Transfer Protocol (SFTP) application for Windows computers which also supports SCP file transfers and the creation of secure tunnels via SSH.

The software can be downloaded from the following URL.

The installation of WinSCP is quite simple: basically you just need to click the Next button several times, leaving the default options.

To customize it to your liking you can choose between the Norton Commander-like interface and the Windows Explorer-like interface. The main difference is that the first provides two panels (one with the local computer's directory structure and the other with the remote), while the second shows only the remote.

Once you are ready with the installation, you can establish a secure connection from your local computer to the remote server. You just need to give the hostname of the cluster and your username, and browse to find your private key.

During the establishment of the connection you will be prompted for the passphrase of your key.

Transfer from your local machine to the remote cluster

1. In the right panel, open the directory on the remote computer to which you want to upload the file.

2. In the left panel, click once on a local file name to select it. Note: It is best to remove spaces, capitalization, and extensions from filenames before uploading them.

3. Upload the file or folder by using the mouse to drag the file to the remote server panel, by pressing the F5 function key on your keyboard, or by right clicking the file you want to transfer and selecting Copy from the pop-up menu.

Transfer from the remote cluster to your local machine

1. In the left panel, open a directory folder or location on your computer to which you want to download the file.

2. In the right panel, click once on a file or directory folder name to select it.

3. Download the file or folder by using the mouse to drag the remote file to your local directory, by pressing the F5 function key on your keyboard, or by right clicking the file you want to transfer and selecting Copy from the pop-up menu.

User Environment

After a successful login, you will land in your personal home directory. The default shell is /bin/bash.

Environment Variables

An important component of a user’s environment is the set of environment variables. An environment variable contains a value that can be used by UNIX commands and tools. For each user, the following environment variables are defined:

  • $HOME defines the home directory of a user. Users should store their source code and build executables here.
  • $WORK defines the storage directory of a user within the project shared directory. Users must change to this directory in their batch scripts to run their jobs.
  • $SCRATCH defines a temporary storage directory for data. Contents of $SCRATCH older than one month are purged during the monthly maintenance windows.

It should be noted that $HOME is limited in size and $WORK has a total maximum quota as allocated to each project.
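
For example, these variables can be inspected from the command line and used to change to the work directory before launching a job:

$echo $HOME $WORK $SCRATCH
$cd $WORK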

Modules

Modules are used to provide a uniform mechanism for accessing different revisions of software such as application packages, compilers, communication libraries, tools, and math libraries. Modules manage environment variables such as PATH, LD_LIBRARY_PATH and MANPATH, enabling the use of application/library profiles and their dependencies.

Modules also facilitate the task of updating applications and provide a user-controllable mechanism for accessing software revisions and controlling combination of versions.

Useful module commands include the following:

Command Description
module avail list available modules
module load <name> load a specific module
module unload <name> unload a specific module
module list list loaded modules
module help <name> help on a specific module
module whatis <name> brief description of a specific module
module display <name> display changes by a given module
module purge unload all loaded modules

You can also use the $EB* environment variables when you need to refer to dependencies provided by other modules. For example, if you need to refer to the OpenMPI installation path you do not need to write out the whole path; you can simply use the corresponding environment variable, $EBROOTOPENMPI. You can use the “module show OpenMPI” command to list all environment variables set by the OpenMPI module.
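
For example, after loading an OpenMPI module (the version below is only an example) its installation path can be referenced through $EBROOTOPENMPI instead of spelling out the full path:

$ module load OpenMPI/1.6.4-GCC-4.7.2
$ module show OpenMPI
$ echo $EBROOTOPENMPI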

Compiling

Your workflow may include compiling your own applications and libraries or third party software. All HPC systems offer a range of compiler suites. Additionally, a large number of scientific libraries have been pre-built and can be made available by loading modules.

Compilers

The following C/C++ and FORTRAN compilers are available:

        C     C++    Fortran
Intel   icc   icpc   ifort
GNU     gcc   g++    gfortran
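
For example, once the corresponding compiler module has been loaded, a serial source file can be compiled directly with the commands listed above (myprog.c and myprog.f90 are placeholder filenames):

$ icc myprog.c -o myprog
$ gfortran myprog.f90 -o myprog
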
Toolchains

Compiler toolchains are basically a set of compilers together with a number of libraries that provide additional support commonly required to build software. In High Performance Computing this usually consists of a library for MPI (inter-process communication over a network), BLAS/LAPACK (linear algebra routines) and FFT (Fast Fourier Transforms).

A full list of available toolchains can be found here.

It should be noted that most of the software available on the HPC systems (the complete list can be found here) has been compiled using toolchains. It is thus important to choose software that has been compiled with the appropriate toolchain (and toolchain version).
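
For example, loading the goolf toolchain used later in this guide provides GCC together with OpenMPI, OpenBLAS/ScaLAPACK and FFTW, so the matching compilers, MPI wrappers and libraries become available in one step (hello.c is a placeholder source file):

$ module load goolf/1.4.10
$ mpicc hello.c -o hello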

Running Applications

SLURM Scheduler

Simple Linux Utility for Resource Management (SLURM) is an open-source workload manager designed for Linux clusters of all sizes. SLURM is used by many of the world’s supercomputers and computer clusters. All systems in our Cy-Tera HPC Facility are using SLURM.

SLURM Commands
Job Submission

In order to create a resource allocation and launch tasks you need to submit a batch script. To submit a job, use the sbatch command.

sbatch job_script

Job deletion

To cancel a job use the scancel command.

scancel job_id

To cancel all running jobs of a specific user use the -u option.

scancel -u username

Queue status

To view information about jobs located in the SLURM scheduling queue use the squeue command.

squeue

  • JOBID: job ID
  • PARTITION: partition (queue)
  • NAME: job name
  • USER: username
  • ACCOUNT: the user's project/group
  • ST: Job State/Status
    • R: Running
    • PD: PenDing
    • TO: TimedOut
    • S: Suspended
    • CD: Completed
    • CA: CAncelled
    • F: Failed
    • NF: Node Failure
  • TIME: time used by the job
  • TIMELIMIT: walltime requested
  • NODES: number of nodes requested
  • NODELIST: list of nodes occupied by the job

To view information about jobs of a specific user located in the SLURM scheduling queue use the -u option.

squeue -u username

Nodes and queues information

To view information about SLURM nodes and partitions use the sinfo command.

sinfo

Job information

To view detailed job information use the scontrol command as below.

scontrol show job job_id

SLURM Job Specifications

Submitting jobs is probably the most important part of using the scheduling system and, because you need to express the requirements of your job so that it can be properly scheduled, it is also the most complicated. For SLURM, the lines specifying the job requirements should begin with #SBATCH.

The following table translates some of the more commonly used job specifications.

Description                        Job Specification
Job Name                           --job-name=<job_name> or -J <job_name>
Partition/Queue                    --partition=<queue_name> or -p <queue_name>
Account/Project                    --account=<account_name> or -A <account_name>
Number of nodes                    --nodes=<number_of_nodes> or -N <number_of_nodes>
Number of cores (tasks) per node   --ntasks-per-node=<number_of_tasks>
Walltime Limit                     --time=<timelimit> or -t <timelimit>
Number of GPUs                     --gres=gpu:<number_of_gpus>
Memory requirements                --mem=<memory_in_MB>
Standard Output File               --output=<filename> or -o <filename>
Standard Error File                --error=<filename> or -e <filename>
Combine stdout/stderr              use --output without --error
Email Address                      --mail-user=<email_address>
Job Dependency                     --dependency=<job_id> or -d <job_id>
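
For illustration, a minimal job script combining some of these specifications could look as follows (test_job, <queue_name> and ./myprog are placeholders; a complete MPI example is given further below):

#!/bin/bash
#SBATCH --job-name=test_job       # job name
#SBATCH --partition=<queue_name>  # partition/queue to submit to
#SBATCH --nodes=1                 # number of nodes
#SBATCH --ntasks-per-node=12      # tasks (cores) per node
#SBATCH --time=01:00:00           # walltime limit (hh:mm:ss)
#SBATCH --output=job.%J.out       # standard output file
#SBATCH --error=job.%J.err        # standard error file
./myprog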

SLURM Environment Variables

The table below shows some useful SLURM environment variables.

Environment Variable Description
$SLURM_JOBID Job ID
$SLURM_JOB_NAME Job Name
$SLURM_SUBMIT_DIR Submit directory
$SLURM_SUBMIT_HOST Submit host
$SLURM_JOB_NODELIST Node list
$SLURM_JOB_NUM_NODES Number of nodes allocated to job
$SLURM_CPUS_ON_NODE Number of cores per node
$SLURM_NTASKS_PER_NODE Number of tasks requested per node

Interactive Jobs

To run an interactive job in SLURM you should first create an allocation from the login node using the salloc command:

$salloc -N <number_of_nodes> -n <number_of_cores> -t <timelimit>
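
Once the allocation is granted, a new shell is started from which you can launch your program on the allocated nodes with srun (./myprog below is a placeholder executable):

$srun ./myprog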

When you are finished running your program, you can release the allocation by exiting the shell:

$exit

Useful Commands

The following describes the function of a number of commands users may find useful when using available HPC systems.

  • qhist: qhist is a locally written command which summarizes the total usage of the user’s group and the usage of the other group members. See the output of qhist --help for details.
    To display your usage simply type qhist.

With qhist you can also view your group’s usage for a given time interval by specifying the exact dates. For example, to check the usage from the 1st of September 2013 to the 31st of October 2013, one should type:

$qhist 2013-09-01 2013-10-31

  • ns: Stands for “Node Status” and displays the current status of cluster nodes.
    • Green coloured nodes identify busy nodes
    • White coloured nodes identify available nodes
    • Yellow coloured nodes identify offline nodes
    • Red coloured nodes identify nodes that are down
    • A “+” sign indicates that some of the cores of a node are being utilised
    • A “*” sign indicates that all cores of a node are being utilised
  • du: Is a system command which estimates the disk usage of a given directory. It is best practice to use the “-sh” options to get the total usage of the given directories in a human-readable format.
    $du -sh /path/to/directory

For more details on the du command consult the manual using “man du”.

  • df -h: Is a system command which reports the disk usage of a filesystem.
    The df -h command alone will list the details of all available filesystems. On Cy-Tera, home directories and project directories are on different filesystems; thus, to view the disk usage of either one, the respective path should be given.
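
For example, assuming $HOME and $WORK point to your home and project directories respectively, the usage of each filesystem can be checked with:

$df -h $HOME
$df -h $WORK
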
Application Specific Examples
Running MPI Jobs

For an MPI job you first need to compile your code. To do that, you have to load the MPI module you need and proceed with the compilation command. If, for example, you want to compile your code called hello.c:

$ module load OpenMPI/1.6.4-GCC-4.7.2
$ mpicc hello.c -o hello

Then, you have to create a script so as to submit the job to the queue. A sample job script (hello.sub) could be:

#!/bin/bash
#SBATCH --job-name=Hello
#SBATCH --nodes=2 # 2 nodes
#SBATCH --ntasks-per-node=12 # Number of tasks to be invoked on each node
#SBATCH --mem-per-cpu=1024 # Minimum memory required per CPU (in megabytes)
#SBATCH --time=03:15:00 # Run time in hh:mm:ss
#SBATCH --error=job.%J.out
#SBATCH --output=job.%J.out
module load goolf/1.4.10
echo "Starting at `date`"
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running on $SLURM_NPROCS processors."
echo "Job id is $SLURM_JOBID"
mpirun ./hello
echo "Program finished with exit code $? at: `date`"

Finally, you submit the job script to the queue:

$ sbatch hello.sub

Running Hybrid (OpenMP + MPI) Jobs

OpenMP spawns multiple threads, which are children of the MPI processes. Therefore, the default core binding option of mpirun/mpiexec, which is --bind-to-core, must be changed to --bind-to-socket --bysocket in order for the OpenMP threads to be able to run on all cores of the node.

Below is an example script which runs a hybrid programming job.

#!/bin/bash
#SBATCH --job-name=mixed-hello
#SBATCH -o mixed-hello.out
#SBATCH --nodes=2 --ntasks-per-node=2 # 2 MPI tasks per node; with 6 OpenMP threads each this fills the 12 cores of a node
export NN=$SLURM_JOB_NUM_NODES
export NP=$(($SLURM_NTASKS_PER_NODE*$SLURM_JOB_NUM_NODES))
export OMP_NUM_THREADS=6
echo $NN $NP $OMP_NUM_THREADS
module load OpenMPI/1.6.4-GCC-4.7.2
mpirun --bind-to-socket --bysocket -np $NP ./hello

Running NAMD Jobs

Below is an example script which runs a NAMD job on GPU nodes.

#!/bin/bash
#SBATCH --job-name=namd_test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --error=job.%J.out
#SBATCH --output=job.%J.out
#SBATCH --gres=gpu:2
module load NAMD/2.9-goolfc-1.3.12
namd2=`which namd2`
echo "Starting at `date`"
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on hosts: $SLURM_JOB_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running on $SLURM_NPROCS processors."
echo "Job id is $SLURM_JOBID"
charmrun ++mpiexec ++nodelist $SLURM_JOB_NODELIST +p12 $namd2 +idlepoll /path/to/stmv.namd
echo "Program finished with exit code $? at: `date`"