
TARA User Guide

Running Jobs with a SLURM Script

Submitting a job using a bash script involves the following steps.

  1. Create a bash script using vi sbatch_script.sh or nano sbatch_script.sh
  2. Specify the details in the bash script according to the template provided below.
#!/bin/bash
#SBATCH -p compute                  # Specify the partition or machine type to use [compute/memory/gpu]
#SBATCH -N 1  --ntasks-per-node=40   # Specify the number of nodes and the number of cores per node
#SBATCH -t 00:10:00                 # Specify the maximum time limit (hour:minute:second)
#SBATCH -J my_job                   # Specify the job name
#SBATCH -A tutorial                 # Specify the project account, which you receive after registration ** If you do not specify it, the job will not run.
#SBATCH                             # Additional options can be specified here.

module purge                        # Unload all modules, as some may have been loaded previously
module load intel                   # Load the module that you want to use; this example uses intel

srun  my_hpc_script                 # Run your program or executable code
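
Before submitting, it can help to check on the frontend which modules are available and which are currently loaded. module avail and module list are standard commands of the module environment that the template's module load lines rely on; the module name intel below simply matches the example above.

$ module avail intel      # List available modules matching "intel"
$ module list             # Show currently loaded modules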

List of sbatch options

  • #SBATCH --mail-type=NONE
    Email notifications can be sent when a job starts or completes; specify NONE if you do not want to receive any email.
  • #SBATCH --begin=16:00
    Delay the job start until 16:00.
  • #SBATCH --begin=now+1hour
    Delay the job start until one hour from now.
  • #SBATCH --begin=2019-02-14T12:00:00
    Delay the job start until the specified date and time.
  • #SBATCH --cpus-per-task=2
    Specify the number of CPUs per task.
  • #SBATCH --dependency=afterok:<jobID>
    Start the job only after the specified job ID has completed successfully.
  • #SBATCH --exclusive
    Do not share the allocated node(s) with other users.
  • #SBATCH --mem=16G
    Specify the memory required per node (units can be given with the suffix [K|M|G|T]), or specify --mem=MaxMemPerNode to use all memory of the node.
  • #SBATCH --mem-per-cpu=16
    Specify the memory required per CPU (MB by default; units can be given with the suffix [K|M|G|T]).
  • #SBATCH --output=slurm-%A.out
    Specify the file name and extension of the output file. Note: "%A" puts the job ID in the file name.
  • #SBATCH --test-only
    Validate the job submission and report when the job would be scheduled to run, without actually submitting it.
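
As a quick illustration, some of these options can also be given on the sbatch command line instead of inside the script. The sketch below shows --test-only and --dependency used this way; the script names and the job ID are only placeholders.

$ sbatch --test-only sbatch_script.sh                 # Validate the script and estimate a start time without submitting
$ sbatch --dependency=afterok:1234 next_step.sh       # Submit a job that starts only after job 1234 completes successfully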

Example of adding additional options

#!/bin/bash
#SBATCH -p compute                  # Specify the partition or machine type used
#SBATCH -N 1  --ntasks-per-node=40   # Specify the number of nodes and the number of cores per node
#SBATCH -t 00:10:00                 # Specify the maximum time limit (hour:minute:second)
#SBATCH -J my_job                   # Specify the job name
#SBATCH -A tutorial                 # Specify the project account, which you receive after registration ** If you do not specify it, the job will not run.

#SBATCH  --mail-type=NONE            # Do not send email notifications
#SBATCH  --mem=16G                    # Request 16 GB of memory

module purge                        # Unload all modules
module load foss                    # Load the module that you want to use; this example uses foss

srun  my_hpc_script                 # Run your program or executable code 

Tips: Compute nodes ( tara-c-[001-060] ) have 40 CPU cores per node, and memory nodes ( tara-m-[001-010] ) have 192 CPU cores per node.
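
For example, if your project has access to the memory partition (an assumption here), the corresponding lines for using a full memory node would be:

#SBATCH -p memory                    # Use a memory node instead of a compute node
#SBATCH -N 1  --ntasks-per-node=192  # Memory nodes have 192 CPU cores per node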

3. Specify the values in the bash script and save it.

4. Use the sbatch command followed by your bash script name to submit your job to SLURM, as shown in the example.

sbatch sbatch_script.sh
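
If the submission succeeds, sbatch prints the ID assigned to the job; the number below is only an example.

Submitted batch job 36309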

Running Specific applications

This section covers submitting jobs for specific applications, in this case Gaussian 16 and Python.

Gaussian 16

Gaussian is a computational chemistry software package for calculating chemical structures. Submitting a Gaussian job on the TARA HPC system involves the following steps.

1. Prepare the Gaussian input file with an appropriate file name.
Note: The input file must end with a blank line.

Example Gaussian input for geometry optimization of H2O 

%mem=10GB
%nprocshared=40
%chk=h2o.chk
# opt HF/6-31G(d)    

water energy    

0   1    
O  -0.464   0.177   0.0     
H  -0.464   1.137   0.0     
H   0.441  -0.143   0.0

2. Create the job submission script (submitg16.sub) for submitting the job to the TARA system.

#!/bin/bash -l
######## select compute node ########
#SBATCH -p compute                          # Specify the partition (compute node in this example)
#SBATCH -N 1  --ntasks-per-node=40          # Specify the number of nodes and tasks per node
#SBATCH -t 60:00:00                         # Job time limit
#SBATCH -J job_name                         # Job name
#SBATCH -A tutorial                         # Specify the project account, which you receive after registration ** If you do not specify it, the job will not run.

module purge                                # Unload all modules
module load intel                           # Load the intel module for the MPI run
module load Gaussian                        # Load the Gaussian 16 module

FILENAME=job_name                           # Base name of your input file (without the extension)
WORKDIR=$SLURM_SUBMIT_DIR                   # Directory from which the job was submitted
######################################################

cd $WORKDIR

###### Set up the Gaussian scratch directory ################
rm -rf   /tarafs/.../g16/$USER/$SLURM_JOB_ID
mkdir -p /tarafs/.../g16/$USER/$SLURM_JOB_ID
export GAUSS_SCRDIR=/tarafs/.../g16/$USER/$SLURM_JOB_ID    # Use the directory created above for Gaussian temporary files

cp $FILENAME.com $GAUSS_SCRDIR              # Copy the Gaussian input file to the scratch directory
cp $FILENAME.chk $GAUSS_SCRDIR              # Copy the checkpoint file (if one exists)
cd $GAUSS_SCRDIR

##run_GAUSSIAN  $FILENAME.com

g16 < $FILENAME.com >> $WORKDIR/$FILENAME.log.$SLURM_JOB_ID

cp *.chk $WORKDIR/$FILENAME.chk
cp $FILENAME.fch* $WORKDIR/
cp *.cu* $WORKDIR/
cd $WORKDIR
mv $FILENAME.log.$SLURM_JOB_ID $FILENAME.DONE.log

# Clean up scratch space

rm -rf $GAUSS_SCRDIR/

#----- End of g16 SLURMJOB ---------

3. Submit the job

$ sbatch submitg16.sub      # Submit the job

$ squeue -u your_username   # Check your jobs in the queue

The result of the Gaussian job is the log file job_name.DONE.log (based on FILENAME in the script above), which is written to the same directory from which the job was submitted.
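
To quickly check whether the calculation finished normally, you can search the log file for Gaussian's termination message; the file name below assumes FILENAME=job_name as in the script above.

$ grep "Normal termination" job_name.DONE.log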

Python

Example: Python code

1. Use the vi command (or nano) to create a Python program file named myjob.py (the name used in the submission script below) with the following content.

print ("Hello, world!")

2. Create the job submission script (submitpython.sub) for submitting the job to the TARA system.

#!/bin/bash -l
#SBATCH -p compute                                # Specify the partition (compute, memory, gpu)
#SBATCH -N 1  --ntasks-per-node=1                 # Specify the number of nodes and tasks per node
#SBATCH -t 1:00:00                                # Job time limit
#SBATCH -J job_name                               # Job name
#SBATCH -A tutorial                               # Specify the project account, which you receive after registration ** If you do not specify it, the job will not run.

module purge                                      # Unload all modules
module load Python                                # Load the Python module

python myjob.py                                   # Run the Python script

3. Submit the job

$ sbatch  submitpython.sub      # Submit the job

$ squeue -u your_username       # Check your jobs in the queue

You can see the result of the run in the SLURM output file (slurm-<jobID>.out by default). It contains the following message.

Hello, world!
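
By default, SLURM writes standard output to slurm-<jobID>.out in the directory from which the job was submitted; replace the job ID below, which is only an example, with the one printed by sbatch.

$ cat slurm-36309.out
Hello, world!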

More details of sbatch can be found at https://slurm.schedmd.com/sbatch.html

Cancelling jobs

A job can be cancelled using the scancel command followed by the job ID.

$ scancel [JOBID]

For example,

$ scancel 1234
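
scancel can also cancel several jobs at once, for example all jobs belonging to a given user; the username below is a placeholder.

$ scancel -u your_username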

Show job status

squeue is the command to show the job status.

[tara@tara-frontend-1-node-ib ~]$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
2138   compute FeN3G_T2 snamungr PD       0:00      1 (Resources)
1866   compute diamond_ schinkan  R   11:06:07      1 tara-c-009
2040   compute diamond_ schinkan  R      26:51      3 tara-c-[002,033,057]
2039   compute diamond_ schinkan  R    2:16:35      1 tara-c-023

The columns are as follows.

  • JOBID is the job ID.
  • PARTITION is the partition (node type) on which the job runs.
  • NAME is the job name.
  • USER is the user who submitted the job.
  • ST is the job state, for example PD = Pending and R = Running.
  • TIME is the time the job has been running.
  • NODES is the number of nodes allocated to the job.
  • NODELIST(REASON) is the list of nodes used to run the job: compute nodes (tara-c-xxx), memory nodes (tara-m-xxx), or GPU nodes (tara-g-xxx). For pending jobs, the reason the job is waiting is shown instead.
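
If you need wider or additional columns, the squeue output can be customized with the --format option; the format string below simply reproduces the default columns and is only a starting point.

$ squeue --format="%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R"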

Viewing Job Status by Specifying User Name

Use the squeue -u [user] command to view only the jobs of the specified user.

[tara@tara-frontend-1-node-ib ~]$ squeue -u tara
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             17986   compute Dimer_N2     tara  R 4-10:08:27      2 tara-c-[046,057]
             18082   compute N4-top-t     tara  R 4-01:05:51      2 tara-c-[036,059]
             20178   compute       sv     tara  R       0:40      1 tara-c-010

Viewing Job Status by Specifying Partition

Use the squeue -p [partition] command to view only the jobs in the specified partition.

[tara@tara-frontend-1-node-ib ~]$ squeue -p gpu
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             18267       gpu  voc_ssd     tara  R 2-00:13:12      1 tara-g-001

Viewing Job Status by Specifying Job State

Use the squeue -t PD command to view only pending jobs.

[tara@tara-frontend-1-node-ib ~]$ squeue -t PD
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             20149    memory svaba_As     tara PD       0:00      1 (Priority)
             20150    memory svaba_As     tara PD       0:00      1 (Priority)
             20151    memory svaba_As     tara PD       0:00      1 (Priority)
             20152    memory svaba_As     tara PD       0:00      1 (Priority)
             20134    memory svaba_As     tara PD       0:00      1 (Resources)
             20153    memory svaba_As     tara PD       0:00      1 (Priority)
             20119   compute Hy_SiOBr     tara PD       0:00      1 (Resources)
             20120   compute Hy_SiOBr     tara PD       0:00      1 (Priority)
             20121   compute Hy_SiOH1     tara PD       0:00      1 (Priority)

Use the squeue -t R command to view only running jobs.

[tara@tara-frontend-1-node-ib ~]$ squeue -t R
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             19875    memory  16World     tara  R   19:00:11      1 tara-m-002
             19869    memory  22World     tara  R   19:01:18      1 tara-m-002
             19969    memory   16Mcep     tara  R   18:11:40      1 tara-m-002
             19971    memory   44Mcep     tara  R   18:10:14      1 tara-m-006
             19970    memory   22Mcep     tara  R   18:11:00      1 tara-m-006
             19978    memory  sat-1ht     tara  R   15:39:59      1 tara-m-010
             20167    memory cmd_kour     tara  R      44:24      1 tara-m-004
             19991    memory  sat-5ht     tara  R    5:55:27      2 tara-m-[002,004]
             18267       gpu  voc_ssd     tara  R 2-00:31:00      1 tara-g-001
             18679   compute rev10Ef_     tara  R 1-04:39:55      1 tara-c-039
             18678   compute rev20_Te     tara  R 1-04:41:04      1 tara-c-039
             18677   compute rev25_Te     tara  R 1-04:42:24      1 tara-c-039

Use the scontrol show job [JOBID] command to view job details such as the number of CPUs, memory size, elapsed time, start time, and end time.

Submitted batch job 36309
[tara@tara-frontend-1-node-ib ~]$ scontrol show job 36309
JobId=36309 JobName=test_sbatch
   UserId=tara(1997000023) GroupId=tara(1997000023) MCS_label=N/A
   Priority=58753 Nice=0 Account=thaisc QOS=thaisc
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=INVALID TimeLimit=00:10:00 TimeMin=N/A
   SubmitTime=2019-06-20T11:22:51 EligibleTime=2019-06-20T11:22:51
   AccrueTime=2019-06-20T11:22:51
   StartTime=2019-06-20T11:22:52 EndTime=2019-06-20T11:32:52 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2019-06-20T11:22:52
   Partition=compute AllocNode:Sid=tara-frontend-1-node-ib:30010
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=tara-c-037
   BatchHost=tara-c-037
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,mem=4800M,node=1,billing=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=4800M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Reservation=root_11
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/.../sbatch.sh
   WorkDir=/.../tara
   StdErr=/.../slurm-36309.out
   StdIn=/dev/null
   StdOut=/.../slurm-36309.out
   Power=
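
scontrol show job only reports jobs that are still known to the scheduler. For jobs that have already finished, the accounting command sacct can show similar information, assuming job accounting is enabled on the system; the job ID and field list below are only an example.

$ sacct -j 36309 --format=JobID,JobName,Partition,State,Elapsed,ExitCode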