Getting started on the NCI supercomputer (National Computational Infrastructure, Australia)

Apply for an account at http://nci.org.au

If you are in Daniel Price’s research group, request to join project “wk74”

Log in

Please read the general instructions for how to log in/out and copy files to/from a remote cluster

ssh -Y USER@gadi.nci.org.au
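
The -Y flag enables trusted X11 forwarding, so that graphical programs such as splash can open plot windows on your local machine (you will need an X server running locally).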

Configure your environment

First edit your .bashrc file in your favourite text editor:

vi ~/.bashrc

Mine has:

export SYSTEM=nci                 # select the nci compiler settings in the phantom Makefile
export PROJECT=wk74               # your NCI project code
export OMP_STACKSIZE=512M         # stack size per OpenMP thread
export OMP_NUM_THREADS=32         # number of OpenMP threads
export SPLASH_DIR=/g/data/$PROJECT/splash   # shared splash installation
export PATH=$PATH:$SPLASH_DIR/bin           # so the splash binaries are found
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SPLASH_DIR/giza/lib  # giza runtime library for splash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps/hdf5/1.10.5/lib # HDF5 runtime library
export MAXP=2000000               # maximum number of particles (build-time option)
ulimit -s unlimited               # remove the shell stack size limit
source ~/.modules                 # load the modules listed below

If you are using phantom+mcfost, you will need the following lines:

export MCFOST_DIR=/g/data/$PROJECT/mcfost-src/
export MCFOST_AUTO_UPDATE=0
export MCFOST_INSTALL=/g/data/$PROJECT/mcfost/
export MCFOST_GIT=1
export MCFOST_NO_XGBOOST=1
export MCFOST_LIBS=/g/data/$PROJECT/mcfost
export MCFOST_UTILS=/g/data/$PROJECT/mcfost/utils
export HDF5ROOT=/apps/hdf5/1.10.5/lib
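
After editing, reload the file so the settings take effect in your current session:

source ~/.bashrc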

Then add the relevant modules to your ~/.modules file:

vi ~/.modules

Mine contains:

module load intel-compiler
module load intel-mpi
module load git
module load hdf5

where the git module is needed for git’s large file storage (LFS) to work, and the hdf5 module provides the HDF5 library referenced above.
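
You can verify that the modules loaded correctly with:

module list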

Finally, make a shortcut to the /g/data filesystem:

cd /g/data/$PROJECT
mkdir $USER
cd
ln -s /g/data/$PROJECT/$USER runs
cd runs
pwd -P
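
The last command should print the physical path behind the symlink, something like /g/data/wk74/abc123 (assuming project wk74 and username abc123).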

Get phantom

Clone a copy of phantom into your home directory:

$ cd $HOME
$ git clone https://github.com/danieljprice/phantom.git
$ cd phantom

and tell git who you are:

$ git config --global user.name "Joe Bloggs"
$ git config --global user.email "joe.bloggs@monash.edu"
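
Later, to update your copy of the code to the latest version:

$ cd ~/phantom
$ git pull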

Run a calculation

Then make a subdirectory named after the calculation you want to run (e.g. tde):

$ cd; cd runs
$ mkdir tde
$ cd tde
$ ~/phantom/scripts/writemake.sh grtde > Makefile
$ make setup
$ make
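
This compiles two binaries in the current directory: phantomsetup, for generating initial conditions, and phantom, the main code.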

then run phantomsetup to create your initial conditions:

$ ./phantomsetup tde
(just press enter at each question to accept the default)
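
This should create a tde.setup file recording your answers and a tde.in file containing the runtime options for the main code.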

To run the code, you need to write a PBS script. You can generate an example by typing “make qscript”:

$ make qscript INFILE=tde.in JOBNAME=myrun > run.q

This should produce something like:

$ cat run.q
#!/bin/bash
## PBS Job Submission Script, created by "make qscript" Tue Mar 31 12:32:08 AEDT 2020
#PBS -l ncpus=48
#PBS -N myrun
#PBS -q normal
#PBS -P wk74
#PBS -o tde.in.pbsout
#PBS -j oe
#PBS -m e
#PBS -M daniel.price@monash.edu
#PBS -l walltime=48:00:00
#PBS -l mem=16G
#PBS -l storage=gdata/wk74
#PBS -l other=hyperthread
## phantom jobs can be restarted:
#PBS -r y

cd $PBS_O_WORKDIR
echo "PBS_O_WORKDIR is $PBS_O_WORKDIR"
echo "PBS_JOBNAME is $PBS_JOBNAME"
env | grep PBS
cat $PBS_NODEFILE > nodefile
echo "HOSTNAME = $HOSTNAME"
echo "HOSTTYPE = $HOSTTYPE"
echo Time is `date`
echo Directory is `pwd`

ulimit -s unlimited
export OMP_SCHEDULE="dynamic"
export OMP_NUM_THREADS=48
export OMP_STACKSIZE=1024m

echo "starting phantom run..."
export outfile=`grep logfile "tde.in" | sed "s/logfile =//g" | sed "s/\\!.*//g" | sed "s/\s//g"`
echo "writing output to $outfile"
./phantom tde.in >& $outfile

You can then proceed to submit the job to the queue using:

qsub run.q

Check the status using:

qstat -u $USER
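
While the job is running you can follow its progress in the log file, whose name is set by the logfile option in tde.in (e.g. tde01.log):

tail -f tde01.log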

How to keep your job running for more than 48 hours

Often you will want to keep your calculation going for longer than the 48-hour maximum queue limit. To achieve this, you can simply submit another job with the same script that depends on completion of the previous one.

First find out the job id of the job you have already submitted:

$ qstat
Job id                 Name             User              Time Use S Queue
---------------------  ---------------- ----------------  -------- - -----
18780261.gadi-pbs      croc             abc123            402:48:0 R normal-exec

Then submit another job that depends on this one:

qsub -W depend=afterany:18780261 run.q

The job will remain in the queue until the previous job completes. Then, when the new job starts, phantom will simply carry on where it left off.
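
If you know in advance how many restarts you will need, you can chain the submissions by hand, since qsub prints the id of each job it submits. A minimal sketch, assuming run.q is your job script:

# submit the first job and capture its id
jobid=$(qsub run.q)
# chain three more jobs, each waiting on the previous one
for i in 1 2 3; do
   jobid=$(qsub -W depend=afterany:$jobid run.q)
done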

A more sophisticated version of the above can be achieved by generating your PBS script with PBSRESUBMIT=yes:

make qscript INFILE=tde.in PBSRESUBMIT=yes > run.q

You can check the details of this using:

cat run.q

and submit your script using:

qsub -v NJOBS=10 run.q

which will automagically submit 10 jobs to the queue, each depending on completion of the previous job.
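
Dependent jobs will show in the qstat output with state “H” (held) until the job they depend on completes.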

How to not annoy everybody else

Do not fill the disk quota! Use a mix of small and full dumps where possible, and set dtmax to a reasonable value to avoid generating large numbers of unnecessarily large files.
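
On gadi you can check your current usage against the quota with NCI’s lquota utility (assuming it is available to your account):

lquota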

For how to move the results of your calculations off gadi, see here.

How to use splash to make movies without your job getting killed

If you try to make a sequence of images using splash on the login node (e.g. by typing /png or file.png at the device prompt), your job will get killed due to the runtime limits:

Graphics device/type (? to see list, default /xw):/png
...
Killed

A simple workaround is to launch a separate, short-lived instance of splash for each dump file using a bash loop. First check that the loop picks up the right files:

$ for x in dump_0*; do echo $x; done

Then replace “echo $x” with the relevant splash command:

$ for x in dump_0*; do splash -r 6 -dev $x.png $x; done

If you still get prompts that need answers, you can follow the procedure at https://splash-viz.readthedocs.io/en/latest/other.html#reading-processing-data-into-images-without-having-to-answer-prompts, or simply list the answers to the prompts in a file (here called answers.txt) and use:

$ for x in dump_0*; do splash -r 6 -dev $x.png $x < answers.txt; done

This way each process is short and your movie-making can proceed without getting killed.
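
Once the images exist, you can stitch them into a movie with ffmpeg, if it is available (you may need to load a module for it first). A minimal sketch, assuming the images were written as dump_0*.png:

ffmpeg -framerate 10 -pattern_type glob -i 'dump_0*.png' -c:v libx264 -pix_fmt yuv420p movie.mp4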

More info

See general instructions for how to log in/out and copy files to/from a remote cluster

For more information on the machine itself, read the user guide