Euler instructions

Useful bash commands

Euler’s command line interface (shell) is called “bash”. The scicomp wiki gives a good overview of bash commands. Here we just give a short list of useful applications of these commands:

  • Ctrl-p and Ctrl-n give previous and next commands in the command history.

  • Ctrl-r, followed by typing <expression>, searches backwards for <expression> in the command history; press Ctrl-r again to jump to older matches.

  • Use cd for changing directories. E.g., cd nexus-e/Run_Nexuse to go to the directory where the main matlab script run_Nexuse.m lives.

    Note: avoid white space in directory or file names like the plague.

  • Use ls for listing the contents of the current directory

  • Use rm <filename> for deleting the file <filename>

  • Use rm -r <foldername> for recursively deleting a folder together with all of its contents

  • Use pwd for printing the current working directory

  • grep <expression> <filename> prints all lines of file(s) filename in which expression appears. Use option -i for case-insensitive search.

  • <command1> | <command2> employs a “pipe” (|) to redirect output of command1 as input to command2. E.g., ls | grep <filename> sends the listing of the current directory contents to grep, which looks for lines in which filename appears. If this command does not return anything, filename does not exist in the current directory.

  • Wildcards: bash accepts several wildcards for matching file names. E.g.,

    • * stands for a sequence of characters of arbitrary length

    • ? stands for one character

    ls lsf.o14*, e.g., will list all files with names starting with lsf.o14.

  • Use scp for making secure (remote) copies of files and folders. E.g., enter

    scp ./run_Nexuse.m <username>@euler.ethz.ch:/cluster/home/<username>/nexus-e/Run_Nexuse/run_Nexuse.m

    to transfer a local copy of run_Nexuse.m to ~/nexus-e/Run_Nexuse/run_Nexuse.m on Euler. Issue this command on your local machine!

    This command may come in handy for scripting certain things but is unnecessary if you are happy to make all file transfers using FileZilla.
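You can try the commands above in a throwaway directory so that no real files are at risk. The following sketch is only an illustration. The file names are made up to resemble LSF output files, and the directory is created with mktemp and deleted at the end.

```shell
#!/usr/bin/env bash
# Practice cd, pwd, ls, wildcards, pipes, grep and rm in a scratch directory.
set -euo pipefail

scratch=$(mktemp -d)              # create a throwaway directory
cd "$scratch"

# Dummy files whose names resemble LSF output files (made up for illustration)
printf 'iteration 1\nmaximum difference 0.05\n' > lsf.o1401
printf 'iteration 2\nMaximum Difference 0.002\n' > lsf.o1402
printf 'unrelated\n' > notes.txt

pwd                               # print the current working directory
ls lsf.o14*                       # * matches any sequence of characters
ls lsf.o140?                      # ? matches exactly one character
ls | grep notes                   # pipe the directory listing into grep

# -i: ignore case, -l: print only the names of files that contain a match
matches=$(grep -il 'maximum difference' lsf.o14*)
echo "$matches"

cd - > /dev/null                  # return to the previous directory
rm -r "$scratch"                  # recursively delete the scratch directory
```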

Euler specific commands

Account information

To access information about your account on Euler, use:

  • lquota checks the amount of data and files you have on the cluster

  • busers -w shows resource usage

  • my_share_info returns your user group

Modules

Various commands exist for organizing and checking the modules loaded in your environment on Euler. Generally, running Nexus-e should work fine if the modules listed under Setup are loaded. Here is a list anyway (see the scicomp wiki for more explanation of these commands).

  • module load new loads all new modules

  • module list shows currently loaded modules

  • module avail shows all available modules

  • module help <module_name> gives a brief description of a module

  • module show <module_name> shows what loading the module would do

  • module load <module_name> loads a module

  • which icc shows which Intel C/C++ compiler (icc) is currently in your path

  • module unload <module_name> unloads a module

Batch system: How to run Nexus-e

On Euler, users are asked to run large jobs using Euler’s batch system. Scicomp gives an extensive description of this system. Here we summarize what is useful for running Nexus-e on Euler.

Generally, commands like

bsub -n 36 -R "model==EPYC_7742" -R "rusage[mem=5180]" -W "10:00" matlab -r run_Nexuse_platform

are used to run Nexus-e from the folder nexus-e/Run_Nexuse_platform on Euler.

  • -W "10:00" sets the run-time limit (HH:MM) to 10 hours; LSF terminates the job if it runs longer

  • -R "rusage[mem=5180]" tells Euler to allocate 5180 MB of RAM per core (default is 1GB)

  • -R "model==EPYC_7742" tells Euler to use the EPYC_7742 nodes. Available node types:

    • XeonE5_2680v3: high-memory nodes (24 cores, max memory 512 GB per node). Recommended!

    • XeonGold_6150: high-performance nodes (36 cores, max memory 192 GB per node)

    • EPYC_7742: AMD nodes (128 cores, max memory 512 GB per node)

    Note that requesting specific nodes may lead to longer queuing times.

  • -n 36 tells Euler to use 36 processor cores

  • matlab -r run_Nexuse tells Euler to run the script run_Nexuse using matlab
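rusage[mem=...] is given per core. The total memory a job requests is therefore the number of cores (-n) times the per-core value. For the example command this is 36 x 5180 MB = 186480 MB, or roughly 182 GB.

The sketch below does this arithmetic and compares the result with the nominal per-node figures listed above. It is a hypothetical helper, not an Euler tool, and it assumes the job should fit on a single node. The memory actually available to jobs on a node may be somewhat lower than the nominal maximum.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does <cores> x <MB per core> fit on one node of a given type?
# Core counts and memory sizes are the nominal values from the list above.
fits_on_node() {
  local cores=$1 mem_per_core=$2 node=$3
  local node_cores node_mem_gb
  case "$node" in
    XeonE5_2680v3) node_cores=24;  node_mem_gb=512 ;;
    XeonGold_6150) node_cores=36;  node_mem_gb=192 ;;
    EPYC_7742)     node_cores=128; node_mem_gb=512 ;;
    *) echo "unknown node type: $node" >&2; return 2 ;;
  esac
  local total_mb=$(( cores * mem_per_core ))   # rusage[mem] is per core
  local node_mb=$(( node_mem_gb * 1024 ))      # nominal node memory in MB
  echo "$node: job needs $cores cores and $total_mb MB; node has $node_cores cores and $node_mb MB"
  (( cores <= node_cores && total_mb <= node_mb ))
}

# The example command: -n 36 with rusage[mem=5180]
for node in XeonE5_2680v3 XeonGold_6150 EPYC_7742; do
  if fits_on_node 36 5180 "$node"; then
    echo "  -> fits on one node"
  else
    echo "  -> does not fit on one node"
  fi
done
```

With the example command, the request fits on XeonGold_6150 and EPYC_7742 nodes. It does not fit on a single XeonE5_2680v3 node, which has only 24 cores.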

Other useful options are the following.

  • Add -nojvm to the matlab command to prevent MATLAB from starting Java.

    bsub [...] matlab -nojvm -r run_Nexuse

  • Add -nodisplay to the matlab command to explicitly tell MATLAB that no graphical interface is available.

    bsub [...] matlab -nodisplay -r run_Nexuse

  • Set the output filename using -o <output_filename>. Default is lsf.o<JobID>.

  • Add -B and/or -N to be notified via email when your job begins and/or ends. For this to work, you also need a file .forward in your home directory that contains your email address.

    bsub -B -N [...] matlab -r run_Nexuse

  • For parallel computation using OpenMP, the number of available processor cores needs to be specified with the environment variable OMP_NUM_THREADS. Set it with the command

    export OMP_NUM_THREADS=<number_of_cores>

    before issuing your bsub ... command. Consider adding this line to your ~/.bash_profile if you don’t want to repeat it in each session.

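  • To use job arrays for parallel calculations, submit with the option -J, e.g., bsub -J "arrayname[1-10]" ./program [arguments]

    If some jobs of the array fail, you can rerun only those (-e requeues only the jobs in the EXIT state, i.e., the ones that failed):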

    brequeue -e <JOBID>
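Within each element of a job array, LSF sets the environment variable LSB_JOBINDEX to that element's index (1 to 10 for arrayname[1-10]). Below is a minimal sketch of what ./program could look like if each element should run with a different parameter value. The script name array_job.sh and the list of values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical job-array wrapper (array_job.sh): each array element picks its
# own parameter value, based on the index LSF assigns to it.
# Submit, e.g., with:  bsub -J "arrayname[1-3]" ./array_job.sh

# Illustrative candidate values, one per array index
params=(0.1 0.02 0.001)

# Print the value for a 1-based array index (bash arrays are 0-based)
param_for_index() {
  local i=$1
  if (( i < 1 || i > ${#params[@]} )); then
    echo "index $i out of range" >&2
    return 1
  fi
  echo "${params[i - 1]}"
}

# Act only when LSB_JOBINDEX is available (LSF sets it for job-array elements)
if [[ -n "${LSB_JOBINDEX:-}" ]]; then
  value=$(param_for_index "$LSB_JOBINDEX") || exit 1
  echo "array element $LSB_JOBINDEX uses parameter value $value"
  # ... start the actual computation with "$value" here
fi
```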

Batch system: How to check the status of your jobs

  • bjobs lists jobs

  • bjobs -p lists only pending jobs

  • bjobs -l lists jobs with more details

  • bkill <JOBID> kills job with JOBID

  • bpeek <JOBID> checks the output of a particular running job

  • bbjobs <JOBID> checks resources used

  • bjobs -l -aff is another method for checking the resources used

Recommendations for running Nexus-e

Experience with resource usage

(1) [For final run] 2-day time resolution with convergence criterion 0.1 percent

For this, set tpRes = 2; and limDifference = 0.001 in run_Nexuse.m. The following command works fine.

bsub -n 36 -R "model==XeonGold_6150" -R "rusage[mem=5180]" -W "350:00" matlab -r run_Nexuse
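The parameter changes in this and the following configurations can also be made from the shell, which is handy when switching between setups. The sketch below makes three assumptions, so check your copy of run_Nexuse.m first:

- Each parameter is assigned on its own line, e.g. tpRes = 8;.
- GNU sed is available, as is typical on Linux systems such as Euler.
- It is acceptable that anything after the "=" on that line, including a comment, is overwritten.

The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set tpRes and limDifference in run_Nexuse.m (GNU sed).
# Assumes each parameter is assigned on its own line, e.g. "tpRes = 8;".
# Whatever follows the "=" on that line (including a comment) is replaced.
set_nexuse_params() {
  local file=$1 tpres=$2 limdiff=$3
  cp "$file" "$file.bak"                      # keep a backup of the original
  sed -i -E \
    -e "s/^([[:space:]]*tpRes[[:space:]]*=).*/\1 $tpres;/" \
    -e "s/^([[:space:]]*limDifference[[:space:]]*=).*/\1 $limdiff;/" \
    "$file"
  # Show the resulting assignments so you can check them
  grep -E '^[[:space:]]*(tpRes|limDifference)[[:space:]]*=' "$file"
}

# Configuration (1): 2-day time resolution, 0.1 percent convergence criterion
# set_nexuse_params run_Nexuse.m 2 0.001
```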

(2) [For quick test] 8-day time resolution with convergence criterion 10 percent

For this, set tpRes = 8; and limDifference = 0.1 in run_Nexuse.m. The following command works fine.

bsub -n 36 -R "rusage[mem=2180]" -W "15:00" matlab -r run_Nexuse

(3) 8-day time resolution with convergence criterion 2 percent

For this, set tpRes = 8; and limDifference = 0.02 in run_Nexuse.m. The following command works fine.

bsub -n 36 -R "rusage[mem=2180]" -W "30:00" matlab -r run_Nexuse

(4) 8-day time resolution with convergence criterion 0.1 percent

For this, set tpRes = 8; and limDifference = 0.001 in run_Nexuse.m. The following command works fine. (This is still being tested as of October 22, 2020.)

bsub -n 36 -R "rusage[mem=2180]" -W "60:00" matlab -r run_Nexuse

(5) [For benchmark] Run the benchmark script bench_Nexuse.m

As instructed in the setup procedure, you sometimes need to run the script bench_Nexuse.m to (re-)calibrate GemEl. This script runs rather quickly because it doesn’t run the entire energy-economic loop and has a low time resolution. The following command works fine.

bsub -n 36 -W "2:00" matlab -r bench_Nexuse
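The commands above can be collected in one place to avoid typos when switching between configurations. The sketch below only prints the bsub command for a chosen configuration (a dry run). The function name and the short names final, quick, res8_2pct, res8_01pct and benchmark are hypothetical labels for configurations (1) to (5).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the bsub command for configurations (1)-(5).
nexuse_bsub_cmd() {
  case "$1" in
    final)      echo 'bsub -n 36 -R "model==XeonGold_6150" -R "rusage[mem=5180]" -W "350:00" matlab -r run_Nexuse' ;;
    quick)      echo 'bsub -n 36 -R "rusage[mem=2180]" -W "15:00" matlab -r run_Nexuse' ;;
    res8_2pct)  echo 'bsub -n 36 -R "rusage[mem=2180]" -W "30:00" matlab -r run_Nexuse' ;;
    res8_01pct) echo 'bsub -n 36 -R "rusage[mem=2180]" -W "60:00" matlab -r run_Nexuse' ;;
    benchmark)  echo 'bsub -n 36 -W "2:00" matlab -r bench_Nexuse' ;;
    *) echo "usage: nexuse_bsub_cmd final|quick|res8_2pct|res8_01pct|benchmark" >&2
       return 1 ;;
  esac
}

# Dry run: show what would be submitted for the quick test
nexuse_bsub_cmd quick

# To actually submit (on Euler, after setting tpRes and limDifference):
# eval "$(nexuse_bsub_cmd quick)"
```

The helper does not touch run_Nexuse.m. tpRes and limDifference still have to be set to match the chosen configuration.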

Finding your way around output on Euler

After you submit a batch job with bsub, Euler responds with its JobID. After completion, the ‘standard output’ of the Nexus-e run is written to lsf.o<JobID> and can be inspected there with the text editor of your choice.

While the job is running, however, standard output can be accessed by means of the command

bpeek

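which in turn writes to the standard output of your console. Without a JobID, bpeek shows the output of your most recently submitted job; use bpeek <JobID> for a specific one. Several options exist for browsing this output more conveniently: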

  • write the output of bpeek to a file:

    bpeek > yourfilename.txt

    and then look at it in the text editor of your choice.

  • use a pipe (‘|’) to find lines including certain ‘patterns’ with grep:

    bpeek | grep <pattern>

    for example,

    bpeek | grep 'maximum difference'

    will print all lines that contain information about GemEl’s convergence criterion at each iteration. Use straight quotes here; curly quotes would be passed to grep literally.

  • use a pipe (‘|’) to display the output of bpeek using the ‘less’ command:

    bpeek | less

    This will bring up an interface that lets the user browse through the output of bpeek without using the mouse. Some basic functions of the interface:

    • press “Space” to go down one page,

    • press “q” to quit the “less” interface,

    • press “/” to enter a pattern to search for.

      • Pressing “n” (n for next) after having searched for a pattern jumps to the next instance of the pattern;

      • pressing “N” jumps to the previous.

    • The counterpart of “/” (forward search) is “?” (backward search).

    • “G” goes to the end of bpeek’s output, “g” to the beginning.
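When you check on a long run repeatedly, it can help to save each bpeek snapshot to a timestamped file and print the convergence lines right away. Below is a sketch of such a helper. The function name and the file-naming scheme are hypothetical, and because it needs bpeek it only works on Euler.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save a timestamped snapshot of a running job's output
# and print the lines with GemEl's convergence information. Needs bpeek (Euler).
bpeek_snapshot() {
  local jobid=$1
  local out
  out="bpeek_${jobid}_$(date +%Y%m%d_%H%M%S).txt"
  bpeek "$jobid" > "$out" || return 1
  echo "saved to $out"
  grep -i 'maximum difference' "$out" || echo "no convergence lines yet"
}

# Usage, with the JobID that bsub reported:
# bpeek_snapshot <JobID>
# less "$(ls -t bpeek_<JobID>_*.txt | head -n 1)"   # browse the newest snapshot
```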

Finding stuff in lsf.o<JobID> files

  • grep -i <expression> lsf* searches, ignoring case, for <expression> in all files whose names start with lsf. E.g.,

    grep -inH nexus_disagg_nuc50_oct20 lsf*

    searches for copies of the database nexus_disagg_nuc50 made on October 20 that are mentioned in lsf* files (-i ignores case, -n prints line numbers, -H prints the file name). This may help identify the copies of the database that can safely be removed (“dropped”) from the PSL server.
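Once many lsf.o<JobID> files have accumulated, it can be quicker to see at a glance which files mention a database at all, and how often. The sketch below combines grep's -i (ignore case), -c (count matching lines) and -H (always print the file name) options and drops files with zero matches. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: which files mention a pattern, and how many lines match?
mentions_per_file() {
  local pattern=$1
  shift
  # -i: ignore case, -c: count matching lines, -H: print "file:count" even for
  # a single file; then drop the files without any match
  grep -icH -- "$pattern" "$@" | grep -v ':0$'
}

# Example (run in the folder that holds the lsf.o* files):
# mentions_per_file nexus_disagg_nuc50_oct20 lsf.o*
```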

Viewing output

The output of Matlab is directed to files lsf.o<JobID>. To inspect this output, open it in your editor of choice.

  • Several editors are available on Euler, e.g.,

    • emacs

    • vim

    • nano

    nano may be a good choice if you don’t know emacs or vim (if you do know emacs or vim, you will have strong opinions on which one to use). Nano shows on-screen instructions for basic key combinations (e.g., ^G means pressing G while holding down Ctrl).

  • For better user experience, copy files to your local computer for viewing with a GUI editor (e.g., using FileZilla).