Euler instructions
Useful bash commands
Euler’s command line interface (shell) is called “bash”. The scicomp wiki gives a good overview of bash commands. We just give a short list of useful applications of these commands:
Ctrl-p and Ctrl-n give previous and next commands in the command history.
Ctrl-r <expression> searches for <expression> in the command history.
Use cd for changing directories. E.g., cd nexus-e/Run_Nexuse to go to the directory where the main matlab script run_Nexuse.m lives. Note: avoid white space in directory or file names like the plague.
Use ls for listing the contents of the current directory.
Use rm <filename> for deleting the file with name filename.
Use rm -r <foldername> for recursively deleting the contents of the folder.
Use pwd for printing the current working directory.
grep <expression> <filename> prints all lines of file(s) filename in which expression appears. Use option -i for case-insensitive search.
<command1> | <command2> employs a “pipe” (|) to redirect output of command1 as input to command2. E.g., ls | grep <filename> sends the listing of the current directory contents to grep, which looks for lines in which filename appears. If this command does not return anything, filename does not exist in the current directory.
Wildcards: bash accepts several wildcards for file name or string completion. E.g.,
* stands for a sequence of characters of arbitrary length
? stands for one character
ls lsf.o14*, e.g., will list all files with names starting with lsf.o14.
Use scp for making secure (remote) copies of files and folders. E.g., enter
scp ./run_Nexuse.m <username>@euler.ethz.ch:/cluster/home/<username>/nexus-e/Run_Nexuse/run_Nexuse.m
to transfer a local copy of run_Nexuse.m to ~/nexus-e/Run_Nexuse/run_Nexuse.m on Euler. Issue this command on your local machine! This command may come in handy for scripting certain things but is unnecessary if you are happy to make all file transfers using FileZilla.
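The wildcard and pipe commands above can be tried safely in a throwaway directory. The following is a minimal sketch; the file names and job IDs are made up for illustration and do not refer to real Nexus-e output.

```shell
# Work in a temporary directory so that nothing real is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Create a few empty files that mimic batch output names (made-up job IDs).
touch lsf.o1401 lsf.o1402 lsf.o1500 notes.txt

# Wildcard *: matches any sequence of characters.
ls lsf.o14*

# Wildcard ?: matches exactly one character.
ls lsf.o140?

# Pipe: count how many directory entries contain "lsf".
lsf_count="$(ls | grep -c lsf)"
echo "entries containing 'lsf': $lsf_count"

# ls | grep as an existence check: empty output means "not found".
# (|| true keeps the script going when grep finds nothing.)
missing="$(ls | grep run_Nexuse.m || true)"
echo "match for run_Nexuse.m: '$missing'"

# Clean up: rm -r removes the folder and everything in it.
cd /
rm -r "$workdir"
```

Because rm -r deletes without asking, it is worth double-checking the path with pwd and ls before running it on real data.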
Euler specific commands
Account information
To access information about your account on Euler:
lquota checks the amount of data and files you have on the cluster
busers -w shows resource usage
my_share_info returns your user group
Modules
Various commands exist for organizing and checking the modules that Euler has loaded in your environment. Generally, running Nexus-e should work fine if the modules listed under Setup are loaded. Here’s a list anyway (see the scicomp wiki for more explanation of the commands).
module load new: loads all new modules
module list: shows currently loaded modules
module avail: shows all available modules
module help <module_name>: brief description
module show <module_name>: what the module would do
module load <module_name>: load a module
which icc: check the compiler
module unload <module_name>: unload module
Batch system: How to run Nexus-e
On Euler, users are asked to run large jobs using Euler’s batch system. Scicomp gives an extensive description of this system. Here we summarize what is useful for running Nexus-e on Euler.
Generally, commands like
bsub -n 36 -R "model==EPYC_7742" -R "rusage[mem=5180]" -W "10:00" matlab -r run_Nexuse_platform
will be used to run Nexus-e from folder nexus-e/Run_Nexuse_platform on Euler.
-W "10:00" gives Euler 10 hours to run the process
-R "rusage[mem=5180]" tells Euler to allocate 5180 MB of RAM per core (default is 1GB)
-R "model==EPYC_7742" tells Euler to use the EPYC_7742 nodes. Available nodes:
XeonE5_2680v3 High memory nodes (24 cores, max memory 512 GB per node) - Recommended!
XeonGold_6150 High performance nodes (36 cores, max memory 192 GB per node)
EPYC_7742 AMD nodes (128 cores, max memory 512 GB per node)
Note that requesting specific nodes may imply longer queuing times.
-n 36 tells Euler to use 36 processors
matlab -r run_Nexuse tells Euler to run the script run_Nexuse using matlab
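Since memory is requested per core, the total request of a job is the number of cores times the per-core value. The following dry-run sketch assembles the example command from variables and prints it without submitting anything; the variable names are our own and not part of bsub.

```shell
# Resource request, using the values from the example command above.
cores=36
mem_per_core_mb=5180
wall_time="10:00"
node_model="EPYC_7742"
script="run_Nexuse_platform"

# Assemble the submission command as a string (dry run: nothing is submitted).
bsub_cmd="bsub -n ${cores} -R \"model==${node_model}\" -R \"rusage[mem=${mem_per_core_mb}]\" -W \"${wall_time}\" matlab -r ${script}"
echo "$bsub_cmd"

# Memory is requested per core, so the whole job asks for cores * mem.
total_mem_mb=$(( cores * mem_per_core_mb ))
echo "total memory requested: ${total_mem_mb} MB"
```

Comparing the printed total with the maximum memory per node listed above is a quick sanity check before submitting, assuming all requested cores are placed on one node.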
Other useful options are the following.
Add -nojvm to the matlab command to prevent Java from being used.
bsub [...] matlab -nojvm -r run_Nexuse
Add -nodisplay to the matlab command to explicitly tell matlab that no graphical interface is available.
bsub [...] matlab -nodisplay -r run_Nexuse
Set the output filename using -o <output_filename>. Default is lsf.o<JobID>.
Add -B and/or -N to be notified via email when your job begins and/or ends (reference). Also, you need to have a file .forward in your home directory containing your email address.
bsub -B -N [...] matlab -r run_Nexuse
For parallel computation using OpenMP, the number of processor cores available needs to be specified using the environment variable OMP_NUM_THREADS. Set this with the command
export OMP_NUM_THREADS=<number_of_cores>
before issuing your bsub ... command. Consider writing this to .bash_profile if you don’t want to repeat this each session.
To use job arrays for parallel calculation, use option
-J "arrayname[1-10]" ./program [arguments]
If some jobs fail, it’s possible to rerun only those:
brequeue -e <JOBID>
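Inside a job array, each element typically needs to know which piece of work is its own. LSF provides the element number in the environment variable LSB_JOBINDEX. The sketch below shows one possible way to use it together with OMP_NUM_THREADS; the input file naming scheme is hypothetical and only serves as an illustration.

```shell
# Match the OpenMP thread count to the number of cores requested with bsub -n.
cores=36
export OMP_NUM_THREADS="$cores"

# LSF sets LSB_JOBINDEX for each element of a job array. Fall back to 1
# so that the script can also be tried outside the batch system.
index="${LSB_JOBINDEX:-1}"

# Hypothetical naming scheme: one input file per array element.
input_file="scenario_${index}.txt"
echo "array element ${index} would process ${input_file} using ${OMP_NUM_THREADS} threads"
```

Saved as an executable script, this could take the place of ./program in the job array option above.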
Batch system: How to check the status of your jobs
bjobs lists jobs
bjobs -p lists only pending jobs
bjobs -l lists jobs with more details
bkill <JOBID> kills job with JOBID
bpeek <JOBID> checks the output of a particular running job
bbjobs <JOBID> checks resources used
bjobs -l -aff is another method for checking the resources used
Recommendations for running Nexus-e
Experience with resource usage
(1) [For final run] 2-day time resolution with convergence criterion 0.1 percent
For this, set tpRes = 2; and limDifference = 0.001 in run_Nexuse.m. The following command works fine.
bsub -n 36 -R "model==XeonGold_6150" -R "rusage[mem=5180]" -W "350:00" matlab -r run_Nexuse
(2) [For quick test] 8-day time resolution with convergence criterion 10 percent
For this, set tpRes = 8; and limDifference = 0.1 in run_Nexuse.m. The following command works fine.
bsub -n 36 -R "rusage[mem=2180]" -W "15:00" matlab -r run_Nexuse
(3) 8-day time resolution with convergence criterion 2 percent
For this, set tpRes = 8; and limDifference = 0.02 in run_Nexuse.m. The following command works fine.
bsub -n 36 -R "rusage[mem=2180]" -W "30:00" matlab -r run_Nexuse
(4) 8-day time resolution with convergence criterion 0.1 percent
For this, set tpRes = 8; and limDifference = 0.001 in run_Nexuse.m. The following command works fine. (This is still being tested as of October 22 2020).
bsub -n 36 -R "rusage[mem=2180]" -W "60:00" matlab -r run_Nexuse
(5) [For benchmark] Run the benchmark script bench_Nexuse.m
As instructed in the setup procedure, you sometimes need to run the script bench_Nexuse.m to (re-)calibrate GemEl. This script runs rather quickly because it doesn’t run the entire energy-economic loop and has a low time resolution. The following command works fine.
bsub -n 36 -W "2:00" matlab -r bench_Nexuse
Finding your way around output on Euler
After you issue a batch job through bsub, Euler will respond by telling you what the jobID is.
After completion, the ‘standard output’ of running Nexus-e will be written to lsf.o<JobID> and can be inspected there using the text editor of your choice.
While the job is running, however, standard output can be accessed by means of the command
bpeek
which in turn writes to standard output of your console. To more conveniently browse this, several options exist:
write the output of bpeek to a file:
bpeek > yourfilename.txt
and then look at it in the text editor of your choice.
use a pipe (‘|’) to find lines including certain ‘patterns’ with grep:
bpeek | grep <pattern>
for example,
bpeek | grep 'maximum difference'
will print all lines that contain information about GemEl’s convergence criterion in given iterations.
use a pipe (‘|’) to display the output of bpeek using the ‘less’ command:
bpeek | less
This will bring up an interface that lets the user browse through the output of bpeek without using the mouse. Some basic functions of the interface:
press “Space” to go down one page,
press “q” to quit the “less” interface,
press “/” to enter a pattern to search for.
Pressing “n” (n for next) after having searched for a pattern jumps to the next instance of the pattern;
pressing “N” jumps to the previous.
A relative to “/” (forward search) is “?” (backward search).
“G” goes to the end of bpeek’s output, “g” to the beginning.
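The grep filtering described above can be rehearsed without a running job. In the sketch below, a small file stands in for the output of bpeek; the line format and the numbers are invented for illustration and need not match what Nexus-e actually prints.

```shell
# Stand-in for the output of bpeek (invented content).
log_file="$(mktemp)"
cat > "$log_file" <<'EOF'
iteration 1
maximum difference: 0.25
iteration 2
maximum difference: 0.08
iteration 3
maximum difference: 0.009
EOF

# All lines about the convergence measure
# (on Euler: bpeek | grep 'maximum difference').
grep 'maximum difference' "$log_file"

# Only the most recent such line.
latest="$(grep 'maximum difference' "$log_file" | tail -n 1)"
echo "latest: $latest"

# Number of iterations that have reported so far.
n_reports="$(grep -c 'maximum difference' "$log_file")"
echo "reports so far: $n_reports"

rm "$log_file"
```

Note the straight single quotes around the pattern: they keep the shell from splitting it at the space.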
Finding stuff in lsf.o<JobID> files
grep -i <expression> lsf* searches for <expression> in all files starting with lsf. E.g.,
grep -inH nexus_disagg_nuc50_oct20 lsf*
searches for copies of database nexuse_disagg_nuc50 mentioned in lsf* files that were made on October 20. This may be helpful for identifying the copies of the database that can safely be removed ("dropped") from the PSL server.
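To see what the grep options do, the search can be tried on a few made-up files. This is a sketch only: the file contents below are invented, and -l (print only the names of matching files) is an addition that is not used in the text above.

```shell
# Throwaway directory with three made-up output files.
workdir="$(mktemp -d)"
cd "$workdir"
printf 'Copied database to nexus_disagg_nuc50_oct20\n' > lsf.o1001
printf 'No database copy in this run\n' > lsf.o1002
printf 'copied database to NEXUS_DISAGG_NUC50_OCT20\n' > lsf.o1003

# -i: ignore case, -n: show line numbers, -H: always show the file name.
grep -inH nexus_disagg_nuc50_oct20 lsf*

# -l: print only the names of the files that contain a match.
first_match="$(grep -il nexus_disagg_nuc50_oct20 lsf* | head -n 1)"
match_count="$(grep -il nexus_disagg_nuc50_oct20 lsf* | grep -c lsf)"
echo "first matching file: $first_match ($match_count files match)"

# Clean up.
cd /
rm -r "$workdir"
```

The file name list produced by -l is a convenient starting point when deciding which database copies are still referenced.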
Viewing output
The output of Matlab is directed to files lsf.o<JobID>. To inspect this output, open it in your editor of choice.
Several editors are available on Euler, e.g.,
emacs
vim
nano
nano may be a good choice if you don’t know emacs or vim (if you do know emacs or vim, you will have strong opinions on which one to use). Nano gives some on-screen instructions on basic key combinations. (E.g., ^G means type g while holding Ctrl pressed.)
For better user experience, copy files to your local computer for viewing with a GUI editor (e.g., using FileZilla).