Accessing LASSO-CACTI via ARM’s Computing Environment

The ARM user facility provides two methods for users to work with LASSO data within ARM’s computing environment. This can greatly simplify the user experience by avoiding the need to download LASSO-CACTI data to outside computers. Depending on how many files are needed and the level of computing required, users should consider whether ARM’s resources would better suit their research needs. The two methods are a JupyterHub server and the Cumulus cluster.

Users can benefit from using ARM’s resources in many ways. The Jupyter server is an easy way to do plotting and small analyses using ARM observations and LASSO data. The network connection between ARM’s tape drives and the parallel file system attached to Jupyter and Cumulus is much faster than transferring data offsite. Jupyter can also be used for prototyping more sophisticated analyses that would then be submitted to the cluster for production runs across a larger number of datasets or times. More computationally demanding tasks can be submitted to the compute nodes on Cumulus. For example, generating additional subsets of the wrfout files, re-running WRF simulations, or performing feature tracking can all be done on the compute nodes.

Requesting an ARM Computing Account

Many of ARM’s computing resources share the same login account. Thus, one only needs to establish an account once to access both the JupyterHub and high-performance computing (HPC) resources. Note that an HPC account is different from an ARM LDAP account. The latter is tied more to ARM’s web resources and is required in order to request an HPC account. The HPC account will be set up by the Oak Ridge Leadership Computing Facility (OLCF) in coordination with ARM staff since OLCF maintains ARM’s HPC systems.

Instructions for requesting an account and the associated web form are at https://www.arm.gov/capabilities/computing-resources. Clicking on the green “Request Access to the Cumulus Cluster” button brings one to the request form. Requests can be made individually, or a group of investigators can organize to submit a single request. Requests to only use the JupyterHub server do not need to provide detailed computational information, while users wanting to run more intensive programs on Cumulus should provide more of the requested information. The primary purpose of this information is to help ARM management understand what is needed so they can guide users to the right resources. For example, the Cumulus cluster currently does not have any GPUs, so programs requiring GPUs would not work on Cumulus. Smaller LASSO requests go through a streamlined approval process, while more demanding requests that include substantial computing time are reviewed quarterly.

Once the application to ARM’s HPC has been approved, the next step is to request an OLCF account. This is done at https://my.olcf.ornl.gov/account-application-new, at which point users will need the allocation name provided by ARM. Foreign nationals will also need to provide sufficient information to pass a security review, which is similar to the reviews required for accessing DOE national laboratories. This review can take several weeks, so users should plan ahead.

Questions about how to navigate the account request process can be addressed to clustersupport@arm.gov, and questions about working with LASSO data can be sent to lasso@arm.gov.

ARM’s JupyterHub Server

ARM uses JupyterHub, which in turn provides users with a containerized JupyterLab single-user environment upon login. Collectively, we will simply call this Jupyter for our purposes. When a user logs into Jupyter at https://jupyter-open.olcf.ornl.gov/, they can select the default environment called “Slate,” which serves up 8 CPUs and 24 GB of memory on an available computer. This instance can directly access ARM’s GPFS file system where data can be staged for analysis. Staging data is done by working on the Cumulus cluster, described below.

The default kernel available to users has a handful of libraries but is likely insufficient for most needs. Therefore, we recommend defining a new conda environment containing the essential analysis software necessary to work with WRF and netCDF data. An example environment definition can be found in this yaml file: lasso_conda_env.yaml. To create the associated conda environment, place the yaml file in an appropriate directory. Then, from a login node on Cumulus, create the new conda environment by typing

conda env create -n lasso -f lasso_conda_env.yaml

Then, activate the new environment (conda activate lasso) and register it as a kernel available to Jupyter by typing

python -m ipykernel install --user --name=lasso

If the installation succeeds, the newly created kernel will be visible within Jupyter. More information about using Jupyter is readily available online.
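
As a quick check from a Cumulus login node (assuming the lasso environment is still active, which provides the jupyter command), the registered kernels can be listed with

jupyter kernelspec list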

Direct Logins to Cumulus

Users requiring more resources than are available via Jupyter notebooks can log into Cumulus to access traditional HPC computing. Using SSH to open a terminal session at cumulus.ccs.ornl.gov brings one to a login node in a Unix environment. Simple tasks can be done interactively from this node, while anything resource intensive should be run on one or more of the compute nodes, either through an interactive queue request or by submitting a batch job to the job queue.
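
For example, assuming an OLCF user name of userid (a placeholder), a terminal session can be opened with

ssh userid@cumulus.ccs.ornl.gov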

The Cumulus cluster currently consists of two login nodes and 118 compute nodes. Each node has 128 cores, and most have 256 GB of memory. Sixteen of the compute nodes are “high memory” nodes with 512 GB. Mounted to Cumulus are a home partition and a GPFS parallel file system called Wolf. The Wolf file system is the same one visible from Jupyter, so users can share files between the two resources this way.

Job scheduling on Cumulus is done using Slurm. A selection of queues exists for choosing between normal and high-memory nodes (batch_short versus batch_high_mem) and for different types of jobs. The batch_short queue is sufficient for most needs. An interactive node can be requested with a command like

salloc -A atm000 -p debug -N 1 -t 2:00:00

where atm000 and debug would be replaced by the appropriate project allocation and partition, respectively. The above example requests one node for two hours, which can be adjusted based on job requirements.
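
Once the allocation is granted, parallel work can be launched on the assigned node with srun from within the resulting shell. For example, a hypothetical executable named my_program could be run across 16 tasks with

srun -n 16 ./my_program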

The following example Slurm submission script for a multi-node WRF simulation can be used as a model for other types of submissions that users might need. In this case, 56 nodes are requested with 896 MPI ranks and 8 OpenMP threads per rank. This is the typical approach that was used when running the LASSO-CACTI LES domains.

#!/usr/bin/csh
#SBATCH --job-name=wrf_run
#SBATCH -A atm000
#SBATCH --partition=batch_short
#SBATCH --time=20:00:00
#SBATCH --nodes=56
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=8
#SBATCH --output=slurm.out
#SBATCH --no-requeue
#SBATCH --exclusive
#SBATCH --mail-type=END
#SBATCH --mail-user=john.doe@pnnl.gov

date
cd $SLURM_SUBMIT_DIR
echo $SLURM_SUBMIT_DIR
# remove log files from any previous WRF attempt in this directory
rm rsl.out.* rsl.error.*

# OpenMP settings: 8 threads per MPI rank, pinned to cores, with a larger stack
setenv OMP_DISPLAY_ENV true
setenv OMP_NUM_THREADS 8
setenv OMP_PLACES cores
setenv OMP_PROC_BIND true
setenv OMP_STACKSIZE 64000000
limit stacksize 64000000

# launch WRF with 16 MPI ranks per node and 8 threads per rank
srun --ntasks-per-node=16 --cpus-per-task=8 --cpu_bind=cores wrf.exe
date
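
Assuming the script is saved as run_wrf.csh (a hypothetical file name), it can be submitted and then monitored with standard Slurm commands such as

sbatch run_wrf.csh
squeue -u $USER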

More detailed information about Cumulus can be found at https://docs.arm-hpc.ornl.gov/systems/cumulus_2.html, and documentation for Slurm is at https://slurm.schedmd.com/documentation.html.