Using UVA’s High-Performance Computing Systems
Afton is the University of Virginia’s newest High-Performance Computing system. The Afton supercomputer is composed of 300 compute nodes, each with 96 compute cores based on the AMD EPYC 9454 architecture, for a total of 28,800 cores. The increase in core count is augmented by a significant increase in memory per node compared to Rivanna. Each Afton node has a minimum of 750 gigabytes of memory, with some supporting up to 1.5 terabytes of RAM. The large amount of memory per node allows researchers to work efficiently with the ever-expanding datasets seen across diverse research disciplines. The Afton and Rivanna systems provide access to 55 nodes with NVIDIA general-purpose GPU accelerators (RTX2080, RTX3090, A6000, V100, A40, and A100), including an NVIDIA BasePOD.
NVIDIA DGX BasePOD™
Introducing the NVIDIA DGX BasePOD™
As artificial intelligence (AI) and machine learning (ML) continue to change how academic research is conducted, the NVIDIA DGX BasePOD, or BasePOD, brings new AI and ML functionality to UVA’s High-Performance Computing (HPC) system. The BasePOD is a cluster of high-performance GPUs that allows large deep-learning models to be created and used at UVA.
The NVIDIA DGX BasePOD™ on Rivanna and Afton, hereafter referred to as the POD, consists of:
10 DGX A100 nodes
2 TB of RAM per node
80 GB of GPU memory per GPU device
Compared to the regular GPU nodes, the POD contains advanced features such as:
ACCORD: Jupyter Lab
Jupyter Lab allows for interactive, notebook-based analysis of data. It is a good choice for obtaining quick results or refining your code in numerous languages, including Python, R, Julia, bash, and others.
Learn more about Jupyter Lab
ACCORD: RStudio
RStudio is the standard IDE for research using the R programming language.
Learn more about RStudio
ACCORD: Theia IDE
Theia Python is a rich IDE that allows researchers to manage their files and data, write code with an intelligent editor, and execute code within a terminal session.
Learn more about the Theia Python IDE
FastX Web Portal
Overview
FastX is a commercial solution that enables users to start an X11 desktop environment on a remote system. It is available on the UVA HPC frontends. Using it is equivalent to logging in at the console of the frontend.
Using FastX for the Web
We recommend that most users access FastX through its web interface. To connect, point a browser to:
https://fastx.hpc.virginia.edu
Off Campus?
Connecting to the Rivanna and Afton HPC systems from off Grounds via Secure Shell (SSH) or FastX requires a VPN connection. We recommend using the UVA More Secure Network if available. The UVA Anywhere VPN can be used if the UVA More Secure Network is not available.
Open OnDemand
Overview
Open OnDemand is a graphical user interface that allows access to UVA HPC via a web browser. Within the Open OnDemand environment, users have access to a file explorer; interactive applications such as JupyterLab, RStudio Server, and FastX Web; a command-line interface; and a job composer and job monitor.
Logging in to UVA HPC
The HPC system is accessible through the Open OnDemand web client at https://ood.hpc.virginia.edu. Your login is your UVA computing ID and your password is your NetBadge password. Some services, such as FastX Web, require the Eservices password. If you do not know your Eservices password, you must change it through ITS by changing your NetBadge password (see instructions).
Open OnDemand: File Explorer
Open OnDemand provides an integrated file explorer to browse and manage small files. Rivanna and Afton have multiple locations to store your files, each with different limits and policies. Specifically, each user has a relatively small amount of permanent storage in their home directory and a large amount of temporary storage (/scratch) where large data sets can be staged for job processing. Researchers can also lease storage that is accessible on Rivanna. Contact Research Computing or visit the storage website for more information.
The file explorer provides these basic functions:
Renaming of files
Viewing of text and small image files
Editing text files
Downloading & uploading small files
To see the storage locations that you have access to from within Open OnDemand, click on the Files menu.
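Staging a data set on /scratch typically means copying inputs into a per-job working directory before processing and copying the (usually smaller) results back to permanent storage afterward. The Python sketch below (Python 3.8 or newer) illustrates that pattern. The function names are hypothetical, and the scratch location is passed in as a parameter because the exact path layout, for example /scratch/<computing ID>, is an assumption here rather than something this page specifies.

```python
import shutil
from pathlib import Path


def stage_to_scratch(src: Path, scratch_root: Path, job_name: str) -> Path:
    """Copy an input file or directory into a per-job directory on scratch.

    `scratch_root` is an assumed location (e.g. /scratch/<computing ID>);
    it is a parameter so the sketch does not hard-code a cluster path.
    """
    workdir = scratch_root / job_name
    workdir.mkdir(parents=True, exist_ok=True)
    dest = workdir / src.name
    if src.is_dir():
        # dirs_exist_ok=True lets the staging step be re-run safely
        shutil.copytree(src, dest, dirs_exist_ok=True)
    else:
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest


def copy_results_home(results: Path, home_dest: Path) -> Path:
    """Copy results from temporary scratch storage back to permanent storage."""
    home_dest.mkdir(parents=True, exist_ok=True)
    target = home_dest / results.name
    if results.is_dir():
        shutil.copytree(results, target, dirs_exist_ok=True)
    else:
        shutil.copy2(results, target)
    return target
```

On the cluster you might call stage_to_scratch(Path.home() / "data", Path("/scratch") / "your_id", "run1"), where both paths are placeholders. Because /scratch is temporary storage, anything worth keeping should be copied back with a step like copy_results_home.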
Open OnDemand: Job Composer
Open OnDemand allows you to submit Slurm jobs to the cluster without using shell commands.
The job composer simplifies the process of:
Creating a script
Submitting a job
Downloading results
Submitting Jobs
We will describe creating a job from a template provided by the system.
Open the Job Composer tab from the Open OnDemand Dashboard.
Go to the New Job tab and from the dropdown, select From Template. You can choose the default template or you can select from the list.
Click on Create New Job. You will need to edit the file that pops up, so click the light blue Open Editor button at the bottom.
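The script you edit in the job composer is a Slurm batch script: a shell script whose #SBATCH comment lines carry the resource request. As a rough sketch of what such a script contains, the Python helper below (Python 3.9 or newer) renders one from a few fields. The class name, the default partition ("standard"), the walltime, core count, memory, and any module names are illustrative assumptions; the option names themselves (--job-name, --account, --partition, --time, --nodes, --ntasks, --cpus-per-task, --mem, --output) are standard sbatch options.

```python
from dataclasses import dataclass, field


@dataclass
class SlurmJob:
    """Hypothetical description of a single-node batch job.

    Field names mirror real sbatch options; the default values
    (partition "standard", one hour, 4 cores, 16G) are illustrative only.
    """
    job_name: str
    account: str
    command: str
    partition: str = "standard"
    time: str = "01:00:00"          # walltime as HH:MM:SS
    nodes: int = 1
    ntasks: int = 1
    cpus_per_task: int = 4
    mem: str = "16G"                # memory per node
    modules: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the text of a batch script; all #SBATCH lines come first."""
        lines = [
            "#!/bin/bash",
            f"#SBATCH --job-name={self.job_name}",
            f"#SBATCH --account={self.account}",
            f"#SBATCH --partition={self.partition}",
            f"#SBATCH --time={self.time}",
            f"#SBATCH --nodes={self.nodes}",
            f"#SBATCH --ntasks={self.ntasks}",
            f"#SBATCH --cpus-per-task={self.cpus_per_task}",
            f"#SBATCH --mem={self.mem}",
            # %j expands to the job ID in Slurm output file names
            f"#SBATCH --output={self.job_name}-%j.out",
            "",
            "module purge",
        ]
        lines += [f"module load {m}" for m in self.modules]
        lines.append(self.command)
        return "\n".join(lines) + "\n"
```

For example, SlurmJob(job_name="demo", account="my_allocation", command="python analysis.py").render() returns a script whose #SBATCH lines precede every command. That ordering matters because sbatch stops reading #SBATCH directives at the first non-comment, non-blank line. The allocation name here is a placeholder for your own.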
Economic Market Behavior
While conducting research for a highly technical study of market behavior, Dr. Ciliberto realized that he needed to parallelize an integration over a sample distribution. RC staff member Ed Hall successfully parallelized Ciliberto’s MATLAB code and taught him how to do production runs on the University’s high-performance clusters. “The second stage estimator was computationally intensive,” Ciliberto recalls. “We needed to compute the distribution of the residuals and unobservables for multiple parameter values and at many different points of the distribution, which requires parallelizing the computation. Ed Hall’s expertise in this area was crucial. In fact, without Ed’s contribution, this project could not have been completed.”
Tracking Bug Movements
Ed Hall worked with the Brodie Lab in the Biology department to set up a workflow for analyzing videos of bug-tracking experiments on the Rivanna Linux cluster. The lab wanted to use the community MATLAB software idTracker to track beetle movements. Their two goals were to shorten the software runtime and to automate the process, and there was a large backlog of videos to go through. Ed installed the idTracker software on Rivanna and modified the code to parallelize the bug-tracking process. He also wrote and documented shell scripts to automate the lab's workflow on the cluster.
PI: Edmund Brodie, PhD (Department of Biology)
Slurm Job Manager
Overview
UVA HPC is a multi-user, managed environment. It is divided into login nodes (also called frontends), which are directly accessible by users, and compute nodes, which must be accessed through the resource manager. Users prepare their computational workloads, called jobs, on the login nodes and submit them to the job controller, a component of the resource manager that runs on login nodes and is responsible for scheduling jobs and monitoring the status of the compute nodes.
We use Slurm, an open-source tool that manages jobs for Linux clusters.
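Outside Open OnDemand, jobs are usually handed to the job controller from a login node with Slurm's sbatch command. The sketch below, assuming Python 3.7 or newer on a login node where sbatch is on PATH, wraps that call with subprocess and extracts the job ID. It relies on the real sbatch --parsable flag, which prints only the job ID (optionally followed by ";cluster"); the helper names submit and parse_job_id are hypothetical.

```python
import re
import subprocess
from pathlib import Path

# Default sbatch message, e.g. "Submitted batch job 12345"
_SUBMITTED = re.compile(r"Submitted batch job (\d+)")


def parse_job_id(sbatch_stdout: str) -> int:
    """Extract the job ID from sbatch output.

    Accepts both the default message and `sbatch --parsable` output
    ("12345" or "12345;cluster"); raises ValueError otherwise.
    """
    text = sbatch_stdout.strip()
    match = _SUBMITTED.search(text)
    if match:
        return int(match.group(1))
    first = text.split(";", 1)[0]
    if first.isdigit():
        return int(first)
    raise ValueError(f"unrecognized sbatch output: {sbatch_stdout!r}")


def submit(script_path: Path) -> int:
    """Submit a batch script to Slurm and return the new job's ID.

    Only works where the Slurm client commands are installed, i.e. on a
    login node; check=True raises CalledProcessError if sbatch fails.
    """
    result = subprocess.run(
        ["sbatch", "--parsable", str(script_path)],
        capture_output=True, text=True, check=True,
    )
    return parse_job_id(result.stdout)
```

A call such as submit(Path("job.slurm")) would return an integer job ID that could then be passed to squeue -j to check the job's state. Parsing is kept in a separate function so it can be checked without access to the cluster.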