For HPC Users

High Performance Computing (HPC) vs. the UK Biobank Research Analysis Platform (UKB-RAP)

| Component | HPC | UKB-RAP |
| --- | --- | --- |
| Driver/requestor | Head node of cluster | API server |
| Submission script language | Portable Batch System (PBS) or SLURM | dx-toolkit |
| Worker | Requested from pool of machines in private cluster | Requested from pool of machines in AWS/Azure |
| Shared storage | Shared file system for all nodes (Lustre, GPFS, etc.) | Project storage (Amazon S3) |
| Worker file I/O | Handled by shared file system | Must be transferred to and from project storage by commands on the worker |

Key Players in an HPC

  • An HPC cluster is a collection of specialized hardware, including mainframe computers, combined with a distributed processing software framework, so that this very large computer system can process massive amounts of data at high speed.

  • The goal of an HPC is to hold the files on its hardware and also run the analysis there. In this way, it is similar to a local computer, but with specialized hardware and software that provide more storage and processing power.

  • Your computer: communicates with the HPC cluster to request resources

  • HPC Cluster

  • Shared Storage: common area where files are stored. Directories may branch out by user or follow another layout

  • Head Node: manages the workers and the shared storage

  • HPC Worker: where we do our computation; part of the HPC cluster.

  • These components work together to increase processing power and to manage jobs and queues, so that each job runs when the number of workers it needs becomes available.
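For concreteness, a traditional HPC submission might look like this. This is a hypothetical SLURM example: the script name and resource flags are placeholders, and the command is echoed so the sketch runs without a cluster.

```shell
# Submit a script to the head node's queue; the job waits in the
# queue until a worker with the requested resources is available.
# (echo is prefixed so this runs without a SLURM installation)
echo sbatch --ntasks=1 --mem=8G analyze.sh
```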

Key Players in Cloud Computing

  • In comparison, cloud computing adds layers between you and your analysis in exchange for greater computational power and storage.

  • This relationship and the layers involved are shown in the figure below:

  • Let's contrast this with processing a file on the UKB-RAP platform.

  • We'll start with our computer, the UKB-RAP platform, and a file from project storage.

  • We first use the dx run command, requesting to run an app on a file in project storage. This request is then sent to the platform, and an appropriate worker from the pool of workers is made available.

  • When the worker is available, we can transfer a file from the project to the worker.

  • The platform also handles installing the app and its software environment on the worker.

  • Once our app is ready and our file is set, we can run the computation on the worker.

  • Any files that we generate must be transferred back into project storage.
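The walkthrough above can be sketched as a single dx-toolkit call. The app, input file, and command here are placeholders (Swiss Army Knife is one general-purpose app on the platform), and the command is echoed so the sketch runs without the platform.

```shell
# From your computer: request a worker running an app on a file in
# project storage. The platform provisions the worker, installs the
# app's environment, transfers the input file to the worker, runs
# the command there, and uploads the app's declared outputs back
# into project storage when the job finishes.
echo dx run app-swiss-army-knife \
    -iin="/inputs/chr21.vcf.gz" \
    -icmd="bcftools stats chr21.vcf.gz > chr21.stats.txt"
```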

Key Differences

  • HPC jobs are limited by how many workers are physically present on the HPC.

  • Cloud computing, by contrast, draws workers from an effectively elastic pool: capacity is provisioned on demand rather than capped by a fixed physical cluster, so jobs spend less time waiting in queues.

Transferring Files

  • One common barrier is getting our files from project storage onto the worker so that we can compute on them there. The other barrier we'll review is getting the output files we generate on the worker back into project storage.

  • Cloud computing has a nested structure, and the file transfers this requires can make it difficult to learn.

  • A mental model of how cloud computing works can help us overcome these barriers.

Resolution:

  • Cloud computing is indirect, and you need to think two steps ahead.

  • Here is the visual for thinking about the steps for file management:
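Inside a job script, that two-steps-ahead thinking looks roughly like this. The file names and the analysis script are placeholders, and the commands are echoed so the sketch runs anywhere.

```shell
# 1) pull the input from project storage onto the worker
echo dx download /inputs/cohort.csv
# 2) compute on the worker's local copy
echo ./summarize.sh cohort.csv
# 3) push the generated output back into project storage
echo dx upload summary.txt --destination /outputs/
```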

Running apps

  • Creating apps and running them is covered later in the documentation.

  • Apps serve to (at minimum):

  • Request an EC2/AWS worker

  • Configure the worker's environment

  • Establish data transfer
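Those three responsibilities map onto a single dx run call. The instance type, app, and inputs below are illustrative, and the command is echoed so it runs without the platform.

```shell
# --instance-type requests a specific worker from the cloud pool;
# -iin triggers the transfer of the input file to that worker;
# the app's bundled software environment then runs the command there.
echo dx run app-swiss-army-knife \
    --instance-type mem1_ssd1_v2_x4 \
    -iin="/inputs/sample.bam" \
    -icmd="samtools index sample.bam"
```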

Why do this with UKB-RAP?

  • Highly secure platform with built-in compliance infrastructure

  • Fully configurable platform

  • Users can run anything from single scripts to fully automated, production-level workflows

  • Data transfer designed to be fast and efficient

  • Read and analyze massive files directly using dxfuse

  • Instances are configured for you via apps

  • Variety of ways to configure your own environments

  • Access to the wealth of AWS/Azure resources

  • Largest Azure instances: ~4 TB RAM

  • Largest AWS instances: ~2 TB RAM

Equivalent Commands

| Task | dx-toolkit | PBS | SLURM |
| --- | --- | --- | --- |
| Run job | dx run <app-id> <script> | qsub <script> | sbatch <script> |
| Monitor job | dx find jobs | qstat | squeue |
| Kill job | dx terminate <jobid> | qdel <jobid> | scancel <jobid> |
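A typical lifecycle using the dx-toolkit column of the table. The app, command, and job id are placeholders, and each command is echoed so the sketch runs without the platform.

```shell
echo dx run app-swiss-army-knife -icmd="sleep 600"   # submit a job
echo dx find jobs                                    # monitor running jobs
echo dx terminate job-XXXXXXXXXXXXXXXXXXXXXXXX       # kill it if needed
```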

Practical Approaches

Batch Processing Comparisons

| Step | HPC Recipe | Cloud Recipe |
| --- | --- | --- |
| 1 | List files | List files |
| 2 | Request 1 worker per file | Loop over each file: 1) use dx run, 2) transfer the file, 3) run commands |
| 3 | Use array IDs to process 1 file per worker | |
| 4 | Submit job to head node | |
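The cloud recipe above can be sketched as a short loop. The file names and the app are placeholders, and dx run is echoed rather than executed so the loop runs anywhere.

```shell
# step 1: list the files to process
files="chr1.vcf.gz chr2.vcf.gz chr3.vcf.gz"

# step 2: one dx run call per file; the app handles transferring the
# file to the worker, running the command, and uploading the outputs
for f in $files; do
  echo dx run app-swiss-army-knife -iin="/inputs/$f" -icmd="md5sum $f" -y
done
```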
