AWS High Performance Computing (HPC)

This is part of a blog series giving a high-level overview of the different services examined in the AWS Solutions Architect Associate exam; to view the whole series, click here.

HPC Summary

  • High performance computing typically refers to aggregating computing power to deliver higher performance than a single workstation or server could provide.

  • AWS provides elastic and scalable infrastructure, so High Performance Computing can be achieved in a number of areas: data transfer, compute, networking, storage, and orchestration and automation.

HPC for Data Transfer & Management

  • To run HPC applications in the cloud, you might first need to move your data into AWS. For this, AWS offers a number of different services, depending on the size of the data you plan on transferring.

  • Snowball and Snowmobile can transport terabytes or petabytes of data in a secure manner, reducing the common challenges of high network costs and long transfer times.

  • DataSync makes it easier to automate moving your data from on-premises storage into AWS S3, EFS or FSx. DataSync can also help with handling encryption and network optimisation (see the sketch after this list).

  • Direct Connect allows you to set up a dedicated network connection between your on-premises datacenter/office and AWS for increased bandwidth throughput and a more consistent network experience.
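
As a rough illustration, here is a minimal boto3 sketch of starting a DataSync transfer. It assumes the source and destination DataSync locations (and, for on-premises sources, a DataSync agent) have already been created; the region, account ID and ARNs below are placeholders.

```python
import boto3

datasync = boto3.client("datasync", region_name="eu-west-1")

# Both location ARNs are placeholders -- in practice you would create them
# first (e.g. with create_location_nfs / create_location_s3).
task = datasync.create_task(
    SourceLocationArn="arn:aws:datasync:eu-west-1:123456789012:location/loc-source",
    DestinationLocationArn="arn:aws:datasync:eu-west-1:123456789012:location/loc-dest",
    Name="onprem-to-s3-hpc-input",
)

# Kick off the transfer; DataSync handles encryption in transit and
# network optimisation for you.
execution = datasync.start_task_execution(TaskArn=task["TaskArn"])
print(execution["TaskExecutionArn"])
```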

HPC for Compute & Networking

  • For setting up your HPC applications, there are a variety of compute instances and network enhancements available to choose from and configure to support your needs.

  • EC2 instances can be GPU or CPU optimised to run HPC workloads.

  • Enhanced Networking (EN) can be used with EC2 instances to provide higher network performance and lower latencies.

  • AWS Auto Scaling can be used to optimise performance by monitoring applications and adjusting capacity to achieve the required performance at the lowest cost.

  • Placement groups can also be used to meet the needs of your HPC workload. For example, you could use a cluster placement group to achieve lower latency and higher network throughput.

  • Elastic Fabric Adapters (EFA) can also be used to enable high levels of inter-node communication and to enhance performance at scale (a combined sketch of a cluster placement group with EFA-enabled instances follows this list).
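
To make the last two points concrete, here is a minimal boto3 sketch that creates a cluster placement group and launches two EFA-enabled instances into it. The region, AMI, subnet and security group IDs are placeholders, and c5n.18xlarge is just one example of an EFA-capable instance type.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# A cluster placement group packs instances close together in one AZ,
# giving lower latency and higher throughput between nodes.
ec2.create_placement_group(GroupName="hpc-cluster", Strategy="cluster")

# Launch two EFA-enabled instances into the placement group.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="c5n.18xlarge",       # example EFA-capable instance type
    MinCount=2,
    MaxCount=2,
    Placement={"GroupName": "hpc-cluster"},
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "InterfaceType": "efa",        # attach an Elastic Fabric Adapter
        "SubnetId": "subnet-0123456789abcdef0",   # placeholder subnet
        "Groups": ["sg-0123456789abcdef0"],       # placeholder security group
    }],
)
```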

HPC for Storage

When it comes to storage for HPC applications, AWS has both instance-attached options and network storage options, depending on your requirements.

  1. Instance-Attached Storage:
  • Elastic Block Store (EBS) can be used if your application requires up to 64,000 IOPS (see the sketch below).

  • However, if your application requires millions of IOPS and low latency, you might consider using Instance Store instead.
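
For example, here is a minimal boto3 sketch of provisioning an io2 EBS volume with a high number of provisioned IOPS; the region, Availability Zone and sizing below are placeholder assumptions.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# A provisioned-IOPS (io2) volume for an I/O heavy HPC node.
# io2 allows up to 500 IOPS per GiB, so 200 GiB comfortably
# supports the 64,000 IOPS requested here.
volume = ec2.create_volume(
    AvailabilityZone="eu-west-1a",
    VolumeType="io2",
    Size=200,       # GiB
    Iops=64000,
)
print(volume["VolumeId"])
```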

  2. Network Storage:

  • S3 is distributed, object-based storage that can be used to store the input/output of your HPC applications (it is not a file system).

  • EFS is highly scalable and can grow on demand to petabytes automatically, without disrupting applications.

  • FSx for Lustre is a file system optimised for HPC, offering low latencies, high throughput and millions of IOPS (see the sketch after this list).
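
As a sketch of the FSx for Lustre option, the boto3 call below creates a scratch file system linked to an S3 bucket, so job input can be loaded from S3 and results exported back. The region, subnet ID and bucket name are placeholders.

```python
import boto3

fsx = boto3.client("fsx", region_name="eu-west-1")

# A scratch FSx for Lustre file system linked to an S3 bucket.
fs = fsx.create_file_system(
    FileSystemType="LUSTRE",
    StorageCapacity=1200,   # GiB; scratch file systems start at 1.2 TiB
    SubnetIds=["subnet-0123456789abcdef0"],   # placeholder subnet
    LustreConfiguration={
        "DeploymentType": "SCRATCH_2",
        "ImportPath": "s3://my-hpc-bucket",           # placeholder bucket
        "ExportPath": "s3://my-hpc-bucket/results",
    },
)
print(fs["FileSystem"]["FileSystemId"])
```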

HPC for Automation & Orchestration

  • For HPC applications, scheduling and automating jobs can be essential for efficiency.

  • AWS Batch can be used to run hundreds or thousands of batch computing jobs. It can dynamically provision the right quantity and type of resources for each job, and it also supports multi-node parallel jobs spanning multiple EC2 instances (see the sketch after this list).

  • AWS ParallelCluster allows you to quickly build an HPC environment in AWS. It is an open-source cluster management tool for managing and deploying HPC clusters in the cloud.
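
As an illustration of AWS Batch, here is a minimal boto3 sketch that submits an array job, which Batch fans out into 1,000 child jobs and provisions compute for automatically. The region, job queue and job definition names are placeholders that would be registered ahead of time.

```python
import boto3

batch = boto3.client("batch", region_name="eu-west-1")

# Submit an array job: Batch runs 1,000 copies, each receiving its own
# AWS_BATCH_JOB_ARRAY_INDEX environment variable to identify its slice
# of the work.
response = batch.submit_job(
    jobName="hpc-simulation",
    jobQueue="hpc-job-queue",          # placeholder queue
    jobDefinition="hpc-sim-jobdef:1",  # placeholder job definition
    arrayProperties={"size": 1000},
)
print(response["jobId"])
```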