Canonical’s recipe for High Performance Computing

In essence, High Performance Computing (HPC) is quite simple: speed and scale. In practice, it is complex and hard to achieve. It is not dissimilar to what happens when you go from a regular car to a supercar or a hypercar – the challenges and problems you encounter at 100 km/h are vastly different from those at 300 km/h. A whole new set of constraints emerges.

Furthermore, no two HPC setups are the same. Every organisation does HPC in a manner uniquely tailored to their workloads and requirements. That said, the basic components and functions are repeatable across the entire spectrum of tools and technologies used in this space. Today, we’d like to share our recipe on how you can best blend these “ingredients” to build a Canonical-flavoured HPC setup. We do not see it as an absolute truth or the one superior way of doing things – but we believe it is practical, pragmatic and efficient.

What makes HPC … HPC?

At the hardware level, the power of HPC usually comes from processing discrete chunks of data in parallel across relatively inexpensive commodity machines – inexpensive, that is, compared to what a single system offering the same performance would cost. The use of GPUs can make data crunching significantly faster than conventional CPUs alone, which is why HPC setups often include graphics cards for processing.

The parallel execution needs to be fast – and well orchestrated – to avoid starvation at any one point in the setup. To that end, HPC setups also include fast networking and fast storage, to minimise (or eliminate) the gap between the data transfer and processing speeds of the conventionally fast elements (CPU/GPU, memory) and the slower ones (the I/O bus).

On the software level, orchestration requires an intelligent scheduler, which can process, dispatch and manage jobs across the HPC setup (a cluster of compute nodes). Usually, the engineers and scientists working with the data will also need programs to feed their workloads into the HPC environment. To that end, they need a console for dispatching jobs and a set of tools and utilities for preparing them – ideally managed centrally through a package manager.

In a rather recipe-like style, HPC boils down to: a set of machines with CPUs/GPUs, fast storage, a fast network, a scheduler, a package manager, and a management console. Now, let’s talk about what you can expect from Canonical in the HPC space.

Juju for hardware deployment

If you’ve never heard of Juju, here’s a one-liner: Juju is an open source orchestration engine for software operators that enables the deployment, integration and lifecycle management of applications at any scale, on any infrastructure.

The idea is that rather than spending a LOT of time figuring out how to build the topology of your data centre or cloud or HPC setup, you let Juju build the environment for you, and you invest time in the actual workloads you want to run. Of course, there are no shortcuts, and Juju isn’t a magical solution for IT-less IT. But it can perhaps help, in some scenarios, get businesses and organisations past the initial hurdle of setting up an environment.

Juju’s combination of deployment and operations behaviour comes in the form of charms: software operators that encapsulate business logic in reusable packages and automate every aspect of an application’s lifecycle. In other words, charms bundle both an application’s configuration and the business logic of how it should work once deployed. For HPC, this means there can be a Charmed HPC solution. It will include the deployment component, as well as the software ingredients needed to run a fully self-contained, independent HPC setup.
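To give a flavour of what working with charms looks like, here is a minimal sketch using the Juju CLI. The cloud name, charm names and unit counts are illustrative assumptions, not a statement of what Charmed HPC will ship:

    # Bootstrap a Juju controller onto your infrastructure
    # ("maas-cloud" is an assumed cloud name registered with Juju)
    juju bootstrap maas-cloud hpc-controller

    # Create a model to hold the HPC applications
    juju add-model hpc

    # Deploy an illustrative scheduler charm and four compute-node units
    juju deploy slurmctld
    juju deploy slurmd -n 4

    # Wire the compute nodes to the scheduler
    # (older Juju releases use "juju relate" instead of "juju integrate")
    juju integrate slurmctld slurmd

Once the applications are related, Juju takes care of installing, configuring and connecting them, so the same handful of commands describes the topology whether the underlying infrastructure is bare metal, a private cloud or a public one.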

A set of charms and snaps

Once you move past the hardware hurdle, the software part should become easier. Juju will provision Ubuntu systems, complete with all the bits and pieces needed to do HPC workloads right away.

Spack will be the package manager (as we mentioned in our holidays message). It allows HPC users to quickly and easily search for and install some 7,000 scientific programs and utilities. Next, once the engineers and scientists are ready to dispatch their workloads, they can use the Open OnDemand utility to connect to the HPC cluster.
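Sticking with the package manager for a moment, the commands below show what the day-to-day Spack workflow looks like: searching for a package, building it, and loading it into the current shell. GROMACS is used purely as an illustration; any package in Spack’s repository follows the same pattern:

    # Search Spack's built-in repository for a package
    spack list gromacs

    # Build and install it, together with its dependencies
    spack install gromacs

    # Make the installed package available in the current shell environment
    spack load gromacs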

From there on, SLURM will process the tasks and schedule them across the cluster. Canonical aims to provide software and hardware optimisation at every junction, so when different processes run on a Canonical HPC cluster, the end user will benefit from better performance and power utilisation, as well as improved security and stability. In due course, we will share more details on how we intend to achieve this, as well as provide timely updates on benchmarks and tweaks.
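For readers less familiar with SLURM, a job usually arrives as a short batch script that declares the resources it needs, and the scheduler decides where and when to run it. The sketch below is a generic example; the executable name and resource figures are placeholders:

    #!/bin/bash
    #SBATCH --job-name=example-sim
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=01:00:00

    # Launch the workload across the allocated nodes
    srun ./my_simulation

Submitting it takes a single command (sbatch job.sh), after which SLURM queues the job and dispatches it to free nodes in the cluster.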

There will also be an observability component to help monitor the environment, which for now will remain unnamed. We also think strong integration with LDAP and NFS is a must for such a setup to work well, so we will invest time in providing those, too. Ceph integration is another component in our recipe.

This is a recommendation, not a rule

By and large, this is our “dish”. But, like any dish, it can always benefit from more seasoning. We will figure out those fine details as we go along, and that may include additional software components we haven’t mentioned above. Nevertheless, we want to build our HPC solution to be flexible and modular, so that, if you have specific requirements, it doesn’t become a take-it-or-leave-it product.

It’s still early days in how we want to conceive and build our solution, so stay tuned for updates. The easiest way is to subscribe to the newsletter, or to reach out directly if you have questions.

Take care, and watch out for the next instalment in this series.

Photo by Katie Smith on Unsplash.
