Recently we announced the first SingularityCE 3.9.0 release candidate, and gave an update on community engagement in the project. Over the next weeks, leading up to the stable release, we’ll be exploring more features in depth.
One of the benefits of SingularityCE has always been easy access to GPUs and other devices. Because of our focus on shared systems, and HPC in particular, we’ve aimed to make it easy to use GPUs from inside the container within your batch jobs and interactive workflows. GPU devices are available to users in containers, just as they are on the host. The
--nv option makes required libraries and utilities available to your containerized workload, so that CUDA applications run easily.
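For example, a minimal invocation might look like the following (a sketch; the image name mycuda.sif is hypothetical, and your system must have the NVIDIA driver installed):

```shell
# --nv binds the host's NVIDIA driver libraries and utilities
# into the container, so nvidia-smi and CUDA applications work.
singularity exec --nv mycuda.sif nvidia-smi
```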
Up until SingularityCE 3.9.0, we’ve used our own code to set up GPU devices and libraries in the container. This has the advantage of requiring no external tools, but means SingularityCE itself needs to keep pace with additions to the set of CUDA libraries. GPU configuration also works differently than with OCI runtimes such as Docker.
We’ve recently added experimental support to set up GPUs in the container using NVIDIA’s
nvidia-container-cli. This tool, which is part of the libnvidia-container project, is used in the nvidia-docker runtime, and by other OCI container platforms, to support CUDA compute and GPU graphics. As well as tracking driver and CUDA updates,
nvidia-container-cli brings additional benefits. With large GPU nodes containing multiple GPU cards becoming more common, there’s a frequent need to run multiple containerized jobs on the same host, with each limited to a subset of the GPUs.
SingularityCE’s legacy GPU support doesn’t allow strict limits on GPU access. All devices are presented to every container, and containers must obey the
CUDA_VISIBLE_DEVICES environment variable in order to distribute work to specific GPUs.
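With the legacy approach, a batch script might spread work across cards like this (a sketch; the image and training script are hypothetical). Note that every container still sees all GPUs; the variable only asks the CUDA runtime inside each container to use a different one:

```shell
# Each container sees every GPU; CUDA_VISIBLE_DEVICES merely selects
# which card the CUDA runtime inside the container will use.
CUDA_VISIBLE_DEVICES=0 singularity exec --nv mycuda.sif python train.py &
CUDA_VISIBLE_DEVICES=1 singularity exec --nv mycuda.sif python train.py &
wait
```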
nvidia-container-cli is able to manage which devices are bound into the container at setup time, making strict controls on GPU usage per container possible.
To use the new GPU support, you must have
nvidia-container-cli installed on your system. Then, add the
--nvccli flag wherever you use
--nv. We’ve written the nvccli functionality so that, by default, it behaves similarly to the existing approach: all GPUs are made available in the container, along with the CUDA libraries and utility programs (such as nvidia-smi).
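In practice that means the basic invocation changes only by the extra flag (again a sketch, with a hypothetical image name):

```shell
# Experimental: GPU setup is delegated to nvidia-container-cli.
# By default all GPUs are visible, as with plain --nv.
singularity exec --nv --nvccli mycuda.sif nvidia-smi
```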
More powerful control of GPUs is available when you use the
--contain option in conjunction with
--nvccli. SingularityCE will then look at the
NVIDIA_VISIBLE_DEVICES environment variable, and only make specified GPUs available in the container:
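A sketch of such an invocation (the image name is hypothetical):

```shell
# Only GPU 0 is bound into the container at setup time; the other
# GPUs are simply absent, rather than visible but deselected.
export NVIDIA_VISIBLE_DEVICES=0
singularity exec --contain --nv --nvccli mycuda.sif nvidia-smi
```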