GPU acceleration
This document explains how to use GPU acceleration with Concrete ML.
Concrete ML supports compiling both built-in and custom models with a CUDA-accelerated backend. However, once a model is compiled for CUDA, executing it on a machine without CUDA support will raise an error.
Support

|     | GPU support |
| --- | ----------- |
|     | ✅          |
|     | ✅          |
|     | ✅          |
|     | ❌          |
When compiling a model for GPU, the model is assigned GPU-specific crypto-system parameters. These parameters are more constrained than the CPU-specific ones, so the Concrete compiler may fail to find suitable GPU-compatible crypto-parameters for some models, raising a NoParametersFound error.
Performance
On high-end GPUs such as the V100, A100, or H100, the performance gains range from 1x to 10x compared to a desktop CPU.
Compared to a high-end server CPU (64-core or 96-core), the speed-up is typically around 1x to 3x.
On consumer-grade GPUs such as the RTX 40xx or RTX 30xx series, there may be little speed-up or even a slowdown compared to execution on a desktop CPU.
Prerequisites
To use the CUDA-enabled backend, install the GPU-enabled Concrete compiler:
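As a sketch, assuming the GPU wheels are published on Zama's package index under a `/gpu` path (verify the exact URL against the current installation guide), the install command looks like:

```shell
# Install concrete-python built with CUDA support.
# The index URL is an assumption based on Zama's documented GPU wheel index.
pip install --extra-index-url https://pypi.zama.ai/gpu concrete-python
```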
If you already have concrete-python installed, it will not be re-installed automatically. In that case, manually uninstall the current version and then install the GPU-enabled version:
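A sketch of that sequence, assuming the same GPU index URL as above:

```shell
# Remove the currently installed build, then pull the CUDA-enabled one
pip uninstall concrete-python
pip install --extra-index-url https://pypi.zama.ai/gpu concrete-python
```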
To switch back to the CPU-only version of the compiler, change the index-url to the CPU-only repository or remove the index-url parameter:
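For instance, assuming the CPU wheels live under a `/cpu` path on the same index (a hypothetical path; check the installation guide), switching back could look like:

```shell
# Option 1: point at the CPU-only index
pip install --extra-index-url https://pypi.zama.ai/cpu concrete-python

# Option 2: drop the extra index and install from the default PyPI
# (uninstall the GPU build first, as shown above, so pip actually reinstalls)
pip install concrete-python
```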
Checking GPU availability
To check whether CUDA acceleration is available, use the following helper functions from concrete-python:
Usage
To compile a model for CUDA, supply the device='cuda' argument to its compilation function:
- For built-in models, use the `.compile` function.
- For custom models, use either `compile_torch_model` or `compile_brevitas_qat_model`.