GPU acceleration

This document explains how to use GPU acceleration with Concrete ML.

Concrete ML supports compiling both built-in and custom models using a CUDA-accelerated backend. However, once a model is compiled for CUDA, executing it on a machine without a CUDA-enabled GPU will raise an error.

Support

| Feature | Built-in models | Deep NNs and LLMs | Deployment | DataFrame |
| --- | --- | --- | --- | --- |
| GPU support | ✅ | ✅ | ✅ | ❌ |

When compiling a model for GPU, the model is assigned GPU-specific crypto-system parameters. These parameters are more constrained than the CPU-specific ones. As a result, the Concrete compiler may have difficulty finding suitable GPU-compatible crypto-parameters for some models, leading to a NoParametersFound error.

Performance

On high-end GPUs like V100, A100, or H100, the performance gains range from 1x to 10x compared to a desktop CPU.

Compared to high-end server CPUs (64-core or 96-core), the speed-up is typically around 1x to 3x.

On consumer-grade GPUs such as the RTX 40xx or RTX 30xx series, there may be little speed-up or even a slowdown compared to execution on a desktop CPU.

Prerequisites

Built-in models and deep NNs

This section applies to models compiled with the sklearn-style built-in model classes, or with compile_torch_model or compile_brevitas_qat_model.

To use the CUDA-enabled backend, install the GPU-enabled Concrete compiler:

pip install --extra-index-url https://pypi.zama.ai/gpu concrete-python

If you already have an existing version of concrete-python installed, it will not be re-installed automatically. In that case, manually uninstall the current version and then install the GPU-enabled version:

pip uninstall concrete-python
pip install --extra-index-url https://pypi.zama.ai/gpu concrete-python

To switch back to the CPU-only version of the compiler, point --extra-index-url at the CPU-only repository or omit it entirely:

pip uninstall concrete-python
pip install --extra-index-url https://pypi.zama.ai/cpu concrete-python

Checking whether the GPU can be enabled

To check whether CUDA acceleration is available, use the following helper functions from concrete-python:

import concrete.compiler

# True if the installed concrete-python compiler was built with GPU support
print("GPU enabled: ", concrete.compiler.check_gpu_enabled())
# True if a usable GPU is detected on this machine
print("GPU available: ", concrete.compiler.check_gpu_available())

Usage

To compile a model for CUDA, supply the device='cuda' argument to the model's compilation function, as shown in the sketch after this list:

  • For built-in models, use the .compile function.

  • For custom models, use either compile_torch_model or compile_brevitas_qat_model.
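
The following is a minimal sketch for a built-in model. The dataset and model choice are purely illustrative, and the CPU fallback is one way to handle the NoParametersFound case mentioned above:

import numpy
from concrete.ml.sklearn import LogisticRegression

# Synthetic, illustrative data
X = numpy.random.rand(100, 4)
y = (X.sum(axis=1) > 2).astype(int)

model = LogisticRegression()
model.fit(X, y)

# Compile for the CUDA backend; fall back to CPU-targeted compilation
# if no GPU-compatible crypto-parameters can be found
try:
    model.compile(X, device="cuda")
except Exception:  # e.g. a NoParametersFound error
    model.compile(X, device="cpu")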

LLMs

This section pertains to models that are compiled with HybridFHEModel.

Models compiled as described in the LLM section will use GPU acceleration if a GPU is available on the machine where they are executed. No specific compilation configuration is required to enable GPU execution for these models.
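
As a rough sketch of the hybrid workflow (assuming a torch model and the HybridFHEModel API from the hybrid models guide; the model and module names here are illustrative):

import torch
from concrete.ml.torch.hybrid_model import HybridFHEModel

# A small illustrative torch model
model = torch.nn.Sequential(
    torch.nn.Linear(10, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 2),
)

# Execute the listed submodules under FHE; the names are illustrative
hybrid_model = HybridFHEModel(model, module_names=["0", "2"])

# No device argument is needed: the GPU is used automatically at
# execution time when one is available
hybrid_model.compile_model(torch.randn(4, 10), n_bits=8)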