Concrete ML
GPU acceleration

This document provides complete instructions for using GPU acceleration with Concrete ML.

Concrete ML supports compiling both built-in and custom models using a CUDA-accelerated backend. However, once a model is compiled for CUDA, executing it on a non-CUDA-enabled machine raises an error.

Support

Feature     | Built-in models | Deep NNs and LLMs | Deployment | DataFrame
------------|-----------------|-------------------|------------|----------
GPU support | ✅              | ✅                | ✅         | ❌

When compiling a model for GPU, the model is assigned GPU-specific crypto-system parameters. These parameters are more constrained than the CPU-specific ones. As a result, the Concrete compiler may have difficulty finding suitable GPU-compatible crypto-parameters for some models, leading to a NoParametersFound error.
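When GPU-compatible parameters cannot be found, one pragmatic option is to fall back to CPU compilation. A minimal sketch, assuming a Concrete ML built-in model instance (the helper name `compile_with_fallback` and the broad exception handling are illustrative, not part of the Concrete ML API):

```python
def compile_with_fallback(model, calibration_data):
    """Try to compile for CUDA; fall back to CPU compilation if no
    GPU-compatible crypto-parameters are found (e.g. NoParametersFound)."""
    try:
        # GPU-specific crypto-system parameters are searched here.
        return model.compile(calibration_data, device="cuda")
    except Exception:
        # The GPU parameter search failed; recompile for CPU instead.
        return model.compile(calibration_data, device="cpu")
```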

Performance

On high-end GPUs such as the V100, A100, or H100, performance gains range from 1x to 10x compared to a desktop CPU.

Compared to high-end server CPUs (64-core or 96-core), the speed-up is typically around 1x to 3x.

On consumer-grade GPUs such as the GTX 40xx or GTX 30xx series, there may be little speed-up or even a slowdown compared to execution on a desktop CPU.

Prerequisites

Built-in models and deep NNs

This section pertains to models that are compiled using the sklearn-style built-in model classes or that are compiled using compile_torch_model or compile_brevitas_qat_model.

To use the CUDA-enabled backend, install the GPU-enabled Concrete compiler:

pip install --extra-index-url https://pypi.zama.ai/gpu concrete-python

If you already have an existing version of concrete-python installed, it will not be re-installed automatically. In that case, manually uninstall the current version and then install the GPU-enabled version:

pip uninstall concrete-python
pip install --extra-index-url https://pypi.zama.ai/gpu concrete-python

To switch back to the CPU-only version of the compiler, change the index-url to the CPU-only repository or remove the index-url parameter:

pip uninstall concrete-python
pip install --extra-index-url https://pypi.zama.ai/cpu concrete-python

Checking GPU can be enabled

To check whether CUDA acceleration is available, use the following helper functions from concrete-python:

import concrete.compiler

print("GPU enabled: ", concrete.compiler.check_gpu_enabled())
print("GPU available: ", concrete.compiler.check_gpu_available())

Usage

To compile a model for CUDA, supply the device='cuda' argument to its compilation function:

  • For built-in models, use the .compile method.

  • For custom models, use either compile_torch_model or compile_brevitas_qat_model.
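As a sketch, compiling a built-in logistic regression for CUDA might look as follows. The training data and the n_bits value are illustrative, and the import is deferred so the sketch can be read without a GPU-enabled install:

```python
def compile_builtin_for_gpu(x_train, y_train):
    """Fit a Concrete ML built-in model and compile it for the CUDA backend."""
    # Deferred import: requires the GPU-enabled concrete-python package.
    from concrete.ml.sklearn import LogisticRegression

    model = LogisticRegression(n_bits=8)
    model.fit(x_train, y_train)
    # device="cuda" assigns GPU-specific crypto-system parameters.
    model.compile(x_train, device="cuda")
    return model
```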

LLMs

This section pertains to models that are compiled with HybridFHEModel.

The models compiled as described in the LLM section will use GPU acceleration if a GPU is available on the machine where the models are executed. No specific compilation configuration is required to enable GPU execution for these models.
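A minimal sketch of hybrid compilation, assuming a small PyTorch model. The helper name, calibration inputs, and keyword arguments are illustrative; exact HybridFHEModel arguments may differ between Concrete ML versions:

```python
def build_hybrid_model(torch_model, calibration_inputs, module_names):
    """Wrap a torch model so the listed submodules run under FHE; the
    remaining layers run in the clear and use the GPU when available."""
    # Deferred import: requires concrete-ml with torch support installed.
    from concrete.ml.torch.hybrid_model import HybridFHEModel

    hybrid = HybridFHEModel(torch_model, module_names)
    # No GPU-specific flag is needed here: GPU execution is used
    # automatically at run time if a GPU is available.
    hybrid.compile_model(calibration_inputs, n_bits=8)
    return hybrid
```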