Hybrid models

This document explains how to use the Concrete ML API to deploy hybrid models with Fully Homomorphic Encryption (FHE).

Introduction

FHE allows cloud applications to process private user data securely, minimizing the risk of data leaks. Deploying machine learning (ML) models in the cloud offers several advantages:

  • Simplifies model updates.

  • Scales to large user bases by leveraging substantial compute power.

  • Protects the model's Intellectual Property (IP) by keeping the model on a trusted server rather than on client devices.

However, not all applications can be easily converted to FHE computation. The high computational cost of FHE can push a fully converted model beyond the application's latency requirements.

Hybrid models provide a balance between on-device deployment and cloud-based deployment. This approach involves:

  • Executing parts of the model on the client side.

  • Securely processing other parts with FHE on the server side.

Concrete ML supports hybrid deployment for various neural network models, including Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), and Large Language Models (LLMs).

To protect model IP, carefully choose the model parts to execute in the cloud. Some black-box model stealing attacks use knowledge distillation or differential methods. Generally, the difficulty of stealing a machine learning model increases with the model's size, number of parameters, and depth.

The hybrid model deployment API simplifies integrating the standard deployment procedure into neural network style models that are compiled with compile_brevitas_qat_model or compile_torch_model.

Compilation

To use hybrid model deployment, the first step is to define which part of the PyTorch neural network model must be executed in FHE. Ensure the model part is an nn.Module and is identified by its key in the original model's .named_modules().

Here is an example:

import numpy as np
import os
import torch

from pathlib import Path
from torch import nn

from concrete.ml.torch.hybrid_model import HybridFHEModel, tuple_to_underscore_str
from concrete.ml.deployment import FHEModelServer


class FCSmall(nn.Module):
    """Torch model for the tests."""

    def __init__(self, dim):
        super().__init__()
        self.seq = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.seq(x)

dim = 10
model = FCSmall(dim)
model_name = "FCSmall"
submodule_name = "seq.0"

inputs = torch.Tensor(np.random.uniform(size=(10, dim)))
# Prints ['', 'seq', 'seq.0', 'seq.1', 'seq.2']
print([k for (k, _) in model.named_modules()])

# Create a hybrid model
hybrid_model = HybridFHEModel(model, [submodule_name])
hybrid_model.compile_model(
    inputs,
    n_bits=8,
)
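
# Optional check (not part of the original example): run the hybrid model locally
# in its default, non-FHE mode to confirm that the replaced sub-module still
# produces outputs of the expected shape after compilation.
print(hybrid_model(inputs).shape)  # torch.Size([10, 10])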

models_dir = Path(os.path.abspath('')) / "compiled_models"
models_dir.mkdir(exist_ok=True)
model_dir = models_dir / model_name
hybrid_model.save_and_clear_private_info(model_dir, via_mlir=True)
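
The saved artifacts are organized by sub-module name and input shape; the server-side code in the next section rebuilds exactly this path. As an optional check, you can list what save_and_clear_private_info wrote to disk:

# List the artifacts written under model_dir (paths shown relative to model_dir)
for artifact in sorted(model_dir.rglob("*")):
    print(artifact.relative_to(model_dir))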

Server Side Deployment

The save_and_clear_private_info function works as follows:

  • Serializes the FHE circuits for the model parts chosen to be server-side.

  • Saves the client-side model, removing the weights of the layers transferred to the server.

  • Saves all necessary information required to serve these sub-models with FHE using the FHEModelDev class.

To create a server application that serves these sub-models, use the FHEModelServer class:

# Reconstruct the on-disk path of the compiled sub-module and load it into the server
input_shape_subdir = tuple_to_underscore_str((1,) + inputs.shape[1:])
MODULES = {model_name: {submodule_name: {"path": model_dir / submodule_name / input_shape_subdir}}}
server = FHEModelServer(str(MODULES[model_name][submodule_name]["path"]))

For more information about serving FHE models, see the client/server section.
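
For illustration, here is a minimal sketch of how such a server application could expose the FHEModelServer instance over HTTP. FastAPI, the /compute route, and the payload field names are assumptions made for this sketch, not an interface defined by Concrete ML, so adapt them to the protocol your client actually uses.

# Hypothetical sketch: serve the compiled FHE sub-model over HTTP with FastAPI.
# The route and field names are illustrative assumptions, not a Concrete ML API.
from fastapi import FastAPI, File, Response

app = FastAPI()

@app.post("/compute")
def compute(
    encrypted_input: bytes = File(...),
    evaluation_keys: bytes = File(...),
):
    # Run the FHE circuit on the serialized encrypted payload and return
    # the serialized encrypted result to the caller.
    encrypted_result = server.run(encrypted_input, evaluation_keys)
    return Response(content=encrypted_result, media_type="application/octet-stream")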

Client Side

You can develop a client application that deploys a model with hybrid deployment in a very similar manner to on-premise deployment: Use PyTorch to load the model normally, but specify the remote endpoint and the part of the model to be executed remotely.

# Modify model to use remote FHE server instead of local weights
hybrid_model = HybridFHEModel(
    model,  # PyTorch or Brevitas model
    submodule_name,
    server_remote_address="http://0.0.0.0:8000",
    model_name=f"{model_name}",
    verbose=False,
)

Next, obtain the parameters necessary to encrypt and quantize data, as detailed in the client/server documentation.

path_to_clients = Path(__file__).parent / "clients"
hybrid_model.init_client(path_to_clients=path_to_clients)

When the client application is ready to make inference requests to the server, set the operation mode of the HybridFHEModel instance to HybridFHEMode.REMOTE:

from concrete.ml.torch.hybrid_model import HybridFHEMode

# Switch every remote module from local execution to remote FHE execution
for module in hybrid_model.remote_modules.values():
    module.fhe_local_mode = HybridFHEMode.REMOTE

For inference with the HybridFHEModel instance, hybrid_model, call the regular forward method as if the model were fully deployed locally:

hybrid_model(torch.randn((dim,)))

When called, the HybridFHEModel instance handles all the necessary intermediate steps for each model part deployed remotely, including:

  • Quantizing the data.

  • Encrypting the data.

  • Making the request to the server using the requests Python module.

  • Decrypting and de-quantizing the result.
