
Linear models

Supported models for encrypted inference

| Concrete ML | scikit-learn |
| --- | --- |
| LinearRegression | LinearRegression |
| LogisticRegression | LogisticRegression |
| LinearSVC | LinearSVC |
| LinearSVR | LinearSVR |
| PoissonRegressor | PoissonRegressor |
| TweedieRegressor | TweedieRegressor |
| GammaRegressor | GammaRegressor |
| Lasso | Lasso |
| Ridge | Ridge |
| ElasticNet | ElasticNet |
| SGDRegressor | SGDRegressor |

Supported models for encrypted training

In addition to predicting on encrypted data, the following models support training on encrypted data:

| Concrete ML | scikit-learn |
| --- | --- |
| SGDClassifier | SGDClassifier |

Quantization parameters

The n_bits parameter controls the bit-width of the inputs and weights of the linear models. Linear models do not use table lookups and thus allow weights and inputs to be high-precision integers.

For models with input dimensions up to 300, the parameter n_bits can be set to 8 or more. When the input dimensions are larger, n_bits must be reduced to 6-7. In many cases, quantized models can preserve all performance metrics compared to the non-quantized float models from scikit-learn when n_bits is down to 6. You should validate accuracy on held-out test sets and adjust n_bits accordingly.
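As an illustration, here is a minimal sketch (synthetic data assumed) that validates the quantized model against the float scikit-learn model for several n_bits values:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression as SklearnLogisticRegression
from sklearn.model_selection import train_test_split

from concrete.ml.sklearn import LogisticRegression

X, y = make_classification(n_samples=500, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Accuracy of the float scikit-learn model, used as a reference
float_model = SklearnLogisticRegression()
float_model.fit(X_train, y_train)
float_acc = float_model.score(X_test, y_test)

# Accuracy of the quantized Concrete ML model for several bit-widths
for n_bits in [6, 7, 8]:
    model = LogisticRegression(n_bits=n_bits)
    model.fit(X_train, y_train)
    acc = (model.predict(X_test) == y_test).mean()
    print(f"n_bits={n_bits}: quantized accuracy={acc:.3f}, float accuracy={float_acc:.3f}")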

Pre-trained models

Example

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from concrete.ml.sklearn import LogisticRegression

# Create the data for classification:
X, y = make_classification(
    n_features=30,
    n_redundant=0,
    n_informative=2,
    random_state=2,
    n_clusters_per_class=1,
    n_samples=250,
)

# Retrieve train and test sets:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

# Instantiate the model:
model = LogisticRegression(n_bits=8)

# Fit the model:
model.fit(X_train, y_train)

# Evaluate the model on the test set in clear:
y_pred_clear = model.predict(X_test)

# Compile the model:
model.compile(X_train)

# Perform the inference in FHE:
y_pred_fhe = model.predict(X_test, fhe="execute")

# Assert that FHE predictions are the same as the clear predictions:
print(
    f"{(y_pred_fhe == y_pred_clear).sum()} examples over {len(y_pred_fhe)} "
    "have an FHE inference equal to the clear inference."
)

# Output:
#  100 examples over 100 have an FHE inference equal to the clear inference

Model accuracy

Loading a pre-trained model

An alternative to the example above is to train a scikit-learn model in a separate step and then to convert it to Concrete ML.

from sklearn.linear_model import LogisticRegression as SKlearnLogisticRegression

# Instantiate the model:
model = SKlearnLogisticRegression()

# Fit the model:
model.fit(X_train, y_train)

cml_model = LogisticRegression.from_sklearn_model(model, X_train, n_bits=8)

# Compile the model:
cml_model.compile(X_train)

# Perform the inference in FHE:
y_pred_fhe = cml_model.predict(X_test, fhe="execute")

Zama 5-Question Developer Survey

What is Concrete ML?

Concrete ML is an open source, privacy-preserving, machine learning framework based on Fully Homomorphic Encryption (FHE). It enables data scientists without any prior knowledge of cryptography to perform encrypted inference, encrypted training, and encrypted data pre-processing.

Key features

Example usage

This example shows the typical flow of a Concrete ML model:

  1. Training the model: Train the model on unencrypted (plaintext) data using scikit-learn. Since Fully Homomorphic Encryption (FHE) operates over integers, Concrete ML quantizes the model to use only integers during inference.

  2. Compiling the model: Compile the quantized model to an FHE equivalent. Under the hood, the model is first converted to a Concrete Python program and then compiled.

It is also possible to call the encryption, model prediction, and decryption functions separately, as shown in the step-by-step example later in this document. Executing these steps separately is equivalent to calling predict_proba on the model instance.

Current limitations

  • Precision and accuracy: In order to run models in FHE, Concrete ML requires models to be within the precision limit, currently 16-bit integers. Thus, machine learning models must be quantized, which sometimes leads to a loss of accuracy compared to the original model that operates on plaintext.

  • Models availability: Concrete ML currently only supports training on encrypted data for some models, while it supports inference for a large variety of models.

  • Processing: Concrete ML currently doesn't support pre-processing model inputs and post-processing model outputs. These processing stages may involve:

    • Text-to-numerical feature transformation

    • Dimensionality reduction

    • KNN or clustering

    • Featurization

    • Normalization

    • The mixing of ensemble models' results.

These issues are currently being addressed, and significant improvements are expected to be released in the near future.

Concrete stack

Online demos and tutorials

If you have built awesome projects using Concrete ML, feel free to let us know and we'll link to your work!

Additional resources

Support

Key concepts

This document explains the essential cryptographic terms and the important concepts of the Concrete ML model lifecycle with Fully Homomorphic Encryption (FHE).

Concrete ML is built on top of Concrete, which enables the conversion from NumPy programs into FHE circuits.

Lifecycle of a Concrete ML model

With Concrete ML, you can train a model on clear or encrypted data, then deploy it to predict on encrypted inputs. During deployment, data can be pre-processed while being encrypted. Therefore, data stay encrypted during the entire lifecycle of the machine learning model, with some limitations.

I. Model development

  1. Training: A model is trained either using plaintext (non-encrypted) training data, or encrypted training data.

  2. Quantization: The model is converted into an integer equivalent, since FHE operates over integers. Concrete ML performs this step in one of two ways:

    • During training (Quantization Aware Training)

    • After training (Post-Training Quantization)

  3. Inference: The compiled model can then be executed on encrypted data, once the proper keys have been generated. The model can also be deployed to a server and used to run private inference on encrypted inputs.

II. Model deployment

  1. Client/server model deployment: In a client/server setting, Concrete ML models can be exported to:

    • Allow the client to generate keys, encrypt, and decrypt.

    • Provide a compiled model that can run on the server to perform inference on encrypted data.

  2. Key generation: The data owner (client) needs to generate a set of keys:

    • A private encryption key to encrypt/decrypt their data and results

    • A public evaluation key for the model's FHE evaluation on the server.

Cryptography concepts

Concrete ML and Concrete abstract the details of the underlying cryptography scheme, TFHE. However, understanding some cryptography concepts is still useful:

  • Encryption and decryption: Encryption converts human-readable information (plaintext) into data (ciphertext) that is unreadable by a human or computer without the proper key. Encryption takes plaintext and an encryption key and produces ciphertext, while decryption is the reverse operation.

  • Encrypted inference: FHE allows third parties to execute a machine learning model on encrypted data. The inference result is also encrypted and can only be decrypted by the key holder.

  • Key generation: Cryptographic keys are generated using random number generators. Key generation can be time-consuming and produce large keys, but each model used by a client only requires key generation once (a minimal sketch follows this list).

    • Private encryption key: A private encryption key is a series of bits used within an encryption algorithm for encrypting data so that the corresponding ciphertext appears random.

    • Public evaluation key: A public evaluation key is used to perform homomorphic operations on encrypted data, typically by a server.

  • Guaranteed correctness of encrypted computations: To ensure security, TFHE adds random noise to ciphertexts. Depending on the noise parameters, this noise can cause errors during encrypted data processing. By default, Concrete ML uses parameters that guarantee the correctness of encrypted computations, so that results computed on encrypted data equal those of simulations on clear data.
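As an illustration, the minimal sketch below (assuming a fitted built-in model and representative inputs X_train) triggers key generation explicitly through the model's underlying FHE circuit; keys are otherwise generated automatically the first time they are needed:

# Compile the model to obtain its FHE circuit
model.compile(X_train)

# Generate the private encryption key and the public evaluation keys.
# This can take some time and the resulting keys can be large, but it
# only needs to be done once per model and client.
model.fhe_circuit.keygen()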

Model accuracy considerations under FHE constraints

FHE requires all inputs, constants, and intermediate values to be integers of maximum 16 bits. To make machine learning models compatible with FHE, Concrete ML implements some techniques with accuracy considerations:

This page explains Concrete ML linear models for both classification and regression. These models are based on scikit-learn's linear models.

The models listed above are supported for training on clear data and predicting on encrypted data. Their API is similar to that of scikit-learn. These models are also compatible with some of scikit-learn's main workflows, such as Pipeline() and GridSearchCV().

For optimal results, you can use standard or min-max normalization to achieve a similar distribution of individual features. When there are many one-hot features, consider Principal Component Analysis (PCA) as a pre-processing stage.

For a more detailed comparison of the impact of such pre-processing, please refer to the logistic regression notebook.

You can convert an already trained scikit-learn linear model to a Concrete ML one by using the from_sklearn_model method, as shown in the example above.

The example above shows how to train a LogisticRegression model on a simple data-set and then use FHE to perform inference on encrypted data. You can find a more complete example in the LogisticRegression notebook.

A comparison of the decision boundaries of the FHE classifier and the scikit-learn model executed in the clear, along with the complete code, is available in the LogisticRegression notebook.

The overall accuracy scores are identical (93%) between the scikit-learn model (executed in the clear) and the Concrete ML one (executed in FHE). In fact, quantization has little impact on the decision boundaries, as linear models can use large precision numbers when quantizing inputs and weights in Concrete ML. Additionally, as linear models do not use programmable bootstrapping (PBS), the FHE computations are always exact, irrespective of the PBS error tolerance. This ensures that the FHE predictions are always identical to the quantized clear ones.

We want to hear from you! Take 1 minute to share your thoughts and help us enhance our documentation and libraries. 👉 Click here to participate.

Automatic model conversion: Use familiar APIs from scikit-learn and PyTorch to convert machine learning models to their FHE equivalent. This is applicable to linear models, tree-based models, and neural networks.

Encrypted data training: Train models directly on encrypted data to maintain privacy.

Encrypted data pre-processing: Pre-process encrypted data using a DataFrame paradigm.

Training on encrypted data: FHE is an encryption technique that allows computing directly on encrypted data, without needing to decrypt it. With FHE, you can build private-by-design applications without compromising on features. Learn more about FHE in this introduction or join the FHE.org community.

Federated learning: Training on encrypted data provides the highest level of privacy but is slower than training on clear data. Federated learning is an alternative approach, where data privacy can be ensured by using a trusted gradient aggregator, coupled with optional differential privacy, instead of encryption. Concrete ML can import all types of models trained using federated learning - linear, tree-based and neural networks - through the compile_torch_model function and the from_sklearn_model function.

A simple example of classification on encrypted data using logistic regression is shown below. You can find more examples in the tutorials.

Performing inference: Perform inference on encrypted data. The example below shows encrypted inference in the model-development phase. Alternatively, during deployment in a client/server setting, the client encrypts the data, the server processes it securely, and then the client decrypts the results.

Concrete ML is built on top of Zama's Concrete framework.

Various tutorials are available for built-in models and deep learning. Several stand-alone demos for use cases can be found in the Demos and Tutorials section.

For support, reach out via the Community channels (we answer in less than 24 hours).

Quantization: Quantization converts inputs, model weights, and all intermediate values of the inference computation to integer equivalents. More information is available in the quantization documentation. Concrete ML performs this step either during training (Quantization Aware Training) or after training (Post-Training Quantization), depending on the model type.

Simulation: Simulation allows you to execute a model that was quantized, to measure its accuracy in FHE, and to determine the modifications required to make it FHE compatible.

Compilation: After quantizing the model and confirming that it has good FHE accuracy through simulation, the model then needs to be compiled using Concrete's FHE Compiler to produce an equivalent FHE circuit. This circuit is represented as an MLIR program consisting of low-level cryptographic operations.

You can find examples of the model development workflow in the tutorials.
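The following minimal sketch (synthetic data assumed) illustrates these steps on a built-in model: compilation, FHE simulation, and actual FHE execution:

from sklearn.datasets import make_classification
from concrete.ml.sklearn import LogisticRegression

X, y = make_classification(n_samples=100, n_features=10, random_state=0)

model = LogisticRegression(n_bits=8)
model.fit(X, y)

# Compilation: convert the quantized model into an FHE circuit
model.compile(X)

# Simulation: measure FHE accuracy quickly, without encryption
y_simulated = model.predict(X[:5], fhe="simulate")

# Execution: run the actual FHE circuit on encrypted data
y_encrypted = model.predict(X[:5], fhe="execute")

print("Simulation matches FHE execution:", (y_simulated == y_encrypted).all())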

Pre-processing: Data owners (clients) can generate keys to encrypt/decrypt data and store it in an encrypted DataFrame for further processing on a server. The server can pre-process such data with pre-compiled circuits, to prepare it for encrypted training or inference.

You can find an example of the model deployment workflow in the Production Deployment section.

Programmable Bootstrapping (PBS): Programmable Bootstrapping enables the homomorphic evaluation of any function of a ciphertext, with a controlled level of noise. Learn more about PBS in this paper.

For a deeper understanding of the cryptography behind the Concrete stack, refer to the whitepaper on TFHE and Programmable Bootstrapping or this series of blog posts.

Quantization: Concrete ML quantizes inputs, outputs, weights, and activations to meet FHE limitations. See the quantization documentation for details.

Accuracy trade-off: Quantization may reduce accuracy, but careful selection of quantization parameters or of the training approach can mitigate this. Concrete ML offers built-in quantized models; users only configure parameters like bit-width. For more details on quantization configurations, see the quantization documentation.

Additional methods: Dimensionality reduction and pruning are additional ways to make programs compatible with FHE. See the Poisson regression example for dimensionality reduction and the built-in neural networks documentation for pruning.

The following example shows the simple classification flow on encrypted data using logistic regression mentioned above, followed by the equivalent step-by-step quantize/encrypt/run/decrypt calls:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression

# Lets create a synthetic data-set
x, y = make_classification(n_samples=100, class_sep=2, n_features=30, random_state=42)

# Split the data-set into a train and test set
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Now we train in the clear and quantize the weights
model = LogisticRegression(n_bits=8)
model.fit(X_train, y_train)

# We can simulate the predictions in the clear
y_pred_clear = model.predict(X_test)

# We then compile on a representative set
model.compile(X_train)

# Finally we run the inference on encrypted inputs
y_pred_fhe = model.predict(X_test, fhe="execute")

print(f"In clear  : {y_pred_clear}")
print(f"In FHE    : {y_pred_fhe}")
print(f"Similarity: {(y_pred_fhe == y_pred_clear).mean():.1%}")

# Output:
#   In clear  : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
#   In FHE    : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
#   Similarity: 100.0%
# Predict probability for a single example
y_proba_fhe = model.predict_proba(X_test[[0]], fhe="execute")

# Quantize an original float input
q_input = model.quantize_input(X_test[[0]])

# Encrypt the input
q_input_enc = model.fhe_circuit.encrypt(q_input)

# Execute the linear product in FHE 
q_y_enc = model.fhe_circuit.run(q_input_enc)

# Decrypt the result (integer)
q_y = model.fhe_circuit.decrypt(q_y_enc)

# De-quantize and post-process the result
y0 = model.post_processing(model.dequantize_output(q_y))

print("Probability with `predict_proba`: ", y_proba_fhe)
print("Probability with encrypt/run/decrypt calls: ", y0)

Nearest neighbors

This document introduces the nearest neighbors non-parametric classification models that Concrete ML provides with a scikit-learn interface through the KNeighborsClassifier class.

| Concrete ML | scikit-learn |
| --- | --- |
| KNeighborsClassifier | KNeighborsClassifier |

Example

from concrete.ml.sklearn import KNeighborsClassifier

concrete_classifier = KNeighborsClassifier(n_bits=2, n_neighbors=3)

Quantization parameters

The predict method of the KNeighborsClassifier performs the following steps (a minimal usage sketch follows the list):

  1. Quantize the test vectors on clear data

  2. Compute the top-k class indices of the closest training-set vectors, on encrypted data

  3. Vote for the top-k class labels to find the class for each test vector, performed on clear data
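A minimal usage sketch (synthetic data assumed) showing the full fit/compile/predict workflow:

import numpy
from concrete.ml.sklearn import KNeighborsClassifier

X = numpy.random.rand(50, 4)
y = numpy.random.randint(0, 2, size=50)

model = KNeighborsClassifier(n_bits=2, n_neighbors=3)
model.fit(X, y)      # Step 1: the training set is quantized on clear data
model.compile(X)     # Build the FHE circuit for the distance and top-k computation
y_pred = model.predict(X[:1], fhe="execute")  # Steps 2-3: top-k in FHE, voting on clear data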

Inference time considerations

The FHE inference latency of this model is heavily influenced by n_bits and the dimensionality of the data. Additionally, the size of the training data-set has a linear impact on the complexity of the computation. The number of nearest neighbors (n_neighbors) also affects performance.

Encrypted training

Training on encrypted data is done through an FHE program that is generated by Concrete ML, based on the characteristics of the data that are given to the fit function. Once the FHE program associated with the SGDClassifier object has been fit on encrypted data, it is specific to that data's distribution and dimensionality.

When deploying encrypted training services, you need to consider the type of data that future users of your services will train on:

  • The distribution of the data should match that of the data used to generate the FHE training program, to achieve good accuracy

  • The dimensionality of the data needs to match since the deployed FHE programs are compiled for a fixed number of dimensions.

Example

The following snippet shows how to instantiate a logistic regression model that trains on encrypted data:

from concrete.ml.sklearn import SGDClassifier
parameters_range = (-1.0, 1.0)

model = SGDClassifier(
    random_state=42,
    max_iter=50,
    fit_encrypted=True,
    parameters_range=parameters_range,
)

To activate encrypted training, simply set fit_encrypted=True in the constructor. When this parameter is set, Concrete ML generates an FHE program which, when called through the fit function, processes encrypted training data, labels, and initial weights, and outputs trained model weights. If this parameter is not set, training is performed on clear data using scikit-learn's gradient descent.

Next, to perform the training on encrypted data, call the fit function with the fhe="execute" argument:

model.fit(X_binary, y_binary, fhe="execute")

Training configuration

The max_iter parameter controls the number of batches that are processed by the training algorithm.

Capabilities and Limitations

The trainable logistic model uses Stochastic Gradient Descent (SGD) and quantizes the data, weights, gradients, and the error measure. It currently supports training 6-bit models, including both the coefficients and the bias.

The SGDClassifier does not currently support training models with other bit-width values. The execution time to train a model is proportional to the number of features and the number of training examples in the batch. The SGDClassifier training does not currently support client/server deployment for training.

Deployment

Tree-based models

Supported models

| Concrete ML | scikit-learn |
| --- | --- |
| DecisionTreeClassifier | DecisionTreeClassifier |
| DecisionTreeRegressor | DecisionTreeRegressor |
| RandomForestClassifier | RandomForestClassifier |
| RandomForestRegressor | RandomForestRegressor |

| Concrete ML | XGBoost |
| --- | --- |
| XGBClassifier | XGBClassifier |
| XGBRegressor | XGBRegressor |

Increasing the maximum depth of decision trees and tree-ensemble models strongly increases the number of nodes in the trees. Therefore, we recommend using the XGBoost models, which achieve better performance with lower depth.

Pre-trained models

Example

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from concrete.ml.sklearn.xgb import XGBClassifier


# Get data-set and split into train and test
X, y = load_breast_cancer(return_X_y=True)

# Split the train and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Define our model
model = XGBClassifier(n_jobs=1, n_bits=3)

# Define the pipeline
# We normalize the data and apply a PCA before fitting the model
pipeline = Pipeline(
    [("standard_scaler", StandardScaler()), ("pca", PCA(random_state=0)), ("model", model)]
)

# Define the parameters to tune
param_grid = {
    "pca__n_components": [2, 5, 10, 15],
    "model__max_depth": [2, 3, 5],
    "model__n_estimators": [5, 10, 20],
}

# Instantiate the grid search with 5-fold cross validation on all available cores
grid = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1, scoring="accuracy")

# Launch the grid search
grid.fit(X_train, y_train)

# Print the best parameters found
print(f"Best parameters found: {grid.best_params_}")

# Output:
#  Best parameters found: {'model__max_depth': 5, 'model__n_estimators': 10, 'pca__n_components': 5}

# Currently we only focus on model inference in FHE
# The data transformation is done in clear (client machine)
# while the model inference is done in FHE on a server.
# The pipeline can be split into 2 parts:
#   1. data transformation
#   2. estimator
best_pipeline = grid.best_estimator_
data_transformation_pipeline = best_pipeline[:-1]
model = best_pipeline[-1]

# Transform test set
X_train_transformed = data_transformation_pipeline.transform(X_train)
X_test_transformed = data_transformation_pipeline.transform(X_test)

# Evaluate the model on the test set in clear
y_pred_clear = model.predict(X_test_transformed)
print(f"Test accuracy in clear: {(y_pred_clear == y_test).mean():0.2f}")

# In the output, the Test accuracy in clear should be > 0.9

# Compile the model to FHE
model.compile(X_train_transformed)

# Perform the inference in FHE
# Warning: this will take a while. It is recommended to run this on a very small batch of
# examples first (e.g., N_TEST_FHE = 1)
# Note that here the encryption and decryption are done behind the scenes.
N_TEST_FHE = 1
y_pred_fhe = model.predict(X_test_transformed[:N_TEST_FHE], fhe="execute")

# Assert that FHE predictions are the same as the clear predictions
print(f"{(y_pred_fhe == y_pred_clear[:N_TEST_FHE]).sum()} "
      f"examples over {N_TEST_FHE} have an FHE inference equal to the clear inference.")

# Output:
#  1 examples over 1 have an FHE inference equal to the clear inference

Quantization parameters

When using a sufficiently high bit-width, quantization has little impact on the decision boundaries of the Concrete ML FHE decision tree models, as quantization is done individually on each input feature. This means FHE models can achieve accuracy levels similar to those of floating point models. Using 6 bits for quantization is effective in reaching, or even exceeding, floating point accuracy.

To adjust the number of bits for quantization, use the n_bits parameter. Setting n_bits to a low value may introduce artifacts, potentially reducing accuracy. However, the execution speed in FHE could improve. This adjustment allows you to manage the accuracy/speed trade-off. Additionally, you can recover some accuracy by increasing the n_estimators parameter.
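As an illustration, here is a minimal sketch (the breast cancer data-set is used purely as an example) that explores the accuracy impact of n_bits on clear, quantized predictions:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from concrete.ml.sklearn.xgb import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for n_bits in [3, 4, 5, 6]:
    model = XGBClassifier(n_bits=n_bits, n_estimators=20, max_depth=3)
    model.fit(X_train, y_train)
    accuracy = (model.predict(X_test) == y_test).mean()
    print(f"n_bits={n_bits}: clear quantized accuracy = {accuracy:.3f}")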

The following graph shows that using 5-6 bits of quantization is usually sufficient to reach the performance of a non-quantized XGBoost model on floating point data. The metrics plotted are accuracy and F1-score on the spambase data-set.

FHE Inference time considerations

The inference time in FHE is strongly dependent on the maximum circuit bit-width. For trees, in most cases, the quantization bit-width will be the same as the circuit bit-width. Therefore, reducing the quantization bit-width to 4 or less will result in fast inference times. Adding more bits will increase FHE inference time exponentially.

In some rare cases, the bit-width of the circuit can be higher than the quantization bit-width. This could happen when the quantization bit-width is low but the tree-depth is high. In such cases, the circuit bit-width is upper bounded by ceil(log2(max_depth + 1) + 1).
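A quick numeric illustration of this bound:

import math

# Circuit bit-width upper bound for a low quantization bit-width and a deep tree
for max_depth in [3, 5, 10]:
    bound = math.ceil(math.log2(max_depth + 1) + 1)
    print(f"max_depth={max_depth}: circuit bit-width <= {bound}")

# Output:
#   max_depth=3: circuit bit-width <= 3
#   max_depth=5: circuit bit-width <= 4
#   max_depth=10: circuit bit-width <= 5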

Using ONNX

ONNX models can be compiled by directly importing models that are already quantized with Quantization Aware Training (QAT) or by performing Post-Training Quantization (PTQ) with Concrete ML.

Simple example

The following example shows how to compile an ONNX model using PTQ. The model was initially trained using Keras before being exported to ONNX. The training code is not shown here.

While Keras was used in this example, it is not officially supported. Additional work is needed to test all of Keras's types of layers and models.

Quantization Aware Training

Supported operators

The following operators are supported for evaluation and conversion to an equivalent FHE circuit. Other operators were not implemented, either due to FHE constraints or because they are rarely used in PyTorch activations or scikit-learn models.

  • Abs

  • Acos

  • Acosh

  • Add

  • Asin

  • Asinh

  • Atan

  • Atanh

  • AveragePool

  • BatchNormalization

  • Cast

  • Celu

  • Clip

  • Concat

  • Constant

  • ConstantOfShape

  • Conv

  • Cos

  • Cosh

  • Div

  • Elu

  • Equal

  • Erf

  • Exp

  • Expand

  • Flatten

  • Floor

  • Gather

  • Gemm

  • Greater

  • GreaterOrEqual

  • HardSigmoid

  • HardSwish

  • Identity

  • LeakyRelu

  • Less

  • LessOrEqual

  • Log

  • MatMul

  • Max

  • MaxPool

  • Min

  • Mul

  • Neg

  • Not

  • Or

  • PRelu

  • Pad

  • Pow

  • ReduceSum

  • Relu

  • Reshape

  • Round

  • Selu

  • Shape

  • Sigmoid

  • Sign

  • Sin

  • Sinh

  • Slice

  • Softplus

  • Squeeze

  • Sub

  • Tan

  • Tanh

  • ThresholdedRelu

  • Transpose

  • Unfold

  • Unsqueeze

  • Where

  • onnx.brevitas.Quant

Programmable Bootstrapping

The KNeighborsClassifier class quantizes the training data-set provided to .fit using the specified number of bits (n_bits). To comply with accumulator size constraints, you must keep this value low. The model's accuracy will depend significantly on a well-chosen n_bits value and the dimensionality of the data.

The KNN computation executes in FHE in $O(N \log^2 k)$ steps, where $N$ is the training data-set size and $k$ is n_neighbors. Each step requires several PBS operations, with their runtime affected by the factors listed above. These factors determine the precision needed to represent the distances between test vectors and training data-set vectors. The PBS input precision required by the circuit is related to the precision of the distance values.

This document explains how to train SGD Logistic Regression models on encrypted data.

See the sections below for more details.

Training on encrypted data provides the highest level of privacy but is slower than training on clear data. Federated learning is an alternative approach, where data privacy can be ensured by using a trusted gradient aggregator, coupled with optional differential privacy, instead of encryption. Concrete ML can import models trained through federated learning using 3rd party tools. All model types are supported - linear, tree-based, and neural networks - through the compile_torch_model function and the from_sklearn_model function.

The logistic regression training example shows training on encrypted data in action.

The parameters_range parameter determines the initialization of the coefficients and the bias of the logistic regression. It is recommended to give values that are close to the min/max of the training data. It is also possible to normalize the training data so that it lies in the range $[-1, 1]$.
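A minimal sketch (synthetic data assumed) of normalizing the training data to [-1, 1] and setting parameters_range accordingly:

import numpy
from sklearn.preprocessing import MinMaxScaler

from concrete.ml.sklearn import SGDClassifier

X = numpy.random.rand(100, 4)
y = numpy.random.randint(0, 2, size=100)

# Scale features to [-1, 1] so that parameters_range matches the data range
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)

model = SGDClassifier(
    random_state=42,
    max_iter=10,
    fit_encrypted=True,
    parameters_range=(-1.0, 1.0),
)
model.fit(X_scaled, y, fhe="execute")  # encrypted training; this can take a long time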

Once you have tested an SGDClassifier that trains on encrypted data, you can build an FHE training service by deploying the FHE training program of the SGDClassifier. See the Production Deployment page for more details on how to use the Concrete ML deployment utility classes. To deploy an FHE training program, you must pass mode='training' to the save method of the FHEModelDev class.
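A minimal sketch of this deployment step (the output path is illustrative, and the mode="training" argument of save is assumed as described above):

from concrete.ml.deployment import FHEModelDev

# `model` is assumed to be the SGDClassifier defined above, fitted with fit_encrypted=True
dev = FHEModelDev(path_dir="/tmp/fhe_training_files", model=model)

# Package the FHE training program (client.zip and server.zip)
dev.save(mode="training")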

This document introduces the decision tree models for classification and regression that Concrete ML provides, based on scikit-learn.

Concrete ML also supports XGBoost's XGBClassifier and XGBRegressor (see the supported models tables above).

For a formal explanation of the mechanisms that enable FHE-compatible decision trees, please see the following paper: Privacy-Preserving Tree-Based Inference with Fully Homomorphic Encryption, arXiv:2303.01254.

You can convert an already trained scikit-learn tree-based model to a Concrete ML one by using the from_sklearn_model method.

Here's an example of how to use this model in FHE on a popular data-set using some of scikit-learn's pre-processing tools. You can find a more complete example in the XGBClassifier notebook.

We can plot and compare the decision boundaries of the Concrete ML model and the classical XGBoost model executed in the clear. Here we show a 6-bit model to illustrate the impact of quantization on classification. You will find similar plots in the Classifier Comparison notebook.

For more information on the inference time of FHE decision trees and tree-ensemble models, please see the FHE Inference time considerations section above.

In addition to Concrete ML models and custom models in torch, it is also possible to directly compile ONNX models. This can be particularly appealing, notably to import models trained with Keras.

This example uses Post-Training Quantization, i.e., the quantization is not performed during training. This model would not have good performance in FHE. Quantization Aware Training should be added by the model developer. Additionally, importing QAT ONNX models can be done as shown below.

Models trained using Quantization Aware Training contain quantizers in the ONNX graph. These quantizers ensure that the inputs to the Linear/Dense and Conv layers are quantized. Since these QAT models have quantizers that are configured during training to a specific number of bits, the ONNX graph will need to be imported using the same settings:

import numpy
import onnx
import tensorflow
import tf2onnx

from concrete.ml.torch.compile import compile_onnx_model
from concrete.fhe.compilation import Configuration


class FC(tensorflow.keras.Model):
    """A fully-connected model."""

    def __init__(self):
        super().__init__()
        hidden_layer_size = 10
        output_size = 5

        self.dense1 = tensorflow.keras.layers.Dense(
            hidden_layer_size,
            activation=tensorflow.nn.relu,
        )
        self.dense2 = tensorflow.keras.layers.Dense(output_size, activation=tensorflow.nn.relu6)
        self.flatten = tensorflow.keras.layers.Flatten()

    def call(self, inputs):
        """Forward function."""
        x = self.flatten(inputs)
        x = self.dense1(x)
        x = self.dense2(x)
        return self.flatten(x)


n_bits = 6
input_output_feature = 2
input_shape = (input_output_feature,)
num_inputs = 1
n_examples = 5000

# Define the Keras model
keras_model = FC()
keras_model.build((None,) + input_shape)
keras_model.compute_output_shape(input_shape=(None, input_output_feature))

# Create random input
input_set = numpy.random.uniform(-100, 100, size=(n_examples, *input_shape))

# Convert to ONNX
tf2onnx.convert.from_keras(keras_model, opset=14, output_path="tmp.model.onnx")

onnx_model = onnx.load("tmp.model.onnx")
onnx.checker.check_model(onnx_model)

# Compile
quantized_module = compile_onnx_model(
    onnx_model, input_set, n_bits=2
)

# Create test data from the same distribution and quantize using
# learned quantization parameters during compilation
x_test = tuple(numpy.random.uniform(-100, 100, size=(1, *input_shape)) for _ in range(num_inputs))

y_clear = quantized_module.forward(*x_test, fhe="disable")
y_fhe = quantized_module.forward(*x_test, fhe="execute")

print("Execution in clear: ", y_clear)
print("Execution in FHE:   ", y_fhe)
print("Equality:           ", numpy.sum(y_clear == y_fhe), "over", numpy.size(y_fhe), "values")
# Define the number of bits to use for quantizing weights and activations during training
n_bits_qat = 3  

quantized_numpy_module = compile_onnx_model(
    onnx_model,
    input_set,
    import_qat=True,
    n_bits=n_bits_qat,
)

Step-by-step guide

This guide provides a complete example of converting a PyTorch neural network into its FHE-friendly, quantized counterpart. It focuses on Quantization Aware Training of a simple network on a synthetic data-set.

In general, quantization can be carried out in two different ways: either during Quantization Aware Training (QAT) or after the training phase with Post-Training Quantization (PTQ).

For a formal explanation of the mechanisms that enable FHE-compatible neural networks, please see the following paper: Deep Neural Networks for Encrypted Inference with TFHE, 7th International Symposium, CSCML 2023.

Baseline PyTorch model

In PyTorch, using standard layers, a fully connected neural network (FCNN) would look like this:

import torch
from torch import nn

IN_FEAT = 2
OUT_FEAT = 2

class SimpleNet(nn.Module):
    """Simple MLP with PyTorch"""

    def __init__(self, n_hidden = 30):
        super().__init__()
        self.fc1 = nn.Linear(in_features=IN_FEAT, out_features=n_hidden)
        self.fc2 = nn.Linear(in_features=n_hidden, out_features=n_hidden)
        self.fc3 = nn.Linear(in_features=n_hidden, out_features=OUT_FEAT)


    def forward(self, x):
        """Forward pass."""
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

The network was trained using different numbers of neurons in the hidden layers, and quantized using 3-bit weights and activations. The mean accumulator size shown below is measured as the mean over 10 runs of the experiment. An accumulator size of 6.6 means that 4 times out of 10 the measured accumulator was 6 bits, while 6 times it was 7 bits.

| neurons | 10 | 30 | 100 |
| --- | --- | --- | --- |
| fp32 accuracy | 68.70% | 83.32% | 88.06% |
| 3-bit accuracy | 56.44% | 55.54% | 56.50% |
| mean accumulator size | 6.6 | 6.9 | 7.4 |

This shows that the fp32 accuracy and the accumulator size increase with the number of hidden neurons, while the 3-bit accuracy remains low irrespective of the number of neurons. While all the configurations tried here were FHE-compatible (accumulator < 16 bits), it is often preferable to have a lower accumulator size in order to speed up inference time.

Accumulator size is determined by Concrete as being the maximum bit-width encountered anywhere in the encrypted circuit.

Quantization Aware Training:

Brevitas provides a quantized version of almost all PyTorch layers (a Linear layer becomes QuantLinear, a ReLU layer becomes QuantReLU, and so on), plus some extra quantization parameters, such as:

  • bit_width: precision quantization bits for activations

  • act_quant: quantization protocol for the activations

  • weight_bit_width: precision quantization bits for weights

  • weight_quant: quantization protocol for the weights

In order to use FHE, the network must be quantized from end to end, and thanks to Brevitas's QuantIdentity layer, it is possible to quantize the input by placing it at the entry point of the network. Moreover, it is also possible to combine PyTorch and Brevitas layers, provided that a QuantIdentity is placed after the PyTorch layer. The following table gives the replacements to be made to convert a PyTorch NN for Concrete ML compatibility.

| PyTorch fp32 layer | Concrete ML model with PyTorch/Brevitas |
| --- | --- |
| torch.nn.Linear | brevitas.quant.QuantLinear |
| torch.nn.Conv2d | brevitas.quant.Conv2d |
| torch.nn.AvgPool2d | torch.nn.AvgPool2d + brevitas.quant.QuantIdentity |
| torch.nn.ReLU | brevitas.quant.QuantReLU |

Some PyTorch operators (from the PyTorch functional API) require a brevitas.quant.QuantIdentity to be applied on their inputs; a minimal sketch follows the list below.

PyTorch ops that require QuantIdentity:

  • torch.transpose

  • torch.add (between two activation tensors)

  • torch.reshape

  • torch.flatten
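For instance, here is a minimal sketch of an addition between two activation tensors, where a shared QuantIdentity quantizes both operands before torch.add (layer sizes and bit-widths are arbitrary, and whether this exact pattern imports cleanly should be verified on your network):

import torch
from torch import nn
from brevitas import nn as qnn

class QuantAddBlock(nn.Module):
    """Both operands of torch.add pass through a QuantIdentity."""

    def __init__(self, n_feat=8, n_bits=3):
        super().__init__()
        self.quant_inp = qnn.QuantIdentity(bit_width=n_bits)
        self.fc = qnn.QuantLinear(n_feat, n_feat, bias=True, weight_bit_width=n_bits)
        # Shared QuantIdentity applied to both inputs of the addition
        self.quant_add = qnn.QuantIdentity(bit_width=n_bits)

    def forward(self, x):
        x = self.quant_inp(x)
        y = self.fc(x)
        return torch.add(self.quant_add(x), self.quant_add(y))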

The QAT import tool in Concrete ML is a work in progress. While it has been tested with some networks built with Brevitas, it is possible to use other tools to obtain QAT networks.

With Brevitas, the network above becomes:

from brevitas import nn as qnn
from brevitas.core.quant import QuantType
from brevitas.quant import Int8ActPerTensorFloat, Int8WeightPerTensorFloat

N_BITS = 3
IN_FEAT = 2
OUT_FEAT = 2

class QuantSimpleNet(nn.Module):
    def __init__(
        self,
        n_hidden,
        qlinear_args={
            "weight_bit_width": N_BITS,
            "weight_quant": Int8WeightPerTensorFloat,
            "bias": True,
            "bias_quant": None,
            "narrow_range": True
        },
        qidentity_args={"bit_width": N_BITS, "act_quant": Int8ActPerTensorFloat},
    ):
        super().__init__()

        self.quant_inp = qnn.QuantIdentity(**qidentity_args)
        self.fc1 = qnn.QuantLinear(IN_FEAT, n_hidden, **qlinear_args)
        self.relu1 = qnn.QuantReLU(bit_width=qidentity_args["bit_width"])
        self.fc2 = qnn.QuantLinear(n_hidden, n_hidden, **qlinear_args)
        self.relu2 = qnn.QuantReLU(bit_width=qidentity_args["bit_width"])
        self.fc3 = qnn.QuantLinear(n_hidden, OUT_FEAT, **qlinear_args)

        for m in self.modules():
            if isinstance(m, qnn.QuantLinear):
                torch.nn.init.uniform_(m.weight.data, -1, 1)

    def forward(self, x):
        x = self.quant_inp(x)
        x = self.relu1(self.fc1(x))
        x = self.relu2(self.fc2(x))
        x = self.fc3(x)
        return x       

In the network above, biases are used for linear layers but are not quantized ("bias": True, "bias_quant": None). The addition of the bias is a univariate operation and is fused into the activation function.

Training this network with pruning (see below), keeping 30 out of 100 total neurons non-zero, gives good accuracy while keeping the accumulator size low.

| Non-zero neurons | 30 |
| --- | --- |
| 3-bit accuracy brevitas | 95.4% |
| 3-bit accuracy in Concrete ML | 95.4% |
| Accumulator size | 7 |

The PyTorch QAT training loop is the same as the standard floating point training loop, but hyper-parameters such as learning rate might need to be adjusted.
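Here is a minimal sketch of such a loop (synthetic data and hyper-parameters chosen purely for illustration), reusing the QuantSimpleNet defined above:

import torch

torch.manual_seed(0)

# Synthetic 2D classification data, for illustration only
X = torch.randn(256, 2)
y = (X[:, 0] * X[:, 1] > 0).long()

model = QuantSimpleNet(n_hidden=30)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(10):
    optimizer.zero_grad()
    logits = model(X)           # forward pass through the quantized layers
    loss = criterion(logits, y)
    loss.backward()             # gradients flow through the Brevitas quantizers
    optimizer.step()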

Quantization Aware Training is somewhat slower than normal training. QAT introduces quantization during both the forward and backward passes. The quantization process is inefficient on GPUs as its computational intensity is low with respect to data transfer time.

Pruning using Torch

Considering that FHE only works with limited integer precision, there is a risk of overflowing in the accumulator, which will make Concrete ML raise an error.

To understand how to overcome this limitation, consider a scenario where 2 bits are used for weights and layer inputs/outputs. The Linear layer computes a dot product between weights and inputs, $y = \sum_i w_i x_i$. With 2 bits, no overflow can occur during the computation of the Linear layer as long as the number of neurons does not exceed 14, since the sum of 14 products of 2-bit numbers does not exceed 7 bits.
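A quick numeric check of this claim, assuming signed 2-bit values in [-2, 1] for both weights and inputs:

# The largest absolute product of two signed 2-bit values in [-2, 1] is (-2) * (-2) = 4.
# Summing 14 such products therefore stays within [-56, 56], which fits in a
# signed 7-bit accumulator (range [-64, 63]).
max_abs_value = 2
n_neurons = 14
max_abs_sum = n_neurons * max_abs_value * max_abs_value
print(max_abs_sum, "fits in 7 signed bits:", max_abs_sum <= 63)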

By default, Concrete ML uses symmetric quantization for model weights, with values in the interval $\left[-2^{n_{bits}-1}, 2^{n_{bits}-1}-1\right]$. For example, for $n_{bits}=2$ the possible values are $[-2, -1, 0, 1]$; for $n_{bits}=3$, the values can be $[-4, -3, -2, -1, 0, 1, 2, 3]$.

In a typical setting, the weights will not all have the maximum or minimum values (e.g., $-2^{n_{bits}-1}$). Weights typically have a normal distribution around 0, which is one of the motivating factors for their symmetric quantization. A symmetric distribution and many zero-valued weights are desirable because opposite sign weights can cancel each other out and zero weights do not increase the accumulator size.

The following code shows how to use pruning in the previous example:

import torch.nn.utils.prune as prune

class PrunedQuantNet(SimpleNet):
    """Simple MLP with PyTorch"""

    pruned_layers = set()

    def prune(self, max_non_zero):
        # Linear layer weight has dimensions NumOutputs x NumInputs
        for name, layer in self.named_modules():
            if isinstance(layer, nn.Linear):
                print(name, layer)
                num_zero_weights = (layer.weight.shape[1] - max_non_zero) * layer.weight.shape[0]
                if num_zero_weights <= 0:
                    continue
                print(f"Pruning layer {name} factor {num_zero_weights}")
                prune.l1_unstructured(layer, "weight", amount=num_zero_weights)
                self.pruned_layers.add(name)

    def unprune(self):
        for name, layer in self.named_modules():
            if name in self.pruned_layers:
                prune.remove(layer, "weight")
                self.pruned_layers.remove(name)

Results with PrunedQuantNet, a pruned version of the QuantSimpleNet with 100 neurons on the hidden layers, are given below, showing a mean accumulator size measured over 10 runs of the experiment:

Non-zero neurons
10
30

3-bit accuracy

82.50%

88.06%

Mean accumulator size

6.6

6.8

This shows that the 3-bit accuracy is greatly improved, reaching the fp32 accuracy of the unpruned network, while the mean accumulator size remains roughly constant.

When pruning a larger neural network during training, it is easier to obtain a low bit-width accumulator while maintaining better final accuracy. Thus, pruning is more robust than training a similar, smaller network.

Production deployment

Concrete ML provides functionality to deploy FHE machine learning models in a client/server setting. The deployment workflow and model serving pattern are as follows:

Deployment

The diagram above shows the steps that a developer goes through to prepare a model for encrypted inference in a client/server setting. The training of the model and its compilation to FHE are performed on a development machine. Three different files are created when saving the model:

  • client.zip contains client.specs.json which lists the secure cryptographic parameters needed for the client to generate private and evaluation keys. It also contains serialized_processing.json which describes the pre-processing and post-processing required by the machine learning model, such as quantization parameters to quantize the input and de-quantize the output.

  • server.zip contains the compiled model. This file is sufficient to run the model on a server. The compiled model is machine-architecture specific (i.e., a model compiled on x86 cannot run on ARM).

The compiled model (server.zip) is deployed to a server and the cryptographic parameters (client.zip) are shared with the clients. In some settings, such as a phone application, the client.zip can be directly deployed on the client device and the server does not need to host it.

Important Note: In a client-server production using FHE, the server's output format depends on the model type. For regressors, the output matches the predict() method from scikit-learn, providing direct predictions. For classifiers, the output uses the predict_proba() method format, offering probability scores for each class, which allows clients to determine class membership by applying a threshold (commonly 0.5).
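For instance, a client could recover class labels from a binary classifier's decrypted output as sketched below (the variable name is illustrative):

import numpy as np

# Decrypted, de-quantized predict_proba()-style output of shape (n_samples, n_classes)
decrypted_probabilities = np.array([[0.80, 0.20], [0.35, 0.65]])

# Apply a 0.5 threshold on the positive-class probability to obtain class labels
predicted_labels = (decrypted_probabilities[:, 1] >= 0.5).astype(int)
print(predicted_labels)  # [0 1]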

Using the API Classes

The FHEModelDev, FHEModelClient, and FHEModelServer classes in the concrete.ml.deployment module make it easy to deploy models and manage the interaction between the client and the server:

  • FHEModelClient: This class is used on the client side to generate and serialize the cryptographic keys, encrypt the data before sending it to the server, and decrypt the results received from the server. It also handles the loading of quantization parameters and pre/post-processing from serialized_processing.json.

  • FHEModelServer: This class is used on the server side to load the FHE circuit from server.zip and execute the model on encrypted data received from the client.

Example Usage

from concrete.ml.sklearn import DecisionTreeClassifier
from concrete.ml.deployment import FHEModelDev, FHEModelClient, FHEModelServer
import numpy as np

# Define the directory for FHE client/server files
fhe_directory = '/tmp/fhe_client_server_files/'

# Initialize the Decision Tree model
model = DecisionTreeClassifier()

# Generate some random data for training
X = np.random.rand(100, 20)
y = np.random.randint(0, 2, size=100)

# Train and compile the model
model.fit(X, y)
model.compile(X)

# Setup the development environment
dev = FHEModelDev(path_dir=fhe_directory, model=model)
dev.save()

# Setup the client
client = FHEModelClient(path_dir=fhe_directory, key_dir="/tmp/keys_client")
serialized_evaluation_keys = client.get_serialized_evaluation_keys()

# Client pre-processes new data
X_new = np.random.rand(1, 20)
encrypted_data = client.quantize_encrypt_serialize(X_new)

# Setup the server
server = FHEModelServer(path_dir=fhe_directory)
server.load()

# Server processes the encrypted data
encrypted_result = server.run(encrypted_data, serialized_evaluation_keys)

# Client decrypts the result
result = client.deserialize_decrypt_dequantize(encrypted_result)

Data Transfer Overview:

  • From Client to Server: serialized_evaluation_keys (once), encrypted_data.

  • From Server to Client: encrypted_result.

These objects are serialized into bytes to streamline the data transfer between the client and server.

Serving

The client-side deployment of a secured inference machine learning model follows the schema above. First, the client obtains the cryptographic parameters (stored in client.zip) and generates a private encryption/decryption key as well as a set of public evaluation keys. The public evaluation keys are then sent to the server, while the secret key remains on the client.

The private data is then encrypted by the client as described in the serialized_processing.json file in client.zip, and it is then sent to the server. Server-side, the FHE model inference is run on encrypted inputs using the public evaluation keys.

The encrypted result is then returned by the server to the client, which decrypts it using its private key. Finally, the client performs any necessary post-processing of the decrypted result as specified in serialized_processing.json (part of client.zip).

The server-side implementation of a Concrete ML model follows the diagram above. The public evaluation keys sent by clients are stored. They are then retrieved for the client that is querying the service and used to evaluate the machine learning model stored in server.zip. Finally, the server sends the encrypted result of the computation back to the client.

Example notebook

Inference in the cloud

This document illustrates how Concrete ML models and DataFrames are deployed in a client/server setting when creating privacy-preserving services in the cloud.

Communication protocols

The overall communications protocol to enable cloud deployment of machine learning services can be summarized in the following diagram:

The steps detailed above are:

  1. Model Deployment: The model developer deploys the compiled machine learning model to the server. This model includes the cryptographic parameters. The server is now ready to provide private inference. Cryptographic parameters and compiled programs for DataFrames are included directly in Concrete ML.

  2. Client request: The client requests the cryptographic parameters (client specs). Once the client receives them from the server, the secret and evaluation keys are generated.

  3. Key exchanges: The client sends the evaluation key to the server. The server is now ready to accept requests from this client. The client sends their encrypted data. Serialized DataFrames include client evaluation keys.

  4. Private inference: The server uses the evaluation key to securely run prediction, training and pre-processing on the user's data and sends back the encrypted result.

  5. Decryption: The client now decrypts the result and can send back new requests.

See all tutorials

Start here

Go further

Live demos on Hugging Face:

Code examples on Github:

Blog tutorials:

Video tutorials

Zama 5-Question Developer Survey

Serialization

Concrete ML has support for serializing all available built-in models. Using this feature, one can dump a fitted and compiled model into a JSON string or file. The estimator can then be loaded back using the JSON object.

Saving Models

All built-in models provide the following methods:

  • dumps: dumps the model as a string.

  • dump: dumps the model into a file.

For example, a logistic regression model can be dumped in a string as below.

Similarly, it can be dumped into a file.

Alternatively, Concrete ML provides two equivalent global functions.

Some parameters used for instantiating Quantized Neural Network models are not supported for serialization. In particular, one cannot serialize a model that was instantiated using callable objects for the train_split and predict_nonlinearity parameters or with callbacks being enabled.

Loading Models

Loading a built-in model is possible through the following functions:

  • loads: loads the model from a string.

  • load: loads the model from a file.

A loaded model is required to be compiled once again in order for a user to be able to execute the inference in FHE or with simulation. This is because the underlying FHE circuit is currently not serialized. This step is not required when FHE mode is disabled.

The above logistic regression model can therefore be loaded as below.

Built-in model examples

FHE constraints

In Concrete ML, built-in linear models are exact equivalents to their scikit-learn counterparts. As they do not apply any non-linearity during inference, these models are very fast (~1ms FHE inference time) and can use high-precision integers (between 20-25 bits).

Tree-based models apply non-linear functions that enable comparisons of inputs and trained thresholds. Thus, they are limited with respect to the number of bits used to represent the inputs. But as these examples show, in practice 5-6 bits are sufficient to exactly reproduce the behavior of their scikit-learn counterpart models.

In the examples below, built-in neural networks can be configured to work with user-specified accumulator sizes, which allow the user to adjust the speed/accuracy trade-off.

List of examples

1. Linear models

These examples show how to use the built-in linear models on synthetic data, which allows for easy visualization of the decision boundaries or trend lines. Executing these 1D and 2D models in FHE takes around 1 millisecond.

2. Generalized linear models

3. Decision tree

4. XGBoost and Random Forest classifier

5. XGBoost regression

6. Fully connected neural network

7. Comparison of models

Based on three different synthetic data-sets, all the built-in classifiers are demonstrated in this notebook, showing accuracies, inference times, accumulator bit-widths, and decision boundaries.

8. Training on encrypted data

This example shows how to configure a training algorithm that works on encrypted data and how to deploy it in a client/server application.

Regarding FHE-friendly neural networks, QAT is the best way to reach optimal accuracy under FHE constraints. This technique allows weights and activations to be reduced to very low bit-widths (e.g., 2-3 bits), which, combined with pruning, can keep accumulator bit-widths low.

Concrete ML uses the third-party Brevitas library to perform QAT for PyTorch NNs, but options exist for other frameworks such as Keras/TensorFlow.

Several demos and tutorials that use Brevitas are available in the Concrete ML library, such as the CIFAR classification tutorial.

This guide is based on a notebook tutorial, from which some code blocks are documented.

For a more formal description of the usage of Brevitas to build FHE-compatible neural networks, please see the reference paper: Deep Neural Networks for Encrypted Inference with TFHE (CSCML 2023).

The notebook tutorial shows how to train an FCNN, similar to the one above, on a synthetic 2D data-set with a checkerboard grid pattern of 100 x 100 points. The data is split into 9500 training and 500 test samples.

Once trained, this PyTorch network can be imported using the compile_torch_model function. This function uses simple PTQ.
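A minimal sketch of this import (calibration data is synthetic and n_bits is chosen for illustration), reusing the SimpleNet defined above:

import numpy
from concrete.ml.torch.compile import compile_torch_model

torch_model = SimpleNet(n_hidden=30)

# Representative inputs used to calibrate Post-Training Quantization
calibration_data = numpy.random.uniform(-1, 1, size=(100, IN_FEAT))

quantized_module = compile_torch_model(
    torch_model,
    calibration_data,
    n_bits=3,
)

# Run clear, quantized inference to check accuracy before FHE execution
y_clear = quantized_module.forward(calibration_data[:1], fhe="disable")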

Using Quantization Aware Training with Brevitas is the best way to guarantee good accuracy for Concrete ML compatible neural networks.

This fact can be leveraged to train a network with more neurons, while not overflowing the accumulator, using a technique called pruning, where the developer can impose a number of zero-valued weights. Torch provides support for pruning out of the box.

FHEModelDev: Use the save method of this class during the development phase to prepare and save the model artifacts (client.zip and server.zip). This class handles the serialization of the underlying FHE circuit as well as the crypto-parameters used for generating the keys. By changing the mode parameter of the save method, you can deploy a trained model or a training FHE program.

For a complete example, see the client-server notebook or the use-case examples.

Once compiled to FHE, a Concrete ML model or DataFrame generates machine code that executes prediction, training, or pre-processing on encrypted data. During this process, Concrete ML generates the private encryption keys and the public evaluation keys.

For more information on how to implement this basic secure inference protocol, refer to the Production Deployment section and to the client/server example. For information on training on encrypted data, see the corresponding section.

Encrypted anonymization: Uses Fully Homomorphic Encryption (FHE) to anonymize personally identifiable information (PII) within encrypted documents, enabling computations to be performed on the encrypted data.

Check the code.

Credit card approval: Predicting credit scoring card approval in which sensitive data can be shared and analyzed without exposing the actual information either to the three parties involved or to the server processing it.

Check the code.

Sentiment analysis with transformers: Predicting whether an encrypted tweet / short message is positive, negative, or neutral, using FHE.

Check the code and the blog post.

Health diagnosis: Giving a diagnosis using FHE to preserve the privacy of the patient, based on the patient's symptoms, history, and other health factors.

Check the code.

Encrypted image filtering: Filtering encrypted images by applying filters such as black-and-white, ridge detection, or your own filter.

Check the code.

GPT-2 in FHE: Privacy-preserving text generation based on a user's prompt.

Titanic: Train an XGB classifier that can perform encrypted prediction for the Kaggle Titanic competition.

Federated learning and private inference: Use federated learning to train a Logistic Regression while preserving training data confidentiality, then import the model into Concrete ML and perform encrypted prediction.

Neural network fine-tuning: Fine-tune a VGG network to classify the CIFAR image data-sets and predict on encrypted data.

Encrypted sentiment analysis: A Hugging Face space that securely analyzes the sentiment expressed in a short text.

Credit scoring: Predict the chance of a given loan applicant defaulting on loan repayment.

Running privacy-preserving inferences on Hugging Face endpoints - April 2024

Build an end-to-end encrypted Shazam application using Concrete ML - February 2024

Linear regression over encrypted data with homomorphic encryption - June 2023

Comparison of Concrete ML regressors - June 2023

How to deploy a machine learning model with Concrete ML - May 2023

Encrypted image filtering using homomorphic encryption - February 2023

Sentiment analysis over encrypted data - November 2022

Titanic Competition with Privacy Preserving Machine Learning - August 2022

Work with encrypted DataFrames using Concrete ML - May 2024

Train a linear classifier on encrypted data using Concrete ML and Fully Homomorphic Encryption (FHE) - February 2024

- June 2023

We want to hear from you! Take 1 minute to share your thoughts and help us enhance our documentation and libraries. 👉 Click here to participate.

These examples illustrate the basic usage of built-in Concrete ML models. For more examples showing how to train high-accuracy models on more complex data-sets, see the Deep learning examples section.

It is recommended to use simulation to configure the speed/accuracy trade-off for tree-based models and neural networks, using grid-search or your own heuristics.

These two examples show generalized linear models (GLM) on a real-world data-set. As the non-linear inverse-link functions are computed during post-processing, these models do not use PBS and are thus very fast (~1ms execution time).

Using the spambase data-set, this example shows how to train a classifier that detects spam, based on features extracted from email messages. A grid-search is performed over decision-tree hyper-parameters to find the best ones.

Using a house prices data-set, this example shows how to train a regressor that predicts house prices.

This example shows how to train tree-ensemble models (either XGBoost or Random Forest), first on a synthetic data-set, and then on a real-world data-set. Grid-search is used to find the best number of trees in the ensemble.

Privacy-preserving prediction of house prices is shown in this example, using the House Prices data-set. Using 50 trees in the ensemble, with 5 bits of precision for the input features, the FHE regressor obtains an $R^2$ score of 0.90 and an execution time of 7-8 seconds.

Two different configurations of the built-in, fully-connected neural networks are shown. First, a small bit-width accumulator network is trained on Iris and compared to a PyTorch floating point network. Second, a larger accumulator (>8 bits) is demonstrated on MNIST.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from concrete.ml.sklearn import LogisticRegression

# Create the data for classification:
X, y = make_classification()

# Retrieve train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)

# Instantiate, train and compile the model
model = LogisticRegression()
model.fit(X_train, y_train)
model.compile(X_train)

# Run the inference in FHE
y_pred_fhe = model.predict(X_test, fhe="execute")

# Dump the model in a string
dumped_model_str = model.dumps()
from pathlib import Path

dumped_model_path = Path("logistic_regression_model.json")

# Any kind of file-like object can be used 
with dumped_model_path.open("w") as f:

    # Dump the model in a file
    model.dump(f)
from concrete.ml.common.serialization.dumpers import dump, dumps

# Dump the model in a string
dumped_model_str = dumps(model)

# Any kind of file-like object can be used 
with dumped_model_path.open("w") as f:

    # Dump the model in a file
    dump(model, f)
import numpy
from concrete.ml.common.serialization.loaders import load, loads

# Load the model from a string
loaded_model = loads(dumped_model_str)

# Any kind of file-like object can be used 
with dumped_model_path.open("r") as f:

    # Load the model from a file
    loaded_model = load(f)

# Compile the model
loaded_model.compile(X_train)

# Run the inference in FHE using the loaded model
y_pred_fhe_loaded = loaded_model.predict(X_test, fhe="execute")

print("Predictions are equal:", numpy.array_equal(y_pred_fhe, y_pred_fhe_loaded))

# Output:
#   Predictions are equal: True

Security and correctness

Security model

Correctness of computations

References

[1] Li, Baiyu, et al. “Securing approximate homomorphic encryption using differential privacy.” Annual International Cryptology Conference. Cham: Springer Nature Switzerland, 2022. https://eprint.iacr.org/2022/816.pdf

Quantization

Quantization is the process of constraining an input from a continuous or otherwise large set of values (such as real numbers) to a discrete set (such as integers).

This means that some accuracy in the representation is lost (e.g., a simple approach is to eliminate least-significant bits). In many cases in machine learning, it is possible to adapt the models to give meaningful results while using these smaller data types. This significantly reduces the number of bits necessary for intermediary results during the execution of these machine learning models.

Since FHE is currently limited to 16-bit integers, it is necessary to quantize models to make them compatible. As a general rule, the smaller the bit-width of integer values used in models, the better the FHE performance. This trade-off should be taken into account when designing models, especially neural networks.

Overview of quantization in Concrete ML

Quantization implemented in Concrete ML is applied in two ways:

  1. Built-in models apply quantization internally and the user only needs to configure some quantization parameters. This approach requires little work by the user but may not be a one-size-fits-all solution for all types of models. The final quantized model is FHE-friendly and ready to predict over encrypted data. In this setting, Post-Training Quantization (PTQ) is used for linear models, data quantization is used for tree-based models and, finally, Quantization Aware Training (QAT) is included in the built-in neural network models.

  2. For custom models, the user performs quantization themselves, typically with Quantization Aware Training using a third-party library such as Brevitas, before compiling the model to FHE.

While Concrete ML quantizes machine learning models, the data that the client has is often in floating point. Concrete ML models provide APIs to quantize inputs and de-quantize outputs.

Note that the floating point input is quantized in the clear, meaning it is converted to integers before being encrypted. The model's outputs are also integers and decrypted before de-quantization.

Basics of quantization

Let $[\alpha, \beta]$ be the range of a value to quantize, where $\alpha$ is the minimum and $\beta$ is the maximum. To quantize a range of floating point values (in $\mathbb{R}$) to integer values (in $\mathbb{Z}$), the first step is to choose the data type that is going to be used. Many ML models work with weights and activations represented as 8-bit integers, so this will be the value used in this example. Knowing the number of bits that can be used for a value in the range $[\alpha, \beta]$, the scale $S$ can be computed:

$$S = \frac{\beta - \alpha}{2^n - 1}$$

where $n$ is the number of bits ($n \leq 8$). In the following, $n = 8$ is assumed.

In practice, the quantization scale is then $S = \frac{\beta - \alpha}{255}$. This means the gap between consecutive representable values cannot be smaller than $S$, which, in turn, means there can be a substantial loss of precision. Every interval of length $S$ will be represented by a value within the range $[0..255]$.

The other important parameter from this quantization schema is the zero point $Z_p$ value. This essentially brings the 0 floating point value to a specific integer. If the quantization scheme is asymmetric (quantized values are not centered in 0), the resulting $Z_p$ will be in $\mathbb{Z}$.

$$Z_p = \mathtt{round}\left(-\frac{\alpha}{S}\right)$$
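
To illustrate the two formulas above, here is a small plain-NumPy sketch (not part of the Concrete ML API) that computes the scale and zero point for an arbitrary range and uses them to quantize and de-quantize a value:

import numpy as np

# Toy example of the formulas above; alpha, beta and n are arbitrary choices.
alpha, beta, n = -1.2, 0.8, 8

scale = (beta - alpha) / (2**n - 1)
zero_point = round(-alpha / scale)

def quantize(value):
    # Map a float to an integer in [0, 2**n - 1]
    return int(np.clip(round(value / scale) + zero_point, 0, 2**n - 1))

def dequantize(q_value):
    # Map the integer back to an approximation of the original float
    return (q_value - zero_point) * scale

q = quantize(0.5)
print(q, dequantize(q))  # the de-quantized value approximates 0.5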

Quantization special cases

Machine learning acceleration solutions are often based on integer computation of activations. To make quantization computations hardware-friendly, a popular approach is to ensure that scales are powers-of-two, which allows the replacement of the division in the equations above with a shift-right operation. TFHE also has a fast primitive for right bit-shift that enables acceleration in the special case of power-of-two scales.
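
As a small plain-Python illustration (not Concrete ML code), dividing an integer accumulator by a power-of-two scale amounts to a right bit-shift:

# Re-scaling an integer accumulator by a power-of-two scale 2**k
acc = 5000
k = 6  # re-scaling by 2**6 = 64

assert acc >> k == acc // (2**k)  # shift-right equals integer division by 2**k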

Configuring model quantization parameters

Built-in models provide a simple interface for configuring quantization parameters, most notably the number of bits used for inputs, model weights, intermediary values, and output values.

For linear models, n_bits is used to quantize both model inputs and weights. Depending on the number of features, you can use a single integer value for the n_bits parameter (e.g., a value between 2 and 7). When the number of features is high, the n_bits parameter should be decreased if you encounter compilation errors. It is also possible to quantize inputs and weights with different numbers of bits by passing a dictionary to n_bits containing the op_inputs and op_weights keys.
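
For instance, the snippet below shows both forms; the specific bit-width values are arbitrary examples:

from concrete.ml.sklearn import LogisticRegression

# A single integer sets the bit-width of both inputs and weights ...
model = LogisticRegression(n_bits=3)

# ... or a dictionary with the op_inputs and op_weights keys sets them separately
model = LogisticRegression(n_bits={"op_inputs": 5, "op_weights": 2})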

Tree-based models can directly control the accumulator bit-width used. If 6 or 7 bits are not sufficient to obtain good accuracy on your data-set, one option is to use an ensemble model (RandomForest or XGBoost) and increase the number of trees in the ensemble. This, however, will have a detrimental impact on FHE execution speed.

For built-in neural networks, the maximum accumulator bit-width cannot be precisely controlled. Using many input features and a high number of bits is beneficial for model accuracy, but it can conflict with the 16-bit accumulator constraint. Finding the best quantization parameters to maximize accuracy, while keeping the accumulator size down, can only be accomplished through experimentation.

Quantizing model inputs and outputs

The models implemented in Concrete ML provide features to let the user quantize the input data and de-quantize the output data.

Here is a simple example showing how to perform inference, starting from float values and ending up with float values. The FHE engine that is compiled for ML models does not support data batching.

# Assume
#   quantized_module : QuantizedModule
#   x: numpy.ndarray (of float)
import numpy as np

# Quantization is done in the clear
x_q = quantized_module.quantize_input(x)

# Forward in FHE (here with simulation)
q_y_proba = quantized_module.quantized_forward(x_q, fhe="simulate")

# De-quantization is done in the clear
y_proba = quantized_module.dequantize_output(q_y_proba)

# For classifiers with multi-class outputs, the arg max is done in the clear
y_pred = np.argmax(y_proba, 1)

Alternatively, the forward method groups the quantization, FHE execution and de-quantization steps all together.

# Assume
#   quantized_module : QuantizedModule
#   x: numpy.ndarray (of float)
import numpy as np

# Forward in FHE (here with simulation). Quantization and de-quantization steps are still done in 
# the clear 
y_proba = quantized_module.forward(x, fhe="simulate")

# For classifiers with multi-class outputs, the arg max is done in the clear
y_pred = np.argmax(y_proba, 1)

Resources

Prediction with FHE

Concrete ML has APIs that make it easy, during model development and testing, to perform encryption, execution in FHE, and decryption in a single step. For more control, these individual steps can be executed separately. The APIs used to accomplish this are different for:

Built-in models

The following example shows how to create a synthetic data-set and how to use it to train a LogisticRegression model from Concrete ML. Next, we will discuss the dedicated functions for encryption, inference, and decryption.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression
import numpy

# Create a synthetic data-set for a classification problem
x, y = make_classification(n_samples=100, class_sep=2, n_features=3, n_informative=3, n_redundant=0, random_state=42)

# Split the data-set into a train and test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Instantiate and train the model
model = LogisticRegression()
model.fit(x_train,y_train)

# Simulate the predictions in the clear (optional)
y_pred_clear = model.predict(x_test)

# Compile the model on a representative set
fhe_circuit = model.compile(x_train)

All Concrete ML built-in models have a monolithic predict method that performs the encryption, FHE execution, and decryption with a single function call. Concrete ML models follow the same API as scikit-learn models, transparently performing the steps related to encryption for convenience.

# Predict in FHE
y_pred_fhe = model.predict(x_test, fhe="execute")

As with scikit-learn, this LogisticRegression model also supports predicting the logits and the class probabilities, using the decision_function and predict_proba methods respectively.
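
For instance, assuming decision_function and predict_proba accept the same fhe argument as predict (a sketch reusing the model and data defined above):

# Logits (decision function) and class probabilities, computed on encrypted data
y_logits_fhe = model.decision_function(x_test, fhe="execute")
y_proba_fhe = model.predict_proba(x_test, fhe="execute")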

Alternatively, it is possible to execute all main steps (key generation, quantization, encryption, FHE execution, decryption) separately.

# Generate the keys (set force to True in order to generate new keys at each execution)
fhe_circuit.keygen(force=True)

y_pred_fhe_step = []

for f_input in x_test:
    # Quantize an input (float)
    q_input = model.quantize_input([f_input])
    
    # Encrypt the input
    q_input_enc = fhe_circuit.encrypt(q_input)

    # Execute the linear product in FHE 
    q_y_enc = fhe_circuit.run(q_input_enc)

    # Decrypt the result (integer)
    q_y = fhe_circuit.decrypt(q_y_enc)

    # De-quantize the result
    y = model.dequantize_output(q_y)

    # Apply either the sigmoid if it is a binary classification task, which is the case in this 
    # example, or a softmax function in order to get the probabilities (in the clear)
    y_proba = model.post_processing(y)

    # Since this model does classification, apply the argmax to get the class predictions (in the clear)
    # Note that regression models won't need the following line
    y_class = numpy.argmax(y_proba, axis=1)

    y_pred_fhe_step += list(y_class)

y_pred_fhe_step = numpy.array(y_pred_fhe_step)

print("Predictions in clear:", y_pred_clear)
print("Predictions in FHE  :", y_pred_fhe_step)
print(f"Similarity: {int((y_pred_fhe_step == y_pred_clear).mean()*100)}%")

Custom models

For custom models, the API to execute inference in FHE or simulation is illustrated as:

from torch import nn
from brevitas import nn as qnn
from concrete.ml.torch.compile import compile_brevitas_qat_model

class FCSmall(nn.Module):
    """A small QAT NN."""

    def __init__(self, input_output):
        super().__init__()
        self.quant_input = qnn.QuantIdentity(bit_width=3)
        self.fc1 = qnn.QuantLinear(in_features=input_output, out_features=input_output, weight_bit_width=3, bias=True)
        self.act_f = nn.ReLU()
        self.fc2 = qnn.QuantLinear(in_features=input_output, out_features=input_output, weight_bit_width=3, bias=True)

    def forward(self, x):
        return self.fc2(self.act_f(self.fc1(self.quant_input(x))))

torch_model = FCSmall(3)

quantized_module = compile_brevitas_qat_model(
    torch_model,
    x_train,
)

x_test_q = quantized_module.quantize_input(x_test)
y_pred = quantized_module.quantized_forward(x_test_q, fhe="simulate")
y_pred = quantized_module.dequantize_output(y_pred)

y_pred = numpy.argmax(y_pred, axis=1)

Debugging models

This section provides a set of tools and guidelines to help users build optimized FHE-compatible models. It discusses FHE simulation, the key-cache functionality that helps speed-up FHE result debugging, and gives a guide to evaluate circuit complexity.

Simulation

The simulation mode can be useful when developing and iterating on an ML model implementation. As FHE non-linear models work with integers up to 16 bits, with a trade-off between the number of bits and the FHE execution speed, the simulation can help to find the optimal model design.

The following example shows how to use the simulation mode in Concrete ML.

from sklearn.datasets import fetch_openml, make_circles
from concrete.ml.sklearn import RandomForestClassifier

n_bits = 2
X, y = make_circles(n_samples=1000, noise=0.1, factor=0.6, random_state=0)
concrete_clf = RandomForestClassifier(
    n_bits=n_bits, n_estimators=10, max_depth=5
)
concrete_clf.fit(X, y)

concrete_clf.compile(X)

# Running the model using FHE-simulation
y_preds_clear = concrete_clf.predict(X, fhe="simulate")

Caching keys during debugging

It is possible to avoid re-generating the keys of the models you are debugging. This feature is unsafe and should not be used in production. Here is an example that shows how to enable key-caching:

from sklearn.datasets import fetch_openml, make_circles
from concrete.ml.sklearn import RandomForestClassifier
from concrete.fhe import Configuration
debug_config = Configuration(
    enable_unsafe_features=True,
    use_insecure_key_cache=True,
    insecure_key_cache_location="~/.cml_keycache",
)

n_bits = 2
X, y = make_circles(n_samples=1000, noise=0.1, factor=0.6, random_state=0)
concrete_clf = RandomForestClassifier(
    n_bits=n_bits, n_estimators=10, max_depth=5
)
concrete_clf.fit(X, y)

concrete_clf.compile(X, debug_config)

Common compilation errors

1. TLU input maximum bit-width is exceeded

Error message: this [N]-bit value is used as an input to a table lookup

Cause: This error can occur when rounding_threshold_bits is not used and accumulated intermediate values in the computation exceed 16 bits.

Possible solutions:

2. No crypto-parameters can be found

Error message: RuntimeError: NoParametersFound

Cause: This error occurs when using rounding_threshold_bits in the compile_torch_model function.

Possible solutions: The solutions in this case are similar to the ones for the previous error.

3. Quantization import failed

Error message: Error occurred during quantization aware training (QAT) import [...] Could not determine a unique scale for the quantization!.

A common example is related to the concatenation operator. Suppose two tensors x and y are produced by two layers and need to be concatenated:

x = self.dense1(x)
y = self.dense2(y)
z = torch.cat([x, y])

In the example above, the x and y layers need quantization before being concatenated.

Possible solutions:

  1. If the error occurs for the first layer of the model: Add a QuantIdentity layer in your model and apply it on the input of the forward function, before the first layer is computed.

  2. If the error occurs for a concatenation or addition layer: Add a new QuantIdentity layer in your model. Suppose it is called quant_concat. In the forward function, before concatenation of x and y, apply it to both tensors that are concatenated. Using a common QuantIdentity layer to quantize both tensors that are concatenated ensures that they have the same scale:

z = torch.cat([self.quant_concat(x), self.quant_concat(y)])
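
Below is a minimal, hypothetical sketch of this pattern for a two-branch model; the layer names, sizes, and bit-widths are illustrative only:

import torch
from torch import nn
from brevitas import nn as qnn

class ConcatModel(nn.Module):
    """Illustrative QAT model where a shared QuantIdentity quantizes both branches."""

    def __init__(self, n_features):
        super().__init__()
        self.quant_x = qnn.QuantIdentity(bit_width=3)
        self.quant_y = qnn.QuantIdentity(bit_width=3)
        self.dense1 = qnn.QuantLinear(n_features, 4, weight_bit_width=3, bias=True)
        self.dense2 = qnn.QuantLinear(n_features, 4, weight_bit_width=3, bias=True)
        # A single QuantIdentity applied to both tensors gives them the same scale
        self.quant_concat = qnn.QuantIdentity(bit_width=3)

    def forward(self, x, y):
        x = self.dense1(self.quant_x(x))
        y = self.dense2(self.quant_y(y))
        return torch.cat([self.quant_concat(x), self.quant_concat(y)])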

Debugging compilation errors

Compilation errors due to FHE incompatible models, such as maximum bit-width exceeded or NoParametersFound can be debugged by examining the bit-widths associated with various intermediate values of the FHE computation.

The following produces a neural network that is not FHE-compatible:

import numpy
import torch

from torch import nn
from concrete.ml.torch.compile import compile_torch_model

N_FEAT = 2
class SimpleNet(nn.Module):
    """Simple MLP with PyTorch"""

    def __init__(self, n_hidden=30):
        super().__init__()
        self.fc1 = nn.Linear(in_features=N_FEAT, out_features=n_hidden)
        self.fc2 = nn.Linear(in_features=n_hidden, out_features=n_hidden)
        self.fc3 = nn.Linear(in_features=n_hidden, out_features=2)


    def forward(self, x):
        """Forward pass."""
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x


torch_input = torch.randn(100, N_FEAT)
torch_model = SimpleNet(120)
try:
    quantized_numpy_module = compile_torch_model(
        torch_model,
        torch_input,
        n_bits=7,
    )
except RuntimeError as err:
    print(err)

Upon execution, the Compiler will raise the following error within the graph representation:

Function you are trying to compile cannot be compiled:

%0 = _x                               # EncryptedTensor<int7, shape=(1, 2)>           ∈ [-64, 63]
%1 = [[ -9  18  ...   30  34]]        # ClearTensor<int7, shape=(2, 120)>             ∈ [-62, 63]              @ /fc1/Gemm.matmul
%2 = matmul(%0, %1)                   # EncryptedTensor<int14, shape=(1, 120)>        ∈ [-5834, 5770]          @ /fc1/Gemm.matmul
%3 = subgraph(%2)                     # EncryptedTensor<uint7, shape=(1, 120)>        ∈ [0, 127]
%4 = [[-36   6  ...   27 -11]]        # ClearTensor<int7, shape=(120, 120)>           ∈ [-63, 63]              @ /fc2/Gemm.matmul
%5 = matmul(%3, %4)                   # EncryptedTensor<int17, shape=(1, 120)>        ∈ [-34666, 37702]        @ /fc2/Gemm.matmul
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this 17-bit value is used as an input to a table lookup

Fixing compilation errors

To make this network FHE-compatible one can apply several techniques:

  1. use rounded accumulators by specifying the rounding_threshold_bits parameter, and evaluate the resulting accuracy with simulation:

torch_model = SimpleNet(20)

quantized_numpy_module = compile_torch_model(
    torch_model,
    torch_input,
    n_bits=6,
    rounding_threshold_bits=7,
)

  2. reduce the accumulator bit-width of the second layer named fc2. To do this, a simple solution is to reduce the number of neurons, as it is proportional to the bit-width:

torch_model = SimpleNet(10)

quantized_numpy_module = compile_torch_model(
    torch_model,
    torch_input,
    n_bits=7,
)

  3. adjust the tolerance for off-by-one errors using the p_error parameter:

torch_model = SimpleNet(10)

quantized_numpy_module = compile_torch_model(
    torch_model,
    torch_input,
    n_bits=7,
    p_error=0.01
)

Complexity analysis

In FHE, univariate functions are encoded as table lookups, which are then implemented using Programmable Bootstrapping (PBS). PBS is a powerful technique but will require significantly more computing resources, and thus time, compared to simpler encrypted operations such as matrix multiplications, convolution, or additions.

Furthermore, the cost of PBS will depend on the bit-width of the compiled circuit. Every additional bit in the maximum bit-width raises the complexity of the PBS by a significant factor. It may be of interest to the model developer, then, to determine the bit-width of the circuit and the amount of PBS it performs.

This can be done by inspecting the MLIR code produced by the Compiler:

print(quantized_numpy_module.fhe_circuit.mlir)
MLIR
--------------------------------------------------------------------------------
module {
  func.func @main(%arg0: tensor<1x2x!FHE.eint<15>>) -> tensor<1x2x!FHE.eint<15>> {
    %cst = arith.constant dense<16384> : tensor<1xi16>
    %0 = "FHELinalg.sub_eint_int"(%arg0, %cst) : (tensor<1x2x!FHE.eint<15>>, tensor<1xi16>) -> tensor<1x2x!FHE.eint<15>>
    %cst_0 = arith.constant dense<[[-13, 43], [-31, 63], [1, -44], [-61, 20], [31, 2]]> : tensor<5x2xi16>
    %cst_1 = arith.constant dense<[[-45, 57, 19, 50, -63], [32, 37, 2, 52, -60], [-41, 25, -1, 31, -26], [-51, -40, -53, 0, 4], [20, -25, 56, 54, -23]]> : tensor<5x5xi16>
    %cst_2 = arith.constant dense<[[-56, -50, 57, 37, -22], [14, -1, 57, -63, 3]]> : tensor<2x5xi16>
    %c16384_i16 = arith.constant 16384 : i16
    %1 = "FHELinalg.matmul_eint_int"(%0, %cst_2) : (tensor<1x2x!FHE.eint<15>>, tensor<2x5xi16>) -> tensor<1x5x!FHE.eint<15>>
    %cst_3 = tensor.from_elements %c16384_i16 : tensor<1xi16>
    %cst_4 = tensor.from_elements %c16384_i16 : tensor<1xi16>
    %2 = "FHELinalg.add_eint_int"(%1, %cst_4) : (tensor<1x5x!FHE.eint<15>>, tensor<1xi16>) -> tensor<1x5x!FHE.eint<15>>
    %cst_5 = arith.constant

: tensor<5x32768xi64>
    %cst_6 = arith.constant dense<[[0, 1, 2, 3, 4]]> : tensor<1x5xindex>
    %3 = "FHELinalg.apply_mapped_lookup_table"(%2, %cst_5, %cst_6) : (tensor<1x5x!FHE.eint<15>>, tensor<5x32768xi64>, tensor<1x5xindex>) -> tensor<1x5x!FHE.eint<15>>
    %4 = "FHELinalg.matmul_eint_int"(%3, %cst_1) : (tensor<1x5x!FHE.eint<15>>, tensor<5x5xi16>) -> tensor<1x5x!FHE.eint<15>>
    %5 = "FHELinalg.add_eint_int"(%4, %cst_3) : (tensor<1x5x!FHE.eint<15>>, tensor<1xi16>) -> tensor<1x5x!FHE.eint<15>>
    %cst_7 = arith.constant

: tensor<5x32768xi64>
    %6 = "FHELinalg.apply_mapped_lookup_table"(%5, %cst_7, %cst_6) : (tensor<1x5x!FHE.eint<15>>, tensor<5x32768xi64>, tensor<1x5xindex>) -> tensor<1x5x!FHE.eint<15>>
    %7 = "FHELinalg.matmul_eint_int"(%6, %cst_0) : (tensor<1x5x!FHE.eint<15>>, tensor<5x2xi16>) -> tensor<1x2x!FHE.eint<15>>
    return %7 : tensor<1x2x!FHE.eint<15>>

  }
}
--------------------------------------------------------------------------------

There are several calls to FHELinalg.apply_mapped_lookup_table and FHELinalg.apply_lookup_table. These calls apply PBS to the cells of their input tensors. In the listing above, the two apply_mapped_lookup_table calls take tensor<1x5x!FHE.eint<15>> inputs, so each call applies PBS to 5 encrypted values, for a total of 10 PBS applications.

Retrieving the bit-width of the circuit is then simply:

print(quantized_numpy_module.fhe_circuit.graph.maximum_integer_bit_width())

Decreasing the number of bits and the number of PBS applications induces large reductions in the computation time of the compiled circuit.

Importing ONNX

As ONNX is becoming the standard exchange format for neural networks, this allows Concrete ML to be flexible while also making model representation manipulation easy. In addition, it allows for straightforward mapping to NumPy operators, which are supported by Concrete, giving access to the Concrete stack's FHE-conversion capabilities.

Torch to NumPy conversion using ONNX

The diagram below gives an overview of the steps involved in the conversion of an ONNX graph to an FHE-compatible format (i.e., a format that can be compiled to FHE through Concrete).

All Concrete ML built-in models follow the same pattern for FHE conversion:

  1. The models are trained with sklearn or PyTorch.

  2. The Concrete ML ONNX parser checks that all the operations in the ONNX graph are supported and assigns reference NumPy operations to them. This step produces a NumpyModule.

  3. Quantization is performed on the NumpyModule, producing a QuantizedModule. Once the QuantizedModule is built, Concrete is used to trace the ._forward() function of the QuantizedModule.

Once an ONNX model is imported, it is converted to a NumpyModule, then to a QuantizedModule and, finally, to an FHE circuit. However, as the diagram shows, it is perfectly possible to stop at the NumpyModule level if you just want to run the PyTorch model as NumPy code without doing quantization.

Inspecting the ONNX models

import onnx
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from concrete.ml.sklearn import LogisticRegression

# Create the data for classification
x, y = make_classification(n_samples=250, class_sep=2, n_features=30, random_state=42)

# Retrieve train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, random_state=42
)

# Fix the number of bits to use for quantization
model = LogisticRegression(n_bits=8)

# Fit the model
model.fit(X_train, y_train)

# Access to the model
onnx_model = model.onnx_model

# Print the model
print(onnx.helper.printable_graph(onnx_model.graph))

# Save the model
onnx.save(onnx_model, "tmp.onnx")

# And then visualize it with Netron

Compilation

Compilation of a model produces machine code that executes the model on encrypted data. In some cases, notably in the client/server setting, the compilation can be done by the server when loading the model for serving.

As FHE execution is much slower than execution on non-encrypted data, Concrete ML has a simulation mode which can help to quickly evaluate the impact of FHE execution on models.

Compilation to FHE

From the perspective of the Concrete ML user, the compilation process performed by Concrete can be broken up into 3 steps:

  1. tracing the NumPy program and creating a Concrete op-graph

  2. checking the op-graph for FHE compatibility

  3. producing machine code for the op-graph (this step automatically determines cryptographic parameters)

Built-in models

Compilation is performed for built-in models with the compile method :

    clf.compile(X_train)

scikit-learn pipelines

When using a pipeline, the Concrete ML model can predict with FHE during the pipeline execution, but it needs to be compiled beforehand. The compile function must be called on the Concrete ML model:

import numpy
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

# Create the data for classification:
X, y = make_classification(
    n_features=30,
    n_redundant=0,
    n_informative=2,
    random_state=2,
    n_clusters_per_class=1,
    n_samples=250,
)

# Retrieve train and test sets:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

model_pca = Pipeline(
    [
        ("preprocessor", PCA()),
        ("cml_model", LogisticRegression(n_bits=8))
    ]
)

model_pca.fit(X_train, y_train)

# Compile the Concrete ML model
model_pca["cml_model"].compile(X_train)

model_pca.predict(X_test[[0]], fhe="execute")

Custom models

For custom models, compilation is performed with one of the compile_brevitas_qat_model (for Brevitas models with Quantization Aware Training) or compile_torch_model (for PyTorch models using Post-Training Quantization) functions:

    quantized_numpy_module = compile_brevitas_qat_model(torch_model, X_train)

FHE simulation

The result of this single step of the compilation pipeline allows the:

  • execution of the op-graph, which includes TLUs, on clear non-encrypted data. This is not secure, but it is much faster than executing in FHE. This mode is useful for debugging, especially when looking for appropriate model hyper-parameters

  • verification of the maximum bit-width of the op-graph and the intermediary bit-widths of model layers, to evaluate their impact on FHE execution latency

Simulation is enabled for all Concrete ML models once they are compiled as shown above. Obtaining the simulated predictions of the models is done by setting the fhe="simulate" argument to prediction methods:

    Z = clf.predict_proba(X, fhe="simulate")

Moreover, the maximum accumulator bit-width is determined as follows:

    bit_width = clf.quantized_module_.fhe_circuit.graph.maximum_integer_bit_width()

A simple Concrete example

import numpy
from concrete.fhe import compiler

# Assume Quantization has been applied and we are left with integers only. This is essentially the work of Concrete ML

# Some parameters (weight and bias) for our model taking a single feature
w = [2]
b = 2

# The function that implements our model
@compiler({"x": "encrypted"})
def linear_model(x):
    return w @ x + b

# A representative input-set is needed to compile the function (used for tracing)
n_bits_input = 2
inputset = numpy.arange(0, 2**n_bits_input).reshape(-1, 1)
circuit = linear_model.compile(inputset)

# Use the API to get the maximum bit-width in the circuit
max_bit_width = circuit.graph.maximum_integer_bit_width()
print("Max bit_width = ", max_bit_width)
# Max bit_width = 4

# Test our FHE inference
circuit.encrypt_run_decrypt(numpy.array([3]))
# 8

# Print the graph of the circuit
print(circuit)
# %0 = 2                     # ClearScalar<uint2>
# %1 = [2]                   # ClearTensor<uint2, shape=(1,)>
# %2 = x                     # EncryptedTensor<uint2, shape=(1,)>
# %3 = matmul(%1, %2)        # EncryptedScalar<uint3>
# %4 = add(%3, %0)           # EncryptedScalar<uint4>
# return %4

Pruning

Overview of pruning in Concrete ML

Pruning is used in Concrete ML for two types of neural networks: the built-in neural networks, which include a pruning mechanism out of the box, and custom neural networks, for which the user should add pruning when designing the model.

Basics of pruning

In neural networks, a neuron computes a linear combination of inputs and learned weights, then applies an activation function.

The neuron computes:

$$y_k = \phi\left(\sum_i w_i x_i\right)$$

When building a full neural network, each layer will contain multiple neurons, which are connected to the inputs or to the neuron outputs of a previous layer.

For every neuron shown in each layer of the figure above, the linear combinations of inputs and learned weights are computed. Depending on the values of the inputs and weights, the sum $v_k = \sum_i w_i x_i$ - which for Concrete ML neural networks is computed with integers - can take a range of different values.

Pruning a neural network entails fixing some of the weights $w_k$ to be zero during training. This is advantageous to meet FHE constraints, as irrespective of the distribution of $x_i$, multiplying these input values by 0 does not increase the accumulator value.

Fixing some of the weights to 0 makes the network graph look more similar to the following:

Pruning in practice

In the formula above, in the worst case, the maximum number of the input and weights that can make the result exceed $n$ bits is given by:

$$\Omega = \mathsf{floor}\left(\frac{2^{n_{\mathsf{max}}} - 1}{(2^{n_{\mathsf{weights}}} - 1)(2^{n_{\mathsf{inputs}}} - 1)}\right)$$

Here, $n_{\mathsf{max}} = 16$ is the maximum precision allowed.

For example, if $n_{\mathsf{weights}} = 2$ and $n_{\mathsf{inputs}} = 2$ with $n_{\mathsf{max}} = 16$, the worst case scenario occurs when all inputs and weights are equal to their maximal value $2^2 - 1 = 3$. There can be at most $\Omega = 7281$ elements in the multi-sums.
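
This worst-case count can be checked numerically with a few lines of plain Python (illustrative only):

import math

# Worst-case number of (input, weight) pairs that fit in an n_max-bit accumulator
n_max, n_weights, n_inputs = 16, 2, 2
omega = math.floor((2**n_max - 1) / ((2**n_weights - 1) * (2**n_inputs - 1)))
print(omega)  # 7281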

The distribution of the weights of a neural network is Gaussian, with many weights either 0 or having a small value. This enables exceeding the worst case number of active neurons without having to risk overflowing the bit-width. In built-in neural networks, the parameter n_hidden_neurons_multiplier is multiplied with $\Omega$ to determine the total number of non-zero weights that should be kept in a neuron.

External libraries

Hummingbird

Concrete ML allows the conversion of an ONNX inference to NumPy inference (note that NumPy is always the entry point to run models in FHE with Concrete ML).

Hummingbird exposes a convert function that can be imported as follows from the hummingbird.ml package:

# Disable Hummingbird warnings for pytest.
import warnings
warnings.filterwarnings("ignore")
from hummingbird.ml import convert

This function can be used to convert a machine learning model to an ONNX as follows:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Instantiate the logistic regression from sklearn
model = LogisticRegression()

# Create synthetic data
X, y = make_classification(
    n_samples=100, n_features=20, n_classes=2
)

# Fit the model
model.fit(X, y)

# Convert the model to ONNX
onnx_model = convert(model, backend="onnx", test_input=X).model

In theory, the resulting onnx_model could be used directly within Concrete ML's get_equivalent_numpy_forward method (as long as all operators present in the ONNX model are implemented in NumPy) and get the NumPy inference.

In practice, there are some steps needed to clean the ONNX output and make the graph compatible with Concrete ML, such as applying quantization where needed or deleting/replacing non-FHE friendly ONNX operators (such as Softmax and ArgMax).

skorch

This wrapper implements Torch training boilerplate code, lessening the work required of the user. It is possible to add hooks during the training phase, for example once an epoch is finished.

class SparseQuantNeuralNetImpl(nn.Module):
    """Sparse Quantized Neural Network classifier."""

Brevitas

While Brevitas provides many types of quantization, for Concrete ML, a custom "mixed integer" quantization applies. This "mixed integer" quantization is much simpler than the "integer only" mode of Brevitas. The "mixed integer" network design is defined as:

  • all weights and activations of convolutional, linear and pooling layers must be quantized (e.g., using Brevitas layers, QuantConv2D, QuantAvgPool2D, QuantLinear)

For "mixed integer" quantization to work, the first layer of a Brevitas nn.Module must be a QuantIdentity layer. However, you can then use functions such as torch.sigmoid on the result of such a quantizing operation.

import torch
import torch.nn as nn
from brevitas import nn as qnn

class QATnetwork(nn.Module):
    def __init__(self):
        super(QATnetwork, self).__init__()
        self.quant_inp = qnn.QuantIdentity(
            bit_width=4, return_quant_tensor=True)
        # ...

    def forward(self, x):
        out = self.quant_inp(x)
        return torch.sigmoid(out)
        # ...

For examples of such a "mixed integer" network design, please see the Quantization Aware Training examples:

API

Modules

Classes

Functions

The default parameters for Concrete ML are chosen considering the security model, and are selected with a bootstrapping off-by-one error probability of $2^{-40}$. In particular, it is assumed that the results of decrypted computations are not shared by the secret key owner with any third parties, as such an action can lead to leakage of the secret encryption key. If you are designing an application where decryptions must be shared, you will need to craft custom encryption parameters which are chosen in consideration of the IND-CPA^D security model [1].

Concrete ML can ensure guaranteed correctness of encrypted computations: in this approach, a quantized machine learning model is converted to an FHE circuit that produces the same result on encrypted data as the original model on clear data.

However, the bootstrapping off-by-one error probability can be configured by the user. Raising this probability results in lower latency when executing on encrypted data, but higher values cancel the correctness guarantee of the default setting. In practice this may not be an issue, as the accuracy of the model may be maintained, even though slight differences are observed in the model outputs. Moreover, as noted in the security model section, raising the off-by-one error probability may negatively impact the security model.

Furthermore, a second approach to reduce latency at the expense of correctness is approximate computation of univariate functions. This mode is enabled by using the approximate rounding feature. When using the rounding method, off-by-one errors are always induced in the computation of activation functions, irrespective of the bootstrapping off-by-one error probability.

When trading off better latency for correctness, it is highly recommended to use the FHE simulation feature to measure accuracy on a held-out test set. In many cases the accuracy of the model is only slightly impacted by approximate computations.

For custom neural networks with more complex topology, obtaining FHE-compatible models with good accuracy requires QAT. Concrete ML offers the possibility for the user to perform quantization before compiling to FHE. This can be achieved through a third-party library that offers QAT tools, such as Brevitas for PyTorch. In this approach, the user is responsible for implementing a full-integer model, respecting FHE constraints. Please refer to the documentation on quantization-aware training for tips on designing FHE neural networks.

When using quantized values in a matrix multiplication or convolution, the equations for computing the result become more complex. The IntelLabs Distiller documentation provides a more detailed explanation of the maths used to quantize values and how to keep computations consistent.

For linear models, the quantization is done post-training. Thus, the model is trained in floating point, and then the best integer weight representations are found, depending on the distribution of inputs and weights. For these models, the user selects the value of the n_bits parameter.

For tree-based models, the training and test data is quantized. The maximum accumulator bit-width for a model trained with n_bits=n is known beforehand: it will need n+1 bits. Through experimentation, it was determined that, in many cases, a value of 5 or 6 bits gives the same accuracy as training in floating point, and values above n=7 do not increase model performance (but rather induce a strong slowdown).

For built-in neural networks, several linear layers are used. Thus, the outputs of a layer are used as inputs to a new layer. Built-in neural networks use Quantization Aware Training. The parameters controlling the maximum accumulator bit-width are the number of weights and activation bits (module__n_w_bits, module__n_a_bits), but also the pruning factor. This factor is determined automatically by specifying a desired accumulator bit-width module__n_accum_bits and, optionally, a multiplier factor, module__n_hidden_neurons_multiplier.
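
As an illustration, the following hedged sketch shows how these parameters might be passed to a built-in neural network; the class name NeuralNetClassifier, the module__n_layers and module__activation_function parameters, and all numeric values are assumptions for this example, not prescribed settings:

from torch import nn
from concrete.ml.sklearn import NeuralNetClassifier

# Arbitrary example values for the quantization and pruning parameters
params = {
    "module__n_layers": 2,
    "module__n_w_bits": 3,                      # weight bits
    "module__n_a_bits": 3,                      # activation bits
    "module__n_accum_bits": 12,                 # desired accumulator bit-width
    "module__n_hidden_neurons_multiplier": 1,   # pruning multiplier factor
    "module__activation_function": nn.ReLU,
    "max_epochs": 10,
}

concrete_classifier = NeuralNetClassifier(**params)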

In a client/server setting, the client is responsible for quantizing inputs before sending them, encrypted, to the server. The client must then de-quantize the decrypted integer results received from the server. See the Production Deployment section for more details.

IntelLabs distiller explanation of quantization:

The simulation feature of Concrete ML provides a way to evaluate, using clear data, the results that ML models produce on encrypted data. The simulation includes any probabilistic behavior FHE may induce. The simulation is implemented with Concrete's simulation feature.

Simulation is much faster than FHE execution. This allows for faster debugging and model optimization. For example, this was used for the red/blue contours in the Classifier comparison notebook, as computing in FHE for the whole grid and all the classifiers would take significant time.

Reduce quantization n_bits. However, this may reduce accuracy. When quantization n_bits must be below 6, it is best to use Quantization Aware Training.

Use rounding_threshold_bits. It is recommended to use the approximate rounding setting, and to set the rounding bits to 1 or 2 bits higher than the quantization n_bits.

Use pruning.

Cause: This error occurs when the model imported as a quantization-aware training model lacks quantization operators. See the Brevitas usage reference on how to use Brevitas layers. This error message indicates that some layers do not take inputs quantized through QuantIdentity layers.

The error this 17-bit value is used as an input to a table lookup indicates that the 16-bit limit on the input of the Table Lookup (TLU) operation has been exceeded. To pinpoint the model layer that causes the error, Concrete ML provides the bitwidth_and_range_report helper function. To use this function, the model must first be compiled so that it can be simulated.

On the other hand, NoParametersFound is encountered when using rounding_threshold_bits. When using this setting, the 16-bit accumulator limit is relaxed. However, reducing the bit-width, reducing rounding_threshold_bits, or using the approximate rounding method can help.

use rounded accumulators by specifying the rounding_threshold_bits parameter. Please evaluate the accuracy of the model using simulation if you use this feature, as it may impact accuracy. Setting a value 2 bits higher than the quantization n_bits should be a good start.

adjust the tolerance for off-by-one errors using the p_error parameter. See the documentation on this tolerance for more details.

Internally, Concrete ML uses ONNX operators as intermediate representation (or IR) for manipulating machine learning models produced through export for PyTorch, Hummingbird, and skorch.

All models have a PyTorch implementation for inference. This implementation is provided either by a third-party tool such as Hummingbird or implemented directly in Concrete ML.

The PyTorch model is exported to ONNX. For more information on the use of ONNX in Concrete ML, see the Importing ONNX section.

Quantization is performed on the NumpyModule, producing a QuantizedModule. Two steps are performed: calibration and assignment of equivalent QuantizedOp objects to each ONNX operation. The QuantizedModule class is the quantized counterpart of the NumpyModule.

Moreover, by passing a user-provided nn.Module to step 2 of the above process, Concrete ML supports custom user models. See the associated documentation for instructions about working with such models.

Note that the NumpyModule interpreter currently supports a subset of ONNX operators.

In order to better understand how Concrete ML works under the hood, it is possible to access each model in its ONNX format and then either print it or visualize it by importing the associated file in Netron. For example, with LogisticRegression:

Concrete ML implements model inference using Concrete as a backend. In order to execute in FHE, a numerical program written in Concrete needs to be compiled. This functionality is described in the Concrete documentation, and Concrete ML hides away most of the complexity of this step, completing the entire compilation process itself.

Additionally, the client/server API packages the result of the last step in a way that allows the deployment of the encrypted circuit to a server, as well as key generation, encryption, and decryption on the client side.

The first step in the list above takes a Python function implemented using Concrete-supported NumPy operations and transforms it into an executable operation graph.

While Concrete ML hides away all the Concrete code that performs model inference, it can be useful to understand how Concrete code works. Here is a toy example for a simple linear regression model on integers to illustrate compilation concepts. Generally, it is recommended to use the built-in models, which provide linear regression out of the box.

Pruning is a method to reduce neural network complexity, usually applied in order to reduce the computation cost or memory size. Pruning is used in Concrete ML to control the size of accumulators in neural networks, thus making them FHE-compatible. See the FHE constraints section for an explanation of accumulator bit-width constraints.

Built-in neural networks include a pruning mechanism that can be parameterized by the user. The pruning type is based on L1-norm. To comply with FHE constraints, Concrete ML uses unstructured pruning, as the aim is not to eliminate neurons or convolutional filters completely, but to decrease their accumulator bit-width.

Custom neural networks, to work well under FHE constraints, should include pruning. When implemented with PyTorch, you can use the framework's pruning mechanism (e.g., L1-Unstructured) to good effect.
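
For instance, a hedged sketch using PyTorch's pruning utilities on an illustrative linear layer:

from torch import nn
from torch.nn.utils import prune

# Illustrative layer sizes
layer = nn.Linear(128, 64)

# Zero out 50% of the weights with the smallest L1 magnitude
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Make the pruning permanent (removes the re-parametrization added by prune)
prune.remove(layer, "weight")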

To respect the bit-width constraint of the FHE table lookup, the values of the accumulator $v_k$ must remain small to be representable using a maximum of 16 bits. In other words, the values must be between 0 and $2^{16}-1$.

While pruning weights can reduce the prediction performance of the neural network, studies show that a high level of pruning (above 50%) can often be applied. See how Concrete ML uses pruning in the built-in fully-connected neural networks.

Hummingbird is a third-party, open-source library that converts machine learning models into tensor computations, and it can export these models to ONNX. The list of supported models can be found in the Hummingbird documentation.

Concrete ML uses skorch to implement multi-layer, fully-connected PyTorch neural networks in a way that is compatible with the scikit-learn API.

skorch allows the user to easily create a classifier or regressor around a neural network (NN), implemented in Torch as a nn.Module, which is used by Concrete ML to provide a fully-connected, multi-layer NN with a configurable number of layers and optional pruning (see the pruning and neural network documentation for more information).

Under the hood, Concrete ML uses a skorch wrapper around a single PyTorch module, SparseQuantNeuralNetwork. More information can be found in the API reference.

Brevitas is a quantization-aware learning toolkit built on top of PyTorch. It provides quantization layers that are one-to-one equivalents to PyTorch layers, but that also contain operations performing the quantization during training.

PyTorch floating-point versions of univariate functions can be used (e.g., torch.relu, nn.BatchNormalization2D, torch.max (encrypted vs. constant), torch.add, torch.exp). See the supported Torch operations documentation for a full list.

The "mixed integer" mode used in Concrete ML neural networks is based on the Brevitas quantization scheme that makes both weights and activations representable as integers during training. However, through the use of lookup tables in Concrete ML, floating point univariate PyTorch functions are supported.

You can also refer to the SparseQuantNeuralNetwork class, which is the basis of the built-in NeuralNetworkClassifier.

: Module for shared data structures and code.

: Check and conversion tools.

: Module for debugging.

: Provide some variants of assert.

: Serialization module.

: Custom decoder for serialization.

: Dump functions for serialization.

: Custom encoder for serialization.

: Load functions for serialization.

: Utils that can be re-used by other pieces of code in the module.

: Module for deployment of the FHE model.

: APIs for FHE deployment.

: ONNX module.

: ONNX conversion related code.

: Utility functions for onnx operator implementations.

: Some code to manipulate models.

: Utils to interpret an ONNX model with numpy.

: ONNX ops implementation in Python + NumPy.

: Public API for encrypted data-frames.

: Define the framework used for managing keys (encrypt, decrypt) for encrypted data-frames.

: Define the encrypted data-frame framework.

: Module which is used to contain common functions for pytest.

: Torch modules for our pytests.

: Common functions or lists for test files, which can't be put in fixtures.

: Modules for quantization.

: Base Quantized Op class that implements quantization for a float numpy op.

: Post Training Quantization methods.

: QuantizedModule API.

: Optimization passes for QuantizedModules.

: Quantized versions of the ONNX operators for post training quantization.

: Quantization utilities for a numpy array/tensor.

: Modules for p_error search.

: p_error binary search for classification and regression tasks.

: Import sklearn models.

: Base classes for all estimators.

: Implement sklearn's Generalized Linear Models (GLM).

: Implement sklearn linear model.

: Implement sklearn neighbors model.

: Scikit-learn interface for fully-connected quantized neural networks.

: Sparse Quantized Neural Network torch module.

: Implement RandomForest models.

: Implement Support Vector Machine.

: Implement DecisionTree models.

: Implements the conversion of a tree model to a numpy function.

: Implements XGBoost models.

: Modules for torch to numpy conversion.

: torch compilation function.

: Implement the conversion of a torch model to a hybrid fhe/torch inference.

: A torch to numpy module.

: File to manage the version of the package.

: Custom json decoder to handle non-native types found in serialized Concrete ML objects.

: Custom json encoder to handle non-native types found in serialized Concrete ML objects.

: Enum representing the execution mode.

: Mode for the FHE API.

: Client API to encrypt and decrypt FHE data.

: Dev API to save the model and then load and run the FHE circuit.

: Server API to load and run the FHE circuit.

: A mixed quantized-raw valued onnx function.

: Type construct that marks an ndarray as a raw output of a quantized op.

: Define a framework that manages keys.

: Define an encrypted data-frame framework that supports Pandas operators and parameters.

: Torch model that performs a simple addition between two inputs.

: Torch model with some branching and skip connections.

: Torch model with some branching and skip connections.

: Torch CNN model for the tests.

: Torch CNN model with grouped convolution for compile torch tests.

: Torch CNN model for the tests.

: Torch CNN model for the tests with a max pool.

: Torch CNN model for the tests.

: Concat with fancy indexing.

: Small model that uses a 1D convolution operator.

: Torch model that with two different quantizers on the input.

: PyTorch module for performing matrix multiplication between two encrypted values.

: Minimalist network that expands the input tensor to a larger size.

: Torch model for the tests.

: Torch model that should generate MatMul->Add ONNX patterns.

: Torch model that should generate MatMul->Add ONNX patterns.

: Torch model for the tests.

: Model that only adds an empty dimension at axis 0.

: Model that only adds an empty dimension at axis 0, and returns the initial input as well.

: PyTorch module for performing SGD training.

: Torch model to test multiple inputs forward.

: Torch model to test multiple inputs forward.

: Torch model to test multiple inputs with different shape in the forward pass.

: Network that applies two quantized operations on a single input.

: Multi-output model.

: Torch model to test the concat and unsqueeze operators.

: Torch QAT model that does not quantize the inputs.

: Torch model, where we reuse some elements in a loop.

: Torch QAT model that applies various padding patterns.

: A model with a QAT Module.

: Torch model that implements a simple non-uniform quantizer.

: A small quantized network with Brevitas, trained on make_classification.

: Torch QAT model that reshapes the input.

: Fake torch model used to generate some onnx.

: Torch model implements a step function that needs Greater, Cast and Where.

: Torch model that with a single conv layer that produces the output, e.g., a blur filter.

: Torch model implements a step function that needs Greater, Cast and Where.

: A very small CNN.

: A very small QAT CNN to classify the sklearn digits data-set.

: A small network with Brevitas, trained on make_classification.

: Torch model to test the ReduceSum ONNX operator in a leveled circuit.

: Torch model that calls univariate and shape functions of torch.

: An operator that mixes (adds or multiplies) together encrypted inputs.

: Base class for quantized ONNX ops implemented in numpy.

: An univariate operator of an encrypted value.

: Base ONNX to Concrete ML computation graph conversion class.

: Post-training Affine Quantization.

: Converter of Quantization Aware Training networks.

: Inference for a quantized model.

: Detect neural network patterns that can be optimized with round PBS.

: ConstantOfShape operator.

: Gather operator.

: Shape operator.

: Slice operator.

: Quantized Abs op.

: Quantized Addition operator.

: Quantized Average Pooling op.

: Quantized Batch normalization with encrypted input and in-the-clear normalization params.

: Brevitas uniform quantization with encrypted input.

: Cast the input to the required data type.

: Quantized Celu op.

: Quantized clip op.

: Concatenate operator.

: Quantized Conv op.

: Div operator /.

: Quantized Elu op.

: Comparison operator ==.

: Quantized erf op.

: Quantized Exp op.

: Expand operator for quantized tensors.

: Quantized flatten for encrypted inputs.

: Quantized Floor op.

: Quantized Gemm op.

: Comparison operator >.

: Comparison operator >=.

: Quantized HardSigmoid op.

: Quantized Hardswish op.

: Quantized Identity op.

: Quantized LeakyRelu op.

: Comparison operator <.

: Comparison operator <=.

: Quantized Log op.

: Quantized MatMul op.

: Quantized Max op.

: Quantized Max Pooling op.

: Quantized Min op.

: Multiplication operator.

: Quantized Neg op.

: Quantized Not op.

: Or operator ||.

: Quantized PRelu op.

: Quantized Padding op.

: Quantized pow op.

: ReduceSum with encrypted input.

: Quantized Relu op.

: Quantized Reshape op.

: Quantized round op.

: Quantized Selu op.

: Quantized sigmoid op.

: Quantized Neg op.

: Quantized Softplus op.

: Squeeze operator.

: Subtraction operator.

: Quantized Tanh op.

: Transpose operator for quantized inputs.

: Quantized Unfold op.

: Unsqueeze operator.

: Where operator on quantized arrays.

: Calibration set statistics.

: Options for quantization.

: Abstraction of quantized array.

: Quantization parameters for uniform quantization.

: Uniform quantizer.

: Class for p_error hyper-parameter search for classification and regression tasks.

: Base class for linear and tree-based classifiers in Concrete ML.

: Base class for all estimators in Concrete ML.

: Mixin class for tree-based classifiers.

: Mixin class for tree-based estimators.

: Mixin class for tree-based regressors.

: Mixin that provides quantization for a torch module and follows the Estimator API.

: A Mixin class for sklearn KNeighbors classifiers with FHE.

: A Mixin class for sklearn KNeighbors models with FHE.

: A Mixin class for sklearn linear classifiers with FHE.

: A Mixin class for sklearn linear models with FHE.

: A Mixin class for sklearn linear regressors with FHE.

: A Mixin class for sklearn SGD classifiers with FHE.

: A Mixin class for sklearn SGD regressors with FHE.

: A Gamma regression model with FHE.

: A Poisson regression model with FHE.

: A Tweedie regression model with FHE.

: An ElasticNet regression model with FHE.

: A Lasso regression model with FHE.

: A linear regression model with FHE.

: A logistic regression model with FHE.

: A Ridge regression model with FHE.

: An FHE linear classifier model fitted with stochastic gradient descent.

: An FHE linear regression model fitted with stochastic gradient descent.

: A k-nearest neighbors classifier model with FHE.

: A Fully-Connected Neural Network classifier with FHE.

: A Fully-Connected Neural Network regressor with FHE.

: Sparse Quantized Neural Network.

: Implements the RandomForest classifier.

: Implements the RandomForest regressor.

: A Classification Support Vector Machine (SVM).

: A Regression Support Vector Machine (SVM).

: Implements the sklearn DecisionTreeClassifier.

: Implements the sklearn DecisionTreeRegressor.

: Implements the XGBoost classifier.

: Implements the XGBoost regressor.

: Simple enum for different modes of execution of HybridModel.

: Convert a model to a hybrid model.

: Hybrid FHE Model Server.

: Placeholder type for a typical logger like the one from loguru.

: A wrapper class for the modules to be evaluated remotely with FHE.

: General interface to transform a torch.nn.Module to numpy module.

: sklearn.utils.check_X_y with an assert.

: sklearn.utils.check_X_y with an assert and multi-output handling.

: sklearn.utils.check_array with an assert.

: Provide a custom assert to check that the condition is False.

: Provide a custom assert to check that a piece of code is never reached.

: Provide a custom assert to check that the condition is True.

: Define a custom object hook that enables loading any supported serialized values.

: Dump any Concrete ML object in a file.

: Dump any object as a string.

: Dump the value into a custom dict format.

: Load any Concrete ML object that provides a load_dict method.

: Load any Concrete ML object that provides a dump_dict method.

: Indicate if all unpacked values are of a supported float dtype.

: Indicate if all unpacked values are of a supported integer dtype.

: Indicate if all unpacked values are of the specified dtype(s).

: Check if two numpy arrays are equal within given tolerances and have the same shape.

: Convert any allowed type into an array and cast it if required.

: Check that the user did not set p_error or global_p_error in the configuration.

: Compute the number of bits required to represent x.

: Generate a proxy function for a function accepting only *args type arguments.

: Return the class of the model (instantiated or not), which can be a partial() instance.

: Return the name of the model, which can be a partial() instance.

: Return the ONNX opset_version.

: Check if a model is a Brevitas type.

: Indicate if the model class represents a classifier.

: Indicate if a model class, which can be a partial() instance, is an element of a_list.

: Indicate if the input container is a Pandas DataFrame.

: Indicate if the input container is a Pandas Series.

: Indicate if the input container is a Pandas DataFrame or Series.

: Indicate if the model class represents a regressor.

: Return (p_error, global_p_error) that we want to give to Concrete.

: Check and process the rounding_threshold_bits parameter.

: Sanitize arg_name, replacing invalid chars by _.

: Make the input a tuple if it is not already the case.

: Check that current versions match the ones used in development.

: Fuse sequence of matmul -> add into a gemm node.

: Get the numpy equivalent forward of the provided ONNX model.

: Get the numpy equivalent forward of the provided ONNX model for tree-based models only.

: Get the numpy equivalent forward of the provided torch Module.

: Preprocess the provided ONNX model.

: Compute the output shape of a pool or conv operation.

: Compute any additional padding needed to compute pooling layers.

: Pad a tensor according to ONNX spec, using an optional custom pad value.

: Compute the average pooling normalization constant.

: Comparison operation using round_bit_pattern function.

: Remove the nodes following the first node matching node_op_type from the ONNX graph.

: Remove the first node matching node_op_type and its following nodes from the ONNX graph.

: Keep the outputs given in outputs_to_keep and remove the others from the model.

: Remove identity nodes from a model.

: Remove unnecessary nodes from the ONNX graph.

: Remove unused Constant nodes in the provided onnx model.

: Simplify an ONNX model, removes unused Constant nodes and Identity nodes.

: Execute the provided ONNX graph on the given inputs.

: Execute the provided ONNX graph on the given inputs for tree-based models only.

: Get the attribute from an ONNX AttributeProto.

: Construct the qualified type name of the ONNX operator.

: Remove initializers from model inputs.

: Cast values to floating points.

: Compute abs in numpy according to ONNX spec.

: Compute acos in numpy according to ONNX spec.

: Compute acosh in numpy according to ONNX spec.

: Compute add in numpy according to ONNX spec.

: Compute asin in numpy according to ONNX spec.

: Compute asinh in numpy according to ONNX spec.

: Compute atan in numpy according to ONNX spec.

: Compute atanh in numpy according to ONNX spec.

: Compute Average Pooling using Torch.

: Compute the batch normalization of the input tensor.

: Execute ONNX cast in Numpy.

: Compute celu in numpy according to ONNX spec.

: Apply concatenate in numpy according to ONNX spec.

: Return the constant passed as a kwarg.

: Compute N-D convolution using Torch.

: Compute cos in numpy according to ONNX spec.

: Compute cosh in numpy according to ONNX spec.

: Compute div in numpy according to ONNX spec.

: Compute elu in numpy according to ONNX spec.

: Compute equal in numpy according to ONNX spec.

: Compute equal in numpy according to ONNX spec and cast outputs to floats.

: Compute erf in numpy according to ONNX spec.

: Compute exponential in numpy according to ONNX spec.

: Flatten a tensor into a 2d array.

: Compute Floor in numpy according to ONNX spec.

: Compute Gemm in numpy according to ONNX spec.

: Compute greater in numpy according to ONNX spec.

: Compute greater in numpy according to ONNX spec and cast outputs to floats.

: Compute greater or equal in numpy according to ONNX spec.

: Compute greater or equal in numpy according to ONNX specs and cast outputs to floats.

: Compute hardsigmoid in numpy according to ONNX spec.

: Compute hardswish in numpy according to ONNX spec.

: Compute identity in numpy according to ONNX spec.

: Compute leakyrelu in numpy according to ONNX spec.

: Compute less in numpy according to ONNX spec.

: Compute less in numpy according to ONNX spec and cast outputs to floats.

: Compute less or equal in numpy according to ONNX spec.

: Compute less or equal in numpy according to ONNX spec and cast outputs to floats.

: Compute log in numpy according to ONNX spec.

: Compute matmul in numpy according to ONNX spec.

: Compute Max in numpy according to ONNX spec.

: Compute Max Pooling using Torch.

: Compute Min in numpy according to ONNX spec.

: Compute mul in numpy according to ONNX spec.

: Compute Negative in numpy according to ONNX spec.

: Compute not in numpy according to ONNX spec.

: Compute not in numpy according to ONNX spec and cast outputs to floats.

: Compute or in numpy according to ONNX spec.

: Compute or in numpy according to ONNX spec and cast outputs to floats.

: Compute pow in numpy according to ONNX spec.

: Compute relu in numpy according to ONNX spec.

: Compute round in numpy according to ONNX spec.

: Compute selu in numpy according to ONNX spec.

: Compute sigmoid in numpy according to ONNX spec.

: Compute Sign in numpy according to ONNX spec.

: Compute sin in numpy according to ONNX spec.

: Compute sinh in numpy according to ONNX spec.

: Compute softmax in numpy according to ONNX spec.

: Compute softplus in numpy according to ONNX spec.

: Compute sub in numpy according to ONNX spec.

: Compute tan in numpy according to ONNX spec.

: Compute tanh in numpy according to ONNX spec.

: Compute thresholdedrelu in numpy according to ONNX spec.

: Transpose in numpy according to ONNX spec.

: Compute Unfold using Torch.

: Compute the equivalent of numpy.where.

: Compute the equivalent of numpy.where.

: Decorate a numpy onnx function to flag the raw/non quantized inputs.

: Compute rounded equal in numpy according to ONNX spec for tree-based models only.

: Compute rounded less in numpy according to ONNX spec for tree-based models only.

: Compute rounded less or equal in numpy according to ONNX spec for tree-based models only.

: Load a serialized encrypted data-frame.

: Merge two encrypted data-frames in FHE using Pandas parameters.

: Check that the given object can properly be serialized.

: Reduce size of the given data-set.

: Select n_sample random elements from a 2D NumPy array.

: Get the pytest parameters to use for testing all models available in Concrete ML.

: Get the pytest parameters to use for testing linear models.

: Get the pytest parameters to use for testing neighbor models.

: Get the pytest parameters to use for testing neural network models.

: Get the pytest parameters to use for testing tree-based models.

: Instantiate any Concrete ML model type.

: Load an object saved with torch.save() from a file or dict.

: Determine if both data-frames are identical.

: Indicate if two values are equal.

: Convert the n_bits parameter into a proper dictionary.

: Fill a parameter set structure from kwargs parameters.

: Get the quantized module of a given model in FHE, simulated or not.

: Add transpose after last node.

: Assert if an Add node with a specific constant exists in the ONNX graph.

: Create ONNX model with Hummingbird convert method.

: Build an FHE-compliant onnx-model using a fitted scikit-learn model.

: Apply post-processing from the graph.

: Apply pre-processing onto the ONNX graph.

: Convert the tree inference to numpy functions using Hummingbird.

: Pre-process tree values.

: Workaround to fix torch issue that does not export the proper axis in the ONNX squeeze node.

: Build a quantized module from a Torch or ONNX model.

: Compile a Brevitas Quantization Aware Training model.

: Compile an ONNX model into an FHE equivalent.

: Compile a torch module into an FHE equivalent.

: Convert a torch tensor or a numpy array to a numpy array.

: Check if a torch model has QNN layers.

: Convert all Conv1D layers in a module or a Conv1D layer itself to nn.Linear.

: Convert a tuple to a string representation.

: Convert a string representation of a tuple to a tuple.

Brevitas
advanced QAT tutorial
detailed explanation
linear models
tree-based models
neural networks
Production Deployment
Distiller documentation
Classifier Comparison notebook
Quantization Aware Training
pruning
this guide
fhe.Exactness.APPROXIMATE
ONNX
PyTorch
Hummingbird
skorch
NumpyModule
QuantizedModule
QuantizedOp
FHE-friendly model documentation
Netron
described here
client/server API
supported operation set
built-in models
neural networks
framework's pruning mechanism
table lookup
Fully Connected Neural Networks
Hummingbird
the Hummingbird documentation
skorch
pruning
neural network documentation
in the API guide
Brevitas
PyTorch supported layers page
"integer only" Brevitas quantization
QuantizationAwareTraining.ipynb
ConvolutionalNeuralNetwork.ipynb
SparseQuantNeuralNetImpl
cryptography concepts
FHE simulation feature
here
Built-in models
Custom models
Concrete's simulation
simulation functionality
bitwidth_and_range_report
simulated
Hummingbird
here
supports the following ONNX operators

Documentation

Using GitBook

Documentation with GitBook is done mainly by pushing content to GitHub. GitBook then pulls the docs from the repository and publishes them. In most cases, GitBook is simply a mirror of what is available on GitHub.

There are, however, some use-cases where documentation can be modified directly in GitBook (and the modifications then pushed to GitHub), for example when the documentation is modified by a person outside of Zama. In this case, a GitHub branch is created and a GitBook space is associated with it: modifications are done in this space and automatically pushed to the branch. Once the modifications are complete, one can simply create a pull request to merge them into the main branch.

concrete.ml.common
concrete.ml.common.check_inputs
concrete.ml.common.debugging
concrete.ml.common.debugging.custom_assert
concrete.ml.common.serialization
concrete.ml.common.serialization.decoder
concrete.ml.common.serialization.dumpers
concrete.ml.common.serialization.encoder
concrete.ml.common.serialization.loaders
concrete.ml.common.utils
concrete.ml.deployment
concrete.ml.deployment.fhe_client_server
concrete.ml.onnx
concrete.ml.onnx.convert
concrete.ml.onnx.onnx_impl_utils
concrete.ml.onnx.onnx_model_manipulations
concrete.ml.onnx.onnx_utils
concrete.ml.onnx.ops_impl
concrete.ml.pandas
concrete.ml.pandas.client_engine
concrete.ml.pandas.dataframe
concrete.ml.pytest
concrete.ml.pytest.torch_models
concrete.ml.pytest.utils
concrete.ml.quantization
concrete.ml.quantization.base_quantized_op
concrete.ml.quantization.post_training
concrete.ml.quantization.quantized_module
concrete.ml.quantization.quantized_module_passes
concrete.ml.quantization.quantized_ops
concrete.ml.quantization.quantizers
concrete.ml.search_parameters
concrete.ml.search_parameters.p_error_search
concrete.ml.sklearn
concrete.ml.sklearn.base
concrete.ml.sklearn.glm
concrete.ml.sklearn.linear_model
concrete.ml.sklearn.neighbors
concrete.ml.sklearn.qnn
concrete.ml.sklearn.qnn_module
concrete.ml.sklearn.rf
concrete.ml.sklearn.svm
concrete.ml.sklearn.tree
concrete.ml.sklearn.tree_to_numpy
concrete.ml.sklearn.xgb
concrete.ml.torch
concrete.ml.torch.compile
concrete.ml.torch.hybrid_model
concrete.ml.torch.numpy_module
concrete.ml.version
decoder.ConcreteDecoder
encoder.ConcreteEncoder
utils.FheMode
fhe_client_server.DeploymentMode
fhe_client_server.FHEModelClient
fhe_client_server.FHEModelDev
fhe_client_server.FHEModelServer
ops_impl.ONNXMixedFunction
ops_impl.RawOpOutput
client_engine.ClientEngine
dataframe.EncryptedDataFrame
torch_models.AddNet
torch_models.BranchingGemmModule
torch_models.BranchingModule
torch_models.CNN
torch_models.CNNGrouped
torch_models.CNNInvalid
torch_models.CNNMaxPool
torch_models.CNNOther
torch_models.ConcatFancyIndexing
torch_models.Conv1dModel
torch_models.DoubleQuantQATMixNet
torch_models.EncryptedMatrixMultiplicationModel
torch_models.ExpandModel
torch_models.FC
torch_models.FCSeq
torch_models.FCSeqAddBiasVec
torch_models.FCSmall
torch_models.IdentityExpandModel
torch_models.IdentityExpandMultiOutputModel
torch_models.ManualLogisticRegressionTraining
torch_models.MultiInputNN
torch_models.MultiInputNNConfigurable
torch_models.MultiInputNNDifferentSize
torch_models.MultiOpOnSingleInputConvNN
torch_models.MultiOutputModel
torch_models.NetWithConcatUnsqueeze
torch_models.NetWithConstantsFoldedBeforeOps
torch_models.NetWithLoops
torch_models.PaddingNet
torch_models.PartialQATModel
torch_models.QATTestModule
torch_models.QuantCustomModel
torch_models.ShapeOperationsNet
torch_models.SimpleNet
torch_models.SimpleQAT
torch_models.SingleMixNet
torch_models.StepActivationModule
torch_models.TinyCNN
torch_models.TinyQATCNN
torch_models.TorchCustomModel
torch_models.TorchSum
torch_models.UnivariateModule
base_quantized_op.QuantizedMixingOp
base_quantized_op.QuantizedOp
base_quantized_op.QuantizedOpUnivariateOfEncrypted
post_training.ONNXConverter
post_training.PostTrainingAffineQuantization
post_training.PostTrainingQATImporter
quantized_module.QuantizedModule
quantized_module_passes.PowerOfTwoScalingRoundPBSAdapter
quantized_ops.ONNXConstantOfShape
quantized_ops.ONNXGather
quantized_ops.ONNXShape
quantized_ops.ONNXSlice
quantized_ops.QuantizedAbs
quantized_ops.QuantizedAdd
quantized_ops.QuantizedAvgPool
quantized_ops.QuantizedBatchNormalization
quantized_ops.QuantizedBrevitasQuant
quantized_ops.QuantizedCast
quantized_ops.QuantizedCelu
quantized_ops.QuantizedClip
quantized_ops.QuantizedConcat
quantized_ops.QuantizedConv
quantized_ops.QuantizedDiv
quantized_ops.QuantizedElu
quantized_ops.QuantizedEqual
quantized_ops.QuantizedErf
quantized_ops.QuantizedExp
quantized_ops.QuantizedExpand
quantized_ops.QuantizedFlatten
quantized_ops.QuantizedFloor
quantized_ops.QuantizedGemm
quantized_ops.QuantizedGreater
quantized_ops.QuantizedGreaterOrEqual
quantized_ops.QuantizedHardSigmoid
quantized_ops.QuantizedHardSwish
quantized_ops.QuantizedIdentity
quantized_ops.QuantizedLeakyRelu
quantized_ops.QuantizedLess
quantized_ops.QuantizedLessOrEqual
quantized_ops.QuantizedLog
quantized_ops.QuantizedMatMul
quantized_ops.QuantizedMax
quantized_ops.QuantizedMaxPool
quantized_ops.QuantizedMin
quantized_ops.QuantizedMul
quantized_ops.QuantizedNeg
quantized_ops.QuantizedNot
quantized_ops.QuantizedOr
quantized_ops.QuantizedPRelu
quantized_ops.QuantizedPad
quantized_ops.QuantizedPow
quantized_ops.QuantizedReduceSum
quantized_ops.QuantizedRelu
quantized_ops.QuantizedReshape
quantized_ops.QuantizedRound
quantized_ops.QuantizedSelu
quantized_ops.QuantizedSigmoid
quantized_ops.QuantizedSign
quantized_ops.QuantizedSoftplus
quantized_ops.QuantizedSqueeze
quantized_ops.QuantizedSub
quantized_ops.QuantizedTanh
quantized_ops.QuantizedTranspose
quantized_ops.QuantizedUnfold
quantized_ops.QuantizedUnsqueeze
quantized_ops.QuantizedWhere
quantizers.MinMaxQuantizationStats
quantizers.QuantizationOptions
quantizers.QuantizedArray
quantizers.UniformQuantizationParameters
quantizers.UniformQuantizer
p_error_search.BinarySearch
base.BaseClassifier
base.BaseEstimator
base.BaseTreeClassifierMixin
base.BaseTreeEstimatorMixin
base.BaseTreeRegressorMixin
base.QuantizedTorchEstimatorMixin
base.SklearnKNeighborsClassifierMixin
base.SklearnKNeighborsMixin
base.SklearnLinearClassifierMixin
base.SklearnLinearModelMixin
base.SklearnLinearRegressorMixin
base.SklearnSGDClassifierMixin
base.SklearnSGDRegressorMixin
glm.GammaRegressor
glm.PoissonRegressor
glm.TweedieRegressor
linear_model.ElasticNet
linear_model.Lasso
linear_model.LinearRegression
linear_model.LogisticRegression
linear_model.Ridge
linear_model.SGDClassifier
linear_model.SGDRegressor
neighbors.KNeighborsClassifier
qnn.NeuralNetClassifier
qnn.NeuralNetRegressor
qnn_module.SparseQuantNeuralNetwork
rf.RandomForestClassifier
rf.RandomForestRegressor
svm.LinearSVC
svm.LinearSVR
tree.DecisionTreeClassifier
tree.DecisionTreeRegressor
xgb.XGBClassifier
xgb.XGBRegressor
hybrid_model.HybridFHEMode
hybrid_model.HybridFHEModel
hybrid_model.HybridFHEModelServer
hybrid_model.LoggerStub
hybrid_model.RemoteModule
numpy_module.NumpyModule
check_inputs.check_X_y_and_assert
check_inputs.check_X_y_and_assert_multi_output
check_inputs.check_array_and_assert
custom_assert.assert_false
custom_assert.assert_not_reached
custom_assert.assert_true
decoder.object_hook
dumpers.dump
dumpers.dumps
encoder.dump_name_and_value
loaders.load
loaders.loads
utils.all_values_are_floats
utils.all_values_are_integers
utils.all_values_are_of_dtype
utils.array_allclose_and_same_shape
utils.check_dtype_and_cast
utils.check_there_is_no_p_error_options_in_configuration
utils.compute_bits_precision
utils.generate_proxy_function
utils.get_model_class
utils.get_model_name
utils.get_onnx_opset_version
utils.is_brevitas_model
utils.is_classifier_or_partial_classifier
utils.is_model_class_in_a_list
utils.is_pandas_dataframe
utils.is_pandas_series
utils.is_pandas_type
utils.is_regressor_or_partial_regressor
utils.manage_parameters_for_pbs_errors
utils.process_rounding_threshold_bits
utils.replace_invalid_arg_name_chars
utils.to_tuple
fhe_client_server.check_concrete_versions
convert.fuse_matmul_bias_to_gemm
convert.get_equivalent_numpy_forward_from_onnx
convert.get_equivalent_numpy_forward_from_onnx_tree
convert.get_equivalent_numpy_forward_from_torch
convert.preprocess_onnx_model
onnx_impl_utils.compute_conv_output_dims
onnx_impl_utils.compute_onnx_pool_padding
onnx_impl_utils.numpy_onnx_pad
onnx_impl_utils.onnx_avgpool_compute_norm_const
onnx_impl_utils.rounded_comparison
onnx_model_manipulations.clean_graph_after_node_op_type
onnx_model_manipulations.clean_graph_at_node_op_type
onnx_model_manipulations.keep_following_outputs_discard_others
onnx_model_manipulations.remove_identity_nodes
onnx_model_manipulations.remove_node_types
onnx_model_manipulations.remove_unused_constant_nodes
onnx_model_manipulations.simplify_onnx_model
onnx_utils.execute_onnx_with_numpy
onnx_utils.execute_onnx_with_numpy_trees
onnx_utils.get_attribute
onnx_utils.get_op_type
onnx_utils.remove_initializer_from_input
ops_impl.cast_to_float
ops_impl.numpy_abs
ops_impl.numpy_acos
ops_impl.numpy_acosh
ops_impl.numpy_add
ops_impl.numpy_asin
ops_impl.numpy_asinh
ops_impl.numpy_atan
ops_impl.numpy_atanh
ops_impl.numpy_avgpool
ops_impl.numpy_batchnorm
ops_impl.numpy_cast
ops_impl.numpy_celu
ops_impl.numpy_concatenate
ops_impl.numpy_constant
ops_impl.numpy_conv
ops_impl.numpy_cos
ops_impl.numpy_cosh
ops_impl.numpy_div
ops_impl.numpy_elu
ops_impl.numpy_equal
ops_impl.numpy_equal_float
ops_impl.numpy_erf
ops_impl.numpy_exp
ops_impl.numpy_flatten
ops_impl.numpy_floor
ops_impl.numpy_gemm
ops_impl.numpy_greater
ops_impl.numpy_greater_float
ops_impl.numpy_greater_or_equal
ops_impl.numpy_greater_or_equal_float
ops_impl.numpy_hardsigmoid
ops_impl.numpy_hardswish
ops_impl.numpy_identity
ops_impl.numpy_leakyrelu
ops_impl.numpy_less
ops_impl.numpy_less_float
ops_impl.numpy_less_or_equal
ops_impl.numpy_less_or_equal_float
ops_impl.numpy_log
ops_impl.numpy_matmul
ops_impl.numpy_max
ops_impl.numpy_maxpool
ops_impl.numpy_min
ops_impl.numpy_mul
ops_impl.numpy_neg
ops_impl.numpy_not
ops_impl.numpy_not_float
ops_impl.numpy_or
ops_impl.numpy_or_float
ops_impl.numpy_pow
ops_impl.numpy_relu
ops_impl.numpy_round
ops_impl.numpy_selu
ops_impl.numpy_sigmoid
ops_impl.numpy_sign
ops_impl.numpy_sin
ops_impl.numpy_sinh
ops_impl.numpy_softmax
ops_impl.numpy_softplus
ops_impl.numpy_sub
ops_impl.numpy_tan
ops_impl.numpy_tanh
ops_impl.numpy_thresholdedrelu
ops_impl.numpy_transpose
ops_impl.numpy_unfold
ops_impl.numpy_where
ops_impl.numpy_where_body
ops_impl.onnx_func_raw_args
ops_impl.rounded_numpy_equal_for_trees
ops_impl.rounded_numpy_less_for_trees
ops_impl.rounded_numpy_less_or_equal_for_trees
pandas.load_encrypted_dataframe
pandas.merge
utils.check_serialization
utils.data_calibration_processing
utils.get_random_samples
utils.get_sklearn_all_models_and_datasets
utils.get_sklearn_linear_models_and_datasets
utils.get_sklearn_neighbors_models_and_datasets
utils.get_sklearn_neural_net_models_and_datasets
utils.get_sklearn_tree_models_and_datasets
utils.instantiate_model_generic
utils.load_torch_model
utils.pandas_dataframe_are_equal
utils.values_are_equal
post_training.get_n_bits_dict
quantizers.fill_from_kwargs
p_error_search.compile_and_simulated_fhe_inference
tree_to_numpy.add_transpose_after_last_node
tree_to_numpy.assert_add_node_and_constant_in_xgboost_regressor_graph
tree_to_numpy.get_onnx_model
tree_to_numpy.onnx_fp32_model_to_quantized_model
tree_to_numpy.preprocess_tree_predictions
tree_to_numpy.tree_onnx_graph_preprocessing
tree_to_numpy.tree_to_numpy
tree_to_numpy.tree_values_preprocessing
tree_to_numpy.workaround_squeeze_node_xgboost
compile.build_quantized_module
compile.compile_brevitas_qat_model
compile.compile_onnx_model
compile.compile_torch_model
compile.convert_torch_tensor_or_numpy_array_to_numpy_array
compile.has_any_qnn_layers
hybrid_model.convert_conv1d_to_linear
hybrid_model.tuple_to_underscore_str
hybrid_model.underscore_str_to_tuple

Support new ONNX node

Concrete ML supports a wide range of models through the integration of ONNX nodes. If a specific ONNX node is missing, developers need to add support for it.

Operator Implementation

Floating-point Implementation

Operator Mapping

Quantized Operator

There exist two types of quantized operators:

  • Univariate Non-Linear Operators: Such an operator applies a transformation to every element of the input without changing its shape. Sigmoid, Tanh, and ReLU are examples of such operations. The sigmoid in this file is supported as follows (a sketch showing how a new univariate operator can follow the same pattern appears after this list):

class QuantizedSigmoid(QuantizedOp):
    """Quantized sigmoid op."""

    _impl_for_op_named: str = "Sigmoid"
  • Linear Layers: Linear layers like Gemm and Conv require specific implementations for integer arithmetic. Refer to the QuantizedGemm and QuantizedConv implementations for reference.
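
For example, a new univariate operator can often be supported by sub-classing QuantizedOp and pointing it to the corresponding ONNX operator name, following the same pattern as QuantizedSigmoid above. The operator name below is purely hypothetical, and its floating-point implementation is assumed to already exist and be mapped:

# Hypothetical example of a new univariate quantized op. "MyOp" is an
# illustrative name; its numpy implementation is assumed to already be
# registered in the ONNX_OPS_TO_NUMPY_IMPL mapping.
class QuantizedMyOp(QuantizedOp):
    """Quantized MyOp op."""

    _impl_for_op_named: str = "MyOp"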

Adding Tests

Proper testing is essential to ensure the correctness of the new ONNX node support.

There are many locations where tests can be added:

Update Documentation

Finally, update the documentation to reflect the newly supported ONNX node.

Installation

This document provides guides on how to install Concrete ML using PyPi or Docker.

Prerequisite

Before you start, determine your environment:

  • Hardware platform

  • Operating System (OS) version

  • Python version

OS/HW support

Depending on your OS/HW, Concrete ML may be installed with Docker or with pip:

OS / HW | Available on Docker | Available on pip
Linux | Yes | Yes
Windows | Yes | No
Windows Subsystem for Linux | Yes | Yes
macOS 11+ (Intel) | Yes | Yes
macOS 11+ (Apple Silicon: M1, M2, etc.) | Coming soon | Yes

Python support

  • Version: In the current release, Concrete ML supports only Python versions 3.8, 3.9, and 3.10.

  • Linux requirement: The Concrete ML Python package requires glibc >= 2.28. On Linux, you can check your glibc version by running ldd --version.

Most of these limits are shared with the rest of the Concrete stack (namely Concrete Python). Support for more platforms will be added in the future.
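
To check whether your environment meets these requirements, a quick verification such as the following can be used (a minimal sketch):

import platform
import sys

# Expected: Python 3.8, 3.9 or 3.10
print(sys.version_info)
# On Linux, the reported glibc version should be >= 2.28
print(platform.libc_ver())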

Installation using PyPi

Requirements

Installing Concrete ML using PyPi requires a Linux-based OS or macOS (both x86 and Apple Silicon CPUs are supported).

If you need to install on Windows, use Docker or WSL. On WSL, Concrete ML will work as long as the package is not installed in the `/mnt/c/` directory, which corresponds to the host OS filesystem.

Installation

To install Concrete ML from PyPi, run the following:

pip install -U pip wheel setuptools
pip install concrete-ml

This will automatically install all dependencies, notably Concrete.
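
To quickly verify that the installation succeeded, importing one of the built-in models should work without error (a minimal sanity check):

# The import succeeds only if concrete-ml and its dependencies are installed
from concrete.ml.sklearn import LogisticRegression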

Installation using Docker

You can install Concrete ML using Docker by either pulling the latest image or a specific version:

docker pull zamafhe/concrete-ml:latest
# or
docker pull zamafhe/concrete-ml:v0.4.0
# Without local volume:
docker run --rm -it -p 8888:8888 zamafhe/concrete-ml

# With local volume to save notebooks on host:
docker run --rm -it -p 8888:8888 -v /host/path:/data zamafhe/concrete-ml

This will launch a Concrete ML enabled Jupyter server in Docker that can be accessed directly from a browser.

Alternatively, you can launch a shell in Docker, with or without volumes:

docker run --rm -it zamafhe/concrete-ml /bin/bash

Zama 5-Question Developer Survey

Welcome

Concrete ML is an open-source, privacy-preserving, machine learning framework based on Fully Homomorphic Encryption (FHE).

Get started

Learn the basics of Concrete ML, set it up, and make it run with ease.

Build with Concrete ML

Start building with Concrete ML by exploring its core features, discovering essential guides, and learning more with user-friendly tutorials.

Explore more

Access additional resources and join the Zama community.

References & Explanations

Refer to the API, review product architecture, and access additional resources for in-depth explanations while working with Concrete ML.

Support channels

Ask technical questions and discuss with the community. Our team of experts usually answers within 24 hours on working days.

Developers

Collaborate with us to advance the FHE space and drive innovation together.


Zama 5-Question Developer Survey

Encrypted dataframe

This document introduces how to construct and perform operations on encrypted DataFrames using Fully Homomorphic Encryption (FHE).

Introduction

Encrypted DataFrames are a storage format for encrypted tabular data. You can exchange encrypted DataFrames with third parties to collaborate without privacy risks. Potential applications include:

  • Encrypted storage of tabular data-sets

  • Joint data analysis efforts between multiple parties

  • Data preparation steps before machine learning tasks, such as inference or training

  • Secure outsourcing of data analysis to untrusted third parties

Encryption and decryption

To encrypt a pandas DataFrame, construct a ClientEngine that manages keys and then call the encrypt_from_pandas function:

from concrete.ml.pandas import ClientEngine
from io import StringIO
import pandas

data_left = """index,total_bill,tip,sex,smoker
1,12.54,2.5,Male,No
2,11.17,1.5,Female,No
3,20.29,2.75,Female,No
"""

# Load your pandas DataFrame
df = pandas.read_csv(StringIO(data_left))

# Obtain client object
client = ClientEngine(keys_path="my_keys")

# Encrypt the DataFrame
df_encrypted = client.encrypt_from_pandas(df)

# Decrypt the DataFrame to produce a pandas DataFrame
df_decrypted = client.decrypt_to_pandas(df_encrypted)

Supported data types and schema definition

  • Integer: Integers are supported within a specific range determined by the encryption scheme's quantization parameters. The default range is 1 to 15, with 0 reserved for NaN values. Values outside this range will cause a ValueError to be raised during the pre-processing stage.

  • Quantized Float: Floating-point numbers are quantized to integers within the supported range. This is achieved by computing a scale and zero point for each column, which are used to map the floating-point numbers to the quantized integer space.

  • String Enum: String columns are mapped to integers starting from 1. This mapping is stored and later used for de-quantization. If the number of unique strings exceeds 15, a ValueError is raised.

Using a user-defined schema

Before encryption, the data is pre-processed. For example, string enums first need to be mapped to integers, and floating-point values must be quantized. By default, this mapping is done automatically. However, when two different clients encrypt their data separately, the automatic mappings may differ, for example because some values are missing in one of the clients' DataFrames. In that case, the column cannot be selected when merging encrypted DataFrames.

The encrypted DataFrame supports user-defined mappings. These schemas are defined as a dictionary where keys represent column names and values contain meta-data about the column. Supported column meta-data are:

  • string columns: mapping between string values and integers.

  • float columns: the min/max range that the column values lie in.

schema = {
    "string_column": {"abc": 1, "bcd": 2 },
    "float_column": {"min": 0.1, "max": 0.5 }
}
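
Both clients can then pass the same user-defined schema when encrypting, so that the resulting encrypted DataFrames use compatible mappings. A minimal sketch, assuming encrypt_from_pandas accepts a schema keyword argument:

# The "schema" keyword argument is an assumption made for illustration
df_encrypted = client.encrypt_from_pandas(df, schema=schema)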

Supported operations

The encrypted DataFrame is designed to support a subset of the operations available for pandas DataFrames. For now, only the merge operation is supported; more operations will be added in future releases.

Merge operation

The merge operation allows you to left or right join two DataFrames.

[!NOTE] The merge operation on Encrypted DataFrames can be securely performed on a third-party server, meaning that the server can execute the merge without ever having access to the unencrypted data. The server only requires the encrypted DataFrames.

df_right = """index,day,time,size
2,Thur,Lunch,2
5,Sat,Dinner,3
9,Sun,Dinner,2"""

# Encrypt the DataFrame
df_encrypted2 = client.encrypt_from_pandas(pandas.read_csv(StringIO(df_right)))

df_encrypted_merged = df_encrypted.merge(df_encrypted2, how="left", on="index")

Serialization

To save or load an encrypted DataFrame from a file, use the following commands:

from concrete.ml.pandas import load_encrypted_dataframe

# Save
df_encrypted_merged.save("df_encrypted_merged")

# Load
df_encrypted_merged = load_encrypted_dataframe("df_encrypted_merged")

# Decrypt the DataFrame
df_decrypted = client.decrypt_to_pandas(df_encrypted)

Error handling

During the pre-processing and post-processing stages, a ValueError can be raised in the following situations:

  • A column contains values outside the allowed range for integers

  • A string column contains too many unique values

  • A data type is not supported by Concrete ML

  • A data type is not supported by the attempted operation

Example workflow

Current limitations

While this API offers a new secure way to work on remotely stored and encrypted data, it has some strong limitations at the moment:

  • Precision of Values: The precision for numerical values is limited to 4 bits.

  • Supported Operations: The merge operation is the only one available.

  • Index Handling: Index values are not preserved; users should move any relevant data from the index to a dedicated new column before encrypting.

  • Integer Range: The range of integers that can be encrypted is between 1 and 15.

  • Uniqueness for merge: The merge operation requires that the columns to merge on contain unique values. Currently this means that data-frames are limited to 15 rows.

  • Metadata Security: Column names and the mapping of strings to integers are not encrypted and are sent to the server in clear text.

Neural networks

This document introduces the simple built-in neural networks models that Concrete ML provides with a scikit-learn interface through the NeuralNetClassifier and NeuralNetRegressor classes.

Supported models

Concrete ML
scikit-learn

Concrete ML models are multi-layer, fully-connected networks with customizable activation functions and a configurable number of neurons in each layer. This approach is similar to what is available in scikit-learn when using the MLPClassifier/MLPRegressor classes. The built-in models train easily with a single call to .fit(), which will automatically quantize weights and activations. These models use Quantization Aware Training, allowing good performance for low precision (down to 2-3 bits) weights and activations.

Example

To create an instance of a Fully Connected Neural Network (FCNN), you need to instantiate one of the NeuralNetClassifier or NeuralNetRegressor classes and configure a number of parameters that are passed to their constructor.

Note that some parameters need to be prefixed by module__, while others do not. Parameters related to the model, i.e., the underlying nn.Module, must have the prefix, while parameters related to training options do not require it.

from concrete.ml.sklearn import NeuralNetClassifier
import torch.nn as nn

n_inputs = 10
n_outputs = 2
params = {
    "module__n_layers": 2,
    "max_epochs": 10,
}

concrete_classifier = NeuralNetClassifier(**params)
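
The classifier is then trained on clear data, compiled, and used for encrypted inference following the usual estimator flow. A minimal sketch, assuming X_train, y_train and X_test are numpy arrays with float32 features and int64 labels (as required by skorch):

# Train on clear data, compile, then predict on encrypted data
concrete_classifier.fit(X_train, y_train)
concrete_classifier.compile(X_train)
y_pred_fhe = concrete_classifier.predict(X_test, fhe="execute")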

The following figure shows the Concrete ML neural network, trained with Quantization Aware Training in an FHE-compatible configuration, compared to the floating-point equivalent trained with scikit-learn.

Architecture parameters

  • module__n_layers: number of layers in the FCNN.

    • This parameter must be at least 1. Note that this is the total number of layers; for an NN model with a single hidden layer, set module__n_layers=2.

  • module__activation_function: can be one of the Torch activations (such as nn.ReLU)

    • Neural networks with nn.ReLU activation benefit from specific optimizations that make them around 10x faster than networks with other activation functions.

Quantization parameters

  • n_w_bits (default 3): number of bits for weights

  • n_a_bits (default 3): number of bits for activations and inputs

  • n_accum_bits: maximum accumulator bit-width that is desired

  • power_of_two_scaling (default True): forces quantization scales to be powers-of-two

    • When coupled with the ReLU activation, this strongly optimizes the FHE inference time.

Training parameters (from skorch)

  • max_epochs (default 10): The number of epochs to train the network

  • verbose (default: False): Whether to log loss/metrics during training

  • lr (default 0.001): Learning rate

Advanced parameters

  • module__n_hidden_neurons_multiplier (default 4): controls the number of hidden neurons.

    • The number of hidden neurons is set automatically, proportional to the dimensionality of the input; this parameter controls the proportionality factor. The default value gives good accuracy while avoiding accumulator overflow.

Class weights

You can assign weights to each class for use during training. Note that this must be supported by the underlying PyTorch loss function.

    from sklearn.utils.class_weight import compute_class_weight
    # "classes" holds the unique class labels, e.g. numpy.unique(y_train)
    params["criterion__weight"] = compute_class_weight("balanced", classes=classes, y=y_train)

Overflow errors

The n_accum_bits parameter influences training accuracy by controlling the number of non-zero neurons allowed in each layer. You can increase n_accum_bits to improve accuracy, but you must consider the precision limitations to avoid an overflow in the accumulator. The default value is a balanced choice that generally avoids overflow, but you may need to adjust it to reduce the network breadth if you encounter overflow errors.

The number of neurons in intermediate layers is controlled by the n_hidden_neurons_multiplier parameter. A value of 1 makes intermediate layers have the same number of neurons as the input data dimensions.

Using Torch

  • Post-training Quantization: This mode allows a vanilla PyTorch model to be compiled. However, when quantizing weights & activations to fewer than 7 bits, the accuracy can decrease significantly. On the other hand, depending on the model size, quantizing with 6-8 bits can be incompatible with FHE constraints. To use this mode, compile models with compile_torch_model.

Quantization-aware training

The following example uses a simple QAT PyTorch model that implements a fully connected neural network with two hidden layers. Due to its small size, making this model respect FHE constraints is relatively easy. To use QAT, Brevitas QuantIdentity nodes must be inserted in the PyTorch model, including one that quantizes the input of the forward function.

import brevitas.nn as qnn
import torch.nn as nn
import torch

N_FEAT = 12
n_bits = 3

class QATSimpleNet(nn.Module):
    def __init__(self, n_hidden):
        super().__init__()

        self.quant_inp = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
        self.fc1 = qnn.QuantLinear(N_FEAT, n_hidden, True, weight_bit_width=n_bits, bias_quant=None)
        self.quant2 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
        self.fc2 = qnn.QuantLinear(n_hidden, n_hidden, True, weight_bit_width=n_bits, bias_quant=None)
        self.quant3 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
        self.fc3 = qnn.QuantLinear(n_hidden, 2, True, weight_bit_width=n_bits, bias_quant=None)

    def forward(self, x):
        x = self.quant_inp(x)
        x = self.quant2(torch.relu(self.fc1(x)))
        x = self.quant3(torch.relu(self.fc2(x)))
        x = self.fc3(x)
        return x
from concrete.ml.torch.compile import compile_brevitas_qat_model
import numpy

torch_input = torch.randn(100, N_FEAT)
torch_model = QATSimpleNet(30)
quantized_module = compile_brevitas_qat_model(
    torch_model, # our model
    torch_input, # a representative input-set to be used for both quantization and compilation
    rounding_threshold_bits={"n_bits": 6, "method": "approximate"}
)

Post-training quantization

The following example uses a simple PyTorch model that implements a fully connected neural network with two hidden layers. The model is compiled to use FHE using compile_torch_model.

import torch.nn as nn
import torch

N_FEAT = 12
n_bits = 6

class PTQSimpleNet(nn.Module):
    def __init__(self, n_hidden):
        super().__init__()

        self.fc1 = nn.Linear(N_FEAT, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_hidden)
        self.fc3 = nn.Linear(n_hidden, 2)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

from concrete.ml.torch.compile import compile_torch_model
import numpy

torch_input = torch.randn(100, N_FEAT)
torch_model = PTQSimpleNet(5)
quantized_module = compile_torch_model(
    torch_model, # our model
    torch_input, # a representative input-set to be used for both quantization and compilation
    n_bits=6,
    rounding_threshold_bits={"n_bits": 6, "method": "approximate"}
)

Configuring quantization parameters

With QAT (the PyTorch/Brevitas models created following the example above), you need to configure quantization parameters such as bit_width (activation bit-width) and weight_bit_width. When using this mode, set n_bits=None in the compile_brevitas_qat_model.

With PTQ, you need to set the n_bits value in the compile_torch_model function and must manually determine the trade-off between accuracy, FHE compatibility, and latency.

The quantization parameters, along with the number of neurons on each layer, will determine the accumulator bit-width of the network. Larger accumulator bit-widths result in higher accuracy but slower FHE inference time.

Running encrypted inference

The model can now perform encrypted inference.

x_test = numpy.array([numpy.random.randn(N_FEAT)])

y_pred = quantized_module.forward(x_test, fhe="execute")

In this example, the input values x_test and the predicted values y_pred are floating points. The quantization (resp. de-quantization) step is done in the clear within the forward method, before (resp. after) any FHE computations.

Simulated FHE Inference in the clear

  • quantized_module.forward(quantized_x, fhe="simulate"): simulates FHE execution taking into account Table Lookup errors. De-quantization must be done in a second step as for actual FHE execution. Simulation takes into account the p_error/global_p_error parameters

  • quantized_module.forward(quantized_x, fhe="disable"): computes predictions in the clear on quantized data, and then de-quantize the result. The return value of this function contains the de-quantized (float) output of running the model in the clear. Calling this function on clear data is useful when debugging, but this does not perform actual FHE simulation.

Supported operators and activations

Concrete ML supports a variety of PyTorch operators that can be used to build fully connected or convolutional neural networks, with normalization and activation layers. Moreover, many element-wise operators are supported.

Operators

Univariate operators

Shape modifying operators

Tensor operators

Multi-variate operators: encrypted input and unencrypted constants

Concrete ML also supports some of their QAT equivalents from Brevitas.

  • brevitas.nn.QuantLinear

  • brevitas.nn.QuantConv1d

  • brevitas.nn.QuantConv2d

Multi-variate operators: encrypted+unencrypted or encrypted+encrypted inputs

Quantizers

  • brevitas.nn.QuantIdentity

Activation functions

The equivalent versions from torch.functional are also supported.

Zama 5-Question Developer Survey

Optimizing inference

Neural networks pose unique challenges with regards to encrypted inference. Each neuron in a network applies an activation function that requires a PBS operation. The latency of a single PBS depends on the bit-width of the input of the PBS.

Several approaches can be used to reduce the overall latency of a neural network.

Circuit bit-width optimization

Structured pruning

Rounded activations and quantizers

TLU error tolerance adjustment

Deep learning examples

FHE constraints considerations

Some examples constrain accumulators to 7-8 bits, which can be sufficient for simple data-sets. Up to 16-bit accumulators can be used, but this introduces a slowdown of 4-5x compared to 8-bit accumulators.

List of Examples

1. Step-by-step guide to building a custom NN

This shows how to use Quantization Aware Training and pruning when starting out from a classical PyTorch network. This example uses a simple data-set and a small NN, which achieves good accuracy with low accumulator size.

Hybrid models

FHE enables cloud applications to process private user data without running the risk of data leaks. Furthermore, deploying ML models in the cloud is advantageous as it eases model updates, allows scaling to large numbers of users by using large amounts of compute power, and protects model IP by keeping the model on a trusted server instead of the client device.

However, not all applications can be easily converted to FHE computation and the computation cost of FHE may make a full conversion exceed latency requirements.

Hybrid models provide a balance between on-device deployment and cloud-based deployment. This approach entails executing parts of the model directly on the client side, while other parts are securely processed with FHE on the server side. Concrete ML facilitates the hybrid deployment of various neural network models, including MLP (multilayer perceptron), CNN (convolutional neural network), and Large Language Models.

If model IP protection is important, care must be taken in choosing the parts of a model to be executed on the cloud. Some black-box model stealing attacks rely on knowledge distillation or on differential methods. As a general rule, the difficulty of stealing a machine learning model is proportional to its size, in terms of the number of parameters and model depth.

Compilation

To use hybrid model deployment, the first step is to define what part of the PyTorch neural network model must be executed in FHE. The model part must be a nn.Module and is identified by its key in the original model's .named_modules().
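
For instance, the candidate keys of a small PyTorch model can be listed with plain PyTorch (a minimal sketch, independent of any specific Concrete ML API):

import torch.nn as nn

# A toy model: the names returned by named_modules() identify each sub-module
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))
print([name for name, _ in model.named_modules()])  # ['', '0', '1', '2']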

Server Side Deployment

Client Side

A client application that deploys a model with hybrid deployment can be developed in a very similar manner to on-premise deployment: the model is loaded normally with PyTorch, but an extra step is required to specify the remote endpoint and the model parts that are to be executed remotely.

When the client application is ready to make inference requests to the server, it must set the operation mode of the HybridFHEModel instance to HybridFHEMode.REMOTE:

When performing inference with the HybridFHEModel instance, hybrid_model, only the regular forward method is called, as if the model was fully deployed locally:

When calling forward, the HybridFHEModel handles, for each model part that is deployed remotely, all the necessary intermediate steps: quantizing the data, encrypting it, sending the request to the server using the requests Python module, and decrypting and de-quantizing the result.

IND-CPA
fhe.Exactness.APPROXIMATE
fhe.Exactness.APPROXIMATE
paragraph above

The file is responsible for implementing the computation of ONNX operators using floating-point arithmetic. The implementation should mirror the behavior of the corresponding ONNX operator precisely. This includes adhering to the expected inputs, outputs, and operational semantics.

Refer to the to grasp the expected behavior, inputs and outputs of the operator.

After implementing the operator in , you need to import it into and map it within the ONNX_OPS_TO_NUMPY_IMPL dictionary. This mapping is crucial for the framework to recognize and utilize the new operator.

Quantized operators are defined in and are used to handle integer arithmetic. Their implementation is required for the new ONNX to be executed in FHE.

: Tests the implementation of the ONNX node in floating points.

: Tests the implementation of the ONNX node in integer arithmetic.

Optional: : Tests the implementation of a specific torch model that contains the new ONNX operator. The model needs to be added in .

Kaggle installation: Concrete ML can be installed on Kaggle () and on Google Colab.

If you encounter any issue during installation on Apple Silicon mac, please visit this .

You can use the image with Docker volumes, . Use the following command:

We want to hear from you! Take 1 minute to share your thoughts and help us enhance our documentation and libraries. 👉 to participate.

We want to hear from you! Take 1 minute to share your thoughts and help us enhance our documentation and libraries. 👉 to participate.

You can serialize encrypted DataFrame objects to a file format for storage or transfer. When serialized, they contain the encrypted data and necessary to perform computations.

[!NOTE] Serialized DataFrames do not contain any . The DataFrames can be exchanged with any third-party without any risk.

An example workflow where two clients encrypt two DataFrame objects, perform a merge operation on the server side, and then decrypt the results is available in the notebook .

The neural network models are implemented with , which provides a scikit-learn-like interface to Torch models (more ).

While NeuralNetClassifier and NeuralNetRegressor provide scikit-learn-like models, their architecture is somewhat restricted to make training easy and robust. If you need more advanced models, you can convert custom neural networks as described in the .

Good quantization parameter values are critical to make models . Weights and activations should be quantized to low precision (e.g., 2-4 bits). The sparsity of the network can be tuned to avoid accumulator overflow.

Using nn.ReLU as the activation function benefits from an optimization where . This results in much faster inference times in FHE, thanks to a TFHE primitive that performs fast division by powers of two.

The shows the behavior of built-in neural networks on several synthetic data-sets.

See the full list of Torch activations .

By default, this is unbounded, which, for weight and activation bit-width settings, . When used, the implementation will attempt to keep accumulators under this bit-width through (for example, setting some weights to zero).

See this in the quantization documentation for more details.

You can find other parameters from skorch in the .

See the and sections for more info.

In addition to the built-in models, Concrete ML supports generic machine learning models implemented with Torch, or .

There are two approaches to build :

requires using custom layers, but can quantize weights and activations to low bit-widths. Concrete ML works with , a library providing QAT support for PyTorch. To use this mode, compile models using compile_brevitas_qat_model

Both approaches require the rounding_threshold_bits parameter to be set accordingly. The best values for this parameter need to be determined through experimentation. A good initial value to try is 6. See for more details.

See the for an explanation of some error messages that the compilation function may raise.

Once the model is trained, calling the from Concrete ML will automatically perform conversion and compilation of a QAT network. Here, 3-bit quantization is used for both the weights and activations. The compile_brevitas_qat_model function automatically identifies the number of quantization bits used in the Brevitas model.

If QuantIdentity layers are missing for any input or intermediate value, the compile function will raise an error. See the for an explanation.

You can perform the inference on clear data in order to evaluate the impact of quantization and of FHE computation on the accuracy of the model. See for more details. Two approaches exist:

FHE simulation allows to measure the impact of the Table Lookup error on the model accuracy. The Table Lookup error can be adjusted using p_error/global_p_error, as described in the section.

-- for casting to dtype

-- partial support

We want to hear from you! Take 1 minute to share your thoughts and help us enhance our documentation and libraries. 👉 to participate.

and introduce specific hyper-parameters that influence the accumulator sizes. It is possible to choose quantization and pruning configurations that reduce the accumulator size. A trade-off between latency and accuracy can be obtained by varying these hyper-parameters as described in the .

While un-structured pruning is used to ensure the accumulator bit-width stays low, can eliminate entire neurons from the network. Many neural networks are over-parametrized (since this enables easier training) and some neurons can be removed. Structured pruning, applied to a trained network as a fine-tuning step, can be applied to built-in neural networks using the helper function as shown in . To apply structured pruning to custom models, it is recommended to use the package.

Reducing the bit-width of the inputs to the Table Lookup (TLU) operations is a major source of improvements in the latency. Post-training, it is possible to leverage some properties of the fused activation and quantization functions expressed in the TLUs to further reduce the accumulator. This is achieved through the rounded PBS feature as described in the . Adjusting the rounding amount, relative to the initial accumulator size, can bring large improvements in latency while maintaining accuracy.

Finally, the TFHE scheme exposes a TLU error tolerance parameter that has an impact on crypto-system parameters that influence latency. A higher tolerance of TLU off-by-one errors results in faster computations but may reduce accuracy. One can think of the error of obtaining T[x] as a Gaussian distribution centered on T[x]: T[x] is obtained with probability 1 - p_error, while T[x-1] and T[x+1] are obtained with much lower probability, etc. In Deep NNs, these types of errors can be tolerated up to some point. See the and more specifically the usage example of .

These examples illustrate the basic usage of Concrete ML to build various types of neural networks. They use simple data-sets, focusing on the syntax and usage of Concrete ML. For examples showing how to train high-accuracy models on more complex data-sets, see the section.

The examples listed here make use of to perform evaluation over large test sets. Since FHE execution can be slow, only a few FHE executions can be performed. The of Concrete ML ensure that accuracy measured with simulation is the same as that which will be obtained during FHE execution.

2. Custom convolutional NN on the data-set

Following the , this notebook implements a Quantization Aware Training convolutional neural network on the MNIST data-set. It uses 3-bit weights and activations, giving a 7-bit accumulator.

The hybrid model deployment API provides an easy way to integrate the into neural network style models that are compiled with or .

The function serializes the FHE circuits corresponding to the various parts of the model that were chosen to be moved server-side. It also saves the client-side model, removing the weights of the layers that are transferred server-side. Furthermore it saves all necessary information required to serve these sub-models with FHE, using the class.

The class should be used to create a server application that creates end-points to serve these sub-models:

For more information about serving FHE models, see the .

Next, the client application must obtain the parameters necessary to encrypt and quantize data, as detailed in the .

ops_impl.py
ONNX documentation
ops_impl.py
onnx_utils.py
quantized_ops.py
test_onnx_ops_impl.py
test_quantized_ops.py
test_compile_torch.py
torch_models.py
see question on community for more details
troubleshooting guide on community
see the Docker documentation here
Click here
Security and correctness
API
Quantization
Pruning
Compilation
Advanced features
Project architecture
Community channels
Contribute to Concrete ML
Check the latest release note
Request a feature
Report a bug
Click here
encrypted_pandas.ipynb
FHE-friendly models documentation
Classifier Comparison notebook
skorch documentation
pruning
quantization
exported as ONNX graphs
compile_brevitas_qat_model
torch.nn.identity
torch.clip
torch.clamp
torch.round
torch.floor
torch.min
torch.max
torch.abs
torch.neg
torch.sign
torch.logical_or, torch.Tensor operator ||
torch.logical_not
torch.gt, torch.greater
torch.ge, torch.greater_equal
torch.lt, torch.less
torch.le, torch.less_equal
torch.eq
torch.where
torch.exp
torch.log
torch.pow
torch.sum
torch.mul, torch.Tensor operator *
torch.div, torch.Tensor operator /
torch.nn.BatchNorm2d
torch.nn.BatchNorm3d
torch.erf, torch.special.erf
torch.nn.functional.pad
torch.reshape
torch.Tensor.view
torch.flatten
torch.unsqueeze
torch.squeeze
torch.transpose
torch.concat, torch.cat
torch.nn.Unfold
torch.Tensor.expand
torch.Tensor.to
torch.nn.Linear
torch.conv1d, torch.nn.Conv1D
torch.conv2d, torch.nn.Conv2D
torch.nn.AvgPool2d
torch.nn.MaxPool2d
torch.add, torch.Tensor operator +
torch.sub, torch.Tensor operator -
torch.matmul
torch.nn.CELU
torch.nn.ELU
torch.nn.GELU
torch.nn.Hardshrink
torch.nn.HardSigmoid
torch.nn.Hardswish
torch.nn.HardTanh
torch.nn.LeakyReLU
torch.nn.LogSigmoid
torch.nn.Mish
torch.nn.PReLU
torch.nn.ReLU6
torch.nn.ReLU
torch.nn.SELU
torch.nn.Sigmoid
torch.nn.SiLU
torch.nn.Softplus
torch.nn.Softshrink
torch.nn.Softsign
torch.nn.Tanh
torch.nn.Tanhshrink
torch.nn.Threshold
Click here
public evaluation keys
private encryption keys
FHE-compatible deep networks
skorch
here
Quantization Aware Training (QAT)
Brevitas
respect FHE constraints
as described below
pruning
may make the trained networks fail in compilation
quantization uses powers-of-two scales
section
here
common compilation errors page
common compilation errors page
this section
import numpy as np
import os
import torch

from pathlib import Path
from torch import nn

from concrete.ml.torch.hybrid_model import HybridFHEMode, HybridFHEModel, tuple_to_underscore_str
from concrete.ml.deployment import FHEModelServer


class FCSmall(nn.Module):
    """Torch model for the tests."""

    def __init__(self, dim):
        super().__init__()
        self.seq = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.seq(x)

model = FCSmall(10)
model_name = "FCSmall"
submodule_name = "seq.0"

inputs = torch.Tensor(np.random.uniform(size=(10, 10)))
# Prints ['', 'seq', 'seq.0', 'seq.1', 'seq.2']
print([k for (k, _) in model.named_modules()])

# Create a hybrid model
hybrid_model = HybridFHEModel(model, [submodule_name])
hybrid_model.compile_model(
    inputs,
    n_bits=8,
)


models_dir = Path(os.path.abspath('')) / "compiled_models"
models_dir.mkdir(exist_ok=True)
model_dir = models_dir / model_name
hybrid_model.save_and_clear_private_info(model_dir, via_mlir=True)
input_shape_subdir = tuple_to_underscore_str((1,) + inputs.shape[1:])
MODULES = {model_name: {submodule_name: {"path": model_dir / submodule_name / input_shape_subdir}}}
server = FHEModelServer(str(MODULES[model_name][submodule_name]["path"]))
# Modify model to use remote FHE server instead of local weights
hybrid_model = HybridFHEModel(
    model,
    submodule_name,
    server_remote_address="http://0.0.0.0:8000",
    model_name=f"{model_name}",
    verbose=False,
)
path_to_clients = Path(__file__).parent / "clients"
hybrid_model.init_client(path_to_clients=path_to_clients)
for module in hybrid_model.remote_modules.values():
    module.fhe_local_mode = HybridFHEMode.REMOTE

hybrid_model.forward(torch.randn((1, 10)))
NeuralNetClassifier
MLPClassifier
NeuralNetRegressor
MLPRegressor
structured pruning
prune
this example
torch-pruning
Demos and Tutorials
Quantization aware training example
Digits
Convolutional Neural Network
Step-by-step guide
standard deployment procedure
compile_brevitas_qat_model
compile_torch_model
save_and_clear_private_info
FHEModelDev
FHEModelServer
Quantization Aware Training
pruning
deep learning design guide
simulation
correctness guarantees
client/server section
client/server documentation

FHE Op-graph design

Float vs. quantized operations

Concrete, the underlying implementation of TFHE that powers Concrete ML, enables two types of operations on integers:

  1. arithmetic operations: the addition of two encrypted values and multiplication of encrypted values with clear scalars. These are used, for example, in dot-products, matrix multiplication (linear layers), and convolution.

  2. table lookup operations (TLU): using an encrypted value as an index, return the value of a lookup table at that index. This is implemented using Programmable Bootstrapping. This operation is used to perform any non-linear computation such as activation functions, quantization, and normalization.

Alternatively, it is possible to use a table lookup to avoid the quantization of the entire graph, by converting floating-point ONNX subgraphs into lambdas and computing their corresponding lookup tables to be evaluated directly in FHE. This operator-fusion technique only requires the input and output of the lambdas to be integers.

For example, in the following graph there is a single input, which must be an encrypted integer tensor. The following series of univariate functions is then fed into a matrix multiplication (MatMul) and fused into a single table lookup with integer inputs and outputs.
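
To make this fusion mechanism more tangible, the following hedged sketch uses concrete-python directly (outside of Concrete ML) to compile a function in which a chain of floating-point univariate operations, applied to an integer input and converted back to integers, is fused into a single table lookup. The function name and input-set are illustrative.

import numpy as np
from concrete import fhe


@fhe.compiler({"x": "encrypted"})
def univariate_chain(x):
    # Floating-point operations on an integer input: because the result is
    # converted back to integers, the whole chain can be fused into one TLU
    y = np.exp(x * 0.1) - 1.3
    return np.rint(y).astype(np.int64)


circuit = univariate_chain.compile(range(16))
print(circuit.encrypt_run_decrypt(3))  # Matches the clear result: round(exp(0.3) - 1.3) = 0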

ONNX operations

Concrete ML implements ONNX operations using Concrete, which can handle floating point operations, as long as they can be fused to an integer lookup table. The ONNX operations implementations are based on the QuantizedOp class.

There are two modes of creation of a single table lookup for a chain of ONNX operations:

  1. float mode: when the operation can be fused

  2. mixed float/integer: when the ONNX operation needs to perform arithmetic operations

Thus, QuantizedOp instances may need to quantize their inputs or the result of their computation, depending on their position in the graph.

The QuantizedOp class provides a generic implementation of an ONNX operation, including the quantization of inputs and outputs, with the computation implemented in NumPy in ops_impl.py. It is possible to picture the architecture of the QuantizedOp as the following structure:

Operations that can fuse to a TLU

Depending on the position of the op in the graph and its inputs, the QuantizedOp can be fully fused to a TLU.

Many ONNX ops are trivially univariate, as they multiply variable inputs with constants or apply univariate functions such as ReLU, Sigmoid, etc. This includes operations between the input and the MatMul in the graph above (subtraction, comparison, multiplication, etc. between inputs and constants).

Operations that work on integers

Operations, such as matrix multiplication of encrypted inputs with a constant matrix or convolution with constant weights, require that the encrypted inputs be integers. In this case, the input quantizer of the QuantizedOp is applied. These types of operations are implemented with a class that derives from QuantizedOp and implements q_impl, such as QuantizedGemm and QuantizedConv.

Operations that produce graph outputs

Finally, some operations produce graph outputs, which must be integers. These operations need to quantize their outputs as follows:

The diagram above shows that both float ops and integer ops need to quantize their outputs to integers when placed at the end of the graph.

Putting it all together

To chain the operation types described above following the ONNX graph, Concrete ML constructs a function that calls the q_impl of the QuantizedOp instances in the graph in sequence, and uses Concrete to trace the execution and compile to FHE. Thus, in this chain of function calls, all groups of instructions that operate in floating point will be fused to TLUs. In FHE, each such lookup table is computed with a PBS.

The red contours show the groups of elementary Concrete instructions that will be converted to TLUs.

Note that the input is slightly different from the QuantizedOp. Since the encrypted function takes integers as inputs, the input needs to be de-quantized first.

Implementing a QuantizedOp

QuantizedOp is the base class for all ONNX-quantized operators. It abstracts away many things to allow easy implementation of new quantized ops.

Determining if the operation can be fused

The QuantizedOp class exposes a function can_fuse that:

  • helps to determine the type of implementation that will be traced.

  • determines whether operations further in the graph, that depend on the results of this operation, can fuse.

In most cases, ONNX ops have a single variable input and one or more constant inputs.

When the op implements element-wise operations between the inputs and constants (addition, subtraction, multiplication, etc.), the operation can be fused to a TLU. Thus, by default in QuantizedOp, the can_fuse function returns True.

When the op implements operations that mix the various scalars in the input encrypted tensor, the operation cannot fuse, as table lookups are univariate. Thus, operations such as QuantizedGemm and QuantizedConv return False in can_fuse.

Some operations may be found in both settings above. A mechanism is implemented in Concrete ML to determine if the inputs of a QuantizedOp are produced by a unique integer tensor. Therefore, the can_fuse function of some QuantizedOp types (addition, subtraction) will allow fusion to take place if both operands are produced by a unique integer tensor:

def can_fuse(self) -> bool:
    return len(self._int_input_names) == 1

Case 1: A floating point version of the op is sufficient

You can check ops_impl.py to see how some operations are implemented in NumPy. The declaration convention for these operations is as follows:

  • The required inputs should be positional arguments only before the /, which marks the limit of the positional arguments.

  • The optional inputs should be positional or keyword arguments between the / and *, which marks the limits of positional or keyword arguments.

  • The operator attributes should be keyword arguments only after the *.

The proper use of positional/keyword arguments is required to allow the QuantizedOp class to properly populate metadata automatically. It uses Python's inspect module and stores relevant information for each argument related to its positional/keyword status. This allows using the Concrete implementation as a specification for QuantizedOp, which removes some data duplication and provides a single source of truth for QuantizedOp and ONNX-NumPy implementations.

In that case (unless the quantized implementation requires special handling like QuantizedGemm), you can just set _impl_for_op_named to the name of the ONNX op for which the quantized class is implemented (this uses the mapping ONNX_OPS_TO_NUMPY_IMPL in onnx_utils.py to get the correct implementation).
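
As an illustration of this convention, the following hypothetical operation (not taken from ops_impl.py) shows where required inputs, optional inputs, and attributes are placed in the signature; the tuple return value mirrors the multi-output style of ONNX implementations.

import numpy


def numpy_scaled_clip(
    x,
    /,
    min_value=None,
    max_value=None,
    *,
    alpha=1.0,
):
    """Hypothetical op: `x` is a required input (positional-only, before `/`),
    `min_value`/`max_value` are optional inputs (between `/` and `*`),
    and `alpha` is an operator attribute (keyword-only, after `*`)."""
    result = x * alpha
    if min_value is not None or max_value is not None:
        result = numpy.clip(result, min_value, max_value)
    return (result,)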

Case 2: An integer implementation of the op is necessary

Providing an integer implementation requires sub-classing QuantizedOp to create a new operation. This sub-class must override q_impl in order to provide an integer implementation. QuantizedGemm is an example of such a case where quantized matrix multiplication requires proper handling of scales and zero points. The q_impl of that class reflects this.

In the body of q_impl, you can use the _prepare_inputs_with_constants function in order to obtain quantized integer values:

from concrete.ml.quantization import QuantizedArray

def q_impl(
    self,
    *q_inputs: QuantizedArray,
    **attrs,
) -> QuantizedArray:

    # Retrieve the quantized inputs
    prepared_inputs = self._prepare_inputs_with_constants(
        *q_inputs, calibrate=False, quantize_actual_values=True
    )

Here, prepared_inputs will contain one or more QuantizedArray, of which the qvalues are the quantized integers.

Once the required integer processing code is implemented, the output of the q_impl function must be implemented as a single QuantizedArray. Most commonly, this is built using the de-quantized results of the processing done in q_impl.

    result = (
        sum_result.astype(numpy.float32) - q_input.quantizer.zero_point
    ) * q_input.quantizer.scale

    return QuantizedArray(
        self.n_bits,
        result,
        value_is_float=True,
        options=self.input_quant_opts,
        stats=self.output_quant_stats,
        params=self.output_quant_params,
    )

Case 3: Both a floating point and an integer implementation are necessary

In this case, in q_impl you can check whether the current operation can be fused by calling self.can_fuse(). You can then have both a floating-point and an integer implementation. The traced execution path will depend on can_fuse():


def q_impl(
    self,
    *q_inputs: QuantizedArray,
    **attrs,
) -> QuantizedArray:

    execute_in_float = len(self.constant_inputs) > 0 or self.can_fuse()

    # a floating point implementation that can fuse
    if execute_in_float:
        prepared_inputs = self._prepare_inputs_with_constants(
            *q_inputs, calibrate=False, quantize_actual_values=False
        )

        result = prepared_inputs[0] + self.b_sign * prepared_inputs[1]
        return QuantizedArray(
            self.n_bits,
            result,
            # ......
        )
    else:
        prepared_inputs = self._prepare_inputs_with_constants(
            *q_inputs, calibrate=False, quantize_actual_values=True
        )
        # an integer implementation follows, see Case 2
        # ....

Project architecture

Contributing

There are three ways to contribute to Concrete ML:

  • You can open issues to report bugs and typos and to suggest ideas.

  • You can become an official contributor, but you need to sign our Contributor License Agreement (CLA) on your first contribution. Our CLA bot will guide you through the process when you open a Pull Request on GitHub.

  • You can also provide new tutorials or use-cases, showing what can be done with the library. The more examples we have, the better and clearer it is for other users.

1. Setting up the project

2. Creating a new branch

When creating your branch, make sure the name follows the expected format:

git checkout -b {feat|fix|docs|chore}/short_description_$(issue_id)
git checkout -b {feat|fix|docs|chore}/$(issue_id)_short_description

For example:

git checkout -b feat/add_avgpool_operator_470
git checkout -b feat/470_add_avgpool_operator

3. Before committing

3.1 Conformance

Each commit to Concrete ML should conform to the standards of the project. You can let the development tools fix some issues automatically with the following command:

make conformance

Additionally, you will need to make sure that the following command does not return any error (pcc: pre-commit checks):

make pcc

3.2 Testing

Your code must be well documented, must provide extensive tests for any new feature, and must not break existing tests. To execute all tests, run the following command. Be aware that running all tests can take up to an hour.

make pytest

You need to make sure you get 100% code coverage. The make pytest command checks that by default and will fail with a coverage report at the end should some lines of your code not be executed during testing.

If your coverage is below 100%, you should write more tests and then create the pull request. If you ignore this warning and create the PR, checks will fail and your PR will not be merged.

There may be cases where covering your code is not possible (an exception that cannot be triggered in normal execution circumstances). In those cases, you may be allowed to disable coverage for some specific lines. This should be the exception rather than the rule, and reviewers will ask why some lines are not covered. If it appears they can be covered, then the PR won't be accepted in that state.

4. Committing

Concrete ML uses a consistent commit naming scheme and you are expected to follow it as well. The accepted format can be printed to your terminal by running:

make show_commit_rules

For example:

git commit -m "feat: support AVGPool2d operator"
git commit -m "fix: fix AVGPool2d operator"

5. Rebasing

You should rebase on top of the repository's main branch before you create your pull request. Merge commits are not allowed, so rebasing on main before pushing gives you the best chance to avoid rewriting parts of your PR later if conflicts arise with other PRs being merged. After you commit changes to your forked repository, you can use the following commands to rebase your main branch on Concrete ML's main branch:

# Add the Concrete ML repository as remote, named "upstream" 
git remote add upstream git@github.com:zama-ai/concrete-ml.git

# Fetch all last branches and changes from Concrete ML
git fetch upstream

# Checkout to your local main branch
git checkout main

# Rebase on top of main
git rebase upstream/main

# If there are conflicts during the rebase, resolve them
# and continue the rebase with the following command
git rebase --continue

# Push the latest version of your local main to your remote forked repository
git push --force origin main

6. Open a pull-request

Quantization tools

Quantizing data

Concrete ML has support for quantized ML models and also provides quantization tools for Quantization Aware Training and Post-Training Quantization. The core of this functionality is the conversion of floating point values to integers and back. This is done using QuantizedArray in concrete.ml.quantization.

  • n_bits defines the precision used in quantization

  • values are floating point values that will be converted to integers

  • is_signed determines if the quantized integer values should allow negative values

  • is_symmetric determines if the range of floating point values to be quantized should be taken as symmetric around zero

from concrete.ml.quantization import QuantizedArray
import numpy
numpy.random.seed(0)
A = numpy.random.uniform(-2, 2, 10)
print("A = ", A)
# array([ 0.19525402,  0.86075747,  0.4110535,  0.17953273, -0.3053808,
#         0.58357645, -0.24965115,  1.567092 ,  1.85465104, -0.46623392])
q_A = QuantizedArray(7, A)
print("q_A.qvalues = ", q_A.qvalues)
# array([ 37,          73,          48,         36,          9,
#         58,          12,          112,        127,         0])
# the quantized integers values from A.
print("q_A.quantizer.scale = ", q_A.quantizer.scale)
# 0.018274684777173276, the scale S.
print("q_A.quantizer.zero_point = ", q_A.quantizer.zero_point)
# 26, the zero point Z.
print("q_A.dequant() = ", q_A.dequant())
# array([ 0.20102153,  0.85891018,  0.40204307,  0.18274685, -0.31066964,
#         0.58478991, -0.25584559,  1.57162289,  1.84574316, -0.4751418 ])
# Dequantized values.
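
To relate these printed values to the underlying uniform quantization formulas, the short sketch below recomputes them by hand from the scale and zero-point shown above; it assumes the standard formulas q = round(x / scale) + zero_point and x ≈ (q - zero_point) * scale, and reuses the A and q_A objects (and the numpy import) from the example.

scale = q_A.quantizer.scale
zero_point = q_A.quantizer.zero_point

# Quantization: q = round(x / scale) + zero_point
q_manual = numpy.rint(A / scale).astype(numpy.int64) + zero_point
print(numpy.array_equal(q_manual, q_A.qvalues))  # Expected: True

# De-quantization: x ~ (q - zero_point) * scale
dequant_manual = (q_A.qvalues - zero_point) * scale
print(numpy.allclose(dequant_manual, q_A.dequant()))  # Expected: True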

It is also possible to use symmetric quantization, where the integer values are centered around 0:

q_A = QuantizedArray(3, A)
print("Unsigned: q_A.qvalues = ", q_A.qvalues)
print("q_A.quantizer.zero_point = ", q_A.quantizer.zero_point)
# Unsigned: q_A.qvalues =  [2 4 2 2 0 3 0 6 7 0]
# q_A.quantizer.zero_point =  1

q_A = QuantizedArray(3, A, is_signed=True, is_symmetric=True)
print("Signed Symmetric: q_A.qvalues = ", q_A.qvalues)
print("q_A.quantizer.zero_point = ", q_A.quantizer.zero_point)
# Signed Symmetric: q_A.qvalues =  [ 0  1  1  0  0  1  0  3  3 -1]
# q_A.quantizer.zero_point =  0

In the following example, showing the de-quantization of model outputs, the QuantizedArray class is used in a different way. Here it uses pre-quantized integer values and has the scale and zero-point set explicitly. Once the QuantizedArray is constructed, calling dequant() will compute the floating point values corresponding to the integer values qvalues, which are the output of the fhe_circuit.encrypt_run_decrypt(..) call.

import numpy
from concrete.ml.quantization.quantizers import QuantizationOptions, QuantizedArray

q_values = [0, 0, 1, 2, 3, -1]
QuantizedArray(
        q_A.quantizer.n_bits,
        q_values,
        value_is_float=False,
        options=q_A.quantizer.quant_options,
        stats=q_A.quantizer.quant_stats,
        params=q_A.quantizer.quant_params,
).dequant()

Quantized modules

Machine learning models are implemented with a diverse set of operations, such as convolution, linear transformations, activation functions, and element-wise operations. When working with quantized values, these operations cannot be carried out in an equivalent way to floating point values. With quantization, it is necessary to re-scale the input and output values of each operation to fit in the quantization domain.

In Concrete ML, the quantized equivalent of a scikit-learn model or a PyTorch nn.Module is the QuantizedModule. Note that only inference is implemented in the QuantizedModule, and it is built through a conversion of the inference function of the corresponding scikit-learn or PyTorch module.

Built-in neural networks expose the quantized_module member, while a QuantizedModule is also the result of the compilation of custom models through compile_torch_model and compile_brevitas_qat_model.
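
As a minimal sketch with an illustrative, untrained PyTorch model, compiling a custom model returns such a QuantizedModule, whose forward method then runs the integer inference pipeline (here in the clear, with fhe="disable"):

import numpy
from torch import nn

from concrete.ml.torch.compile import compile_torch_model

torch_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
calibration_data = numpy.random.uniform(-1, 1, size=(100, 4))

# Post-training quantization and compilation return a QuantizedModule
quantized_module = compile_torch_model(torch_model, calibration_data, n_bits=6)

# Run the quantized (integer) inference pipeline in the clear
y_quantized = quantized_module.forward(calibration_data[:2], fhe="disable")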

Calibration is the process of determining the typical distributions of values encountered for the intermediate values of a model during inference.

Resources

Set up Docker

Building the image

Once you do that, you can get inside the Docker environment using the following command:

After you finish your work, you can leave Docker by using the exit command or by pressing CTRL + D.


The section gave an overview of the conversion of a generic ONNX graph to an FHE-compatible Concrete ML op-graph. This section describes the implementation of operations in the Concrete ML op-graph and the way floating point can be used in some parts of the op-graphs through table lookup operations.

Since machine learning models use floating point inputs and weights, they first need to be converted to integers using quantization.

This figure shows that the QuantizedOp has a body that implements the computation of the operation, following the . The operation's body can take either integer or float inputs and can output float or integer values. Two quantizers are attached to the operation: one that takes float inputs and produces integer inputs and one that does the same for the output.

First, you need to the repository and properly set up the project by following the steps provided .

Just a reminder that commit messages are checked in the conformance step and are rejected if they don't follow the rules. To learn more about conventional commits, check .

You can learn more about rebasing .

You can now open a pull-request . For more details on how to do so from a forked repository, please read GitHub's on the subject.

The class takes several arguments that determine how float values are quantized:

See also the reference for more information:

The quantized versions of floating point model operations are stored in the QuantizedModule. The ONNX_OPS_TO_QUANTIZED_IMPL dictionary maps ONNX floating point operators (e.g., Gemm) to their quantized equivalent (e.g., QuantizedGemm). For more information on implementing these operations, please see the .

The computation graph is taken from the corresponding floating point ONNX graph exported from scikit-learn , or from the ONNX graph exported by PyTorch. Calibration is used to obtain quantized parameters for the operations in the QuantizedModule. Parameters are also determined for the quantization of inputs during model deployment.

To perform calibration, an interpreter goes through the ONNX graph in topological order and stores the intermediate results as it goes. The statistics of these values determine quantization parameters.

That QuantizedModule generates the Concrete function that is compiled to FHE. The compilation will succeed if the intermediate values conform to the 16-bits precision limit of the Concrete stack. See for details.

Lei Mao's blog on quantization:

Google paper on neural network quantization and integer-only inference:

Before you start this section, you must install Docker by following official guide.

Once you have access to this repository and the dev environment is installed on your host OS (via make setup_env once ), you should be able to launch the commands to build the dev Docker image with make docker_build.

ONNX import
quantization
ONNX spec

Importing ONNX

Quantization tools

FHE op-graph design

External libraries

fork
Concrete ML
here
this page
here
in the Concrete ML repository
official documentation
QuantizedArray
UniformQuantizer
FHE-compatible op-graph section
topological order
the compilation section
Quantization for Neural Networks
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
using HummingBird
make docker_start

# or build and start at the same time
make docker_build_and_start

# or equivalently but shorter
make docker_bas
this
you followed the steps here

Set up the project

Concrete ML is a Python library, so Python must be installed to develop Concrete ML. Only v3.8 and v3.9 are supported. Concrete ML also uses Poetry and Make.

First of all, you need to git clone the project:

git clone https://github.com/zama-ai/concrete-ml
git lfs pull

Conversely, to avoid downloading all these files (which represent up to several hundred MB) when cloning the repository, simply run:

GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/zama-ai/concrete-ml

Automatic installation

For Windows users, the setup_os_deps.sh script does not install dependencies, because Windows lacks a single package manager and there are too many different installation methods.

Manual installation

Python

Poetry

make

The dev tools use make to launch various commands.

On Linux, you can install make from your distribution's preferred package manager.

On macOS, you can install a more recent version of make via brew:

# check for gmake
which gmake

# If you don't have it, it will error out, install gmake
brew install make

# recheck, now you should have gmake
which gmake

In the following sections, be sure to use the proper make tool for your system: make, gmake, or other.

Cloning the repository

To get the source code of Concrete ML, clone the code repository using the link for your favorite communication protocol (ssh or https).

Setting up environment on your host OS

We are going to make use of virtual environments. This helps to keep the project isolated from other Python projects in the system. The following commands will create a new virtual environment under the project directory and install dependencies to it.

The following command will not work on Windows if you don't have Poetry >= 1.2.

cd concrete-ml
make setup_env

Activating the environment

Finally, activate the newly created environment using the following command:

macOS or Linux

source .venv/bin/activate

Windows

source .venv/Scripts/activate

Setting up environment on Docker

Docker automatically creates and sources a venv in ~/dev_venv/

The venv persists thanks to volumes. It also creates a volume for ~/.cache to speed up later reinstallations. You can check which Docker volumes exist with:

docker volume ls

You can still run all make commands inside Docker (to update the venv, for example). Be mindful of the current venv being used (the name in parentheses at the beginning of your command prompt).

# Here we have dev_venv sourced
(dev_venv) dev_user@8e299b32283c:/src$ make setup_env

Leaving the environment

After your work is done, you can simply run the following command to leave the environment:

deactivate

Syncing environment with the latest changes

From time to time, new dependencies are added to the project or old ones are removed. The command below makes sure the project has the proper environment, so run it regularly!

make sync_env

Troubleshooting your environment

in your OS

If you are having issues, consider using the dev Docker exclusively (unless you are working on OS-specific bug fixes or features).

Here are the steps you can take on your OS to try and fix issues:

# Try to install the env normally
make setup_env

# If you are still having issues, sync the environment
make sync_env

# If you are still having issues on your OS, delete the venv:
rm -rf .venv

# And re-run the env setup
make setup_env

in Docker

Here are the steps you can take in your Docker to try and fix issues:

# Try to install the env normally
make setup_env

# If you are still having issues, sync the environment
make sync_env

# If you are still having issues in Docker, delete the venv:
rm -rf ~/dev_venv/*

# Disconnect from Docker
exit

# And relaunch, the venv will be reinstalled
make docker_start

# If you are still out of luck, force a rebuild which will also delete the volumes
make docker_rebuild

# And start Docker, which will reinstall the venv
make docker_start

If the problem persists at this point, you should ask for help. We're here and ready to assist!

Support and issues

Concrete ML is a constant work-in-progress, and thus may contain bugs or suboptimal APIs.

Furthermore, undefined behavior may occur if the input-set, which is internally used by the compilation core to set the bit-widths of some intermediate data, is not sufficiently representative of future user inputs. Based on all the inputs in the input-set, the compiler concludes that an intermediate value fits in an n-bit integer; if, for a particular computation, that same intermediate value actually needs additional bits, the FHE execution for this computation will produce an incorrect output, much like an integer overflow in a classical program.

Submitting an issue

  • the reproducibility rate you see on your side

  • any insight you might have on the bug

  • any workaround you have been able to find

In order to be able to run all documentation examples, we recommend to also install git-lfs and then pull the necessary files:

A simple way to have everything installed is to use the development Docker (see the guide). On Linux and macOS, you have to run the script in ./script/make_utils/setup_os_deps.sh. Specify the --linux-install-python flag if you want to install python3.8 as well on apt-enabled Linux distributions. The script should install everything you need for Docker and bare OS development (you can first review the content of the file to check what it will do).

The first step is to (as some of the dev tools depend on it), then . In addition to installing Python, you are still going to need the following software available on path on Windows, as some of the basic dev tools depend on them:

git

jq

make

Development on Windows only works with the Docker environment. Follow .

To manually install Python, you can follow guide (alternatively, you can google how to install Python 3.8 (or 3.9)).

Poetry is used as the package manager. It drastically simplifies dependency and environment management. You can follow official guide to install it.

It is possible to install gmake as make. Check this for more info.

On Windows, check .

At this point, you should consider using Docker as nobody will have the exact same setup as you. If, however, you need to develop on your OS directly, you can .

Before opening an issue or asking for support, please read this documentation to understand common issues and limitations of Concrete ML. You can also check the .

If you didn't find an answer, you can ask a question through the .

When submitting an issue (), ideally include as much information as possible. In addition to the Python script, the following information is useful:

If you would like to contribute to a project and send pull requests, take a look at the guide.

install git-lfs
Docker setup
https://gitforwindows.org/
https://github.com/stedolan/jq/releases
https://gist.github.com/evanwill/0207876c3243bbb6863e65ec5dc3f058#make
this link to setup the Docker environment
this
this
StackOverflow post
this GitHub gist
install Python
Poetry
ask Zama for help
outstanding issues on github
community channels
here
contributor
PBS error tolerance setting
bootstrapping off-by-one error probability
bootstrapping off-by-one error probability
rounding setting
here
rounded accumulators
this section for more explanation
here
approximate computation
rounded activations and quantizers reference
p_error documentation for details
the API for finding the best p_error

Advanced features

Concrete ML provides features for advanced users to adjust the cryptographic parameters generated by the Concrete stack. This allows users to identify the best trade-off between latency and accuracy for their specific machine learning models.

Approximate computations

Concrete ML makes use of table lookups (TLUs) to represent any non-linear operation (e.g., a sigmoid). TLUs are implemented through the Programmable Bootstrapping (PBS) operation, which applies a non-linear operation in the cryptographic realm.

The result of TLU operations is obtained with a specific tolerance to off-by-one errors. Concrete ML offers the possibility to set the probability of such errors occurring, which influences the cryptographic parameters. The lower the tolerance, the more restrictive the parameters become, making both key generation and, more significantly, FHE execution slower.

Concrete ML has a simulation mode where the impact of approximate computation of TLUs on the model accuracy can be determined. The simulation is much faster, speeding up model development significantly. The behavior in simulation mode is representative of the behavior of the model on encrypted data.
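
For instance, a built-in model compiled with a given p_error can be evaluated with the much faster simulation mode before committing to full FHE execution. The sketch below assumes a small classification task and uses the fhe="simulate" prediction mode:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from concrete.ml.sklearn import DecisionTreeClassifier

x, y = make_classification(n_samples=100, class_sep=2, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=10, random_state=42)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Compile with a given tolerance to off-by-one TLU errors
clf.compile(X_train, p_error=0.01)

# Simulated execution: fast, and statistically representative of FHE behavior
y_sim = clf.predict(X_test, fhe="simulate")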

In Concrete ML, there are three different ways to define the tolerance to off-by-one errors for each TLU operation:

p_error and global_p_error cannot be set at the same time, as they are incompatible with each other.

Tolerance to off-by-one error for an individual TLU

The first way to set error probabilities in Concrete ML is at the local level, by directly setting the tolerance to error of each individual TLU operation (such as the activation function of a neuron output). This tolerance is referred to as p_error. A given PBS operation has a probability of 1 - p_error of being exact: the value decrypted after FHE evaluation is then exactly the same as the one that would be computed in the clear. Otherwise, an off-by-one error may occur; in practice, such errors are not necessarily problematic if they are sufficiently rare.

Here is a visualization of the effect of the p_error on a neural network model with a p_error = 0.1 compared to execution in the clear (i.e., no error):

Varying p_error in the one hidden-layer neural network above produces the following inference times. Increasing p_error to 0.1 halves the inference time with respect to a p_error of 0.001. In the graph above, the decision boundary becomes noisier with a higher p_error.

| p_error | Inference Time (ms) |
| ------- | ------------------- |
| 0.001   | 0.80                |
| 0.01    | 0.41                |
| 0.1     | 0.37                |

Users can change this p_error by passing an argument to the compile function of any of the models. Here is an example:

from concrete.ml.sklearn import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

x, y = make_classification(n_samples=100, class_sep=2, n_features=4, random_state=42)

# Retrieve train and test sets
X_train, _, y_train, _ = train_test_split(x, y, test_size=10, random_state=42)

clf = XGBClassifier()
clf.fit(X_train, y_train)

# Here we set the p_error parameter
clf.compile(X_train, p_error=0.1)

A global tolerance for off-by-one errors for the entire model

A global_p_error is also available and defines the probability of 100% correctness for the entire model, compared to execution in the clear. In this case, the p_error for every TLU is determined internally in Concrete such that the global_p_error is reached for the whole model.

There might be cases where the user encounters a No cryptography parameter found error message. Increasing the p_error or the global_p_error in this case might help.

Usage is similar to the p_error parameter:

# Here we set the global_p_error parameter
clf.compile(X_train, global_p_error=0.1)

In the above example, the XGBClassifier run in FHE has a 1/10 probability of producing an off-by-one output value compared to the expected value. The shift is relative to the expected value, so even if the result is different, it should be close to the expected value.

Using default error probability

If neither p_error nor global_p_error is set, Concrete ML uses p_error = 2^-40 by default.

Searching for the best error probability

Currently, finding a good p_error value a priori is not possible, as it is difficult to determine the impact of a TLU error on the output of a neural network. Concrete ML provides a tool to find a good p_error value that improves inference speed while maintaining accuracy. The method is based on binary search and evaluates the latency/accuracy trade-off iteratively.

from time import time

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from concrete.ml.search_parameters import BinarySearch
from concrete.ml.sklearn import DecisionTreeClassifier

x, y = make_classification(n_samples=100, class_sep=2, n_features=4, random_state=42)

# Retrieve train and test sets
X_train, _, y_train, _ = train_test_split(x, y, test_size=10, random_state=42)

clf = DecisionTreeClassifier(random_state=42)

# Fit the model
clf.fit(X_train, y_train)

# Compile the model with the default `p_error`
fhe_circuit = clf.compile(X_train)

# Key Generation
fhe_circuit.client.keygen(force=False)

start_time = time()
y_pred = clf.predict(X_train, fhe="execute")
end_time = time()

print(f"With the default p_error≈0, the inference time is {(end_time - start_time) / 60:.2f} s")
# Output: With the default p_error≈0, the inference time is 0.89 s
print(f"Accuracy = {accuracy_score(y_pred, y_train):.2%}")
# Output: Accuracy = 100.00%

# Search for the largest `p_error` that provides
# the best compromise between accuracy and computational efficiency in FHE
search = BinarySearch(estimator=clf, predict="predict", metric=accuracy_score)
p_error = search.run(x=X_train, ground_truth=y_train, max_iter=10)

# Compile the model with the optimal `p_error`
fhe_circuit = clf.compile(X_train, p_error=p_error)

# Key Generation
fhe_circuit.client.keygen(force=False)

start_time = time()
y_pred = clf.predict(X_train, fhe="execute")
end_time = time()

print(
    f"With p_error={p_error:.5f}, the inference time becomes {end_time - start_time:.2f} s"
)
# Output: With p_error=0.00043, the inference time becomes 0.56 s
print(f"Accuracy = {accuracy_score(y_pred, y_train): .2%}")
# Output: Accuracy = 100.00%

With this optimal p_error, accuracy is maintained while execution time is improved by a factor of 1.51.

Please note that the default setting for the search interval is restricted to a range of 0.0 to 0.9. Increasing the upper bound beyond this range may result in longer execution times, especially when p_error≈1.

Rounded activations and quantizers

To speed up neural networks, a rounding operator can be applied on the accumulators of linear and convolution layers to retain only the most significant bits, on which the activation and quantization are then applied. The accumulator is represented using $L$ bits, and $P \leq L$ is the desired input bit-width of the TLU operation that computes the activation and quantization.

The rounding operation is defined as follows:

First, compute $t$ as the difference between $L$, the actual bit-width of the accumulator, and $P$:

$$t = L - P$$

Then, the rounding operation can be computed as:

$$\mathrm{round\_to\_P\_bits}(x, t) = \left\lfloor \frac{x}{2^t} \right\rceil \cdot 2^t$$

where $x$ is the input number, and $\lfloor \cdot \rceil$ denotes the operation that rounds to the nearest integer.
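
A small numerical sketch of this operation (the round_to_p_bits name and lsbs_to_remove argument are illustrative, not the library API):

import numpy as np


def round_to_p_bits(x, lsbs_to_remove):
    """Keep only the most significant bits of an integer accumulator."""
    return np.rint(x / 2**lsbs_to_remove).astype(np.int64) * 2**lsbs_to_remove


# Example: 8-bit accumulator values rounded to P = 4 bits, so t = L - P = 4
print(round_to_p_bits(np.array([37, 100, -90]), lsbs_to_remove=4))
# Expected: [ 32  96 -96], the closest multiples of 2**4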

In Concrete ML, this feature is currently implemented for custom neural networks through the compile functions, including

  • concrete.ml.torch.compile_torch_model,

  • concrete.ml.torch.compile_onnx_model and

  • concrete.ml.torch.compile_brevitas_qat_model.

The rounding_threshold_bits argument can be set to a specific bit-width. It is important to choose an appropriate bit-width threshold to balance the trade-off between speed and accuracy. By reducing the bit-width of intermediate tensors, it is possible to speed up computations while maintaining accuracy.

The rounding_threshold_bits parameter only works in FHE for a TLU input bit-width $P$ less than or equal to 8 bits.

To find the best trade-off between speed and accuracy, it is recommended to experiment with different thresholds and check the accuracy on an evaluation set after compiling the model.

In practice, the process looks like this:

  1. Set rounding_threshold_bits to a relatively high value of $P$, say 8 bits.

  2. Check the accuracy.

  3. Update $P = P - 1$.

  4. Repeat steps 2 and 3 until the accuracy loss exceeds an acceptable threshold, then keep the last value of $P$ that met the accuracy requirement.
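
The sketch below illustrates this loop for a custom network compiled with compile_torch_model; the model, the data, and the bit-width values are illustrative, and the accuracy check is left as a comment since it depends on your evaluation set.

import numpy
from torch import nn

from concrete.ml.torch.compile import compile_torch_model

# Illustrative model and calibration/evaluation data
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
x_calib = numpy.random.uniform(-1, 1, size=(200, 10))
x_eval = numpy.random.uniform(-1, 1, size=(50, 10))

for p_bits in range(8, 3, -1):
    quantized_module = compile_torch_model(
        model,
        x_calib,
        n_bits=6,
        rounding_threshold_bits=p_bits,
    )

    # Evaluate accuracy with simulation on the held-out set and stop lowering
    # rounding_threshold_bits once the accuracy loss becomes unacceptable
    y_eval = quantized_module.forward(x_eval, fhe="simulate")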

Seeing compilation information

By using verbose = True and show_mlir = True during compilation, the user receives a lot of information from Concrete. These options are, however, mainly meant for power-users, and their output may be hard to interpret.

from concrete.ml.sklearn import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

x, y = make_classification(n_samples=100, class_sep=2, n_features=4, random_state=42)

# Retrieve train and test sets
X_train, _, y_train, _ = train_test_split(x, y, test_size=10, random_state=42)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

clf.compile(X_train, verbose=True, show_mlir=True, p_error=0.033)

Here, one will see:

  • the computation graph (typically):

Computation Graph
-------------------------------------------------------------------------------------------------------------------------------
 %0 = _inputs                                  # EncryptedTensor<uint6, shape=(1, 4)>           ∈ [0, 63]
 %1 = transpose(%0)                            # EncryptedTensor<uint6, shape=(4, 1)>           ∈ [0, 63]
 %2 = [[0 0 0 1]]                              # ClearTensor<uint1, shape=(1, 4)>               ∈ [0, 1]
 %3 = matmul(%2, %1)                           # EncryptedTensor<uint6, shape=(1, 1)>           ∈ [0, 63]
 %4 = [[32]]                                   # ClearTensor<uint6, shape=(1, 1)>               ∈ [32, 32]
 %5 = less_equal(%3, %4)                       # EncryptedTensor<uint1, shape=(1, 1)>           ∈ [False, True]
 %6 = reshape(%5, newshape=[ 1  1 -1])         # EncryptedTensor<uint1, shape=(1, 1, 1)>        ∈ [False, True]
 %7 = [[[ 1]  [-1]]]                           # ClearTensor<int2, shape=(1, 2, 1)>             ∈ [-1, 1]
 %8 = matmul(%7, %6)                           # EncryptedTensor<int2, shape=(1, 2, 1)>         ∈ [-1, 1]
 %9 = reshape(%8, newshape=[ 2 -1])            # EncryptedTensor<int2, shape=(2, 1)>            ∈ [-1, 1]
%10 = [[1] [0]]                                # ClearTensor<uint1, shape=(2, 1)>               ∈ [0, 1]
%11 = equal(%10, %9)                           # EncryptedTensor<uint1, shape=(2, 1)>           ∈ [False, True]
%12 = reshape(%11, newshape=[ 1  2 -1])        # EncryptedTensor<uint1, shape=(1, 2, 1)>        ∈ [False, True]
%13 = [[[63  0]  [ 0 63]]]                     # ClearTensor<uint6, shape=(1, 2, 2)>            ∈ [0, 63]
%14 = matmul(%13, %12)                         # EncryptedTensor<uint6, shape=(1, 2, 1)>        ∈ [0, 63]
%15 = reshape(%14, newshape=[ 1  2 -1])        # EncryptedTensor<uint6, shape=(1, 2, 1)>        ∈ [0, 63]
return %15
  • the MLIR, produced by Concrete:

MLIR
-------------------------------------------------------------------------------------------------------------------------------
module {
  func.func @main(%arg0: tensor<1x4x!FHE.eint<6>>) -> tensor<1x2x1x!FHE.eint<6>> {
    %cst = arith.constant dense<[[[63, 0], [0, 63]]]> : tensor<1x2x2xi7>
    %cst_0 = arith.constant dense<[[1], [0]]> : tensor<2x1xi7>
    %cst_1 = arith.constant dense<[[[1], [-1]]]> : tensor<1x2x1xi7>
    %cst_2 = arith.constant dense<32> : tensor<1x1xi7>
    %cst_3 = arith.constant dense<[[0, 0, 0, 1]]> : tensor<1x4xi7>
    %c32_i7 = arith.constant 32 : i7
    %0 = "FHELinalg.transpose"(%arg0) {axes = []} : (tensor<1x4x!FHE.eint<6>>) -> tensor<4x1x!FHE.eint<6>>
    %cst_4 = tensor.from_elements %c32_i7 : tensor<1xi7>
    %1 = "FHELinalg.matmul_int_eint"(%cst_3, %0) : (tensor<1x4xi7>, tensor<4x1x!FHE.eint<6>>) -> tensor<1x1x!FHE.eint<6>>
    %cst_5 = arith.constant dense<[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]> : tensor<64xi64>
    %2 = "FHELinalg.apply_lookup_table"(%1, %cst_5) : (tensor<1x1x!FHE.eint<6>>, tensor<64xi64>) -> tensor<1x1x!FHE.eint<6>>
    %3 = tensor.expand_shape %2 [[0], [1, 2]] : tensor<1x1x!FHE.eint<6>> into tensor<1x1x1x!FHE.eint<6>>
    %4 = "FHELinalg.matmul_int_eint"(%cst_1, %3) : (tensor<1x2x1xi7>, tensor<1x1x1x!FHE.eint<6>>) -> tensor<1x2x1x!FHE.eint<6>>
    %5 = tensor.collapse_shape %4 [[0, 1], [2]] : tensor<1x2x1x!FHE.eint<6>> into tensor<2x1x!FHE.eint<6>>
    %6 = "FHELinalg.add_eint_int"(%5, %cst_4) : (tensor<2x1x!FHE.eint<6>>, tensor<1xi7>) -> tensor<2x1x!FHE.eint<6>>
    %cst_6 = arith.constant dense<"0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"> : tensor<2x64xi64>
    %cst_7 = arith.constant dense<[[0], [1]]> : tensor<2x1xindex>
    %7 = "FHELinalg.apply_mapped_lookup_table"(%6, %cst_6, %cst_7) : (tensor<2x1x!FHE.eint<6>>, tensor<2x64xi64>, tensor<2x1xindex>) -> tensor<2x1x!FHE.eint<6>>
    %8 = tensor.expand_shape %7 [[0, 1], [2]] : tensor<2x1x!FHE.eint<6>> into tensor<1x2x1x!FHE.eint<6>>
    %9 = "FHELinalg.matmul_int_eint"(%cst, %8) : (tensor<1x2x2xi7>, tensor<1x2x1x!FHE.eint<6>>) -> tensor<1x2x1x!FHE.eint<6>>
    return %9 : tensor<1x2x1x!FHE.eint<6>>
  }
}
  • information from the optimizer (including cryptographic parameters):

Optimizer
-------------------------------------------------------------------------------------------------------------------------------
--- Circuit
  6 bits integers
  7 manp (maxi log2 norm2)
  388ms to solve
--- User config
  3.300000e-02 error per pbs call
  1.000000e+00 error per circuit call
--- Complexity for the full circuit
  4.214000e+02 Millions Operations
--- Correctness for each Pbs call
  1/30 errors (3.234529e-02)
--- Correctness for the full circuit
  1/10 errors (9.390887e-02)
--- Parameters resolution
  1x glwe_dimension
  2**11 polynomial (2048)
  762 lwe dimension
  keyswitch l,b=5,3
  blindrota l,b=2,15
  wopPbs : false
---

In this optimizer feedback, the following information is provided:

  • The bit-width ("6-bit integers") used in the program: for the moment, the compiler only supports a single precision (i.e., that all PBS are promoted to the same bit-width - the largest one). Therefore, this bit-width predominantly drives the speed of the program, and it is essential to reduce it as much as possible for faster execution.

  • The maximal norm2 ("7 manp"), which has an impact on the crypto parameters: The larger this norm2, the slower PBS will be. The norm2 is related to the norm of some constants appearing in your program, in a way which will be clarified in the Concrete documentation.

  • The probability of error of an individual PBS, which was requested by the user ("3.300000e-02 error per pbs call" in User Config).

  • The probability of error of the full circuit, which was requested by the user ("1.000000e+00 error per circuit call" in User Config). Here, the probability 1 stands for "not used", since we had set the individual probability via p_error.

  • The probability of error of an individual PBS, which is found by the optimizer ("1/30 errors (3.234529e-02)").

  • The probability of error of the full circuit which is found by the optimizer ("1/10 errors (9.390887e-02)").

  • An estimation of the cost of the circuit ("4.214000e+02 Millions Operations"): Large values indicate a circuit that will execute more slowly.

Here is some further information about cryptographic parameters:

  • 1x glwe_dimension

  • 2**11 polynomial (2048)

  • 762 lwe dimension

  • keyswitch l,b=5,3

  • blindrota l,b=2,15

  • wopPbs : false

This optimizer feedback is a work in progress and will be modified and improved in future releases.

Sklearn model decision boundaries
FHE model decision boundaries
Comparison of classification decision boundaries between FHE and plaintext models
XGBoost n_bits comparison
Torch compilation flow with ONNX
Artificial Neuron
Fully Connected Neural Network
Pruned Fully Connected Neural Network
Comparison neural networks

setting p_error, the error probability of an individual TLU (see )

setting global_p_error, the error probability of the full circuit (see )

not setting p_error nor global_p_error, and using default parameters (see )

For simplicity, it is best to use , irrespective of the type of model. Especially for deep neural networks, default values may be too pessimistic, reducing computation speed without any improvement in accuracy. For deep neural networks, some TLU errors might not affect the accuracy of the network, so p_error can be safely increased (e.g., see CIFAR classifications in ).

Impact of p_error in a Neural Network

The speedup depends on model complexity, but, in an iterative approach, it is possible to search for a good value of p_error to obtain a speedup while maintaining good accuracy. Concrete ML provides a tool to find a good value for p_error based on .

If the p_error value is specified and simulation is enabled, the run will take into account the randomness induced by the choice of p_error. This results in statistical similarity to the FHE evaluation.

An example of such implementation is available in and

evaluate_torch_cml.py
CifarInFheWithSmallerAccumulators.ipynb
here
here
here
our showcase
default options
binary search
simulation

What is Concrete ML

Understand the Concrete ML library with a full example.

Installation

Follow the step-by-step guide to install Concrete ML in your project.

Key concepts

Understand important cryptographic concepts to implement Concrete ML.

Cover
Cover
Cover

Fundamentals

Explore core features.

Guides

Deploy your projects.

Tutorials

Learn more with tutorials.

Cover
Cover
Cover

Built-in models
Deep learning
Prediction with FHE
Production deployment
Start here
Go further