
0.4 What is Concrete ML?

Example usage

import numpy
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression

# Let's create a synthetic data-set
x, y = make_classification(n_samples=100,
    class_sep=2, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Now we train in plaintext using quantization
model = LogisticRegression(n_bits=2)
model.fit(X_train, y_train)

y_pred_clear = model.predict(X_test)

# Finally we compile and run inference on encrypted inputs!
model.compile(x)
y_pred_fhe = model.predict(X_test, execute_in_fhe=True)

print("In clear  :", y_pred_clear)
print("In FHE    :", y_pred_fhe)
print("Comparison:", (y_pred_fhe == y_pred_clear))

# Output:
#   In clear  : [0 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 0 1 1 1]
#   In FHE    : [0 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 0 1 1 1]
#   Comparison: [ True  True  True  True  True  True  True  True  True  True  True  True
#   True  True  True  True  True  True  True  True]

This example shows the typical flow of a Concrete-ML model:

The model is trained on unencrypted (plaintext) data using scikit-learn. As FHE operates over integers, Concrete-ML quantizes the model to use only integers during inference.
The quantized model is compiled to a FHE equivalent. Under the hood, the model is first converted to a Concrete-Numpy program, then compiled.

Current limitations

To make a model work with FHE, the only constraint is to make it run within the supported precision limitations of Concrete-ML (currently 8-bit integers). Thus, machine learning models are required to be quantized, which sometimes leads to a loss of accuracy versus the original model operating on plaintext.
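The accuracy loss caused by quantization can be illustrated with a minimal sketch in plain Python. The helpers `quantize`, `dequantize` and `max_error` are hypothetical names, not part of the Concrete-ML API, and the sketch assumes a simple min/max affine scheme rather than the library's actual quantizer.

```python
def quantize(values, n_bits):
    """Map floats onto the integer grid [0, 2**n_bits - 1] (min/max affine scheme)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (2**n_bits - 1)
    return [round((v - lo) / scale) for v in values], scale, lo


def dequantize(q_values, scale, lo):
    """Map the integers back to approximate float values."""
    return [q * scale + lo for q in q_values]


def max_error(values, n_bits):
    """Largest absolute round-trip error for a given bit-width."""
    q_values, scale, lo = quantize(values, n_bits)
    restored = dequantize(q_values, scale, lo)
    return max(abs(v - r) for v, r in zip(values, restored))


values = [-1.0, -0.35, 0.0, 0.4, 1.0]
for n_bits in (2, 4, 8):
    print(f"{n_bits} bits -> max round-trip error {max_error(values, n_bits):.4f}")
```

The fewer bits are available, the coarser the grid and the larger the round-trip error, which is what can degrade the accuracy of a quantized model.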

Additionally, Concrete-ML currently only supports FHE inference. Training has to be done on unencrypted data, producing a model which is then converted to a FHE equivalent that can perform encrypted inference, i.e. prediction over encrypted data.

Finally, in Concrete-ML there is currently no support for pre-processing model inputs and for post-processing model outputs. These processing stages may involve text to numerical feature transformation, dimensionality reduction, KNN or clustering, featurization, normalization, and the mixing of results of ensemble models.

All of these issues are currently being addressed and significant improvements are expected to be released in the coming months.

Concrete Stack

Online demos and tutorials.

More generally, if you have built awesome projects using Concrete-ML, feel free to let us know and we'll link to them!

Additional resources

Looking for support? Ask our team!

Getting Started

Installation

Please note that not all hardware/OS combinations are supported. Determine your platform, OS version and Python version before referencing the table below.

Depending on your OS, Concrete-ML may be installed with Docker or with pip:

| OS / HW | Available on Docker | Available on pip |
| --- | --- | --- |
| Linux | Yes | |
| Windows | Yes | Not currently |
| Windows Subsystem for Linux | Yes | |
| macOS (Intel) | Yes | |
| macOS (Apple Silicon, i.e. M1, M2, etc.) | Yes | Not currently |

Most of these limits are shared with the rest of the Concrete stack (namely Concrete-Numpy and Concrete-Compiler). Support for more platforms will be added in the future.

Using PyPI

Requirements

Installing on Windows can be done using Docker or WSL. On WSL, Concrete-ML will work as long as the package is not installed in the /mnt/c/ directory, which corresponds to the host OS filesystem.

Installation

To install Concrete-ML from PyPI, run the following:

pip install -U pip wheel setuptools
pip install concrete-ml

This will automatically install all dependencies, notably Concrete-Numpy.

Using Docker

Concrete-ML can be installed using Docker by either pulling the latest image or a specific version:

docker pull zamafhe/concrete-ml:latest
# or
docker pull zamafhe/concrete-ml:v0.4.0

The image can then be used via the following command:

# Without local volume:
docker run --rm -it -p 8888:8888 zamafhe/concrete-ml

# With local volume to save notebooks on host:
docker run --rm -it -p 8888:8888 -v /host/path:/data zamafhe/concrete-ml

This will launch a Concrete-ML enabled Jupyter server in Docker that can be accessed directly from a browser.

Alternatively, a shell can be launched in Docker, with or without volumes:

docker run --rm -it zamafhe/concrete-ml /bin/bash

Key Concepts

Concrete-ML is built on top of Concrete-Numpy, which enables Numpy programs to be converted into FHE circuits.

Lifecycle of a Concrete-ML model

I. Model Development

Training. A model is trained using plaintext, non-encrypted, training data.
Inference. The compiled model can then be executed on encrypted data, once the proper keys have been generated. The model can also be deployed to a server and used to run private inference on encrypted inputs.

II. Model deployment

Client/Server deployment. In a client/server setting, the model can be exported in a way that:
- allows the client to generate keys, encrypt and decrypt.
- provides a compiled model that can run on the server to perform inference on encrypted data
Key generation. The data owner (client) needs to generate a pair of private keys (to encrypt/decrypt their data and results) and a public evaluation key (for the model's FHE evaluation on the server).

Cryptography concepts

Concrete-ML and Concrete-Numpy are tools that hide away the details of the underlying cryptography scheme, called TFHE. However, some cryptography concepts are still useful when using these two toolkits:

Encryption/Decryption. These operations transform plaintext, i.e. human-readable information, into ciphertext, i.e. data that contains a form of the original plaintext that is unreadable by a human or computer without the proper key to decrypt it. Encryption takes plaintext and an encryption key and produces ciphertext, while decryption is the inverse operation.
Encrypted inference. FHE allows a third party to execute (i.e. run inference or predict) a machine learning model on encrypted data (a ciphertext). The result of the inference is also encrypted and can only be read by the person who holds the decryption key.
Keys. A key is a series of bits used within an encryption algorithm for encrypting data so that the corresponding ciphertext appears random.
Key generation. Cryptographic keys need to be generated using random number generators. Their size may be large and key generation may take a long time. However, keys only need to be generated once for each model a client uses.
Guaranteed correctness of encrypted computations. To achieve security, TFHE, the underlying encryption scheme, adds random noise to ciphertexts. This can induce errors during processing of encrypted data, depending on noise parameters. By default, Concrete-ML uses parameters that ensure the correctness of the encrypted computation, so you do not need to take into account the noise parametrization. Therefore, results on encrypted data will be the same as the results of simulation on clear data.

Model accuracy considerations under FHE constraints

To respect FHE constraints, all numerical programs over encrypted data must have all inputs, constants and intermediate values represented with integers of a maximum of 8 bits.

Inference in the Cloud

Concrete-ML models can be easily deployed in a client/server setting, enabling the creation of privacy-preserving services in the cloud.

Keys are generated by the user once for each service they use, based on the model the service provides and its cryptographic parameters.

The overall communications protocol to enable cloud deployment of machine learning services can be summarized in the following diagram:

The steps detailed above are as follows:

The model developer deploys the compiled machine learning model to the server. This model includes the cryptographic parameters. The server is now ready to provide private inference.
The client requests the cryptographic parameters (also called "client specs"). Once it gets them from the server, the secret and evaluation keys are generated.
The client sends the evaluation key to the server. The server is now ready to accept requests from this client. The client sends their encrypted data.
The server uses the evaluation key to securely run inference on the user's data and sends back the encrypted result.
The client now decrypts the result and can send back new requests.
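The message flow of the steps above can be sketched with a toy example. To stay self-contained, the sketch swaps TFHE for a tiny Paillier cryptosystem, which is only additively homomorphic, uses insecure demo primes, and has no separate evaluation key (its public key plays that role here). All function names are hypothetical and none of this is the Concrete-ML deployment API. It only shows who generates keys, who sees ciphertexts, and where decryption happens.

```python
import math
import random


def generate_keys(p=1009, q=1013):
    """Client: create a (public, secret) key pair from two tiny demo primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, valid for the generator g = n + 1
    return n, (n, lam, mu)


def encrypt(public_n, message):
    """Client: encrypt one (possibly negative) integer."""
    n2 = public_n * public_n
    r = random.randrange(1, public_n)
    while math.gcd(r, public_n) != 1:
        r = random.randrange(1, public_n)
    return (1 + (message % public_n) * public_n) * pow(r, public_n, n2) % n2


def decrypt(secret_key, ciphertext):
    """Client: recover the signed integer hidden in a ciphertext."""
    n, lam, mu = secret_key
    m = (pow(ciphertext, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m


def server_run(public_n, weights, bias, ciphertexts):
    """Server: compute Enc(w . x + bias) without ever seeing x."""
    n2 = public_n * public_n
    acc = (1 + (bias % public_n) * public_n) % n2  # trivial encryption of the bias
    for w, c in zip(weights, ciphertexts):
        acc = acc * pow(c, w % public_n, n2) % n2  # c**w encrypts w * x
    return acc


# The developer deploys an integer (quantized) linear model to the server.
weights, bias = [2, -1, 4], -7
# The client generates its keys and sends only the public part to the server.
public_n, secret_key = generate_keys()
# The client encrypts its quantized input and sends the ciphertexts.
x = [3, -2, 5]
encrypted_x = [encrypt(public_n, v) for v in x]
# The server evaluates the model on ciphertexts and returns an encrypted score.
encrypted_score = server_run(public_n, weights, bias, encrypted_x)
# The client decrypts the result; it matches the clear computation.
score = decrypt(secret_key, encrypted_score)
print(score, sum(w * v for w, v in zip(weights, x)) + bias)  # 21 21
```

The server only ever handles `public_n` and ciphertexts, while the secret key never leaves the client, mirroring the separation of roles described in the protocol.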

Built-in Models

Linear Models

Concrete-ML

scikit-learn

Models are also compatible with some of scikit-learn's main workflows, such as Pipeline() or GridSearch().

Example

import numpy
from tqdm import tqdm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from concrete.ml.sklearn import LogisticRegression

# Create the data for classification
X, y = make_classification(
    n_features=2,
    n_redundant=0,
    n_informative=2,
    random_state=2,
    n_clusters_per_class=1,
    n_samples=100,
)

# Retrieve train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

# Instantiate the model
model = LogisticRegression(n_bits=2)

# Fit the model
model.fit(X_train, y_train)

# Evaluate the model on the test set in clear
y_pred_clear = model.predict(X_test)

# Compile the model
model.compile(X_train)

# Perform the inference in FHE
# Note that here the encryption and decryption is done behind the scenes.
# It is recommended to run this with a very small batch of
# examples first (e.g. N_TEST_FHE = 3)
N_TEST_FHE = 3
y_pred_fhe = numpy.array([
  model.predict([sample], execute_in_fhe=True)[0]
  for sample in tqdm(X_test[:N_TEST_FHE])
])

# Assert that FHE predictions are the same as the clear predictions
print(f"{(y_pred_fhe == y_pred_clear[:N_TEST_FHE]).sum()} "
      f"examples over {N_TEST_FHE} have a FHE inference equal to the clear inference.")

# Output:
#  3 examples over 3 have a FHE inference equal to the clear inference

We can clearly observe the impact of quantization over the decision boundaries in the FHE model, separating the initial lines into broken lines with steps. However, this does not change the overall score as both models output the same accuracy (90%).

In fact, the quantization process may sometimes create some artifacts that could lead to a decrease in performance. Still, the impact of those artifacts is often minor when considering linear models as FHE models reach similar scores as their equivalent clear ones.

Tree-based Models

Concrete-ML

scikit-learn

Concrete-ML

XGBoost

Example

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from concrete.ml.sklearn.xgb import XGBClassifier


# Get data-set and split into train and test
X, y = load_breast_cancer(return_X_y=True)

# Split the train and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Define our model
model = XGBClassifier(n_jobs=1, n_bits=3)

# Define the pipeline
# We will normalize the data and apply a PCA before fitting the model
pipeline = Pipeline(
    [("standard_scaler", StandardScaler()), ("pca", PCA(random_state=0)), ("model", model)]
)

# Define the parameters to tune
param_grid = {
    "pca__n_components": [2, 5, 10, 15],
    "model__max_depth": [2, 3, 5],
    "model__n_estimators": [5, 10, 20],
}

# Instantiate the grid search with 5-fold cross validation on all available cores
grid = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1, scoring="accuracy")

# Launch the grid search
grid.fit(X_train, y_train)

# Print the best parameters found
print(f"Best parameters found: {grid.best_params_}")

# Output:
#  Best parameters found: {'model__max_depth': 5, 'model__n_estimators': 10, 'pca__n_components': 5}

# Currently we only focus on model inference in FHE
# The data transformation will be done in clear (client machine)
# while the model inference will be done in FHE on a server.
# The pipeline can be split into 2 parts:
#   1. data transformation
#   2. estimator
best_pipeline = grid.best_estimator_
data_transformation_pipeline = best_pipeline[:-1]
model = best_pipeline[-1]

# Transform the train and test sets
X_train_transformed = data_transformation_pipeline.transform(X_train)
X_test_transformed = data_transformation_pipeline.transform(X_test)

# Evaluate the model on the test set in clear
y_pred_clear = model.predict(X_test_transformed)
print(f"Test accuracy in clear: {(y_pred_clear == y_test).mean():0.2f}")

# Output:
#  Test accuracy in clear: 0.98

# Compile the model to FHE
model.compile(X_train_transformed)

# Perform the inference in FHE
# Warning: this will take a while. It is recommended to run this with a very small batch of
# examples first (e.g. N_TEST_FHE = 1)
# Note that here the encryption and decryption is done behind the scenes.
N_TEST_FHE = 1
y_pred_fhe = model.predict(X_test_transformed[:N_TEST_FHE], execute_in_fhe=True)

# Assert that FHE predictions are the same as the clear predictions
print(f"{(y_pred_fhe == y_pred_clear[:N_TEST_FHE]).sum()} "
      f"examples over {N_TEST_FHE} have a FHE inference equal to the clear inference.")

# Output:
#  1 examples over 1 have a FHE inference equal to the clear inference

This graph shows the impact of quantization over the decision boundaries in the Concrete-ML FHE decision tree models. In the 3-bit model, only a rough, highly discrete decision function is observed. This results in a decrease in accuracy of about 7% compared to the initial XGBoost classifier. In contrast, using 6 bits of quantization makes the model reach 93% accuracy, reducing this difference to only 1.7 percentage points.

In fact, the quantization process may sometimes create some artifacts that could lead to a decrease in performance. Still, as the quantization is done individually on each input feature, the artifacts are minor when considering small tree-based models with 5-6 bits quantization. Thus, FHE tree-based models reach similar scores as their equivalent floating point ones.

The following graph shows that using 5-6 bits of quantization is usually sufficient to reach the performance of a non-quantized XGBoost model on floating point data. The metrics plotted are accuracy and F1-score on the spambase data-set.

Neural Networks

Concrete-ML provides simple neural network models with a Scikit-learn interface through the NeuralNetClassifier and NeuralNetRegressor classes.

Concrete-ML

These models use a stack of linear layers, and both the activation function and the number of neurons in each layer are configurable. This approach is similar to what is available in Scikit-learn using the MLPClassifier/MLPRegressor classes. The built-in, fully connected neural network (FCNN) models train easily with a single call to .fit(), which will automatically quantize the weights and activations. These models use Quantization Aware Training, allowing good performance for low precision (down to 2-3 bit) weights and activations.

Example usage

To create an instance of a Fully Connected Neural Network you need to instantiate one of the NeuralNetClassifier and NeuralNetRegressor classes and configure a number of parameters that are passed to their constructor. Note that some parameters need to be prefixed by module__, while others don't. Basically, the parameters that are related to the model, i.e. the underlying nn.Module, must have the prefix. The parameters that are related to training options do not require the prefix.

from concrete.ml.sklearn import NeuralNetClassifier
import torch.nn as nn

n_inputs = 10
n_outputs = 2
params = {
    "module__n_layers": 2,
    "module__n_w_bits": 2,
    "module__n_a_bits": 2,
    "module__n_accum_bits": 8,
    "module__n_hidden_neurons_multiplier": 1,
    "module__n_outputs": n_outputs,
    "module__input_dim": n_inputs,
    "module__activation_function": nn.ReLU,
    "max_epochs": 10,
}

concrete_classifier = NeuralNetClassifier(**params)

The figure above shows, on the right, the Concrete-ML neural network, trained with Quantization Aware Training, in a FHE-compatible configuration. The figure compares this network to the floating point equivalent, trained with scikit-learn.

Architecture parameters

module__n_layers: number of layers in the FCNN, must be at least 1. Note that this is the total number of layers. For a single hidden layer NN model, set module__n_layers=2
module__n_outputs: number of outputs (classes or targets)
module__input_dim: dimensionality of the input

Quantization parameters

n_w_bits (default 3): number of bits for weights, passed as module__n_w_bits
n_a_bits (default 3): number of bits for activations and inputs, passed as module__n_a_bits

Training parameters (from Skorch)

max_epochs: The number of epochs to train the network (default 10)
verbose: Whether to log loss/metrics during training (default: False)
lr: Learning rate (default 0.001)

Advanced parameters

Network input/output

When you have training data in the form of a NumPy array, and targets in a NumPy 1d array, you can set:

    import numpy as np

    classes = np.unique(y_all)
    params["module__input_dim"] = x_train.shape[1]
    params["module__n_outputs"] = len(classes)

Class weights

You can give weights to each class to use in training. Note that this must be supported by the underlying PyTorch loss function.

    from sklearn.utils.class_weight import compute_class_weight
    params["criterion__weight"] = compute_class_weight("balanced", classes=classes, y=y_train)
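As an illustration of what the "balanced" option computes, here is a minimal plain-Python sketch of the same heuristic, n_samples / (n_classes * class_count). The helper name `balanced_class_weights` and the toy labels are assumptions made for this example only.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {class: n_samples / (n_classes * class_count)}."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in sorted(counts.items())}


y_example = [0, 0, 0, 0, 0, 0, 1, 1]  # 6 samples of class 0, 2 samples of class 1
print(balanced_class_weights(y_example))
```

The rarer class receives the larger weight, so its errors count more in the training loss.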

Overflow errors

The n_hidden_neurons_multiplier parameter influences training accuracy as it controls the number of non-zero neurons that are allowed in each layer. Increasing n_hidden_neurons_multiplier improves accuracy, but should take into account precision limitations to avoid overflow in the accumulator. The default value is a good compromise that avoids overflow, in most cases, but you may want to change the value of this parameter to reduce the breadth of the network if you have overflow errors. A value of 1 should be completely safe with respect to overflow.

Pandas

Concrete-ML provides partial support for Pandas, with most available models (linear and tree-based models) usable on Pandas dataframes the same way they would be used with NumPy arrays.

The table below summarizes the current compatibility:

| Methods | Support Pandas dataframe |
| --- | --- |
| fit | ✓ |
| compile | ✗ |
| predict (execute_in_fhe=False) | ✓ |
| predict (execute_in_fhe=True) | ✓ |

Example

import numpy as np
import pandas as pd
from concrete.ml.sklearn import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create the data set as a Pandas dataframe
X, y = make_classification(
    n_samples=100,
    n_features=2,
    n_redundant=0,
    random_state=2,
)
X, y = pd.DataFrame(X), pd.DataFrame(y)

# Retrieve train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

# Instantiate the model
model = LogisticRegression(n_bits=2)

# Fit the model
model.fit(X_train, y_train)

# Evaluate the model on the test set in clear
y_pred_clear = model.predict(X_test)

# Compile the model
model.compile(X_train.to_numpy())

# Perform the inference in FHE
# Warning: this will take a while. It is recommended to run this with a very small batch of
# examples first (e.g. N_TEST_FHE = 1)
# Note that here the encryption and decryption is done behind the scenes.
N_TEST_FHE = 1
y_pred_fhe = model.predict(X_test.head(N_TEST_FHE), execute_in_fhe=True)

# Assert that FHE predictions are the same as the clear predictions
print(f"{(y_pred_fhe == y_pred_clear[:N_TEST_FHE]).sum()} "
      f"examples over {N_TEST_FHE} have a FHE inference equal to the clear inference.")

# Output:
#  1 examples over 1 have a FHE inference equal to the clear inference

Built-in Model Examples

The following table summarizes the various examples in this section, along with their accuracies.

| Model | Data-set | Metric | Floating Point | Simulation | FHE |
| --- | --- | --- | --- | --- | --- |
| Linear Regression | Synthetic 1D | | 0.876 | 0.863 | |
| Logistic Regression | Synthetic 2D with 2 classes | accuracy | 0.90 | 0.875 | |
| Poisson Regression | | mean Poisson deviance | 0.61 | 0.60 | |
| Gamma Regression | | mean Gamma deviance | 0.45 | | |
| Tweedie Regression | | mean Tweedie deviance (power=1.9) | 33.42 | 34.18 | |
| Decision Tree | | precision score | 0.95 | 0.97 | 0.97* |
| XGBoost Classifier | | MCC | 0.48 | 0.52 | 0.52* |
| XGBoost Regressor | | | 0.92 | 0.90 | 0.90* |

A * means that FHE accuracy was calculated on a subset of the validation set.

Concrete-ML models

Comparison of classifiers

Kaggle competition

Deep Learning

Using Torch

The following example uses a simple QAT PyTorch model that implements a fully connected neural network with two hidden layers. Due to its small size, making this model respect FHE constraints is relatively easy.

import brevitas.nn as qnn
import torch.nn as nn
import torch

N_FEAT = 12
n_bits = 3

class QATSimpleNet(nn.Module):
    def __init__(self, n_hidden):
        super().__init__()

        self.quant_inp = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
        self.fc1 = qnn.QuantLinear(N_FEAT, n_hidden, True, weight_bit_width=n_bits, bias_quant=None)
        self.quant2 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
        self.fc2 = qnn.QuantLinear(n_hidden, n_hidden, True, weight_bit_width=n_bits, bias_quant=None)
        self.quant3 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
        self.fc3 = qnn.QuantLinear(n_hidden, 2, True, weight_bit_width=n_bits, bias_quant=None)

    def forward(self, x):
        x = self.quant_inp(x)
        x = self.quant2(torch.relu(self.fc1(x)))
        x = self.quant3(torch.relu(self.fc2(x)))
        x = self.fc3(x)
        return x

from concrete.ml.torch.compile import compile_brevitas_qat_model
import numpy

torch_input = torch.randn(100, N_FEAT)
torch_model = QATSimpleNet(30)
quantized_numpy_module = compile_brevitas_qat_model(
    torch_model, # our model
    torch_input, # a representative input-set to be used for both quantization and compilation
    n_bits = n_bits,
)

The model can now be used to perform encrypted inference. Next, the test data is quantized:

x_test = numpy.array([numpy.random.randn(N_FEAT)])
x_test_quantized = quantized_numpy_module.quantize_input(x_test)

Inference can then be run using either of the following:

quantized_numpy_module.forward_and_dequant() to compute predictions in the clear, on quantized data and then de-quantize the result. The return value of this function contains the dequantized (float) output of running the model in the clear. Calling the forward function on the clear data is useful when debugging. The results in FHE will be the same as those on clear quantized data.
quantized_numpy_module.forward_fhe.encrypt_run_decrypt() to perform the FHE inference. In this case, dequantization is done in a second stage using quantized_numpy_module.dequantize_output().
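The quantize, integer compute, de-quantize sequence described above can be sketched for a single dot product in plain Python. This is an illustration under simplifying assumptions (symmetric quantization with one scale per vector, hypothetical helper names), not the implementation behind quantize_input() or dequantize_output().

```python
def symmetric_quantize(values, n_bits):
    """Return signed integers and the scale such that value ~= scale * integer."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = max(abs(v) for v in values) / q_max
    return [round(v / scale) for v in values], scale


def integer_dot(q_x, q_w):
    """The only part that would run on encrypted data: integers in, integer out."""
    return sum(a * b for a, b in zip(q_x, q_w))


x = [0.8, -0.3, 0.5]
w = [0.6, 0.9, -0.4]

q_x, scale_x = symmetric_quantize(x, n_bits=4)
q_w, scale_w = symmetric_quantize(w, n_bits=4)

y_int = integer_dot(q_x, q_w)              # integer-only computation
y_dequantized = y_int * scale_x * scale_w  # de-quantization, done in the clear
y_float = sum(a * b for a, b in zip(x, w)) # floating point reference

print(y_int, round(y_dequantized, 3), round(y_float, 3))
```

Only `integer_dot` would need to be evaluated in FHE. Quantization and de-quantization stay on the clear side, which is why they appear as separate steps.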

Generic Quantization Aware Training import

While the example above shows how to import a Brevitas/PyTorch model, Concrete-ML also provides an option to import generic QAT models implemented either in PyTorch or through ONNX. Deep learning models made with TensorFlow or Keras should also be usable, by first converting them to ONNX.

QAT models contain quantizers in the PyTorch graph. These quantizers ensure that the inputs to the Linear/Dense and Conv layers are quantized.

from concrete.ml.torch.compile import compile_torch_model
n_bits_qat = 3

quantized_numpy_module = compile_torch_model(
    torch_model,
    torch_input,
    import_qat=True,
    n_bits=n_bits_qat,
)

When importing QAT models using this generic pipeline, a representative calibration set should be given as quantization parameters in the model need to be inferred from the statistics of the values encountered during inference.
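To see why the calibration set must be representative, consider this minimal sketch of range-based calibration. The function names are hypothetical, and the min/max rule is an assumption chosen for simplicity, not necessarily the statistic Concrete-ML uses.

```python
def calibrate(calibration_values, n_bits):
    """Infer (scale, zero_point) from the range seen in a calibration set."""
    lo, hi = min(calibration_values), max(calibration_values)
    scale = (hi - lo) / (2**n_bits - 1)
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(value, scale, zero_point, n_bits):
    """Quantize one value, clipping anything outside the calibrated range."""
    q = round(value / scale) + zero_point
    return min(max(q, 0), 2**n_bits - 1)


calibration_set = [-2.0, -0.5, 0.1, 1.3, 2.0]
scale, zero_point = calibrate(calibration_set, n_bits=3)

in_range = quantize(1.0, scale, zero_point, n_bits=3)
out_of_range = quantize(10.0, scale, zero_point, n_bits=3)  # saturates at the top level
print(scale, zero_point, in_range, out_of_range)
```

A value far outside the calibrated range saturates at the largest level, so inputs seen at inference time should resemble the calibration data.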

Supported operators and activations

Concrete-ML supports a variety of PyTorch operators that can be used to build fully connected or convolutional neural networks, with normalization and activation layers. Moreover, many element-wise operators are supported.

Operators

univariate operators

shape modifying operators

operators that take an encrypted input and unencrypted constants

Please note that Concrete-ML supports these operators but also the Quantization Aware Training equivalents from Brevitas.

brevitas.nn.QuantLinear
brevitas.nn.QuantConv2d

operators that can take both encrypted+unencrypted and encrypted+encrypted inputs

Quantizers

brevitas.nn.QuantIdentity

Activations

Note that the equivalent versions from torch.functional are also supported.

Using ONNX

ONNX models can be compiled by directly importing models that are already quantized with Quantization Aware Training (QAT), or by performing Post-Training Quantization (PTQ) with Concrete-ML.

Simple example

The following example shows how to compile an ONNX model using PTQ. The model was initially trained using Keras before being exported to ONNX. The training code is not shown here.

import numpy
import onnx
import tensorflow
import tf2onnx

from concrete.ml.torch.compile import compile_onnx_model
from concrete.numpy.compilation import Configuration


class FC(tensorflow.keras.Model):
    """A fully-connected model."""

    def __init__(self):
        super().__init__()
        hidden_layer_size = 10
        output_size = 5

        self.dense1 = tensorflow.keras.layers.Dense(
            hidden_layer_size,
            activation=tensorflow.nn.relu,
        )
        self.dense2 = tensorflow.keras.layers.Dense(output_size, activation=tensorflow.nn.relu6)
        self.flatten = tensorflow.keras.layers.Flatten()

    def call(self, inputs):
        """Forward function."""
        x = self.flatten(inputs)
        x = self.dense1(x)
        x = self.dense2(x)
        return self.flatten(x)


n_bits = 6
input_output_feature = 2
input_shape = (input_output_feature,)
num_inputs = 1
n_examples = 5000

# Define the Keras model
keras_model = FC()
keras_model.build((None,) + input_shape)
keras_model.compute_output_shape(input_shape=(None, input_output_feature))

# Create random input
input_set = numpy.random.uniform(-100, 100, size=(n_examples, *input_shape))

# Convert to ONNX
tf2onnx.convert.from_keras(keras_model, opset=14, output_path="tmp.model.onnx")

onnx_model = onnx.load("tmp.model.onnx")
onnx.checker.check_model(onnx_model)

# Compile
quantized_numpy_module = compile_onnx_model(
    onnx_model, input_set, n_bits=2
)

# Create test data from the same distribution and quantize using
# learned quantization parameters during compilation
x_test = tuple(numpy.random.uniform(-100, 100, size=(1, *input_shape)) for _ in range(num_inputs))
qtest = quantized_numpy_module.quantize_input(x_test)

y_clear = quantized_numpy_module(*qtest)
y_fhe = quantized_numpy_module.forward_fhe.encrypt_run_decrypt(*qtest)

print("Execution in clear: ", y_clear)
print("Execution in FHE:   ", y_fhe)
print("Equality:           ", numpy.sum(y_clear == y_fhe), "over", numpy.size(y_fhe), "values")

While Keras was used in this example, it is not officially supported as additional work is needed to test all of Keras' types of layer and models.

Quantization Aware Training

QAT models contain quantizers in the ONNX graph. These quantizers ensure that the inputs to the Linear/Dense and Conv layers are quantized. Since these QAT models have quantizers that are configured during training to a specific number of bits, the ONNX graph will need to be imported using the same settings:

n_bits_qat = 3  # number of bits for weights and activations during training

quantized_numpy_module = compile_onnx_model(
    onnx_model,
    input_set,
    import_qat=True,
    n_bits=n_bits_qat,
)

Supported operators

The following operators are supported for evaluation and conversion to an equivalent FHE circuit. Other operators were not implemented either due to FHE constraints, or because they are rarely used in PyTorch activations or scikit-learn models.

Abs
Acos
Acosh
Add
Asin
Asinh
Atan
Atanh
AveragePool
BatchNormalization
Cast
Celu
Clip
Constant
Conv
Cos
Cosh
Div
Elu
Equal
Erf
Exp
Flatten
Gemm
Greater
GreaterOrEqual
HardSigmoid
HardSwish
Identity
LeakyRelu
Less
LessOrEqual
Log
MatMul
Mul
Not
Or
PRelu
Pad
Pow
ReduceSum
Relu
Reshape
Round
Selu
Sigmoid
Sin
Sinh
Softplus
Sub
Tan
Tanh
ThresholdedRelu
Transpose
Where
onnx.brevitas.Quant

Step-by-Step Guide

Summary

Baseline model

This example shows how to train a fully-connected neural network on a synthetic 2D data-set with a checkerboard grid pattern of 100 x 100 points. The data is split into 9500 training and 500 test samples.
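A data-set of this shape could be generated as in the sketch below. The number of board cells per side (`n_cells`), the coordinate range and the shuffling seed are assumptions, as the original generation code is not shown here.

```python
import random


def make_checkerboard(n_side=100, n_cells=4):
    """Return (points, labels) for an n_side x n_side grid over [0, 1) x [0, 1)."""
    points, labels = [], []
    for i in range(n_side):
        for j in range(n_side):
            x, y = i / n_side, j / n_side
            points.append((x, y))
            # Alternate the class from one board cell to the next.
            labels.append((int(x * n_cells) + int(y * n_cells)) % 2)
    return points, labels


points, labels = make_checkerboard()

# Shuffle with a fixed seed, then keep 9500 samples for training and 500 for testing.
indices = list(range(len(points)))
random.Random(42).shuffle(indices)
train_idx, test_idx = indices[:9500], indices[9500:]
print(len(train_idx), len(test_idx))  # 9500 500
```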

In PyTorch, using standard layers, this network would look as follows:

from torch import nn
import torch

N_FEAT = 2
class SimpleNet(nn.Module):
    """Simple MLP with PyTorch"""

    def __init__(self, n_hidden=30):
        super().__init__()
        self.fc1 = nn.Linear(in_features=N_FEAT, out_features=n_hidden)
        self.fc2 = nn.Linear(in_features=n_hidden, out_features=n_hidden)
        self.fc3 = nn.Linear(in_features=n_hidden, out_features=2)


    def forward(self, x):
        """Forward pass."""
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

| neurons | | | 100 |
| --- | --- | --- | --- |
| fp32 accuracy | 68.70% | 83.32% | 88.06% |
| 3bit accuracy | 56.44% | 55.54% | 56.50% |
| mean accumulator size | 6.6 | 6.9 | 7.4 |

This shows that the fp32 accuracy and accumulator size increase with the number of hidden neurons, while the 3-bit accuracy remains low irrespective of the number of neurons. While all the configurations tried here were FHE-compatible (accumulator < 8 bits), it is sometimes preferable to have a lower accumulator size in order for the inference time to be faster.

The accumulator size is determined by Concrete-Numpy as being the maximum bit-width encountered anywhere in the encrypted circuit.
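The idea of a "maximum bit-width encountered anywhere" can be sketched for a single dot product. The sketch assumes non-negative integers and assumes the bounds are measured by evaluating a representative input set. The helper names are hypothetical and this is not how Concrete-Numpy is implemented.

```python
def bits_needed(value):
    """Bits required to store a non-negative integer (at least 1)."""
    return max(1, value.bit_length())


def accumulator_size(weights, input_set):
    """Largest bit-width of any running sum of the dot product over the input set."""
    widest = 1
    for sample in input_set:
        running_sum = 0
        for w, x in zip(weights, sample):
            running_sum += w * x
            widest = max(widest, bits_needed(running_sum))
    return widest


weights = [3, 1, 2, 3]                                   # 2-bit unsigned weights
input_set = [[3, 3, 3, 3], [1, 0, 2, 3], [0, 0, 0, 1]]  # 2-bit unsigned inputs
print(accumulator_size(weights, input_set))  # 5, the largest running sum is 27
```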

Pruning using Torch

Considering that FHE only works with limited integer precision, there is a risk of overflowing in the accumulator, resulting in unpredictable results.

To understand how to overcome this limitation, consider a scenario where 2 bits are used for weights and layer inputs/outputs. The Linear layer computes a dot product between weights and inputs $y = \sum_i w_i x_i$. With 2 bits, no overflow can occur during the computation of the Linear layer as long as the number of neurons does not exceed 14, i.e. the sum of 14 products of 2-bit numbers does not exceed 7 bits.
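The figure of 14 can be checked with a few lines of arithmetic. The sketch assumes unsigned values, so that a 2-bit number is at most 3 and a 7-bit accumulator holds at most 127. The function name is hypothetical.

```python
def max_terms_without_overflow(n_bits, accumulator_bits=7):
    """Largest number of products of two unsigned n_bits values that always fits."""
    largest_product = (2**n_bits - 1) ** 2
    largest_accumulator = 2**accumulator_bits - 1
    return largest_accumulator // largest_product


print(max_terms_without_overflow(2))  # 14, since 14 * 9 = 126 <= 127 but 15 * 9 = 135 > 127
print(max_terms_without_overflow(3))  # 2, since 2 * 49 = 98 <= 127 but 3 * 49 = 147 > 127
```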

By default, Concrete-ML uses symmetric quantization for model weights, with values in the interval $\left[-2^{n_{bits}-1}, 2^{n_{bits}-1}-1\right]$. For example, for $n_{bits}=2$ the possible values are $[-2, -1, 0, 1]$, for $n_{bits}=3$ the values can be $[-4,-3,-2,-1,0,1,2,3]$.
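The interval above can be enumerated directly. The helper name is hypothetical.

```python
def symmetric_levels(n_bits):
    """All integer values representable with symmetric n_bits quantization."""
    return list(range(-(2 ** (n_bits - 1)), 2 ** (n_bits - 1)))


print(symmetric_levels(2))  # [-2, -1, 0, 1]
print(symmetric_levels(3))  # [-4, -3, -2, -1, 0, 1, 2, 3]
```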

However, in a typical setting, the weights will not all have the maximum or minimum values (e.g. $-2^{n_{bits}-1}$ ). Instead, weights typically have a normal distribution around 0, which is one of the motivating factors for their symmetric quantization. A symmetric distribution and many zero-valued weights are desirable because opposite sign weights can cancel each other out and zero weights do not increase the accumulator size.
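The effect of opposite-sign and zero-valued weights on the accumulator can be made concrete with a worst-case bound. The sketch assumes unsigned 2-bit inputs and 2-bit symmetric weights. The weight vectors are made up for illustration and the helper names are hypothetical.

```python
def worst_case_range(weights, n_bits_input):
    """Smallest and largest dot-product value for unsigned inputs of n_bits_input bits."""
    x_max = 2**n_bits_input - 1
    highest = x_max * sum(w for w in weights if w > 0)
    lowest = x_max * sum(w for w in weights if w < 0)
    return lowest, highest


def signed_bits(lowest, highest):
    """Bits of a two's complement integer able to hold every value in [lowest, highest]."""
    n_bits = 1
    while lowest < -(2 ** (n_bits - 1)) or highest > 2 ** (n_bits - 1) - 1:
        n_bits += 1
    return n_bits


dense = [-2, -2, -2, -2, -2, -2, -2, -2]  # every weight at the same extreme value
mixed = [1, -1, 1, -1, 1, -1, 1, -1]      # opposite signs limit the reachable range
pruned = [1, 0, 0, -1, 0, 0, 1, 0]        # zero weights contribute nothing

for name, weights in [("dense", dense), ("mixed", mixed), ("pruned", pruned)]:
    lowest, highest = worst_case_range(weights, n_bits_input=2)
    print(name, lowest, highest, signed_bits(lowest, highest))
# dense -48 0 7
# mixed -12 12 5
# pruned -3 6 4
```

With the same number of inputs, the worst-case accumulator drops from 7 bits to 5 when signs are balanced, and to 4 when most weights are zero.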

The following code shows how to add pruning to the SimpleNet model defined above:

import torch.nn.utils.prune as prune

class PrunedSimpleNet(SimpleNet):
    """Simple MLP with PyTorch"""

    def prune(self, max_non_zero, enable):
        # Linear layer weight has dimensions NumOutputs x NumInputs
        # named_modules() yields (name, module) pairs, so unpack them
        for _name, layer in self.named_modules():
            if isinstance(layer, nn.Linear):
                num_zero_weights = (layer.weight.shape[1] - max_non_zero) * layer.weight.shape[0]
                if num_zero_weights <= 0:
                    continue

                if enable:
                    prune.l1_unstructured(layer, "weight", amount=num_zero_weights)
                else:
                    prune.remove(layer, "weight")
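
To illustrate what an integer amount means for L1 unstructured pruning, here is a minimal sketch in plain Python. It is not the PyTorch implementation: the helper l1_unstructured_mask and the toy weight matrix are assumptions made for illustration. The idea is that the given number of weights with the smallest absolute value in the layer are set to zero.

```python
def l1_unstructured_mask(weights, amount):
    """Return a 0/1 mask that zeroes the `amount` smallest-magnitude weights."""
    entries = [
        (abs(w), row, col)
        for row, values in enumerate(weights)
        for col, w in enumerate(values)
    ]
    entries.sort()
    mask = [[1] * len(values) for values in weights]
    for _, row, col in entries[:amount]:
        mask[row][col] = 0
    return mask


# Toy layer: 2 outputs x 4 inputs (NumOutputs x NumInputs)
weights = [
    [0.9, -0.1, 0.4, 0.05],
    [-0.2, 0.7, -0.03, 0.6],
]
max_non_zero = 2
n_outputs, n_inputs = len(weights), len(weights[0])

# Same formula as in the prune method: weights to zero over the whole layer
num_zero_weights = (n_inputs - max_non_zero) * n_outputs

mask = l1_unstructured_mask(weights, num_zero_weights)
pruned = [[w * m for w, m in zip(ws, ms)] for ws, ms in zip(weights, mask)]
print(mask)  # [[1, 0, 1, 0], [0, 1, 0, 1]]
```

Because the selection is made over the whole weight matrix, max_non_zero is an average per neuron: an individual neuron may keep more or fewer non-zero weights.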

Results with PrunedSimpleNet, a pruned version of the SimpleNet with 100 neurons on the hidden layers, are given below:

non-zero neurons
fp32 accuracy            82.50%    88.06%
3bit accuracy            57.74%    57.82%
mean accumulator size    6.6       6.8

This shows that the fp32 accuracy has been improved while the mean accumulator size stays nearly constant (6.6 vs. 6.8).

When pruning a larger neural network during training, it is easier to obtain a low bit-width accumulator while maintaining better final accuracy. Thus, pruning is more robust than training a similar smaller network.

Quantization Aware Training

The QAT import tool in Concrete-ML is a work in progress. While it has been tested with some networks built with Brevitas, it is possible to use other tools to obtain QAT networks.

import brevitas.nn as qnn
import torch
import torch.nn.utils.prune as prune
from torch import nn

from brevitas.core.bit_width import BitWidthImplType
from brevitas.core.quant import QuantType
from brevitas.core.restrict_val import FloatToIntImplType, RestrictValueType
from brevitas.core.scaling import ScalingImplType
from brevitas.core.zero_point import ZeroZeroPoint
from brevitas.inject import ExtendedInjector
from brevitas.quant.solver import ActQuantSolver, WeightQuantSolver
from dependencies import value

# Configure quantization options
class CommonQuant(ExtendedInjector):
    bit_width_impl_type = BitWidthImplType.CONST
    scaling_impl_type = ScalingImplType.CONST
    restrict_scaling_type = RestrictValueType.FP
    zero_point_impl = ZeroZeroPoint
    float_to_int_impl_type = FloatToIntImplType.ROUND
    scaling_per_output_channel = False
    narrow_range = True
    signed = True

    @value
    def quant_type(bit_width):
        if bit_width is None:
            return QuantType.FP
        elif bit_width == 1:
            return QuantType.BINARY
        else:
            return QuantType.INT

# Quantization options for weights/activations
class CommonWeightQuant(CommonQuant, WeightQuantSolver):
    scaling_const = 1.0
    signed = True


class CommonActQuant(CommonQuant, ActQuantSolver):
    min_val = -1.0
    max_val = 1.0

class QATPrunedSimpleNet(nn.Module):
    def __init__(self, n_hidden):
        super(QATPrunedSimpleNet, self).__init__()

        n_bits = 3
        self.quant_inp = qnn.QuantIdentity(
            act_quant=CommonActQuant,
            bit_width=n_bits,
            return_quant_tensor=True,
        )

        self.fc1 = qnn.QuantLinear(
            N_FEAT,
            n_hidden,
            True,
            weight_quant=CommonWeightQuant,
            weight_bit_width=n_bits,
            bias_quant=None,
        )

        self.q1 = qnn.QuantIdentity(
            act_quant=CommonActQuant, bit_width=n_bits, return_quant_tensor=True
        )

        self.fc2 = qnn.QuantLinear(
            n_hidden,
            n_hidden,
            True,
            weight_quant=CommonWeightQuant,
            weight_bit_width=n_bits,
            bias_quant=None,
        )

        self.q2 = qnn.QuantIdentity(
            act_quant=CommonActQuant, bit_width=n_bits, return_quant_tensor=True
        )

        self.fc3 = qnn.QuantLinear(
            n_hidden,
            2,
            True,
            weight_quant=CommonWeightQuant,
            weight_bit_width=n_bits,
            bias_quant=None,
        )

        for m in self.modules():
            if isinstance(m, qnn.QuantLinear):
                torch.nn.init.uniform_(m.weight.data, -1, 1)

    def forward(self, x):
        x = self.quant_inp(x)
        x = self.q1(torch.relu(self.fc1(x)))
        x = self.q2(torch.relu(self.fc2(x)))
        x = self.fc3(x)
        return x

    def prune(self, max_non_zero, enable):
        # Linear layer weight has dimensions NumOutputs x NumInputs
        for name, layer in self.named_modules():
            if isinstance(layer, nn.Linear):
                num_zero_weights = (layer.weight.shape[1] - max_non_zero) * layer.weight.shape[0]
                if num_zero_weights <= 0:
                    continue

                if enable:
                    print(f"Pruning layer {name} factor {num_zero_weights}")
                    prune.l1_unstructured(layer, "weight", amount=num_zero_weights)
                else:
                    prune.remove(layer, "weight")

Training this network with 30 out of 100 total non-zero neurons gives good accuracy while being FHE-compatible (accumulator size < 8 bits).

non-zero neurons
3bit accuracy brevitas          95.4%
3bit accuracy in Concrete-ML    92.4%
accumulator size

The PyTorch QAT training loop is the same as the standard floating point training loop, but hyper-parameters such as learning rate might need to be adjusted.

Quantization Aware Training is somewhat slower than normal training. QAT introduces quantization during both the forward and backward passes. The quantization process is inefficient on GPUs as its computational intensity is low with respect to data transfer time.

Deep Learning Examples

Summary

The following table summarizes the examples in this section:

Examples

Debugging Models

This section provides a set of tools and guidelines to help users build optimized FHE-compatible models.

Virtual library

The Virtual Lib in Concrete-ML is a prototype that provides drop-in replacements for Concrete-Numpy's compiler, allowing users to simulate what would happen when converting a model to FHE without the current bit-width constraint. Additionally, it quickly simulates the behavior with 8 bits or less without actually doing the FHE computations.

The Virtual Lib can be useful when developing and iterating on an ML model implementation. For example, you can check that your model is compatible in terms of operands (all integers) with the Virtual Lib compilation. Then, you can check how many bits your ML model would require, which can give you hints as to how it should be modified if you want to compile it to an actual FHE Circuit (not a simulated one) that only supports 8 bits of integer precision.

The following example shows how to use the Virtual Lib in Concrete-ML. Simply pass use_virtual_lib=True to compile, together with a Configuration in which enable_unsafe_features=True. The result of the compilation will then be a simulated circuit that allows for more precision or simulated FHE execution.

from sklearn.datasets import make_circles
from concrete.ml.sklearn import RandomForestClassifier
from concrete.numpy import Configuration

debug_config = Configuration(
    enable_unsafe_features=True,
    use_insecure_key_cache=True,
    insecure_key_cache_location="~/.cml_keycache",
)

n_bits = 2
X, y = make_circles(n_samples=1000, noise=0.1, factor=0.6, random_state=0)
concrete_clf = RandomForestClassifier(
    n_bits=n_bits, n_estimators=10, max_depth=5
)
concrete_clf.fit(X, y)

concrete_clf.compile(X, debug_config, use_virtual_lib=True)

y_preds_clear = concrete_clf.predict(X)

Compilation debugging

The following example produces a neural network that is not FHE-compatible:

import numpy
import torch

from torch import nn
from concrete.ml.torch.compile import compile_torch_model

N_FEAT = 2
class SimpleNet(nn.Module):
    """Simple MLP with PyTorch"""

    def __init__(self, n_hidden=30):
        super().__init__()
        self.fc1 = nn.Linear(in_features=N_FEAT, out_features=n_hidden)
        self.fc2 = nn.Linear(in_features=n_hidden, out_features=n_hidden)
        self.fc3 = nn.Linear(in_features=n_hidden, out_features=2)


    def forward(self, x):
        """Forward pass."""
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x


torch_input = torch.randn(100, N_FEAT)
torch_model = SimpleNet(120)
try:
    quantized_numpy_module = compile_torch_model(
        torch_model,
        torch_input,
        n_bits = 3,
    )
except RuntimeError as err:
    print(err)

Upon execution, the compiler will raise the following error:

%0 = [[-1 -3] [ ... ] [-2  2]]        # ClearTensor<int3, shape=(120, 2)>
 %1 = [[ 1  3 -2 ...  1  2  0]]        # ClearTensor<int3, shape=(120, 120)>
 %2 = [[ 2  0  3 ... -2 -2 -1]]        # ClearTensor<int3, shape=(2, 120)>
 %3 = _onnx__Gemm_0                    # EncryptedTensor<uint5, shape=(1, 2)>
 %4 = -15                              # ClearScalar<int5>
 %5 = add(%3, %4)                      # EncryptedTensor<int6, shape=(1, 2)>
 %6 = subgraph(%5)                     # EncryptedTensor<int3, shape=(1, 2)>
 %7 = matmul(%6, %2)                   # EncryptedTensor<int6, shape=(1, 120)>
 %8 = subgraph(%7)                     # EncryptedTensor<uint3, shape=(1, 120)>
 %9 = matmul(%8, %1)                   # EncryptedTensor<int9, shape=(1, 120)>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ only up to 8-bit integers are supported
%10 = subgraph(%9)                     # EncryptedTensor<uint3, shape=(1, 120)>
%11 = matmul(%10, %0)                  # EncryptedTensor<int8, shape=(1, 2)>
%12 = subgraph(%11)                    # EncryptedTensor<uint5, shape=(1, 2)>
return %12

Knowing that a linear/dense layer is implemented as a matrix multiplication, one can determine which parts of the op-graph listing in the exception message above correspond to which layers.

Layer weights initialization:

%0 = [[-1 -3] [ ... ] [-2  2]]        # ClearTensor<int3, shape=(120, 2)>
 %1 = [[ 1  3 -2 ...  1  2  0]]        # ClearTensor<int3, shape=(120, 120)>
 %2 = [[ 2  0  3 ... -2 -2 -1]]        # ClearTensor<int3, shape=(2, 120)>

Input processing and quantization:

 %3 = _onnx__Gemm_0                    # EncryptedTensor<uint5, shape=(1, 2)>
 %4 = -15                              # ClearScalar<int5>
 %5 = add(%3, %4)                      # EncryptedTensor<int6, shape=(1, 2)>
 %6 = subgraph(%5)                     # EncryptedTensor<int3, shape=(1, 2)>

First dense layer and activation function:

%7 = matmul(%6, %2)                   # EncryptedTensor<int6, shape=(1, 120)>
%8 = subgraph(%7)                     # EncryptedTensor<uint3, shape=(1, 120)>

Second dense layer and activation function:

%9 = matmul(%8, %1)                   # EncryptedTensor<int9, shape=(1, 120)>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ only up to 8-bit integers are supported
%10 = subgraph(%9)                     # EncryptedTensor<uint3, shape=(1, 120)>

Third dense layer and output quantization:

%11 = matmul(%10, %0)                  # EncryptedTensor<int8, shape=(1, 2)>
%12 = subgraph(%11)                    # EncryptedTensor<uint5, shape=(1, 2)>
return %12

We can see here that the error is in the second layer. Reducing the number of neurons in this layer will resolve the error and make the network FHE-compatible:

torch_model = SimpleNet(50)
try:
    quantized_numpy_module = compile_torch_model(
        torch_model,
        torch_input,
        n_bits = 3,
    )
except RuntimeError as err:
    print(err)

Complexity analysis

In FHE, univariate functions are encoded as table lookups, which are then implemented using Programmable Bootstrapping (PBS). PBS is a powerful technique, but it requires significantly more computing resources, and thus time, than simpler encrypted operations such as matrix multiplications, convolutions or additions.

Furthermore, the cost of PBS will depend on the bit-width of the compiled circuit. Every additional bit in the maximum bit-width raises the complexity of the PBS by a significant factor. It may be of interest to the model developer, then, to determine the bit-width of the circuit and the amount of PBS it performs.

This can be done by inspecting the MLIR code produced by the compiler:

Concrete-ML Model

torch_model = SimpleNet(50)
try:
    quantized_numpy_module = compile_torch_model(
        torch_model,
        torch_input,
        n_bits = 3,
        show_mlir=True,
    )
except RuntimeError as err:
    print(err)

Compiled MLIR model

%cst = arith.constant dense<...> : tensor<50x2xi9>
%cst_0 = arith.constant dense<...>
%cst_1 = arith.constant dense<...> : tensor<2x50xi9>
%c-14_i9 = arith.constant -14 : i9
%c128_i9 = arith.constant 128 : i9
%c128_i9_2 = arith.constant 128 : i9
%c128_i9_3 = arith.constant 128 : i9
%c128_i9_4 = arith.constant 128 : i9
%hack_0_c-14_i9 = tensor.from_elements %c-14_i9 : tensor<1xi9>
%0 = "FHELinalg.add_eint_int"(%arg0, %hack_0_c-14_i9) : (tensor<1x2x!FHE.eint<8>>, tensor<1xi9>) -> tensor<1x2x!FHE.eint<8>>
%hack_1_c128_i9_4 = tensor.from_elements %c128_i9_4 : tensor<1xi9>
%1 = "FHELinalg.add_eint_int"(%0, %hack_1_c128_i9_4) : (tensor<1x2x!FHE.eint<8>>, tensor<1xi9>) -> tensor<1x2x!FHE.eint<8>>
%cst_5 = arith.constant dense<...> : tensor<256xi64>
%2 = "FHELinalg.apply_lookup_table"(%1, %cst_5) : (tensor<1x2x!FHE.eint<8>>, tensor<256xi64>) -> tensor<1x2x!FHE.eint<8>>

%3 = "FHELinalg.matmul_eint_int"(%2, %cst_1) : (tensor<1x2x!FHE.eint<8>>, tensor<2x50xi9>) -> tensor<1x50x!FHE.eint<8>>
%hack_4_c128_i9_3 = tensor.from_elements %c128_i9_3 : tensor<1xi9>
%4 = "FHELinalg.add_eint_int"(%3, %hack_4_c128_i9_3) : (tensor<1x50x!FHE.eint<8>>, tensor<1xi9>) -> tensor<1x50x!FHE.eint<8>>
%cst_6 = arith.constant dense<...> : tensor<34x256xi64>
%cst_7 = arith.constant dense<...> : tensor<1x50xindex>
%5 = "FHELinalg.apply_mapped_lookup_table"(%4, %cst_6, %cst_7) : (tensor<1x50x!FHE.eint<8>>, tensor<34x256xi64>, tensor<1x50xindex>) -> tensor<1x50x!FHE.eint<8>>

%6 = "FHELinalg.matmul_eint_int"(%5, %cst_0) : (tensor<1x50x!FHE.eint<8>>, tensor<50x50xi9>) -> tensor<1x50x!FHE.eint<8>>
%hack_7_c128_i9_2 = tensor.from_elements %c128_i9_2 : tensor<1xi9>
%7 = "FHELinalg.add_eint_int"(%6, %hack_7_c128_i9_2) : (tensor<1x50x!FHE.eint<8>>, tensor<1xi9>) -> tensor<1x50x!FHE.eint<8>>
%cst_8 = arith.constant dense<...> : tensor<34x256xi64>
%cst_9 = arith.constant dense<...> : tensor<1x50xindex>
%8 = "FHELinalg.apply_mapped_lookup_table"(%7, %cst_8, %cst_9) : (tensor<1x50x!FHE.eint<8>>, tensor<34x256xi64>, tensor<1x50xindex>) -> tensor<1x50x!FHE.eint<8>>

%9 = "FHELinalg.matmul_eint_int"(%8, %cst) : (tensor<1x50x!FHE.eint<8>>, tensor<50x2xi9>) -> tensor<1x2x!FHE.eint<8>>
%hack_10_c128_i9 = tensor.from_elements %c128_i9 : tensor<1xi9>
%10 = "FHELinalg.add_eint_int"(%9, %hack_10_c128_i9) : (tensor<1x2x!FHE.eint<8>>, tensor<1xi9>) -> tensor<1x2x!FHE.eint<8>>
%cst_10 = arith.constant dense<...> : tensor<2x256xi64>
%cst_11 = arith.constant dense<[[0, 1]]> : tensor<1x2xindex>
%11 = "FHELinalg.apply_mapped_lookup_table"(%10, %cst_10, %cst_11) : (tensor<1x2x!FHE.eint<8>>, tensor<2x256xi64>, tensor<1x2xindex>) -> tensor<1x2x!FHE.eint<8>>
return %11 : tensor<1x2x!FHE.eint<8>>

There are several calls to FHELinalg.apply_mapped_lookup_table and FHELinalg.apply_lookup_table. These calls apply PBS to the cells of their input tensors. Their inputs in the listing above are: tensor<1x2x!FHE.eint<8>> for the first and last call and tensor<1x50x!FHE.eint<8>> for the two calls in the middle. Thus, PBS is applied 104 times.
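
The count above can be reproduced from the input tensor shapes of the four lookup calls. The following is a small sketch in plain Python; the shapes are copied by hand from the MLIR listing, and the variable names are chosen for illustration.

```python
from math import prod

# Input shapes of the four lookup-table calls, in order of appearance
lookup_input_shapes = [(1, 2), (1, 50), (1, 50), (1, 2)]

# One PBS is applied per cell of each input tensor
pbs_count = sum(prod(shape) for shape in lookup_input_shapes)
print(pbs_count)  # 104
```

The two hidden layers account for 100 of the 104 PBS, so reducing the number of hidden neurons is the most direct way to lower the PBS count of this network.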

Getting the bit-width of the circuit is then simply:

print(quantized_numpy_module.forward_fhe.graph.maximum_integer_bit_width())

Decreasing the number of bits and the number of PBS induces large reductions in the computation time of the compiled circuit.

Advanced topics

Quantization

Quantization is the process of constraining an input from a continuous or otherwise large set of values (such as real numbers) to a discrete set (such as integers).

This means that some accuracy in the representation is lost (e.g. a simple approach is to eliminate least-significant bits). However, in many cases in machine learning, it is possible to adapt the models to give meaningful results while using these smaller data types. This significantly reduces the number of bits necessary for intermediary results during the execution of these machine learning models.

Since FHE is currently limited to 8-bit integers, it is necessary to quantize models to make them compatible. As a general rule, the lower the precision of a model, the better its FHE performance.

Overview of quantization in Concrete-ML

Quantization implemented in Concrete-ML is applied in two ways:

Built-in models apply quantization internally and the user only needs to configure some quantization parameters. This approach requires little work by the user but may not be a one-size-fits-all solution for all types of models. The final quantized model is FHE friendly and ready to predict over encrypted data. In this setting, Post-Training Quantization (PTQ) is used for linear models, data quantization is used for tree-based models and, finally, Quantization Aware Training (QAT) is included in the built-in neural network models.

While Concrete-ML quantizes machine learning models, the data the client has is often in floating point. The Concrete-ML models provide APIs to quantize inputs and de-quantize outputs.

Please note that the floating point input is quantized in the clear, i.e. it is converted to integers before being encrypted. Moreover, the model's outputs are also integers and are decrypted before de-quantization.

Basics of quantization

Let $[\alpha, \beta ]$ be the range of a value to quantize where $\alpha$ is the minimum and $\beta$ is the maximum. To quantize a range of floating point values (in $\mathbb{R}$) to integer values (in $\mathbb{Z}$), the first step is to choose the data type that is going to be used. Concrete, the framework used by Concrete-ML, is currently limited to 8-bit integers, so the chosen type can have at most 8 bits. Knowing the number of bits that can be used for a value in the range $[\alpha, \beta ]$, the scale $S$ can be computed:

$S = \frac{\beta - \alpha}{2^n - 1}$

where $n$ is the number of bits ( $n \leq 8$ ). For the sake of example, let's take $n = 7$ .

In practice, the quantization scale is then $S = \frac{\beta - \alpha}{127}$ . This means the gap between consecutive representable values cannot be smaller than $S$ , which, in turn, means there can be a substantial loss of precision. Every interval of length $S$ will be represented by a value within the range $[0..127]$ .

The other important parameter of this quantization scheme is the zero point $Z_p$. This essentially maps the floating point value 0 to a specific integer. If the quantization scheme is asymmetric (quantized values are not centered around 0), the resulting integer will be in $\mathbb{Z}$.

$Z_p = \mathtt{round} \left(- \frac{\alpha}{S} \right)$
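
As a worked example, the two formulas above can be evaluated for $n = 7$. The following is a minimal sketch in plain Python and not the Concrete-ML implementation. The mapping $q = \mathtt{round}(x / S) + Z_p$ and its inverse $x \approx S (q - Z_p)$ are the usual affine quantization form, assumed here because the text does not spell them out; the range $[-1, 3]$ and all function names are illustrative.

```python
def quantization_params(alpha, beta, n_bits):
    """Scale S and zero point Z_p for the range [alpha, beta]."""
    scale = (beta - alpha) / (2 ** n_bits - 1)
    zero_point = round(-alpha / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, n_bits):
    """Assumed affine mapping: q = round(x / S) + Z_p, clipped to n_bits."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** n_bits - 1, q))


def dequantize(q, scale, zero_point):
    """Inverse mapping: x is approximately S * (q - Z_p)."""
    return scale * (q - zero_point)


alpha, beta, n_bits = -1.0, 3.0, 7
scale, zero_point = quantization_params(alpha, beta, n_bits)

q_min = quantize(alpha, scale, zero_point, n_bits)  # 0
q_max = quantize(beta, scale, zero_point, n_bits)   # 127
q_half = quantize(0.5, scale, zero_point, n_bits)
x_half = dequantize(q_half, scale, zero_point)

print(zero_point, q_min, q_max)  # 32 0 127
print(abs(x_half - 0.5) <= scale / 2)  # True
```

For values inside the range, the round trip through quantization changes a value by at most $S / 2$, which is the loss of precision discussed above.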

Configuring model quantization parameters

Built-in models provide a simple interface for configuring quantization parameters, most notably the number of bits used for inputs, model weights, intermediary and output values.

For linear models, n_bits is used to quantize both model inputs and weights. Depending on the number of features, you can use a single integer value for the n_bits parameter, e.g. a value between 2 and 7. When the number of features is high, the n_bits parameter should be decreased if you encounter compilation errors. It is also possible to quantize inputs and weights with different numbers of bits by passing a dictionary to n_bits, containing the op_inputs and op_weights keys.

Tree-based models can directly control the accumulator bit-width used. However, if 6 or 7 bits are not sufficient to obtain good accuracy on your data-set, one option is to use an ensemble model (RandomForest or XGBoost) and increase the number of trees in the ensemble. This, however, will have a detrimental impact on FHE execution speed.

Note that for the built-in linear models and neural networks, the maximum accumulator bit-width cannot be precisely controlled. Using many input features and a high number of bits is beneficial for model accuracy, but it can conflict with the 8-bit accumulator constraint. Finding the best quantization parameters to maximize accuracy can only be done through experimentation.

Quantizing model inputs and outputs

The models implemented in Concrete-ML provide features to let the user quantize the input data and de-quantize the output data.

Here is a simple example showing how to perform inference, starting from float values and ending up with float values. Note that the FHE engine that is compiled for the ML models does not support data batching.

import numpy as np

# Assume quantized_module : QuantizedModule
#        data: numpy.ndarray of float

# Quantization is done in the clear
x_test_q = quantized_module.quantize_input(data)

for i in range(x_test_q.shape[0]):
    # Inputs must have size (1 x N) or (1 x C x H x W), we add the batch dimension with N=1
    x_q = np.expand_dims(x_test_q[i, :], 0)

    # Execute the model in FHE
    out_fhe = quantized_module.forward_fhe.encrypt_run_decrypt(x_q)

    # Dequantization is done in the clear
    output = quantized_module.dequantize_output(out_fhe)

    # For classifiers with multi-class outputs, the arg max is done in the clear
    y_pred = np.argmax(output, 1)

Resources

Pruning

Overview of pruning in Concrete-ML

Pruning is used in Concrete-ML for two types of neural networks:

Basics of pruning

In neural networks, a neuron computes a linear combination of inputs and learned weights, then applies an activation function.

The neuron computes:

$y_k = \phi\left(\sum_i w_ix_i\right)$

When building a full neural network, each layer will contain multiple neurons, which are connected to the neuron outputs of a previous layer or to the inputs.

For every neuron shown in each layer of the figure above, the linear combinations of inputs and learned weights are computed. Depending on the values of the inputs and weights, the sum $v_k = \sum_i w_ix_i$ - which for Concrete-ML neural networks is computed with integers - can take a range of different values.

Pruning a neural network entails fixing some of the weights $w_i$ to be zero during training. This helps to meet FHE constraints because, irrespective of the distribution of $x_i$, multiplying these input values by 0 does not increase the accumulator value.

Fixing some of the weights to 0 makes the network graph look more similar to the following:

Pruning in practice

In the formula above, in the worst case, the maximum number of input-weight products that can be summed without the result exceeding $n_{\mathsf{max}}$ bits is given by:

$\Omega = \mathsf{floor} \left( \frac{2^{n_{\mathsf{max}}} - 1}{(2^{n_{\mathsf{weights}}} - 1)(2^{n_{\mathsf{inputs}}} - 1)} \right)$

Here, $n_{\mathsf{max}} = 8$ is the maximum precision allowed.

For example, if $n_{\mathsf{weights}} = 2$ and $n_{\mathsf{inputs}} = 2$ with $n_{\mathsf{max}} = 8$ , the worst case is where all inputs and weights are equal to their maximal value $2^2-1=3$ . In this case, there can be at most $\Omega = 28$ elements in the multi-sums.
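
The worst-case bound can be checked numerically. The following is a minimal sketch in plain Python; the function name max_worst_case_terms is hypothetical.

```python
def max_worst_case_terms(n_max, n_weights, n_inputs):
    """Omega: number of maximal products that still fit in n_max bits."""
    largest_accumulator = 2 ** n_max - 1
    largest_product = (2 ** n_weights - 1) * (2 ** n_inputs - 1)
    return largest_accumulator // largest_product


omega_8_bits = max_worst_case_terms(n_max=8, n_weights=2, n_inputs=2)
omega_7_bits = max_worst_case_terms(n_max=7, n_weights=2, n_inputs=2)
print(omega_8_bits)  # 28
print(omega_7_bits)  # 14
```

With 2-bit weights and inputs, each product is at most 9, so 28 products sum to at most 252, which fits in 8 bits, while 29 products could reach 261. Raising either bit-width to 3 shrinks the bound quickly, since the largest product grows to 21 or 49.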

In practice, the distribution of the weights of a neural network is approximately Gaussian, with many weights either 0 or having a small value. This makes it possible to exceed the worst-case number of active neurons without risking overflow of the bit-width. In built-in neural networks, the parameter n_hidden_neurons_multiplier is multiplied with $\Omega$ to determine the total number of non-zero weights that should be kept in a neuron.

Compilation

Compilation of a model produces machine code that executes the model on encrypted data. In some cases, notably in the client/server setting, the compilation can be done by the server when loading the model for serving.

As FHE execution is much slower than execution on non-encrypted data, Concrete-ML has a simulation mode, using an execution mode named the Virtual Library. Since, by default, the cryptographic parameters are chosen such that the results obtained in FHE are the same as those on clear data, the Virtual Library allows you to benchmark models quickly during development.

Compilation

From the perspective of the Concrete-ML user, the compilation process performed by Concrete-Numpy can be broken up into 3 steps:

tracing the Numpy program and creating a Concrete-Numpy op-graph
checking that the op-graph is FHE compatible
producing machine code for the op-graph (this step automatically determines the cryptographic parameters)

Simulation with the Virtual Library

The op-graph produced by the first step of the compilation pipeline allows the:

execution of the op-graph, which includes TLUs, on clear non-encrypted data. This is, of course, not secure, but it is much faster than executing in FHE. This mode is useful for debugging, i.e. to find the appropriate hyper-parameters. This mode is called the Virtual Library.
verification of the maximum bit-width of the op-graph, to determine FHE compatibility, without actually compiling the circuit to machine code.

Enabling Virtual Library execution requires the definition of a compilation Configuration. As simulation does not execute in FHE, this can be considered unsafe:

    COMPIL_CONFIG_VL = Configuration(
        dump_artifacts_on_unexpected_failures=False,
        enable_unsafe_features=True,  # This is for our tests in Virtual Library only
    )

Next, the following code uses the simulation mode for built-in models:

    clf.compile(
        X_train,
        use_virtual_lib=True,
        configuration=COMPIL_CONFIG_VL,
    )

And finally, for custom models, it is possible to enable simulation using the following syntax:

    quantized_numpy_module = compile_torch_model(
        torch_model,  # our model
        X_train,  # a representative input-set to be used for both quantization and compilation
        n_bits={"net_inputs": 5, "op_inputs": 3, "op_weights": 3, "net_outputs": 5},
        import_qat=is_qat,  # signal to the conversion function whether the network is QAT
        use_virtual_lib=True,
        configuration=COMPIL_CONFIG_VL,
    )

Obtaining the simulated predictions of the models using the Virtual Library has the same syntax as execution in FHE:

    Z = clf.predict_proba(X, execute_in_fhe=True)

Moreover, the maximum accumulator bit-width is determined as follows:

    bit_width = clf.quantized_module_.forward_fhe.graph.maximum_integer_bit_width()

A simple Concrete-Numpy example

import numpy
from concrete.numpy.compilation import compiler

# Let's assume Quantization has been applied and we are left with integers only.
# This is essentially the work of Concrete-ML

# Some parameters (weight and bias) for our model taking a single feature
w = [2]
b = 2

# The function that implements our model
@compiler({"x": "encrypted"})
def linear_model(x):
    return w @ x + b

# A representative input-set is needed to compile the function
# (used for tracing)
n_bits_input = 2
inputset = numpy.arange(0, 2**n_bits_input).reshape(-1, 1)
circuit = linear_model.compile(inputset)

# Use the API to get the maximum bit-width in the circuit
max_bit_width = circuit.graph.maximum_integer_bit_width()
print("Max bit_width = ", max_bit_width)
# Max bit_width =  4

# Test our FHE inference
circuit.encrypt_run_decrypt(numpy.array([3]))
# 8

# Print the graph of the circuit
print(circuit)
# %0 = 2                     # ClearScalar<uint2>
# %1 = [2]                   # ClearTensor<uint2, shape=(1,)>
# %2 = x                     # EncryptedTensor<uint2, shape=(1,)>
# %3 = matmul(%1, %2)        # EncryptedScalar<uint3>
# %4 = add(%3, %0)           # EncryptedScalar<uint4>
# return %4
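
The bit-widths reported in the graph above can be checked by hand. The following is a short sketch in plain Python that enumerates the same input-set and measures how many bits the intermediate and final values need; it uses scalars in place of the one-element tensors and does not call Concrete-Numpy.

```python
w, b = 2, 2
n_bits_input = 2

# Same input-set as above: every 2-bit unsigned value
inputs = range(2 ** n_bits_input)

products = [w * x for x in inputs]   # 0, 2, 4, 6
outputs = [p + b for p in products]  # 2, 4, 6, 8

matmul_bits = max(products).bit_length()
output_bits = max(outputs).bit_length()
print(matmul_bits, output_bits)  # 3 4
```

The largest output, 8, is the first value that no longer fits in 3 bits, which is why the maximum bit-width of the circuit is 4.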

Production Deployment

Concrete-ML provides functionality to deploy FHE machine learning models in a client/server setting. The deployment workflow and model serving pattern are as follows:

Deployment

The training of the model and its compilation to FHE are performed on a development machine. Three different files are created when saving the model:

client.zip contains the secure cryptographic parameters needed for the client to generate private and evaluation keys.
server.zip contains the compiled model. This file is sufficient to run the model on a server.
serialized_processing.json contains the metadata about pre- and post-processing, such as quantization parameters to quantize the input and de-quantize the output.

The compiled model (server.zip) is deployed to a server and the cryptographic parameters (client.zip) along with the model meta data (serialized_processing.json) are shared with the clients.

Serving

The client obtains the cryptographic parameters (using client.zip) and generates a private encryption/decryption key as well as a set of public evaluation keys. The public evaluation keys are then sent to the server, while the secret key remains on the client.

The private data is then quantized using serialized_processing.json and encrypted by the client, then sent to the server. Server-side, the FHE model inference is run on the encrypted inputs using the public evaluation keys.

The encrypted result is then returned by the server to the client, which decrypts it using its private key. Finally, the client performs any necessary post-processing of the decrypted result using serialized_processing.json.

Example notebook

Advanced Features

Concrete-ML offers some features for advanced users that wish to adjust the cryptographic parameters that are generated by the Concrete stack for a certain machine learning model.

Approximate computations using the `p_error` parameter

Concrete-ML makes use of table lookup (TLU) to represent any non-linear operation (e.g. sigmoid). This TLU is implemented through the Programmable Bootstrapping (PBS) operation which will apply a non-linear operation in the cryptographic realm.

In Concrete-ML, the result of the TLU operation is obtained with a specific error probability:

A single PBS operation has a 1 - DEFAULT_P_ERROR_PBS = 99.9936657516% chance of being correct. This number plays a role in the choice of cryptographic parameters: the lower the p_error, the more constraining the parameters become. This has an impact on both key generation and, more importantly, on FHE execution time.
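
To get a feeling for how the per-PBS error probability accumulates over a circuit, the following sketch estimates the probability that every PBS in a circuit is correct. It assumes that PBS errors are independent, which is a simplification made here for illustration and is not stated by Concrete-ML. The count of 104 PBS is taken from the complexity analysis example, and the function name is hypothetical.

```python
# Per-PBS probability of a correct result, as quoted above
p_correct_single = 0.999936657516
p_error = 1 - p_correct_single


def circuit_success_probability(p_error, n_pbs):
    """Probability that all n_pbs operations are correct (assumes independence)."""
    return (1 - p_error) ** n_pbs


p_all_correct = circuit_success_probability(p_error, 104)
p_all_correct_relaxed = circuit_success_probability(0.1, 104)
print(f"{p_all_correct:.4f}")
```

Under this assumption, a circuit with 104 PBS is entirely correct with a probability of about 0.993 at the default setting. With p_error = 0.1 the same estimate drops close to zero, so individual table lookups are very likely to be wrong somewhere in the circuit, and whether the final prediction remains acceptable has to be checked empirically.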

Here is a visualization of the effect of the p_error over a simple linear regression with a p_error = 0.1 vs the default p_error value:

The execution times for the two models are 336 ms per example for the standard p_error and 253 ms per example for a p_error = 0.1 (on an 8-core Intel CPU machine). This speedup depends heavily on model complexity. To obtain a speedup while maintaining good accuracy, it is possible to search for a good value of p_error. Currently, no heuristic has been proposed to find a good value a priori.

Users can change this p_error as they see fit by passing an argument to the compile function of any of the models. Here is an example:

Developer Guide

Workflow

Set Up the Project

Concrete-ML is a Python library, so Python should be installed to develop Concrete-ML. v3.8 and v3.9 are the only supported versions. Concrete-ML also uses Poetry and Make.

First of all, you need to git clone the project:

Automatic installation

For Windows users, the setup_os_deps.sh script does not install dependencies, because there are many different installation methods and no single package manager.

Manual installation

Python

Poetry

As there is no concrete-compiler package for Windows, only the dev dependencies can be installed. This requires Poetry >= 1.2.

Make

The dev tools use make to launch the various commands.

On Linux, you can install make from your distribution's preferred package manager.

On macOS, you can install a more recent version of make via brew:

In the following sections, be sure to use the proper make tool for your system: make, gmake, or other.

Cloning the repository

To get the source code of Concrete-ML, clone the code repository using the link for your favourite communication protocol (ssh or https).

Setting up environment on your host OS

We are going to make use of virtual environments. This helps to keep the project isolated from other Python projects in the system. The following commands will create a new virtual environment under the project directory and install dependencies to it.

The following command will not work on Windows if you don't have Poetry >= 1.2.

Activating the environment

Finally, activate the newly created environment using the following command:

macOS or Linux

Windows

Setting up environment on Docker

Docker automatically creates and sources a venv in ~/dev_venv/

The venv persists thanks to volumes. It also creates a volume for ~/.cache to speed up later reinstallations. You can check which Docker volumes exist with:

You can still run all make commands inside Docker (to update the venv, for example). Be mindful of the current venv being used (the name in parentheses at the beginning of your command prompt).

Leaving the environment

After your work is done, you can simply run the following command to leave the environment:

Syncing environment with the latest changes

From time to time, new dependencies will be added to the project or the old ones will be removed. The command below will make sure the project has the proper environment, so run it regularly!

Troubleshooting your environment

In your OS

If you are having issues, consider using the dev Docker exclusively (unless you are working on OS-specific bug fixes or features).

Here are the steps you can take on your OS to try and fix issues:

In Docker

Here are the steps you can take in your Docker to try and fix issues:

If the problem persists at this point, you should ask for help. We're here and ready to assist!

Set Up Docker

Building the image

Once you do that, you can get inside the Docker environment using the following command:

After you finish your work, you can leave Docker by using the exit command or by pressing CTRL + D.

Documentation

Using GitBook

Documentation with GitBook is done mainly by pushing content to GitHub. GitBook then pulls the docs from the repository and publishes them. In most cases, GitBook is just a mirror of what is available on GitHub.

There are, however, some use-cases where documentation can be modified directly in GitBook, with the modifications then pushed to GitHub, for example when the documentation is modified by a person outside of Zama. In this case, a GitHub branch is created, and a GitBook space is associated with it: modifications are done in this space and automatically pushed to the branch. Once the modifications are done, one can simply create a pull request to merge them into the main branch.

Using Sphinx

Documentation can alternatively be built using Sphinx:

The documentation contains both files written by hand by developers (the .md files) and files automatically created by parsing the source files.

Then, to open it, go to docs/_build/html/index.html or use the following command:

To build and open the docs at the same time, use:

Support and Issues

Concrete-ML is a constant work-in-progress, and thus may contain bugs or suboptimal APIs.

Furthermore, undefined behavior may occur if the input-set, which is internally used by the compilation core to set the bit-widths of some intermediate data, is not sufficiently representative of the future user inputs. For example, with all the inputs in the input-set, it may appear that some intermediate data can be represented as an n-bit integer, while for a particular user input this same intermediate data needs additional bits. The FHE execution for this computation will then result in an incorrect output, as typically occurs with integer overflows in classical programs.
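The failure mode described above can be illustrated with plain NumPy. This is only a sketch under stated assumptions: the dot-product "circuit", the weights, and the helper names (bits_needed, run_with_fixed_width) are hypothetical, and wrap-around modulo 2^n is used as a stand-in for overflow. The exact incorrect value produced by an actual FHE execution may differ.

```python
import numpy as np


def bits_needed(max_value: int) -> int:
    """Number of bits needed to store unsigned integers up to max_value."""
    return max(1, int(max_value).bit_length())


# Hypothetical "circuit": an integer dot product with fixed weights
weights = np.array([3, 5, 2], dtype=np.int64)

# Input-set seen at compilation time: every feature lies in [0, 3]
input_set = np.array([[0, 1, 2], [3, 3, 3], [1, 0, 3]], dtype=np.int64)

# Bit-width of the intermediate result, as inferred from the input-set only
n_bits = bits_needed(int((input_set @ weights).max()))


def run_with_fixed_width(x: np.ndarray) -> int:
    """Evaluate the dot product, keeping only n_bits bits of the result."""
    return int(x @ weights) % (1 << n_bits)


in_range = np.array([3, 3, 3], dtype=np.int64)      # covered by the input-set
out_of_range = np.array([7, 7, 7], dtype=np.int64)  # not representative

print(n_bits, run_with_fixed_width(in_range), run_with_fixed_width(out_of_range))
```

In this toy setting the input-set only requires 5 bits, so the input [7, 7, 7], whose true result is 70, comes out as 6. A more representative input-set would have led to a wider bit-width.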

Submitting an issue

the reproducibility rate you see on your side
any insight you might have on the bug
any workaround you have been able to find

Contributing

There are three ways to contribute to Concrete-ML:

You can open issues to report bugs and typos and to suggest ideas.
You can also provide new tutorials or use-cases, showing what can be done with the library. The more examples we have, the better and clearer it is for the other users.

1. Creating a new branch

Concrete-ML uses a consistent branch naming scheme, and you are expected to follow it as well. Here is the format, along with some examples:

e.g.

2. Before committing

2.1 Conformance

Each commit to Concrete-ML should conform to the standards of the project. You can let the development tools fix some issues automatically with the following command:

Conformance can be checked using the following command:

2.2 Testing

Your code must be well documented, include tests, and not break existing tests:

You need to make sure you get 100% code coverage. The make pytest command checks that by default and will fail with a coverage report at the end should some lines of your code not be executed during testing.

If your coverage is below 100%, you should write more tests before creating the pull request. If you ignore this warning and create the PR, GitHub Actions will fail and your PR will not be merged.

There may be cases where covering your code is not possible (an exception that cannot be triggered in normal execution circumstances). In those cases, you may be allowed to disable coverage for some specific lines. This should be the exception rather than the rule, and reviewers will ask why some lines are not covered. If it appears they can be covered, then the PR won't be accepted in that state.

3. Committing

Concrete-ML uses a consistent commit naming scheme, and you are expected to follow it as well (the CI will make sure you do). The accepted format can be printed to your terminal by running:

e.g.

4. Rebasing

You should rebase on top of the main branch before you create your pull request. Merge commits are not allowed, so rebasing on main before pushing gives you the best chance of avoiding having to rewrite parts of your PR later if conflicts arise with other PRs being merged. After you commit your changes to your new branch, you can use the following commands to rebase:

5. Releases

Inner workings

Importing ONNX

As ONNX is becoming the standard exchange format for neural networks, importing models through ONNX allows Concrete-ML to be flexible while also making model representation manipulation quite easy. In addition, it allows for a straightforward mapping to NumPy operators, which are supported by Concrete-Numpy, so that the Concrete stack's FHE conversion capabilities can be used.

Torch to NumPy conversion using ONNX

The diagram below gives an overview of the steps involved in the conversion of an ONNX graph to a FHE compatible format, i.e. a format that can be compiled to FHE through Concrete-Numpy.

All Concrete-ML built-in models follow the same pattern for FHE conversion:

The models are trained with sklearn or PyTorch
The Concrete-ML ONNX parser checks that all the operations in the ONNX graph are supported and assigns reference NumPy operations to them. This step produces a NumpyModule.
Once the QuantizedModule is built, Concrete-Numpy is used to trace the ._forward() function of the QuantizedModule.

Once an ONNX model is imported, it is converted to a NumpyModule, then to a QuantizedModule and, finally, to a FHE circuit. However, as the diagram shows, it is perfectly possible to stop at the NumpyModule level if you just want to run the PyTorch model as NumPy code without doing quantization.

Inspecting the ONNX models

Quantization tools

Quantizing data

Concrete-ML has support for quantized ML models and also provides quantization tools for Quantization Aware Training and Post-Training Quantization. The core of this functionality is the conversion of floating point values to integers and back. This is done using QuantizedArray in concrete.ml.quantization.

n_bits defines the precision of the quantization
values are floating point values that will be converted to integers
is_signed determines if the quantized integer values should allow negative values
is_symmetric determines if the range of floating point values to be quantized should be taken as symmetric around zero
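A minimal NumPy sketch of how such parameters could drive a uniform (affine) quantization, using the scale and zero-point scheme of https://arxiv.org/abs/1712.05877. The functions quantize_sketch and dequantize_sketch are illustrative names and are not part of Concrete-ML; QuantizedArray may compute its scale and zero-point differently. Only the unsigned, non-symmetric case is shown.

```python
import numpy as np


def quantize_sketch(values: np.ndarray, n_bits: int):
    """Map floats to unsigned n_bits integers (assumes values.max() > values.min())."""
    rmin, rmax = float(values.min()), float(values.max())
    # One integer step corresponds to `scale` in the floating point domain
    scale = (rmax - rmin) / (2**n_bits - 1)
    # Integer that represents the floating point value 0.0
    zero_point = int(round(-rmin / scale))
    qvalues = np.rint(values / scale).astype(np.int64) + zero_point
    # Rounding the zero-point can push a value one step out of range
    qvalues = np.clip(qvalues, 0, 2**n_bits - 1)
    return qvalues, scale, zero_point


def dequantize_sketch(qvalues: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map integers back to approximate floating point values."""
    return scale * (qvalues - zero_point)


values = np.array([-1.0, -0.25, 0.0, 0.5, 2.0])
qvalues, scale, zero_point = quantize_sketch(values, n_bits=3)
recovered = dequantize_sketch(qvalues, scale, zero_point)
print(qvalues, recovered)
```

With 3 bits, the five values map to integers in [0, 7], and each recovered value differs from the original by at most half a quantization step.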

It is also possible to use symmetric quantization, where the integer values are centered around 0:
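A minimal sketch of one common symmetric scheme, written in plain NumPy. It assumes signed integers and a zero-point fixed at 0; the function name symmetric_quantize_sketch is hypothetical and the exact formula used by QuantizedArray with is_symmetric=True may differ.

```python
import numpy as np


def symmetric_quantize_sketch(values: np.ndarray, n_bits: int):
    """Quantize to signed integers in [-(2**(n_bits-1) - 1), 2**(n_bits-1) - 1]."""
    q_max = 2 ** (n_bits - 1) - 1
    # The float range is taken as symmetric around zero: [-max_abs, max_abs]
    max_abs = float(np.max(np.abs(values)))
    scale = max_abs / q_max
    qvalues = np.rint(values / scale).astype(np.int64)
    qvalues = np.clip(qvalues, -q_max, q_max)
    return qvalues, scale


values = np.array([-1.2, -0.25, 0.0, 0.5, 2.0])
qvalues, scale = symmetric_quantize_sketch(values, n_bits=3)
print(qvalues, scale * qvalues)
```

Because the zero-point is 0, the floating point value 0.0 is always represented exactly, and de-quantization reduces to a single multiplication by the scale.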

In the following example, showing the de-quantization of model outputs, the QuantizedArray class is used in a different way. Here it uses pre-quantized integer values and has the scale and zero-point set explicitly. Once the QuantizedArray is constructed, calling dequant() will compute the floating point values corresponding to the integer values qvalues, which are the output of the forward_fhe.encrypt_run_decrypt(..) call.

Quantized modules

Machine learning models are implemented with a diverse set of operations, such as convolution, linear transformations, activation functions and element-wise operations. When working with quantized values, these operations cannot be carried out in the same way as for floating point values. With quantization, it is necessary to re-scale the input and output values of each operation to fit in the quantization domain.
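To make the re-scaling concrete, here is a sketch of a quantized matrix multiplication in plain NumPy. It is an illustration under stated assumptions, not the QuantizedModule implementation: the helper quantize_sketch and the variable names are hypothetical, and a simple min/max affine quantization is assumed for both operands.

```python
import numpy as np


def quantize_sketch(values: np.ndarray, n_bits: int):
    """Affine quantization of a float array (assumes a non-constant array)."""
    rmin, rmax = float(values.min()), float(values.max())
    scale = (rmax - rmin) / (2**n_bits - 1)
    zero_point = int(round(-rmin / scale))
    qvalues = np.rint(values / scale).astype(np.int64) + zero_point
    return qvalues, scale, zero_point


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(4, 8))  # inputs
w = rng.uniform(-1, 1, size=(8, 3))  # weights

q_x, s_x, z_x = quantize_sketch(x, n_bits=8)
q_w, s_w, z_w = quantize_sketch(w, n_bits=8)

# Integer-only part: this is the computation that would run on integers
acc = (q_x - z_x) @ (q_w - z_w)

# Re-scaling: the accumulator lives in units of s_x * s_w
y_approx = s_x * s_w * acc
y_float = x @ w

print(np.max(np.abs(y_approx - y_float)))
```

The integer accumulator alone is not meaningful until it is multiplied by the product of the two scales. In a multi-layer model, the result would then be quantized again with the scale and zero-point expected by the next operation.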

In Concrete-ML, the quantized equivalent of a scikit-learn model or a PyTorch nn.Module is the QuantizedModule. Note that only inference is implemented in the QuantizedModule, and it is built through a conversion of the inference function of the corresponding scikit-learn or PyTorch module.

Built-in neural networks expose the quantized_module member, while a QuantizedModule is also the result of the compilation of custom models through compile_torch_model and compile_brevitas_qat_model.

Calibration is the process of determining the typical distributions of values encountered for the intermediate values of a model during inference.
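As an illustration, calibration can be sketched as running the floating point model on a calibration set while recording the range of each intermediate value. The two-layer toy network, the record helper and the stats dictionary below are hypothetical and only stand in for what a real calibration pass might track.

```python
import numpy as np

rng = np.random.default_rng(0)
w1 = rng.normal(size=(4, 6))
w2 = rng.normal(size=(6, 2))


def record(stats: dict, name: str, values: np.ndarray) -> None:
    """Update the running (min, max) observed for one intermediate value."""
    lo, hi = stats.get(name, (np.inf, -np.inf))
    stats[name] = (min(lo, float(values.min())), max(hi, float(values.max())))


def forward(x: np.ndarray, stats: dict) -> np.ndarray:
    """Toy float model that logs the range of every intermediate tensor."""
    hidden = np.maximum(x @ w1, 0.0)  # linear layer followed by ReLU
    record(stats, "hidden", hidden)
    output = hidden @ w2
    record(stats, "output", output)
    return output


calibration_set = rng.uniform(-1, 1, size=(32, 4))
stats: dict = {}
for batch in np.array_split(calibration_set, 4):
    forward(batch, stats)

# The observed ranges can then be turned into quantization scales
n_bits = 4
scales = {name: (hi - lo) / (2**n_bits - 1) for name, (lo, hi) in stats.items()}
print(stats, scales)
```

If the calibration set is not representative, the recorded ranges will be too narrow or too wide, which degrades the accuracy of the quantized model.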

Resources

concrete.ml.quantization.quantizers

module `concrete.ml.quantization.quantizers`

Quantization utilities for a numpy array/tensor.

Global Variables

STABILITY_CONST

function `fill_from_kwargs`

fill_from_kwargs(obj, klass, **kwargs)

Fill a parameter set structure from kwargs parameters.

Args:

obj: an object of type klass; if None, the object is created if any of the type's members appear in kwargs
klass: the type of object to fill
kwargs: parameter names and values to fill into an instance of the klass type

Returns:

obj: an object of type klass
kwargs: remaining parameter names and values that were not filled into obj

Raises:

TypeError: if the types of the parameters in kwargs could not be converted to the corresponding types of members of klass
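The contract documented above can be sketched as follows. This is a hypothetical re-implementation written only from the description (fill_from_kwargs_sketch and ToyOptions are made-up names), so the actual function in concrete.ml.quantization.quantizers may behave differently in its details.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class ToyOptions:
    """Hypothetical parameter set used only for this illustration."""

    n_bits: int = 8
    is_signed: bool = False


def fill_from_kwargs_sketch(obj, klass, **kwargs):
    """Fill members of klass found in kwargs and return the unused kwargs."""
    hints = get_type_hints(klass)
    remaining = {}
    for name, value in kwargs.items():
        if name not in hints:
            remaining[name] = value
            continue
        if obj is None:
            # Created lazily, only once a member of klass shows up in kwargs
            obj = klass()
        try:
            setattr(obj, name, hints[name](value))
        except (TypeError, ValueError) as err:
            raise TypeError(f"Cannot convert {name}={value!r}") from err
    return obj, remaining


opts, rest = fill_from_kwargs_sketch(None, ToyOptions, n_bits="4", scale=0.5)
print(opts, rest)
```

Here n_bits is converted to the declared member type and stored in a newly created ToyOptions, while scale, which is not a member of ToyOptions, is handed back in the remaining kwargs.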

class `QuantizationOptions`

Options for quantization.

Determines the number of bits for quantization and the method of quantization of the values. Signed quantization allows negative quantized values. Symmetric quantization assumes the float values are distributed symmetrically around x=0 and assigns signed values around 0 to the float values. QAT (quantization aware training) quantization assumes the values are already quantized, taking a discrete set of values, and assigns these values to integers, computing only the scale.

method `__init__`

__init__(
    n_bits,
    is_signed: bool = False,
    is_symmetric: bool = False,
    is_qat: bool = False
) → None

property quant_options

Get a copy of the quantization options.

Returns:

QuantizationOptions: a copy of the current quantization options

method `copy_opts`

copy_opts(opts)

Copy the options from a different structure.

Args:

opts (QuantizationOptions): structure to copy parameters from.

class `MinMaxQuantizationStats`

Calibration set statistics.

This class stores the statistics for the calibration set or for a calibration data batch. Currently we only store min/max to determine the quantization range. The min/max are computed from the calibration set.

property quant_stats

Get a copy of the calibration set statistics.

Returns:

MinMaxQuantizationStats: a copy of the current quantization stats

method `compute_quantization_stats`

compute_quantization_stats(values: ndarray) → None

Compute the calibration set quantization statistics.

Args:

values (numpy.ndarray): Calibration set on which to compute statistics.

method `copy_stats`

copy_stats(stats) → None

Copy the statistics from a different structure.

Args:

stats (MinMaxQuantizationStats): structure to copy statistics from.

class `UniformQuantizationParameters`

Quantization parameters for uniform quantization.

This class stores the parameters used for quantizing real values to discrete integer values. The parameters are computed from quantization options and quantization statistics.

property quant_params

Get a copy of the quantization parameters.

Returns:

UniformQuantizationParameters: a copy of the current quantization parameters

method `compute_quantization_parameters`

compute_quantization_parameters(
    options: QuantizationOptions,
    stats: MinMaxQuantizationStats
) → None

Compute the quantization parameters.

Args:

options (QuantizationOptions): quantization options set
stats (MinMaxQuantizationStats): calibrated statistics for quantization

method `copy_params`

copy_params(params) → None

Copy the parameters from a different structure.

Args:

params (UniformQuantizationParameters): parameter structure to copy

class `UniformQuantizer`

Uniform quantizer.

Contains all information necessary for uniform quantization and provides quantization/dequantization functionality on numpy arrays.

Args:

options (QuantizationOptions): Quantization options set
stats (Optional[MinMaxQuantizationStats]): Quantization batch statistics set
params (Optional[UniformQuantizationParameters]): Quantization parameters set (scale, zero-point)

method `__init__`

__init__(
    options: QuantizationOptions = None,
    stats: Optional[MinMaxQuantizationStats] = None,
    params: Optional[UniformQuantizationParameters] = None,
    **kwargs
)

property quant_options

Get a copy of the quantization options.

Returns:

QuantizationOptions: a copy of the current quantization options

property quant_params

Get a copy of the quantization parameters.

Returns:

UniformQuantizationParameters: a copy of the current quantization parameters

property quant_stats

Get a copy of the calibration set statistics.

Returns:

MinMaxQuantizationStats: a copy of the current quantization stats

method `compute_quantization_parameters`

compute_quantization_parameters(
    options: QuantizationOptions,
    stats: MinMaxQuantizationStats
) → None

Compute the quantization parameters.

Args:

options (QuantizationOptions): quantization options set
stats (MinMaxQuantizationStats): calibrated statistics for quantization

method `compute_quantization_stats`

compute_quantization_stats(values: ndarray) → None

Compute the calibration set quantization statistics.

Args:

values (numpy.ndarray): Calibration set on which to compute statistics.

method `copy_opts`

copy_opts(opts)

Copy the options from a different structure.

Args:

opts (QuantizationOptions): structure to copy parameters from.

method `copy_params`

copy_params(params) → None

Copy the parameters from a different structure.

Args:

params (UniformQuantizationParameters): parameter structure to copy

method `copy_stats`

copy_stats(stats) → None

Copy the statistics from a different structure.

Args:

stats (MinMaxQuantizationStats): structure to copy statistics from.

method `dequant`

dequant(qvalues: ndarray) → ndarray

Dequantize values.

Args:

qvalues (numpy.ndarray): integer values to dequantize

Returns:

numpy.ndarray: Dequantized float values.

method `quant`

quant(values: ndarray) → ndarray

Quantize values.

Args:

values (numpy.ndarray): float values to quantize

Returns:

numpy.ndarray: Integer quantized values.

class `QuantizedArray`

Abstraction of quantized array.

Contains float values and their quantized integer counterparts. Quantization is performed by the quantizer member object. Float and int values are kept in sync. Having both types of values is useful since quantized operators in Concrete-ML graphs might need one or the other depending on how the operator works (in float or in int). Moreover, when the encrypted function needs to return a value, it must return integer values.

See https://arxiv.org/abs/1712.05877.

Args:

values (numpy.ndarray): Values to be quantized.
n_bits (int): The number of bits to use for quantization.
value_is_float (bool, optional): Whether the passed values are real (float) values or not. If False, the values will be quantized according to the passed scale and zero_point. Defaults to True.
options (QuantizationOptions): Quantization options set
stats (Optional[MinMaxQuantizationStats]): Quantization batch statistics set
params (Optional[UniformQuantizationParameters]): Quantization parameters set (scale, zero-point)
kwargs: Any member of the options, stats, params sets as a key-value pair. The parameter sets need to be completely parametrized if their members appear in kwargs.

method `__init__`

__init__(
    n_bits,
    values: Optional[ndarray],
    value_is_float: bool = True,
    options: QuantizationOptions = None,
    stats: Optional[MinMaxQuantizationStats] = None,
    params: Optional[UniformQuantizationParameters] = None,
    **kwargs
)

method `dequant`

dequant() → ndarray

Dequantize self.qvalues.

Returns:

numpy.ndarray: Dequantized values.

method `quant`

quant() → Union[ndarray, NoneType]

Quantize self.values.

Returns:

numpy.ndarray: Quantized values.

method `update_quantized_values`

update_quantized_values(qvalues: ndarray) → ndarray

Update qvalues to get their corresponding values using the related quantized parameters.

Args:

qvalues (numpy.ndarray): Values to replace self.qvalues

Returns:

values (numpy.ndarray): Corresponding values

method `update_values`

update_values(values: ndarray) → ndarray

Update values to get their corresponding qvalues using the related quantized parameters.

Args:

values (numpy.ndarray): Values to replace self.values

Returns:

qvalues (numpy.ndarray): Corresponding qvalues

concrete.ml.onnx.ops_impl

module `concrete.ml.onnx.ops_impl`

ONNX ops implementation in python + numpy.

function `cast_to_float`

cast_to_float(inputs)

Cast values to floating points.

Args:

inputs (Tuple[numpy.ndarray]): The values to consider.

Returns:

Tuple[numpy.ndarray]: The float values.

function `onnx_func_raw_args`

onnx_func_raw_args(*args)

Decorate a numpy onnx function to flag the raw/non quantized inputs.

Args:

*args (tuple[Any]): function argument names

Returns:

result (ONNXMixedFunction): wrapped numpy function with a list of mixed arguments

function `numpy_where_body`

numpy_where_body(c: ndarray, t: ndarray, f: Union[ndarray, int]) → ndarray

Compute the equivalent of numpy.where.

This function is not mapped to any ONNX operator (as opposed to numpy_where). It is usable by functions which are mapped to ONNX operators, e.g. numpy_div or numpy_where.

Args:

c (numpy.ndarray): Condition operand.
t (numpy.ndarray): True operand.
f (numpy.ndarray): False operand.

Returns:

numpy.ndarray: numpy.where(c, t, f)

function `numpy_where`

numpy_where(c: ndarray, t: ndarray, f: ndarray) → Tuple[ndarray]

Compute the equivalent of numpy.where.

Args:

c (numpy.ndarray): Condition operand.
t (numpy.ndarray): True operand.
f (numpy.ndarray): False operand.

Returns:

Tuple[numpy.ndarray]: numpy.where(c, t, f)

function `numpy_add`

numpy_add(a: ndarray, b: ndarray) → Tuple[ndarray]

Compute add in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Add-13

Args:

a (numpy.ndarray): First operand.
b (numpy.ndarray): Second operand.

Returns:

Tuple[numpy.ndarray]: Result, has same element type as two inputs

function `numpy_constant`

numpy_constant(**kwargs)

Return the constant passed as a kwarg.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Constant-13

Args:

**kwargs: keyword arguments

Returns:

Any: The stored constant.

function `numpy_matmul`

numpy_matmul(a: ndarray, b: ndarray) → Tuple[ndarray]

Compute matmul in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#MatMul-13

Args:

a (numpy.ndarray): N-dimensional matrix A
b (numpy.ndarray): N-dimensional matrix B

Returns:

Tuple[numpy.ndarray]: Matrix multiply results from A * B

function `numpy_relu`

numpy_relu(x: ndarray) → Tuple[ndarray]

Compute relu in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Relu-14

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_sigmoid`

numpy_sigmoid(x: ndarray) → Tuple[ndarray]

Compute sigmoid in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Sigmoid-13

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_softmax`

numpy_softmax(x, axis=1, keepdims=True)

Compute softmax in numpy according to ONNX spec.

Softmax is currently not supported in FHE.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#softmax-13

Args:

x (numpy.ndarray): Input tensor
axis (None, int, tuple of ints): Axis or axes along which a softmax's sum is performed. If None, it will sum all of the elements of the input array. If axis is negative, it counts from the last to the first axis. Defaults to 1.
keepdims (bool): If True, the axes which are reduced along the sum are left in the result as dimensions with size one. Defaults to True.

Returns:

Tuple[numpy.ndarray]: Output tensor
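For reference, a softmax with the same parameters can be sketched in NumPy as below. softmax_sketch is a hypothetical stand-in, not the body of numpy_softmax; in particular, subtracting the maximum for numerical stability is an addition made for this sketch.

```python
import numpy as np


def softmax_sketch(x, axis=1, keepdims=True):
    """Softmax along `axis`, returned as a one-element tuple."""
    x = np.asarray(x, dtype=np.float64)
    # Shifting by the max does not change the result but avoids overflow in exp
    exp = np.exp(x - x.max(axis=axis, keepdims=True))
    # keepdims=True keeps the reduced axis so that the division broadcasts
    return (exp / exp.sum(axis=axis, keepdims=keepdims),)


(probs,) = softmax_sketch(np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]))
print(probs)
```

Each row of the result sums to 1, and a row of identical inputs yields a uniform distribution.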

function `numpy_cos`

numpy_cos(x: ndarray) → Tuple[ndarray]

Compute cos in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Cos-7

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_cosh`

numpy_cosh(x: ndarray) → Tuple[ndarray]

Compute cosh in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Cosh-9

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_sin`

numpy_sin(x: ndarray) → Tuple[ndarray]

Compute sin in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Sin-7

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_sinh`

numpy_sinh(x: ndarray) → Tuple[ndarray]

Compute sinh in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Sinh-9

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_tan`

numpy_tan(x: ndarray) → Tuple[ndarray]

Compute tan in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Tan-7

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_tanh`

numpy_tanh(x: ndarray) → Tuple[ndarray]

Compute tanh in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Tanh-13

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_acos`

numpy_acos(x: ndarray) → Tuple[ndarray]

Compute acos in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Acos-7

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_acosh`

numpy_acosh(x: ndarray) → Tuple[ndarray]

Compute acosh in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Acosh-9

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_asin`

numpy_asin(x: ndarray) → Tuple[ndarray]

Compute asin in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Asin-7

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_asinh`

numpy_asinh(x: ndarray) → Tuple[ndarray]

Compute asinh in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Asinh-9

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_atan`

numpy_atan(x: ndarray) → Tuple[ndarray]

Compute atan in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Atan-7

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_atanh`

numpy_atanh(x: ndarray) → Tuple[ndarray]

Compute atanh in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Atanh-9

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_elu`

numpy_elu(x: ndarray, alpha: float = 1) → Tuple[ndarray]

Compute elu in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Elu-6

Args:

x (numpy.ndarray): Input tensor
alpha (float): Coefficient

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_selu`

numpy_selu(
    x: ndarray,
    alpha: float = 1.6732632423543772,
    gamma: float = 1.0507009873554805
) → Tuple[ndarray]

Compute selu in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Selu-6

Args:

x (numpy.ndarray): Input tensor
alpha (float): Coefficient
gamma (float): Coefficient

Returns:

Tuple[numpy.ndarray]: Output tensor
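As an illustration of the formula in the ONNX spec, y = gamma * (alpha * exp(x) - alpha) for x <= 0 and y = gamma * x for x > 0, here is a NumPy sketch. selu_sketch is a hypothetical name and is not claimed to match the body of numpy_selu.

```python
import numpy as np


def selu_sketch(x, alpha=1.6732632423543772, gamma=1.0507009873554805):
    """SELU as defined by ONNX, returned as a one-element tuple."""
    x = np.asarray(x, dtype=np.float64)
    # Clamp before exp so that large positive inputs do not overflow
    negative_branch = gamma * (alpha * np.exp(np.minimum(x, 0.0)) - alpha)
    return (np.where(x > 0, gamma * x, negative_branch),)


(y,) = selu_sketch(np.array([-50.0, 0.0, 1.0]))
print(y)
```

For very negative inputs the output saturates at -gamma * alpha, while positive inputs are simply scaled by gamma.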

function `numpy_celu`

numpy_celu(x: ndarray, alpha: float = 1) → Tuple[ndarray]

Compute celu in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Celu-12

Args:

x (numpy.ndarray): Input tensor
alpha (float): Coefficient

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_leakyrelu`

numpy_leakyrelu(x: ndarray, alpha: float = 0.01) → Tuple[ndarray]

Compute leakyrelu in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#LeakyRelu-6

Args:

x (numpy.ndarray): Input tensor
alpha (float): Coefficient

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_thresholdedrelu`

numpy_thresholdedrelu(x: ndarray, alpha: float = 1) → Tuple[ndarray]

Compute thresholdedrelu in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#ThresholdedRelu-10

Args:

x (numpy.ndarray): Input tensor
alpha (float): Coefficient

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_hardsigmoid`

numpy_hardsigmoid(
    x: ndarray,
    alpha: float = 0.2,
    beta: float = 0.5
) → Tuple[ndarray]

Compute hardsigmoid in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#HardSigmoid-6

Args:

x (numpy.ndarray): Input tensor
alpha (float): Coefficient
beta (float): Coefficient

Returns:

Tuple[numpy.ndarray]: Output tensor
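The ONNX definition, y = max(0, min(1, alpha * x + beta)), can be sketched in NumPy as follows. hardsigmoid_sketch is a hypothetical name used for illustration only.

```python
import numpy as np


def hardsigmoid_sketch(x, alpha=0.2, beta=0.5):
    """Piecewise-linear approximation of the sigmoid, as a one-element tuple."""
    x = np.asarray(x, dtype=np.float64)
    return (np.clip(alpha * x + beta, 0.0, 1.0),)


(y,) = hardsigmoid_sketch(np.array([-5.0, 0.0, 1.0, 5.0]))
print(y)
```

With the default coefficients, the output is 0 below x = -2.5, 1 above x = 2.5, and linear in between.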

function `numpy_softplus`

numpy_softplus(x: ndarray) → Tuple[ndarray]

Compute softplus in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Softplus-1

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_abs`

numpy_abs(x: ndarray) → Tuple[ndarray]

Compute abs in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Abs-13

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_div`

numpy_div(a: ndarray, b: ndarray) → Tuple[ndarray]

Compute div in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Div-14

Args:

a (numpy.ndarray): Input tensor
b (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_mul`

numpy_mul(a: ndarray, b: ndarray) → Tuple[ndarray]

Compute mul in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Mul-14

Args:

a (numpy.ndarray): Input tensor
b (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_sub`

numpy_sub(a: ndarray, b: ndarray) → Tuple[ndarray]

Compute sub in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Sub-14

Args:

a (numpy.ndarray): Input tensor
b (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_log`

numpy_log(x: ndarray) → Tuple[ndarray]

Compute log in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Log-13

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_erf`

numpy_erf(x: ndarray) → Tuple[ndarray]

Compute erf in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Erf-13

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_hardswish`

numpy_hardswish(x: ndarray) → Tuple[ndarray]

Compute hardswish in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#hardswish-14

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_exp`

numpy_exp(x: ndarray) → Tuple[ndarray]

Compute exponential in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Exp-13

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: The exponential of the input tensor computed element-wise

function `numpy_equal`

numpy_equal(x: ndarray, y: ndarray) → Tuple[ndarray]

Compute equal in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Equal-11

Args:

x (numpy.ndarray): Input tensor
y (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_not`

numpy_not(x: ndarray) → Tuple[ndarray]

Compute not in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Not-1

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_not_float`

numpy_not_float(x: ndarray) → Tuple[ndarray]

Compute not in numpy according to ONNX spec and cast outputs to floats.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Not-1

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_greater`

numpy_greater(x: ndarray, y: ndarray) → Tuple[ndarray]

Compute greater in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Greater-13

Args:

x (numpy.ndarray): Input tensor
y (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_greater_float`

numpy_greater_float(x: ndarray, y: ndarray) → Tuple[ndarray]

Compute greater in numpy according to ONNX spec and cast outputs to floats.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Greater-13

Args:

x (numpy.ndarray): Input tensor
y (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor
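The difference between a comparison and its _float variant can be sketched as a cast of the boolean result. greater_float_sketch below is a hypothetical stand-in; the dtype chosen here (float64) is an assumption.

```python
import numpy as np


def greater_float_sketch(x, y):
    """Element-wise x > y, with the boolean result cast to floats."""
    return (np.greater(x, y).astype(np.float64),)


(mask,) = greater_float_sketch(np.array([1, 2, 3]), np.array([2, 2, 2]))
print(mask)
```

The result holds 1.0 where the comparison is true and 0.0 elsewhere, so it can be used directly in subsequent arithmetic.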

function `numpy_greater_or_equal`

numpy_greater_or_equal(x: ndarray, y: ndarray) → Tuple[ndarray]

Compute greater or equal in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#GreaterOrEqual-12

Args:

x (numpy.ndarray): Input tensor
y (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_greater_or_equal_float`

numpy_greater_or_equal_float(x: ndarray, y: ndarray) → Tuple[ndarray]

Compute greater or equal in numpy according to ONNX spec and cast outputs to floats.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#GreaterOrEqual-12

Args:

x (numpy.ndarray): Input tensor
y (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_less`

numpy_less(x: ndarray, y: ndarray) → Tuple[ndarray]

Compute less in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Less-13

Args:

x (numpy.ndarray): Input tensor
y (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_less_float`

numpy_less_float(x: ndarray, y: ndarray) → Tuple[ndarray]

Compute less in numpy according to ONNX spec and cast outputs to floats.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Less-13

Args:

x (numpy.ndarray): Input tensor
y (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_less_or_equal`

numpy_less_or_equal(x: ndarray, y: ndarray) → Tuple[ndarray]

Compute less or equal in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#LessOrEqual-12

Args:

x (numpy.ndarray): Input tensor
y (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_less_or_equal_float`

numpy_less_or_equal_float(x: ndarray, y: ndarray) → Tuple[ndarray]

Compute less or equal in numpy according to ONNX spec and cast outputs to floats.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#LessOrEqual-12

Args:

x (numpy.ndarray): Input tensor
y (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_identity`

numpy_identity(x: ndarray) → Tuple[ndarray]

Compute identity in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Identity-14

Args:

x (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_transpose`

numpy_transpose(x: ndarray, perm=None) → Tuple[ndarray]

Transpose in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Transpose-13

Args:

x (numpy.ndarray): Input tensor
perm (numpy.ndarray): Permutation of the axes

Returns:

Tuple[numpy.ndarray]: Output tensor
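
To make the meaning of `perm` concrete, the sketch below calls `numpy.transpose` directly rather than `numpy_transpose`, on the assumption that `perm` maps onto numpy's `axes` argument. With no permutation the axes are reversed, which is also the ONNX Transpose default.

```python
import numpy

x = numpy.arange(24).reshape(2, 3, 4)

# No permutation given: the axes are reversed
reversed_axes = numpy.transpose(x, axes=None)

# perm=(0, 2, 1) keeps the first axis and swaps the last two
swapped = numpy.transpose(x, axes=(0, 2, 1))
```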

function `torch_avgpool`

torch_avgpool(
    x: ndarray,
    ceil_mode: int,
    kernel_shape: Tuple[int, ...],
    pads: Tuple[int, ...],
    strides: Tuple[int, ...]
) → Tuple[ndarray]

Compute Average Pooling using Torch.

Currently supports 2d average pooling with torch semantics. This function is ONNX compatible.

See: https://github.com/onnx/onnx/blob/release/0.4.x/docs/Operators.md#AveragePool

Args:

x (numpy.ndarray): input data (many dtypes are supported). Shape is N x C x H x W for 2d
ceil_mode (int): ONNX rounding parameter, expected 0 (torch style dimension computation)
kernel_shape (Tuple[int]): shape of the kernel. Should have 2 elements for 2d pooling
pads (Tuple[int]): padding in ONNX format (begin, end) on each axis
strides (Tuple[int]): stride of the pooling window on each axis

Returns:

res (numpy.ndarray): a tensor of size (N x InChannels x OutHeight x OutWidth).
See https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html

Raises:

AssertionError: if the pooling arguments are wrong
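
The shape arithmetic behind 2d average pooling with `ceil_mode` set to 0 can be sketched in plain numpy. This is not the `torch_avgpool` implementation: `avgpool2d_sketch` is a hypothetical helper, the pad order (top, left, bottom, right) is the ONNX layout for two spatial axes, and counting padded zeros in the mean is an assumption of this sketch.

```python
from typing import Tuple

import numpy


def avgpool2d_sketch(
    x: numpy.ndarray,
    kernel_shape: Tuple[int, int],
    pads: Tuple[int, int, int, int],
    strides: Tuple[int, int],
) -> numpy.ndarray:
    """Naive 2d average pooling on an N x C x H x W array (ceil_mode = 0)."""
    k_h, k_w = kernel_shape
    s_h, s_w = strides
    # ONNX pad order for 2d: all begins first, then all ends
    top, left, bottom, right = pads

    # Zero-pad the two spatial axes only
    padded = numpy.pad(x, ((0, 0), (0, 0), (top, bottom), (left, right)))

    # Floor division corresponds to ceil_mode = 0
    out_h = (padded.shape[2] - k_h) // s_h + 1
    out_w = (padded.shape[3] - k_w) // s_w + 1

    res = numpy.zeros((x.shape[0], x.shape[1], out_h, out_w), dtype=numpy.float64)
    for i in range(out_h):
        for j in range(out_w):
            window = padded[:, :, i * s_h : i * s_h + k_h, j * s_w : j * s_w + k_w]
            # Padded zeros are counted in the mean (an assumption of this sketch)
            res[:, :, i, j] = window.mean(axis=(2, 3))
    return res


x = numpy.arange(16, dtype=numpy.float64).reshape(1, 1, 4, 4)
pooled = avgpool2d_sketch(x, kernel_shape=(2, 2), pads=(0, 0, 0, 0), strides=(2, 2))
```

Here a 4 x 4 input with a 2 x 2 kernel and stride 2 yields a 2 x 2 output per channel.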

function `numpy_cast`

numpy_cast(data: ndarray, to: int) → Tuple[ndarray]

Execute ONNX cast in Numpy.

Supports only booleans for now, which are converted to integers.

See: https://github.com/onnx/onnx/blob/release/0.4.x/docs/Operators.md#Cast

Args:

data (numpy.ndarray): Input encrypted tensor
to (int): integer value of the onnx.TensorProto DataType enum

Returns:

result (numpy.ndarray): a tensor with the required data type

function `numpy_batchnorm`

numpy_batchnorm(
    x: ndarray,
    scale: ndarray,
    bias: ndarray,
    input_mean: ndarray,
    input_var: ndarray,
    epsilon=1e-05,
    momentum=0.9,
    training_mode=0
) → Tuple[ndarray]

Compute the batch normalization of the input tensor.

This can be expressed as:

Y = (x - input_mean) / sqrt(input_var + epsilon) * scale + bias

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#BatchNormalization-14

Args:

x (numpy.ndarray): tensor to normalize, dimensions are in the form of (N,C,D1,D2,...,Dn), where N is the batch size, C is the number of channels.
scale (numpy.ndarray): scale tensor of shape (C,)
bias (numpy.ndarray): bias tensor of shape (C,)
input_mean (numpy.ndarray): mean values to use for each input channel, shape (C,)
input_var (numpy.ndarray): variance values to use for each input channel, shape (C,)
epsilon (float): avoids division by zero
momentum (float): momentum used during training of the mean/variance, not used in inference
training_mode (int): if the model was exported in training mode this is set to 1, else 0

Returns:

numpy.ndarray: Normalized tensor
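
The normalization formula above can be sketched in numpy for the inference case. `batchnorm_sketch` is a hypothetical helper, not the library function; it assumes the channel axis is axis 1 and ignores `momentum` and `training_mode`.

```python
import numpy


def batchnorm_sketch(x, scale, bias, input_mean, input_var, epsilon=1e-05):
    """Inference-mode batch normalization over the channel axis (axis 1)."""
    # Reshape (C,) parameters to (1, C, 1, ..., 1) so they broadcast against x
    shape = (1, -1) + (1,) * (x.ndim - 2)
    scale, bias = scale.reshape(shape), bias.reshape(shape)
    mean, var = input_mean.reshape(shape), input_var.reshape(shape)
    return (x - mean) / numpy.sqrt(var + epsilon) * scale + bias


x = numpy.array([[[1.0, 3.0], [10.0, 30.0]]])  # shape (N=1, C=2, D1=2)
y = batchnorm_sketch(
    x,
    scale=numpy.array([1.0, 2.0]),
    bias=numpy.array([0.0, 1.0]),
    input_mean=numpy.array([2.0, 20.0]),
    input_var=numpy.array([1.0, 100.0]),
    epsilon=0.0,
)
```

Each channel is normalized with its own mean, variance, scale and bias; `epsilon` is set to 0 here only to keep the numbers round.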

function `numpy_flatten`

numpy_flatten(x: ndarray, axis: int = 1) → Tuple[ndarray]

Flatten a tensor into a 2d array.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Flatten-13

Args:

x (numpy.ndarray): tensor to flatten
axis (int): axis after which all dimensions will be flattened (axis=0 gives a 1D output)

Returns:

result: flattened tensor
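
A minimal sketch of the reshaping rule, following the ONNX Flatten description in which dimensions before `axis` form the rows and the remaining dimensions form the columns. `flatten_sketch` is a hypothetical helper and not the library implementation.

```python
import numpy


def flatten_sketch(x: numpy.ndarray, axis: int = 1) -> numpy.ndarray:
    """Collapse dimensions before `axis` into rows and the rest into columns."""
    rows = int(numpy.prod(x.shape[:axis]))
    return x.reshape(rows, -1)


x = numpy.zeros((2, 3, 4, 5))
flat_default = flatten_sketch(x)  # axis=1
flat_axis_2 = flatten_sketch(x, axis=2)
```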

function `numpy_or`

numpy_or(a: ndarray, b: ndarray) → Tuple[ndarray]

Compute or in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Or-7

Args:

a (numpy.ndarray): Input tensor
b (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_or_float`

numpy_or_float(a: ndarray, b: ndarray) → Tuple[ndarray]

Compute or in numpy according to ONNX spec and cast outputs to floats.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Or-7

Args:

a (numpy.ndarray): Input tensor
b (numpy.ndarray): Input tensor

Returns:

Tuple[numpy.ndarray]: Output tensor

function `numpy_round`

numpy_round(a: ndarray) → Tuple[ndarray]

Compute round in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Round-11

Note that the ONNX Round operator is effectively a rint, since the number of decimals is forced to be 0.

Args:

a (numpy.ndarray): Input tensor whose elements are to be rounded.

Returns:

Tuple[numpy.ndarray]: Output tensor with rounded input elements.
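
The rint behaviour is easiest to see on exact halves, which go to the nearest even integer rather than away from zero. The snippet calls `numpy.rint` directly and is only an illustration of that rounding rule, not the body of `numpy_round`.

```python
import numpy

values = numpy.array([0.5, 1.5, 2.5, -1.5, 2.4])

# numpy.rint rounds to the nearest integer; exact halves go to the even neighbour
rounded = numpy.rint(values)
```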

function `numpy_pow`

numpy_pow(a: ndarray, b: ndarray) → Tuple[ndarray]

Compute pow in numpy according to ONNX spec.

See https://github.com/onnx/onnx/blob/release/0.4.x/docs/Changelog.md#Pow-13

Args:

a (numpy.ndarray): Input tensor whose elements are to be raised.
b (numpy.ndarray): The power to which we want to raise.

Returns:

Tuple[numpy.ndarray]: Output tensor.

class `ONNXMixedFunction`

A mixed quantized-raw valued onnx function.

ONNX functions will take inputs which can be either quantized or float. Some functions only take quantized inputs, but some functions take both types. For mixed functions we need to tag the parameters that do not need quantization. Thus quantized ops can know which inputs are not QuantizedArray and we avoid unnecessary wrapping of float values as QuantizedArrays.

method `__init__`

__init__(function, non_quant_params: Set[str])

Create the mixed function and raw parameter list.

Args:

function (Any): function to be decorated
non_quant_params (Set[str]): set of parameters that will not be quantized (stored as numpy.ndarray)
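
The tagging idea can be sketched with a small wrapper class. `MixedFunctionSketch` and `reshape_like` are hypothetical names used for illustration only; how Concrete-ML actually consumes the tag (and whether the real class is callable) is not shown in this reference and is assumed here.

```python
from typing import Any, Callable, Set


class MixedFunctionSketch:
    """Hypothetical stand-in: wraps a function and records its raw parameters."""

    def __init__(self, function: Callable[..., Any], non_quant_params: Set[str]):
        self.function = function
        self.non_quant_params = non_quant_params

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Calling the wrapper simply forwards to the wrapped function
        return self.function(*args, **kwargs)


def reshape_like(x, shape):
    """Toy op: `x` would be quantized, `shape` is a plain value."""
    return (x, tuple(shape))


tagged = MixedFunctionSketch(reshape_like, non_quant_params={"shape"})

# A caller can now decide, per parameter, whether wrapping is needed
needs_quantization = {
    name: name not in tagged.non_quant_params for name in ("x", "shape")
}
```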

concrete.ml.sklearn.glm

module `concrete.ml.sklearn.glm`

Implement sklearn's Generalized Linear Models (GLM).

class `PoissonRegressor`

A Poisson regression model with FHE.

method `__init__`

__init__(
    n_bits: 'Union[int, dict]' = 2,
    alpha: 'float' = 1.0,
    fit_intercept: 'bool' = True,
    max_iter: 'int' = 100,
    tol: 'float' = 0.0001,
    warm_start: 'bool' = False,
    verbose: 'int' = 0
)

property fhe_circuit

Get the FHE circuit.

Returns:

Circuit: the FHE circuit

property input_quantizers

Get the input quantizers.

Returns:

List[QuantizedArray]: the input quantizers

property onnx_model

property output_quantizers

Get the output quantizers.

Returns:

List[QuantizedArray]: the output quantizers

property quantize_input

Get the input quantization function.

Returns:

Callable : function that quantizes the input

method `fit`

fit(X, y: 'ndarray', *args, **kwargs) → None

Fit the GLM regression quantized model.

Args:

X : The training data, which can be numpy arrays, torch tensors, or a pandas DataFrame or Series.
y (numpy.ndarray): The target data.
*args: The arguments to pass to the sklearn linear model.
**kwargs: The keyword arguments to pass to the sklearn linear model.

method `post_processing`

post_processing(
    y_preds: 'ndarray',
    already_dequantized: 'bool' = False
) → ndarray

Post-processing the predictions.

Args:

y_preds (numpy.ndarray): The predictions to post-process.
already_dequantized (bool): Whether the inputs were already dequantized or not. Defaults to False.

Returns:

numpy.ndarray: The post-processed predictions.

method `predict`

predict(X: 'ndarray', execute_in_fhe: 'bool' = False) → ndarray

Predict on user data.

Predict on user data using either the quantized clear model, implemented with tensors, or, if execute_in_fhe is set, using the compiled FHE circuit.

Args:

X (numpy.ndarray): The input data.
execute_in_fhe (bool): Whether to execute the inference in FHE. Defaults to False.

Returns:

numpy.ndarray: The model's predictions.

class `GammaRegressor`

A Gamma regression model with FHE.

method `__init__`

__init__(
    n_bits: 'Union[int, dict]' = 2,
    alpha: 'float' = 1.0,
    fit_intercept: 'bool' = True,
    max_iter: 'int' = 100,
    tol: 'float' = 0.0001,
    warm_start: 'bool' = False,
    verbose: 'int' = 0
)

property fhe_circuit

Get the FHE circuit.

Returns:

Circuit: the FHE circuit

property input_quantizers

Get the input quantizers.

Returns:

List[QuantizedArray]: the input quantizers

property onnx_model

property output_quantizers

Get the output quantizers.

Returns:

List[QuantizedArray]: the output quantizers

property quantize_input

Get the input quantization function.

Returns:

Callable : function that quantizes the input

method `fit`

fit(X, y: 'ndarray', *args, **kwargs) → None

Fit the GLM regression quantized model.

Args:

X : The training data, which can be numpy arrays, torch tensors, or a pandas DataFrame or Series.
y (numpy.ndarray): The target data.
*args: The arguments to pass to the sklearn linear model.
**kwargs: The keyword arguments to pass to the sklearn linear model.

method `post_processing`

post_processing(
    y_preds: 'ndarray',
    already_dequantized: 'bool' = False
) → ndarray

Post-processing the predictions.

Args:

y_preds (numpy.ndarray): The predictions to post-process.
already_dequantized (bool): Whether the inputs were already dequantized or not. Defaults to False.

Returns:

numpy.ndarray: The post-processed predictions.

method `predict`

predict(X: 'ndarray', execute_in_fhe: 'bool' = False) → ndarray

Predict on user data.

Predict on user data using either the quantized clear model, implemented with tensors, or, if execute_in_fhe is set, using the compiled FHE circuit.

Args:

X (numpy.ndarray): The input data.
execute_in_fhe (bool): Whether to execute the inference in FHE. Defaults to False.

Returns:

numpy.ndarray: The model's predictions.

class `TweedieRegressor`

A Tweedie regression model with FHE.

method `__init__`

__init__(
    n_bits: 'Union[int, dict]' = 2,
    power: 'float' = 0.0,
    alpha: 'float' = 1.0,
    fit_intercept: 'bool' = True,
    link: 'str' = 'auto',
    max_iter: 'int' = 100,
    tol: 'float' = 0.0001,
    warm_start: 'bool' = False,
    verbose: 'int' = 0
)

property fhe_circuit

Get the FHE circuit.

Returns:

Circuit: the FHE circuit

property input_quantizers

Get the input quantizers.

Returns:

List[QuantizedArray]: the input quantizers

property onnx_model

property output_quantizers

Get the output quantizers.

Returns:

List[QuantizedArray]: the output quantizers

property quantize_input

Get the input quantization function.

Returns:

Callable : function that quantizes the input

method `fit`

fit(X, y: 'ndarray', *args, **kwargs) → None

Fit the GLM regression quantized model.

Args:

X : The training data, which can be numpy arrays, torch tensors, or a pandas DataFrame or Series.
y (numpy.ndarray): The target data.
*args: The arguments to pass to the sklearn linear model.
**kwargs: The keyword arguments to pass to the sklearn linear model.

method `post_processing`

post_processing(
    y_preds: 'ndarray',
    already_dequantized: 'bool' = False
) → ndarray

Post-processing the predictions.

Args:

y_preds (numpy.ndarray): The predictions to post-process.
already_dequantized (bool): Whether the inputs were already dequantized or not. Defaults to False.

Returns:

numpy.ndarray: The post-processed predictions.

method `predict`

predict(X: 'ndarray', execute_in_fhe: 'bool' = False) → ndarray

Predict on user data.

Predict on user data using either the quantized clear model, implemented with tensors, or, if execute_in_fhe is set, using the compiled FHE circuit.

Args:

X (numpy.ndarray): The input data.
execute_in_fhe (bool): Whether to execute the inference in FHE. Defaults to False.

Returns:

numpy.ndarray: The model's predictions.

concrete.ml.sklearn.protocols

module `concrete.ml.sklearn.protocols`

Protocols.

Protocols are used to mix type hinting with duck-typing. We don't always want an abstract parent class shared by all objects, since we are more interested in the behavior of such objects. Implementing a Protocol is a way to specify that behavior.

To read more about Protocols, see: https://peps.python.org/pep-0544
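
As a small illustration of structural typing with `typing.Protocol`, the sketch below defines a protocol and a class that satisfies it without inheriting from it. `SupportsPredict`, `ConstantModel` and `run` are hypothetical names and are not part of Concrete-ML.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsPredict(Protocol):
    """Hypothetical protocol: anything with a `predict` method qualifies."""

    def predict(self, X: list) -> list:
        ...


class ConstantModel:
    """Does not inherit from SupportsPredict, yet satisfies it structurally."""

    def predict(self, X: list) -> list:
        return [0 for _ in X]


def run(model: SupportsPredict, X: list) -> list:
    # A static type checker accepts any object whose methods match the protocol
    return model.predict(X)


predictions = run(ConstantModel(), [1.5, 2.5, 3.5])
is_structural_match = isinstance(ConstantModel(), SupportsPredict)
```

The `runtime_checkable` decorator is only needed for the `isinstance` check; static type checking works without it.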

class `Quantizer`

Quantizer Protocol.

Used to type-hint a quantizer.

method `dequant`

dequant(X: 'ndarray') → ndarray

Dequantize some values.

Args:

X (numpy.ndarray): Values to dequantize

Returns:

numpy.ndarray: Dequantized values

method `quant`

quant(values: 'ndarray') → ndarray

Quantize some values.

Args:

values (numpy.ndarray): Values to quantize

Returns:

numpy.ndarray: The quantized values
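
A class satisfying this protocol only needs a `quant`/`dequant` pair. The sketch below uses a simple uniform scheme; `UniformQuantizerSketch` is hypothetical, is not how Concrete-ML computes its quantization parameters, and assumes the calibration data is not constant (otherwise the scale would be zero).

```python
import numpy


class UniformQuantizerSketch:
    """Hypothetical uniform quantizer exposing the quant/dequant pair."""

    def __init__(self, n_bits: int, values: numpy.ndarray):
        # Map the observed value range onto the integers [0, 2**n_bits - 1]
        self.offset = float(values.min())
        self.scale = float(values.max() - values.min()) / (2**n_bits - 1)

    def quant(self, values: numpy.ndarray) -> numpy.ndarray:
        return numpy.rint((values - self.offset) / self.scale).astype(numpy.int64)

    def dequant(self, X: numpy.ndarray) -> numpy.ndarray:
        return X * self.scale + self.offset


calibration = numpy.array([-1.0, 0.0, 0.5, 2.0])
quantizer = UniformQuantizerSketch(n_bits=2, values=calibration)

q_values = quantizer.quant(calibration)
recovered = quantizer.dequant(q_values)
```

With 2 bits only four levels are available, so 0.5 comes back as 1.0 after the round trip. This is the kind of accuracy loss that low bit-widths introduce.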

class `ConcreteBaseEstimatorProtocol`

A Concrete Estimator Protocol.

property onnx_model

Get the ONNX model.

Returns: onnx.ModelProto

property quantize_input

Quantize input function.

method `compile`

compile(
    X: 'ndarray',
    configuration: 'Optional[Configuration]',
    compilation_artifacts: 'Optional[DebugArtifacts]',
    show_mlir: 'bool',
    use_virtual_lib: 'bool',
    p_error: 'float'
) → Circuit

Compiles a model to a FHE Circuit.

Args:

X (numpy.ndarray): the dequantized dataset
configuration (Optional[Configuration]): the options for compilation
compilation_artifacts (Optional[DebugArtifacts]): artifacts object to fill during compilation
show_mlir (bool): whether or not to show MLIR during the compilation
use_virtual_lib (bool): whether to compile using the virtual library that allows higher bitwidths
p_error (float): probability of error of a PBS

Returns:

Circuit: the compiled Circuit.

method `fit`

fit(X: 'ndarray', y: 'ndarray', **fit_params) → ConcreteBaseEstimatorProtocol

Initialize and fit the module.

Args:

X : training data. By default, you should be able to pass numpy arrays, torch tensors, or a pandas DataFrame or Series.
y (numpy.ndarray): labels associated with training data
**fit_params: additional parameters that can be used during training

Returns:

ConcreteBaseEstimatorProtocol: the trained estimator

method `fit_benchmark`

fit_benchmark(
    X: 'ndarray',
    y: 'ndarray',
    *args,
    **kwargs
) → Tuple[ConcreteBaseEstimatorProtocol, BaseEstimator]

Fit the quantized estimator and return reference estimator.

This function returns both the quantized estimator (itself) and a wrapper around the non-quantized trained NN. This is useful for comparing performance between the quantized and fp32 versions of the classifier.

Args:

X : training data. By default, you should be able to pass numpy arrays, torch tensors, or a pandas DataFrame or Series.
y (numpy.ndarray): labels associated with training data
*args: The arguments to pass to the underlying model.
**kwargs: The keyword arguments to pass to the underlying model.

Returns:

self: self fitted
model: underlying estimator

method `post_processing`

post_processing(y_preds: 'ndarray') → ndarray

Post-process the model's predictions.

Args:

y_preds (numpy.ndarray): predicted values by model (clear-quantized)

Returns:

numpy.ndarray: the post-processed predictions

class `ConcreteBaseClassifierProtocol`

Concrete classifier protocol.

property onnx_model

Get the ONNX model.

Returns: onnx.ModelProto

property quantize_input

Quantize input function.

method `compile`

compile(
    X: 'ndarray',
    configuration: 'Optional[Configuration]',
    compilation_artifacts: 'Optional[DebugArtifacts]',
    show_mlir: 'bool',
    use_virtual_lib: 'bool',
    p_error: 'float'
) → Circuit

Compiles a model to a FHE Circuit.

Args:

X (numpy.ndarray): the dequantized dataset
configuration (Optional[Configuration]): the options for compilation
compilation_artifacts (Optional[DebugArtifacts]): artifacts object to fill during compilation
show_mlir (bool): whether or not to show MLIR during the compilation
use_virtual_lib (bool): whether to compile using the virtual library that allows higher bitwidths
p_error (float): probability of error of a PBS

Returns:

Circuit: the compiled Circuit.

method `fit`

fit(X: 'ndarray', y: 'ndarray', **fit_params) → ConcreteBaseEstimatorProtocol

Initialize and fit the module.

Args:

X : training data. By default, you should be able to pass numpy arrays, torch tensors, or a pandas DataFrame or Series.
y (numpy.ndarray): labels associated with training data
**fit_params: additional parameters that can be used during training

Returns:

ConcreteBaseEstimatorProtocol: the trained estimator

method `fit_benchmark`

fit_benchmark(
    X: 'ndarray',
    y: 'ndarray',
    *args,
    **kwargs
) → Tuple[ConcreteBaseEstimatorProtocol, BaseEstimator]

Fit the quantized estimator and return reference estimator.

Args:

X : training data. By default, you should be able to pass numpy arrays, torch tensors, or a pandas DataFrame or Series.
y (numpy.ndarray): labels associated with training data
*args: The arguments to pass to the underlying model.
**kwargs: The keyword arguments to pass to the underlying model.

Returns:

self: self fitted
model: underlying estimator

method `post_processing`

post_processing(y_preds: 'ndarray') → ndarray

Post-process the model's predictions.

Args:

y_preds (numpy.ndarray): predicted values by model (clear-quantized)

Returns:

numpy.ndarray: the post-processed predictions

method `predict`

predict(X: 'ndarray', execute_in_fhe: 'bool') → ndarray

Predicts for each sample the class with highest probability.

Args:

X (numpy.ndarray): Features
execute_in_fhe (bool): Whether the inference should be done in FHE or not.

Returns: numpy.ndarray

method `predict_proba`

predict_proba(X: 'ndarray', execute_in_fhe: 'bool') → ndarray

Predicts for each sample the probability of each class.

Args:

X (numpy.ndarray): Features
execute_in_fhe (bool): Whether the inference should be done in FHE or not.

Returns: numpy.ndarray

class `ConcreteBaseRegressorProtocol`

Concrete regressor protocol.

property onnx_model

Get the ONNX model.

Returns: onnx.ModelProto

property quantize_input

Quantize input function.

method `compile`

compile(
    X: 'ndarray',
    configuration: 'Optional[Configuration]',
    compilation_artifacts: 'Optional[DebugArtifacts]',
    show_mlir: 'bool',
    use_virtual_lib: 'bool',
    p_error: 'float'
) → Circuit

Compiles a model to a FHE Circuit.

Args:

X (numpy.ndarray): the dequantized dataset
configuration (Optional[Configuration]): the options for compilation
compilation_artifacts (Optional[DebugArtifacts]): artifacts object to fill during compilation
show_mlir (bool): whether or not to show MLIR during the compilation
use_virtual_lib (bool): whether to compile using the virtual library that allows higher bitwidths
p_error (float): probability of error of a PBS

Returns:

Circuit: the compiled Circuit.

method `fit`

fit(X: 'ndarray', y: 'ndarray', **fit_params) → ConcreteBaseEstimatorProtocol

Initialize and fit the module.

Args:

X : training data. By default, you should be able to pass numpy arrays, torch tensors, or a pandas DataFrame or Series.
y (numpy.ndarray): labels associated with training data
**fit_params: additional parameters that can be used during training

Returns:

ConcreteBaseEstimatorProtocol: the trained estimator

method `fit_benchmark`

fit_benchmark(
    X: 'ndarray',
    y: 'ndarray',
    *args,
    **kwargs
) → Tuple[ConcreteBaseEstimatorProtocol, BaseEstimator]

Fit the quantized estimator and return reference estimator.

Args:

X : training data. By default, you should be able to pass numpy arrays, torch tensors, or a pandas DataFrame or Series.
y (numpy.ndarray): labels associated with training data
*args: The arguments to pass to the underlying model.
**kwargs: The keyword arguments to pass to the underlying model.

Returns:

self: self fitted
model: underlying estimator

method `post_processing`

post_processing(y_preds: 'ndarray') → ndarray

Post-process the model's predictions.

Args:

y_preds (numpy.ndarray): predicted values by model (clear-quantized)

Returns:

numpy.ndarray: the post-processed predictions

method `predict`

predict(X: 'ndarray', execute_in_fhe: 'bool') → ndarray

Predicts for each sample the expected value.

Args:

X (numpy.ndarray): Features
execute_in_fhe (bool): Whether the inference should be done in FHE or not.

Returns: numpy.ndarray

concrete.ml.sklearn.base

module `concrete.ml.sklearn.base`

Module that contains base classes for our library's estimators.

Global Variables

DEFAULT_P_ERROR_PBS
OPSET_VERSION_FOR_ONNX_EXPORT

class `QuantizedTorchEstimatorMixin`

Mixin that provides quantization for a torch module and follows the Estimator API.

This class should be mixed in with another that provides the full Estimator API. This class only provides modifiers for .fit() (with quantization) and .predict() (optionally in FHE).

method `__init__`

__init__()

property base_estimator_type

Get the sklearn estimator that should be trained by the child class.

property base_module_to_compile

Get the Torch module that should be compiled to FHE.

property fhe_circuit

Get the FHE circuit.

Returns:

Circuit: the FHE circuit

property input_quantizers

Get the input quantizers.

Returns:

List[Quantizer]: the input quantizers

property n_bits_quant

Get the number of quantization bits.

property onnx_model

Get the ONNX model.

Returns:

_onnx_model_ (onnx.ModelProto): the ONNX model

property output_quantizers

Get the output quantizers.

Returns:

List[QuantizedArray]: the output quantizers

property quantize_input

Get the input quantization function.

Returns:

Callable : function that quantizes the input

method `compile`

compile(
    X: ndarray,
    configuration: Optional[Configuration] = None,
    compilation_artifacts: Optional[DebugArtifacts] = None,
    show_mlir: bool = False,
    use_virtual_lib: bool = False,
    p_error: Optional[float] = 6.3342483999973e-05
) → Circuit

Compile the model.

Args:

X (numpy.ndarray): the dequantized dataset
configuration (Optional[Configuration]): the options for compilation
compilation_artifacts (Optional[DebugArtifacts]): artifacts object to fill during compilation
show_mlir (bool): whether or not to show MLIR during the compilation
use_virtual_lib (bool): whether to compile using the virtual library that allows higher bitwidths
p_error (Optional[float]): probability of error of a PBS

Returns:

Circuit: the compiled Circuit.

Raises:

ValueError: if called before the model is trained
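
Since `p_error` is a per-PBS (programmable bootstrapping) error probability, the chance that at least one error occurs grows with the number of PBS operations in a circuit. The sketch below estimates that growth under the assumption that errors are independent; `prob_any_pbs_error` is a hypothetical helper and the PBS counts are made-up illustrative values.

```python
import math

P_ERROR = 6.3342483999973e-05  # default shown in the compile signature


def prob_any_pbs_error(p_error: float, n_pbs: int) -> float:
    """Probability that at least one of n_pbs independent PBS operations errs."""
    # Computes 1 - (1 - p_error) ** n_pbs; log1p/expm1 keep precision for tiny p_error
    return -math.expm1(n_pbs * math.log1p(-p_error))


for n_pbs in (1, 100, 10_000):
    print(n_pbs, prob_any_pbs_error(P_ERROR, n_pbs))
```

Lowering `p_error` reduces this probability, typically at the cost of slower FHE execution, so the value is a trade-off rather than a fixed constant.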

method `fit`

fit(X, y, **fit_params)

Initialize and fit the module.

If the module was already initialized, by calling fit, the module will be re-initialized (unless warm_start is True). In addition to the torch training step, this method performs quantization of the trained torch model.

Args:

X : training data. By default, you should be able to pass numpy arrays, torch tensors, or a pandas DataFrame or Series.
y (numpy.ndarray): labels associated with training data
**fit_params: additional parameters that can be used during training, these are passed to the torch training interface

Returns:

self: the trained quantized estimator

method `fit_benchmark`

fit_benchmark(X: ndarray, y: ndarray, *args, **kwargs) → Tuple[Any, Any]

Fit the quantized estimator and return reference estimator.

Args:

X : training data. By default, you should be able to pass numpy arrays, torch tensors, or a pandas DataFrame or Series.
y (numpy.ndarray): labels associated with training data
*args: The arguments to pass to the sklearn linear model.
**kwargs: The keyword arguments to pass to the sklearn linear model.

Returns:

self: the trained quantized estimator
fp32_model: trained raw (fp32) wrapped NN estimator

method `get_params_for_benchmark`

get_params_for_benchmark()

Get the parameters to instantiate the sklearn estimator trained by the child class.

Returns:

params (dict): dictionary with parameters that will initialize a new Estimator

method `post_processing`

post_processing(y_preds: ndarray) → ndarray

Post-processing the output.

Args:

y_preds (numpy.ndarray): the output to post-process

Raises:

ValueError: if unknown post-processing function

Returns:

numpy.ndarray: the post-processed output

method `predict`

predict(X, execute_in_fhe=False)

Predict on user provided data.

Predicts using the quantized clear or FHE classifier.

Args:

X : input data, a numpy array of raw values (non quantized)
execute_in_fhe : whether to execute the inference in FHE or in the clear

Returns:

y_pred : numpy ndarray with predictions

method `predict_proba`

predict_proba(X, execute_in_fhe=False)

Predict on user provided data, returning probabilities.

Predicts using the quantized clear or FHE classifier.

Args:

X : input data, a numpy array of raw values (non quantized)
execute_in_fhe : whether to execute the inference in FHE or in the clear

Returns:

y_pred : numpy ndarray with probabilities (if applicable)

Raises:

ValueError: if the estimator was not yet trained or compiled

class `BaseTreeEstimatorMixin`

Mixin class for tree-based estimators.

A place to share methods that are used on all tree-based estimators.

method `__init__`

__init__(n_bits: int)

Initialize the BaseTreeEstimatorMixin.

Args:

n_bits (int): number of bits used for quantization

property onnx_model

Get the ONNX model.

Returns:

onnx.ModelProto: the ONNX model

method `compile`

compile(
    X: ndarray,
    configuration: Optional[Configuration] = None,
    compilation_artifacts: Optional[DebugArtifacts] = None,
    show_mlir: bool = False,
    use_virtual_lib: bool = False,
    p_error: Optional[float] = 6.3342483999973e-05
) → Circuit

Compile the model.

Args:

X (numpy.ndarray): the dequantized dataset
configuration (Optional[Configuration]): the options for compilation
compilation_artifacts (Optional[DebugArtifacts]): artifacts object to fill during compilation
show_mlir (bool): whether or not to show MLIR during the compilation
use_virtual_lib (bool): set to True to use the so called virtual lib simulating FHE computation. Defaults to False
p_error (Optional[float]): probability of error of a PBS

Returns:

Circuit: the compiled Circuit.

method `dequantize_output`

dequantize_output(y_preds: ndarray)

Dequantize the integer predictions.

Args:

y_preds (numpy.ndarray): the predictions

Returns: the dequantized predictions

method `fit_benchmark`

fit_benchmark(
    X: ndarray,
    y: ndarray,
    *args,
    random_state: Optional[int] = None,
    **kwargs
) → Tuple[Any, Any]

Fit the sklearn tree-based model and the FHE tree-based model.

Args:

X (numpy.ndarray): The input data.
y (numpy.ndarray): The target data.
random_state (Optional[Union[int, numpy.random.RandomState, None]]): The random state. Defaults to None.
*args: args for super().fit
**kwargs: kwargs for super().fit

Returns: Tuple[ConcreteEstimators, SklearnEstimators]: The FHE and sklearn tree-based models.

method `quantize_input`

quantize_input(X: ndarray)

Quantize the input.

Args:

X (numpy.ndarray): the input

Returns: the quantized input

class `BaseTreeRegressorMixin`

Mixin class for tree-based regressors.

A place to share methods that are used on all tree-based regressors.

method `__init__`

__init__(n_bits: int)

Initialize the TreeBasedEstimatorMixin.

Args:

n_bits (int): number of bits used for quantization

property onnx_model

Get the ONNX model.

Returns:

onnx.ModelProto: the ONNX model

method `compile`

compile(
    X: ndarray,
    configuration: Optional[Configuration] = None,
    compilation_artifacts: Optional[DebugArtifacts] = None,
    show_mlir: bool = False,
    use_virtual_lib: bool = False,
    p_error: Optional[float] = 6.3342483999973e-05
) → Circuit

Compile the model.

Args:

X (numpy.ndarray): the dequantized dataset
configuration (Optional[Configuration]): the options for compilation
compilation_artifacts (Optional[DebugArtifacts]): artifacts object to fill during compilation
show_mlir (bool): whether or not to show MLIR during the compilation
use_virtual_lib (bool): set to True to use the so called virtual lib simulating FHE computation. Defaults to False
p_error (Optional[float]): probability of error of a PBS

Returns:

Circuit: the compiled Circuit.

method `dequantize_output`

dequantize_output(y_preds: ndarray)

Dequantize the integer predictions.

Args:

y_preds (numpy.ndarray): the predictions

Returns: the dequantized predictions

method `fit`

fit(X, y: ndarray, **kwargs) → Any

Fit the tree-based estimator.

Args:

X : training data. By default, you should be able to pass numpy arrays, torch tensors, or a pandas DataFrame or Series.
y (numpy.ndarray): The target data.
**kwargs: args for super().fit

Returns:

Any: The fitted model.

method `fit_benchmark`

fit_benchmark(
    X: ndarray,
    y: ndarray,
    *args,
    random_state: Optional[int] = None,
    **kwargs
) → Tuple[Any, Any]

Fit the sklearn tree-based model and the FHE tree-based model.

Args:

X (numpy.ndarray): The input data.
y (numpy.ndarray): The target data.
random_state (Optional[Union[int, numpy.random.RandomState, None]]): The random state. Defaults to None.
*args: args for super().fit
**kwargs: kwargs for super().fit

Returns: Tuple[ConcreteEstimators, SklearnEstimators]: The FHE and sklearn tree-based models.

method `post_processing`

post_processing(y_preds: ndarray) → ndarray

Apply post-processing to the predictions.

Args:

y_preds (numpy.ndarray): The predictions.

Returns:

numpy.ndarray: The post-processed predictions.

method `predict`

predict(X: ndarray, execute_in_fhe: bool = False) → ndarray

Predict the target values.

Args:

X (numpy.ndarray): The input data.
execute_in_fhe (bool): Whether to execute in FHE. Defaults to False.

Returns:

numpy.ndarray: The predicted target values.

method `quantize_input`

quantize_input(X: ndarray)

Quantize the input.

Args:

X (numpy.ndarray): the input

Returns: the quantized input

class `BaseTreeClassifierMixin`

Mixin class for tree-based classifiers.

A place to share methods that are used on all tree-based classifiers.

method `__init__`

__init__(n_bits: int)

Initialize the TreeBasedEstimatorMixin.

Args:

n_bits (int): number of bits used for quantization

property onnx_model

Get the ONNX model.

Returns:

onnx.ModelProto: the ONNX model

method `compile`

compile(
    X: ndarray,
    configuration: Optional[Configuration] = None,
    compilation_artifacts: Optional[DebugArtifacts] = None,
    show_mlir: bool = False,
    use_virtual_lib: bool = False,
    p_error: Optional[float] = 6.3342483999973e-05
) → Circuit

Compile the model.

Args:

X (numpy.ndarray): the dequantized dataset
configuration (Optional[Configuration]): the options for compilation
compilation_artifacts (Optional[DebugArtifacts]): artifacts object to fill during compilation
show_mlir (bool): whether or not to show MLIR during the compilation
use_virtual_lib (bool): set to True to use the so called virtual lib simulating FHE computation. Defaults to False
p_error (Optional[float]): probability of error of a PBS

Returns:

Circuit: the compiled Circuit.
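
To get a feel for what a per-PBS error probability means for a whole circuit, the sketch below estimates the chance that at least one PBS (programmable bootstrapping) fails. It assumes, purely for illustration, that PBS errors are independent and that the circuit executes `n_pbs` of them. `circuit_error_probability` and `n_pbs` are hypothetical names, not part of Concrete ML.

```python
def circuit_error_probability(p_error: float, n_pbs: int) -> float:
    """Probability that at least one of n_pbs independent PBS operations errs."""
    if not 0.0 <= p_error <= 1.0:
        raise ValueError("p_error must be a probability in [0, 1]")
    if n_pbs < 0:
        raise ValueError("n_pbs must be non-negative")
    # Under the independence assumption, P(no error) = (1 - p_error) ** n_pbs.
    return 1.0 - (1.0 - p_error) ** n_pbs


# Default p_error taken from the compile() signature above.
DEFAULT_P_ERROR = 6.3342483999973e-05

for n_pbs in (1, 100, 10_000):
    print(n_pbs, circuit_error_probability(DEFAULT_P_ERROR, n_pbs))
```

Lowering `p_error` reduces this estimate. How it affects compilation and runtime is not covered by this sketch.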

method `dequantize_output`

dequantize_output(y_preds: ndarray)

Dequantize the integer predictions.

Args:

y_preds (numpy.ndarray): the predictions

Returns: the dequantized predictions

method `fit`

fit(X, y: ndarray, **kwargs) → Any

Fit the tree-based estimator.

Args:

X : training data. By default, you should be able to pass:
* numpy arrays
* torch tensors
* pandas DataFrame or Series
y (numpy.ndarray): The target data.
**kwargs: args for super().fit

Returns:

Any: The fitted model.

method `fit_benchmark`

fit_benchmark(
    X: ndarray,
    y: ndarray,
    *args,
    random_state: Optional[int] = None,
    **kwargs
) → Tuple[Any, Any]

Fit the sklearn tree-based model and the FHE tree-based model.

Args:

X (numpy.ndarray): The input data.
y (numpy.ndarray): The target data.
random_state (Optional[Union[int, numpy.random.RandomState, None]]): The random state. Defaults to None.
*args: args for super().fit
**kwargs: kwargs for super().fit

Returns: Tuple[ConcreteEstimators, SklearnEstimators]: The FHE and sklearn tree-based models.

method `post_processing`

post_processing(y_preds: ndarray) → ndarray

Apply post-processing to the predictions.

Args:

y_preds (numpy.ndarray): The predictions.

Returns:

numpy.ndarray: The post-processed predictions.

method `predict`

predict(X: ndarray, execute_in_fhe: bool = False) → ndarray

Predict the class with highest probability.

Args:

X (numpy.ndarray): The input data.
execute_in_fhe (bool): Whether to execute in FHE. Defaults to False.

Returns:

numpy.ndarray: The predicted target values.

method `predict_proba`

predict_proba(X: ndarray, execute_in_fhe: bool = False) → ndarray

Predict the probability.

Args:

X (numpy.ndarray): The input data.
execute_in_fhe (bool): Whether to execute in FHE. Defaults to False.

Returns:

numpy.ndarray: The predicted probabilities.

method `quantize_input`

quantize_input(X: ndarray)

Quantize the input.

Args:

X (numpy.ndarray): the input

Returns: the quantized input

class `SklearnLinearModelMixin`

A Mixin class for sklearn linear models with FHE.

method `init`

__init__(*args, n_bits: Union[int, Dict] = 2, **kwargs)

Initialize the FHE linear model.

Args:

n_bits (int, Dict): Number of bits to quantize the model. If an int is passed for n_bits, the value will be used for activation, inputs and weights. If a dict is passed, then it should contain "model_inputs", "op_inputs", "op_weights" and "model_outputs" keys with corresponding number of quantization bits for:
- model_inputs: number of bits for model input
- op_inputs: number of bits to quantize layer input values
- op_weights: learned parameters or constants in the network
- model_outputs: final model output quantization bits
Default to 2.
*args: The arguments to pass to the sklearn linear model.
**kwargs: The keyword arguments to pass to the sklearn linear model.
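
The int-versus-dict convention for `n_bits` can be sketched as follows. `expand_n_bits` is a hypothetical helper written only to mirror the description above. It is not Concrete ML's implementation, and it assumes that a single int is applied to all four keys, including `model_outputs`, which the description does not spell out.

```python
from typing import Dict, Union

# Keys named in the n_bits description above.
N_BITS_KEYS = ("model_inputs", "op_inputs", "op_weights", "model_outputs")


def expand_n_bits(n_bits: Union[int, Dict[str, int]] = 2) -> Dict[str, int]:
    """Return a dict holding one bit-width per quantization target."""
    if isinstance(n_bits, int):
        # Assumption: a single int applies the same bit-width everywhere.
        return {key: n_bits for key in N_BITS_KEYS}
    missing = [key for key in N_BITS_KEYS if key not in n_bits]
    if missing:
        raise ValueError(f"n_bits dict is missing keys: {missing}")
    return {key: int(n_bits[key]) for key in N_BITS_KEYS}


print(expand_n_bits(3))
print(expand_n_bits({"model_inputs": 6, "op_inputs": 3, "op_weights": 2, "model_outputs": 8}))
```

Passing a dict is the way to give inputs and outputs more precision than the weights while staying within the overall bit-width budget.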

property fhe_circuit

Get the FHE circuit.

Returns:

Circuit: the FHE circuit

property input_quantizers

Get the input quantizers.

Returns:

List[QuantizedArray]: the input quantizers

property onnx_model

Get the ONNX model.

Returns:

onnx.ModelProto: the ONNX model

property output_quantizers

Get the output quantizers.

Returns:

List[QuantizedArray]: the output quantizers

property quantize_input

Get the input quantization function.

Returns:

Callable : function that quantizes the input

method `clean_graph`

clean_graph()

Clean the graph of the onnx model.

This will remove the Cast nodes in the model's onnx.graph since they have no use in quantized or FHE models.

method `compile`

compile(
    X: ndarray,
    configuration: Optional[Configuration] = None,
    compilation_artifacts: Optional[DebugArtifacts] = None,
    show_mlir: bool = False,
    use_virtual_lib: bool = False,
    p_error: Optional[float] = 6.3342483999973e-05
) → Circuit

Compile the FHE linear model.

Args:

X (numpy.ndarray): The input data.
configuration (Optional[Configuration]): Configuration object to use during compilation
compilation_artifacts (Optional[DebugArtifacts]): Artifacts object to fill during compilation
show_mlir (bool): if set, the MLIR produced by the converter and which is going to be sent to the compiler backend is shown on the screen, e.g., for debugging or demo. Defaults to False.
use_virtual_lib (bool): whether to compile using the virtual library that allows higher bitwidths with simulated FHE computation. Defaults to False
p_error (Optional[float]): probability of error of a PBS

Returns:

Circuit: the compiled Circuit.

method `fit`

fit(X, y: ndarray, *args, **kwargs) → Any

Fit the FHE linear model.

Args:

X : training data. By default, you should be able to pass:
* numpy arrays
* torch tensors
* pandas DataFrame or Series
y (numpy.ndarray): The target data.
*args: The arguments to pass to the sklearn linear model.
**kwargs: The keyword arguments to pass to the sklearn linear model.

Returns: Any

method `fit_benchmark`

fit_benchmark(
    X: ndarray,
    y: ndarray,
    *args,
    random_state: Optional[int] = None,
    **kwargs
) → Tuple[Any, Any]

Fit the sklearn linear model and the FHE linear model.

Args:

X (numpy.ndarray): The input data.
y (numpy.ndarray): The target data.
random_state (Optional[Union[int, numpy.random.RandomState, None]]): The random state. Defaults to None.
*args: args for super().fit
**kwargs: kwargs for super().fit

Returns: Tuple[SklearnLinearModelMixin, sklearn.linear_model.LinearRegression]: The FHE and sklearn LinearRegression.

method `post_processing`

post_processing(y_preds: ndarray) → ndarray

Post-processing the output.

Args:

y_preds (numpy.ndarray): the output to post-process

Returns:

numpy.ndarray: the post-processed output

method `predict`

predict(X: ndarray, execute_in_fhe: bool = False) → ndarray

Predict on user data.

Predict on user data using either the quantized clear model, implemented with tensors, or, if execute_in_fhe is set, using the compiled FHE circuit.

Args:

X (numpy.ndarray): the input data
execute_in_fhe (bool): whether to execute the inference in FHE

Returns:

numpy.ndarray: the prediction as ordinals

class `SklearnLinearClassifierMixin`

A Mixin class for sklearn linear classifiers with FHE.

method `init`

__init__(*args, n_bits: Union[int, Dict] = 2, **kwargs)

Initialize the FHE linear model.

Args:

n_bits (int, Dict): Number of bits to quantize the model. If an int is passed for n_bits, the value will be used for activation, inputs and weights. If a dict is passed, then it should contain "model_inputs", "op_inputs", "op_weights" and "model_outputs" keys with corresponding number of quantization bits for:
- model_inputs: number of bits for model input
- op_inputs: number of bits to quantize layer input values
- op_weights: learned parameters or constants in the network
- model_outputs: final model output quantization bits
Default to 2.
*args: The arguments to pass to the sklearn linear model.
**kwargs: The keyword arguments to pass to the sklearn linear model.

property fhe_circuit

Get the FHE circuit.

Returns:

Circuit: the FHE circuit

property input_quantizers

Get the input quantizers.

Returns:

List[QuantizedArray]: the input quantizers

property onnx_model

Get the ONNX model.

Returns:

onnx.ModelProto: the ONNX model

property output_quantizers

Get the output quantizers.

Returns:

List[QuantizedArray]: the output quantizers

property quantize_input

Get the input quantization function.

Returns:

Callable : function that quantizes the input

method `clean_graph`

clean_graph()

Clean the graph of the onnx model.

Any operators following gemm, including the sigmoid, softmax and argmax operators, are removed from the graph. They will be executed in clear in the post-processing method.
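
The clear post-processing step described here can be sketched in plain Python. The functions below are illustrative only and are not the library's implementation. They assume the decrypted, dequantized Gemm output for one sample arrives as a list of float scores, with a single score in the binary case and one score per class otherwise.

```python
import math
from typing import List


def sigmoid(score: float) -> float:
    """Numerically stable logistic function."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    exp_score = math.exp(score)
    return exp_score / (1.0 + exp_score)


def softmax(scores: List[float]) -> List[float]:
    """Softmax with the usual max-subtraction for stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def post_process(scores: List[float]) -> int:
    """Turn the linear scores of one sample into a class index, in the clear."""
    if len(scores) == 1:
        # Assumed binary case: one score, thresholded through the sigmoid.
        return int(sigmoid(scores[0]) >= 0.5)
    probabilities = softmax(scores)
    # argmax over the class probabilities
    return max(range(len(probabilities)), key=probabilities.__getitem__)


print(post_process([1.3]))             # binary sample
print(post_process([0.2, 2.5, -1.0]))  # three-class sample
```

Keeping these operators out of the FHE circuit means only the linear part runs on encrypted data. The non-linear functions are applied after decryption.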

method `compile`

compile(
    X: ndarray,
    configuration: Optional[Configuration] = None,
    compilation_artifacts: Optional[DebugArtifacts] = None,
    show_mlir: bool = False,
    use_virtual_lib: bool = False,
    p_error: Optional[float] = 6.3342483999973e-05
) → Circuit

Compile the FHE linear model.

Args:

X (numpy.ndarray): The input data.
configuration (Optional[Configuration]): Configuration object to use during compilation
compilation_artifacts (Optional[DebugArtifacts]): Artifacts object to fill during compilation
show_mlir (bool): if set, the MLIR produced by the converter and which is going to be sent to the compiler backend is shown on the screen, e.g., for debugging or demo. Defaults to False.
use_virtual_lib (bool): whether to compile using the virtual library that allows higher bitwidths with simulated FHE computation. Defaults to False
p_error (Optional[float]): probability of error of a PBS

Returns:

Circuit: the compiled Circuit.

method `decision_function`

decision_function(X: ndarray, execute_in_fhe: bool = False) → ndarray

Predict confidence scores for samples.

Args:

X (numpy.ndarray): Samples to predict.
execute_in_fhe (bool): If True, the inference will be executed in FHE. Default to False.

Returns:

numpy.ndarray: Confidence scores for samples.

method `fit`

fit(X, y: ndarray, *args, **kwargs) → Any

Fit the FHE linear model.

Args:

X : training data. By default, you should be able to pass:
* numpy arrays
* torch tensors
* pandas DataFrame or Series
y (numpy.ndarray): The target data.
*args: The arguments to pass to the sklearn linear model.
**kwargs: The keyword arguments to pass to the sklearn linear model.

Returns: Any

method `fit_benchmark`

fit_benchmark(
    X: ndarray,
    y: ndarray,
    *args,
    random_state: Optional[int] = None,
    **kwargs
) → Tuple[Any, Any]

Fit the sklearn linear model and the FHE linear model.

Args:

X (numpy.ndarray): The input data.
y (numpy.ndarray): The target data.
random_state (Optional[Union[int, numpy.random.RandomState, None]]): The random state. Defaults to None.
*args: args for super().fit
**kwargs: kwargs for super().fit

Returns: Tuple[SklearnLinearModelMixin, sklearn.linear_model.LinearRegression]: The FHE and sklearn LinearRegression.

method `post_processing`

post_processing(y_preds: ndarray, already_dequantized: bool = False)

Post-processing the predictions.

This step may include a dequantization of the inputs if not done previously, in particular within the client-server workflow.

Args:

y_preds (numpy.ndarray): The predictions to post-process.
already_dequantized (bool): Whether the inputs were already dequantized or not. Default to False.

Returns:

numpy.ndarray: The post-processed predictions.

method `predict`

predict(X: ndarray, execute_in_fhe: bool = False) → ndarray

Predict on user data.

Predict on user data using either the quantized clear model, implemented with tensors, or, if execute_in_fhe is set, using the compiled FHE circuit.

Args:

X (numpy.ndarray): Samples to predict.
execute_in_fhe (bool): If True, the inference will be executed in FHE. Default to False.

Returns:

numpy.ndarray: The prediction as ordinals.

method `predict_proba`

predict_proba(X: ndarray, execute_in_fhe: bool = False) → ndarray

Predict class probabilities for samples.

Args:

X (numpy.ndarray): Samples to predict.
execute_in_fhe (bool): If True, the inference will be executed in FHE. Default to False.

Returns:

numpy.ndarray: Class probabilities for samples.

concrete.ml.sklearn.qnn

module `concrete.ml.sklearn.qnn`

Scikit-learn interface for concrete quantized neural networks.

Global Variables

MAXIMUM_TLU_BIT_WIDTH

class `SparseQuantNeuralNetImpl`

Sparse Quantized Neural Network classifier.

This class implements an MLP that is compatible with FHE constraints. The weights and activations are quantized to low bitwidth and pruning is used to ensure accumulators do not surpass a user-provided accumulator bit-width. The number of classes and number of layers are specified by the user, as well as the breadth of the network.

method `init`

Sparse Quantized Neural Network constructor.

Args:

input_dim: Number of dimensions of the input data
n_layers: Number of linear layers for this network
n_outputs: Number of output classes or regression targets
n_w_bits: Number of weight bits
n_a_bits: Number of activation and input bits
n_accum_bits: Maximal allowed bitwidth of intermediate accumulators
n_hidden_neurons_multiplier: A factor that is multiplied by the maximal number of active (non-zero weight) neurons for every layer. The maximal number of neurons in the worst case scenario is:
    max_active_neurons(n_max, n_w, n_a) = floor((2^n_max - 1) / ((2^n_w - 1) * (2^n_a - 1)))
The worst case scenario for the bitwidth of the accumulator is when all weights and activations are maximum simultaneously. We set, for each layer, the total number of neurons to be:
    n_hidden_neurons_multiplier * max_active_neurons(n_accum_bits, n_w_bits, n_a_bits)
Through experiments, for typical distributions of weights and activations, the default value for n_hidden_neurons_multiplier, 4, is safe to avoid overflow.
activation_function: a torch class that is used to construct activation functions in the network (e.g. torch.nn.ReLU, torch.nn.SELU, torch.nn.Sigmoid, etc.)

Raises:

ValueError: if the parameters have invalid values or the computed accumulator bitwidth is zero
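
The worst-case bound given in the n_hidden_neurons_multiplier description can be evaluated directly. The sketch below is a standalone transcription of that formula, not the class method itself: here `max_active_neurons` is a free function taking the bit-widths as arguments, and `neurons_per_layer` is a hypothetical helper name.

```python
def max_active_neurons(n_max: int, n_w: int, n_a: int) -> int:
    """Worst-case number of active neurons that fits an n_max-bit accumulator."""
    if min(n_max, n_w, n_a) < 1:
        raise ValueError("bit-widths must be positive")
    # floor((2^n_max - 1) / ((2^n_w - 1) * (2^n_a - 1)))
    return (2**n_max - 1) // ((2**n_w - 1) * (2**n_a - 1))


def neurons_per_layer(
    n_accum_bits: int,
    n_w_bits: int,
    n_a_bits: int,
    n_hidden_neurons_multiplier: int = 4,
) -> int:
    """Total neurons per layer: multiplier times the worst-case active count."""
    return n_hidden_neurons_multiplier * max_active_neurons(n_accum_bits, n_w_bits, n_a_bits)


# 7-bit accumulator with 2-bit weights and 2-bit activations:
# (2^7 - 1) // (3 * 3) = 127 // 9 = 14 active neurons, 56 neurons with the default multiplier.
print(max_active_neurons(7, 2, 2), neurons_per_layer(7, 2, 2))
```

The bound shrinks quickly as the weight or activation bit-widths grow, which is why pruning is needed to keep wide layers within the accumulator budget.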

method `enable_pruning`

Enable pruning in the network. Pruning must be made permanent to recover pruned weights.

Raises:

ValueError: if the quantization parameters are invalid

method `forward`

Forward pass.

Args:

x (torch.Tensor): network input

Returns:

x (torch.Tensor): network prediction

method `make_pruning_permanent`

Make the learned pruning permanent in the network.

method `max_active_neurons`

Compute the maximum number of active (non-zero weight) neurons.

The computation is done using the quantization parameters passed to the constructor. Warning: With the current quantization algorithm (asymmetric) the value returned by this function is not guaranteed to ensure FHE compatibility. For some weight distributions, weights that are 0 (which are pruned weights) will not be quantized to 0. Therefore the total number of active quantized neurons will not be equal to max_active_neurons.

Returns:

n (int): maximum number of active neurons
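
The warning above can be illustrated with a generic asymmetric (scale and zero-point) quantizer. This is a textbook scheme written for illustration and is assumed, not confirmed, to resemble the library's quantizer. `asymmetric_quantize` is a hypothetical helper.

```python
from typing import List, Tuple


def asymmetric_quantize(values: List[float], n_bits: int) -> Tuple[List[int], float, int]:
    """Map floats onto the integers [0, 2^n_bits - 1] with a scale and a zero-point."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("need at least two distinct values")
    q_max = 2**n_bits - 1
    scale = (hi - lo) / q_max
    zero_point = round(-lo / scale)
    # Quantize, shift by the zero-point, then clip to the representable range.
    q_values = [min(q_max, max(0, round(v / scale) + zero_point)) for v in values]
    return q_values, scale, zero_point


weights = [-1.0, 0.0, 0.0, 0.5, 2.0]  # the two zeros stand for pruned weights
q_weights, scale, zero_point = asymmetric_quantize(weights, n_bits=2)
print(q_weights, scale, zero_point)
```

With these weights the float value 0.0 is stored as the integer zero-point, which is non-zero, so pruned weights still appear as non-zero integers. This is the situation the warning describes.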

method `on_train_end`

Call back when training is finished, can be useful to remove training hooks.

class `QuantizedSkorchEstimatorMixin`

Mixin class that adds quantization features to Skorch NN estimators.

property base_estimator_type

Get the sklearn estimator that should be trained by the child class.

property base_module_to_compile

Get the module that should be compiled to FHE. In our case this is a torch nn.Module.

Returns:

module (nn.Module): the instantiated torch module

property fhe_circuit

Get the FHE circuit.

Returns:

Circuit: the FHE circuit

property input_quantizers

Get the input quantizers.

Returns:

List[Quantizer]: the input quantizers

property n_bits_quant

Return the number of quantization bits.

This is stored by the torch.nn.module instance and thus cannot be retrieved until this instance is created.

Returns:

n_bits (int): the number of bits to quantize the network

Raises:

ValueError: with skorch estimators, the module_ is not instantiated until .fit() is called. Thus this estimator needs to be .fit() before we get the quantization number of bits. If it is not trained we raise an exception

property onnx_model

Get the ONNX model.

Returns:

_onnx_model_ (onnx.ModelProto): the ONNX model

property output_quantizers

Get the output quantizers.

Returns:

List[QuantizedArray]: the output quantizers

property quantize_input

Get the input quantization function.

Returns:

Callable : function that quantizes the input

method `get_params_for_benchmark`

Get parameters for benchmark when cloning a skorch wrapped NN.

We must remove all parameters related to the module. Skorch takes either a class or a class instance for the module parameter. We want to pass our trained model, a class instance. But for this to work, we need to remove all module-related constructor params. If not, skorch will instantiate a new class instance of the same type as the passed module (see NeuralNet::initialize_instance in skorch's net.py).

Returns:

params (dict): parameters to create an equivalent fp32 sklearn estimator for benchmark

method `infer`

Perform a single inference step on a batch of data.

This method is specific to Skorch estimators.

Args:

x (torch.Tensor): A batch of the input data, produced by a Dataset
**fit_params (dict) : Additional parameters passed to the forward method of the module and to the self.train_split call.

Returns: A torch tensor with the inference results for each item in the input

method `on_train_end`

Call back when training is finished by the skorch wrapper.

Check if the underlying neural net has a callback for this event and, if so, call it.

Args:

net: estimator for which training has ended (equal to self)
X: data
y: targets
kwargs: other arguments

class `FixedTypeSkorchNeuralNet`

A mixin with a helpful modification to a skorch estimator that fixes the module type.

method `get_params`

Get parameters for this estimator.

Args:

deep (bool): If True, will return the parameters for this estimator and contained subobjects that are estimators.
**kwargs: any additional parameters to pass to the sklearn BaseEstimator class

Returns:

params : dict, Parameter names mapped to their values.

class `NeuralNetClassifier`

Scikit-learn interface for quantized FHE compatible neural networks.

This class wraps a quantized NN implemented using our Torch tools as a scikit-learn Estimator. It uses the skorch package to handle training and scikit-learn compatibility, and adds quantization and compilation functionality. The neural network implemented by this class is a multi layer fully connected network trained with Quantization Aware Training (QAT).

The datatypes that are allowed for prediction by this wrapper are more restricted than standard scikit-learn estimators as this class needs to predict in FHE and network inference executor is the NumpyModule.

method `init`

property base_estimator_type

property base_module_to_compile

Get the module that should be compiled to FHE. In our case this is a torch nn.Module.

Returns:

module (nn.Module): the instantiated torch module

property classes_

property fhe_circuit

Get the FHE circuit.

Returns:

Circuit: the FHE circuit

property history

property input_quantizers

Get the input quantizers.

Returns:

List[Quantizer]: the input quantizers

property n_bits_quant

Return the number of quantization bits.

This is stored by the torch.nn.module instance and thus cannot be retrieved until this instance is created.

Returns:

n_bits (int): the number of bits to quantize the network

Raises:

ValueError: with skorch estimators, the module_ is not instantiated until .fit() is called. Thus this estimator needs to be .fit() before we get the quantization number of bits. If it is not trained we raise an exception

property onnx_model

Get the ONNX model.

Returns:

_onnx_model_ (onnx.ModelProto): the ONNX model

property output_quantizers

Get the output quantizers.

Returns:

List[QuantizedArray]: the output quantizers

property quantize_input

Get the input quantization function.

Returns:

Callable : function that quantizes the input

method `fit`

method `get_params`

Get parameters for this estimator.

Args:

deep (bool): If True, will return the parameters for this estimator and contained subobjects that are estimators.
**kwargs: any additional parameters to pass to the sklearn BaseEstimator class

Returns:

params : dict, Parameter names mapped to their values.

method `get_params_for_benchmark`

Get parameters for benchmark when cloning a skorch wrapped NN.

Returns:

params (dict): parameters to create an equivalent fp32 sklearn estimator for benchmark

method `infer`

Perform a single inference step on a batch of data.

This method is specific to Skorch estimators.

Args:

x (torch.Tensor): A batch of the input data, produced by a Dataset
**fit_params (dict) : Additional parameters passed to the forward method of the module and to the self.train_split call.

Returns: A torch tensor with the inference results for each item in the input

method `on_train_end`

Call back when training is finished by the skorch wrapper.

Check if the underlying neural net has a callback for this event and, if so, call it.

Args:

net: estimator for which training has ended (equal to self)
X: data
y: targets
kwargs: other arguments

method `predict`

Predict on user provided data.

Predicts using the quantized clear or FHE classifier

Args:

X : input data, a numpy array of raw values (non quantized)
execute_in_fhe : whether to execute the inference in FHE or in the clear

Returns:

y_pred : numpy ndarray with predictions

class `NeuralNetRegressor`

Scikit-learn interface for quantized FHE compatible neural networks.

method `init`

property base_estimator_type

property base_module_to_compile

Get the module that should be compiled to FHE. In our case this is a torch nn.Module.

Returns:

module (nn.Module): the instantiated torch module

property fhe_circuit

Get the FHE circuit.

Returns:

Circuit: the FHE circuit

property history

property input_quantizers

Get the input quantizers.

Returns:

List[Quantizer]: the input quantizers

property n_bits_quant

Return the number of quantization bits.

This is stored by the torch.nn.module instance and thus cannot be retrieved until this instance is created.

Returns:

n_bits (int): the number of bits to quantize the network

Raises:

ValueError: with skorch estimators, the module_ is not instantiated until .fit() is called. Thus this estimator needs to be .fit() before we get the quantization number of bits. If it is not trained we raise an exception

property onnx_model

Get the ONNX model.

Returns:

_onnx_model_ (onnx.ModelProto): the ONNX model

property output_quantizers

Get the output quantizers.

Returns:

List[QuantizedArray]: the output quantizers

property quantize_input

Get the input quantization function.

Returns:

Callable : function that quantizes the input

method `fit`

method `get_params`

Get parameters for this estimator.

Args:

deep (bool): If True, will return the parameters for this estimator and contained subobjects that are estimators.
**kwargs: any additional parameters to pass to the sklearn BaseEstimator class

Returns:

params : dict, Parameter names mapped to their values.

method `get_params_for_benchmark`

Get parameters for benchmark when cloning a skorch wrapped NN.

Returns:

params (dict): parameters to create an equivalent fp32 sklearn estimator for benchmark

method `infer`

Perform a single inference step on a batch of data.

This method is specific to Skorch estimators.

Args:

x (torch.Tensor): A batch of the input data, produced by a Dataset
**fit_params (dict) : Additional parameters passed to the forward method of the module and to the self.train_split call.

Returns: A torch tensor with the inference results for each item in the input

method `on_train_end`

Call back when training is finished by the skorch wrapper.

Check if the underlying neural net has a callback for this event and, if so, call it.

Args:

net: estimator for which training has ended (equal to self)
X: data
y: targets
kwargs: other arguments

concrete.ml.quantization.quantized_ops

module `concrete.ml.quantization.quantized_ops`

Quantized versions of the ONNX operators for post training quantization.

class `QuantizedSigmoid`

Quantized sigmoid op.

class `QuantizedHardSigmoid`

Quantized HardSigmoid op.

class `QuantizedRelu`

Quantized Relu op.

class `QuantizedPRelu`

Quantized PRelu op.

class `QuantizedLeakyRelu`

Quantized LeakyRelu op.

class `QuantizedHardSwish`

Quantized Hardswish op.

class `QuantizedElu`

Quantized Elu op.

class `QuantizedSelu`

Quantized Selu op.

class `QuantizedCelu`

Quantized Celu op.

class `QuantizedClip`

Quantized clip op.

class `QuantizedRound`

Quantized round op.

class `QuantizedPow`

Quantized pow op.

Only works for a float constant power. This operation will be fused to a (potentially larger) TLU.

method `init`

method `can_fuse`

Determine if this op can be fused.

Power raising can be fused and computed in float when a single integer tensor generates both the operands. For example in the formula: f(x) = x ** (x + 1) where x is an integer tensor.

Returns:

bool: Can fuse

class `QuantizedGemm`

Quantized Gemm op.

method `init`

method `can_fuse`

Determine if this op can be fused.

Gemm operation can not be fused since it must be performed over integer tensors and it combines different values of the input tensors.

Returns:

bool: False, this operation can not be fused as it adds different encrypted integers

method `q_impl`

class `QuantizedMatMul`

Quantized MatMul op.

method `init`

method `can_fuse`

Determine if this op can be fused.

MatMul operation can not be fused since it must be performed over integer tensors and it combines different values of the input tensors.

Returns:

bool: False, this operation can not be fused as it adds different encrypted integers

method `q_impl`

class `QuantizedAdd`

Quantized Addition operator.

Can add either two variables (both encrypted) or a variable and a constant

method `can_fuse`

Determine if this op can be fused.

Add operation can be computed in float and fused if it operates over inputs produced by a single integer tensor. For example the expression x + x * 1.75, where x is an encrypted tensor, can be computed with a single TLU.

Returns:

bool: Whether the number of integer input tensors allows computing this op as a TLU
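
The fusing idea can be sketched as a table lookup: when every operand derives from the same integer tensor, the whole float expression is a univariate function of that integer and can be tabulated once. The example below is a simplified illustration that assumes unsigned `n_bits` inputs and skips the re-quantization of the table outputs. `build_lookup_table` and `fused_add` are hypothetical names.

```python
from typing import Callable, List


def build_lookup_table(func: Callable[[int], float], n_bits: int) -> List[float]:
    """Tabulate a univariate function over every unsigned n_bits integer."""
    return [func(value) for value in range(2**n_bits)]


def fused_add(x: int) -> float:
    # Both operands of the addition are produced from the same integer x.
    return x + x * 1.75


table = build_lookup_table(fused_add, n_bits=3)

# Evaluating the expression is now a single table lookup per element.
encrypted_like_input = [0, 2, 7]
print([table[value] for value in encrypted_like_input])
```

An addition of two different encrypted tensors cannot be tabulated this way, because the result would depend on two independent integers rather than one.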

method `q_impl`

class `QuantizedTanh`

Quantized Tanh op.

class `QuantizedSoftplus`

Quantized Softplus op.

class `QuantizedExp`

Quantized Exp op.

class `QuantizedLog`

Quantized Log op.

class `QuantizedAbs`

Quantized Abs op.

class `QuantizedIdentity`

Quantized Identity op.

method `q_impl`

class `QuantizedReshape`

Quantized Reshape op.

method `q_impl`

Reshape the input integer encrypted tensor.

Args:

q_inputs: an encrypted integer tensor at index 0 and one constant shape at index 1
attrs: additional optional reshape options

Returns:

result (QuantizedArray): reshaped encrypted integer tensor

class `QuantizedConv`

Quantized Conv op.

method `init`

Construct the quantized convolution operator and retrieve parameters.

Args:

n_bits_output: number of bits for the quantization of the outputs of this operator
int_input_names: names of integer tensors that are taken as input for this operation
constant_inputs: the weights and activations
input_quant_opts: options for the input quantizer
attrs: convolution options
dilations (Tuple[int]): dilation of the kernel, default 1 on all dimensions.
group (int): number of convolution groups, default 1
kernel_shape (Tuple[int]): shape of the kernel. Should have 2 elements for 2d conv
pads (Tuple[int]): padding in ONNX format (begin, end) on each axis
strides (Tuple[int]): stride of the convolution on each axis

method `can_fuse`

Determine if this op can be fused.

Conv operation can not be fused since it must be performed over integer tensors and it combines different elements of the input tensors.

Returns:

bool: False, this operation can not be fused as it adds different encrypted integers

method `q_impl`

Compute the quantized convolution between two quantized tensors.

Allows an optional quantized bias.

Args:

q_inputs: input tuple, contains
x (numpy.ndarray): input data. Shape is N x C x H x W for 2d
w (numpy.ndarray): weights tensor. Shape is (O x I x Kh x Kw) for 2d
b (numpy.ndarray, Optional): bias tensor, Shape is (O,)
attrs: convolution options handled in constructor

Returns:

res (QuantizedArray): result of the quantized integer convolution
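
To make the shape conventions above concrete, the sketch below computes the output shape of a 2d convolution from the input shape, the weight shape and the attributes listed in the constructor. It uses the standard convolution size formula, assumes group = 1, and assumes pads in ONNX order (all begins, then all ends). `conv2d_output_shape` is a hypothetical helper, not a Concrete ML function.

```python
from typing import Sequence, Tuple


def conv2d_output_shape(
    x_shape: Tuple[int, int, int, int],  # N x C x H x W
    w_shape: Tuple[int, int, int, int],  # O x I x Kh x Kw
    pads: Sequence[int] = (0, 0, 0, 0),  # h_begin, w_begin, h_end, w_end
    strides: Sequence[int] = (1, 1),
    dilations: Sequence[int] = (1, 1),
) -> Tuple[int, int, int, int]:
    """Return the N x O x H_out x W_out shape of a 2d convolution."""
    n, c, h, w = x_shape
    o, i, kh, kw = w_shape
    if c != i:
        raise ValueError("input channels must match the kernel (group = 1 assumed)")

    def out_size(size: int, k: int, pad_begin: int, pad_end: int, stride: int, dilation: int) -> int:
        # A dilated kernel covers dilation * (k - 1) + 1 input positions.
        effective_k = dilation * (k - 1) + 1
        return (size + pad_begin + pad_end - effective_k) // stride + 1

    out_h = out_size(h, kh, pads[0], pads[2], strides[0], dilations[0])
    out_w = out_size(w, kw, pads[1], pads[3], strides[1], dilations[1])
    return (n, o, out_h, out_w)


print(conv2d_output_shape((1, 3, 32, 32), (8, 3, 3, 3)))
print(conv2d_output_shape((1, 3, 32, 32), (8, 3, 3, 3), pads=(1, 1, 1, 1), strides=(2, 2)))
```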

class `QuantizedAvgPool`

Quantized Average Pooling op.

method `init`

method `can_fuse`

Determine if this op can be fused.

Avg Pooling operation can not be fused since it must be performed over integer tensors and it combines different elements of the input tensors.

Returns:

bool: False, this operation can not be fused as it adds different encrypted integers

method `q_impl`

class `QuantizedPad`

Quantized Padding op.

method `init`

method `can_fuse`

Determine if this op can be fused.

Pad operation can not be fused since it must be performed over integer tensors.

Returns:

bool: False, this operation can not be fused as it manipulates integer tensors

class `QuantizedWhere`

Where operator on quantized arrays.

Supports only constants for the results produced on the True/False branches.

method `init`

class `QuantizedCast`

Cast the input to the required data type.

In FHE we only support a limited number of output types. Booleans are cast to integers.

class `QuantizedGreater`

Comparison operator >.

Only supports comparison with a constant.

method `init`

class `QuantizedGreaterOrEqual`

Comparison operator >=.

Only supports comparison with a constant.

method `init`

class `QuantizedLess`

Comparison operator <.

Only supports comparison with a constant.

method `init`

class `QuantizedLessOrEqual`

Comparison operator <=.

Only supports comparison with a constant.

method `init`

class `QuantizedOr`

Or operator ||.

This operation is not implemented as a standalone quantized operation. It only works when it is fused, e.g. Act(x) = x || (x + 42).

method `init`

method `can_fuse`

Determine if this op can be fused.

Or can be fused and computed in float when a single integer tensor generates both the operands. For example in the formula: f(x) = x || (x + 1) where x is an integer tensor.

Returns:

bool: Can fuse

class `QuantizedDiv`

Div operator /.

This operation is not implemented as a standalone quantized operation. It only works when it is fused, e.g. Act(x) = 1000 / (x + 42).

method `init`

method `can_fuse`

Determine if this op can be fused.

Div can be fused and computed in float when a single integer tensor generates both the operands. For example in the formula: f(x) = x / (x + 1) where x is an integer tensor.

Returns:

bool: Can fuse

class `QuantizedMul`

Multiplication operator.

Only multiplies an encrypted tensor with a float constant for now. This operation will be fused to a (potentially larger) TLU.

method `init`

method `can_fuse`

Determine if this op can be fused.

Multiplication can be fused and computed in float when a single integer tensor generates both the operands. For example in the formula: f(x) = x * (x + 1) where x is an integer tensor.

Returns:

bool: Can fuse

class `QuantizedSub`

Subtraction operator.

This works the same as addition on both encrypted - encrypted and on encrypted - constant.

method `can_fuse`

Determine if this op can be fused.

Returns:

bool: Whether the number of integer input tensors allows computing this op as a TLU

method `q_impl`

class `QuantizedBatchNormalization`

Quantized Batch normalization with encrypted input and in-the-clear normalization params.

class `QuantizedFlatten`

Quantized flatten for encrypted inputs.

method `can_fuse`

Determine if this op can be fused.

Flatten operation can not be fused since it must be performed over integer tensors.

Returns:

bool: False, this operation can not be fused as it manipulates integer tensors.

method `q_impl`

Flatten the input integer encrypted tensor.

Args:

q_inputs: an encrypted integer tensor at index 0
attrs: contains axis attribute

Returns:

result (QuantizedArray): reshaped encrypted integer tensor
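As a rough illustration of what flattening along an axis means for the integer values, the sketch below uses NumPy and assumes the ONNX Flatten convention for the `axis` attribute (dimensions before `axis` are merged into rows, the remaining ones into columns). The function name `flatten` is hypothetical, and a real `QuantizedArray` would also carry its quantization parameters, which are unchanged by a reshape.

```python
import numpy as np


def flatten(values, axis=1):
    """Reshape to 2D following the ONNX Flatten convention (assumed here)."""
    # np.prod of an empty shape is 1, so axis=0 yields a single row
    rows = int(np.prod(values.shape[:axis]))
    return values.reshape(rows, -1)


# Stand-in for the integer values of a quantized tensor
q_values = np.arange(24, dtype=np.int64).reshape(2, 3, 4)

flat = flatten(q_values, axis=1)
print(flat.shape)
print(flatten(q_values, axis=0).shape)
```

The values stay integers throughout, which is consistent with the note above that this operation is not a candidate for fusing into a float table lookup.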

class `QuantizedReduceSum`

ReduceSum with encrypted input.

This operator is currently an experimental feature.

method `init`

Construct the quantized ReduceSum operator and retrieve parameters.

Args:

n_bits_output (int): Number of bits for the operator's quantization of outputs.
int_input_names (Optional[Set[str]]): Names of input integer tensors. Default to None.
constant_inputs (Optional[Dict]): Input constant tensor.
axes (Optional[numpy.ndarray]): Array of integers along which to reduce. The default is to reduce over all the dimensions of the input tensor if 'noop_with_empty_axes' is false, else act as an Identity op when 'noop_with_empty_axes' is true. Accepted range is [-r, r-1] where r = rank(data). Default to None.
input_quant_opts (Optional[QuantizationOptions]): Options for the input quantizer. Default to None.
attrs (dict): ReduceSum options.
keepdims (int): Keep the reduced dimension or not, 1 means keeping the input dimension, 0 will reduce it along the given axis. Default to 1.
noop_with_empty_axes (int): Defines the behavior when 'axes' is empty or set to None. With the default value 0, all axes are reduced. When 'axes' is empty and this attribute is set to 1, the input tensor is not reduced and the output tensor is equivalent to the input tensor. Default to 0.
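The interaction between `axes`, `keepdims` and `noop_with_empty_axes` described above can be sketched on clear NumPy arrays. This is only an illustration of the attribute semantics, not the quantized operator itself; the function name `reduce_sum` is hypothetical.

```python
import numpy as np


def reduce_sum(data, axes=None, keepdims=1, noop_with_empty_axes=0):
    """Clear-data sketch of the ReduceSum attribute semantics."""
    if axes is None or len(axes) == 0:
        if noop_with_empty_axes:
            # Empty axes and noop flag set: act as an Identity op
            return data
        axes_tuple = None  # reduce over all dimensions
    else:
        axes_tuple = tuple(int(a) for a in axes)
    return np.sum(data, axis=axes_tuple, keepdims=bool(keepdims))


data = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

kept = reduce_sum(data, axes=np.array([1]))                # shape (2, 1)
dropped = reduce_sum(data, axes=np.array([1]), keepdims=0)  # shape (2,)
total = reduce_sum(data)                                    # shape (1, 1)
same = reduce_sum(data, noop_with_empty_axes=1)             # unchanged
```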

method `calibrate`

Create corresponding QuantizedArray for the output of the activation function.

Args:

*inputs (numpy.ndarray): Calibration sample inputs.

Returns:

numpy.ndarray: the output values for the provided calibration samples.

method `q_impl`

Sum the encrypted tensor's values over axis 1.

Args:

q_inputs (QuantizedArray): An encrypted integer tensor at index 0.
attrs (Dict): Contains axis attribute.

Returns:

(QuantizedArray): The sum of all values along axis 1 as an encrypted integer tensor.

method `tree_sum`

Large sum without overflow (only MSB remains).

Args:

input_qarray: Encrypted integer tensor.
is_calibration: Whether we are calibrating the tree sum. If so, it will create all the quantizers for the downscaling.

Returns:

(numpy.ndarray): The MSB (based on the precision self.n_bits) of the integers sum.
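The idea of summing many values without exceeding a fixed bit width can be sketched as a pairwise tree in which each partial sum is downscaled before the next level. The code below is a simplified illustration, not the actual `tree_sum`: it assumes unsigned inputs and replaces the calibrated quantizers mentioned above with a plain halving after each addition. The name `tree_sum_msb` is hypothetical.

```python
def tree_sum_msb(values, n_bits):
    """Pairwise sum with halving at each level (simplified assumption)."""
    max_value = 2 ** n_bits - 1
    if any(v < 0 or v > max_value for v in values):
        raise ValueError("inputs must fit in n_bits unsigned integers")

    level = list(values)
    depth = 0
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(0)  # pad so that every element has a partner
        # The sum of two n_bits values needs n_bits + 1 bits; halving
        # brings it back to n_bits, dropping the least significant bit
        level = [(level[i] + level[i + 1]) // 2 for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth


result, depth = tree_sum_msb([7, 5, 6, 6], n_bits=3)
print(result, depth)
```

In this sketch the returned value approximates the true sum divided by 2 to the power `depth`, so only the most significant bits of the sum are kept.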

class `QuantizedErf`

Quantized erf op.

class `QuantizedNot`

Quantized Not op.

class `QuantizedBrevitasQuant`

Brevitas uniform quantization with encrypted input.

method `init`

Construct the Brevitas quantization operator.

Args:

n_bits_output (int): Number of bits for the operator's quantization of outputs. Not used, will be overridden by the bit_width in ONNX
int_input_names (Optional[Set[str]]): Names of input integer tensors. Default to None.
constant_inputs (Optional[Dict]): Input constant tensor.
scale (float): Quantizer scale
zero_point (float): Quantizer zero-point
bit_width (int): Number of bits of the integer representation
input_quant_opts (Optional[QuantizationOptions]): Options for the input quantizer. Default to None.
attrs (dict):
rounding_mode (str): Rounding mode (default and only accepted option is "ROUND").
signed (int): Whether this op quantizes to signed integers (default 1).
narrow (int): Whether this op quantizes to a narrow range of integers e.g. [-2**n_bits-1 .. 2**n_bits-1] (default 0).
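To show how `scale`, `zero_point`, `bit_width`, `signed` and `narrow` fit together, here is a minimal sketch of uniform quantization in plain Python. It is not the Concrete ML or Brevitas code. The integer bounds in `int_bounds` follow the usual two's-complement convention, with `narrow` dropping one value from the range, and are an assumption rather than something stated in this reference. All function names are hypothetical.

```python
def int_bounds(bit_width, signed=1, narrow=0):
    """Integer range of the quantizer (assumed convention)."""
    if signed:
        low = -(2 ** (bit_width - 1)) + (1 if narrow else 0)
        high = 2 ** (bit_width - 1) - 1
    else:
        low = 0
        high = 2 ** bit_width - 1 - (1 if narrow else 0)
    return low, high


def quantize(x, scale, zero_point, bit_width, signed=1, narrow=0):
    """Scale, shift, round, then clip to the integer range."""
    low, high = int_bounds(bit_width, signed, narrow)
    q = round(x / scale + zero_point)
    return min(max(q, low), high)


def dequantize(q, scale, zero_point):
    """Map an integer back to its float approximation."""
    return (q - zero_point) * scale


q = quantize(1.2, scale=0.5, zero_point=0, bit_width=4)
print(q, dequantize(q, scale=0.5, zero_point=0))
```

Values outside the representable range saturate at the bounds, which is where the `signed` and `narrow` options change the result.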

method `q_impl`

Quantize values.

Args:

q_inputs: an encrypted integer tensor at index 0 and one constant shape at index 1
attrs: additional optional reshape options

Returns:

result (QuantizedArray): reshaped encrypted integer tensor

class `QuantizedTranspose`

Transpose operator for quantized inputs.

This operator performs quantization, transposes the encrypted data, then dequantizes again.

method `q_impl`

Transpose the input integer encrypted tensor.

Args:

q_inputs: an encrypted integer tensor at index 0 and one constant shape at index 1
attrs: additional optional reshape options

Returns:

result (QuantizedArray): transposed encrypted integer tensor
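A transpose only moves values around, so it can be applied directly to the integer representation. The NumPy sketch below illustrates this on clear data: transposing the integers and then dequantizing gives the same result as dequantizing first. The `scale` and `zero_point` values, the `perm` permutation (named after the ONNX Transpose attribute) and the `dequantize` helper are assumptions made for the example.

```python
import numpy as np

# Hypothetical quantization parameters shared by the whole tensor
scale, zero_point = 0.25, 2

# Stand-in for the integer values of a quantized tensor
q_values = np.array([[0, 1, 2], [3, 4, 5]], dtype=np.int64)


def dequantize(q):
    """Map integers back to floats with the shared parameters."""
    return scale * (q - zero_point)


perm = (1, 0)  # swap the two axes
q_transposed = np.transpose(q_values, axes=perm)

# Both orders of operations agree because the parameters are per-tensor
float_from_transposed = dequantize(q_transposed)
transposed_float = np.transpose(dequantize(q_values), axes=perm)
```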