Concrete ML can be run on Linux-based OSes as well as macOS, on x86 CPUs. These hardware requirements are dictated by Concrete-Lib.
Do note that, since WSL on Windows is a Linux-based OS, Concrete ML will work as long as the package is not installed in the /mnt/c/ directory, which corresponds to the host OS filesystem.
To install Concrete-ML from PyPi, run the following:
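For example (standard installation from PyPI):

```bash
pip install concrete-ml
```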
Note that concrete-ml installs concrete-numpy with all extras, including pygraphviz to draw graphs. pygraphviz requires the graphviz packages to be installed on your OS; these are binary packages that won't automatically be installed by pip. See https://pygraphviz.github.io/documentation/stable/install.html for instructions on how to install graphviz for pygraphviz.
You can also get the concrete-ml Docker image by either pulling the latest Docker image or a specific version:
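For example (the image name and tags below are assumptions; check Docker Hub for the actual published image):

```bash
docker pull zamafhe/concrete-ml:latest
# or a specific version, e.g.
docker pull zamafhe/concrete-ml:v0.2.0
```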
The image can be used with Docker volumes, see the Docker documentation here.
You can then use this image with the following command:
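An illustrative command is shown below; adjust the volume path, the port and the image name/tag (assumed here) to your setup:

```bash
docker run --rm -it -p 8888:8888 -v "$(pwd)":/data zamafhe/concrete-ml:latest
```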
This will launch a Concrete-ML enabled Jupyter server in Docker, that you can access from your browser.
Alternatively, you can just open a shell in Docker with or without volumes:
This version of Concrete-ML is a first version of the product, meaning that it is not completely finished, contains several bugs (both known and unknown at this time), and will improve over time with feedback from early users.
Here are some ways to debug your problems. If nothing seems conclusive, you can still report the issue, as explained in a later section of this page.
First, we encourage the user to have a look at:
the error message received
the documentation of the product
the known limits of the product
Once you have determined that the bug was not your own, it is time to go further.
A bug may happen if ever the inputset, which is internally used by the compilation core to set bit widths of some intermediate data, is not sufficiently representative. With all the inputs in the inputset, it appears that intermediate data can be represented as an n
-bit integer. But, for a particular computation, this same intermediate data needs additional bits to be represented. The FHE execution for this computation will result in an incorrect output (as typically occurs in integer overflows in classical programs).
So, in general, when a bug appears, it may be a good idea to enlarge the inputset and try to have random-looking inputs in it, following the distribution of inputs used with the function.
Once you're sure it is a bug, it would be nice to:
make it highly reproducible by reducing as much of the randomness as possible - If you can find an input which fails, there is no reason to let the input remain random
reduce it to the smallest possible bug - It is easier to investigate bugs which are small, so when you have an issue, please try to reduce it to a smaller issue, notably with fewer lines of code, smaller parameters, less complex functions to compile, and faster scripts, etc.
You can directly ask the developers and community about your issue on our Discourse server (link on the right of the top menu).
Hopefully, it is just a misunderstanding or a small mistake on your side that we can help you fix easily. Additionally, your feedback helps us make the documentation even clearer (by adding to the FAQ, for example).
To simplify our work and let us reproduce your bug easily, we need all the information we can get. So, in addition to your Python script, the following information is useful:
the reproducibility rate you see on your side
any insight you might have on the bug
any workaround you have been able to find
Remember, Concrete-ML is a project where we are open to contributions. You can find more information at Contributing.
In case you have a reproducible bug that you have reduced to a small piece of code, we have our issue tracker in place (link on the right of the top menu). Remember that a well-described short issue is an issue that is more likely to be studied and fixed. The more issues we receive, the better the product will be.
With the current version of the framework, we cannot represent encrypted integers with more than 7 bits. While we are working on supporting larger integers, currently, whenever a floating point model needs to be processed in FHE, quantization is necessary.
In this situation, you will get a compilation error. Here is an example:
When you compile this example, it results in:
Notice that the maximum bit width, determined by the compiler, depends on the inputset passed to the compile_on_inputset
function. In this case, the error is caused by the input value in the inputset that produces a result whose representation requires 9 bits. This input is the value 8, since 8 * 42 = 336, which is a 9-bit value.
You can determine the number of bits necessary to represent a positive integer value $x$ with the formula: $n_{\text{bits}}(x) = \lfloor \log_2(x) \rfloor + 1$.
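For instance, the following pure-Python check (an illustrative snippet, not part of the Concrete-ML API) confirms that the inputset above leads to a 9-bit intermediate value:

```python
# Inputset values 0..8; the largest intermediate result is 8 * 42 = 336
max_result = max(x * 42 for x in range(9))
print(max_result.bit_length())  # 9 bits, which exceeds the 7-bit limit
```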
For a more practical example, the MNIST classification task consists of taking an image, a 28x28 array containing uint8 values representing a handwritten digit, and predicting whether it belongs to one of 10 classes: the digits from 0 to 9. The output is a one-hot vector which indicates the class to which a particular sample belongs.
The input contains 28x28x8 bits, so 6272 bits of information. In practice, you could still obtain good results on MNIST by thresholding the pixels to {0, 1} and training a model for this new binarized MNIST task. This means that in a real use case where you actually need to perform digit recognition, you could binarize your input on the fly, replacing each pixel with either 0 or 1. In doing so, you use 1 bit per pixel and now only have 784 bits of input data. It also means that if you are doing some accumulation (adding pixel values together), you are going to need accumulators that are smaller (adding 0s and 1s requires less space than adding values ranging from 0 to 255). An example of MNIST classification with a quantized neural network is given in the CNN advanced example.
This shows how adapting your data or model parameters can allow you to use models that may require smaller data types (i.e. use less precision) to perform their computations.
Binarization is an extreme case of quantization which is introduced here. You can also find further resources on the linked page.
While applying quantization directly to the input features is mandatory to reduce the effective bit width of computations, a different and complementary approach is dimensionality reduction. This can be accomplished through Principal Component Analysis (PCA) as shown in the Poisson Regression example
Quantization and dimensionality reduction reduce the bit width required to run the model and increase execution speed. These two tools are necessary to make models compatible with FHE constraints.
However, quantization and, especially, binarization, induce a loss in the accuracy of the model since its representation power is diminished. Carefully choosing a quantization approach for model parameters can alleviate accuracy loss, all the while allowing compilation to FHE.
The quantization of model parameters and model inputs is illustrated in the advanced examples for Linear Regression and for Logistic Regression. Note that different quantization parameters are used for inputs and for model weights.
Recent quantization literature usually aims to make use of dedicated machine learning accelerators in a mixed setting where a CPU or General Purpose GPU (GPGPU) is also available. Thus, in literature, some floating point computation is assumed to be acceptable. This approach allows us to reach performance similar to those achieved by floating point models. In this popular mixed float-int setting, the input is usually left in floating point. This is also true for the first and last layers, which have more impact on the resulting model accuracy than hidden layers.
However, in Concrete-ML, to respect FHE constraints, the inputs, the weights and the accumulator must all be represented with integers of a maximum of 7 bits.
Thus, in Concrete-ML, we also quantize the input data and network output activations in the same way as the rest of the network: everything is quantized to a specific number of bits. It turns out that the number of bits used for the input or the output of any activation function is crucial to comply with the constraint on accumulator width.
The core operations in neural networks are matrix multiplications (matmul) and convolutions, which both compute linear combinations of inputs (encrypted) and weights (in clear). The linear combination operation must be done such that the maximum value of its result requires at most 7 bits of precision.
Currently, Concrete-ML computes the number of bits needed for the computation depending on the inputset calibration data and does not allow the overflow (see Integer overflow) to happen, raising an exception as shown above.
The following table summarizes the various examples in this section, along with their accuracies.
| Model | Dataset | Metric | Clear | Quantized | FHE |
| --- | --- | --- | --- | --- | --- |
A * means that FHE accuracy was calculated on a subset of the validation set.
In this table, ** means that the accuracy is actually random-like, because the quantization we need to set to fulfill the bit-width constraints is too strong.
In neural networks, a neuron computes a linear combination of inputs and learned weights, then applies an activation function.
The neuron computes:
When building a full neural network, each layer will contain multiple neurons, which are connected to the neuron outputs of a previous layer or to the inputs.
For every neuron shown in each layer of the figure above, the linear combinations of inputs and learned weights are computed. Depending on the values of the inputs and weights, the sum - which, for Concrete-ML neural networks, is computed with integers - can take a range of different values.
To respect the bit width constraint of the mechanism, implemented with programmable bootstrapping, the values of the accumulator must remain small to be representable with only 7 bits. In other words, the values must be between 0 and 127.
Pruning a neural network entails fixing some of the weights to be zero during training. This is advantageous to meet FHE constraints, as, irrespective of the distribution of the input values, multiplying these inputs by 0 does not increase the accumulator value.
Fixing some of the weights to 0 makes the network graph look more similar to the following:
Pruning weights can reduce the prediction performance of the neural network, but studies show that a high level of pruning (above 50%, see Han, Song & Pool, Jeff & Tran, John & Dally, William. (2015). Learning both Weights and Connections for Efficient Neural Networks) can be applied. In Concrete-ML, we implement fully-connected neural networks with pruning, as described below.
Concrete-ML is compatible with sklearn APIs such as Pipeline() or GridSearch(), which are popular model selection methods.
Here is a simple example of such a process:
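A minimal sketch of such a pipeline is shown below. The concrete.ml.sklearn import path and the hyper-parameters of the Concrete-ML estimator are assumptions; check the API reference for the exact names.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumed import path for the Concrete-ML scikit-learn-style estimator
from concrete.ml.sklearn import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# A standard sklearn pipeline wrapping the quantized Concrete-ML model
pipeline = Pipeline([("scaler", StandardScaler()), ("model", LogisticRegression())])

# Grid-search over the regularization strength (parameter name mirrors sklearn)
param_grid = {"model__C": [0.1, 1.0, 10.0]}
grid = GridSearchCV(pipeline, param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```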
Concrete-ML is an open-source, private machine learning inference framework based on fully homomorphic encryption (FHE). It enables data scientists without any prior knowledge of cryptography to automatically turn machine learning models into their FHE equivalent, using familiar APIs from scikit-learn and PyTorch.
Fully Homomorphic Encryption (FHE) is an encryption technique that allows computing directly on encrypted data, without needing to decrypt it. With FHE, you can build private-by-design applications without compromising on features. You can learn more about FHE in the additional resources listed below, or by joining the community.
This example shows the typical flow of a Concrete-ML model:
The model is trained on unencrypted (plaintext) data
The resulting model is quantized to small integers using either post-training quantization or quantization-aware training
The quantized model is compiled to a FHE equivalent (under the hood, the model is first converted to a Concrete-Numpy program, then compiled)
Inference can then be done on encrypted data
To make a model work with FHE, the only constraint is to make it run within the supported precision limitations of Concrete-ML (currently 7-bit integers).
Currently, Concrete only supports 7-bit encrypted integer arithmetic. This requires models to be quantized heavily, which sometimes leads to a loss of accuracy compared to the plaintext model. Furthermore, the Concrete-Compiler is still a work in progress, meaning it won't always find optimal performance parameters, leading to slower than expected execution times.
Additionally, Concrete-ML currently only supports FHE inference. Training on the other hand has to be done on unencrypted data, producing a model which is then converted to an FHE equivalent that can do encrypted inference.
Finally, there is currently no support for pre- and post-processing in FHE. Data must arrive at the FHE model already pre-processed, and any post-processing has to be done client-side.
All of these issues are currently being addressed and significant improvements are expected to be released in the coming months.
The interested reader has even more resources to review, in addition to this documentation:
Our community support channels, the links for which can be found at the top right of the doc pages.
The varied blog posts we publish, currently located on the Zama blog. Notably, one post describes the use of a Poisson regressor to tackle a real-life use case in a privacy-preserving setting.
Additionally, we plan to publish academic and white papers explaining interesting aspects of our work, covering both the engineering and scientific sides of our offering.
While floating point values have 32 bits of precision, machine learning datasets have features that use only a limited range of values. For example, if a feature takes a value that is limited to the range [1, 2), in floating point this value is represented as $2^0 \cdot m$, where $m$ is a number between 1 and 2. Generic floating point representation can support exponents between -126 and 127, allocating 8 bits to store the exponent. In our case, a single exponent value of 0 is needed. Knowing that, for our range, the exponent can only take a single value out of the possible ones, we can save the 8 bits allocated to the exponent, reducing the necessary bit width. We refer the reader to the IEEE 754 standard for more information on floating point representation and to this simulator that helps to understand the topic through practice.
For example, if you quantize your input and weights with $n_{\text{bits}}^{\text{inputs}}$ and $n_{\text{bits}}^{\text{weights}}$ bits of precision, one can compute the maximum dimensionality of the input and weights before the matmul/convolution result could exceed the 7 bits as such:

$$\Omega = \left\lfloor \frac{2^{n_{\max}} - 1}{(2^{n_{\text{bits}}^{\text{inputs}}} - 1)(2^{n_{\text{bits}}^{\text{weights}}} - 1)} \right\rfloor$$

where $n_{\max} = 7$ is the maximum precision allowed. For example, if we set $n_{\text{bits}}^{\text{inputs}} = 2$ and $n_{\text{bits}}^{\text{weights}} = 2$ with $n_{\max} = 7$, then $\Omega = 14$ different inputs/weights are allowed in the linear combination.
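As a quick sanity check, the bound above can be evaluated directly in Python (an illustrative snippet using the example values, not part of the Concrete-ML API):

```python
# 7-bit accumulator limit, 2-bit inputs and 2-bit weights (example values)
n_max, n_bits_inputs, n_bits_weights = 7, 2, 2
omega = (2**n_max - 1) // ((2**n_bits_inputs - 1) * (2**n_bits_weights - 1))
print(omega)  # 14
```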
Exceeding $\Omega$ dimensions in the input and weights quickly increases the risk of overflow. It may happen that, for some distributions of weights and values, the computation does not overflow, but the risk increases rapidly with the number of dimensions.
| Model | Dataset | Metric | Clear | Quantized | FHE |
| --- | --- | --- | --- | --- | --- |
Here is a simple example of encrypted inference using logistic regression. More examples can be found in the documentation.
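A minimal sketch of such an example is shown below. The concrete.ml.sklearn import path, the n_bits parameter, the compile() method and the execute_in_fhe argument are assumptions about the Concrete-ML API and may differ slightly between versions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Assumed import path for the Concrete-ML estimator
from concrete.ml.sklearn import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a quantized logistic regression (n_bits is an assumed parameter name)
model = LogisticRegression(n_bits=2)
model.fit(X_train, y_train)

# Compile to an FHE circuit, using the training data as the inputset
model.compile(X_train)

# Run encrypted inference
y_pred_fhe = model.predict(X_test, execute_in_fhe=True)
```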
Concrete-ML is built on top of Zama’s Concrete framework. It uses Concrete-Numpy, which itself uses the Concrete-Compiler and the Concrete-Library. To use these libraries directly, refer to the Concrete-Numpy and Concrete-Framework documentations.
The Virtual Lib in Concrete-ML is a prototype that provides drop-in replacements for Concrete-Numpy, Compiler and Circuit that allow users to simulate what would happen when converting a model to FHE without the current bit width constraint, or to more quickly simulate the behavior with 7 bits or less as there are no FHE computations.
In other words, you can use the compile functions from the Concrete-ML package by passing use_virtual_lib = True
and using a CompilationConfiguration
with enable_unsafe_features = True
. You will then get a simulated circuit that allows you to use more than the current 7 bits of precision allowed by the Concrete stack. It is also a faster way to measure the potential FHE accuracy with 7 bits or less. It is something we used for the red/blue contours in the Classifier Comparison notebook, as computing in FHE for the whole grid and all the classifiers would be very long.
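A sketch of this usage is shown below, reusing the model and X_train names from the logistic regression sketch above; the import location of CompilationConfiguration and the exact keyword names of the compile call are assumptions, so check the API reference for your version.

```python
from concrete.numpy import CompilationConfiguration

# enable_unsafe_features is required to use the Virtual Lib
cfg = CompilationConfiguration(enable_unsafe_features=True)

# Keyword names below are assumptions about the compile API
model.compile(
    X_train,
    compilation_configuration=cfg,
    use_virtual_lib=True,
)
```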
The Virtual Lib can be useful when developing and iterating on an ML model implementation. For example, you can check that your model is compatible in terms of operands (all integers) with the Virtual Lib compilation. Then, you can check how many bits your ML model would require, which can give you hints as to how it should be modified if you want to compile it to an actual FHE Circuit (not a simulated one) that only supports 7 bits of integer precision.
The Virtual Lib, being pure Python and not requiring crypto key generation, can be much faster than the actual compilation and FHE execution, thus allowing for faster iterations, debugging and FHE simulation, regardless of the bit width used.
Before settling on a final release, we go through a release candidate (RC) cycle. The idea is that once the codebase and documentation look ready for a release, you create an RC release by opening an issue with the release template here, starting with version `vX.Y.Zrc1` and then with versions `vX.Y.Zrc2`, `vX.Y.Zrc3`, and so on.
Once the last RC is deemed ready, open an issue with the release template using the last RC version, from which you remove the `rc?` part (i.e. `v12.67.19` if your last RC version was `v12.67.19-rc4`) on GitHub.
There are two ways to contribute to Concrete-ML or to Concrete tools in general:
You can open issues to report bugs and typos and to suggest ideas.
You can ask to become an official contributor by emailing hello@zama.ai. Only approved contributors can send pull requests (PR), so please make sure to get in touch before you do!
Let's go over some other important things that you need to be careful about.
We are using a consistent branch naming scheme, and you are expected to follow it as well. Here is the format, along with some examples:
e.g.
Each commit to Concrete-ML should conform to the standards decided by the team.
You can let the development tools fix some issues automatically with the following command:
Conformance can be checked using the following command:
Of course, tests must pass as well.
The last requirement is to make sure you get 100 percent code coverage. The make pytest
command checks that by default and will fail with a coverage report at the end should some lines of your code not be executed during testing.
If your coverage is below 100 percent, you should write more tests and then create the pull request (PR). If you ignore this warning and create the PR, GitHub actions will fail and your PR will not be merged.
There may be cases where covering your code is not possible (an exception that cannot be triggered in normal execution circumstances). In those cases, you may be allowed to disable coverage for some specific lines. This should be the exception rather than the rule, and reviewers will ask why some lines are not covered. If it appears they can be covered, then the PR won't be accepted in that state.
We are using a consistent commit naming scheme, and you are expected to follow it as well (the CI will make sure you do). The accepted format can be printed to your terminal by running:
e.g.
To learn more about conventional commits, check this page. Just a reminder that commit messages are checked in the conformance step and are rejected if they don't follow the rules.
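For illustration, commit messages following the conventional commits format look like the following (hypothetical examples):

```
feat: add a quantized Softplus operator
fix: handle zero point overflow in QuantizedGemm
docs: clarify the 7-bit accumulator constraint
chore: bump the concrete-numpy dependency
```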
We remind you that only official contributors can send pull requests. To become an official contributor, please email hello@zama.ai.
You should rebase on top of the main
branch before you create your pull request. We don't allow merge commits, so rebasing on main
before pushing gives you the best chance of avoiding having to rewrite parts of your PR later if some conflicts arise with other PRs being merged. After you commit your changes to your new branch, you can use the following commands to rebase:
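A typical sequence uses standard git commands (adapt the remote name to your setup):

```bash
git fetch origin
git rebase origin/main
# resolve any conflicts, then push your branch (force-with-lease is needed after a rebase)
git push --force-with-lease
```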
You can learn more about rebasing here.
One can simply create docs with Sphinx and open them, by doing:
The documentation contains both files written by hand by developers (the .md files) and files automatically created by parsing the source files.
Or simply open docs/_build/html/index.html
.
Note that there is a make target that conveniently builds and opens the docs at the end.
Before you start this section, go ahead and install Docker. You can follow this official guide if you require assistance.
X forwarding means redirecting the display to your host machine screen so that the Docker container can display things on your screen (otherwise you would only get CLI/terminal interface to your container).
To be able to use X forwarding on macOS:
Install XQuartz
Open XQuartz.app and make sure that authorize network connections
is set in the application parameters (currently in the Security settings)
Open a new terminal within XQuartz.app and type:
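A commonly used command to allow local connections to the X server is shown below (adjust it to your own security requirements):

```bash
xhost + 127.0.0.1
```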
Now, the X server should be all set in Docker (in the regular terminal).
Install Xming and use Xlaunch:
Multiple Windows, Display number: 0
Start no client
IMPORTANT: Check No Access Control
You can save this configuration to relaunch easily, then click finish.
Once you have access to this repository and the dev environment is installed on your host OS (via make setup_env
once you followed the steps here), you should be able to launch the commands to build the dev Docker image with make docker_build
.
Once you do that, you can get inside the Docker environment using the following command:
After you finish your work, you can leave Docker by using the exit
command or by pressing CTRL + D
.
You will need to first install Python. On Linux, this can be done automatically, along with the rest of the dependencies, by running the script indicated below with the --linux-install-python flag. If you want to install some of the dependencies manually, we detail the installation of Poetry and Make below.
On Linux and macOS you will have to run the script in ./script/make_utils/setup_os_deps.sh
. Specify the --linux-install-python
flag if you want to install python3.8 as well on apt-enabled Linux distributions. The script should install everything you need for Docker and bare OS development (you can first check the content of the file to check what it will do).
It is strongly recommended to use the development Docker (see the docker guide). However, our helper script should bring all the tools you need to develop directly on Linux and macOS.
For Windows see the Warning admonition below.
The project targets Python 3.8 through 3.9 inclusive.
For Windows users, the setup_os_deps.sh script does not install dependencies, because of the many different installation methods available and the lack of a single package manager.
The first step is to install Python (as some of our dev tools depend on it), then Poetry. In addition to installing Python, you are still going to need the following software available on the PATH on Windows, as some of our basic dev tools depend on them:
Development on Windows only works with the Docker environment. Follow this link to setup the Docker environment.
Concrete ML is a Python library, so Python should be installed to develop Concrete ML. `v3.8` and `v3.9` are the only supported versions.
As stated at the start of this document, you can install Python 3.8 for Linux automatically if it's available in your distribution's apt repository using the ./script/make_utils/setup_os_deps.sh script.
You can follow this guide to install it (alternatively, you can google how to install Python 3.8 or 3.9).
Poetry is our package manager. It drastically simplifies dependency and environment management.
As stated at the start of this document, you can install Poetry for macOS and Linux automatically using the ./script/make_utils/setup_os_deps.sh script.
You can follow this official guide to install it.
As there is no concrete-compiler package for Windows, only the dev dependencies can be installed. This requires Poetry >= 1.2. At the time of writing (March 2022), only an alpha version of Poetry 1.2 is available; use the official installer to install preview versions.
The dev tools use make
to launch the various commands.
As stated at the start of this document, you can install make
for macOS and Linux automatically if it's available in your distribution's apt repository using the ./script/make_utils/setup_os_deps.sh script.
On Linux, you can install make
from your distribution's preferred package manager.
On macOS, you can install a more recent version of make
via brew:
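For example, with Homebrew installed:

```bash
brew install make
```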
It is possible to install gmake
as make
. Check this StackOverflow post for more info.
On Windows, check this GitHub gist.
In the following sections, be sure to use the proper make
tool for your system: make
, gmake
, or other.
Now, it's time to get the source code of Concrete ML.
Clone the code repository using the link for your favourite communication protocol (ssh or https).
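For example (assuming the public repository location):

```bash
# over https
git clone https://github.com/zama-ai/concrete-ml.git
# or over ssh
git clone git@github.com:zama-ai/concrete-ml.git
```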
We are going to make use of virtual environments. This helps to keep the project isolated from other Python
projects in the system. The following commands will create a new virtual environment under the project directory and install dependencies to it.
The following command will not work on Windows if you don't have Poetry >= 1.2.
Finally, all we need to do is to activate the newly created environment using the following command:
Docker automatically creates and sources a venv in ~/dev_venv/. The venv persists thanks to volumes. We also create a volume for ~/.cache to speed up later reinstallations. You can check which Docker volumes exist with:
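The standard Docker command for this is:

```bash
docker volume ls
```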
You can still run all make
commands inside Docker (to update the venv, for example). Be mindful of the current venv being used (the name in parentheses at the beginning of your command prompt).
After your work is done, you can simply run the following command to leave the environment:
From time to time, new dependencies will be added to the project or the old ones will be removed. The command below will make sure the project has the proper environment. So run it regularly!
If you are having issues, consider using the dev Docker exclusively (unless you are working on OS specific bug fixes or features).
Here are the steps you can take on your OS to try and fix issues:
At this point, you should consider using Docker as nobody will have the exact same setup as you. If, however, you need to develop on your OS directly, you can ask us for help but may not get a solution right away.
Here are the steps you can take in your Docker to try and fix issues:
If the problem persists at this point, you should ask for help. We're here and ready to assist!
Concrete-ML allows you to compile a torch model to its FHE counterpart.
This process executes most of the concepts described in the documentation on how to use quantization and triggers the compilation to be able to run the model over homomorphically encrypted data.
Note that the architecture of the neural network passed to be compiled must respect some hard constraints given by FHE. Please read our detailed documentation on these limitations.
Once your model is trained, you can simply call the compile_torch_model
function to execute the compilation.
You can then call quantized_numpy_module.forward_fhe.encrypt_run_decrypt()
to have the FHE inference.
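A minimal sketch of this flow is shown below; the concrete.ml.torch.compile import path and the n_bits argument are assumptions about the Concrete-ML API, so check the API reference for the exact signature.

```python
import numpy
import torch

# Assumed import path for the compilation helper
from concrete.ml.torch.compile import compile_torch_model

torch_model = torch.nn.Sequential(
    torch.nn.Linear(2, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 2),
)

# A representative inputset, used to calibrate quantization and set bit widths
inputset = numpy.random.uniform(-1, 1, size=(100, 2))

quantized_numpy_module = compile_torch_model(torch_model, inputset, n_bits=3)

# Quantize a single sample (the FHE engine does not support batching),
# then run the encrypted inference on the quantized integers
q_x = quantized_numpy_module.quantize_input(inputset[:1])
fhe_prediction = quantized_numpy_module.forward_fhe.encrypt_run_decrypt(q_x)
```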
Now your model is ready to infer in FHE settings.
fhe_prediction
contains the clear quantized output. The user can now dequantize the output to get the actual floating point prediction as follows:
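Continuing the sketch above, and using the dequantize_output method described in the quantization documentation:

```python
# Convert the clear quantized output back to floating point predictions
predictions = quantized_numpy_module.dequantize_output(fhe_prediction)
```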
If you want to see more compilation examples, you can check out the Fully Connected Neural Network example.
Our torch conversion pipeline uses ONNX and an intermediate representation. We refer the user to the Concrete ML ONNX operator reference for more information.
The following operators in torch will be exported as Concrete-ML compatible ONNX operators:
Operators that take an encrypted input and unencrypted constants:
Note that the equivalent versions from torch.functional
are also supported.
| Model | Dataset | Metric | Clear | Quantized | FHE |
| --- | --- | --- | --- | --- | --- |
| Linear Regression | Synthetic 1D | R2 | 0.876 | 0.863 | 0.863 |
| Logistic Regression | Synthetic 2D with 2 classes | accuracy | 0.90 | 0.875 | 0.875 |
| Poisson Regression | | mean Poisson deviance | 1.38 | 1.68 | 1.68 |
| Decision Tree | | precision score | 0.95 | 0.97 | 0.97* |
| XGBoost | | MCC | 0.48 | 0.52 | 0.52* |
| Fully Connected NN | | accuracy | 0.947 | 0.895 | 0.895 |
| Convolutional NN | | accuracy | 0.90 | ** | ** |
Artificial Neuron (from: ) |
Fully Connected Neural Network |
Pruned Fully Connected Neural Network |
Our primary concern in this release was the ease of adoption of our framework. That is why we built APIs, which should feel natural to data scientists. While performance is also an important concern for deployment of FHE machine learning models, improvements on this front will come in future releases.
To this end, we have decided to mimic the APIs of scikit-learn and XGBoost for machine learning models (linear models and tree-based models) and of torch for deep learning models. We refer readers to the scikit-learn and torch usage guides, which show how similar our APIs are to their non-FHE counterparts.
From Wikipedia:
Quantization is the process of constraining an input from a continuous or otherwise large set of values (such as the real numbers) to a discrete set (such as the integers).
Modern computing has been using data types that are 32 or 64 bits wide for many years, for both integers and floating point values. Even bigger data types are available or can be constructed easily. However, due to the costly nature of FHE computations, using such types with FHE is impractical (or plain impossible) if we are to execute computations in a reasonable amount of time.
The basic idea of quantization is to take a range of values that are represented by a large data type and represent them using a single value of a smaller data type. This means that some accuracy in the representation is lost (e.g. a simple approach is to eliminate least-significant bits), but, in many cases in machine learning, it is possible to adapt the models to give meaningful results while using these smaller data types. This significantly reduces the number of bits necessary for intermediary results during the execution of these machine learning models.
Let's first define some notations. Let $[\alpha, \beta]$ be the range of our value to quantize, where $\alpha$ is the minimum and $\beta$ is the maximum.
To quantize a range of floating point values (in $\mathbb{R}$) to integer values (in $\mathbb{Z}$), we first need to choose the data type that is going to be used. Concrete-Library, the backend library used by Concrete-ML, is currently limited to 7-bit integers, so we'll use this value for the example. Knowing the number of bits that can be used, for a value in the range $[\alpha, \beta]$, we can compute the scale $S$ of the quantization:

$$S = \frac{\beta - \alpha}{2^n - 1}$$

where $n$ is the number of bits (here, 7).

In practice, the quantization scale is then $S = \frac{\beta - \alpha}{127}$. This means the gap between consecutive representable values cannot be smaller than $S$, which, in turn, means there can be a substantial loss of precision. Every interval of length $S$ will be represented by a value within the range $[0, 2^n - 1]$.
The other important parameter from this quantization schema is the zero point value. This essentially brings the 0 floating point value to a specific integer. If the quantization scheme is asymmetric (quantized values are not centered on 0), the resulting integer will be in $[0, 2^n - 1]$.
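The scale and zero point computations can be sketched in plain NumPy (an illustrative example with assumed values, not the Concrete-ML implementation):

```python
import numpy

# Illustrative range: quantize values in [-1, 1] to n = 7 bits
alpha, beta, n = -1.0, 1.0, 7

S = (beta - alpha) / (2**n - 1)       # quantization scale
Z = int(round(-alpha / S))            # zero point of the asymmetric scheme

x = numpy.array([-1.0, -0.3, 0.0, 0.7, 1.0])
q = numpy.clip(numpy.round(x / S) + Z, 0, 2**n - 1).astype(numpy.int64)

x_dq = (q - Z) * S                    # dequantized (approximate) values
print(q)
print(x_dq)
```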
Regarding quantization in Concrete-ML and FHE compilation, it is important to understand the difference between two approaches:
The quantization is done automatically during the model compilation stage (inside our framework). This approach requires little work by the user, but may not be a one-size-fits-all solution for all types of models that a user may want to implement.
The quantization is done by the user, before compilation to FHE; notably, the quantization is completely controlled by the user, and can be done by any means, including by using third-party frameworks. In this approach, the user is responsible for implementing their models directly with NumPy.
For the moment, the first method is applicable through the tools provided in Concrete-ML, and the models implemented in our framework make use of this approach. When quantization is only performed in the compilation stage, the model training stage does not take into account that the model will be quantized. This setting is called Post-Training Quantization (PTQ), and this is the approach currently taken in Concrete-ML. PTQ is effective for moderate bit widths, such as 7-8 bits per weight and activation, but, for a model to be compatible with FHE constraints, we must quantize these values to as few as 2-3 bits. Thus, for models with more than a few neurons per layer, PTQ is not the optimal solution, and we plan to implement a more performant approach called Quantization Aware Training in the near future.
Internally, Concrete-ML uses ONNX operators as an intermediate representation (or IR) for manipulating machine learning models produced through export from frameworks such as PyTorch, Hummingbird and skorch. As ONNX is becoming the standard exchange format for neural networks, this allows Concrete-ML to be flexible while also making model representation manipulation quite easy. In addition, it allows for straightforward mapping to NumPy operators, supported by Concrete-Numpy to use the Concrete stack's FHE conversion capabilities.
Here we list the operators that are supported as well as the operators that have a quantized version, which should allow you to perform automatic Post Training Quantization (PTQ) of your models.
Please note that due to the current precision constraints from the Concrete stack, PTQ may produce circuits that have worse accuracy than your original model.
The following operators should be supported for evaluation and conversion to an equivalent NumPy circuit. As long as your model converts to an ONNX using these operators, it should be convertible to an FHE equivalent.
Do note that all operators may not be fully supported for conversion to a circuit executable in FHE. You will get error messages should you use such an operator in a circuit you are trying to convert to FHE.
Abs
Acos
Acosh
Add
Asin
Asinh
Atan
Atanh
Celu
Clip
Constant
Conv
Cos
Cosh
Div
Elu
Equal
Erf
Exp
Gemm
Greater
HardSigmoid
Identity
LeakyRelu
Less
Log
MatMul
Mul
Not
Relu
Reshape
Selu
Sigmoid
Sin
Sinh
Softplus
Sub
Tan
Tanh
ThresholdedRelu
Abs: QuantizedAbs
Add: QuantizedAdd
Celu: QuantizedCelu
Clip: QuantizedClip
Conv: QuantizedConv
Elu: QuantizedElu
Exp: QuantizedExp
Gemm: QuantizedGemm
HardSigmoid: QuantizedHardSigmoid
Identity: QuantizedIdentity
LeakyRelu: QuantizedLeakyRelu
Linear: QuantizedLinear
Log: QuantizedLog
MatMul: QuantizedMatMul
Relu: QuantizedRelu
Reshape: QuantizedReshape
Selu: QuantizedSelu
Sigmoid: QuantizedSigmoid
Softplus: QuantizedSoftplus
Tanh: QuantizedTanh
When using quantized values in a matrix multiplication or convolution, the equations for computing the result are more involved. The IntelLabs distiller quantization documentation provides a more detailed explanation of the maths used to quantize values and how to keep computations consistent.
We detail the use of quantization within Concrete-ML in the dedicated quantization section.
IntelLabs distiller explanation of quantization:
Lei Mao's blog on quantization:
Google paper on neural network quantization and integer-only inference:
Concrete-ML is built on top of Zama’s Concrete stack. It uses Concrete-Numpy, which itself uses the Concrete-Compiler.
The Concrete-Compiler takes MLIR code as input representing a computation circuit and compiles it to an executable using Concrete primitives to perform the computations.
We refer the reader to Concrete-Numpy documentation and, more generally, to the documentation of the whole Concrete-Framework for more information.
Hummingbird contains an interesting feature for Concrete-ML: it converts many algorithms (see supported algorithms) to tensor computations using a specific backend (torch, torchscript, ONNX and TVM).
Concrete-ML allows the conversion of an ONNX inference to NumPy inference (note that NumPy is always our entry point to run models in FHE).
We use a simple functionality of Hummingbird: the `convert` function, which can be imported from the `hummingbird.ml` package.
This function can be used to convert a machine learning model to an ONNX as follows:
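A sketch of this conversion is shown below. The scikit-learn model and data are placeholders; convert(model, backend="onnx", test_input=...) and the container's model attribute follow Hummingbird's documented API, but double-check them for your Hummingbird version.

```python
from hummingbird.ml import convert
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
sklearn_model = LogisticRegression().fit(X, y)

# Convert the trained scikit-learn model to ONNX through Hummingbird
onnx_container = convert(sklearn_model, backend="onnx", test_input=X)
onnx_model = onnx_container.model
```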
In theory, we can directly use this onnx_model
within our get_equivalent_numpy_forward
(as long as all operators present in the ONNX model are implemented in NumPy) and get the NumPy inference.
In practice, we apply some extra steps to clean the ONNX model and make the graph compatible with our framework, such as:
applying quantization where needed
deleting non-FHE friendly ONNX operators, such as Softmax and ArgMax
We use skorch to implement multi-layer, fully-connected torch neural networks in Concrete-ML in a way that is compatible with the scikit-learn API.
This wrapper implements torch training boilerplate code, alleviating the work that needs to be done by the user. It is possible to add hooks during the training phase, for example once an epoch is finished.
Skorch allows the user to easily create a classifier or regressor around a neural network (NN), implemented in Torch as a nn.Module
. We provide a simple, fully-connected, multi-layer NN with a configurable number of layers and optional pruning (see pruning).
The SparseQuantNeuralNetImpl
class implements this neural network. Please see the documentation on this class in the API guide.
The constructor of this class takes some parameters that influence FHE compatibility:
`n_w_bits` (default 3): number of bits for weights
`n_a_bits` (default 3): number of bits for activations and inputs
`n_accum_bits` (default 7): maximum accumulator bit width to impose through pruning
`n_hidden_neurons_multiplier` (default 4): explained below
A linear or convolutional layer of an NN will compute a linear combination of weights and inputs (we also call this a 'multi-sum'). For example, a linear layer will compute:

$$\mathrm{output}^k = \sum_{i=1}^{N} w_i^k x_i$$

where $w^k$ is the $k$-th neuron in the layer. In this case, the sum is taken over a single dimension of size $N$. A convolutional layer will compute:

$$\mathrm{output}^k = \sum_{c=1}^{C} \sum_{h=1}^{K_h} \sum_{w=1}^{K_w} w_{c,h,w}^k \, x_{c,h,w}$$

where $w^k$ is the $k$-th filter of the convolutional layer and $C$, $K_h$, $K_w$ are the number of input channels, the kernel height and the kernel width, respectively.
Following the formulas for the resulting bit width of quantized linear combinations described here, notably the maximum dimensionality $\Omega$ of the input and weights that can make the result exceed 7 bits:

$$\Omega = \left\lfloor \frac{2^{n_{\max}} - 1}{(2^{n_{\text{bits}}^{\text{inputs}}} - 1)(2^{n_{\text{bits}}^{\text{weights}}} - 1)} \right\rfloor$$

where $n_{\max} = 7$ is the maximum precision allowed.

For example, with the default parameters $n_{\text{bits}}^{\text{weights}} = 3$ and $n_{\text{bits}}^{\text{inputs}} = 3$ and with $n_{\max} = 7$, the worst case is a scenario where all inputs and weights are equal to their maximal value $2^3 - 1 = 7$. The formula above tells us that, in this case, we can afford at most $\Omega = \lfloor 127 / (7 \times 7) \rfloor = 2$ elements in the multi-sums detailed above.
In a practical setting, the distribution of the weights of a neural network is Gaussian. Thus, there will be weights that are equal to 0 and many weights will have small values. In a typical scenario, we can exceed the worst-case number of active neurons. The parameter `n_hidden_neurons_multiplier` is a factor that is multiplied with $\Omega$ to determine the total number of non-zero weights that should be kept in a neuron.
The pruning mechanism is already implemented in SparseQuantNeuralNetImpl
, and the user only needs to determine the parameters listed above. They can choose them in a way that is convenient, e.g. maximizing accuracy.
The skorch wrapper requires that all the parameters that will be passed to the wrapped nn.Module
be prefixed with module__
. For example, the code to create an FHE-compatible Concrete-ML fully-connected NN classifier for a dataset with 10 input dimensions and two classes, will thus be:
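A sketch of such a configuration is shown below. Apart from the parameters documented above, the import path, the wrapper class name and the input/output dimension parameter names are assumptions; refer to the API guide and the classifier comparison notebook for the exact usage.

```python
import torch.nn as nn

# Assumed import path: Concrete-ML exposes a skorch-based classifier wrapper around
# SparseQuantNeuralNetImpl; the exact module path and class name may differ.
from concrete.ml.sklearn.qnn import NeuralNetClassifier

classifier = NeuralNetClassifier(
    # Parameters forwarded to the wrapped nn.Module are prefixed with module__
    module__input_dim=10,                    # assumed parameter name
    module__n_outputs=2,                     # assumed parameter name
    module__n_layers=2,
    module__n_w_bits=2,
    module__n_a_bits=2,
    module__n_accum_bits=7,
    module__n_hidden_neurons_multiplier=4,
    module__activation_function=nn.ReLU,
    max_epochs=10,
)
```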
We could then increase n_hidden_neurons_multiplier
to improve performance, taking care to verify that the compiled NN does not exceed 7 bits of accumulator bit width.
A similar example is given in the classifier comparison notebook.
In this section, we detail the usage of quantization in Concrete-ML.
Since quantization is necessary to make ML models work in FHE, Concrete-ML implements quantized ML models to facilitate usage, but also exposes some quantization tools. The core of this functionality is the conversion of floating point values to integers, following the techniques described here. We can apply this conversion using QuantizedArray
, available in concrete.ml.quantization
.
The QuantizedArray
class takes several arguments that determine how float values are quantized:
`n_bits`: defines the precision of the quantization
`values`: the floating point values that will be converted to integers
`is_signed`: determines if the quantized integer values should allow negative values
`is_symmetric`: determines if the range of floating point values to be quantized should be taken as symmetric around zero
Please see the API reference for more information.
We can also use symmetric quantization, where the integer values are centered around 0 and may, thus, take negative values.
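For example (the positional order of n_bits and values in the constructor is an assumption; see the API reference for the exact signature):

```python
import numpy
from concrete.ml.quantization import QuantizedArray

values = numpy.array([-1.0, -0.5, 0.0, 0.7, 1.0])

# 7-bit, unsigned, asymmetric quantization of the float values
q_arr = QuantizedArray(7, values)
print(q_arr.qvalues)     # integer representation
print(q_arr.dequant())   # approximate floats recovered from the integers

# Signed, symmetric quantization: integer values are centered around 0
q_sym = QuantizedArray(7, values, is_signed=True, is_symmetric=True)
print(q_sym.qvalues)
```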
Machine learning models are implemented with a diverse set of operations, such as convolution, linear transformations, activation functions and element-wise operations. When working with quantized values, these operations cannot be carried out in the same way as for floating point values. With quantization, it is necessary to re-scale the input and output values of each operation to fit in the quantization domain.
The ML models implemented in Concrete-ML provide features to let the user quantize the input data and dequantize the output data.
Here is a simple example showing how to perform inference, starting from float values and ending up with float values. Note that the FHE engine that is compiled for the ML models does not support data batching.
If we are to examine the operations done by quantize_input
and dequantize_output
, we will see usage of the QuantizedArray
described above. When the ML model quantized_module
is calibrated, the min and max of the value distributions will be recorded, and these are then applied to quantize/dequantize new data.
Here, a different usage of QuantizedArray
is shown, where it is constructed from quantized integer values and the scale
and zero-point
are set explicitly from calibration parameters. Once the QuantizedArray
is constructed, calling dequant()
will compute the floating point values corresponding to the integer values qvalues
, which are the output of the forward_fhe.encrypt_run_decrypt(..)
call.
Intermediary values computed during model inference might need to be re-scaled into the quantized domain of a subsequent model operator. For example, the output of a convolution layer in a neural network might have values that are 7 bits wide, but the next convolutional layer requires that its inputs are, at most, 2 bits wide. In the non-encrypted realm, this implies that we need to make use of floating point operations. In the FHE setting, where we only work with integers, this could be a problem, but, luckily, the FHE implementation behind Concrete-ML provides a solution. We essentially make use of a table lookup, which is later translated into a Programmable Bootstrap (PBS).
Of course, having a PBS for every quantized addition isn't recommended for computational cost reasons. Also, a PBS is currently only allowed for univariate operations (i.e. matrix multiplication can't be done in a PBS). Therefore, our quantized modules split the computation of floating point values and unsigned integers, as it is currently done in concrete.ml.quantization.QuantizedLinear
. Moreover, the operations done by the activation function of a previous layer and additional re-scaling to the new quantized domain, which are all floating point operations, can be fused to a single TLU. Concrete-ML implements quantized operators that perform this fusion, significantly reducing the number of TLUs necessary to perform inference.
We can distinguish three types of operators:
Operators that perform linear combinations of encrypted and constant (clear) values. For example: matrix multiplication, convolution, addition
Operators that perform element-wise operations between two encrypted tensors. For example: addition
Element-wise, fixed-function operators which can be: addition with a constant, activation functions
In the first category, we will find operators such as `Gemm`, which will quantize their inputs. Notice that here we use the `_prepare_inputs_with_constants` helper function, with `quantize_actual_values=True`, to apply the quantization function to the input data. The quantization function uses floating point and a non-linear function, `round`, and will thus produce a TLU, together with any preceding floating point operations.
For element-wise operations with a fixed function, we simply let Concrete-Numpy generate a TLU. To do so, we just need to give this function the corresponding NumPy implementation, which must be defined in ops_impl.py.
It was decided to use ONNX as the intermediate format to convert various ML models (including torch nn.Module and various sklearn models, among others) to NumPy. The reason here is that converting/interpreting torchscript and other representations would require a lot of effort while ONNX has tools readily available to easily manipulate the model's representation in Python. Additionally, JAX had an example of a lightweight interpreter to run ONNX models as NumPy code.
In the diagram above, it is perfectly possible to stop at the NumpyModule
level if you just want to run the torch model as NumPy code without doing quantization.
Note that if you keep the obtained NumpyModule
without quantizing it with Post Training Quantization (PTQ), it is very likely that it won't be convertible to FHE since the Concrete stack requires operators to use integers for computations.
The NumpyModule
stores the ONNX model that it interprets. The interpreter works by going through the ONNX graph (which, by specification, is sorted in topological order, allowing users to run through the graph without having to care for evaluation order) and storing the intermediate results as it goes. To execute a node, the interpreter feeds the required inputs - taken either from the model inputs or the intermediate results - to the NumPy implementation of each ONNX node.
Do note that the NumpyModule
interpreter currently supports the following ONNX operators.
Initializers (ONNX's parameters) are quantized according to n_bits
and passed to the Post Training Quantization (PTQ) process.
During the PTQ process, the ONNX model stored in the NumpyModule
is interpreted and calibrated using the supported ONNX operators for PTQ.
Quantized operators are then used to create a QuantizedModule
that, similarly to the NumpyModule
, runs through the operators to perform the quantized inference with integers-only operations.
That QuantizedModule
is then compilable to FHE if the intermediate values conform to the 7 bits precision limit of the Concrete stack.
QuantizedOp
`QuantizedOp` is the base class for all ONNX quantized operators. It abstracts away a lot of things to allow easy implementation of new quantized ops.
You can check ops_impl.py
to see how implementations are done in NumPy. The requirements are as follows:
The required inputs should be positional arguments only, placed before the `/`, which marks the limit of the positional arguments
The optional inputs should be positional or keyword arguments, placed between the `/` and the `*`, which marks the limit of positional or keyword arguments
The operator attributes should be keyword arguments only, placed after the `*` (see the sketch below)
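For illustration, a hypothetical operator implementation following this convention could look like:

```python
import numpy

# Hypothetical operator: the required input `x` is positional-only, the optional
# input `b` is positional-or-keyword, and the ONNX attribute `alpha` is keyword-only.
def numpy_toy_op(x, /, b=None, *, alpha=1.0):
    result = alpha * numpy.asarray(x)
    if b is not None:
        result = result + b
    return (result,)  # a tuple of outputs, since ONNX nodes can have several outputs
```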
The proper use of positional/keyword arguments is required to allow the QuantizedOp
class to properly populate metadata automatically. It uses Python inspect modules and stores relevant information for each argument related to its positional/keyword status. This allows us to use our NumPy implementation as specifications for QuantizedOp
, which removes some data duplication and allows us to have a single source of truth for QuantizedOp
and ONNX NumPy implementations.
In that case (unless the quantized implementation requires special handling like QuantizedGemm
), you can just set _impl_for_op_named
to the name of the ONNX op for which the quantized class is implemented (this uses the mapping ONNX_OPS_TO_numpy_IMPL
we have in onnx_utils.py
to get the right implementation).
If you want to provide an alternative implementation, you can set _impl_for_op_named
to the name of the operator (e.g. Exp
) and you can set impl
and/or q_impl
to the functions that will do the alternative handling. QuantizedGemm
is an example of such a case where quantized matrix multiplication requires proper handling of scales and zero points. The q_impl
of that class reflects that.