This section lists several demos that apply Concrete-ML to some popular machine learning problems. They show how to build ML models that perform well under FHE constraints, and then how to perform the conversion to FHE.
Simpler tutorials that discuss only model usage and compilation are also available for the built-in models and for deep learning.
Concrete-ML provides several of the most popular linear models for regression and classification that can be found in scikit-learn:
Using these models in FHE is extremely similar to what can be done with scikit-learn's API, making it easy for data scientists who are used to this framework to get started with Concrete-ML.
Models are also compatible with some of scikit-learn's main workflows, such as `Pipeline()` and `GridSearch()`.
The `n_bits` parameter controls the bit-width of the inputs and weights of the linear models. When the model applies a non-linear mapping, such as exp or sigmoid, Concrete-ML currently applies it on the client side, on clear-text values that are the decrypted output of the linear part of the model. Linear models therefore do not use table lookups and can use high-precision integers for weights and inputs. The `n_bits` parameter can be set to `8` or more bits for models with up to 300 input dimensions. When the input has more dimensions, `n_bits` must be reduced to `6-7`. Accuracy and R2 scores are preserved down to `n_bits=6`, compared to the non-quantized float models from scikit-learn.
Below is an example of how to use a `LogisticRegression` model in FHE on a simple classification data-set. A more complete example can be found in the LogisticRegression notebook.
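The sketch below is illustrative: the synthetic data-set and hyper-parameters are placeholders, not those of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from concrete.ml.sklearn import LogisticRegression

# Create a simple synthetic classification data-set
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the quantized model, exactly as with scikit-learn
model = LogisticRegression(n_bits=8)
model.fit(X_train, y_train)

# Compile the model to an FHE circuit, using the training data for calibration
model.compile(X_train)

# Run inference on encrypted data
y_pred_fhe = model.predict(X_test, execute_in_fhe=True)
```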
We can then plot the decision boundary of the classifier and compare the results with a scikit-learn model executed in the clear. The complete code can be found in the LogisticRegression notebook.
The overall accuracy scores are identical (93%) between the scikit-learn model (executed in the clear) and the Concrete-ML one (executed in FHE). In fact, quantization has little impact on the decision boundaries, as linear models can use high-precision integers for inputs and weights in Concrete-ML. Additionally, as the linear models do not use PBS, the FHE computations are always exact, meaning the FHE predictions are always identical to the quantized clear ones.
Concrete-ML models can be easily deployed in a client/server setting, enabling the creation of privacy-preserving services in the cloud.
As seen in the concepts section, a Concrete-ML model, once compiled to FHE, generates machine code that performs the inference on private data. Furthermore, secret encryption keys are needed so that the user can securely encrypt their data and decrypt the inference result. An evaluation key is also needed for the server to securely process the user's encrypted data.
Keys are generated by the user once for each service they use, based on the model the service provides and its cryptographic parameters.
The overall communications protocol to enable cloud deployment of machine learning services can be summarized in the following diagram:
The steps detailed in the diagram above are as follows:

1. The model developer deploys the compiled machine learning model to the server. This model includes the cryptographic parameters. The server is now ready to provide private inference.
2. The client requests the cryptographic parameters (also called "client specs"). Once it receives them from the server, the secret and evaluation keys are generated.
3. The client sends the evaluation key to the server. The server is now ready to accept requests from this client. The client sends their encrypted data.
4. The server uses the evaluation key to securely run inference on the user's data and sends back the encrypted result.
5. The client now decrypts the result and can send back new requests.
For more information on how to implement this basic secure inference protocol, refer to the Production Deployment section and to the client/server example.
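As a sketch of this protocol using Concrete-ML's deployment API (`FHEModelDev`, `FHEModelClient` and `FHEModelServer`): directory paths and the `compiled_model` and `x` variables below are placeholders.

```python
from concrete.ml.deployment import FHEModelClient, FHEModelDev, FHEModelServer

# Developer side: save the compiled model and its client specs to a directory
FHEModelDev(path_dir="deployment", model=compiled_model).save()

# Client side: generate the secret and evaluation keys from the client specs
client = FHEModelClient(path_dir="deployment", key_dir="keys")
client.generate_private_and_evaluation_keys()
serialized_evaluation_keys = client.get_serialized_evaluation_keys()

# Client side: quantize, encrypt and serialize the input data
encrypted_input = client.quantize_encrypt_serialize(x)

# Server side: load the model, then run inference on the encrypted data
server = FHEModelServer(path_dir="deployment")
server.load()
encrypted_result = server.run(encrypted_input, serialized_evaluation_keys)

# Client side: decrypt and de-quantize the result
result = client.deserialize_decrypt_dequantize(encrypted_result)
```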
Please note that not all hardware/OS combinations are supported. Determine your platform, OS version, and Python version before referencing the table below.
Depending on your OS, Concrete-ML may be installed with Docker or with pip:
Also, only some versions of Python are supported: in the current release, these are `3.7` (Linux only), `3.8`, and `3.9`. Moreover, the Concrete-ML Python package requires `glibc >= 2.28`. On Linux, you can check your `glibc` version by running `ldd --version`.
Most of these limits are shared with the rest of the Concrete stack (namely Concrete-Numpy and Concrete-Compiler). Support for more platforms will be added in the future.
Installing on Windows can be done using Docker or WSL. On WSL, Concrete-ML will work as long as the package is not installed in the /mnt/c/ directory, which corresponds to the host OS filesystem.
To install Concrete-ML from PyPi, run the following:
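The package is published on PyPI under the name `concrete-ml`:

```bash
pip install concrete-ml
```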
This will automatically install all dependencies, notably Concrete-Numpy.
Concrete-ML can be installed using Docker by either pulling the latest image or a specific version:
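Assuming the public image name `zamafhe/concrete-ml` (the version tag below is illustrative):

```bash
# Pull the latest version
docker pull zamafhe/concrete-ml:latest

# Or pull a specific version tag
docker pull zamafhe/concrete-ml:v0.6.0
```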
The image can then be used via the following command:
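A sketch of the launch command; the host directory to mount is a placeholder:

```bash
# Launch a Jupyter server on port 8888, mounting a host directory as /data
docker run --rm -it -p 8888:8888 -v /host/path:/data zamafhe/concrete-ml
```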
This will launch a Concrete-ML enabled Jupyter server in Docker that can be accessed directly from a browser.
Alternatively, a shell can be launched in Docker, with or without volumes:
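For instance (paths are placeholders):

```bash
# Shell without a volume
docker run --rm -it zamafhe/concrete-ml /bin/bash

# Shell with a host directory mounted as /data
docker run --rm -it -v /host/path:/data zamafhe/concrete-ml /bin/bash
```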
The following example considers a `LogisticRegression` model on a simple classification problem. A more advanced example, which considers an `XGBClassifier`, can also be found in the documentation.
Concrete-ML can be installed on Kaggle, but not on Google Colab.
Installing Concrete-ML using PyPi requires a Linux-based OS or macOS running on an x86 CPU. For Apple Silicon, Docker is the only currently supported option (see the Docker section below).
The image can be used with Docker volumes; see the Docker documentation on volumes.
| Operation | Support |
| --- | --- |
| `fit` | ✓ |
| `compile` | ✗ |
| `predict` (`execute_in_fhe=False`) | ✓ |
| `predict` (`execute_in_fhe=True`) | ✓ |
| OS / HW | Available on Docker | Available on pip |
| --- | --- | --- |
| Linux | Yes | Yes |
| Windows | Yes | Not currently |
| Windows Subsystem for Linux | Yes | Yes |
| macOS (Intel) | Yes | Yes |
| macOS (Apple Silicon, i.e. M1, M2, etc.) | Yes | Not currently |
In addition to the built-in models, Concrete-ML supports generic machine learning models implemented with Torch, or exported as ONNX graphs.
As Quantization Aware Training (QAT) is the most appropriate method of training neural networks that are compatible with FHE constraints, Concrete-ML works with Brevitas, a library providing QAT support for PyTorch.
The following example uses a simple QAT PyTorch model that implements a fully connected neural network with two hidden layers. Due to its small size, making this model respect FHE constraints is relatively easy.
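The original code block is not reproduced here; below is a minimal sketch of such a network using Brevitas quantized layers. The class name, layer sizes, and bit-widths are illustrative.

```python
import brevitas.nn as qnn
import torch.nn as nn

N_BITS = 3  # quantization bit-width for weights and activations

class QATFCNN(nn.Module):
    """Quantization Aware Training network with two hidden layers."""

    def __init__(self, n_feat, n_hidden=30, n_classes=2):
        super().__init__()
        # Quantize the input at the entry point of the network
        self.quant_inp = qnn.QuantIdentity(bit_width=N_BITS, return_quant_tensor=True)
        self.fc1 = qnn.QuantLinear(n_feat, n_hidden, True, weight_bit_width=N_BITS)
        self.relu1 = qnn.QuantReLU(bit_width=N_BITS, return_quant_tensor=True)
        self.fc2 = qnn.QuantLinear(n_hidden, n_hidden, True, weight_bit_width=N_BITS)
        self.relu2 = qnn.QuantReLU(bit_width=N_BITS, return_quant_tensor=True)
        self.fc3 = qnn.QuantLinear(n_hidden, n_classes, True, weight_bit_width=N_BITS)

    def forward(self, x):
        x = self.quant_inp(x)
        x = self.relu1(self.fc1(x))
        x = self.relu2(self.fc2(x))
        return self.fc3(x)
```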
Once the model is trained, calling the `compile_brevitas_qat_model` function from Concrete-ML will automatically convert and compile the QAT network. Here, 3-bit quantization is used for both the weights and activations.
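A sketch of this call, reusing the hypothetical `QATFCNN` network defined above and randomly generated calibration data:

```python
import numpy
from concrete.ml.torch.compile import compile_brevitas_qat_model

torch_model = QATFCNN(n_feat=2)
# ... train torch_model here ...

# Representative input data, used to calibrate the compilation
x_train = numpy.random.uniform(-1, 1, size=(100, 2)).astype(numpy.float32)

quantized_numpy_module = compile_brevitas_qat_model(
    torch_model,  # the trained Brevitas QAT network
    x_train,      # calibration data
)
```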
The model can now be used to perform encrypted inference. Next, the test data is quantized:
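Assuming `x_test` is a NumPy array of test data, a sketch of the quantization step:

```python
x_test_q = quantized_numpy_module.quantize_input(x_test)
```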
and the encrypted inference can be run using either:

- `quantized_numpy_module.forward_and_dequant()` to compute predictions in the clear on quantized data, and then de-quantize the result. The return value of this function contains the de-quantized (float) output of running the model in the clear. Calling the forward function on clear data is useful when debugging. The results in FHE will be the same as those on clear quantized data.
- `quantized_numpy_module.forward_fhe.encrypt_run_decrypt()` to perform the FHE inference. In this case, de-quantization is done in a second stage using `quantized_numpy_module.dequantize_output()`.
While the example above shows how to import a Brevitas/PyTorch model, Concrete-ML also provides an option to import generic QAT models implemented either in PyTorch or through ONNX. Interestingly, deep learning models made with TensorFlow or Keras should be usable, by first converting them to ONNX.
QAT models contain quantizers in the PyTorch graph. These quantizers ensure that the inputs to the Linear/Dense and Conv layers are quantized.
Suppose that `n_bits_qat` is the bit-width of activations and weights during the QAT process. To import a PyTorch QAT network, you can use the `compile_torch_model` library function, passing `import_qat=True`:
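A sketch of the call (`torch_model` and `x_calib` are placeholders for the trained network and a representative calibration set):

```python
from concrete.ml.torch.compile import compile_torch_model

quantized_numpy_module = compile_torch_model(
    torch_model,       # a PyTorch QAT network
    x_calib,           # representative calibration data
    import_qat=True,   # the network already contains quantizers
    n_bits=n_bits_qat, # the bit-width used during QAT
)
```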
Alternatively, if you want to import an ONNX model directly, please see the ONNX guide. The `compile_onnx_model` function also supports the `import_qat` parameter.
When importing QAT models using this generic pipeline, a representative calibration set should be given, as the quantization parameters of the model need to be inferred from the statistics of the values encountered during inference.
Concrete-ML supports a variety of PyTorch operators that can be used to build fully connected or convolutional neural networks, with normalization and activation layers. Moreover, many element-wise operators are supported.
Please note that Concrete-ML supports these operators as well as their QAT equivalents from Brevitas:
- `brevitas.nn.QuantLinear`
- `brevitas.nn.QuantConv2d`
- `brevitas.nn.QuantIdentity`
- `torch.nn.Threshold` -- partial support
Note that the equivalent versions from `torch.nn.functional` are also supported.
This example shows the typical flow of a Concrete-ML model:

1. The model is trained on unencrypted (plaintext) data using scikit-learn. As FHE operates over integers, Concrete-ML quantizes the model to use only integers during inference.
2. The quantized model is compiled to an FHE equivalent. Under the hood, the model is first converted to a Concrete-Numpy program, then compiled.
To make a model work with FHE, the only constraint is to make it run within the supported precision limitations of Concrete-ML (currently 16-bit integers). Thus, machine learning models must be quantized, which sometimes leads to a loss of accuracy compared to the original model operating on plaintext.
Additionally, Concrete-ML currently only supports FHE inference: training has to be done on unencrypted data, producing a model which is then converted to an FHE equivalent that can perform encrypted inference, i.e. prediction over encrypted data.
Finally, in Concrete-ML there is currently no support for pre-processing model inputs and post-processing model outputs. These processing stages may involve text-to-numerical feature transformation, dimensionality reduction, KNN or clustering, featurization, normalization, and the mixing of results of ensemble models.
All of these issues are currently being addressed and significant improvements are expected to be released in the coming months.
If you have built awesome projects using Concrete-ML, feel free to let us know and we'll link to your work!
ONNX models can be compiled by directly importing models that are already quantized with Quantization Aware Training (QAT) or by performing Post-Training Quantization (PTQ) with Concrete-ML.
The following example shows how to compile an ONNX model using PTQ. The model was initially trained using Keras before being exported to ONNX. The training code is not shown here.
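A sketch of the compilation call, assuming the exported model is available as `model.onnx` and `x_calib` is a representative calibration set (both placeholders):

```python
import onnx
from concrete.ml.torch.compile import compile_onnx_model

onnx_model = onnx.load("model.onnx")

quantized_numpy_module = compile_onnx_model(
    onnx_model,
    x_calib,   # representative calibration data
    n_bits=3,  # PTQ bit-width, illustrative
)
```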
While Keras was used in this example, it is not officially supported, as additional work is needed to test all of Keras's layer types and models.
QAT models contain quantizers in the ONNX graph. These quantizers ensure that the inputs to the Linear/Dense and Conv layers are quantized. Since these QAT models have quantizers that are configured during training to a specific number of bits, the ONNX graph will need to be imported using the same settings:
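A short sketch of such an import (`quantized_onnx_model` and `x_calib` are placeholders):

```python
from concrete.ml.torch.compile import compile_onnx_model

quantized_numpy_module = compile_onnx_model(
    quantized_onnx_model,
    x_calib,
    import_qat=True,
    n_bits=3,  # illustrative: must match the bit-width used during training
)
```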
The following operators are supported for evaluation and conversion to an equivalent FHE circuit. Other operators were not implemented, either due to FHE constraints or because they are rarely used in PyTorch activations or scikit-learn models.
Abs
Acos
Acosh
Add
Asin
Asinh
Atan
Atanh
AveragePool
BatchNormalization
Cast
Celu
Clip
Concat
Constant
Conv
Cos
Cosh
Div
Elu
Equal
Erf
Exp
Flatten
Floor
Gemm
Greater
GreaterOrEqual
HardSigmoid
HardSwish
Identity
LeakyRelu
Less
LessOrEqual
Log
MatMul
Max
MaxPool
Min
Mul
Neg
Not
Or
PRelu
Pad
Pow
ReduceSum
Relu
Reshape
Round
Selu
Sigmoid
Sign
Sin
Sinh
Softplus
Sub
Tan
Tanh
ThresholdedRelu
Transpose
Unsqueeze
Where
onnx.brevitas.Quant
The graph above shows that, when using a sufficiently high bit-width, quantization has little impact on the decision boundaries of the Concrete-ML FHE decision tree models. As the quantization is done individually on each input feature, its impact is strongly reduced, and FHE tree-based models thus reach accuracy similar to their floating-point equivalents. Using 6 bits for quantization makes the Concrete-ML model reach or exceed the floating-point accuracy. The number of bits for quantization can be adjusted through the `n_bits` parameter.
When `n_bits` is set low, the quantization process may create artifacts that lead to a decrease in model performance, but FHE execution becomes faster. It is thus possible to adjust the accuracy/speed trade-off, and some accuracy can be recovered by increasing `n_estimators`.
The following graph shows that using 5-6 bits of quantization is usually sufficient to reach the performance of a non-quantized XGBoost model on floating-point data. The metrics plotted are accuracy and F1-score on the `spambase` data-set.
Some examples constrain accumulators to 7-8 bits, which can be sufficient for simple data-sets. Up to 16-bit accumulators can be used, but this introduces a slowdown of 4-5x compared to 8-bit accumulators.
Shows how to use Quantization Aware Training and pruning when starting out from a classical PyTorch network. This example uses a simple data-set and a small NN, which achieves good accuracy with low accumulator size.
This guide provides a complete example of converting a PyTorch neural network into its FHE-friendly, quantized counterpart. It focuses on Quantization Aware Training a simple network on a synthetic data-set.
In general, quantization can be carried out in two different ways: either during training with Quantization Aware Training (QAT) or after the training phase with Post-Training Quantization (PTQ).
In PyTorch, using standard layers, a fully connected neural network would look as follows:
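The original code block is not reproduced here; below is a minimal sketch (the class name and layer sizes are illustrative, matching the 2D synthetic task described later in this guide):

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    """Fully connected network with two hidden layers for 2D classification."""

    def __init__(self, n_hidden=30):
        super().__init__()
        self.fc1 = nn.Linear(2, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_hidden)
        self.fc3 = nn.Linear(n_hidden, 2)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)
```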
This shows that the fp32 accuracy and accumulator size increase with the number of hidden neurons, while the 3-bit accuracy remains low irrespective of the number of neurons. While all the configurations tried here were FHE-compatible (accumulator < 16 bits), it is often preferable to have a lower accumulator size in order to speed up inference time.
The accumulator size is determined by Concrete-Numpy as being the maximum bit-width encountered anywhere in the encrypted circuit.
Brevitas provides a quantized version of almost all PyTorch layers (a `Linear` layer becomes `QuantLinear`, a `ReLU` layer becomes `QuantReLU`, and so on), plus some extra quantization parameters, such as:
- `bit_width`: precision quantization bits for activations
- `act_quant`: quantization protocol for the activations
- `weight_bit_width`: precision quantization bits for weights
- `weight_quant`: quantization protocol for the weights
In order to use FHE, the network must be quantized from end to end. Thanks to Brevitas's `QuantIdentity` layer, the input can be quantized by placing such a layer at the entry point of the network. Moreover, it is also possible to combine PyTorch and Brevitas layers, provided that a `QuantIdentity` layer is placed after each PyTorch layer. To convert a PyTorch NN for Concrete-ML compatibility, each floating-point layer is replaced by its Brevitas quantized equivalent (e.g. `Linear` becomes `QuantLinear`, `ReLU` becomes `QuantReLU`).
Furthermore, some PyTorch operators (from the PyTorch functional API) require a `brevitas.nn.QuantIdentity` to be applied on their inputs.
The QAT import tool in Concrete-ML is a work in progress. While it has been tested with some networks built with Brevitas, it is possible to use other tools to obtain QAT networks.
For instance, with Brevitas, the network above becomes:
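A sketch consistent with the description below (bit-widths and sizes are illustrative; biases are enabled but not quantized):

```python
import brevitas.nn as qnn
import torch.nn as nn

N_BITS = 3  # quantization bit-width for weights and activations

class QuantSimpleNet(nn.Module):
    """Brevitas QAT version of SimpleNet, quantized from end to end."""

    def __init__(self, n_hidden=30):
        super().__init__()
        self.quant_inp = qnn.QuantIdentity(bit_width=N_BITS, return_quant_tensor=True)
        self.fc1 = qnn.QuantLinear(
            2, n_hidden, bias=True, bias_quant=None, weight_bit_width=N_BITS
        )
        self.relu1 = qnn.QuantReLU(bit_width=N_BITS, return_quant_tensor=True)
        self.fc2 = qnn.QuantLinear(
            n_hidden, n_hidden, bias=True, bias_quant=None, weight_bit_width=N_BITS
        )
        self.relu2 = qnn.QuantReLU(bit_width=N_BITS, return_quant_tensor=True)
        self.fc3 = qnn.QuantLinear(
            n_hidden, 2, bias=True, bias_quant=None, weight_bit_width=N_BITS
        )

    def forward(self, x):
        x = self.quant_inp(x)
        x = self.relu1(self.fc1(x))
        x = self.relu2(self.fc2(x))
        return self.fc3(x)
```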
Note that in the network above, biases are used for linear layers but are not quantized (`"bias": True, "bias_quant": None`). The addition of the bias is a univariate operation and is fused into the activation function.
Training this network with pruning (see below), with 30 out of 100 total neurons non-zero, gives good accuracy while keeping the accumulator size low.
The PyTorch QAT training loop is the same as the standard floating point training loop, but hyper-parameters such as learning rate might need to be adjusted.
Quantization Aware Training is somewhat slower than normal training. QAT introduces quantization during both the forward and backward passes. The quantization process is inefficient on GPUs as its computational intensity is low with respect to data transfer time.
Considering that FHE only works with limited integer precision, there is a risk of overflowing in the accumulator, which will make Concrete-ML raise an error.
The following code shows how to use pruning in the previous example:
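The original code block is not reproduced here; below is a sketch using `torch.nn.utils.prune`, applied to the hypothetical `QuantSimpleNet` above. The exact pruning strategy of the original example may differ.

```python
import torch.nn.utils.prune as pruning

class PrunedQuantNet(QuantSimpleNet):
    """QuantSimpleNet variant whose hidden layers are pruned during training."""

    def __init__(self, n_hidden=100):
        super().__init__(n_hidden)
        self.pruned_layers = []

    def prune(self, max_non_zero):
        # Prune the layers with a large fan-in (the hidden layers), keeping
        # on average `max_non_zero` non-zero weights per neuron
        for layer in (self.fc2, self.fc3):
            n_zero_weights = (layer.weight.shape[1] - max_non_zero) * layer.weight.shape[0]
            if n_zero_weights <= 0:
                continue
            # Zero out the smallest-magnitude weights, layer-wide
            pruning.l1_unstructured(layer, "weight", amount=n_zero_weights)
            self.pruned_layers.append(layer)

    def unprune(self):
        # Make the pruning permanent so the network can be compiled or exported
        for layer in self.pruned_layers:
            pruning.remove(layer, "weight")
        self.pruned_layers.clear()
```

In this sketch, `prune()` would be called before training and `unprune()` once training completes, making the zeroed weights permanent.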
Results with `PrunedQuantNet`, a pruned version of the `QuantSimpleNet` with 100 neurons in the hidden layers, are given below, showing the mean accumulator size measured over 10 runs of the experiment:
This shows that the fp32 accuracy has been improved while maintaining constant mean accumulator size.
When pruning a larger neural network during training, it is easier to obtain a low bit-width accumulator while maintaining better final accuracy. Thus, pruning is more robust than training a similar, smaller network.
Concrete-ML is an open-source, privacy-preserving, machine learning inference framework based on fully homomorphic encryption (FHE). It enables data scientists without any prior knowledge of cryptography to automatically turn machine learning models into their FHE equivalent, using familiar APIs from scikit-learn and PyTorch.
Fully Homomorphic Encryption (FHE) is an encryption technique that allows computing directly on encrypted data, without needing to decrypt it. With FHE, you can build private-by-design applications without compromising on features. You can learn more about FHE by joining the FHE.org community.
Here is a simple example of classification on encrypted data using logistic regression. More examples can be found in the documentation.
Inference can then be done on encrypted data. The above example shows encrypted inference in the model-development phase. Alternatively, in deployment in a client/server setting, the data is encrypted by the client, processed securely by the server, and then decrypted by the client.
Concrete-ML is built on top of Zama's Concrete framework. It uses Concrete-Numpy, which itself uses the Concrete-Compiler and the Concrete-Library. To use these libraries directly, refer to their respective documentations.
Various tutorials are available for the built-in models and for deep learning. In addition, several standalone demos for use-cases can be found in the Demos and Tutorials section.
Support forum: community.zama.ai (we answer in less than 24 hours).
Live discussion on the FHE.org Discord server (inside the #concrete channel).
Do you have a question about Zama? You can write to us or send us an email at: hello@zama.ai
In addition to Concrete-ML models and custom models in torch, it is also possible to directly compile ONNX models. This can be particularly appealing, notably to import models trained with Keras.
This example uses Post-Training Quantization, i.e. the quantization is not performed during training. Thus, this model would not have good performance in FHE. Quantization Aware Training should be added by the model developer. Additionally, importing QAT ONNX models is also supported, as described above.
Concrete-ML provides several of the most popular classification and regression tree models that can be found in scikit-learn:
In addition to support for scikit-learn, Concrete-ML also supports XGBoost's `XGBClassifier`:
Here's an example of how to use this model in FHE on a popular data-set using some of scikit-learn's pre-processing tools. A more complete example can be found in the associated notebook.
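A minimal sketch: the data-set choice and hyper-parameters below are illustrative, not those of the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from concrete.ml.sklearn import XGBClassifier

# Load and pre-process a popular data-set with scikit-learn tools
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Train the quantized tree ensemble and compile it to FHE
model = XGBClassifier(n_bits=6, n_estimators=50, max_depth=4)
model.fit(X_train_s, y_train)
model.compile(X_train_s)

# Run inference on encrypted data
y_pred_fhe = model.predict(X_test_s, execute_in_fhe=True)
```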
In a similar example, the decision boundaries of the Concrete-ML model can be plotted and then compared to the results of the classical XGBoost model executed in the clear. A 6-bit model is shown in order to illustrate the impact of quantization on classification. Similar plots can be found in the documentation.
These examples illustrate the basic usage of Concrete-ML to build various types of neural networks. They use simple data-sets, focusing on the syntax and usage of Concrete-ML. For examples showing how to train high-accuracy models on more complex data-sets, see the Demos and Tutorials section.
The examples listed here make use of simulation (using the Virtual Library) to perform evaluation over large test sets. Since FHE execution can be slow, only a few FHE executions can be performed. Concrete-ML guarantees that the accuracy measured with simulation is the same as the one obtained during FHE execution.
This notebook implements a Quantization Aware Training convolutional neural network on the MNIST data-set. It uses 3-bit weights and activations, giving a 7-bit accumulator.
Regarding FHE-friendly neural networks, QAT is the best way to reach optimal accuracy under FHE constraints. This technique allows weights and activations to be reduced to very low bit-widths (e.g. 2-3 bits), which, combined with pruning, can keep accumulator bit-widths low.
Concrete-ML uses the third-party library Brevitas to perform QAT for PyTorch NNs, but options exist for other frameworks such as Keras/Tensorflow.
Several demos and tutorials that use Brevitas are available in the Concrete-ML library.
This guide is based on a notebook tutorial, from which some code blocks are documented here.
For a more formal description of the usage of Brevitas to build FHE-compatible neural networks, please see the dedicated documentation section.
The example shows how to train a fully-connected neural network, similar to the one above, on a synthetic 2D data-set with a checkerboard grid pattern of 100 x 100 points. The data is split into 9500 training and 500 test samples.
Once trained, this PyTorch network can be imported using the `compile_torch_model` function. This function uses simple Post-Training Quantization.
The network was trained using different numbers of neurons in the hidden layers, and quantized using 3-bit weights and activations. The mean accumulator size shown below was extracted using the Virtual Library and is measured as the mean over 10 runs of the experiment. An accumulator size of 6.6 means that 4 times out of 10 the measured accumulator was 6 bits, while 6 times it was 7 bits.
Quantization Aware Training using Brevitas is the best way to guarantee good accuracy for Concrete-ML compatible neural networks.
To understand how to overcome this limitation, consider a scenario where 2 bits are used for weights and layer inputs/outputs. The `Linear` layer computes a dot product between weights and inputs, $y = \sum_i w_i x_i$. With 2 bits, no overflow can occur during the computation of the `Linear` layer as long as the number of neurons does not exceed 14, i.e. the sum of 14 products of 2-bit numbers does not exceed 7 bits.
By default, Concrete-ML uses symmetric quantization for model weights, with values in the interval $\left[-2^{n_{bits}-1}, 2^{n_{bits}-1}-1\right]$. For example, for $n_{bits}=2$ the possible values are $[-2, -1, 0, 1]$; for $n_{bits}=3$, the values can be $[-4, -3, -2, -1, 0, 1, 2, 3]$.
However, in a typical setting, the weights will not all have the maximum or minimum value (e.g. $-2^{n_{bits}-1}$). Instead, weights typically have a normal distribution around 0, which is one of the motivating factors for their symmetric quantization. A symmetric distribution and many zero-valued weights are desirable because opposite-sign weights can cancel each other out and zero weights do not increase the accumulator size.
This fact can be leveraged to train a network with more neurons while not overflowing the accumulator, using a technique called pruning, where the developer can impose a number of zero-valued weights. Torch provides support for pruning out of the box.