In addition to the built-in models, Concrete-ML supports generic machine learning models implemented with Torch, or exported as ONNX graphs.
As Quantization Aware Training (QAT) is the most appropriate method of training neural networks that are compatible with FHE constraints, Concrete-ML works with Brevitas, a library providing QAT support for PyTorch.
The following example uses a simple QAT PyTorch model that implements a fully connected neural network with two hidden layers. Due to its small size, making this model respect FHE constraints is relatively easy.
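A network of this shape could look like the sketch below. It is an illustration under stated assumptions, not necessarily the exact model this guide was written around: the builder function `build_qat_simple_net`, the class name `QATSimpleNet`, and the default values of `n_feat`, `n_hidden` and the two-class output are hypothetical choices. The imports are placed inside the function only so that the snippet can be loaded on a machine where `torch` and `brevitas` are not installed.

```python
def build_qat_simple_net(n_feat=12, n_hidden=30, n_bits=3):
    """Return an untrained QAT network with two hidden layers (illustrative sketch)."""
    # Local imports: the sketch can be loaded even where torch/brevitas are absent.
    import torch
    import torch.nn as nn
    import brevitas.nn as qnn

    class QATSimpleNet(nn.Module):
        def __init__(self):
            super().__init__()
            # A QuantIdentity in front of each QuantLinear quantizes that layer's inputs.
            self.quant_inp = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
            self.fc1 = qnn.QuantLinear(n_feat, n_hidden, True, weight_bit_width=n_bits, bias_quant=None)
            self.quant2 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
            self.fc2 = qnn.QuantLinear(n_hidden, n_hidden, True, weight_bit_width=n_bits, bias_quant=None)
            self.quant3 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
            # Output layer: two classes is an assumption made for this sketch.
            self.fc3 = qnn.QuantLinear(n_hidden, 2, True, weight_bit_width=n_bits, bias_quant=None)

        def forward(self, x):
            x = self.quant_inp(x)
            x = self.quant2(torch.relu(self.fc1(x)))
            x = self.quant3(torch.relu(self.fc2(x)))
            return self.fc3(x)

    return QATSimpleNet()
```

Training then proceeds as for any PyTorch module; the quantizers are already part of the graph, so no separate post-training quantization step is needed.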
Once the model is trained, calling the compile_brevitas_qat_model function from Concrete-ML automatically performs the conversion and compilation of a QAT network. Here, 3-bit quantization is used for both the weights and activations.
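The compilation step might be written as in the following sketch. The wrapper `compile_qat_network` and its parameters `n_feat` and `n_samples` are hypothetical, and the random input-set is only a placeholder: in practice a sample of real data is a better choice. The import path `concrete.ml.torch.compile` is assumed from the Concrete-ML package layout and may differ between releases.

```python
def compile_qat_network(torch_model, n_feat=12, n_samples=100, n_bits=3):
    """Convert and compile a trained Brevitas QAT model (illustrative sketch)."""
    # Local imports: the sketch can be loaded even where the libraries are absent.
    import torch
    from concrete.ml.torch.compile import compile_brevitas_qat_model

    # Representative input-set used during compilation (random data is an assumption).
    torch_inputset = torch.randn(n_samples, n_feat)

    # n_bits must match the bit-width the network was trained with (3 bits here).
    return compile_brevitas_qat_model(torch_model, torch_inputset, n_bits=n_bits)
```

The returned object is the one referred to as quantized_numpy_module in the rest of this page.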
The model can now be used to perform encrypted inference. The test data is first quantized, and inference is then run using either:
quantized_numpy_module.forward_and_dequant() to compute predictions in the clear on quantized data and then dequantize the result. The return value of this function contains the dequantized (float) output of running the model in the clear. Calling the forward function on clear data is useful when debugging. The results in FHE will be the same as those on clear quantized data.
quantized_numpy_module.forward_fhe.encrypt_run_decrypt() to perform the FHE inference. In this case, dequantization is done in a second stage using quantized_numpy_module.dequantize_output().
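The quantize, compute, dequantize flow can be illustrated without any FHE library. The pure-Python sketch below is a conceptual stand-in, not Concrete-ML code: the helpers `quantize`, `integer_dot` and `dequantize`, as well as the scales and weights, are hypothetical, and a single integer dot product plays the role of the compiled circuit. It shows why results computed on clear quantized data match what the encrypted circuit returns: both operate on the same integers.

```python
def quantize(values, scale, zero_point, n_bits):
    """Map floats to unsigned n_bits integers: q = clip(round(x / scale) + zero_point)."""
    q_max = 2**n_bits - 1
    return [min(max(round(v / scale) + zero_point, 0), q_max) for v in values]


def integer_dot(q_inputs, q_weights):
    """Stand-in for the integer-only computation that would run under FHE."""
    return sum(q * w for q, w in zip(q_inputs, q_weights))


def dequantize(q_value, scale, zero_point=0):
    """Map an integer result back to a float."""
    return scale * (q_value - zero_point)


# Hypothetical parameters: 3-bit inputs with scale 0.25, integer weights with scale 0.5.
x_test = [0.6, 1.0, 1.4]
x_scale, w_scale = 0.25, 0.5
q_weights = [1, -2, 3]

x_test_quantized = quantize(x_test, x_scale, zero_point=0, n_bits=3)
q_result = integer_dot(x_test_quantized, q_weights)   # integer-only stage
y_pred = dequantize(q_result, x_scale * w_scale)      # second stage: back to float

# Float reference for comparison: the gap is the quantization error.
y_float = sum(x * w * w_scale for x, w in zip(x_test, q_weights))
```

The dequantized prediction differs slightly from the float reference because the inputs were rounded to 3-bit integers; that error is introduced by quantization, not by encryption.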
While the example above shows how to import a Brevitas/PyTorch model, Concrete-ML also provides an option to import generic QAT models implemented either in PyTorch or through ONNX. Deep learning models built with TensorFlow or Keras should also be usable, by first converting them to ONNX.
QAT models contain quantizers in the PyTorch graph. These quantizers ensure that the inputs to the Linear/Dense and Conv layers are quantized.
Suppose that n_bits_qat is the bit-width of activations and weights during the QAT process. To import a PyTorch QAT network, you can use the compile_torch_model library function, passing import_qat=True.
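A call of this kind might look like the sketch below. The wrapper name `import_torch_qat_network` and the parameter name `calibration_data` are hypothetical, and the import path `concrete.ml.torch.compile` is an assumption that may vary between Concrete-ML releases.

```python
def import_torch_qat_network(torch_model, calibration_data, n_bits_qat=3):
    """Import and compile a generic PyTorch QAT network (illustrative sketch)."""
    # Local import: the sketch can be loaded even where Concrete-ML is absent.
    from concrete.ml.torch.compile import compile_torch_model

    return compile_torch_model(
        torch_model,
        calibration_data,   # representative inputs, used to infer quantization parameters
        import_qat=True,    # the model already contains its own quantizers
        n_bits=n_bits_qat,  # must match the bit-width used during QAT
    )
```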
Alternatively, if you want to import an ONNX model directly, please see the ONNX guide. The compile_onnx_model function also supports the import_qat parameter.
When importing QAT models using this generic pipeline, a representative calibration set should be provided, because the quantization parameters in the model need to be inferred from the statistics of the values encountered during inference.
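To see why the calibration set matters, consider min/max calibration, one common way to derive quantization parameters from data. The sketch below is a generic illustration in pure Python; the function `calibrate` is hypothetical, and this page does not state which statistics Concrete-ML actually uses.

```python
def calibrate(calibration_values, n_bits):
    """Infer (scale, zero_point) for unsigned n_bits quantization from min/max statistics."""
    v_min, v_max = min(calibration_values), max(calibration_values)
    q_max = 2**n_bits - 1
    # Spread the observed range over the available integer levels.
    scale = (v_max - v_min) / q_max if v_max > v_min else 1.0
    # Integer that represents the float value 0.0.
    zero_point = round(-v_min / scale)
    return scale, zero_point


# Hypothetical calibration values covering the range [-1.0, 2.5], quantized to 3 bits.
scale, zero_point = calibrate([-1.0, -0.25, 0.5, 2.5], n_bits=3)

# A calibration set that misses part of the real range yields a much smaller scale,
# so larger values seen at inference time would saturate at the top integer level.
narrow_scale, narrow_zero_point = calibrate([0.0, 0.5], n_bits=3)
```

With the first set, the 8 available levels cover the whole range in steps of 0.5. With the second, the step is about 0.07 and any input above 0.5 would be clipped, which is why the calibration data should resemble the data seen at inference time.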
Concrete-ML supports a variety of PyTorch operators that can be used to build fully connected or convolutional neural networks, with normalization and activation layers. Moreover, many element-wise operators are supported.
Note that Concrete-ML supports these operators as well as their Quantization Aware Training equivalents from Brevitas:
brevitas.nn.QuantLinear
brevitas.nn.QuantConv2d
brevitas.nn.QuantIdentity
torch.nn.Threshold -- partial support
Note that the equivalent versions from torch.functional are also supported.