This document explains how to compile ONNX models in Concrete ML. This is particularly useful for importing models trained with Keras.
You can compile ONNX models in two ways: by directly importing models that are already quantized with Quantization-Aware Training (QAT), or by performing Post-Training Quantization (PTQ) with Concrete ML.
The following example shows how to compile an ONNX model using PTQ. The model was initially trained using Keras before being exported to ONNX. The training code is not shown here.
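A minimal sketch of such a PTQ compilation, assuming the `compile_onnx_model` function from `concrete.ml.torch.compile` and a model already loaded as an `onnx.ModelProto`. The helper names `compile_onnx_with_ptq` and `compare_clear_and_simulated` are hypothetical, and the bit-width is left as a parameter because the right value depends on the model.

```python
def compile_onnx_with_ptq(onnx_model, input_set, n_bits):
    """Compile an ONNX model to an FHE circuit using post-training quantization.

    onnx_model: an onnx.ModelProto, for instance one exported from Keras
    input_set:  representative inputs (e.g. a numpy array) used to calibrate
                the quantization parameters during compilation
    n_bits:     bit-width for quantization (illustrative; tune it per model)
    """
    # Imported inside the function so this helper can be defined even on a
    # machine where Concrete ML is not installed.
    from concrete.ml.torch.compile import compile_onnx_model

    return compile_onnx_model(onnx_model, input_set, n_bits=n_bits)


def compare_clear_and_simulated(quantized_module, x):
    """Run the quantized model in the clear and in FHE simulation on one input."""
    y_clear = quantized_module.forward(x, fhe="disable")
    y_simulated = quantized_module.forward(x, fhe="simulate")
    return y_clear, y_simulated
```

The input set should be drawn from the same distribution as the data the model will see at inference time, since the quantization parameters are derived from it.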
Because this example uses PTQ, quantization is not performed during training, so the model does not achieve optimal performance in FHE.
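To illustrate why PTQ at low bit-widths costs accuracy, here is a small, library-independent sketch of uniform quantization calibrated on a representative data set. It is not Concrete ML's actual quantizer; the function names and the min/max calibration rule are assumptions chosen for illustration.

```python
def calibrate(values, n_bits):
    """Derive a uniform quantizer (scale, offset) from calibration values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("calibration values must span a non-empty range")
    scale = (hi - lo) / (2**n_bits - 1)
    return scale, lo


def quantize(x, scale, offset, n_bits):
    """Map a float to an unsigned integer on n_bits, clamping to the range."""
    q = round((x - offset) / scale)
    return max(0, min(2**n_bits - 1, q))


def dequantize(q, scale, offset):
    """Map a quantized integer back to its float approximation."""
    return q * scale + offset


def max_error(values, n_bits):
    """Largest round-trip error over the calibration values."""
    scale, offset = calibrate(values, n_bits)
    return max(
        abs(dequantize(quantize(x, scale, offset, n_bits), scale, offset) - x)
        for x in values
    )
```

For values inside the calibrated range, the round-trip error is bounded by half the scale, so each additional bit roughly halves it. A model that was never trained to tolerate this rounding sees it for the first time after training, which is the accuracy gap that QAT is meant to close.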
To improve performance in FHE, train the model with QAT. You can also import ONNX models that were already trained with QAT, as described below.
Although this example uses a Keras ONNX model, Keras/TensorFlow support in Concrete ML is only partial and experimental.
Models trained using QAT contain quantizers in the ONNX graph. These quantizers ensure that the inputs to the Linear/Dense and Conv layers are quantized. Because the quantizers in a QAT model are configured for a specific number of bits during training, you must import the ONNX graph using the same bit-width settings.
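A minimal sketch of a QAT import, assuming the `import_qat` and `n_bits` parameters of `compile_onnx_model` from `concrete.ml.torch.compile`. The wrapper name `compile_qat_onnx` and the argument name `n_bits_qat` are hypothetical.

```python
def compile_qat_onnx(onnx_model, input_set, n_bits_qat):
    """Compile an ONNX model that already contains QAT quantizers.

    onnx_model: an onnx.ModelProto exported from a QAT-trained network
    input_set:  representative inputs used during compilation
    n_bits_qat: the number of bits used for weights and activations
                during training; it must match the training configuration
    """
    # Imported inside the function so this helper can be defined even on a
    # machine where Concrete ML is not installed.
    from concrete.ml.torch.compile import compile_onnx_model

    return compile_onnx_model(
        onnx_model,
        input_set,
        import_qat=True,  # use the quantizers already present in the graph
        n_bits=n_bits_qat,
    )
```

For a network trained with 3-bit weights and activations, the call would be `compile_qat_onnx(onnx_model, input_set, n_bits_qat=3)`.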
Concrete ML supports the following operators for evaluation and conversion to an equivalent FHE circuit. Other operators were not implemented either due to FHE constraints or because they are rarely used in PyTorch activations or scikit-learn models.
Abs
Acos
Acosh
Add
Asin
Asinh
Atan
Atanh
AveragePool
BatchNormalization
Cast
Celu
Clip
Concat
Constant
ConstantOfShape
Conv
Cos
Cosh
Div
Elu
Equal
Erf
Exp
Expand
Flatten
Floor
Gather
Gemm
Greater
GreaterOrEqual
HardSigmoid
HardSwish
Identity
LeakyRelu
Less
LessOrEqual
Log
MatMul
Max
MaxPool
Min
Mul
Neg
Not
OneHot
Or
PRelu
Pad
Pow
ReduceSum
Relu
Reshape
Round
Selu
Shape
Sigmoid
Sign
Sin
Sinh
Slice
Softplus
Squeeze
Sub
Tan
Tanh
ThresholdedRelu
Transpose
Unfold
Unsqueeze
Where
onnx.brevitas.Quant
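Before compiling, it can help to check a graph against the list above. The following sketch is a hypothetical helper, not part of Concrete ML. It assumes the model is an `onnx.ModelProto` (so nodes are reachable through `model.graph.node` and expose `op_type` and `domain`), and it assumes Brevitas quantizers appear as `Quant` nodes in the `onnx.brevitas` domain.

```python
# Operator names copied from the supported-operator list above.
SUPPORTED_OPS = frozenset("""
Abs Acos Acosh Add Asin Asinh Atan Atanh AveragePool BatchNormalization
Cast Celu Clip Concat Constant ConstantOfShape Conv Cos Cosh Div Elu Equal
Erf Exp Expand Flatten Floor Gather Gemm Greater GreaterOrEqual HardSigmoid
HardSwish Identity LeakyRelu Less LessOrEqual Log MatMul Max MaxPool Min Mul
Neg Not OneHot Or PRelu Pad Pow ReduceSum Relu Reshape Round Selu Shape
Sigmoid Sign Sin Sinh Slice Softplus Squeeze Sub Tan Tanh ThresholdedRelu
Transpose Unfold Unsqueeze Where onnx.brevitas.Quant
""".split())

# Standard ONNX operators carry an empty domain or "ai.onnx".
STANDARD_DOMAINS = ("", "ai.onnx")


def qualified_op_name(node):
    """Return the operator name, prefixed by its domain for custom operators."""
    if node.domain in STANDARD_DOMAINS:
        return node.op_type
    return f"{node.domain}.{node.op_type}"


def unsupported_ops(onnx_model):
    """List the operators in the graph that are absent from SUPPORTED_OPS."""
    names = {qualified_op_name(node) for node in onnx_model.graph.node}
    return sorted(names - SUPPORTED_OPS)
```

An empty result from `unsupported_ops(onnx.load(path))` only means every operator type is on the list; compilation can still fail for other reasons, such as unsupported attribute values or bit-widths that are too large.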