# Using Torch

In addition to the built-in models, Concrete ML supports generic machine learning models implemented with Torch, or exported as ONNX graphs.

There are two approaches to build FHE-compatible deep networks:

**Quantization Aware Training (QAT)**: This mode requires using custom layers, but can quantize weights and activations to low bit-widths. Concrete ML works with Brevitas, a library providing QAT support for PyTorch. To use this mode, compile models using `compile_brevitas_qat_model`.

**Post-training Quantization (PTQ)**: This mode allows a vanilla PyTorch model to be compiled. However, when quantizing weights and activations to fewer than 7 bits, accuracy can drop sharply. On the other hand, depending on the model size, quantizing with 6-8 bits can be incompatible with FHE constraints. To use this mode, compile models with `compile_torch_model`.

Both approaches require the `rounding_threshold_bits` parameter to be set accordingly. The best values for this parameter need to be determined through experimentation. A good initial value to try is `6`. See here for more details.
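Since the best `rounding_threshold_bits` value has to be found by experimentation, a small sweep can help. The sketch below is only an illustration: `sweep_rounding_bits`, `compile_fn`, and `score_fn` are hypothetical names, not part of Concrete ML. `compile_fn` is assumed to compile the model for a given rounding value and `score_fn` to return a metric such as accuracy.

```python
def sweep_rounding_bits(compile_fn, score_fn, candidates=(4, 5, 6, 7, 8)):
    """Compile once per candidate value and record the resulting score.

    compile_fn(bits) -> compiled module (hypothetical user-supplied callable)
    score_fn(module) -> float metric, higher is better (hypothetical)
    """
    scores = {}
    for bits in candidates:
        module = compile_fn(bits)
        scores[bits] = score_fn(module)
    return scores
```

The smallest value whose score stays acceptable is usually the one to keep, for example by inspecting `scores` or taking `max(scores, key=scores.get)` as a starting point.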

See the **common compilation errors page** for an explanation of some error messages that the compilation function may raise.

## Quantization-aware training

The following example uses a simple QAT PyTorch model that implements a fully connected neural network with two hidden layers. Due to its small size, making this model respect FHE constraints is relatively easy. To use QAT, Brevitas `QuantIdentity` nodes must be inserted in the PyTorch model, including one that quantizes the input of the `forward` function.

Once the model is trained, calling `compile_brevitas_qat_model` from Concrete ML automatically converts and compiles the QAT network. Here, 3-bit quantization is used for both the weights and activations. The `compile_brevitas_qat_model` function automatically identifies the number of quantization bits used in the Brevitas model.

If `QuantIdentity` layers are missing for any input or intermediate value, the compile function will raise an error. See the common compilation errors page for an explanation.
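The QAT workflow described above might look like the following sketch. The helper names `make_qat_net` and `compile_qat`, the layer sizes, and the attribute names are assumptions made for illustration. The imports are placed inside the functions so that the heavy dependencies (PyTorch, Brevitas, Concrete ML) are only needed when the functions are called.

```python
def make_qat_net(n_features=12, n_hidden=30, n_classes=2, n_bits=3):
    """Build an untrained fully connected Brevitas network with two hidden layers."""
    import torch
    from torch import nn
    import brevitas.nn as qnn

    class QATSimpleNet(nn.Module):
        def __init__(self):
            super().__init__()
            # Quantizes the input of forward(), as required for compilation.
            self.quant_inp = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
            self.fc1 = qnn.QuantLinear(
                n_features, n_hidden, bias=True, weight_bit_width=n_bits, bias_quant=None
            )
            # Re-quantize each intermediate activation before the next layer.
            self.quant2 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
            self.fc2 = qnn.QuantLinear(
                n_hidden, n_hidden, bias=True, weight_bit_width=n_bits, bias_quant=None
            )
            self.quant3 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
            self.fc3 = qnn.QuantLinear(
                n_hidden, n_classes, bias=True, weight_bit_width=n_bits, bias_quant=None
            )

        def forward(self, x):
            x = self.quant_inp(x)
            x = self.quant2(torch.relu(self.fc1(x)))
            x = self.quant3(torch.relu(self.fc2(x)))
            return self.fc3(x)

    return QATSimpleNet()


def compile_qat(model, calibration_inputs, rounding_bits=6):
    """Convert and compile a trained QAT model; bit-widths are read from the model."""
    from concrete.ml.torch.compile import compile_brevitas_qat_model

    return compile_brevitas_qat_model(
        model,
        calibration_inputs,  # representative inputs used for calibration and compilation
        rounding_threshold_bits=rounding_bits,
    )
```

After training, a call such as `quantized_module = compile_qat(make_qat_net(), torch.randn(100, 12))` would produce the compiled module; the random calibration set here is only a placeholder for representative data.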

## Post-training quantization

The following example uses a simple PyTorch model that implements a fully connected neural network with two hidden layers. The model is compiled to use FHE using `compile_torch_model`.
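A minimal PTQ sketch along these lines is shown below. The helper names `make_simple_net` and `compile_ptq`, the layer sizes, and the default bit-widths are illustrative assumptions; only `compile_torch_model` and its `n_bits` and `rounding_threshold_bits` parameters come from Concrete ML. Imports are kept inside the functions so the dependencies are only required when they are called.

```python
def make_simple_net(n_features=12, n_hidden=30, n_classes=2):
    """Build an untrained vanilla PyTorch network with two hidden layers."""
    from torch import nn

    return nn.Sequential(
        nn.Linear(n_features, n_hidden),
        nn.ReLU(),
        nn.Linear(n_hidden, n_hidden),
        nn.ReLU(),
        nn.Linear(n_hidden, n_classes),
    )


def compile_ptq(model, calibration_inputs, n_bits=6, rounding_bits=6):
    """Quantize a trained float model after training and compile it to FHE."""
    from concrete.ml.torch.compile import compile_torch_model

    return compile_torch_model(
        model,
        calibration_inputs,  # representative inputs used for calibration and compilation
        n_bits=n_bits,  # PTQ bit-width for weights and activations
        rounding_threshold_bits=rounding_bits,
    )
```

A call such as `quantized_module = compile_ptq(make_simple_net(), torch.randn(100, 12))` would then return the compiled module, with the random tensor standing in for real calibration data.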

## Configuring quantization parameters

With QAT (the PyTorch/Brevitas models created following the example above), you need to configure quantization parameters such as `bit_width` (activation bit-width) and `weight_bit_width`. When using this mode, set `n_bits=None` in the call to `compile_brevitas_qat_model`.

With PTQ, you need to set the `n_bits` value in the `compile_torch_model` function and must manually determine the trade-off between accuracy, FHE compatibility, and latency.

The quantization parameters, along with the number of neurons on each layer, will determine the accumulator bit-width of the network. Larger accumulator bit-widths result in higher accuracy but slower FHE inference time.
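To get a feel for how these parameters interact, the following sketch computes a worst-case bound on the accumulator bit-width of a single dense layer. It is not how Concrete ML measures bit-widths (in practice the values observed on the calibration data are typically smaller), and it assumes unsigned activations and signed weights. The function name is hypothetical.

```python
def worst_case_accumulator_bits(n_inputs, act_bits, weight_bits):
    """Upper bound on the signed bit-width of sum(activation * weight) over n_inputs terms.

    Assumes unsigned activations in [0, 2**act_bits - 1] and
    signed weights in [-2**(weight_bits - 1), 2**(weight_bits - 1) - 1].
    """
    max_activation = 2**act_bits - 1
    max_abs_weight = 2 ** (weight_bits - 1)
    max_abs_sum = n_inputs * max_activation * max_abs_weight
    # Magnitude bits plus one sign bit.
    return max_abs_sum.bit_length() + 1


# 30 neurons with 3-bit weights and activations, then with 6-bit ones.
print(worst_case_accumulator_bits(30, 3, 3))  # 11
print(worst_case_accumulator_bits(30, 6, 6))  # 17
```

Doubling the quantization bit-widths adds several bits to the bound, while doubling the number of neurons adds about one, which illustrates why low bit-widths matter most for wide layers.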

## Running encrypted inference

The model can now perform encrypted inference.

In this example, the input values `x_test` and the predicted values `y_pred` are floating points. The quantization (resp. de-quantization) step is done in the clear within the `forward` method, before (resp. after) any FHE computations.
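As a sketch of what this looks like in code, the two helpers below run encrypted inference on a compiled module. The helper names are hypothetical. The second one spells out the clear quantization and de-quantization steps, assuming the module exposes `quantize_input`, `quantized_forward`, and `dequantize_output` methods.

```python
def predict_encrypted(quantized_module, x_test):
    """Float inputs in, float predictions out; FHE runs in between."""
    return quantized_module.forward(x_test, fhe="execute")


def predict_encrypted_explicit(quantized_module, x_test):
    """Same computation with the clear-text steps made visible."""
    q_x = quantized_module.quantize_input(x_test)  # in the clear
    q_y = quantized_module.quantized_forward(q_x, fhe="execute")  # FHE computation
    return quantized_module.dequantize_output(q_y)  # in the clear
```

With a compiled module, `y_pred = predict_encrypted(quantized_module, x_test)` would correspond to the call described in this section.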

## Simulated FHE Inference in the clear

You can perform inference on clear data to evaluate the impact of quantization and of FHE computation on the accuracy of your model. See this section for more details. Two approaches exist:

- `quantized_module.forward(quantized_x, fhe="simulate")`: simulates FHE execution, taking into account Table Lookup errors. De-quantization must be done in a second step, as for actual FHE execution. Simulation takes into account the `p_error`/`global_p_error` parameters.
- `quantized_module.forward(quantized_x, fhe="disable")`: computes predictions in the clear on quantized data, and then de-quantizes the result. The return value of this function contains the de-quantized (float) output of running the model in the clear. Calling this function on clear data is useful when debugging, but this does not perform actual FHE simulation.

FHE simulation makes it possible to measure the impact of the Table Lookup error on the model accuracy. The Table Lookup error can be adjusted using `p_error`/`global_p_error`, as described in the approximate computation section.
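One way to quantify that impact is to compare the predicted classes from the two modes on the same data. The sketch below is an illustration for a classifier. `simulation_agreement` and `argmax` are hypothetical helper names, and only the `forward` calls with `fhe="simulate"` and `fhe="disable"` come from the text above. Comparing the argmax rather than the raw values keeps the measure independent of output scaling.

```python
def argmax(row):
    """Index of the largest value in a sequence of class scores."""
    return max(range(len(row)), key=lambda i: row[i])


def simulation_agreement(quantized_module, x):
    """Fraction of samples whose predicted class matches between the two modes."""
    y_sim = quantized_module.forward(x, fhe="simulate")  # includes Table Lookup errors
    y_clear = quantized_module.forward(x, fhe="disable")  # quantized, no FHE effects
    matches = sum(argmax(a) == argmax(b) for a, b in zip(y_sim, y_clear))
    return matches / len(y_clear)
```

An agreement close to 1.0 suggests the chosen `p_error`/`global_p_error` has little effect on predictions, while a lower value suggests the error probability may need to be reduced.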

## Supported operators and activations

Concrete ML supports a variety of PyTorch operators that can be used to build fully connected or convolutional neural networks, with normalization and activation layers. Moreover, many element-wise operators are supported.

### Operators

#### Univariate operators

#### Shape modifying operators

#### Tensor operators

`torch.Tensor.to` -- for casting to dtype

#### Multi-variate operators: encrypted input and unencrypted constants

Concrete ML also supports some of their QAT equivalents from Brevitas.

- `brevitas.nn.QuantLinear`
- `brevitas.nn.QuantConv1d`
- `brevitas.nn.QuantConv2d`

#### Multi-variate operators: encrypted+unencrypted or encrypted+encrypted inputs

### Quantizers

`brevitas.nn.QuantIdentity`

### Activation functions

`torch.nn.Threshold` -- partial support

The equivalent versions from `torch.nn.functional` are also supported.
