This section includes a complete example of a neural network in Torch, as well as links to additional examples.
In this example, we will train a fully-connected neural network on a synthetic 2D dataset with a checkerboard grid pattern of 100 x 100 points. The data is split into 9500 training and 500 test samples.
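As a minimal sketch, such a checkerboard dataset could be generated as follows (the `make_checkerboard` helper, the 10 x 10 cell size, and the fixed random seed are illustrative assumptions, not taken from the example itself):

```python
# Minimal sketch of a synthetic checkerboard dataset: a 100 x 100 grid of 2D
# points, labeled by the parity of the checkerboard cell they fall in.
import numpy as np
from sklearn.model_selection import train_test_split

def make_checkerboard(grid_size=100, cell_size=10):
    xs = np.linspace(0.0, 1.0, grid_size, endpoint=False)
    xx, yy = np.meshgrid(xs, xs)
    X = np.stack([xx.ravel(), yy.ravel()], axis=1).astype(np.float32)

    # Label each point by the parity of its checkerboard cell
    cell_x = (xx.ravel() * grid_size // cell_size).astype(int)
    cell_y = (yy.ravel() * grid_size // cell_size).astype(int)
    y = ((cell_x + cell_y) % 2).astype(np.int64)
    return X, y

X, y = make_checkerboard()
# 10,000 points -> 9,500 for training, 500 for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=500, random_state=0)
```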
This shows that the fp32 accuracy and accumulator size increase with the number of hidden neurons, while the 3-bit accuracy remains low irrespective of the number of neurons. While all the configurations tried here were FHE compatible (accumulator < 8 bits), it is sometimes preferable to have a lower accumulator size so that inference is faster.
Since FHE only works with limited integer precision, there is a risk of overflowing the accumulator, which would give unpredictable results.
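As a rough illustration of why the number of non-zero terms matters, a worst-case estimate of the accumulator bitwidth of a single neuron can be computed as below. This helper is only a back-of-the-envelope sketch; the accumulator size measured on a trained network depends on the learned weights and the input data, and is usually lower than this bound.

```python
# Back-of-the-envelope worst case: summing n products of b_x-bit inputs and
# b_w-bit weights needs roughly b_x + b_w + ceil(log2(n)) bits
# (signedness details are ignored for simplicity).
import math

def worst_case_accumulator_bits(n_inputs, n_bits_inputs, n_bits_weights):
    return n_bits_inputs + n_bits_weights + math.ceil(math.log2(n_inputs))

# A neuron fed by 100 previous-layer neurons, with 3-bit inputs and weights:
print(worst_case_accumulator_bits(100, 3, 3))  # 13

# With only 30 non-zero inputs (e.g. after pruning), the bound drops:
print(worst_case_accumulator_bits(30, 3, 3))   # 11
```

The bound shrinks with the number of non-zero terms in the sum, which is exactly what pruning controls.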
The following code shows how to use pruning in our previous example:
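A minimal sketch of such a pruning setup, assuming `SimpleNet` is built from two hidden `torch.nn.Linear` layers and using `torch.nn.utils.prune.ln_structured` to keep only a fixed number of non-zero neurons (the actual `PrunedSimpleNet` implementation may differ):

```python
# Sketch of neuron-level pruning on the hidden layers. The network structure
# (two hidden Linear layers of 100 neurons) and the use of torch.nn.utils.prune
# are assumptions; only the idea of keeping n_active non-zero neurons is taken
# from the example.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


class PrunedSimpleNet(nn.Module):
    """SimpleNet variant where only `n_active` hidden neurons stay non-zero."""

    def __init__(self, n_hidden=100, n_active=30):
        super().__init__()
        self.fc1 = nn.Linear(2, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_hidden)
        self.fc3 = nn.Linear(n_hidden, 2)
        self.n_active = n_active

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

    def toggle_pruning(self, enable):
        """Enable pruning before training, make it permanent afterwards."""
        for layer in (self.fc1, self.fc2):
            if enable:
                # Zero out entire output neurons (rows of the weight matrix),
                # keeping only the n_active neurons with the largest L1 norm.
                amount = layer.out_features - self.n_active
                prune.ln_structured(layer, name="weight", amount=amount, n=1, dim=0)
            else:
                # Remove the re-parametrization so the exported weights stay sparse.
                prune.remove(layer, "weight")
```

In this sketch, `toggle_pruning(True)` would be called before training so that the pruned neurons stay at zero, and `toggle_pruning(False)` afterwards to make the sparsity permanent in the weights that are quantized and compiled.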
Results with `PrunedSimpleNet`, a pruned version of `SimpleNet` with 100 neurons in the hidden layers, are given below:
This shows that the fp32 accuracy has been improved while maintaining a constant mean accumulator size.
When pruning a larger neural network during training, it is easier to obtain a low-bitwidth accumulator while maintaining better final accuracy. Thus, pruning is more robust than training a similar, smaller network.
Training this network with 30 non-zero neurons out of 100 total gives good accuracy while being FHE compatible (accumulator size < 8 bits).
The following table summarizes the examples in this section.
In this table, ** means that the accuracy is effectively random, because the quantization required to fulfill the bitwidth constraints is too aggressive.