
More about ONNX

Internally, Concrete-ML uses ONNX operators as intermediate representation (or IR) for manipulating machine learning models produced through export for PyTorch, Hummingbird and skorch.

As ONNX is becoming the standard exchange format for neural networks, this allows Concrete-ML to be flexible while also making model representation manipulation easy. In addition, it allows for a straightforward mapping to NumPy operators, which are supported by Concrete-Numpy to use the Concrete stack's FHE conversion capabilities.
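
As an illustration of this mapping, an ONNX Gemm node can be expressed with plain NumPy. The sketch below conveys the idea only; the function name and defaults are assumptions, not the implementation used internally by Concrete-ML.

import numpy

# Illustrative NumPy equivalent of the ONNX Gemm operator (Y = alpha * A' @ B' + beta * C);
# a sketch of the mapping idea, not Concrete-ML's internal code.
def numpy_gemm_sketch(a, b, c=0.0, alpha=1.0, beta=1.0, trans_a=0, trans_b=0):
    a = a.T if trans_a else a
    b = b.T if trans_b else b
    return alpha * (a @ b) + beta * c

Each supported ONNX operator is given such a NumPy counterpart, which is what later allows Concrete-Numpy to trace the computation.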

Torch to NumPy conversion using ONNX

The diagram below ("Torch compilation flow with ONNX") gives an overview of the steps involved in the conversion of an ONNX graph to an FHE-compatible format, i.e. a format that can be compiled to FHE through Concrete-Numpy.

All Concrete-ML builtin models follow the same pattern for FHE conversion:

  1. The models are trained with sklearn or torch.

  2. All models have a torch implementation for inference. This implementation is provided either by a third-party tool such as Hummingbird, or is implemented directly in Concrete-ML.

  3. The torch model is exported to ONNX (a minimal export sketch follows this list). For more information on the use of ONNX in Concrete-ML, see here.

  4. The Concrete-ML ONNX parser checks that all the operations in the ONNX graph are supported and assigns reference NumPy operations to them. This step produces a NumpyModule.

  5. Quantization is performed on the NumpyModule, producing a QuantizedModule. Two steps are performed: calibration and assignment of equivalent QuantizedOp objects to each ONNX operation. The QuantizedModule class is the quantized counterpart of the NumpyModule.

  6. Once the QuantizedModule is built, Concrete-Numpy is used to trace the ._forward() function of the QuantizedModule.
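
To make step 3 concrete, the following minimal sketch exports a toy torch module to ONNX with torch.onnx.export. The model, file name and opset version are illustrative assumptions, not the exact ones used by Concrete-ML.

import torch
from torch import nn

# Toy model standing in for the torch inference implementation of step 2 (illustrative).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Dummy input with the expected shape; the exporter traces the model with it.
dummy_input = torch.randn(1, 4)

# Step 3: export the torch model to an ONNX file (file name and opset are assumptions).
torch.onnx.export(model, dummy_input, "toy_model.onnx", opset_version=14)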

Moreover, by passing a user-provided nn.Module to step 2 of the above process, Concrete-ML supports custom user models. See the associated FHE-friendly model documentation for instructions about working with such models.

Once an ONNX model is imported, it is converted to a NumpyModule, then to a QuantizedModule and, finally, to an FHE circuit. However, as the diagram shows, it is perfectly possible to stop at the NumpyModule level if you just want to run the torch model as NumPy code without doing quantization.

Note that if you keep the obtained NumpyModule without quantizing it with Post Training Quantization (PTQ), it will not be convertible to FHE since the Concrete stack requires operators to use integers for computations.
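
When the goal is the full FHE path rather than just the NumpyModule, the steps above are typically driven by compile_torch_model from concrete.ml.torch.compile. The snippet below is a sketch assuming that entry point and its rough signature, which may differ between versions.

import numpy
from torch import nn

from concrete.ml.torch.compile import compile_torch_model

# A small custom model (illustrative).
torch_model = nn.Sequential(nn.Linear(4, 2))

# Representative input set used to calibrate the Post Training Quantization.
inputset = numpy.random.uniform(-1, 1, size=(100, 4))

# Full path: torch -> ONNX -> NumpyModule -> QuantizedModule -> FHE circuit.
quantized_module = compile_torch_model(torch_model, inputset, n_bits=2)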

Calibration

Calibration is the process of executing the NumpyModule with a representative set of data, in floating point. It allows computing statistics for all the intermediate tensors used in the network, in order to determine quantization parameters.
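
Conceptually, calibration records the range of each intermediate tensor on representative data and derives a scale and zero-point from it. The sketch below illustrates the idea for a single tensor with uniform quantization; it is not the actual Concrete-ML calibration code.

import numpy

# Illustrative calibration of one intermediate tensor (not Concrete-ML internals):
# record the observed floating-point range and derive uniform quantization parameters.
def calibrate_sketch(values: numpy.ndarray, n_bits: int):
    v_min, v_max = float(values.min()), float(values.max())
    # Map the observed range onto the integer range [0, 2**n_bits - 1];
    # guard against a constant tensor, which would give a zero scale.
    scale = (v_max - v_min) / (2**n_bits - 1) or 1.0
    zero_point = int(round(-v_min / scale))
    return scale, zero_point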

Quantization

Quantization is the process of converting floating point weights, inputs and activations to integers, according to the quantization parameters computed during Calibration.

Initializers (the model's trained parameters) are quantized according to n_bits and passed to the Post Training Quantization (PTQ) process.

Quantized operators are then used to create a QuantizedModule that, similarly to the NumpyModule, runs through the operators to perform the quantized inference with integer-only operations.

That QuantizedModule is then compilable to FHE if the intermediate values conform to the 8-bit precision limit of the Concrete stack.
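
Under the same uniform-quantization assumptions as the calibration sketch above, converting floating-point values to integers (and back, for inspection) looks roughly like this. Again, this is an illustration rather than the library's internal code.

import numpy

# Illustrative uniform quantization with the (scale, zero_point) computed during calibration.
def quantize_sketch(values: numpy.ndarray, scale: float, zero_point: int, n_bits: int):
    q = numpy.rint(values / scale) + zero_point
    # Clip so every value stays representable with n_bits integers.
    return numpy.clip(q, 0, 2**n_bits - 1).astype(numpy.int64)

def dequantize_sketch(q_values: numpy.ndarray, scale: float, zero_point: int):
    # Approximate reconstruction of the floating-point values, useful for inspection.
    return (q_values - zero_point) * scale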

Inspecting the ONNX models

In order to better understand how Concrete-ML works under the hood, it is possible to access each model in its ONNX format and then either print it or visualize it by importing the associated file in Netron. For example, with LogisticRegression:

import onnx
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from concrete.ml.sklearn import LogisticRegression

# Create the data for classification
x, y = make_classification(n_samples=100, class_sep=2, n_features=4, random_state=42)

# Retrieve train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=10, random_state=42
)

# Fix the number of bits used for quantization
model = LogisticRegression(n_bits=2)

# Fit the model
model.fit(X_train, y_train)

# Access the underlying ONNX model
onnx_model = model.onnx_model

# Print the model
print(onnx.helper.printable_graph(onnx_model.graph))

# Save the model
onnx.save(onnx_model, "tmp.onnx")

# And then visualize it with Netron

The NumpyModule stores the ONNX model that it interprets. The interpreter works by going through the ONNX graph in topological order, storing the intermediate results as it goes. To execute a node, the interpreter feeds the required inputs - taken either from the model inputs or the intermediate results - to the NumPy implementation of each ONNX node.
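
This mechanism can be sketched as a small loop over the graph nodes, dispatching each one to a NumPy implementation and keeping the produced tensors in a dictionary keyed by name. The toy interpreter below is illustrative only, not the NumpyModule code, and it handles just two operators, each assumed to have a single output.

import numpy
from onnx import GraphProto, numpy_helper

# Toy dispatch table from ONNX operator names to NumPy implementations (illustrative).
OPS_TO_NUMPY = {
    "Relu": lambda x: numpy.maximum(x, 0),
    "Add": lambda x, y: x + y,
}

def run_graph_sketch(graph: GraphProto, inputs: dict):
    # Intermediate results keyed by tensor name, seeded with the model inputs
    # and the initializers (trained parameters) stored in the graph.
    tensors = dict(inputs)
    for initializer in graph.initializer:
        tensors[initializer.name] = numpy_helper.to_array(initializer)
    # ONNX graphs list nodes in topological order, so a single pass suffices.
    for node in graph.node:
        node_inputs = [tensors[name] for name in node.input]
        tensors[node.output[0]] = OPS_TO_NUMPY[node.op_type](*node_inputs)
    return [tensors[output.name] for output in graph.output]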

Note that the NumpyModule interpreter currently supports only a subset of ONNX operators.

During the PTQ process, the ONNX model stored in the NumpyModule is interpreted and calibrated using the ONNX_OPS_TO_QUANTIZED_IMPL dictionary, which maps ONNX operators (e.g. Gemm) to their quantized equivalents (e.g. QuantizedGemm). For more information on implementing these operations, please see the FHE compatible op-graph section.