Concrete ML
WebsiteLibrariesProducts & ServicesDevelopersSupport
1.2
1.2
  • What is Concrete ML?
  • Getting Started
    • Installation
    • Key Concepts
    • Inference in the Cloud
    • Demos and Tutorials
  • Built-in Models
    • Linear Models
    • Tree-based Models
    • Neural Networks
    • Nearest Neighbors
    • Pandas
    • Built-in Model Examples
  • Deep Learning
    • Using Torch
    • Using ONNX
    • Step-by-step Guide
    • Deep Learning Examples
    • Debugging Models
    • Optimizing Inference
  • Deployment
    • Prediction with FHE
    • Hybrid models
    • Production Deployment
    • Serialization
  • Advanced topics
    • Quantization
    • Pruning
    • Compilation
    • Advanced Features
  • Developer Guide
    • Workflow
      • Set Up the Project
      • Set Up Docker
      • Documentation
      • Support and Issues
      • Contributing
    • Inner Workings
      • Importing ONNX
      • Quantization Tools
      • FHE Op-graph Design
      • External Libraries
    • API
      • concrete.ml.common.check_inputs.md
      • concrete.ml.common.debugging.custom_assert.md
      • concrete.ml.common.debugging.md
      • concrete.ml.common.md
      • concrete.ml.common.serialization.decoder.md
      • concrete.ml.common.serialization.dumpers.md
      • concrete.ml.common.serialization.encoder.md
      • concrete.ml.common.serialization.loaders.md
      • concrete.ml.common.serialization.md
      • concrete.ml.common.utils.md
      • concrete.ml.deployment.deploy_to_aws.md
      • concrete.ml.deployment.deploy_to_docker.md
      • concrete.ml.deployment.fhe_client_server.md
      • concrete.ml.deployment.md
      • concrete.ml.deployment.server.md
      • concrete.ml.deployment.utils.md
      • concrete.ml.onnx.convert.md
      • concrete.ml.onnx.md
      • concrete.ml.onnx.onnx_impl_utils.md
      • concrete.ml.onnx.onnx_model_manipulations.md
      • concrete.ml.onnx.onnx_utils.md
      • concrete.ml.onnx.ops_impl.md
      • concrete.ml.pytest.md
      • concrete.ml.pytest.torch_models.md
      • concrete.ml.pytest.utils.md
      • concrete.ml.quantization.base_quantized_op.md
      • concrete.ml.quantization.md
      • concrete.ml.quantization.post_training.md
      • concrete.ml.quantization.quantized_module.md
      • concrete.ml.quantization.quantized_module_passes.md
      • concrete.ml.quantization.quantized_ops.md
      • concrete.ml.quantization.quantizers.md
      • concrete.ml.search_parameters.md
      • concrete.ml.search_parameters.p_error_search.md
      • concrete.ml.sklearn.base.md
      • concrete.ml.sklearn.glm.md
      • concrete.ml.sklearn.linear_model.md
      • concrete.ml.sklearn.md
      • concrete.ml.sklearn.neighbors.md
      • concrete.ml.sklearn.qnn.md
      • concrete.ml.sklearn.qnn_module.md
      • concrete.ml.sklearn.rf.md
      • concrete.ml.sklearn.svm.md
      • concrete.ml.sklearn.tree.md
      • concrete.ml.sklearn.tree_to_numpy.md
      • concrete.ml.sklearn.xgb.md
      • concrete.ml.torch.compile.md
      • concrete.ml.torch.hybrid_model.md
      • concrete.ml.torch.md
      • concrete.ml.torch.numpy_module.md
      • concrete.ml.version.md
Powered by GitBook

Libraries

  • TFHE-rs
  • Concrete
  • Concrete ML
  • fhEVM

Developers

  • Blog
  • Documentation
  • Github
  • FHE resources

Company

  • About
  • Introduction to FHE
  • Media
  • Careers
On this page
  • Quantizing data
  • Quantized modules
  • Resources

Was this helpful?

Export as PDF
  1. Developer Guide
  2. Inner Workings

Quantization Tools

PreviousImporting ONNXNextFHE Op-graph Design

Last updated 1 year ago

Was this helpful?

Quantizing data

Concrete ML has support for quantized ML models and also provides quantization tools for Quantization Aware Training and Post-Training Quantization. The core of this functionality is the conversion of floating point values to integers and back. This is done using QuantizedArray in concrete.ml.quantization.

The class takes several arguments that determine how float values are quantized:

  • n_bits defines the precision used in quantization

  • values are floating point values that will be converted to integers

  • is_signed determines if the quantized integer values should allow negative values

  • is_symmetric determines if the range of floating point values to be quantized should be taken as symmetric around zero

See also the reference for more information:

from concrete.ml.quantization import QuantizedArray
import numpy
numpy.random.seed(0)
A = numpy.random.uniform(-2, 2, 10)
print("A = ", A)
# array([ 0.19525402,  0.86075747,  0.4110535,  0.17953273, -0.3053808,
#         0.58357645, -0.24965115,  1.567092 ,  1.85465104, -0.46623392])
q_A = QuantizedArray(7, A)
print("q_A.qvalues = ", q_A.qvalues)
# array([ 37,          73,          48,         36,          9,
#         58,          12,          112,        127,         0])
# the quantized integers values from A.
print("q_A.quantizer.scale = ", q_A.quantizer.scale)
# 0.018274684777173276, the scale S.
print("q_A.quantizer.zero_point = ", q_A.quantizer.zero_point)
# 26, the zero point Z.
print("q_A.dequant() = ", q_A.dequant())
# array([ 0.20102153,  0.85891018,  0.40204307,  0.18274685, -0.31066964,
#         0.58478991, -0.25584559,  1.57162289,  1.84574316, -0.4751418 ])
# Dequantized values.

It is also possible to use symmetric quantization, where the integer values are centered around 0:

q_A = QuantizedArray(3, A)
print("Unsigned: q_A.qvalues = ", q_A.qvalues)
print("q_A.quantizer.zero_point = ", q_A.quantizer.zero_point)
# Unsigned: q_A.qvalues =  [2 4 2 2 0 3 0 6 7 0]
# q_A.quantizer.zero_point =  1

q_A = QuantizedArray(3, A, is_signed=True, is_symmetric=True)
print("Signed Symmetric: q_A.qvalues = ", q_A.qvalues)
print("q_A.quantizer.zero_point = ", q_A.quantizer.zero_point)
# Signed Symmetric: q_A.qvalues =  [ 0  1  1  0  0  1  0  3  3 -1]
# q_A.quantizer.zero_point =  0

In the following example, showing the de-quantization of model outputs, the QuantizedArray class is used in a different way. Here it uses pre-quantized integer values and has the scale and zero-point set explicitly. Once the QuantizedArray is constructed, calling dequant() will compute the floating point values corresponding to the integer values qvalues, which are the output of the fhe_circuit.encrypt_run_decrypt(..) call.

import numpy
from concrete.ml.quantization.quantizers import QuantizationOptions

q_values = [0, 0, 1, 2, 3, -1]
QuantizedArray(
        q_A.quantizer.n_bits,
        q_values,
        value_is_float=False,
        options=q_A.quantizer.quant_options,
        stats=q_A.quantizer.quant_stats,
        params=q_A.quantizer.quant_params,
).dequant()

Quantized modules

Machine learning models are implemented with a diverse set of operations, such as convolution, linear transformations, activation functions, and element-wise operations. When working with quantized values, these operations cannot be carried out in an equivalent way to floating point values. With quantization, it is necessary to re-scale the input and output values of each operation to fit in the quantization domain.

In Concrete ML, the quantized equivalent of a scikit-learn model or a PyTorch nn.Module is the QuantizedModule. Note that only inference is implemented in the QuantizedModule, and it is built through a conversion of the inference function of the corresponding scikit-learn or PyTorch module.

Built-in neural networks expose the quantized_module member, while a QuantizedModule is also the result of the compilation of custom models through compile_torch_model and compile_brevitas_qat_model.

Calibration is the process of determining the typical distributions of values encountered for the intermediate values of a model during inference.

Resources

The quantized versions of floating point model operations are stored in the QuantizedModule. The ONNX_OPS_TO_QUANTIZED_IMPL dictionary maps ONNX floating point operators (e.g., Gemm) to their quantized equivalent (e.g., QuantizedGemm). For more information on implementing these operations, please see the .

The computation graph is taken from the corresponding floating point ONNX graph exported from scikit-learn , or from the ONNX graph exported by PyTorch. Calibration is used to obtain quantized parameters for the operations in the QuantizedModule. Parameters are also determined for the quantization of inputs during model deployment.

To perform calibration, an interpreter goes through the ONNX graph in and stores the intermediate results as it goes. The statistics of these values determine quantization parameters.

That QuantizedModule generates the Concrete function that is compiled to FHE. The compilation will succeed if the intermediate values conform to the 16-bits precision limit of the Concrete stack. See for details.

Lei Mao's blog on quantization:

Google paper on neural network quantization and integer-only inference:

FHE-compatible op-graph section
topological order
the compilation section
Quantization for Neural Networks
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
using HummingBird
QuantizedArray
UniformQuantizer