
module concrete.ml.quantization.post_training

Post Training Quantization methods.

Global Variables

  • ONNX_OPS_TO_NUMPY_IMPL

  • DEFAULT_MODEL_BITS

  • ONNX_OPS_TO_QUANTIZED_IMPL


function get_n_bits_dict

get_n_bits_dict(n_bits: Union[int, Dict[str, int]]) → Dict[str, int]

Convert the n_bits parameter into a proper dictionary.

Args:

  • n_bits (int, Dict[str, int]): number of bits for quantization; can be a single value or a dictionary with the following keys: "op_inputs" and "op_weights" (mandatory), and "model_inputs" and "model_outputs" (optional, defaulting to 5 bits). When a single integer is given for n_bits, its value is assigned to "op_inputs" and "op_weights", and the maximum of this value and the default (5) is assigned to "model_inputs" and "model_outputs". This default is a compromise between model accuracy and runtime performance in FHE. "model_outputs" gives the precision of the final network's outputs, while "model_inputs" gives the precision of the network's inputs. "op_inputs" and "op_weights" control the quantization of the inputs and weights of all layers.

Returns:

  • n_bits_dict (Dict[str, int]): A dictionary properly representing the number of bits to use for quantization.
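
For illustration, a minimal sketch of the expansion described above (the printed dictionaries follow the rules documented here; key order may differ across versions):

from concrete.ml.quantization.post_training import get_n_bits_dict

# A single integer is assigned to "op_inputs" and "op_weights";
# "model_inputs" and "model_outputs" receive max(value, 5).
print(get_n_bits_dict(3))
# {"op_inputs": 3, "op_weights": 3, "model_inputs": 5, "model_outputs": 5}

# A dictionary must contain at least the mandatory keys;
# "model_inputs" and "model_outputs" fall back to the 5-bit default.
print(get_n_bits_dict({"op_inputs": 4, "op_weights": 4}))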


class ONNXConverter

Base ONNX to Concrete ML computation graph conversion class.

This class provides a method to parse an ONNX graph and apply several transformations. First, it creates QuantizedOps for each ONNX graph op. These quantized ops have calibrated quantizers that are useful when the operators work on integer data or when the output of the ops is the output of the encrypted program. For operators that compute in float and will be merged into TLUs, these quantizers are not used. Second, the converter creates quantized tensors for the initializers and weights stored in the graph.

This class should be sub-classed to provide specific calibration and quantization options, depending on the usage (post-training quantization vs. quantization-aware training).

Arguments:

  • n_bits (int, Dict[str, int]): number of bits for quantization; can be a single value or a dictionary with the following keys: "op_inputs" and "op_weights" (mandatory), and "model_inputs" and "model_outputs" (optional, defaulting to 5 bits). When a single integer is given for n_bits, its value is assigned to "op_inputs" and "op_weights", and the maximum of this value and the default (5) is assigned to "model_inputs" and "model_outputs". This default is a compromise between model accuracy and runtime performance in FHE. "model_outputs" gives the precision of the final network's outputs, while "model_inputs" gives the precision of the network's inputs. "op_inputs" and "op_weights" control the quantization of the inputs and weights of all layers.

  • numpy_model (NumpyModule): Model in numpy.

  • rounding_threshold_bits (int): if not None, every accumulator in the model is rounded down to the given number of bits of precision

method __init__

__init__(
    n_bits: Union[int, Dict],
    numpy_model: NumpyModule,
    rounding_threshold_bits: Optional[int] = None
)

property n_bits_model_inputs

Get the number of bits to use for the quantization of the model's inputs.

Returns:

  • n_bits (int): number of bits for input quantization


property n_bits_model_outputs

Get the number of bits to use for the quantization of the last layer's output.

Returns:

  • n_bits (int): number of bits for output quantization


property n_bits_op_inputs

Get the number of bits to use for the quantization of any operators' inputs.

Returns:

  • n_bits (int): number of bits for the quantization of the operators' inputs


property n_bits_op_weights

Get the number of bits to use for the quantization of any constants (usually weights).

Returns:

  • n_bits (int): number of bits for quantizing constants used by operators


method quantize_module

quantize_module(*calibration_data: ndarray) → QuantizedModule

Quantize numpy module.

Following https://arxiv.org/abs/1712.05877 guidelines.
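
That paper's affine scheme maps a real value r to an integer q through a scale S and a zero-point Z, with r = S * (q - Z). A minimal sketch of this round-trip in plain NumPy (not the Concrete ML quantizer API):

import numpy as np

def quantize(r, scale, zero_point, n_bits=5):
    # q = round(r / S) + Z, clipped to the unsigned n_bits range
    q = np.round(r / scale) + zero_point
    return np.clip(q, 0, 2**n_bits - 1).astype(np.int64)

def dequantize(q, scale, zero_point):
    # r ≈ S * (q - Z)
    return scale * (q.astype(np.float64) - zero_point)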

Args:

  • *calibration_data (numpy.ndarray): Data that will be used to compute the bounds, scales and zero point values for every quantized object.

Returns:

  • QuantizedModule: Quantized numpy module


class PostTrainingAffineQuantization

Post-training Affine Quantization.

Create the quantized version of the passed numpy module.

Args:

  • n_bits (int, Dict): Number of bits used to quantize the model. If an int is passed for n_bits, the value is used for activations, inputs and weights. If a dict is passed, it should contain the keys "model_inputs", "op_inputs", "op_weights" and "model_outputs", giving the number of quantization bits for, respectively: the model inputs, the layer input values, the learned parameters or constants in the network, and the final model outputs.

  • numpy_model (NumpyModule): Model in numpy.

  • rounding_threshold_bits (int): if not None, every accumulator in the model is rounded down to the given number of bits of precision

  • is_signed (bool): Whether the weights of the layers can be signed. Currently, only the weights can be signed.

Returns:

  • QuantizedModule: A quantized version of the numpy model.
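
A minimal usage sketch, wrapping a torch model into a NumpyModule before quantization. The NumpyModule constructor arguments shown here are an assumption and may differ across Concrete ML versions; the PostTrainingAffineQuantization and quantize_module calls follow the signatures documented on this page.

import numpy
import torch
from concrete.ml.torch.numpy_module import NumpyModule
from concrete.ml.quantization.post_training import PostTrainingAffineQuantization

torch_model = torch.nn.Sequential(torch.nn.Linear(10, 4), torch.nn.ReLU())
dummy_input = torch.randn(1, 10)

# Wrap the torch model as a NumpyModule (assumed constructor signature)
numpy_model = NumpyModule(torch_model, dummy_input)

# n_bits as documented above: a single int, or a per-key dictionary
post_training = PostTrainingAffineQuantization(n_bits=5, numpy_model=numpy_model)

# Calibration data is used to compute bounds, scales and zero-points
calibration_data = numpy.random.randn(100, 10).astype(numpy.float32)
quantized_module = post_training.quantize_module(calibration_data)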

method __init__

__init__(
    n_bits: Union[int, Dict],
    numpy_model: NumpyModule,
    rounding_threshold_bits: Optional[int] = None
)

property n_bits_model_inputs

Get the number of bits to use for the quantization of the model's inputs.

Returns:

  • n_bits (int): number of bits for input quantization


property n_bits_model_outputs

Get the number of bits to use for the quantization of the last layer's output.

Returns:

  • n_bits (int): number of bits for output quantization


property n_bits_op_inputs

Get the number of bits to use for the quantization of any operators' inputs.

Returns:

  • n_bits (int): number of bits for the quantization of the operators' inputs


property n_bits_op_weights

Get the number of bits to use for the quantization of any constants (usually weights).

Returns:

  • n_bits (int): number of bits for quantizing constants used by operators


method quantize_module

quantize_module(*calibration_data: ndarray) → QuantizedModule

Quantize numpy module.

Following https://arxiv.org/abs/1712.05877 guidelines.

Args:

  • *calibration_data (numpy.ndarray): Data that will be used to compute the bounds, scales and zero point values for every quantized object.

Returns:

  • QuantizedModule: Quantized numpy module


class PostTrainingQATImporter

Converter of Quantization Aware Training networks.

This class provides specific configuration for QAT networks during ONNX network conversion to Concrete ML computation graphs.
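
Usage mirrors PostTrainingAffineQuantization, but the wrapped network's graph is expected to already carry quantizers from quantization-aware training (for example, a Brevitas export). A hedged sketch, reusing numpy_model and calibration_data from the sketch above:

from concrete.ml.quantization.post_training import PostTrainingQATImporter

# The importer reads the quantization options baked into the QAT network
# rather than calibrating entirely new quantizers.
qat_importer = PostTrainingQATImporter(n_bits=5, numpy_model=numpy_model)
quantized_module = qat_importer.quantize_module(calibration_data)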

method __init__

__init__(
    n_bits: Union[int, Dict],
    numpy_model: NumpyModule,
    rounding_threshold_bits: Optional[int] = None
)

property n_bits_model_inputs

Get the number of bits to use for the quantization of the model's inputs.

Returns:

  • n_bits (int): number of bits for input quantization


property n_bits_model_outputs

Get the number of bits to use for the quantization of the last layer's output.

Returns:

  • n_bits (int): number of bits for output quantization


property n_bits_op_inputs

Get the number of bits to use for the quantization of any operators' inputs.

Returns:

  • n_bits (int): number of bits for the quantization of the operators' inputs


property n_bits_op_weights

Get the number of bits to use for the quantization of any constants (usually weights).

Returns:

  • n_bits (int): number of bits for quantizing constants used by operators


method quantize_module

quantize_module(*calibration_data: ndarray) → QuantizedModule

Quantize numpy module.

Following https://arxiv.org/abs/1712.05877 guidelines.

Args:

  • *calibration_data (numpy.ndarray): Data that will be used to compute the bounds, scales and zero point values for every quantized object.

Returns:

  • QuantizedModule: Quantized numpy module
