concrete.ml.quantization.post_training
Post Training Quantization methods.
ONNX_OPS_TO_NUMPY_IMPL
DEFAULT_MODEL_BITS
ONNX_OPS_TO_QUANTIZED_IMPL
get_n_bits_dict
Convert the n_bits parameter into a proper dictionary.
Args:
n_bits
(int, Dict[str, int]): Number of bits for quantization. This can be a single value or a dictionary with the following keys:
- "op_inputs" and "op_weights" (mandatory)
- "model_inputs" and "model_outputs" (optional, default to 5 bits)
When using a single integer for n_bits, its value is assigned to "op_inputs" and "op_weights". The maximum between this value and a default value (5) is then assigned to "model_inputs" and "model_outputs". This default value is a compromise between model accuracy and runtime performance in FHE. "model_outputs" gives the precision of the final network's outputs, while "model_inputs" gives the precision of the network's inputs. "op_inputs" and "op_weights" control the quantization of the inputs and weights of all layers.
Returns:
n_bits_dict
(Dict[str, int]): A dictionary properly representing the number of bits to use for quantization.
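For illustration, a minimal usage sketch of this function. The asserted values follow the rules stated in the docstring above (a single integer sets "op_inputs" and "op_weights" directly, while "model_inputs" and "model_outputs" take the maximum of that value and the 5-bit default):

```python
from concrete.ml.quantization.post_training import get_n_bits_dict

# Single integer: "op_inputs" and "op_weights" take the value directly, while
# "model_inputs" and "model_outputs" take max(value, 5) as described above
n_bits_dict = get_n_bits_dict(4)
assert n_bits_dict["op_inputs"] == 4 and n_bits_dict["op_weights"] == 4
assert n_bits_dict["model_inputs"] == 5 and n_bits_dict["model_outputs"] == 5

# Dictionary form: "op_inputs" and "op_weights" are mandatory, while
# "model_inputs" and "model_outputs" are optional and default to 5 bits
n_bits_dict = get_n_bits_dict({"op_inputs": 7, "op_weights": 7})
assert n_bits_dict["model_inputs"] == 5
```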
ONNXConverter
Base ONNX to Concrete ML computation graph conversion class.
This class provides a method to parse an ONNX graph and apply several transformations. First, it creates a QuantizedOp for each ONNX graph op. These quantized ops have calibrated quantizers that are useful when the operators work on integer data or when the output of the ops is the output of the encrypted program. For operators that compute in float and will be merged into TLUs, these quantizers are not used. Second, the converter creates quantized tensors for the initializers and weights stored in the graph.
This class should be sub-classed to provide specific calibration and quantization options, depending on the usage (post-training quantization vs. quantization-aware training).
Arguments:
n_bits
(int, Dict[str, int]): Number of bits for quantization. This can be a single value or a dictionary with the following keys:
- "op_inputs" and "op_weights" (mandatory)
- "model_inputs" and "model_outputs" (optional, default to 5 bits)
When using a single integer for n_bits, its value is assigned to "op_inputs" and "op_weights". The maximum between this value and a default value (5) is then assigned to "model_inputs" and "model_outputs". This default value is a compromise between model accuracy and runtime performance in FHE. "model_outputs" gives the precision of the final network's outputs, while "model_inputs" gives the precision of the network's inputs. "op_inputs" and "op_weights" control the quantization of the inputs and weights of all layers.
numpy_model
(NumpyModule): Model in numpy.
rounding_threshold_bits
(int): If not None, every accumulator in the model is rounded down to the given number of bits of precision.
__init__
property n_bits_model_inputs
Get the number of bits to use for the quantization of the first layer's output.
Returns:
n_bits
(int): number of bits for input quantization
property n_bits_model_outputs
Get the number of bits to use for the quantization of the last layer's output.
Returns:
n_bits
(int): number of bits for output quantization
property n_bits_op_inputs
Get the number of bits to use for the quantization of any operators' inputs.
Returns:
n_bits
(int): number of bits for the quantization of the operators' inputs
property n_bits_op_weights
Get the number of bits to use for the quantization of any constants (usually weights).
Returns:
n_bits
(int): number of bits for quantizing constants used by operators
quantize_module
Quantize numpy module.
Follows the guidelines of https://arxiv.org/abs/1712.05877.
Args:
*calibration_data (numpy.ndarray)
: Data that will be used to compute the bounds, scales and zero point values for every quantized object.
Returns:
QuantizedModule
: Quantized numpy module
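A hedged end-to-end sketch of this calibration flow. Since ONNXConverter is meant to be sub-classed, the example uses the PostTrainingAffineQuantization subclass described below; the NumpyModule import path and the toy torch model are assumptions for illustration, not part of this module's API:

```python
import numpy
import torch
from concrete.ml.torch.numpy_module import NumpyModule
from concrete.ml.quantization.post_training import PostTrainingAffineQuantization

# Trace a toy torch model into a NumpyModule, the input format expected here
torch_model = torch.nn.Sequential(torch.nn.Linear(10, 4), torch.nn.ReLU())
numpy_model = NumpyModule(torch_model, torch.randn(1, 10))

# Calibrate quantizers on representative data and build the QuantizedModule
post_training = PostTrainingAffineQuantization(n_bits=7, numpy_model=numpy_model)
calibration_data = numpy.random.uniform(-1, 1, size=(100, 10)).astype(numpy.float32)
quantized_module = post_training.quantize_module(calibration_data)
```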
PostTrainingAffineQuantization
Post-training Affine Quantization.
Create the quantized version of the passed numpy module.
Args:
n_bits
(int, Dict): Number of bits to quantize the model. If an int is passed, the value is used for activations, inputs and weights. If a dict is passed, it should contain the keys "model_inputs", "op_inputs", "op_weights" and "model_outputs", with the corresponding number of quantization bits:
- model_inputs: number of bits for the model's inputs
- op_inputs: number of bits to quantize layer input values
- op_weights: number of bits for learned parameters or constants in the network
- model_outputs: number of bits for the final model output
numpy_model
(NumpyModule): Model in numpy.
rounding_threshold_bits
(int): If not None, every accumulator in the model is rounded down to the given number of bits of precision.
is_signed
(bool): Whether the weights of the layers can be signed. Currently, only the weights can be signed.
Returns:
QuantizedModule
: A quantized version of the numpy model.
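A hedged sketch of the dictionary form of n_bits, following the key set documented above. The per-category bit-widths and the toy model are hypothetical choices for illustration:

```python
import torch
from concrete.ml.torch.numpy_module import NumpyModule
from concrete.ml.quantization.post_training import PostTrainingAffineQuantization

# Hypothetical per-category bit-widths, using the key set described above
n_bits = {
    "model_inputs": 5,   # precision of the model's inputs
    "op_inputs": 6,      # precision of every layer's input values
    "op_weights": 6,     # precision of learned parameters and constants
    "model_outputs": 5,  # precision of the final model outputs
}

numpy_model = NumpyModule(torch.nn.Linear(10, 2), torch.randn(1, 10))
post_training = PostTrainingAffineQuantization(
    n_bits=n_bits, numpy_model=numpy_model, is_signed=True
)
```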
__init__
property n_bits_model_inputs
Get the number of bits to use for the quantization of the first layer's output.
Returns:
n_bits
(int): number of bits for input quantization
property n_bits_model_outputs
Get the number of bits to use for the quantization of the last layer's output.
Returns:
n_bits
(int): number of bits for output quantization
property n_bits_op_inputs
Get the number of bits to use for the quantization of any operators' inputs.
Returns:
n_bits
(int): number of bits for the quantization of the operators' inputs
property n_bits_op_weights
Get the number of bits to use for the quantization of any constants (usually weights).
Returns:
n_bits
(int): number of bits for quantizing constants used by operators
quantize_module
Quantize numpy module.
Follows the guidelines of https://arxiv.org/abs/1712.05877.
Args:
*calibration_data (numpy.ndarray)
: Data that will be used to compute the bounds, scales and zero point values for every quantized object.
Returns:
QuantizedModule
: Quantized numpy module
PostTrainingQATImporter
Converter for Quantization Aware Training (QAT) networks.
This class provides specific configuration for QAT networks during ONNX network conversion to Concrete ML computation graphs.
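A hedged sketch of the intended flow. Here `numpy_model` is a placeholder for a NumpyModule traced from a network trained with quantization-aware training (for instance a Brevitas model exported to ONNX); it is assumed rather than constructed, since building one requires a QAT training setup:

```python
import numpy
from concrete.ml.quantization.post_training import PostTrainingQATImporter

# `numpy_model` is assumed: a NumpyModule built from a QAT-trained network.
# This importer applies QAT-specific configuration during ONNX conversion.
qat_importer = PostTrainingQATImporter(n_bits=4, numpy_model=numpy_model)

# Calibration data is still needed to compute bounds, scales and zero points
calibration_data = numpy.random.uniform(-1, 1, size=(100, 10)).astype(numpy.float32)
quantized_module = qat_importer.quantize_module(calibration_data)
```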
__init__
property n_bits_model_inputs
Get the number of bits to use for the quantization of the first layer's output.
Returns:
n_bits
(int): number of bits for input quantization
property n_bits_model_outputs
Get the number of bits to use for the quantization of the last layer's output.
Returns:
n_bits
(int): number of bits for output quantization
property n_bits_op_inputs
Get the number of bits to use for the quantization of any operators' inputs.
Returns:
n_bits
(int): number of bits for the quantization of the operators' inputs
property n_bits_op_weights
Get the number of bits to use for the quantization of any constants (usually weights).
Returns:
n_bits
(int): number of bits for quantizing constants used by operators
quantize_module
Quantize numpy module.
Follows the guidelines of https://arxiv.org/abs/1712.05877.
Args:
*calibration_data (numpy.ndarray)
: Data that will be used to compute the bounds, scales and zero point values for every quantized object.
Returns:
QuantizedModule
: Quantized numpy module