`QuantizedOp` instances may need to quantize their inputs or the result of their computation, depending on their position in the graph.
The `QuantizedOp` class provides a generic implementation of an ONNX operation, including quantization of inputs and outputs, with the computation implemented in NumPy in `ops_impl.py`. The architecture of a `QuantizedOp` can be pictured as the following structure:
A `QuantizedOp` has a body that implements the computation of the operation, following the ONNX spec. The operation's body can take either integer or float inputs and can output float or integer values. Two quantizers are attached to the operation: one that takes float inputs and produces the integer inputs, and one that does the same for the output.
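This structure can be sketched in plain NumPy. The function names, scales, and zero points below are illustrative stand-ins, not the actual Concrete-ML API:

```python
import numpy

def quantize(x, scale, zero_point):
    """Map float values to quantized integers (illustrative formula)."""
    return numpy.round(x / scale) + zero_point

def dequantize(q, scale, zero_point):
    """Inverse mapping, back to float values."""
    return (q - zero_point) * scale

def quantized_relu(q_x):
    x = dequantize(q_x, scale=0.1, zero_point=0)  # input quantizer (inverse)
    y = numpy.maximum(x, 0.0)                     # op body, per the ONNX spec
    return quantize(y, scale=0.1, zero_point=0)   # output quantizer
```

Here the body works on floats, while the attached quantizers convert from and to integers at the boundaries.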
Depending on its position in the graph and on its inputs, a `QuantizedOp` can be fully fused to a TLU.
Some operations must compute directly on integers when the `QuantizedOp` is applied. These types of operation are implemented with a class that derives from `QuantizedOp` and overrides `q_impl`, such as `QuantizedGemm`.
The `QuantizedOp` instances in the graph are evaluated in sequence, and Concrete-Numpy is used to trace the execution and compile it to FHE. In this chain of function calls, all groups of instructions that operate in floating point will be fused to table lookups (TLUs). In FHE, such a lookup table is computed with a PBS.
Since the encrypted function takes integers as inputs, the input needs to be dequantized first.
`QuantizedOp` is the base class for all ONNX quantized operators. It abstracts away many details to allow easy implementation of new quantized ops.
The `QuantizedOp` class exposes a `can_fuse` function that determines whether the operation can be fused to a TLU: fusion is possible when all the inputs of the `QuantizedOp` are produced by a unique integer tensor. Thus, the `can_fuse` function of some `QuantizedOp` types (addition, subtraction) will allow fusion to take place if both operands are produced by a unique integer tensor.
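A minimal sketch of such an override, assuming a hypothetical `_int_input_names` attribute that tracks which integer tensors feed the op (the real class stores this differently):

```python
class QuantizedOp:
    """Simplified stand-in for the real base class."""

    def __init__(self, int_input_names):
        # Hypothetical field: the distinct integer tensors feeding this op.
        self._int_input_names = set(int_input_names)

    def can_fuse(self):
        # Conservative default: the op cannot be fused to a TLU.
        return False

class QuantizedAdd(QuantizedOp):
    def can_fuse(self):
        # Addition can be fused only when both operands are produced
        # by a unique integer tensor.
        return len(self._int_input_names) == 1
```

For example, `x + x` (one producer tensor) is fusable, while `x + y` (two producers) is not.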
You can check `ops_impl.py` to see how some operations are implemented with NumPy. The declaration convention for these operations uses:

- a `/`, which marks the limit of the positional arguments
- a `*`, which marks the limit of the positional or keyword arguments
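As an illustration, a hypothetical element-wise operation following this convention could be declared as (the op name and attributes are made up for the example):

```python
import numpy

# Tensor inputs are positional-only (before the `/`); ONNX attributes
# are keyword-only (after the `*`). The output is returned as a tuple.
def numpy_clip_like(x, /, *, minimum=0.0, maximum=6.0):
    return (numpy.clip(x, minimum, maximum),)
```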
This convention allows the `QuantizedOp` class to properly populate metadata automatically. It uses the Python `inspect` module and stores, for each argument, relevant information related to its positional/keyword status. This allows using the Concrete-NumPy implementation as the specification for the `QuantizedOp`, which removes some data duplication and allows having a single source of truth for the `QuantizedOp` and ONNX NumPy implementations.
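The metadata extraction can be sketched with the standard `inspect` module; the operation below is a hypothetical example, not one of the real implementations:

```python
import inspect

# Hypothetical declaration following the ops_impl.py convention.
def numpy_example_op(a, b, /, *, alpha=1.0):
    return (a * alpha + b,)

# The positional/keyword status of each argument can be recovered
# automatically from the signature, with no duplicated metadata.
kinds = {
    name: parameter.kind.name
    for name, parameter in inspect.signature(numpy_example_op).parameters.items()
}
```

The tensor inputs `a` and `b` are classified as positional-only, while the attribute `alpha` is keyword-only.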
When adding a new quantized op (e.g. `QuantizedGemm`), you can just set `_impl_for_op_named` to the name of the ONNX op for which the quantized class is implemented (this uses the mapping in `onnx_utils.py` to get the correct implementation).
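A sketch of this mechanism, where the mapping dictionary and the base-class internals are simplified stand-ins for what `onnx_utils.py` and the real `QuantizedOp` provide:

```python
import numpy

# Stand-in for the ONNX-name -> NumPy-implementation mapping.
ONNX_OPS_TO_NUMPY_IMPL = {"Relu": lambda x: (numpy.maximum(x, 0.0),)}

class QuantizedOp:
    """Simplified stand-in for the real base class."""
    _impl_for_op_named = None

    @classmethod
    def impl(cls):
        # Look up the float NumPy implementation by ONNX op name.
        return ONNX_OPS_TO_NUMPY_IMPL[cls._impl_for_op_named]

class QuantizedRelu(QuantizedOp):
    # Declaring the ONNX op name is enough to wire in the implementation.
    _impl_for_op_named = "Relu"
```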
In other cases, you need to sub-class `QuantizedOp` to create a new operation. This sub-class must override `q_impl` in order to provide an integer implementation.
`QuantizedGemm` is an example of such a case, where quantized matrix multiplication requires proper handling of scales and zero points. The `q_impl` of that class reflects this.
In `q_impl`, in order to obtain quantized integer values, you can use the `_prepare_inputs_with_constants` function.
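A self-contained sketch of this usage; the `calibrate` and `quantize_actual_values` keyword arguments are assumed from context, and both classes below are simplified stand-ins for the real ones:

```python
import numpy

class QuantizedArray:
    """Stand-in: stores a scale/zero-point and the quantized qvalues."""
    def __init__(self, values, scale=0.1, zero_point=0):
        self.scale, self.zero_point = scale, zero_point
        self.qvalues = numpy.round(values / scale) + zero_point

class QuantizedExampleOp:
    def _prepare_inputs_with_constants(self, *inputs, calibrate, quantize_actual_values):
        # Quantize every float input so that q_impl can work on integers.
        return [QuantizedArray(x) for x in inputs]

    def q_impl(self, *inputs):
        prepared_inputs = self._prepare_inputs_with_constants(
            *inputs, calibrate=False, quantize_actual_values=True
        )
        # prepared_inputs[0].qvalues now holds the quantized integers.
        return prepared_inputs[0].qvalues
```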
`prepared_inputs` will contain one or more `QuantizedArray`, of which the `qvalues` are the quantized integers.
The result of the `q_impl` function must be a single `QuantizedArray`. Most commonly, this is built using the dequantized results of the processing done in `q_impl`.
In `q_impl`, you can check whether the current operation can be fused by calling `self.can_fuse()`. You can then have both a floating point and an integer implementation; the traced execution path will depend on the result of `can_fuse()`.
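This branching can be sketched as follows; the class name, quantization parameters, and the trivial integer path are illustrative only:

```python
import numpy

class QuantizedSigmoidSketch:
    """Sketch: q_impl chooses its execution path using can_fuse()."""

    def __init__(self, fusable):
        self._fusable = fusable

    def can_fuse(self):
        return self._fusable

    def q_impl(self, q_x, scale=0.05, zero_point=0):
        if self.can_fuse():
            # Fusable case: a floating point implementation, which the
            # tracer will later turn into a single TLU.
            x = (q_x - zero_point) * scale
            return 1.0 / (1.0 + numpy.exp(-x))
        # Non-fusable case: an integer implementation (trivial stand-in).
        return q_x
```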