This document details the concept of rounding, and how it is used in Concrete to make some FHE computations especially faster.
Table lookups have a strict constraint on the number of bits they support. This can be limiting, especially if you don't need exact precision. As well as this, using larger bit-widths leads to slower table lookups.
To overcome these issues, rounded table lookups are introduced. This operation provides a way to round the least significant bits of a large integer and then apply the table lookup on the resulting (smaller) value.
Imagine you have a 5-bit value, but you want to have a 3-bit table lookup. You can call fhe.round_bit_pattern(input, lsbs_to_remove=2) and use the 3-bit value you receive as input to the table lookup.
If the rounded number is one of the last 2**(lsbs_to_remove - 1) numbers in the input range [0, 2**original_bit_width), an overflow will happen.
By default, if an overflow is encountered during inputset evaluation, bit-widths will be adjusted accordingly. This results in a loss of speed, but ensures accuracy.
You can turn this overflow protection off (e.g., for performance) by using fhe.round_bit_pattern(..., overflow_protection=False). However, this could lead to unexpected behavior at runtime.
The reason why the speed-up is not increasing with lsbs_to_remove is because the rounding operation itself has a cost: each bit removal is a PBS. Therefore, if a lot of bits are removed, rounding itself could take longer than the bigger TLU which is evaluated afterwards.
and displays:
Feel free to disable overflow protection and see what happens.
Auto Rounders
Rounding is very useful but, in some cases, you don't know how many bits your input contains, so it's not reliable to specify lsbs_to_remove manually. For this reason, the AutoRounder class is introduced.
AutoRounder allows you to set how many of the most significant bits to keep, but they need to be adjusted using an inputset to determine how many of the least significant bits to remove. This can be done manually using fhe.AutoRounder.adjust(function, inputset), or by setting auto_adjust_rounders configuration to True during compilation.
AutoRounders should be defined outside the function that is being compiled. They are used to store the result of the adjustment process, so they shouldn't be created each time the function is called. Furthermore, each AutoRounder should be used with exactly one round_bit_pattern call.
Exactness
One use of rounding is doing faster computation by ignoring the lower significant bits. For this usage, you can even get faster results if you accept the rounding it-self to be slightly inexact. The speedup is usually around 2x-3x but can be higher for big precision reduction. This also enable higher precisions values that are not possible otherwise.
*Using the default configuration in approximate mode. For 3, 4, 5 and 6 reduced precision bits and accumulator precision up to 32bits
You can turn on this mode either globally on the configuration:
v = fhe.round_bit_pattern(v, lsbs_to_remove=2, exactness=fhe.Exactness.APPROXIMATE)v = fhe.round_bit_pattern(v, lsbs_to_remove=2, exactness=fhe.Exactness.EXACT)
In approximate mode the rounding threshold up or down is not perfectly centered: The off-centering is:
is bounded, i.e. at worst an off-by-one on the reduced precision value compared to the exact result,
is pseudo-random, i.e. it will be different on each call,
almost symmetrically distributed,
depends on cryptographic properties like the encryption mask, the encryption noise and the crypto-parameters.
In blue the exact value, the red dots are approximate values due to off-centered transition in approximate mode.
Histogram of transitions off-centering delta. Each count correspond to a specific random mask and a specific encryption noise.
Approximate rounding features
With approximate rounding, you can enable an approximate clipping to get further improve performance in the case of overflow handling. Approximate clipping enable to discard the extra bit of overflow protection bit in the successor TLU. For consistency a logical clipping is available when this optimization is not suitable.
Logical clipping
When fast approximate clipping is not suitable (i.e. slower), it's better to apply logical clipping for consistency and better resilience to code change. It has no extra cost since it's fuzed with the successor TLU.
Only the last step is clipped.
Approximate clipping
This set the first precision where approximate clipping is enabled, starting from this precision, an extra small precision TLU is introduced to safely remove the extra precision bit used to contain overflow. This way the successor TLU is faster. E.g. for a rounding to 7bits, that finishes to a TLU of 8bits due to overflow, forcing to use a TLU of 7bits is 3x faster.