Table lookups have a strict constraint on the number of bits they support. This can be limiting, especially if you don't need exact precision. As well as this, using larger bit-widths leads to slower table lookups.
To overcome these issues, rounded table lookups are introduced. This operation provides a way to round the least significant bits of a large integer and then apply the table lookup on the resulting (smaller) value.
Imagine you have a 5-bit value, but you want to have a 3-bit table lookup. You can call fhe.round_bit_pattern(input, lsbs_to_remove=2) and use the 3-bit value you receive as input to the table lookup.
If the rounded number is one of the last 2**(lsbs_to_remove - 1) numbers in the input range [0, 2**original_bit_width), an overflow will happen.
By default, if an overflow is encountered during inputset evaluation, bit-widths will be adjusted accordingly. This results in a loss of speed, but ensures accuracy.
You can turn this overflow protection off (e.g., for performance) by using fhe.round_bit_pattern(..., overflow_protection=False). However, this could lead to unexpected behavior at runtime.
The reason why the speed-up is not increasing with lsbs_to_remove is because the rounding operation itself has a cost: each bit removal is a PBS. Therefore, if a lot of bits are removed, rounding itself could take longer than the bigger TLU which is evaluated afterwards.
and displays:
Feel free to disable overflow protection and see what happens.
Auto Rounders
Rounding is very useful but, in some cases, you don't know how many bits your input contains, so it's not reliable to specify lsbs_to_remove manually. For this reason, the AutoRounder class is introduced.
AutoRounder allows you to set how many of the most significant bits to keep, but they need to be adjusted using an inputset to determine how many of the least significant bits to remove. This can be done manually using fhe.AutoRounder.adjust(function, inputset), or by setting auto_adjust_rounders configuration to True during compilation.
AutoRounders should be defined outside the function that is being compiled. They are used to store the result of the adjustment process, so they shouldn't be created each time the function is called. Furthermore, each AutoRounder should be used with exactly one round_bit_pattern call.