# Rounding

This document explains rounding and how Concrete uses it to make some FHE computations considerably faster.

Table lookups have a strict constraint on the number of bits they support. This can be limiting, especially if you don't need exact precision. In addition, larger bit-widths lead to slower table lookups.

To overcome these issues, rounded table lookups are introduced. This operation provides a way to round the least significant bits of a large integer and then apply the table lookup on the resulting (smaller) value.

Imagine you have a 5-bit value, but you want to have a 3-bit table lookup. You can call `fhe.round_bit_pattern(input, lsbs_to_remove=2)` and use the 3-bit value you receive as input to the table lookup.
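The bit manipulation behind this call can be emulated in plain Python. The sketch below does not use Concrete at all: `round_bit_pattern_emulated` is a hypothetical helper, and it assumes that rounding goes to the nearest multiple of `2**lsbs_to_remove`, with ties rounded up and the result kept at its original scale.

```python
def round_bit_pattern_emulated(value: int, lsbs_to_remove: int) -> int:
    """Round `value` to the nearest multiple of 2**lsbs_to_remove (ties go up).

    Plain-Python emulation for illustration only; not the Concrete API.
    """
    if lsbs_to_remove == 0:
        return value
    half = 1 << (lsbs_to_remove - 1)  # half of the rounding step
    return ((value + half) >> lsbs_to_remove) << lsbs_to_remove


if __name__ == "__main__":
    for value in (0b00000, 0b00001, 0b00010, 0b01101, 0b10110):
        rounded = round_bit_pattern_emulated(value, lsbs_to_remove=2)
        # Dropping the two cleared bits gives the index a 3-bit table would see.
        index = rounded >> 2
        print(f"{value:05b} -> {rounded:05b} (table index {index})")
```

Under these assumptions, the two cleared bits carry no information, which is why a smaller table is enough for the lookup that follows.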

Let's see how rounding works in practice:

prints:

and displays:

If the rounded number is one of the last `2**(lsbs_to_remove - 1)` numbers in the input range `[0, 2**original_bit_width)`, an overflow **will** happen.
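To make the overflow condition concrete, the following plain-Python sketch lists the inputs whose rounded value no longer fits in the original bit-width. It is an emulation, not Concrete code: `overflowing_inputs` is a hypothetical helper, and it assumes round-to-nearest with ties rounded up.

```python
from typing import List


def overflowing_inputs(original_bit_width: int, lsbs_to_remove: int) -> List[int]:
    """Return the inputs whose rounded value reaches 2**original_bit_width.

    Assumes lsbs_to_remove >= 1 and round-to-nearest with ties going up.
    """
    half = 1 << (lsbs_to_remove - 1)
    limit = 1 << original_bit_width
    result = []
    for value in range(limit):
        rounded = ((value + half) >> lsbs_to_remove) << lsbs_to_remove
        if rounded >= limit:  # needs one more bit than the input had
            result.append(value)
    return result


if __name__ == "__main__":
    # 5-bit inputs with 2 bits removed, as in the example above.
    print(overflowing_inputs(original_bit_width=5, lsbs_to_remove=2))
```

The number of overflowing inputs matches the `2**(lsbs_to_remove - 1)` count stated above.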

By default, if an overflow is encountered during inputset evaluation, bit-widths will be adjusted accordingly. This results in a loss of speed, but ensures accuracy.

You can turn this overflow protection off (e.g., for performance) by using `fhe.round_bit_pattern(..., overflow_protection=False)`. However, this could lead to unexpected behavior at runtime.

Now, let's see how rounding can be used in FHE.

prints:

These speed-ups can vary from system to system.

The speed-up does not keep increasing with `lsbs_to_remove` because the rounding operation itself has a cost: each bit removal is a PBS (programmable bootstrapping). Therefore, if a lot of bits are removed, rounding itself could take longer than the bigger TLU (table lookup) that is evaluated afterwards.

and displays:

Feel free to disable overflow protection and see what happens.

## Auto Rounders

Rounding is very useful but, in some cases, you don't know how many bits your input contains, so it's not reliable to specify `lsbs_to_remove` manually. For this reason, the `AutoRounder` class is introduced.

`AutoRounder` allows you to set how many of the most significant bits to keep, but it needs to be adjusted using an inputset to determine how many of the least significant bits to remove. This can be done manually using `fhe.AutoRounder.adjust(function, inputset)`, or by setting the `auto_adjust_rounders` configuration option to `True` during compilation.
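The idea behind the adjustment step can be sketched in plain Python. This is not how Concrete implements it: `lsbs_to_remove_for` is a hypothetical helper, the values passed in stand for the unsigned values that reach the rounding operation, and the rule "observed bit-width minus bits to keep, never below zero" is an assumption made for illustration.

```python
from typing import Iterable


def lsbs_to_remove_for(observed_values: Iterable[int], target_msbs: int) -> int:
    """Derive how many least significant bits to drop from observed values.

    Hypothetical emulation: assumes unsigned values and that the bit-width
    is taken from the largest value seen during adjustment.
    """
    max_value = max(observed_values)
    observed_bit_width = max(max_value.bit_length(), 1)
    # Keep `target_msbs` bits; nothing is removed if the input is already small.
    return max(observed_bit_width - target_msbs, 0)


if __name__ == "__main__":
    print(lsbs_to_remove_for(range(1000), target_msbs=6))
    print(lsbs_to_remove_for(range(8), target_msbs=6))
```

The sketch also shows why the inputset matters: the same number of bits to keep leads to a different `lsbs_to_remove` depending on how large the observed values are.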

Here is how auto rounders can be used in FHE:

prints:

and displays:

`AutoRounder`s should be defined outside the function that is being compiled. They are used to store the result of the adjustment process, so they shouldn't be created each time the function is called. Furthermore, each `AutoRounder` should be used with exactly one `round_bit_pattern` call.

## Exactness

One use of rounding is to speed up computation by ignoring the least significant bits. For this use, you can get even faster results if you accept that the rounding itself is slightly inexact. The speed-up is usually around 2x-3x but can be higher for large precision reductions. This also enables higher precision values that are not possible otherwise.

You can turn on this mode either globally on the configuration:

or on/off locally:

In approximate mode, the threshold for rounding up or down is not perfectly centered. The off-centering:

- is bounded, i.e. at worst an off-by-one on the reduced-precision value compared to the exact result,
- is pseudo-random, i.e. it will be different on each call,
- is almost symmetrically distributed,
- depends on cryptographic properties such as the encryption mask, the encryption noise and the crypto-parameters.
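A toy model can illustrate the bounded, off-by-one nature of this behavior. The sketch below is plain Python and says nothing about the real cryptographic mechanism: the uniform random `shift` is a made-up stand-in for the off-centering, and `exact_round`, `approximate_round` and `max_shift` are hypothetical names.

```python
import random


def exact_round(value: int, lsbs_to_remove: int) -> int:
    """Reduced-precision value with a perfectly centered threshold."""
    half = 1 << (lsbs_to_remove - 1)
    return (value + half) >> lsbs_to_remove


def approximate_round(value: int, lsbs_to_remove: int,
                      rng: random.Random, max_shift: int) -> int:
    """Same rounding, but the threshold is moved by a small random shift.

    Toy assumption: the shift is uniform in [-max_shift, max_shift] and
    max_shift is smaller than half of the rounding step.
    """
    half = 1 << (lsbs_to_remove - 1)
    shift = rng.randint(-max_shift, max_shift)
    return (value + half + shift) >> lsbs_to_remove


if __name__ == "__main__":
    rng = random.Random(0)
    worst = 0
    for value in range(1 << 8):
        for _ in range(20):
            approx = approximate_round(value, 4, rng, max_shift=3)
            worst = max(worst, abs(approx - exact_round(value, 4)))
    print("largest difference on the reduced-precision value:", worst)
```

In this model, only values close to a rounding threshold can land on the neighbouring reduced-precision value; values far from a threshold always match the exact result.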

## Approximate rounding features

With approximate rounding, you can enable approximate clipping to further improve performance when handling overflow. Approximate clipping makes it possible to discard the extra overflow-protection bit in the successor TLU. For consistency, logical clipping is available when this optimization is not suitable.

### Logical clipping

When fast approximate clipping is not suitable (i.e. it would be slower), it's better to apply logical clipping for consistency and better resilience to code changes. It has no extra cost since it's fused with the successor TLU.
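The effect of clipping on the bit-width can be sketched in plain Python. This is an illustration under assumptions, not Concrete's implementation: `round_to_msbs` and `clip_msbs` are hypothetical helpers, and clipping is read here as saturating at the largest value that fits in the target number of bits.

```python
def round_to_msbs(value: int, lsbs_to_remove: int) -> int:
    """Reduced-precision value after rounding (ties go up)."""
    half = 1 << (lsbs_to_remove - 1)
    return (value + half) >> lsbs_to_remove


def clip_msbs(rounded: int, target_msbs: int) -> int:
    """Saturate at the largest value that fits in `target_msbs` bits."""
    return min(rounded, (1 << target_msbs) - 1)


if __name__ == "__main__":
    # 5-bit inputs rounded to 3 bits: the last inputs overflow into a 4th bit.
    for value in (27, 29, 30, 31):
        rounded = round_to_msbs(value, lsbs_to_remove=2)
        clipped = clip_msbs(rounded, target_msbs=3)
        print(value, rounded.bit_length(), clipped.bit_length())
```

Once the overflowing values are saturated, every value fed to the successor table fits in the target bit-width, so the extra overflow bit is no longer needed.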

### Approximate clipping

This sets the first precision at which approximate clipping is enabled. Starting from this precision, an extra small-precision TLU is introduced to safely remove the extra precision bit used to contain overflow, which makes the successor TLU faster. For example, a rounding to 7 bits ends in an 8-bit TLU because of overflow; forcing a 7-bit TLU instead is 3x faster.
