Benchmarks
Due to their nature, homomorphic operations are naturally slower than their cleartext equivalents. Some timings are exposed for basic operations. For completeness, benchmarks for other libraries are also given.
All benchmarks were launched on an AWS m6i.metal with the following specifications: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz and 512GB of RAM.
Integer
This measures the execution time for some operation sets of tfhe-rs::integer (the unsigned version). Note that the timings for FheInt
(i.e., the signed integers) are similar.
Operation \ Size
FheUint8
FheUint16
FheUint32
FheUint64
FheUint128
FheUint256
Negation (-
)
70.9 ms
99.3 ms
129 ms
180 ms
239 ms
333 ms
Add / Sub (+
,-
)
70.5 ms
100 ms
132 ms
186 ms
249 ms
334 ms
Mul (x
)
144 ms
216 ms
333 ms
832 ms
2.50 s
8.85 s
Equal / Not Equal (eq
, ne
)
36.1 ms
36.5 ms
57.4 ms
64.2 ms
67.3 ms
78.1 ms
Comparisons (ge
, gt
, le
, lt
)
52.6 ms
73.1 ms
98.8 ms
124 ms
165 ms
201 ms
Max / Min (max
,min
)
76.2 ms
102 ms
135 ms
171 ms
212 ms
301 ms
Bitwise operations (&
, |
, ^
)
19.4 ms
20.3 ms
21.0 ms
27.2 ms
31.6 ms
40.2 ms
Div / Rem (/
, %
)
729 ms
1.93 s
4.81 s
12.2 s
30.7 s
89.6 s
Left / Right Shifts (<<
, >>
)
99.4 ms
129 ms
180 ms
243 ms
372 ms
762 ms
Left / Right Rotations (left_rotate
, right_rotate
)
103 ms
128 ms
182 ms
241 ms
374 ms
763 ms
All timings are related to parallelized Radix-based integer operations, where each block is encrypted using the default parameters (i.e., PARAM_MESSAGE_2_CARRY_2_KS_PBS, more information about parameters can be found here). To ensure predictable timings, the operation flavor is the default
one: the carry is propagated if needed. The operation costs may be reduced by using unchecked
, checked
, or smart
.
Shortint
This measures the execution time for some operations using various parameter sets of tfhe-rs::shortint. Except for unchecked_add
, all timings are related to the default
operations. This flavor ensures predictable timings for an operation along the entire circuit by clearing the carry space after each operation.
This uses the Concrete FFT + AVX-512 configuration.
unchecked_add
348 ns
413 ns
2.95 µs
12.1 µs
add
7.59 ms
17.0 ms
121 ms
835 ms
mul_lsb
8.13 ms
16.8 ms
121 ms
827 ms
keyswitch_programmable_bootstrap
7.28 ms
16.6 ms
121 ms
811 ms
Boolean
This measures the execution time of a single binary Boolean gate.
tfhe-rs::boolean.
DEFAULT_PARAMETERS_KS_PBS
9.19 ms
PARAMETERS_ERROR_PROB_2_POW_MINUS_165_KS_PBS
14.1 ms
TFHE_LIB_PARAMETERS
10.0 ms
tfhe-lib.
Using the same m6i.metal machine as the one for tfhe-rs, the timings are:
default_128bit_gate_bootstrapping_parameters
15.4 ms
OpenFHE (v1.1.1).
Following the official instructions from OpenFHE, clang14
and the following command are used to setup the project: cmake -DNATIVE_SIZE=32 -DWITH_NATIVEOPT=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DWITH_OPENMP=OFF ..
To use the HEXL library, the configuration used is as follows:
Using the same m6i.metal machine as the one for tfhe-rs, the timings are:
FHEW_BINGATE/STD128_OR
40.2 ms
31.0 ms
FHEW_BINGATE/STD128_LMKCDEY_OR
38.6 ms
28.4 ms
How to reproduce TFHE-rs benchmarks
TFHE-rs benchmarks can be easily reproduced from source.
If the host machine does not support AVX512, then turning on AVX512_SUPPORT
will not provide any speed-up.
Last updated