Benchmarks
Homomorphic operations are inherently slower than their cleartext equivalents. This page reports timings for basic operations. For completeness, benchmarks for other libraries are also given.
All benchmarks were run on an AWS hpc7a.96xlarge instance with the following specifications: AMD EPYC 9R14 CPU @ 2.60 GHz and 740 GB of RAM.
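As a rough illustration of what a per-operation timing means, the sketch below averages the wall-clock time of a closure over several runs using only the Rust standard library. It is not the harness used to produce the numbers on this page, and the name `mean_duration` is hypothetical.

```rust
use std::time::{Duration, Instant};

/// Runs `op` `iterations` times and returns the mean wall-clock time per run.
/// Hypothetical helper for illustration; not part of TFHE-rs.
fn mean_duration<F: FnMut()>(iterations: u32, mut op: F) -> Duration {
    assert!(iterations > 0, "at least one iteration is required");
    let start = Instant::now();
    for _ in 0..iterations {
        op();
    }
    // Dividing a Duration by a u32 yields the average Duration per iteration.
    start.elapsed() / iterations
}
```

A call such as `mean_duration(10, || { /* operation under test */ })` returns a `Duration` that can be printed with `{:?}`. A dedicated benchmarking framework usually adds warm-up runs and statistical analysis on top of this.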
Integer
This measures the execution time for some operation sets of tfhe-rs::integer (the unsigned version). Note that the timings for FheInt (i.e., the signed integers) are similar.
The table below reports the timing when the inputs of the benchmarked operation are encrypted.
Operation \ Size | | | | | | |
---|---|---|---|---|---|---|
Negation | 55.4 ms | 79.7 ms | 105 ms | 133 ms | 163 ms | 199 ms |
Add / Sub | 58.9 ms | 86.0 ms | 106 ms | 124 ms | 151 ms | 193 ms |
Mul | 122 ms | 164 ms | 227 ms | 410 ms | 1.04 s | 3.41 s |
Equal / Not Equal | 32.0 ms | 32.0 ms | 50.4 ms | 50.9 ms | 53.1 ms | 54.6 ms |
Comparisons | 43.7 ms | 65.2 ms | 84.3 ms | 107 ms | 132 ms | 159 ms |
Max / Min | 68.4 ms | 86.8 ms | 106 ms | 132 ms | 160 ms | 200 ms |
Bitwise operations | 17.1 ms | 17.3 ms | 17.8 ms | 18.8 ms | 20.2 ms | 22.2 ms |
Div / Rem | 631 ms | 1.59 s | 3.77 s | 8.64 s | 20.3 s | 53.4 s |
Left / Right Shifts | 82.8 ms | 99.2 ms | 121 ms | 149 ms | 194 ms | 401 ms |
Left / Right Rotations | 82.1 ms | 99.4 ms | 120 ms | 149 ms | 194 ms | 402 ms |
The table below reports the timing when the left input of the benchmarked operation is encrypted and the other is a clear scalar of the same size.
Operation \ Size | | | | | | |
---|---|---|---|---|---|---|
Add / Sub | 68.3 ms | 82.4 ms | 102 ms | 122 ms | 151 ms | 191 ms |
Mul | 93.7 ms | 139 ms | 178 ms | 242 ms | 516 ms | 1.02 s |
Equal / Not Equal | 30.2 ms | 30.8 ms | 32.7 ms | 50.4 ms | 51.2 ms | 54.8 ms |
Comparisons | 47.3 ms | 69.9 ms | 96.3 ms | 102 ms | 138 ms | 141 ms |
Max / Min | 75.4 ms | 99.7 ms | 120 ms | 126 ms | 150 ms | 186 ms |
Bitwise operations | 17.1 ms | 17.4 ms | 18.2 ms | 19.2 ms | 19.7 ms | 22.6 ms |
Div | 160 ms | 212 ms | 272 ms | 402 ms | 796 ms | 2.27 s |
Rem | 315 ms | 428 ms | 556 ms | 767 ms | 1.27 s | 2.86 s |
Left / Right Shifts | 16.8 ms | 16.8 ms | 17.3 ms | 18.0 ms | 18.9 ms | 22.6 ms |
Left / Right Rotations | 16.8 ms | 16.9 ms | 17.3 ms | 18.3 ms | 19.0 ms | 22.8 ms |
All timings relate to parallelized radix-based integer operations, where each block is encrypted using the default parameters (i.e., PARAM_MESSAGE_2_CARRY_2_KS_PBS; see the documentation on parameters for more information). To ensure predictable timings, the operation flavor is the default one: the carry is propagated if needed. Operation costs may be reduced by using the unchecked, checked, or smart flavors.
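To make the radix representation and carry propagation concrete, here is a cleartext sketch, assuming an 8-bit value split into four blocks of 2 message bits each (mirroring the 2-bit message space of PARAM_MESSAGE_2_CARRY_2). Nothing here is encrypted and none of these functions belong to the TFHE-rs API; all names are hypothetical and only model the arithmetic.

```rust
// Cleartext model only: hypothetical names, no encryption involved.
const MESSAGE_BITS: u32 = 2;
const MESSAGE_MODULUS: u8 = 1 << MESSAGE_BITS; // 4 values per block
const NUM_BLOCKS: usize = 4; // 4 blocks x 2 bits = 8 bits

/// Splits an 8-bit value into 2-bit blocks, least significant block first.
fn decompose(x: u8) -> [u8; NUM_BLOCKS] {
    let mut blocks = [0u8; NUM_BLOCKS];
    for (i, block) in blocks.iter_mut().enumerate() {
        *block = (x >> (i as u32 * MESSAGE_BITS)) & (MESSAGE_MODULUS - 1);
    }
    blocks
}

/// Rebuilds the 8-bit value; assumes every block holds only message bits.
fn recompose(blocks: &[u8; NUM_BLOCKS]) -> u8 {
    blocks
        .iter()
        .enumerate()
        .fold(0u8, |acc, (i, block)| acc | (*block << (i as u32 * MESSAGE_BITS)))
}

/// Block-wise addition that leaves carries inside each block.
fn unchecked_add(a: &[u8; NUM_BLOCKS], b: &[u8; NUM_BLOCKS]) -> [u8; NUM_BLOCKS] {
    let mut out = [0u8; NUM_BLOCKS];
    for i in 0..NUM_BLOCKS {
        out[i] = a[i] + b[i];
    }
    out
}

/// Moves each block's carry into the next block; the final carry is dropped,
/// so the result wraps modulo 2^8.
fn propagate_carries(blocks: &mut [u8; NUM_BLOCKS]) {
    let mut carry = 0u8;
    for block in blocks.iter_mut() {
        let v = *block + carry;
        *block = v % MESSAGE_MODULUS;
        carry = v / MESSAGE_MODULUS;
    }
}

/// Models the default flavor: add, then propagate carries.
fn default_add(a: &[u8; NUM_BLOCKS], b: &[u8; NUM_BLOCKS]) -> [u8; NUM_BLOCKS] {
    let mut out = unchecked_add(a, b);
    propagate_carries(&mut out);
    out
}
```

For example, `recompose(&default_add(&decompose(200), &decompose(100)))` evaluates to 44, which is 300 modulo 256. In the encrypted setting the carry propagation is the expensive part, which is why skipping it with another flavor can reduce cost.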
Shortint
This measures the execution time for some operations using various parameter sets of tfhe-rs::shortint. Except for unchecked_add, all timings relate to the default operations. This flavor ensures predictable timings for an operation along the entire circuit by clearing the carry space after each operation.
This uses the Concrete FFT + AVX-512 configuration.
Parameter set | PARAM_MESSAGE_1_CARRY_1 | PARAM_MESSAGE_2_CARRY_2 | PARAM_MESSAGE_3_CARRY_3 | PARAM_MESSAGE_4_CARRY_4 |
---|---|---|---|---|
unchecked_add | 341 ns | 555 ns | 2.47 µs | 9.77 µs |
add | 5.96 ms | 12.6 ms | 102 ms | 508 ms |
mul_lsb | 5.99 ms | 12.3 ms | 101 ms | 500 ms |
keyswitch_programmable_bootstrap | 6.40 ms | 12.9 ms | 104 ms | 489 ms |
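The gap between unchecked_add and add in the table above comes from clearing the carry space. The following cleartext sketch models a single block with 2 message bits and 2 carry bits, as in PARAM_MESSAGE_2_CARRY_2. It is an assumption-laden model, not the tfhe-rs::shortint API: `ClearBlock` and its methods are hypothetical, and the modulo in `add` stands in for the much more expensive homomorphic step.

```rust
// Cleartext model only: hypothetical type, no encryption involved.
const MSG_MODULUS: u8 = 4; // 2 message bits
const CARRY_MODULUS: u8 = 4; // 2 carry bits

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ClearBlock {
    value: u8,
}

impl ClearBlock {
    /// Creates a block holding a fresh message with an empty carry space.
    fn new(message: u8) -> Self {
        assert!(message < MSG_MODULUS, "message does not fit in 2 bits");
        Self { value: message }
    }

    /// Adds raw values; the result may spill into the carry space.
    fn unchecked_add(self, other: Self) -> Self {
        let value = self.value + other.value;
        debug_assert!(value < MSG_MODULUS * CARRY_MODULUS, "block overflow");
        Self { value }
    }

    /// Models the default flavor: add, then keep only the message bits.
    fn add(self, other: Self) -> Self {
        let sum = self.unchecked_add(other);
        Self { value: sum.value % MSG_MODULUS }
    }

    /// Message part of the block.
    fn message(self) -> u8 {
        self.value % MSG_MODULUS
    }

    /// Carry part of the block.
    fn carry(self) -> u8 {
        self.value / MSG_MODULUS
    }
}
```

With `ClearBlock::new(3)` and `ClearBlock::new(2)`, `unchecked_add` leaves the raw value 5 (message 1, carry 1), while `add` returns a block whose value is 1 and whose carry is 0.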
Boolean
This measures the execution time of a single binary Boolean gate.
tfhe-rs::boolean.
Parameter set | Concrete FFT + AVX-512 |
---|---|
DEFAULT_PARAMETERS_KS_PBS | 8.49 ms |
PARAMETERS_ERROR_PROB_2_POW_MINUS_165_KS_PBS | 13.7 ms |
TFHE_LIB_PARAMETERS | 9.90 ms |
tfhe-lib.
Using the same hpc7a.96xlarge machine as the one for tfhe-rs, the timings are:
Parameter set | spqlios-fma |
---|---|
default_128bit_gate_bootstrapping_parameters | 13.5 ms |
OpenFHE (v1.1.2).
Following the official instructions from OpenFHE, clang14 and the following command are used to set up the project:
cmake -DNATIVE_SIZE=32 -DWITH_NATIVEOPT=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DWITH_OPENMP=OFF ..
To use the HEXL library, the configuration used is as follows:
Using the same hpc7a.96xlarge machine as the one for tfhe-rs, the timings are:
Parameter set | GINX | GINX w/ Intel HEXL |
---|---|---|
FHEW_BINGATE/STD128_OR | 25.5 ms | 21.6 ms |
FHEW_BINGATE/STD128_LMKCDEY_OR | 25.4 ms | 19.9 ms |
How to reproduce TFHE-rs benchmarks
TFHE-rs benchmarks can be easily reproduced from source.
If the host machine does not support AVX512, then turning on AVX512_SUPPORT will not provide any speed-up.
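One way to check the host before enabling that option is runtime feature detection from the Rust standard library. The sketch below is independent of the TFHE-rs build system; the function name `avx512f_available` is hypothetical, and it only tests the AVX-512 Foundation subset (`avx512f`), which is an assumption about what matters here.

```rust
/// Reports whether the current CPU advertises AVX-512 Foundation support.
/// Hypothetical helper for illustration; not part of TFHE-rs.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn avx512f_available() -> bool {
    std::is_x86_feature_detected!("avx512f")
}

/// On non-x86 targets AVX-512 does not exist, so always report false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn avx512f_available() -> bool {
    false
}

/// Turns the detection result into a short human-readable hint.
fn avx512_hint(available: bool) -> &'static str {
    if available {
        "AVX-512F detected: enabling AVX512_SUPPORT may help"
    } else {
        "AVX-512F not detected: AVX512_SUPPORT will not provide a speed-up"
    }
}
```

Printing `avx512_hint(avx512f_available())` on the benchmark host gives a quick sanity check before launching a long run.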