Benchmarks
This document details the performance benchmarks of homomorphic operations using TFHE-rs.
By their nature, homomorphic operations run slower than their cleartext equivalents. The following are the timings for basic operations, including benchmarks from other libraries for comparison.
All CPU benchmarks were launched on an AWS hpc7a.96xlarge
instance equipped with an AMD EPYC 9R14 CPU @ 2.60GHz
and 740GB of RAM.
Integer operations
The following tables benchmark the execution time of some operation sets using FheUint
(unsigned integers). The FheInt
(signed integers) performs similarly.
The next table shows the operation timings on CPU when all inputs are encrypted:
Operation \ Size |
|
|
|
|
|
|
Negation ( | 65.1 ms | 97.0 ms | 116 ms | 141 ms | 186 ms | 227 ms |
Add / Sub ( | 75.8 ms | 96.7 ms | 118 ms | 150 ms | 186 ms | 230 ms |
Mul ( | 96.1 ms | 180 ms | 251 ms | 425 ms | 1.1 s | 3.66 s |
Equal / Not Equal ( | 32.2 ms | 35.0 ms | 55.4 ms | 56.0 ms | 59.5 ms | 60.7 ms |
Comparisons ( | 57.1 ms | 72.9 ms | 93.0 ms | 116 ms | 138 ms | 164 ms |
Max / Min ( | 94.3 ms | 114 ms | 138 ms | 159 ms | 189 ms | 233 ms |
Bitwise operations ( | 19.6 ms | 20.1 ms | 20.2 ms | 21.7 ms | 23.9 ms | 25.7 ms |
Div / Rem ( | 711 ms | 1.81 s | 4.43 s | 10.5 s | 25.1 s | 63.2 s |
Left / Right Shifts ( | 99.5 ms | 125 ms | 155 ms | 190 ms | 234 ms | 434 ms |
Left / Right Rotations ( | 101 ms | 125 ms | 154 ms | 188 ms | 234 ms | 430 ms |
Leading / Trailing zeros/ones | 96.7 ms | 155 ms | 181 ms | 241 ms | 307 ms | 367 ms |
Log2 | 112 ms | 176 ms | 200 ms | 265 ms | 320 ms | 379 ms |
The next table shows the operation timings on CPU when the left input is encrypted and the right is a clear scalar of the same size:
Operation \ Size |
|
|
|
|
|
|
Add / Sub ( | 75.9 ms | 95.3 ms | 119 ms | 150 ms | 182 ms | 224 ms |
Mul ( | 79.3 ms | 163 ms | 211 ms | 273 ms | 467 ms | 1.09 s |
Equal / Not Equal ( | 31.2 ms | 30.9 ms | 34.4 ms | 54.5 ms | 57.0 ms | 58.0 ms |
Comparisons ( | 38.6 ms | 56.3 ms | 76.1 ms | 99.0 ms | 124 ms | 141 ms |
Max / Min ( | 74.0 ms | 103 ms | 122 ms | 144 ms | 171 ms | 214 ms |
Bitwise operations ( | 19.0 ms | 19.8 ms | 20.5 ms | 21.6 ms | 23.8 ms | 25.8 ms |
Div ( | 192 ms | 255 ms | 322 ms | 459 ms | 877 ms | 2.61 s |
Rem ( | 336 ms | 482 ms | 650 ms | 871 ms | 1.39 s | 3.05 s |
Left / Right Shifts ( | 19.5 ms | 20.2 ms | 20.7 ms | 22.1 ms | 23.8 ms | 25.6 ms |
Left / Right Rotations ( | 19.0 ms | 20.0 ms | 20.8 ms | 21.7 ms | 23.9 ms | 25.7 ms |
All timings are based on parallelized Radix-based integer operations where each block is encrypted using the default parameters PARAM_MESSAGE_2_CARRY_2_KS_PBS
. To ensure predictable timings, we perform operations in the default
mode, which propagates the carry bit as needed. You can minimize operational costs by selecting from 'unchecked', 'checked', or 'smart' modes, each balancing performance and security differently.
For more details about parameters, see here. You can find the benchmark results on GPU for all these operations here.
Shortint operations
The next table shows the execution time of some operations using various parameter sets of tfhe-rs::shortint. Except for unchecked_add
, we perform all the operations in the default
mode. This mode ensures predictable timings along the entire circuit by clearing the carry space after each operation. The configuration is Concrete FFT + AVX-512.
Parameter set | PARAM_MESSAGE_1_CARRY_1 | PARAM_MESSAGE_2_CARRY_2 | PARAM_MESSAGE_3_CARRY_3 | PARAM_MESSAGE_4_CARRY_4 |
---|---|---|---|---|
unchecked_add | 559 ns | 544 ns | 2.26 µs | 9.53 µs |
add | 9.98 ms | 14.1 ms | 113 ms | 873 ms |
mul_lsb | 9.79 ms | 13.8 ms | 113 ms | 794 ms |
keyswitch_programmable_bootstrap | 9.85 ms | 13.9 ms | 114 ms | 791 ms |
Boolean operations
The next table shows the execution time of a single binary Boolean gate.
tfhe-rs::boolean
Parameter set | Concrete FFT + AVX-512 |
---|---|
DEFAULT_PARAMETERS_KS_PBS | 9.98 ms |
PARAMETERS_ERROR_PROB_2_POW_MINUS_165_KS_PBS | 17.0 ms |
TFHE_LIB_PARAMETERS | 9.64 ms |
tfhe-lib
Using the same hpc7a.96xlarge machine as the one for tfhe-rs, the timings are as follows:
Parameter set | spqlios-fma |
---|---|
default_128bit_gate_bootstrapping_parameters | 13.5 ms |
OpenFHE (v1.1.2)
Following the official instructions from OpenFHE, we use clang14
and the following command to setup the project: cmake -DNATIVE_SIZE=32 -DWITH_NATIVEOPT=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DWITH_OPENMP=OFF ..
The following example shows how to initialize the configuration to use the HEXL library:
Using the same hpc7a.96xlarge machine as the one for tfhe-rs, the timings are as follows:
Parameter set | GINX | GINX w/ Intel HEXL |
---|---|---|
FHEW_BINGATE/STD128_OR | 25.5 ms | 24,0 ms |
FHEW_BINGATE/STD128_LMKCDEY_OR | 25.4 ms | 23.6 ms |
Reproducing TFHE-rs benchmarks
TFHE-rs benchmarks can be easily reproduced from the source.
AVX512 is now enabled by default for benchmarks when available
The following example shows how to reproduce TFHE-rs benchmarks:
Last updated