Benchmarks

This document details the performance benchmarks of homomorphic operations using TFHE-rs.

By their nature, homomorphic operations run slower than their cleartext equivalents. The following are the timings for basic operations, including benchmarks from other libraries for comparison.

All CPU benchmarks were launched on an AWS hpc7a.96xlarge instance equipped with an AMD EPYC 9R14 CPU @ 2.60GHz and 740GB of RAM.

Integer operations

The following tables benchmark the execution time of some operation sets using FheUint (unsigned integers). FheInt (signed integers) performs similarly.

The next table shows the operation timings on CPU when all inputs are encrypted:

| Operation \ Size | FheUint8 | FheUint16 | FheUint32 | FheUint64 | FheUint128 | FheUint256 |
| --- | --- | --- | --- | --- | --- | --- |
| Negation (-) | 65.1 ms | 97.0 ms | 116 ms | 141 ms | 186 ms | 227 ms |
| Add / Sub (+,-) | 75.8 ms | 96.7 ms | 118 ms | 150 ms | 186 ms | 230 ms |
| Mul (x) | 96.1 ms | 180 ms | 251 ms | 425 ms | 1.1 s | 3.66 s |
| Equal / Not Equal (eq, ne) | 32.2 ms | 35.0 ms | 55.4 ms | 56.0 ms | 59.5 ms | 60.7 ms |
| Comparisons (ge, gt, le, lt) | 57.1 ms | 72.9 ms | 93.0 ms | 116 ms | 138 ms | 164 ms |
| Max / Min (max,min) | 94.3 ms | 114 ms | 138 ms | 159 ms | 189 ms | 233 ms |
| Bitwise operations (&, \|, ^) | 19.6 ms | 20.1 ms | 20.2 ms | 21.7 ms | 23.9 ms | 25.7 ms |
| Div / Rem (/, %) | 711 ms | 1.81 s | 4.43 s | 10.5 s | 25.1 s | 63.2 s |
| Left / Right Shifts (<<, >>) | 99.5 ms | 125 ms | 155 ms | 190 ms | 234 ms | 434 ms |
| Left / Right Rotations (left_rotate, right_rotate) | 101 ms | 125 ms | 154 ms | 188 ms | 234 ms | 430 ms |
| Leading / Trailing zeros/ones | 96.7 ms | 155 ms | 181 ms | 241 ms | 307 ms | 367 ms |
| Log2 | 112 ms | 176 ms | 200 ms | 265 ms | 320 ms | 379 ms |

The next table shows the operation timings on CPU when the left input is encrypted and the right is a clear scalar of the same size:

| Operation \ Size | FheUint8 | FheUint16 | FheUint32 | FheUint64 | FheUint128 | FheUint256 |
| --- | --- | --- | --- | --- | --- | --- |
| Add / Sub (+,-) | 75.9 ms | 95.3 ms | 119 ms | 150 ms | 182 ms | 224 ms |
| Mul (x) | 79.3 ms | 163 ms | 211 ms | 273 ms | 467 ms | 1.09 s |
| Equal / Not Equal (eq, ne) | 31.2 ms | 30.9 ms | 34.4 ms | 54.5 ms | 57.0 ms | 58.0 ms |
| Comparisons (ge, gt, le, lt) | 38.6 ms | 56.3 ms | 76.1 ms | 99.0 ms | 124 ms | 141 ms |
| Max / Min (max,min) | 74.0 ms | 103 ms | 122 ms | 144 ms | 171 ms | 214 ms |
| Bitwise operations (&, \|, ^) | 19.0 ms | 19.8 ms | 20.5 ms | 21.6 ms | 23.8 ms | 25.8 ms |
| Div (/) | 192 ms | 255 ms | 322 ms | 459 ms | 877 ms | 2.61 s |
| Rem (%) | 336 ms | 482 ms | 650 ms | 871 ms | 1.39 s | 3.05 s |
| Left / Right Shifts (<<, >>) | 19.5 ms | 20.2 ms | 20.7 ms | 22.1 ms | 23.8 ms | 25.6 ms |
| Left / Right Rotations (left_rotate, right_rotate) | 19.0 ms | 20.0 ms | 20.8 ms | 21.7 ms | 23.9 ms | 25.7 ms |

All timings are based on parallelized Radix-based integer operations where each block is encrypted using the default parameters PARAM_MESSAGE_2_CARRY_2_KS_PBS. To ensure predictable timings, we perform operations in the default mode, which clears the carry space after each operation so that inputs and outputs share the same encoding. You can reduce operational costs by selecting the 'unchecked', 'checked', or 'smart' modes, each of which balances performance and correctness differently.

For more details about parameters, see the parameters section of the TFHE-rs documentation. Benchmark results on GPU for all these operations are available in the GPU benchmarks section of the documentation.
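To make the difference between the default and 'unchecked' modes concrete, here is a minimal cleartext sketch of radix arithmetic with 2-bit message blocks and 2 bits of carry space, mirroring the shape of PARAM_MESSAGE_2_CARRY_2. Nothing is encrypted here, and the function names (`to_blocks`, `unchecked_add_clear`, `propagate`, `default_add_clear`) are hypothetical illustrations, not part of the TFHE-rs API. The sketch assumes that an unchecked addition is a plain blockwise sum and that the default mode follows it with a carry propagation.

```rust
// Cleartext model only: no encryption is performed.
// Assumption: each block carries a 2-bit message plus 2 bits of carry space.
const MESSAGE_BITS: u32 = 2;
const MESSAGE_MODULUS: u8 = 1 << MESSAGE_BITS; // 4

/// Split a u8 into four 2-bit blocks, least significant block first.
fn to_blocks(value: u8) -> [u8; 4] {
    let mut blocks = [0u8; 4];
    for (i, block) in blocks.iter_mut().enumerate() {
        *block = (value >> (i as u32 * MESSAGE_BITS)) % MESSAGE_MODULUS;
    }
    blocks
}

/// Recompose the blocks into a u8, wrapping modulo 2^8.
/// Works even when blocks still hold unpropagated carries.
fn from_blocks(blocks: &[u8; 4]) -> u8 {
    let mut value: u32 = 0;
    for (i, &block) in blocks.iter().enumerate() {
        value += (block as u32) << (i as u32 * MESSAGE_BITS);
    }
    (value % 256) as u8
}

/// Hypothetical stand-in for an unchecked add: a blockwise sum that
/// leaves any overflow sitting in the carry space of each block.
fn unchecked_add_clear(a: &[u8; 4], b: &[u8; 4]) -> [u8; 4] {
    let mut out = [0u8; 4];
    for i in 0..4 {
        out[i] = a[i] + b[i];
    }
    out
}

/// Move each block's overflow into the next block.
/// The carry out of the last block is dropped (wrapping arithmetic).
fn propagate(blocks: &[u8; 4]) -> [u8; 4] {
    let mut out = [0u8; 4];
    let mut carry = 0u8;
    for i in 0..4 {
        let total = blocks[i] + carry;
        out[i] = total % MESSAGE_MODULUS;
        carry = total / MESSAGE_MODULUS;
    }
    out
}

/// Hypothetical stand-in for a default-mode add: the output blocks
/// are always clean, so every block is again below MESSAGE_MODULUS.
fn default_add_clear(a: &[u8; 4], b: &[u8; 4]) -> [u8; 4] {
    propagate(&unchecked_add_clear(a, b))
}

fn main() {
    let a = to_blocks(173);
    let b = to_blocks(94);
    let raw = unchecked_add_clear(&a, &b);
    let clean = default_add_clear(&a, &b);
    println!("unchecked blocks: {:?}", raw);
    println!("default blocks:   {:?}", clean);
    // Both encodings represent the same integer modulo 2^8.
    assert_eq!(from_blocks(&raw), from_blocks(&clean));
}
```

Both results decode to the same value, but only the default-style result can be fed into further operations without tracking how full the carry space is. In the encrypted setting the propagation step is where the expensive work would occur, which is consistent with the gap between unchecked_add and add in the shortint table.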

Shortint operations

The next table shows the execution time of some operations using various parameter sets of tfhe-rs::shortint. Except for unchecked_add, we perform all the operations in the default mode. This mode ensures predictable timings along the entire circuit by clearing the carry space after each operation. The configuration is Concrete FFT + AVX-512.

| Parameter set | PARAM_MESSAGE_1_CARRY_1 | PARAM_MESSAGE_2_CARRY_2 | PARAM_MESSAGE_3_CARRY_3 | PARAM_MESSAGE_4_CARRY_4 |
| --- | --- | --- | --- | --- |
| unchecked_add | 559 ns | 544 ns | 2.26 µs | 9.53 µs |
| add | 9.98 ms | 14.1 ms | 113 ms | 873 ms |
| mul_lsb | 9.79 ms | 13.8 ms | 113 ms | 794 ms |
| keyswitch_programmable_bootstrap | 9.85 ms | 13.9 ms | 114 ms | 791 ms |
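Since unchecked_add is several orders of magnitude cheaper than add, it is natural to ask how many unchecked additions can be chained before the carry space must be cleared. The following cleartext sketch estimates this under two assumptions that are not stated in this document: that PARAM_MESSAGE_x_CARRY_y denotes x message bits and y carry bits, and that a block stays decodable as long as its value is below 2^(x+y). The helper `max_fresh_summands` is a hypothetical name, not a TFHE-rs function.

```rust
/// Largest number of freshly encrypted messages that can be summed in one
/// block before its value could exceed the message + carry space.
/// Assumption: a fresh message is at most 2^message_bits - 1 and the block
/// can hold values up to 2^(message_bits + carry_bits) - 1.
fn max_fresh_summands(message_bits: u32, carry_bits: u32) -> u32 {
    let max_block_value = (1u32 << (message_bits + carry_bits)) - 1;
    let max_message = (1u32 << message_bits) - 1;
    max_block_value / max_message
}

fn main() {
    for bits in 1..=4 {
        println!(
            "PARAM_MESSAGE_{}_CARRY_{}: up to {} fresh summands",
            bits,
            bits,
            max_fresh_summands(bits, bits)
        );
    }
}
```

Under these assumptions, a PARAM_MESSAGE_2_CARRY_2 block can absorb five fresh values (four unchecked additions) before a carry-clearing operation is required.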

Boolean operations

The next table shows the execution time of a single binary Boolean gate.

tfhe-rs::boolean

| Parameter set | Concrete FFT + AVX-512 |
| --- | --- |
| DEFAULT_PARAMETERS_KS_PBS | 9.98 ms |
| PARAMETERS_ERROR_PROB_2_POW_MINUS_165_KS_PBS | 17.0 ms |
| TFHE_LIB_PARAMETERS | 9.64 ms |

tfhe-lib

Using the same hpc7a.96xlarge machine as the one for tfhe-rs, the timings are as follows:

| Parameter set | spqlios-fma |
| --- | --- |
| default_128bit_gate_bootstrapping_parameters | 13.5 ms |

OpenFHE (v1.1.2)

Following the official instructions from OpenFHE, we use clang14 and the following command to set up the project:

cmake -DNATIVE_SIZE=32 -DWITH_NATIVEOPT=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DWITH_OPENMP=OFF ..

The following example shows how to initialize the configuration to use the HEXL library:

export CXX=clang++
export CC=clang

scripts/configure.sh
Release -> y
hexl -> y

scripts/build-openfhe-development-hexl.sh

Using the same hpc7a.96xlarge machine as the one for tfhe-rs, the timings are as follows:

| Parameter set | GINX | GINX w/ Intel HEXL |
| --- | --- | --- |
| FHEW_BINGATE/STD128_OR | 25.5 ms | 24.0 ms |
| FHEW_BINGATE/STD128_LMKCDEY_OR | 25.4 ms | 23.6 ms |

Reproducing TFHE-rs benchmarks

TFHE-rs benchmarks can be easily reproduced from the source.

AVX-512 is now enabled by default for benchmarks when available.

The following example shows how to reproduce TFHE-rs benchmarks:

#Boolean benchmarks:
make bench_boolean

#Integer benchmarks:
make bench_integer

#Shortint benchmarks:
make bench_shortint
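For a quick, informal timing check outside the make targets above, a small wall-clock harness can be enough. The sketch below is a generic example built only on the Rust standard library; it is not the benchmark code used to produce the tables in this document, and the helper name `median_time` and the placeholder workload are assumptions made for illustration.

```rust
use std::time::{Duration, Instant};

/// Run `op` exactly `runs` times and return the median wall-clock duration.
/// The median is less sensitive to occasional slow runs than the mean.
fn median_time<F: FnMut()>(mut op: F, runs: usize) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}

fn main() {
    // Placeholder workload: replace the closure body with the
    // homomorphic operation you want to measure.
    let mut acc = 0u64;
    let median = median_time(
        || {
            for i in 0..1_000_000u64 {
                acc = acc.wrapping_add(std::hint::black_box(i));
            }
        },
        11,
    );
    println!("median over 11 runs: {:?} (acc = {})", median, acc);
}
```

Numbers obtained this way depend heavily on the machine and build profile, so they are only comparable to the tables above when run in release mode on similar hardware.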
