This document details the CPU performance benchmarks of homomorphic operations using TFHE-rs.
By their nature, homomorphic operations run slower than their cleartext equivalents. The following are the timings for basic operations, including benchmarks from other libraries for comparison.
All CPU benchmarks were launched on an AWS hpc7a.96xlarge instance equipped with an AMD EPYC 9R14 CPU @ 2.60GHz and 740GB of RAM.
The following tables benchmark the execution time of some operation sets using FheUint (unsigned integers). FheInt (signed integers) performs similarly.
The next table shows the operation timings on CPU when all inputs are encrypted:
The next table shows the operation timings on CPU when the left input is encrypted and the right is a clear scalar of the same size:
All timings are based on parallelized Radix-based integer operations where each block is encrypted using the default parameters PARAM_MESSAGE_2_CARRY_2_KS_PBS. To ensure predictable timings, we perform operations in the default mode, which ensures that the input and output encoding are similar (i.e., the carries are always emptied).
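To make the radix representation concrete, here is a minimal cleartext sketch in plain Rust. It is a toy model and not the TFHE-rs API: the function names (`to_radix`, `add_blocks`, `propagate`, `default_add`) are hypothetical, and no encryption takes place. It assumes only what the parameter name suggests, namely 2 message bits and 2 carry bits per block, so a 64-bit integer is split into 32 base-4 digits. The addition fills the carry bits block by block, then a propagation pass empties them again, mirroring what the default mode guarantees for its outputs.

```rust
// Toy cleartext model of a radix integer. NOT the TFHE-rs API.
const MESSAGE_BITS: u32 = 2; // assumed from PARAM_MESSAGE_2_CARRY_2
const CARRY_BITS: u32 = 2;
const MESSAGE_MODULUS: u64 = 1 << MESSAGE_BITS; // 4

/// Split `value` into `num_blocks` base-4 digits, least significant first.
fn to_radix(value: u64, num_blocks: usize) -> Vec<u64> {
    (0..num_blocks)
        .map(|i| {
            let shifted = value.checked_shr(i as u32 * MESSAGE_BITS).unwrap_or(0);
            shifted & (MESSAGE_MODULUS - 1)
        })
        .collect()
}

/// Recombine clean blocks (each digit < 4) into an integer.
fn from_radix(blocks: &[u64]) -> u64 {
    blocks.iter().rev().fold(0u64, |acc, &b| (acc << MESSAGE_BITS) | b)
}

/// Block-wise addition without carry propagation: digits may exceed 3
/// and spill into the carry bits of their own block.
fn add_blocks(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            let s = x + y;
            assert!(s < 1 << (MESSAGE_BITS + CARRY_BITS), "block overflow");
            s
        })
        .collect()
}

/// Move each carry into the next block so every digit is < 4 again.
/// The carry out of the last block is dropped (wrapping arithmetic).
fn propagate(blocks: &mut [u64]) {
    let mut carry = 0;
    for b in blocks.iter_mut() {
        let total = *b + carry;
        *b = total % MESSAGE_MODULUS;
        carry = total / MESSAGE_MODULUS;
    }
}

/// 64-bit addition over 32 blocks, returning clean (carry-free) blocks.
fn default_add(a: u64, b: u64) -> u64 {
    let mut sum = add_blocks(&to_radix(a, 32), &to_radix(b, 32));
    propagate(&mut sum);
    from_radix(&sum)
}

fn main() {
    // 27 = 1*16 + 2*4 + 3, so the digits are [3, 2, 1, 0].
    println!("{:?}", to_radix(27, 4));
    println!("{}", default_add(1234, 5678));
}
```

In the encrypted setting each of these blocks is a separate ciphertext, which is why block-level work can be parallelized and why timings grow with the integer bit width.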
You can minimize operational costs by selecting from 'unchecked', 'checked', or 'smart' modes from the fine-grained APIs, each balancing performance and correctness differently. For more details about parameters, see here. You can find the benchmark results on GPU for all these operations here.
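The difference between these modes can be illustrated with a toy cleartext model of a single block. This is a sketch under stated assumptions, not the TFHE-rs implementation: the `Block` type, its `degree` field (an upper bound on the stored value), and all function names are hypothetical, and the `clean` method merely stands in for the costly carry-clearing step. The model assumes 4 bits per block (2 message, 2 carry), and models an unchecked overflow as wrapping modulo 16.

```rust
// Toy single-block model of the operation modes. NOT the TFHE-rs API.
const MESSAGE_MODULUS: u64 = 4; // 2 message bits
const BLOCK_CAPACITY: u64 = 15; // largest value in 2 message + 2 carry bits

#[derive(Clone, Copy, Debug, PartialEq)]
struct Block {
    value: u64,  // cleartext stand-in for the encrypted content
    degree: u64, // upper bound on `value`, known without decrypting
}

impl Block {
    fn new(m: u64) -> Self {
        Block { value: m % MESSAGE_MODULUS, degree: MESSAGE_MODULUS - 1 }
    }

    /// Empty the carry bits. Stands in for the expensive step.
    fn clean(&mut self) {
        self.value %= MESSAGE_MODULUS;
        self.degree = MESSAGE_MODULUS - 1;
    }
}

/// Unchecked: cheapest, never verifies that the result still fits.
fn unchecked_add(a: Block, b: Block) -> Block {
    Block {
        value: (a.value + b.value) % (BLOCK_CAPACITY + 1),
        degree: a.degree + b.degree,
    }
}

/// Checked: refuses the operation when the result might not fit.
fn checked_add(a: Block, b: Block) -> Result<Block, &'static str> {
    if a.degree + b.degree > BLOCK_CAPACITY {
        Err("carry space exhausted")
    } else {
        Ok(unchecked_add(a, b))
    }
}

/// Smart: cleans the inputs only when needed, hence the mutable borrows.
fn smart_add(a: &mut Block, b: &mut Block) -> Block {
    if a.degree + b.degree > BLOCK_CAPACITY {
        a.clean();
        b.clean();
    }
    unchecked_add(*a, *b)
}

/// Default: always returns a block with empty carries.
fn default_add(mut a: Block, mut b: Block) -> Block {
    let mut result = smart_add(&mut a, &mut b);
    result.clean();
    result
}

fn main() {
    let three = Block::new(3);
    let mut acc = three;
    for _ in 0..4 {
        acc = unchecked_add(acc, three); // fills the carry bits
    }
    println!("after 4 unchecked adds: {:?}", acc);
    println!("checked: {:?}", checked_add(acc, three));
    let mut rhs = three;
    println!("smart:   {:?}", smart_add(&mut acc, &mut rhs));
    println!("default: {:?}", default_add(three, three));
}
```

In this model the default mode pays for a cleaning step on every call, which makes its cost predictable, while the other modes skip or defer that step and leave it to the caller to track the remaining carry space.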
The next table shows the execution time of a keyswitch followed by a programmable bootstrapping depending on the precision of the input message. The associated parameter set is given. The configuration is Concrete FFT + AVX-512.
TFHE-rs benchmarks can be easily reproduced from the source.
AVX-512 is now enabled by default for benchmarks when available.
The following example shows how to reproduce TFHE-rs benchmarks: