This document summarizes the timings of some homomorphic operations over 64-bit encrypted integers, depending on the hardware. More details are given for the CPU, the GPU, and zero-knowledge proofs.
The cryptographic parameters used for benchmarking follow a tweaked uniform (TUniform) noise distribution instead of a Gaussian. The main advantage of this distribution is that it is bounded, whereas the usual Gaussian one is not. In some practical cases, this can simplify the use of homomorphic computation. See the noise section of the Security and cryptography documentation page for more information on the noise distributions.
You can get the parameters used for benchmarks by cloning the repository, checking out the commit you want to use (starting with the v0.11.0 release), and running the following make command:
This document details the GPU performance benchmarks of homomorphic operations using TFHE-rs.
All GPU benchmarks presented here were obtained on H100 GPUs, and rely on the multithreaded PBS algorithm. The cryptographic parameters PARAM_GPU_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_3_KS_PBS were used.
The results below are for execution on a single H100. The following table shows the performance when the inputs of the benchmarked operation are encrypted:
The following table shows the performance when the left input of the benchmarked operation is encrypted and the other is a clear scalar of the same size:
The results below are for execution on two H100s. The following table shows the performance when the inputs of the benchmarked operation are encrypted:
The following table shows the performance when the left input of the benchmarked operation is encrypted and the other is a clear scalar of the same size:
The next table shows the execution time of a keyswitch followed by a programmable bootstrapping depending on the precision of the input message. The associated parameter set is given.
Note that these benchmarks use Gaussian parameters.
This document details the performance benchmarks of zero-knowledge proofs using TFHE-rs.
Benchmarks for the zero-knowledge proofs were run on an m6i.4xlarge with 16 cores to simulate a typical client configuration. Verification was done on an hpc7a.96xlarge AWS instance to mimic a powerful server.
This document details the CPU performance benchmarks of homomorphic operations using TFHE-rs.
By their nature, homomorphic operations run slower than their cleartext equivalents. The following are the timings for basic operations, including benchmarks from other libraries for comparison.
All CPU benchmarks were launched on an AWS hpc7a.96xlarge instance equipped with an AMD EPYC 9R14 CPU @ 2.60GHz and 740GB of RAM.
The following tables benchmark the execution time of some operation sets using FheUint (unsigned integers). FheInt (signed integers) performs similarly.
The next table shows the operation timings on CPU when all inputs are encrypted:
The next table shows the operation timings on CPU when the left input is encrypted and the right is a clear scalar of the same size:
All timings are based on parallelized Radix-based integer operations where each block is encrypted using the default parameters PARAM_MESSAGE_2_CARRY_2_KS_PBS. To ensure predictable timings, we perform operations in the default mode, which ensures that the input and output encodings are similar (i.e., the carries are always emptied).
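To make the radix representation concrete, here is a minimal plaintext sketch of how a 64-bit value could be split into blocks with 2 message bits each, and how an addition that empties the carries afterwards behaves. It operates on clear values only and does not use the TFHE-rs API; the names `decompose`, `recompose`, and `add_default` are hypothetical, and the 2-bit message / 2-bit carry layout is an assumption read off the parameter name.

```rust
// Plaintext toy model of a radix integer. NOT the TFHE-rs API:
// every name below is hypothetical and nothing is encrypted.
const MESSAGE_BITS: u32 = 2; // assumed from "MESSAGE_2"
const CARRY_BITS: u32 = 2; // assumed from "CARRY_2"
const MSG_MODULUS: u64 = 1 << MESSAGE_BITS; // each clean block holds 0..=3
const NUM_BLOCKS: usize = (64 / MESSAGE_BITS) as usize; // 32 blocks for 64 bits

/// Split a 64-bit value into little-endian base-4 digits (one per block).
fn decompose(mut value: u64) -> Vec<u64> {
    let mut blocks = Vec::with_capacity(NUM_BLOCKS);
    for _ in 0..NUM_BLOCKS {
        blocks.push(value % MSG_MODULUS);
        value /= MSG_MODULUS;
    }
    blocks
}

/// Rebuild the value from clean blocks (each block < MSG_MODULUS).
fn recompose(blocks: &[u64]) -> u64 {
    blocks
        .iter()
        .rev()
        .fold(0u64, |acc, &b| acc.wrapping_mul(MSG_MODULUS).wrapping_add(b))
}

/// Add block-wise, then propagate carries so every output block is clean.
fn add_default(a: &[u64], b: &[u64]) -> Vec<u64> {
    let mut out: Vec<u64> = a.iter().zip(b).map(|(x, y)| x + y).collect();
    let mut carry = 0;
    for block in out.iter_mut() {
        let total = *block + carry;
        // The sum must fit in the message + carry space of one block.
        debug_assert!(total < 1 << (MESSAGE_BITS + CARRY_BITS));
        *block = total % MSG_MODULUS;
        carry = total / MSG_MODULUS;
    }
    out // the carry out of the last block is dropped: arithmetic is modulo 2^64
}

fn main() {
    let (x, y) = (0xDEAD_BEEF_u64, u64::MAX - 5);
    let sum = recompose(&add_default(&decompose(x), &decompose(y)));
    assert_eq!(sum, x.wrapping_add(y));
    println!("{} blocks, sum = {:#x}", NUM_BLOCKS, sum);
}
```

In this model the carry propagation is a cheap loop. In the homomorphic setting the equivalent step requires bootstrapping on the blocks, which is why always emptying the carries makes timings predictable but not minimal.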
You can minimize operational costs by selecting the 'unchecked', 'checked', or 'smart' mode from the fine-grained APIs, each balancing performance and correctness differently. For more details about parameters, see here. You can find the benchmark results on GPU for all these operations here.
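The trade-off between these modes can be illustrated with a toy plaintext model of a single block. This is only a sketch under stated assumptions: it assumes a block with 2 message bits and 2 carry bits, tracks a public upper bound (called `degree` here) on the block content, and uses hypothetical function names that mirror the mode names. It is not the TFHE-rs API and it ignores noise entirely.

```rust
// Toy plaintext model of one block with 2 message bits and 2 carry bits.
// Hypothetical names throughout; this is NOT the TFHE-rs API.
const MSG_MODULUS: u64 = 4; // 2 message bits
const BLOCK_MODULUS: u64 = 16; // 2 message bits + 2 carry bits

#[derive(Clone, Copy, Debug, PartialEq)]
struct Block {
    value: u64,  // would be secret in a real scheme
    degree: u64, // public upper bound on `value`
}

/// A clean block: carry space empty, content below MSG_MODULUS.
fn fresh(message: u64) -> Block {
    Block { value: message % MSG_MODULUS, degree: MSG_MODULUS - 1 }
}

/// Empty the carry space, keeping only the message part.
fn clean(a: Block) -> Block {
    fresh(a.value)
}

/// "unchecked": add with no check; the caller must guarantee there is room.
fn unchecked_add(a: Block, b: Block) -> Block {
    Block { value: a.value + b.value, degree: a.degree + b.degree }
}

/// "checked": refuse when the bounds show the block could overflow.
fn checked_add(a: Block, b: Block) -> Option<Block> {
    if a.degree + b.degree < BLOCK_MODULUS {
        Some(unchecked_add(a, b))
    } else {
        None
    }
}

/// "smart": clean the inputs only when needed, then add.
fn smart_add(a: &mut Block, b: &mut Block) -> Block {
    if a.degree + b.degree >= BLOCK_MODULUS {
        *a = clean(*a);
        *b = clean(*b);
    }
    unchecked_add(*a, *b)
}

/// "default": add clean inputs, then always clean the output.
fn default_add(a: Block, b: Block) -> Block {
    clean(unchecked_add(a, b))
}

fn main() {
    let three = fresh(3);
    let mut acc = three;
    for _ in 0..4 {
        acc = unchecked_add(acc, three); // fast, but fills the carry space
    }
    assert_eq!(acc, Block { value: 15, degree: 15 });
    assert!(checked_add(acc, three).is_none()); // no room left
    let mut rhs = three;
    let sum = smart_add(&mut acc, &mut rhs); // cleans first, then adds
    assert_eq!(sum.value % MSG_MODULUS, (6 * 3) % MSG_MODULUS);
    assert_eq!(default_add(three, three), fresh(6));
}
```

In this model the cleaning step stands in for the expensive part of a homomorphic operation. The unchecked variant skips it entirely, the smart variant pays for it only when the bound requires it, and the default variant pays for it on every call in exchange for a predictable output.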
The next table shows the execution time of a keyswitch followed by a programmable bootstrapping depending on the precision of the input message. The associated parameter set is given. The configuration is Concrete FFT + AVX-512.
Note that these benchmarks use Gaussian parameters.
TFHE-rs benchmarks can be easily reproduced from the source.
AVX512 is now enabled by default for benchmarks when available.
The following example shows how to reproduce TFHE-rs benchmarks: