TFHE-rs v1.1 - April 2025

Summary

TFHE-rs v1.1.0 brings several new features and improvements on both the CPU & GPU backends:

CPU: This release introduces new scalar operations including CMUX/Select, subtraction with the scalar on the left, and dot product between a vector of Booleans and scalars. It also adds user-friendly APIs to manage noise squashing.
GPU: This release adds 128-bit Programmable Bootstrapping (PBS) and upgrades cryptographic parameters to match the CPU standard, now offering a failure probability of 2⁻¹²⁸ for FHE operations.

See full details below:

The directions of the integer block rotation and block shift primitives have been inverted so that each now matches its name.
The NTT for the prime $2^{64} - 2^{32} + 1$ now uses new twiddle factors, allowing bit shifts instead of multiplications. Older NTT keys are now incompatible.
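The bit-shift remark can be illustrated with a short sketch in plain Rust. It relies only on a property of the prime itself: with $p = 2^{64} - 2^{32} + 1$, we have $2^{96} \equiv -1 \pmod p$, so 2 is a root of unity of order 192 and any twiddle that is a power of two can be applied with a shift. The helper names (`mul_mod`, `pow_mod`, `mul_by_pow2`) are hypothetical and the generic `%` reduction is used for clarity only; this is not the TFHE-rs implementation, nor a statement of which twiddles it actually selects.

```rust
/// The Solinas prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Generic modular multiplication through a 128-bit intermediate.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Square-and-multiply modular exponentiation.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// Multiply x by 2^k (k < 64) with a shift instead of a 64x64 multiplication.
/// A production version would replace `%` with a reduction specialised to P.
fn mul_by_pow2(x: u64, k: u32) -> u64 {
    assert!(k < 64);
    (((x as u128) << k) % P as u128) as u64
}

fn main() {
    // 2^96 = -1 (mod p), hence 2 has multiplicative order 192.
    assert_eq!(pow_mod(2, 96), P - 1);
    assert_eq!(pow_mod(2, 192), 1);

    // Shifting agrees with a full modular multiplication by the same power of two.
    let x = 0x0123_4567_89AB_CDEF_u64;
    for k in 0..64u32 {
        assert_eq!(mul_by_pow2(x, k), mul_mod(x, pow_mod(2, k as u64)));
    }
    println!("power-of-two twiddles behave as shifts modulo p");
}
```

Because the twiddle values change, data precomputed against the old twiddles no longer matches, which is consistent with older NTT keys becoming incompatible.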

Add scalar subtraction with the scalar as the left operand in the integer and High-Level API
Add scalar Select in the integer and High-Level API, so that the selected values can be scalars
Add dot product between a vector of FheBool and a vector of scalars
Add trivial encrypt/decrypt support for string types
Add chunked LweBootstrapKey and SeededLweBootstrapKey generation for memory-constrained systems
Add a noise squashing API in the integer and High-Level API to support use cases requiring noise flooding
Add the extended-types feature, enabling more static typing in the High-Level API
Add GLWE keyswitch primitives
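As a reading aid for the three new scalar operations listed above (left-scalar subtraction, scalar Select, and the Boolean dot product), here is a cleartext reference model in plain Rust. It only describes the values an encrypted result would be expected to decrypt to, assuming wrapping arithmetic on an 8-bit unsigned type. The function names are hypothetical and none of this is the TFHE-rs API.

```rust
/// scalar - value, with the clear scalar on the left (wrapping on underflow).
fn left_scalar_sub(scalar: u8, value: u8) -> u8 {
    scalar.wrapping_sub(value)
}

/// Select (CMUX) between two clear scalars, driven by a Boolean condition.
fn scalar_select(cond: bool, if_true: u8, if_false: u8) -> u8 {
    if cond { if_true } else { if_false }
}

/// Dot product between a vector of Booleans and a vector of scalars:
/// the wrapping sum of the scalars whose matching Boolean is true.
fn bool_dot_product(bits: &[bool], scalars: &[u8]) -> u8 {
    assert_eq!(bits.len(), scalars.len());
    bits.iter()
        .zip(scalars)
        .fold(0u8, |acc, (&b, &s)| if b { acc.wrapping_add(s) } else { acc })
}

fn main() {
    assert_eq!(left_scalar_sub(10, 3), 7);
    assert_eq!(left_scalar_sub(3, 5), 254); // wraps modulo 2^8
    assert_eq!(scalar_select(true, 7, 9), 7);
    assert_eq!(scalar_select(false, 7, 9), 9);
    assert_eq!(bool_dot_product(&[true, false, true], &[10, 20, 30]), 40);
}
```

In the encrypted setting, the condition and the Booleans would be ciphertexts while the scalars stay in the clear.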

The NTT for the Solinas prime $2^{64} - 2^{32} + 1$ now uses twiddles enabling bit shifts instead of costly multiplications
Remove usage of unwrap in various conformance checks

Fix a corner case in encryption where negative values were sometimes not sign-extended

Add modulus-switch noise reduction on GPU for the classical PBS
Update GPU cryptographic parameters to reach a 2⁻¹²⁸ probability of failure, as on CPU
Use hexadecimal literals to initialize the twiddles of the 64-bit FFT for better precision
Refactor double2 operators to use CUDA intrinsics and match CPU floating-point arithmetic
Track degree and noise level in all integer operations in the CUDA backend
Fix block comparison logic with zero to match the CPU implementation
Retain LUT indexes on the CPU for each LUT application to avoid copying them back from GPU
Add alias for GPU compression parameters
Detect first/last iteration of split-kernel multi-bit & classical PBS via template argument
Detect first/last iteration of 128-bit PBS via template argument
Modify integer & ERC20 throughput benchmarks for better multi-GPU performance
