TFHE-rs v1.1 - April 2025
Summary
TFHE-rs v1.1.0 brings several new features and improvements on both the CPU & GPU backends:
CPU: This release introduces new scalar operations including CMUX/Select, subtraction with the scalar on the left, and dot product between a vector of Booleans and scalars. It also adds user-friendly APIs to manage noise squashing.
GPU: This release adds 128-bit Programmable Bootstrapping (PBS) and upgrades cryptographic parameters to match the CPU standard, now offering a failure probability of 2⁻¹²⁸ for FHE operations.
See full details below:
Breaking changes
The directions of the integer block rotation and block shift primitives have been inverted to fix their meaning.
The NTT for the Solinas prime now uses new twiddle factors, allowing bit shifts instead of multiplications. NTT keys generated with older versions are now incompatible.
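To illustrate why such twiddles are cheaper, the sketch below assumes the prime is 2^64 - 2^32 + 1, a common 64-bit Solinas prime (the release note does not state the value). In that field, 2 is a root of unity of order 192, so multiplying by a twiddle that is a power of two reduces to a left shift followed by a modular reduction. All helper names are hypothetical and this is not the TFHE-rs implementation.

```rust
/// Assumed prime for this sketch: 2^64 - 2^32 + 1.
const P: u128 = 0xffff_ffff_0000_0001;

/// Generic modular multiplication through a 128-bit product.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P) as u64
}

/// Multiplication by the twiddle 2^k, done with a shift instead of a multiply.
fn mul_pow2_mod(a: u64, k: u32) -> u64 {
    assert!(k < 64, "the shifted value must fit in 128 bits");
    (((a as u128) << k) % P) as u64
}

/// Square-and-multiply exponentiation modulo P.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

fn main() {
    // 2^96 is congruent to -1 modulo P, so 2 has multiplicative order 192.
    assert_eq!(pow_mod(2, 96), (P - 1) as u64);
    assert_eq!(pow_mod(2, 192), 1);

    // Multiplying by the twiddle 2^5 gives the same result either way.
    let a = 0x1234_5678_9abc_def0u64;
    assert_eq!(mul_mod(a, pow_mod(2, 5)), mul_pow2_mod(a, 5));
}
```

A production implementation would also replace the generic `%` reduction with a reduction specialized for the prime; the sketch only shows the shift-for-multiply substitution.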
CPU
New features
Add scalar subtraction with the scalar as the left operand in the integer and High-Level API
Add scalar Select in the integer and High-Level API, allowing use of scalar values
Add dot product between vectors of FheBool
Add trivial encrypt/decrypt support for string types
Add chunked LweBootstrapKey and SeededLweBootstrapKey generation for memory-constrained systems
Add a noise squashing API in the integer and High-Level API to support use cases requiring noise flooding
Add the extended-types feature, enabling more static typing in the High-Level API
Add GLWE keyswitch primitives
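For readers unfamiliar with the new scalar operations, the following plain-Rust sketch models what they compute on clear 8-bit values. It does not use the TFHE-rs API: the function names are hypothetical, and the wrapping (modular) arithmetic is an assumption made to mirror fixed-width encrypted integers.

```rust
/// Clear-value model of Select: pick `if_true` when `cond` is set, else `if_false`.
fn select(cond: bool, if_true: u8, if_false: u8) -> u8 {
    if cond { if_true } else { if_false }
}

/// Clear-value model of subtraction with the scalar on the left: `scalar - value`.
fn scalar_left_sub(scalar: u8, value: u8) -> u8 {
    scalar.wrapping_sub(value)
}

/// Clear-value model of a dot product between Booleans and scalars:
/// the sum of the scalars whose matching Boolean is true.
fn bool_dot_product(bits: &[bool], scalars: &[u8]) -> u8 {
    bits.iter()
        .zip(scalars)
        .fold(0u8, |acc, (&bit, &s)| if bit { acc.wrapping_add(s) } else { acc })
}

fn main() {
    assert_eq!(select(true, 7, 9), 7);
    assert_eq!(select(false, 7, 9), 9);
    assert_eq!(scalar_left_sub(10, 3), 7);
    // 3 - 10 wraps around modulo 256.
    assert_eq!(scalar_left_sub(3, 10), 249);
    assert_eq!(bool_dot_product(&[true, false, true], &[1, 2, 3]), 4);
}
```

In the encrypted setting, the condition, the subtracted value, and the Booleans would be ciphertexts while the scalars stay in the clear.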
Improvements
The NTT for the Solinas prime now uses twiddles enabling bit shifts instead of costly multiplications
Remove usage of unwrap in various conformance checks
Fixes
Fix a corner case in encryption where negative values were sometimes not sign-extended
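The release note does not detail the corner case, so the sketch below only illustrates what sign extension means when a signed value is split into small blocks before encryption. The helper `decompose_signed` is hypothetical and is not part of TFHE-rs.

```rust
/// Split `value` into `num_blocks` little-endian blocks of `bits_per_block` bits.
/// The arithmetic right shift on `i64` keeps replicating the sign bit, so the
/// upper blocks of a negative value are filled with ones instead of zeros.
fn decompose_signed(value: i64, bits_per_block: u32, num_blocks: usize) -> Vec<u64> {
    let mask = (1u64 << bits_per_block) - 1;
    let mut remaining = value;
    let mut blocks = Vec::with_capacity(num_blocks);
    for _ in 0..num_blocks {
        blocks.push((remaining as u64) & mask);
        remaining >>= bits_per_block;
    }
    blocks
}

fn main() {
    // -2 is ...11111110 in two's complement: the lowest block is 0b10,
    // and every higher block is 0b11 thanks to sign extension.
    assert_eq!(decompose_signed(-2, 2, 4), vec![2, 3, 3, 3]);
    // A positive value is padded with zero blocks.
    assert_eq!(decompose_signed(5, 2, 4), vec![1, 1, 0, 0]);
}
```

Without sign extension, the upper blocks of -2 would be zero and the value would decode as a small positive number.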
GPU
New features
Implement fft128 in the CUDA backend
Implement 128-bit classic PBS
Improvements
Add modulus-switch noise reduction on GPU for the classical PBS
Update GPU cryptographic parameters to reach a 2⁻¹²⁸ probability of failure, as on CPU
Use hexadecimal literals to initialize the twiddles of the 64-bit FFT for better precision
Refactor double2 operators to use CUDA intrinsics and match CPU floating-point arithmetic
Track degree and noise level in all integer operations in the CUDA backend
Fix block comparison logic with zero to match the CPU implementation
Retain LUT indexes on the CPU for each LUT application to avoid copying them back from GPU
Add alias for GPU compression parameters
Detect first/last iteration of split-kernel multi-bit & classical PBS via template argument
Detect first/last iteration of 128-bit PBS via template argument
Modify integer & ERC20 throughput benchmarks for better multi-GPU performance
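On the hexadecimal twiddle initialization listed above: writing a floating-point constant as its IEEE-754 bit pattern pins the exact value, independent of how a compiler parses decimal literals or how a math library rounds a runtime cos/sin call. The sketch below shows the idea in Rust rather than CUDA; the constant and helper names are hypothetical, and the release note does not say how the backend actually lays out its twiddles.

```rust
use std::f64::consts::{FRAC_1_SQRT_2, PI};

/// cos(pi/4) written as an exact IEEE-754 double-precision bit pattern.
const COS_PI_4_BITS: u64 = 0x3FE6_A09E_667F_3BCD;

/// Rebuild the twiddle value from its bit pattern without any rounding step.
fn twiddle_from_bits(bits: u64) -> f64 {
    f64::from_bits(bits)
}

fn main() {
    let exact = twiddle_from_bits(COS_PI_4_BITS);
    let runtime = (PI / 4.0).cos();

    // Print both bit patterns; the runtime value depends on the math library.
    println!("{:#018x} {:#018x}", exact.to_bits(), runtime.to_bits());

    // The bit pattern is the double closest to 1/sqrt(2).
    assert_eq!(exact, FRAC_1_SQRT_2);
    // The runtime computation agrees only up to rounding error.
    assert!((exact - runtime).abs() <= f64::EPSILON);
}
```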
Fixes
Fix max shared memory bug for cooperative-groups PBS