TFHE-rs v1.1 - April 2025

Summary


TFHE-rs v1.1.0 brings several new features and improvements on both the CPU & GPU backends:

  • CPU: This release introduces new scalar operations including CMUX/Select, subtraction with the scalar on the left, and dot product between a vector of Booleans and scalars. It also adds user-friendly APIs to manage noise squashing.

  • GPU: This release adds 128-bit Programmable Bootstrapping (PBS) and upgrades cryptographic parameters to match the CPU standard, now offering a failure probability of 2⁻¹²⁸ for FHE operations.

See full details below:

Breaking changes

CPU


New features

  • Add scalar subtraction with the scalar as the left operand in the integer and High-Level API

  • Add scalar Select in the integer and High-Level API, allowing use of scalar values

  • Add dot product between a vector of FheBool and a vector of scalars

  • Add trivial encrypt/decrypt support for string types

  • Add chunked LweBootstrapKey and SeededLweBootstrapKey generation for memory-constrained systems

  • Add a noise squashing API in the integer and High-Level API to support use cases requiring noise flooding

  • Add the extended-types feature, enabling more static typing in the High-Level API

  • Add GLWE keyswitch primitives
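The new scalar operations listed above can be pictured through their clear-value semantics. The sketch below is a plain-Rust model on 8-bit unsigned integers, not the TFHE-rs API: the function names are hypothetical, and wrap-around on overflow is an assumption made here for illustration.

```rust
/// Clear-value model of "scalar - value" with the scalar as the left operand.
/// Assumption: the result wraps modulo 2^8, like unsigned machine integers.
fn left_scalar_sub(scalar: u8, value: u8) -> u8 {
    scalar.wrapping_sub(value)
}

/// Clear-value model of Select (CMUX): pick one of two scalars from a condition.
fn select(cond: bool, if_true: u8, if_false: u8) -> u8 {
    if cond {
        if_true
    } else {
        if_false
    }
}

/// Clear-value model of a dot product between Booleans and scalars:
/// the sum of the scalars whose matching Boolean is true.
/// Assumption: the sum wraps modulo 2^8.
fn bool_dot(bits: &[bool], scalars: &[u8]) -> u8 {
    assert_eq!(bits.len(), scalars.len(), "both vectors must have the same length");
    bits.iter()
        .zip(scalars)
        .filter(|(b, _)| **b)
        .fold(0u8, |acc, (_, s)| acc.wrapping_add(*s))
}
```

In the encrypted setting the condition or the Boolean vector would be ciphertexts while the scalars stay in the clear, which is what lets the library avoid encrypting constants.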

Improvements

  • The NTT for the Solinas prime 2⁶⁴ - 2³² + 1 now uses twiddles that allow bit shifts in place of costly multiplications

  • Remove uses of unwrap in various conformance checks
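Why shifts can stand in for multiplications modulo this prime: with p = 2⁶⁴ - 2³² + 1 we have 2⁹⁶ ≡ -1 (mod p), so 2 is a root of unity of order 192 and any twiddle that is a power of two can be applied with a shift and an optional negation. The sketch below illustrates that identity only; the function names are hypothetical, and it reduces with the `%` operator for clarity, whereas a real NTT would presumably use a specialized reduction.

```rust
/// The Solinas prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Reference multiplication modulo P through a 128-bit product.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Multiply `a` by 2^k modulo P using shifts instead of a general multiplication.
/// Uses 2^192 = 1 and 2^96 = -1 (mod P).
fn mul_pow2(a: u64, k: u32) -> u64 {
    // Exponents are periodic with period 192.
    let k = k % 192;
    // For the upper half of the period, shift by k - 96 and negate.
    let (mut shift, negate) = if k >= 96 { (k - 96, true) } else { (k, false) };

    let mut r = a % P;
    // Shift in steps of at most 64 bits so the intermediate fits in a u128.
    while shift > 0 {
        let step = shift.min(64);
        r = (((r as u128) << step) % P as u128) as u64;
        shift -= step;
    }

    if negate && r != 0 {
        P - r
    } else {
        r
    }
}
```

For example, `mul_pow2(x, 96)` returns the modular negation of `x`, and the result of `mul_pow2(x, k)` can be checked against `mul_mod` applied to `x` and the matching power of two.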

Fixes

  • Fix a corner case in encryption where negative values were sometimes not sign-extended

GPU


New features

  • Implement fft128 in the CUDA backend

  • Implement 128-bit classic PBS

Improvements

  • Add modulus-switch noise reduction on GPU for the classical PBS

  • Update GPU cryptographic parameters to reach a 2⁻¹²⁸ probability of failure, as on CPU

  • Initialize twiddles for the 64-bit FFT from hexadecimal values for better precision

  • Refactor double2 operators to use CUDA intrinsics and match CPU floating-point arithmetic

  • Track degree and noise level in all integer operations in the CUDA backend

  • Fix block comparison logic with zero to match the CPU implementation

  • Retain LUT indexes on the CPU for each LUT application to avoid copying them back from GPU

  • Add alias for GPU compression parameters

  • Detect first/last iteration of split-kernel multi-bit & classical PBS via template argument

  • Detect first/last iteration of 128-bit PBS via template argument

  • Modify integer & ERC20 throughput benchmarks for better multi-GPU performance
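One plausible reading of the hexadecimal twiddle initialization above is that each constant is stored as its exact IEEE 754 bit pattern, so no decimal-to-binary rounding can occur when the source is compiled. The sketch below shows the idea in Rust rather than CUDA; the constant and function names are hypothetical and this is not the backend's actual code.

```rust
/// Bit pattern of cos(pi/4) = sqrt(2)/2 as an IEEE 754 double.
const COS_PI_4_BITS: u64 = 0x3FE6_A09E_667F_3BCD;

/// Rebuild a double from its exact bit pattern (no decimal parsing involved).
fn twiddle_from_bits(bits: u64) -> f64 {
    f64::from_bits(bits)
}

/// Render a double as the hexadecimal bit pattern one would paste into a table.
fn to_hex(x: f64) -> String {
    format!("{:#018X}", x.to_bits())
}
```

A table of twiddles can then be generated once at high precision, dumped with `to_hex`, and reloaded bit for bit with `twiddle_from_bits`.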

Fixes

  • Fix max shared memory bug for cooperative-groups PBS
