TFHE-rs v1.3 - July 2025

Summary

TFHE-rs v1.3.0 introduces several new features focused on both the performance and the usability of the library. The HPU now supports more operations and has a parameter set matching the CPU and GPU in terms of probability of computation error.

See full details below:

CPU

New features

Add chunked generation for the LweKeyswitchKey
Add multi-bit PBS for 128-bit moduli
Add Atomic Pattern support at the ClientKey level
Add OverflowingNeg in the High Level API
Add compression support after noise squashing
Add modulus switch noise compensation technique and centering
Add a different hashing mode for ZK v2 allowing for faster verification
Add a more granular conformance check for ZK proofs
Add a "key chain" mechanism to update the parameters of old ciphertexts to newer ones
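
For readers unfamiliar with overflowing negation, the sketch below shows the cleartext semantics using only Rust's standard integer methods. It does not call TFHE-rs. It assumes, without confirmation from these notes, that the encrypted OverflowingNeg follows the same convention of returning a wrapped result together with an overflow flag. The helper names `signed_neg` and `unsigned_neg` are illustrative only.

```rust
/// Cleartext reference for signed overflowing negation.
/// Overflow happens only for the minimum value, whose negation
/// is not representable in the same type.
fn signed_neg(x: i8) -> (i8, bool) {
    x.overflowing_neg()
}

/// Cleartext reference for unsigned overflowing negation.
/// Every non-zero input wraps around and reports overflow;
/// negating zero does not overflow.
fn unsigned_neg(x: u8) -> (u8, bool) {
    x.overflowing_neg()
}
```

With these helpers, `signed_neg(i8::MIN)` returns the input unchanged with the flag set, while `unsigned_neg(1)` wraps to the maximum value with the flag set.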

Improvements

New algorithm for division: 36% improvement for 64-bit division with default parameters, which now runs in 5.5s instead of 8.6s
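
The 36% figure can be read as a reduction in latency rather than a throughput gain. The short sketch below recomputes both quantities from the two timings quoted above; the function names are illustrative and not part of TFHE-rs.

```rust
/// Fraction of the old latency that was removed: (old - new) / old.
fn relative_reduction(old_secs: f64, new_secs: f64) -> f64 {
    (old_secs - new_secs) / old_secs
}

/// How many times faster the new version is: old / new.
fn speedup_factor(old_secs: f64, new_secs: f64) -> f64 {
    old_secs / new_secs
}
```

For 8.6s and 5.5s, the reduction is about 0.36 and the speedup factor is about 1.56.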

GPU

New features

All operations now come with a utility function to query how much memory that function will require on GPU:
- All integer operations (bitwise operations, comparisons, shift/rotate, cmux, addition, subtraction, multiplication, division, etc.)
- Operations on booleans
- Compression/decompression
- Encrypted random generation
Add support for GPU-accelerated expand in the High Level API
Allow a user to perform computation on multiple GPUs using a custom selection of GPUs
Add squash noise in the High Level API
Add support for GPU-accelerated expand to CompactCiphertextList
Add CUDA debug target for integer tests via a Cargo feature
Add move_to_current_device for booleans
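
The memory-query utilities listed above suggest a check-before-launch pattern: ask how much GPU memory an operation needs, compare it with what is available, and only then run it. The sketch below illustrates that pattern in plain Rust. Everything in it is hypothetical: the trait `GpuSizeQuery`, the type `MockAdd`, the function `fits_on_gpu`, and the byte counts are placeholders, not the TFHE-rs API or real measurements.

```rust
/// Hypothetical trait: an operation that can report its GPU memory need.
trait GpuSizeQuery {
    /// Number of bytes the operation would allocate on the GPU.
    fn size_on_gpu_bytes(&self) -> u64;
}

/// Hypothetical stand-in for an integer addition on encrypted blocks.
struct MockAdd {
    num_blocks: u64,
    /// Placeholder per-block cost, not a real TFHE-rs figure.
    bytes_per_block: u64,
}

impl GpuSizeQuery for MockAdd {
    fn size_on_gpu_bytes(&self) -> u64 {
        self.num_blocks * self.bytes_per_block
    }
}

/// Returns true when the operation fits in the given free memory.
fn fits_on_gpu(op: &impl GpuSizeQuery, free_bytes: u64) -> bool {
    op.size_on_gpu_bytes() <= free_bytes
}
```

A caller could use such a check to fall back to another device, or to split a batch, before an allocation fails on the GPU.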

Fixes

Fix degrees after abs
Allow building with both GPU & HPU features enabled
Add indexes to modulus switch noise reduction
Add missing error checks after some kernels
Fix a linking problem on Hopper GPUs
Fix hardcoded use of message modulus in some operations
Fix degrees after bitxor
Prevent nvToolsExt inclusion when not profiling
Fix degrees after scalar bitxor
Fix race condition on expand when running on multiple GPUs
Fix the packing keyswitch buffer not being allocated on large parameter sets

Improvements

Use cooperative groups based PBS on H100s when possible on large batches
Optimize sum_ciphertexts in the CUDA backend (ilog2 and scalar div got significant performance improvements thanks to this)
Increase keyswitch occupancy to 100%

HPU

New features

Add modulus-switch noise reduction (centered binary)
Update HPU parameter set to reach a 2^-128 probability of failure, as on CPU & GPU
Add support for most of the missing operations: div, max/min, shift, rot, leading/trailing zeros/ones
Simplify & accelerate FPGA loading by using PCIe instead of loading flash at each bitstream update
