TFHE-rs v1.3 - July 2025
Summary
TFHE-rs v1.3.0 introduces several new features focused on both the performance and the usability of the library. The HPU now supports more operations and has a parameter set matching the CPU and GPU in terms of probability of computation error.
See full details below:
CPU
New features
Add chunked generation for the LweKeyswitchKey
Add multi-bit PBS for 128-bit moduli
Add Atomic Pattern support at the ClientKey level
Add OverflowingNeg in the High Level API
Add compression support after noise squashing
Add modulus switch noise compensation technique and centering
Add a different hashing mode for ZK v2 allowing for faster verification
Add a more granular conformance check for ZK proofs
Add a "key chain" mechanism to update the parameters of old ciphertexts to newer ones
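For readers unfamiliar with overflowing negation (the OverflowingNeg item above), the sketch below shows the plaintext semantics using only Rust's standard integer methods. It does not call TFHE-rs: the assumption that the encrypted operation follows the same convention (wrapped result plus an overflow flag) is ours, and the helper names are hypothetical, so check the library documentation before relying on it.

```rust
// Plaintext reference for overflowing negation, standard library only.
// Assumption: the encrypted High Level API operation mirrors this convention.

/// Negates an unsigned 8-bit value with wrapping and reports overflow.
/// For unsigned integers, every non-zero input overflows.
fn overflowing_neg_u8(x: u8) -> (u8, bool) {
    x.overflowing_neg()
}

/// Negates a signed 8-bit value with wrapping and reports overflow.
/// For signed integers, only the minimum value overflows.
fn overflowing_neg_i8(x: i8) -> (i8, bool) {
    x.overflowing_neg()
}

fn main() {
    println!("{:?}", overflowing_neg_u8(0)); // (0, false)
    println!("{:?}", overflowing_neg_u8(2)); // (254, true)
    println!("{:?}", overflowing_neg_i8(2)); // (-2, false)
    println!("{:?}", overflowing_neg_i8(i8::MIN)); // (-128, true)
}
```

The flag matters because the wrapped result alone cannot distinguish an exact negation from one that fell outside the representable range.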
Improvements
New division algorithm: 36% improvement for 64-bit division with default parameters, which now runs in 5.5 s versus 8.6 s previously
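The 36% figure above reads as a reduction in latency. A minimal sketch of the arithmetic follows, using only the two timings quoted in this release note; the function names are hypothetical and nothing here measures TFHE-rs itself.

```rust
// Relates the two quoted timings (8.6 s before, 5.5 s after).
// Assumption: "36% improvement" means relative latency reduction.

/// Relative latency reduction, in percent.
fn latency_reduction_percent(old_s: f64, new_s: f64) -> f64 {
    (old_s - new_s) / old_s * 100.0
}

/// Speedup factor (old time divided by new time).
fn speedup(old_s: f64, new_s: f64) -> f64 {
    old_s / new_s
}

fn main() {
    println!("reduction: {:.0}%", latency_reduction_percent(8.6, 5.5)); // reduction: 36%
    println!("speedup: {:.2}x", speedup(8.6, 5.5)); // speedup: 1.56x
}
```

Expressed as a speedup factor, the same change is roughly 1.56x.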
GPU
New features
All operations now come with a utility function to query how much memory that function will require on GPU:
All integer operations (bitwise operations, comparisons, shift/rotate, cmux, addition, subtraction, multiplication, division, etc.)
Operations on booleans
Compression/decompression
Encrypted random generation
Add support for GPU-accelerated expand in the High Level API
Allow users to run computations on multiple GPUs using a custom selection of GPUs
Add noise squashing to the High Level API
Add support for GPU-accelerated expand to CompactCiphertextList
Add a CUDA debug target for integer tests via a Cargo feature
Add move_to_current_device for booleans
Fixes
Fix degrees after abs
Allow building with both GPU & HPU features enabled
Add indexes to modulus switch noise reduction
Add missing error checks after some kernels
Fix a linking problem on Hopper GPUs
Fix hardcoded use of message modulus in some operations
Fix degrees after bitxor
Prevent nvToolsExt inclusion when not profiling
Fix degrees after scalar bitxor
Fix race condition on expand when running on multiple GPUs
Fix the packing keyswitch buffer not being allocated on large parameter sets
Improvements
Use cooperative groups based PBS on H100s when possible on large batches
Optimize sum_ciphertexts in the CUDA backend (ilog2 and scalar div got significant performance improvements thanks to this)
Increase keyswitch occupancy to 100%
HPU
New features
Add modulus-switch noise reduction (centered binary)
Update HPU parameter set to reach a 2^-128 probability of failure, as on CPU & GPU
Add support for most of the missing operations: div, max/min, shift, rot, leading/trailing zeros/ones
Simplify & accelerate FPGA loading by using PCIe instead of loading flash at each bitstream update