TFHE-rs v1.3 - July 2025

Summary


TFHE-rs v1.3.0 introduces several new features focused on both performance and usability. The HPU now supports more operations and has a parameter set that matches the CPU and GPU in terms of probability of computation error.

See full details below:

CPU


New features

  • Add chunked generation for the LweKeyswitchKey

  • Add multi-bit PBS for 128-bit moduli

  • Add Atomic Pattern support at the ClientKey level

  • Add OverflowingNeg in the High Level API

  • Add compression support after noise squashing

  • Add modulus switch noise compensation technique and centering

  • Add a different hashing mode for ZK v2 allowing for faster verification

  • Add a more granular conformance check for ZK proofs

  • Add a "key chain" mechanism to update old ciphertexts parameters to newer ones
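
As an illustration of what an overflowing negation reports, here is a minimal cleartext sketch using Rust's standard library integers. It assumes the encrypted OverflowingNeg follows the same semantics as `overflowing_neg` on primitive integers; the helper name `overflowing_neg_i8` is hypothetical and the TFHE-rs signature itself is not shown here.

```rust
/// Cleartext model of overflowing negation on a signed 8-bit integer.
/// Returns the wrapped result and a flag that is true when `-x`
/// cannot be represented in 8 bits (only the case for i8::MIN).
/// Hypothetical helper for illustration, not part of TFHE-rs.
fn overflowing_neg_i8(x: i8) -> (i8, bool) {
    x.overflowing_neg()
}

fn main() {
    for x in [5i8, 0, -1, i8::MIN] {
        let (neg, overflowed) = overflowing_neg_i8(x);
        println!("x = {x:4}, -x = {neg:4}, overflowed = {overflowed}");
    }
}
```

For signed types only the minimum value overflows, since its negation has no representation in two's complement.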

Improvements

  • New algorithm for division: a 36% improvement for 64-bit division with default parameters, which now runs in 5.5 s instead of 8.6 s
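
The 36% figure is consistent with a reduction in latency computed from the two timings quoted above, rather than a throughput ratio. A small sketch of that arithmetic (the function names are illustrative, and the interpretation is an assumption):

```rust
/// Fraction of the original runtime that was saved.
fn relative_reduction(before_s: f64, after_s: f64) -> f64 {
    (before_s - after_s) / before_s
}

/// How many times faster the new runtime is.
fn speedup_factor(before_s: f64, after_s: f64) -> f64 {
    before_s / after_s
}

fn main() {
    let (before_s, after_s) = (8.6, 5.5);
    println!("latency reduction: {:.1}%", 100.0 * relative_reduction(before_s, after_s));
    println!("speedup factor:    {:.2}x", speedup_factor(before_s, after_s));
}
```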

GPU


New features

  • All operations now come with a utility function to query how much memory that function will require on GPU:

    • All integer operations (bitwise operations, comparisons, shift/rotate, cmux, addition, subtraction, multiplication, division, etc.)

    • Operations on booleans

    • Compression/decompression

    • Encrypted random generation

  • Add support for GPU-accelerated expand in the High Level API

  • Allow a user to perform computations on multiple GPUs using a custom selection of GPUs

  • Add squash noise in the high level API

  • Add support for GPU-accelerated expand to CompactCiphertextList

  • Add a CUDA debug target for integer tests via a Cargo feature

  • Add move_to_current_device for booleans

Fixes

  • Fix degrees after abs

  • Allow building with both GPU & HPU features enabled

  • Add indexes to modulus switch noise reduction

  • Add missing error checks after some kernels

  • Fix a linking problem on Hopper GPUs

  • Fix hardcoded use of message modulus in some operations

  • Fix degrees after bitxor

  • Prevent nvToolsExt inclusion when not profiling

  • Fix degrees after scalar bitxor

  • Fix race condition on expand when running on multiple GPUs

  • Fix the packing keyswitch buffer not being allocated on large parameter sets

Improvements

  • Use cooperative groups based PBS on H100s when possible on large batches

  • Optimize sum_ciphertexts in the CUDA backend (ilog2 and scalar div got significant performance improvements thanks to this)

  • Increase keyswitch occupancy to 100%

HPU


New features

  • Add modulus-switch noise reduction (centered binary)

  • Update HPU parameter set to reach a 2^-128 probability of failure, as on CPU & GPU

  • Add support for most of the missing operations: div, max/min, shift, rot, leading/trailing zeros/ones

  • Simplify & accelerate FPGA loading by using PCIe instead of loading flash at each bitstream update
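
For readers unfamiliar with the operation names listed above (div, max/min, shift, rot, leading/trailing zeros/ones), the following cleartext sketch shows what each computes on an 8-bit value, using only Rust's standard library. It assumes the encrypted operations follow the same semantics as their cleartext counterparts; `bit_counts` is a hypothetical helper and nothing here calls the HPU backend.

```rust
/// Returns (leading_zeros, leading_ones, trailing_zeros, trailing_ones)
/// for an 8-bit cleartext value. Hypothetical helper for illustration.
fn bit_counts(x: u8) -> (u32, u32, u32, u32) {
    (
        x.leading_zeros(),
        x.leading_ones(),
        x.trailing_zeros(),
        x.trailing_ones(),
    )
}

fn main() {
    let a: u8 = 0b1111_0100; // 244
    let b: u8 = 0b0000_0110; // 6

    println!("div:     {} / {} = {} (rem {})", a, b, a / b, a % b);
    println!("max/min: {} / {}", a.max(b), a.min(b));
    println!("shift:   {:08b} << 2 = {:08b}", a, a << 2);
    println!("shift:   {:08b} >> 2 = {:08b}", a, a >> 2);
    println!("rot:     {:08b} rotl 3 = {:08b}", a, a.rotate_left(3));
    println!("counts (lz, lo, tz, to) of {:08b}: {:?}", a, bit_counts(a));
}
```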
