TFHE-rs

Run on GPU

Last updated 11 months ago

TFHE-rs now includes a GPU backend, with a CUDA implementation of integer arithmetic on encrypted data. This tutorial shows how to update an existing program to use GPU acceleration, or how to start a new one that runs on the GPU.

Prerequisites

  • Cuda version >= 10
  • Compute Capability >= 3.0
  • gcc >= 8.0 (check the nvcc documentation for the gcc versions compatible with your nvcc)
  • cmake >= 3.24
  • A supported Rust version (see the Configure Rust page)
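
The minimum versions listed above have to be compared numerically, component by component: a plain string comparison would wrongly rank cmake 3.9 above 3.24. The following is a minimal sketch of such a check. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of TFHE-rs, and obtaining the installed versions (for example from `nvcc --version` or `cmake --version`) is left out.

```rust
/// Parses a dotted version such as "3.24" or "12.2.1" into numeric components.
/// Returns None if any component is not an unsigned integer.
fn parse_version(version: &str) -> Option<Vec<u32>> {
    version
        .trim()
        .split('.')
        .map(|part| part.parse::<u32>().ok())
        .collect()
}

/// Returns Some(true) when `found` is at least `minimum`.
/// Missing trailing components count as 0, so "8" and "8.0" are equal.
fn meets_minimum(found: &str, minimum: &str) -> Option<bool> {
    let mut found = parse_version(found)?;
    let mut minimum = parse_version(minimum)?;
    let len = found.len().max(minimum.len());
    found.resize(len, 0);
    minimum.resize(len, 0);
    // Vec<u32> compares lexicographically, which is the ordering wanted here.
    Some(found >= minimum)
}
```

For example, `meets_minimum("3.26.1", "3.24")` yields `Some(true)`, while `meets_minimum("3.9", "3.24")` yields `Some(false)`.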

Importing to your project

To use the TFHE-rs GPU backend in your project, you first need to add it as a dependency in your Cargo.toml.

If you are using an x86 machine:

tfhe = { version = "0.5.5", features = [ "boolean", "shortint", "integer", "x86_64-unix", "gpu" ] }

If you are using an ARM machine:

tfhe = { version = "0.5.5", features = [ "boolean", "shortint", "integer", "aarch64-unix", "gpu" ] }

When running code that uses TFHE-rs, it is highly recommended to build in release mode, using cargo's --release flag, to get the best possible performance.

Supported platforms

TFHE-rs GPU backend is supported on Linux (x86, aarch64).

| OS      | x86         | aarch64       |
|---------|-------------|---------------|
| Linux   | x86_64-unix | aarch64-unix* |
| macOS   | Unsupported | Unsupported*  |
| Windows | Unsupported | Unsupported   |
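
The platform table can also be read as a lookup from (OS, architecture) to the Cargo feature to enable alongside "gpu". The sketch below encodes only what the table states. The function names are hypothetical and not part of TFHE-rs; the OS and architecture strings are the values exposed by Rust's `std::env::consts`.

```rust
/// Returns the TFHE-rs target feature for platforms where the GPU backend
/// is listed as supported, or None where the table says "Unsupported".
fn gpu_target_feature(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("linux", "x86_64") => Some("x86_64-unix"),
        ("linux", "aarch64") => Some("aarch64-unix"),
        // macOS and Windows are listed as unsupported for the GPU backend.
        _ => None,
    }
}

/// Same lookup for the platform this program was compiled for.
fn current_gpu_target_feature() -> Option<&'static str> {
    gpu_target_feature(std::env::consts::OS, std::env::consts::ARCH)
}
```

A build script or setup tool could call `current_gpu_target_feature()` and report an error early when it returns `None`.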

A first example

Configuring and creating keys.

Here is a full example (combining the client and server parts):

use tfhe::{ConfigBuilder, set_server_key, FheUint8, ClientKey, CompressedServerKey};
use tfhe::prelude::*;

fn main() {

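    // Default parameters; both keys are generated on the client (CPU).
    let config = ConfigBuilder::default().build();

    let client_key = ClientKey::generate(config);
    let compressed_server_key = CompressedServerKey::new(&client_key);

    // Decompress the server key into the format expected by the GPU backend.
    let gpu_key = compressed_server_key.decompress_to_gpu();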

    let clear_a = 27u8;
    let clear_b = 128u8;

    let a = FheUint8::encrypt(clear_a, &client_key);
    let b = FheUint8::encrypt(clear_b, &client_key);

    //Server-side

    set_server_key(gpu_key);
    let result = a + b;

    //Client-side
    let decrypted_result: u8 = result.decrypt(&client_key);

    let clear_result = clear_a + clear_b;

    assert_eq!(decrypted_result, clear_result);
}

Setting the keys

Key configuration differs from the CPU case. Both the client key and the server key are still generated by the client (which is assumed to run on a CPU), but the server must then decompress the server key to convert it into the right format. To do so, the server calls decompress_to_gpu() on the compressed server key. From then on, there is no difference between the CPU and the GPU.

Encrypting data

On the client side, data is encrypted exactly as on the CPU:

    let clear_a = 27u8;
    let clear_b = 128u8;

    let a = FheUint8::encrypt(clear_a, &client_key);
    let b = FheUint8::encrypt(clear_b, &client_key);

Computation.

    //Server-side
    set_server_key(gpu_key);
    let result = a + b;

    //Client-side
    let decrypted_result: u8 = result.decrypt(&client_key);

    let clear_result = clear_a + clear_b;

    assert_eq!(decrypted_result, clear_result);
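
The clear-side check `clear_a + clear_b` works for 27 and 128 because the sum fits in a u8. With larger inputs, a plain `+` on u8 panics in debug builds. The sketch below is a clear reference that stays valid in that case. It assumes that encrypted FheUint8 addition wraps modulo 2^8, which this page does not state, and the helper name `reference_add_u8` is hypothetical.

```rust
/// Clear-side reference for an FheUint8 addition.
/// Assumption: the homomorphic addition wraps modulo 2^8, like u8::wrapping_add.
fn reference_add_u8(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}
```

Under that assumption, the final assertion could compare `decrypted_result` with `reference_add_u8(clear_a, clear_b)` for any pair of inputs.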

Decryption.

Finally, the client gets the decrypted results by computing:

    let decrypted_result: u8 = result.decrypt(&client_key);

Improving performance.

TFHE-rs can leverage the high number of threads offered by a GPU. To do so, update the configuration as follows:

    let config = ConfigBuilder::with_custom_parameters(PARAM_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_3_KS_PBS, None).build();

The complete example becomes:

use tfhe::{ConfigBuilder, set_server_key, FheUint8, ClientKey, CompressedServerKey};
use tfhe::prelude::*;
use tfhe::shortint::parameters::PARAM_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_3_KS_PBS;

fn main() {

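    // Multi-bit PBS parameters, so that the GPU threads are put to use.
    let config = ConfigBuilder::with_custom_parameters(PARAM_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_3_KS_PBS, None).build();

    let client_key = ClientKey::generate(config);
    let compressed_server_key = CompressedServerKey::new(&client_key);

    // Decompress the server key into the format expected by the GPU backend.
    let gpu_key = compressed_server_key.decompress_to_gpu();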

    let clear_a = 27u8;
    let clear_b = 128u8;

    let a = FheUint8::encrypt(clear_a, &client_key);
    let b = FheUint8::encrypt(clear_b, &client_key);

    //Server-side

    set_server_key(gpu_key);
    let result = a + b;

    //Client-side
    let decrypted_result: u8 = result.decrypt(&client_key);

    let clear_result = clear_a + clear_b;

    assert_eq!(decrypted_result, clear_result);
}

In comparison with the CPU example, the only difference lies in the key creation, which is detailed in the "Setting the keys" section. The server must first set up its keys, as on the CPU, by calling set_server_key(gpu_key). Homomorphic computations are then written with the same code as the one described in the CPU documentation.

List of available operations

The GPU backend includes the following operations:

| Name                  | Symbol       | Enc/Enc | Enc/Int |
|-----------------------|--------------|---------|---------|
| Neg                   | -            | ✔️      | N/A     |
| Add                   | +            | ✔️      | ✔️      |
| Sub                   | -            | ✔️      | ✔️      |
| Mul                   | *            | ✔️      | ✔️      |
| Div                   | /            | ✖️      | ✖️      |
| Rem                   | %            | ✖️      | ✖️      |
| Not                   | !            | ✔️      | N/A     |
| BitAnd                | &            | ✔️      | ✔️      |
| BitOr                 | \|           | ✔️      | ✔️      |
| BitXor                | ^            | ✔️      | ✔️      |
| Shr                   | >>           | ✖️      | ✔️      |
| Shl                   | <<           | ✖️      | ✔️      |
| Rotate right          | rotate_right | ✖️      | ✔️      |
| Rotate left           | rotate_left  | ✖️      | ✔️      |
| Min                   | min          | ✔️      | ✔️      |
| Max                   | max          | ✔️      | ✔️      |
| Greater than          | gt           | ✔️      | ✔️      |
| Greater or equal than | ge           | ✔️      | ✔️      |
| Lower than            | lt           | ✔️      | ✔️      |
| Lower or equal than   | le           | ✔️      | ✔️      |
| Equal                 | eq           | ✔️      | ✔️      |
| Cast (into dest type) | cast_into    | ✖️      | N/A     |
| Cast (from src type)  | cast_from    | ✖️      | N/A     |
| Ternary operator      | if_then_else | ✔️      | ✖️      |

All operations follow the same syntax as the one described in the CPU documentation.

Benchmarks

The table below contains benchmarks for homomorphic operations running on a single V100 from AWS (p3.2xlarge machines), with the default parameters:

| Operation \ Size | FheUint8  | FheUint16 | FheUint32 | FheUint64 | FheUint128 | FheUint256 |
|------------------|-----------|-----------|-----------|-----------|------------|------------|
| cuda_add         | 103.33 ms | 129.26 ms | 156.83 ms | 186.99 ms | 320.96 ms  | 528.15 ms  |
| cuda_bitand      | 26.11 ms  | 26.21 ms  | 26.63 ms  | 27.24 ms  | 43.07 ms   | 65.01 ms   |
| cuda_bitor       | 26.1 ms   | 26.21 ms  | 26.57 ms  | 27.23 ms  | 43.05 ms   | 65.0 ms    |
| cuda_bitxor      | 26.08 ms  | 26.21 ms  | 26.57 ms  | 27.25 ms  | 43.06 ms   | 65.07 ms   |
| cuda_eq          | 52.82 ms  | 53.0 ms   | 79.4 ms   | 79.58 ms  | 96.37 ms   | 145.25 ms  |
| cuda_ge          | 104.7 ms  | 130.23 ms | 156.19 ms | 183.2 ms  | 213.43 ms  | 288.76 ms  |
| cuda_gt          | 104.93 ms | 130.2 ms  | 156.33 ms | 183.38 ms | 213.47 ms  | 288.8 ms   |
| cuda_le          | 105.14 ms | 130.47 ms | 156.48 ms | 183.44 ms | 213.33 ms  | 288.75 ms  |
| cuda_lt          | 104.73 ms | 130.23 ms | 156.2 ms  | 183.14 ms | 213.33 ms  | 288.74 ms  |
| cuda_max         | 156.7 ms  | 182.65 ms | 210.74 ms | 251.78 ms | 316.9 ms   | 442.71 ms  |
| cuda_min         | 156.85 ms | 182.67 ms | 210.39 ms | 252.02 ms | 316.96 ms  | 442.95 ms  |
| cuda_mul         | 219.73 ms | 302.11 ms | 465.91 ms | 955.66 ms | 2.71 s     | 9.15 s     |
| cuda_ne          | 52.72 ms  | 52.91 ms  | 79.28 ms  | 79.59 ms  | 96.37 ms   | 145.36 ms  |
| cuda_neg         | 103.26 ms | 129.4 ms  | 157.19 ms | 187.09 ms | 321.27 ms  | 530.11 ms  |
| cuda_sub         | 103.34 ms | 129.42 ms | 156.87 ms | 187.01 ms | 321.04 ms  | 528.13 ms  |
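
The Benchmarks table mixes two units (ms and s), so comparing cells takes a small conversion first. The following sketch parses a cell into milliseconds and computes how much slower one operation is than another. The helper names `parse_latency_ms` and `slowdown` are hypothetical and not part of TFHE-rs or its benchmark tooling.

```rust
/// Parses a latency cell such as "103.33 ms" or "2.71 s" into milliseconds.
/// Returns None for anything that is not "<number> ms" or "<number> s".
fn parse_latency_ms(cell: &str) -> Option<f64> {
    let mut parts = cell.split_whitespace();
    let value: f64 = parts.next()?.parse().ok()?;
    let unit = parts.next()?;
    if parts.next().is_some() {
        return None; // trailing tokens: not a plain latency cell
    }
    match unit {
        "ms" => Some(value),
        "s" => Some(value * 1000.0),
        _ => None,
    }
}

/// How many times slower `slow` is than `fast`, both given as table cells.
fn slowdown(slow: &str, fast: &str) -> Option<f64> {
    Some(parse_latency_ms(slow)? / parse_latency_ms(fast)?)
}
```

For instance, `slowdown("955.66 ms", "186.99 ms")` is roughly 5.1: with the figures reported in the table, cuda_mul on FheUint64 costs about five times as much as cuda_add on the same type.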