To use TFHE-rs in your project, you first need to add it as a dependency in your Cargo.toml.
If you are using an x86 machine:
If you are using an ARM machine:
You need to use a Rust version >= 1.72 to compile TFHE-rs.
When running code that uses TFHE-rs, it is highly recommended to run in release mode with cargo's --release flag to have the best possible performance.
TFHE-rs is supported on Linux (x86, aarch64), macOS (x86, aarch64) and Windows (x86 with RDSEED instruction).
OS | x86 | aarch64 |
---|---|---|
Linux | x86_64-unix | aarch64-unix* |
macOS | x86_64-unix | aarch64-unix* |
Windows | x86_64 | Unsupported |
The goal of this tutorial is to build a data type that represents an ASCII string in FHE while implementing the to_lower and to_upper functions.
An ASCII character is stored in 7 bits. To store an encrypted ASCII character, we use the FheUint8 type.
The uppercase letters are in the range [65, 90]
The lowercase letters are in the range [97, 122]
lower_case = upper_case + UP_LOW_DISTANCE
<=> upper_case = lower_case - UP_LOW_DISTANCE
Where UP_LOW_DISTANCE = 32
This type will hold the encrypted characters as a Vec<FheUint8> to implement the functions that change the case.
To use the FheUint8 type, the integer feature must be activated:
Other configurations can be found here.
In the FheAsciiString::encrypt function, some data validation is done:
The input string can only contain ASCII characters.
It is not possible to branch on an encrypted value; however, it is possible to evaluate a Boolean condition and use its result to get the desired value. Checking if the 'char' is an uppercase letter to modify it to a lowercase can be done without using a branch, like this:
We can remove the branch this way:
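As a minimal cleartext sketch of the branch-free form (plain u8 values, no encryption involved), the comparison results can be turned into 0/1 values and multiplied into the offset. The function bodies below are an illustration written for this page and only assume the ASCII ranges and the UP_LOW_DISTANCE constant given above:

```rust
/// Distance between an uppercase ASCII letter and its lowercase form.
pub const UP_LOW_DISTANCE: u8 = 32;

/// Branch-free lowercase conversion on a clear byte.
/// `is_upper` is 1 when `c` is in [65, 90] and 0 otherwise, so the
/// same arithmetic runs whatever the input is.
pub fn to_lower(c: u8) -> u8 {
    let is_upper = ((c >= 65) as u8) & ((c <= 90) as u8);
    c + is_upper * UP_LOW_DISTANCE
}

/// Branch-free uppercase conversion on a clear byte.
/// `is_lower` is 1 when `c` is in [97, 122] and 0 otherwise.
pub fn to_upper(c: u8) -> u8 {
    let is_lower = ((c >= 97) as u8) & ((c <= 122) as u8);
    c - is_lower * UP_LOW_DISTANCE
}
```

Calling to_lower(b'A') gives b'a', while characters outside the letter ranges are returned unchanged. The homomorphic version follows the same shape, with the comparisons and arithmetic applied to encrypted values.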
On a homomorphic integer, this gives:
The whole code is:
Due to their nature, homomorphic operations are naturally slower than their cleartext equivalents. Some timings are exposed for basic operations. For completeness, benchmarks for other libraries are also given.
All benchmarks were launched on an AWS hpc7a.96xlarge instance with the following specifications: AMD EPYC 9R14 CPU @ 2.60GHz and 740GB of RAM.
This measures the execution time for some operation sets of tfhe-rs::integer (the unsigned version). Note that the timings for FheInt (i.e., the signed integers) are similar.
The table below reports the timing when the inputs of the benchmarked operation are encrypted.
The table below reports the timing when the left input of the benchmarked operation is encrypted and the other is a clear scalar of the same size.
All timings are related to parallelized Radix-based integer operations, where each block is encrypted using the default parameters (i.e., PARAM_MESSAGE_2_CARRY_2_KS_PBS, more information about parameters can be found here). To ensure predictable timings, the operation flavor is the default one: the carry is propagated if needed. The operation costs may be reduced by using unchecked, checked, or smart.
This measures the execution time for some operations using various parameter sets of tfhe-rs::shortint. Except for unchecked_add, all timings are related to the default operations. This flavor ensures predictable timings for an operation along the entire circuit by clearing the carry space after each operation.
This uses the Concrete FFT + AVX-512 configuration.
This measures the execution time of a single binary Boolean gate.
Using the same hpc7a.96xlarge machine as the one for tfhe-rs, the timings are:
Following the official instructions from OpenFHE, clang14 and the following command are used to set up the project: cmake -DNATIVE_SIZE=32 -DWITH_NATIVEOPT=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DWITH_OPENMP=OFF ..
To use the HEXL library, the configuration used is as follows:
Using the same hpc7a.96xlarge machine as the one for tfhe-rs, the timings are:
TFHE-rs benchmarks can be easily reproduced from source.
If the host machine does not support AVX512, then turning on AVX512_SUPPORT will not provide any speed-up.
TFHE-rs is a pure Rust implementation of TFHE for Boolean and integer arithmetic over encrypted data. It includes a Rust and C API, as well as a client-side WASM API.
TFHE-rs is meant for developers and researchers who want full control over what they can do with TFHE, while not worrying about the low-level implementation.
The goal is to have a stable, simple, high-performance, and production-ready library for all the advanced features of TFHE.
The TFHE-rs library implements Zama’s variant of Fully Homomorphic Encryption over the Torus (TFHE). TFHE is based on Learning With Errors (LWE), a well-studied cryptographic primitive believed to be secure even against quantum computers.
In cryptography, a raw value is called a message (also sometimes called a cleartext), while an encoded message is called a plaintext and an encrypted plaintext is called a ciphertext.
Zama's variant of TFHE is fully homomorphic and deals with fixed-precision numbers as messages. It implements all needed homomorphic operations, such as addition and function evaluation via Programmable Bootstrapping. You can read more about Zama's TFHE variant in the preliminary whitepaper.
Using FHE in a Rust program with TFHE-rs consists of:
generating a client key and a server key using secure parameters:
a client key encrypts/decrypts data and must be kept secret
a server key is used to perform operations on encrypted data and could be public (also called an evaluation key)
encrypting plaintexts using the client key to produce ciphertexts
operating homomorphically on ciphertexts with the server key
decrypting the resulting ciphertexts into plaintexts using the client key
If you would like to know more about the problems that FHE solves, we suggest you review our 6 minute introduction to homomorphic encryption.
TFHE-rs includes a list of specific operations to detect overflows. The overall idea is to have a specific ciphertext encrypting a flag that reflects the status of the computation. When an overflow occurs, this flag is set to true. Since the flag is encrypted, the server cannot evaluate it; the client has to check the flag value after decryption to determine whether an overflow has happened. These operations might be slower than their equivalents that do not detect overflow, so they are not enabled by default (see the table below). To use them, specific operators must be called. At the moment, only additions, subtractions, and multiplications are supported. Missing operations will be added soon.
The list of operations along with their symbol is:
name | symbol | type |
---|
These operations are then used in exactly the same way as the usual ones. The only difference lies in the decryption, as shown in the following example:
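As a cleartext analogy only (no TFHE-rs types are used here), the Rust standard library offers the same "result plus flag" shape through u8::overflowing_add. The helper names add_with_flag and checked_result are hypothetical and exist only for this sketch; on encrypted data both the value and the flag would be ciphertexts that the client decrypts.

```rust
/// Cleartext stand-in for an overflow-detecting addition: the result wraps
/// modulo 2^8 and a separate flag records whether the true sum did not fit.
pub fn add_with_flag(a: u8, b: u8) -> (u8, bool) {
    a.overflowing_add(b)
}

/// What the client would do after decrypting both outputs:
/// trust the value only when the overflow flag is not set.
pub fn checked_result(value: u8, overflowed: bool) -> Option<u8> {
    if overflowed {
        None
    } else {
        Some(value)
    }
}
```

For instance, add_with_flag(200, 100) yields the wrapped value 44 together with a flag set to true, so checked_result reports that the value should not be trusted.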
The current benchmarks are given in the following tables (the first one for unsigned homomorphic integers and the second one for the signed integers):
TFHE-rs now includes a GPU backend, featuring a CUDA implementation for performing integer arithmetic on encrypted data. The short tutorial that follows shows how to update an existing program to use GPU acceleration, or how to start a new one using the GPU.
Cuda version >= 10
Compute Capability >= 3.0
gcc >= 8.0 - check this for more details about nvcc/gcc compatible versions
cmake >= 3.24
Rust version - check this
To use the TFHE-rs GPU backend in your project, you first need to add it as a dependency in your Cargo.toml.
If you are using an x86 machine:
If you are using an ARM machine:
When running code that uses TFHE-rs, it is highly recommended to run in release mode with cargo's --release flag to have the best possible performance.
TFHE-rs GPU backend is supported on Linux (x86, aarch64).
Here is a full example (combining the client and server parts):
The configuration of the key is different from the CPU. More precisely, while both client and server keys are still generated by the client (which is assumed to run on a CPU), the server key then has to be decompressed by the server to be converted into the right format. To do so, the server should run this function: decompress_to_gpu(). From then on, there is no difference between the CPU and the GPU.
On the client side, the method to encrypt the data is exactly the same as the CPU one, i.e.:
Finally, the client gets the decrypted results by computing:
TFHE-rs includes the possibility to leverage the high number of threads given by a GPU. To do so, the configuration should be updated with:
let config = ConfigBuilder::with_custom_parameters(PARAM_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_3_KS_PBS, None).build();
The complete example becomes:
The GPU backend includes the following operations:
The tables below contain benchmarks for homomorphic operations running on a single V100 from AWS (p3.2xlarge machines), with the default parameters:
The basic steps for using the high-level API of TFHE-rs are:
Importing the TFHE-rs prelude;
Client-side: Configuring and creating keys;
Client-side: Encrypting data;
Server-side: Setting the server key;
Server-side: Computing over encrypted data;
Client-side: Decrypting data.
Here is a full example (combining the client and server parts):
The default configuration for x86 Unix machines:
tfhe uses traits to have a consistent API for creating FHE types and enable users to write generic functions. To be able to use associated functions and methods of a trait, the trait has to be in scope.
To make it easier, the prelude 'pattern' is used. All of the important tfhe traits are in a prelude module that you can glob import. With this, there is no need to remember or know the traits that you want to import.
The first step is the creation of the configuration. The configuration is used to declare which type you will (or will not) use, as well as enabling you to use custom crypto-parameters for these types. Custom parameters should only be used for more advanced usage and/or testing.
A configuration can be created by using the ConfigBuilder type.
The config is generated by first creating a builder with all types deactivated. Then, the integer types with default parameters are activated, since we are going to use FheUint8 values.
The generate_keys command returns a client key and a server key.
The client_key is meant to stay private and not leave the client, whereas the server_key can be made public and sent to a server to enable FHE computations.
The next step is to call set_server_key.
This function will move the server key to an internal state of the crate and manage the details to give a simpler interface.
Encrypting data is achieved via the encrypt associated function of the FheEncrypt trait.
Types exposed by this crate implement at least one of FheEncrypt or FheTryEncrypt to allow encryption.
Computations should be as easy as normal Rust to write, thanks to the usage of operator overloading.
The decryption is achieved by using the decrypt method, which comes from the FheDecrypt trait.
This example is dedicated to the building of a small function that homomorphically computes a parity bit.
First, a non-generic function is written. Then, generics are used to handle the case where the function inputs are both FheBools and clear bools.
The parity bit function takes as input two parameters:
A slice of Booleans
A mode (Odd or Even)
This function returns a Boolean that will be either true or false so that the sum of Booleans (in the input and the returned one) is either an Odd or Even number, depending on the requested mode.
Other configurations can be found here.
First, the verification function is defined.
The way to find the parity bit is to initialize it to false, then XOR it with all the bits, one after the other, adding negation depending on the requested mode.
A validation function is also defined: it adds the number of bits set in the input to the computed parity bit and checks that the sum is an even or odd number, depending on the mode.
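The two functions described above can be sketched on clear bool values as follows. This is a plain-Rust illustration, not the FheBool version; the name check_parity_bit_validity is chosen for this sketch.

```rust
/// Requested parity of the total number of set bits (input bits plus parity bit).
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum ParityMode {
    Odd,
    Even,
}

/// Starts from `false`, XORs every input bit in, then negates for `Odd`.
pub fn compute_parity_bit(bits: &[bool], mode: ParityMode) -> bool {
    let mut parity_bit = false;
    for &bit in bits {
        parity_bit ^= bit;
    }
    match mode {
        ParityMode::Odd => !parity_bit,
        ParityMode::Even => parity_bit,
    }
}

/// Counts the set bits (including the parity bit) and checks the requested parity.
pub fn check_parity_bit_validity(bits: &[bool], mode: ParityMode, parity_bit: bool) -> bool {
    let num_bits_set = bits.iter().filter(|&&b| b).count() + parity_bit as usize;
    match mode {
        ParityMode::Even => num_bits_set % 2 == 0,
        ParityMode::Odd => num_bits_set % 2 == 1,
    }
}
```

With the input [false, true, false, true, true] (three bits set), the Even mode produces a parity bit equal to true and the Odd mode produces false; both pass the validation function.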
After the mandatory configuration steps, the function is called:
To make the compute_parity_bit function compatible with both FheBool and bool, generics have to be used.
Writing a generic function that accepts FHE types as well as clear types can help test the function to see if it is correct. If the function is generic, it can run with clear data, allowing the use of print-debugging or a debugger to spot errors.
Writing generic functions that use operator overloading for our FHE types can be trickier than normal, since FHE types are not Copy. So using the reference & is mandatory, even though this is not the case when using native types, which are all Copy.
This will make the generic bounds trickier at first.
The function has the following signature:
To make it generic, the first step is:
Next, the generic bounds have to be defined with the where clause.
In the function, the following operators are used:
! (trait: Not)
^ (trait: BitXor)
By adding them to where, this gives:
However, the compiler will complain:
fhe_bit is a reference to a BoolType (&BoolType) since it is borrowed from the fhe_bits slice when iterating over its elements. The first try is to change the BitXor bounds to what the compiler suggests by requiring &BoolType to implement BitXor and not BoolType.
The compiler is still not happy:
The way to fix this is to use Higher-Rank Trait Bounds:
The final code will look like this:
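A sketch of such a generic function, exercised here on clear bool values only, could look like the following. The bounds use the Not and BitXor traits named above together with a higher-rank bound on &BoolType; the Clone bound and the choice to start from the first element are assumptions of this sketch, made because a generic type cannot be built from the literal false.

```rust
use std::ops::{BitXor, Not};

#[derive(Copy, Clone, Debug, PartialEq)]
pub enum ParityMode {
    Odd,
    Even,
}

/// Generic parity bit: works for any type whose references can be XORed
/// and whose values can be negated. Panics on an empty slice (sketch only).
pub fn compute_parity_bit<BoolType>(fhe_bits: &[BoolType], mode: ParityMode) -> BoolType
where
    BoolType: Clone + Not<Output = BoolType>,
    // Higher-rank bound: `&BoolType ^ &BoolType` must work for every lifetime.
    for<'a> &'a BoolType: BitXor<&'a BoolType, Output = BoolType>,
{
    let mut parity_bit = fhe_bits[0].clone();
    for fhe_bit in &fhe_bits[1..] {
        parity_bit = fhe_bit ^ &parity_bit;
    }
    match mode {
        ParityMode::Odd => !parity_bit,
        ParityMode::Even => parity_bit,
    }
}
```

Because bool implements both traits on references, compute_parity_bit(&[true, false, true, true], ParityMode::Even) runs on clear data and returns true, which makes the logic easy to check before switching to encrypted inputs.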
Here is a complete example that uses this function for both clear and FHE values:
TFHE-rs is a cryptographic library dedicated to Fully Homomorphic Encryption. As its name suggests, it is based on the TFHE scheme.
It is necessary to understand some basics about TFHE in order to consider the limitations. Of particular importance are the precision (number of bits used to represent plaintext values) and execution time (why TFHE operations are slower than native operations).
Although there are many kinds of ciphertexts in TFHE, all of the encrypted values in TFHE-rs are mainly stored as LWE ciphertexts.
The security of TFHE relies on the LWE problem, which stands for Learning With Errors. The problem is believed to be secure against quantum attacks.
An LWE Ciphertext is a collection of 32-bit or 64-bit unsigned integers. Before encrypting a message in an LWE ciphertext, one must first encode it as a plaintext. This is done by shifting the message to the most significant bits of the unsigned integer type used.
Then, a small random value called noise is added to the least significant bits. This noise is crucial in ensuring the security of the ciphertext.
To go from a plaintext to a ciphertext, one must encrypt the plaintext using a secret key.
An LWE ciphertext is composed of two parts:
The mask of a fresh ciphertext (one that is the result of an encryption, and not of an operation such as ciphertext addition) is a list of n uniformly random values.
The body is computed as follows:
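In standard LWE notation, and assuming the secret key is written $S = (s_0, \dots, s_{n-1})$ and the mask $(a_0, \dots, a_{n-1})$ (symbols chosen here for illustration), the body of a fresh ciphertext can be written as:

$$b = \left(\sum_{i=0}^{n-1} a_i \cdot s_i\right) + \text{plaintext}$$

where the plaintext is the encoded message with its noise already added, and all arithmetic is done modulo $2^{32}$ or $2^{64}$ depending on the integer type used.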
Now that the encryption scheme is defined, let's review the example of the addition between ciphertexts to illustrate why it is slower to compute over encrypted data.
To add two ciphertexts, we must add their $mask$ and $body$:
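The following toy model illustrates encoding, encryption, and ciphertext addition over 64-bit integers. It is a sketch for intuition only and is not secure: the dimension N, the 4-bit message size, the binary key, and the small linear congruential generator standing in for a cryptographic random source are all assumptions of this example, not TFHE-rs parameters or APIs.

```rust
const N: usize = 8; // toy LWE dimension
const MESSAGE_BITS: u32 = 4; // toy message precision
const SHIFT: u32 = 64 - MESSAGE_BITS; // message goes to the most significant bits

/// Tiny deterministic generator (NOT cryptographically secure).
pub struct Lcg(pub u64);

impl Lcg {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        self.0
    }
}

pub struct Ciphertext {
    pub mask: [u64; N],
    pub body: u64,
}

/// Secret key: N random bits.
pub fn keygen(rng: &mut Lcg) -> [u64; N] {
    let mut key = [0u64; N];
    for s in key.iter_mut() {
        *s = rng.next_u64() >> 63;
    }
    key
}

fn dot(mask: &[u64; N], key: &[u64; N]) -> u64 {
    mask.iter().zip(key).fold(0u64, |acc, (a, s)| acc.wrapping_add(a.wrapping_mul(*s)))
}

pub fn encrypt(message: u64, key: &[u64; N], rng: &mut Lcg) -> Ciphertext {
    let noise = rng.next_u64() >> 56; // small noise in the least significant bits
    let plaintext = (message << SHIFT).wrapping_add(noise);
    let mut mask = [0u64; N];
    for a in mask.iter_mut() {
        *a = rng.next_u64();
    }
    let body = dot(&mask, key).wrapping_add(plaintext);
    Ciphertext { mask, body }
}

pub fn decrypt(ct: &Ciphertext, key: &[u64; N]) -> u64 {
    let noisy_plaintext = ct.body.wrapping_sub(dot(&ct.mask, key));
    // Round away the noise, then keep the message bits.
    noisy_plaintext.wrapping_add(1 << (SHIFT - 1)) >> SHIFT
}

/// Homomorphic addition: add the masks element-wise and add the bodies.
pub fn add_ciphertexts(x: &Ciphertext, y: &Ciphertext) -> Ciphertext {
    let mut mask = [0u64; N];
    for i in 0..N {
        mask[i] = x.mask[i].wrapping_add(y.mask[i]);
    }
    Ciphertext { mask, body: x.body.wrapping_add(y.body) }
}
```

Encrypting 3 and 4 under the same key and decrypting the result of add_ciphertexts gives 7. Note that a single clear addition has become N + 1 wrapping additions, and that the noise of the two inputs adds up in the result.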
In FHE, there are two types of operations that can be applied to ciphertexts:
leveled operations, which increase the noise in the ciphertext
bootstrapped operations, which reduce the noise in the ciphertext
In FHE, noise must be tracked and managed to guarantee the correctness of the computation.
Bootstrapping operations are used across the computation to decrease noise within the ciphertexts, preventing it from tampering with the message. The rest of the operations are called leveled because they do not need bootstrapping operations and are usually very fast as a result.
The following sections explain the concept of noise and padding in ciphertexts.
For it to be secure, LWE requires random noise to be added to the message at encryption time.
In TFHE, this random noise is drawn from a Centered Normal Distribution, parameterized by a standard deviation. The chosen standard deviation has an impact on the security level. With everything else fixed, increasing the standard deviation will lead to an increase in the security level.
In TFHE-rs, noise is encoded in the least significant bits of each plaintext. Each leveled computation increases the value of the noise. If too many computations are performed, the noise will eventually overflow into the message bits and lead to an incorrect result.
The figure below illustrates this problem in the case of an addition, where an extra bit of noise is incurred as a result.
TFHE-rs offers the ability to automatically manage noise by performing bootstrapping operations to reset the noise.
Since encoded values have a fixed precision, operating on them can produce results that are outside of the original interval. To avoid losing precision or wrapping around the interval, TFHE-rs uses additional bits by defining bits of padding on the most significant bits.
As an example, consider adding two ciphertexts. Adding two values could end up outside the range of either ciphertext, and thus necessitate a carry, which would then be carried onto the first padding bit. In the figure below, each plaintext over 32 bits has one bit of padding on its left (i.e., the most significant bit). After the addition, the padding bit is no longer available, as it has been used for the carry. This is referred to as consuming bits of padding. Since no padding is left, there is no guarantee that further additions would yield correct results.
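The effect of consuming the padding bit can be reproduced on clear 32-bit plaintexts. The layout below (one padding bit followed by three message bits) is an assumption made for this sketch and is not the encoding TFHE-rs actually uses.

```rust
const MESSAGE_BITS: u32 = 3; // illustrative message precision
const SHIFT: u32 = 32 - 1 - MESSAGE_BITS; // message sits just below the padding bit
const MESSAGE_MASK: u32 = (1 << MESSAGE_BITS) - 1;

/// Places the message right below the single padding bit.
pub fn encode(message: u32) -> u32 {
    (message & MESSAGE_MASK) << SHIFT
}

/// Reads the most significant bit, used here as the bit of padding.
pub fn padding_bit(plaintext: u32) -> u32 {
    plaintext >> 31
}

/// Reads back the three message bits.
pub fn message_bits(plaintext: u32) -> u32 {
    (plaintext >> SHIFT) & MESSAGE_MASK
}
```

Adding encode(5) and encode(6) gives a plaintext whose padding bit is 1 and whose message bits hold 3: the carry of 5 + 6 = 11 has consumed the padding. Adding that sum to itself with wrapping_add then loses information, since 22 no longer fits and only 6 remains in the message bits.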
TFHE-rs includes two main types to represent encrypted data:
FheUint: this is the homomorphic equivalent of Rust unsigned integers u8, u16, ...
FheInt: this is the homomorphic equivalent of Rust (signed) integers i8, i16, ...
In the same manner as many programming languages, the number of bits used to represent the data must be chosen when declaring a variable. For instance:
The table below contains an overview of the available operations in TFHE-rs. The notation Enc (for Encrypted) either refers to FheInt or FheUint, for any size between 1 and 256 bits.
More details, and further examples, are given in the following sections.
In TFHE-rs, integers are used to encrypt all messages which are larger than 4 bits. All supported operations are listed below.
Homomorphic integer types support arithmetic operations.
The list of supported operations is:
A simple example of how to use these operations:
Homomorphic integer types support some bitwise operations.
The list of supported operations is:
A simple example of how to use these operations:
Homomorphic integers support comparison operations.
Due to some Rust limitations, it is not possible to overload the comparison symbols because of the inner definition of the operations. This is because Rust expects to have a Boolean as an output, whereas a ciphertext is returned when using homomorphic types.
You will need to use different methods instead of using symbols for the comparisons. These methods follow the same naming conventions as the two standard Rust traits:
The list of supported operations is:
A simple example of how to use these operations:
Homomorphic integers support the min/max operations.
A simple example of how to use these operations:
The ternary conditional operator allows computing conditional instructions of the form if cond { choice_if } else { choice_else }.
The syntax is encrypted_condition.if_then_else(encrypted_choice_if, encrypted_choice_else). The encrypted_condition should be an encryption of 0 or 1 in order to be valid.
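To see why the condition has to be 0 or 1, here is one arithmetic way to express a selection without branching, written on clear u8 values. It is a cleartext sketch of the idea only; how TFHE-rs evaluates if_then_else internally may differ.

```rust
/// Branch-free selection on clear values.
/// `cond` must be 0 or 1: it multiplies one choice by 1 and the other by 0.
pub fn if_then_else(cond: u8, choice_if: u8, choice_else: u8) -> u8 {
    debug_assert!(cond <= 1, "the condition must be 0 or 1");
    cond * choice_if + (1 - cond) * choice_else
}
```

if_then_else(1, 10, 20) returns 10 and if_then_else(0, 10, 20) returns 20. Any other value of cond would mix the two choices instead of selecting one of them, which is why the encrypted condition is required to encrypt 0 or 1.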
Casting between integer types is possible via the cast_from associated function or the cast_into method.
Native homomorphic Booleans support common Boolean operations.
The list of supported operations is:
TFHE-rs only requires a nightly toolchain for building the C API and using advanced SIMD instructions; otherwise, you can use a stable toolchain (with version >= 1.72).
Install the needed Rust toolchain:
Then, you can either:
Manually specify the toolchain to use in each of the cargo commands:
Or override the toolchain to use for the current project:
To check the toolchain that Cargo will use by default, you can use the following command:
TFHE-rs exposes different cargo features to customize the types and features used.
This crate exposes two kinds of data types. Each kind is enabled by activating its corresponding feature in the TOML line. Each kind may have multiple types:
Parameter set | PARAM_MESSAGE_1_CARRY_1 | PARAM_MESSAGE_2_CARRY_2 | PARAM_MESSAGE_3_CARRY_3 | PARAM_MESSAGE_4_CARRY_4 |
---|---|---|---|---|
Parameter set | Concrete FFT + AVX-512 |
---|---|
Parameter set | spqlios-fma |
---|---|
Parameter set | GINX | GINX w/ Intel HEXL |
---|---|---|
The idea of homomorphic encryption is that you can compute on ciphertexts while not knowing the messages encrypted within them. A scheme is said to be fully homomorphic, meaning any program can be evaluated with it, if at least two of the following operations are supported ($x$ is a plaintext and $E[x]$ is the corresponding ciphertext):
homomorphic univariate function evaluation: $f(E[x]) = E[f(x)]$
homomorphic addition: $E[x] + E[y] = E[x + y]$
homomorphic multiplication: $E[x] * E[y] = E[x * y]$
Operation\Size | FheUint8 | FheUint16 | FheUint32 | FheUint64 | FheUint128 | FheUint256 |
---|
Operation\Size | FheInt8 | FheInt16 | FheInt32 | FheInt64 | FheInt128 | FheInt256 |
---|
OS | x86 | aarch64 |
---|
In comparison with the CPU example, the only difference lies in the key creation, which is detailed below.
The server must first set its keys up, as on the CPU, with set_server_key(gpu_key). Then, homomorphic computations are done with the same code as the one described for the CPU.
All operations follow the same syntax as the one described for the CPU.
Operation \ Size | FheUint8 | FheUint16 | FheUint32 | FheUint64 | FheUint128 | FheUint256 |
---|
Configuration options for different platforms can be found here. Other Rust and homomorphic types features can be found here.
In this example, 8-bit unsigned integers with default parameters are used. The integer feature must also be enabled, as per the table on this page.
An LWE secret key is a list of n random integers: $S = (s_0, \dots, s_{n-1})$. $n$ is called the LWE dimension.
The mask $(a_0, \dots, a_{n-1})$
The body $b$
To add ciphertexts, it is sufficient to add their masks and bodies. Instead of just adding two integers, one needs to add $n + 1$ elements. This is an intuitive example to show the slowdown of FHE computation compared to plaintext computation, but other operations are far more expensive (e.g., the computation of a lookup table using Programmable Bootstrapping).
The bootstrapping of TFHE has the particularity of being programmable: this means that any function can be homomorphically computed over an encrypted input, while also reducing the noise. These functions are represented by look-up tables. The computation of a PBS is in general either preceded or followed by a keyswitch, which is an operation used to change the encryption key. The output ciphertext is then encrypted with the same key as the input one. To do this, two (public) evaluation keys are required: a bootstrapping key and a keyswitching key. These operations are quite complex to describe; more information about them (or about TFHE in general) can be found here.
By default, the cryptographic parameters provided by TFHE-rs ensure at least 128 bits of security. The security has been evaluated using the latest versions of the Lattice Estimator with red_cost_model = reduction.RC.BDGL16.
For all sets of parameters, the error probability when computing a univariate function over one ciphertext is . Note that univariate functions might be performed when arithmetic functions are computed (i.e., the multiplication of two ciphertexts).
In classical public key encryption, the public key contains a given number of ciphertexts all encrypting the value 0. The number of encryptions of 0 in the public key is set to $m = \lceil (n+1) \log(q) \rceil + \lambda$, where $n$ is the LWE dimension, $q$ is the ciphertext modulus, and $\lambda$ is the number of security bits. This construction is secure due to the leftover hash lemma, which relates to the impossibility of breaking the underlying multiple subset sum problem. This guarantees both a high-density subset sum and an exponentially large number of possible associated random vectors per LWE sample $(a, b)$.
name | symbol | type |
---|
For division by 0, the convention is to return modulus - 1. For instance, for FheUint8, the modulus is $2^8 = 256$, so a division by 0 will return an encryption of 255. For the remainder operator, the convention is to return the first input without any modification. For instance, if ct1 = FheUint8(63) and ct2 = FheUint8(0), then ct1 % ct2 will return FheUint8(63).
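The two conventions can be mirrored on clear u8 values as a quick reference. The function names div_convention and rem_convention are hypothetical and only model the documented behavior; they are not part of TFHE-rs.

```rust
/// Clear model of the division convention: dividing by 0 returns modulus - 1.
pub fn div_convention(a: u8, b: u8) -> u8 {
    a.checked_div(b).unwrap_or(u8::MAX)
}

/// Clear model of the remainder convention: a zero divisor returns the first input.
pub fn rem_convention(a: u8, b: u8) -> u8 {
    a.checked_rem(b).unwrap_or(a)
}
```

With these definitions, div_convention(63, 0) is 255 and rem_convention(63, 0) is 63, matching the FheUint8 example above, while non-zero divisors behave like the usual / and % operators.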
name | symbol | type |
---|
name | symbol | type |
---|
name | symbol | type |
---|
name | symbol | type |
---|
name | symbol | type |
---|
Kind | Features | Type(s) |
---|
In general, the library automatically chooses the best instruction sets available on the host. However, in the case of 'AVX-512', this has to be explicitly chosen as a feature. This requires using a nightly toolchain along with the feature nightly-avx512.
Operation \ Size | FheUint8 | FheUint16 | FheUint32 | FheUint64 | FheUint128 | FheUint256 |
---|---|---|---|---|---|---|
Negation (-) | 55.4 ms | 79.7 ms | 105 ms | 133 ms | 163 ms | 199 ms |
Add / Sub (+, -) | 58.9 ms | 86.0 ms | 106 ms | 124 ms | 151 ms | 193 ms |
Mul (x) | 122 ms | 164 ms | 227 ms | 410 ms | 1.04 s | 3.41 s |
Equal / Not Equal (eq, ne) | 32.0 ms | 32.0 ms | 50.4 ms | 50.9 ms | 53.1 ms | 54.6 ms |
Comparisons (ge, gt, le, lt) | 43.7 ms | 65.2 ms | 84.3 ms | 107 ms | 132 ms | 159 ms |
Max / Min (max, min) | 68.4 ms | 86.8 ms | 106 ms | 132 ms | 160 ms | 200 ms |
Bitwise operations (&, \|, ^) | 17.1 ms | 17.3 ms | 17.8 ms | 18.8 ms | 20.2 ms | 22.2 ms |
Div / Rem (/, %) | 631 ms | 1.59 s | 3.77 s | 8.64 s | 20.3 s | 53.4 s |
Left / Right Shifts (<<, >>) | 82.8 ms | 99.2 ms | 121 ms | 149 ms | 194 ms | 401 ms |
Left / Right Rotations (left_rotate, right_rotate) | 82.1 ms | 99.4 ms | 120 ms | 149 ms | 194 ms | 402 ms |
Operation \ Size | FheUint8 | FheUint16 | FheUint32 | FheUint64 | FheUint128 | FheUint256 |
---|---|---|---|---|---|---|
Add / Sub (+, -) | 68.3 ms | 82.4 ms | 102 ms | 122 ms | 151 ms | 191 ms |
Mul (x) | 93.7 ms | 139 ms | 178 ms | 242 ms | 516 ms | 1.02 s |
Equal / Not Equal (eq, ne) | 30.2 ms | 30.8 ms | 32.7 ms | 50.4 ms | 51.2 ms | 54.8 ms |
Comparisons (ge, gt, le, lt) | 47.3 ms | 69.9 ms | 96.3 ms | 102 ms | 138 ms | 141 ms |
Max / Min (max, min) | 75.4 ms | 99.7 ms | 120 ms | 126 ms | 150 ms | 186 ms |
Bitwise operations (&, \|, ^) | 17.1 ms | 17.4 ms | 18.2 ms | 19.2 ms | 19.7 ms | 22.6 ms |
Div (/) | 160 ms | 212 ms | 272 ms | 402 ms | 796 ms | 2.27 s |
Rem (%) | 315 ms | 428 ms | 556 ms | 767 ms | 1.27 s | 2.86 s |
Left / Right Shifts (<<, >>) | 16.8 ms | 16.8 ms | 17.3 ms | 18.0 ms | 18.9 ms | 22.6 ms |
Left / Right Rotations (left_rotate, right_rotate) | 16.8 ms | 16.9 ms | 17.3 ms | 18.3 ms | 19.0 ms | 22.8 ms |
unchecked_add | 341 ns | 555 ns | 2.47 µs | 9.77 µs |
add | 5.96 ms | 12.6 ms | 102 ms | 508 ms |
mul_lsb | 5.99 ms | 12.3 ms | 101 ms | 500 ms |
keyswitch_programmable_bootstrap | 6.40 ms | 12.9 ms | 104 ms | 489 ms |
DEFAULT_PARAMETERS_KS_PBS | 8.49 ms |
PARAMETERS_ERROR_PROB_2_POW_MINUS_165_KS_PBS | 13.7 ms |
TFHE_LIB_PARAMETERS | 9.90 ms |
default_128bit_gate_bootstrapping_parameters | 13.5 ms |
FHEW_BINGATE/STD128_OR | 25.5 ms | 21.6 ms |
FHEW_BINGATE/STD128_LMKCDEY_OR | 25.4 ms | 19.9 ms |
unsigned_overflowing_add | 63.67 ms | 84.11 ms | 107.95 ms | 120.8 ms | 147.38 ms | 191.28 ms |
unsigned_overflowing_sub | 68.89 ms | 81.83 ms | 107.63 ms | 120.38 ms | 150.21 ms | 190.39 ms |
unsigned_overflowing_mul | 140.76 ms | 191.85 ms | 272.65 ms | 510.61 ms | 1.34 s | 4.51 s |
signed_overflowing_add | 76.54 ms | 84.78 ms | 104.23 ms | 134.38 ms | 162.99 ms | 202.56 ms |
signed_overflowing_sub | 82.46 ms | 86.92 ms | 104.41 ms | 132.21 ms | 168.06 ms | 201.17 ms |
signed_overflowing_mul | 277.91 ms | 365.67 ms | 571.22 ms | 1.21 s | 3.57 s | 12.84 s |
Linux |
|
|
macOS | Unsupported | Unsupported* |
Windows | Unsupported | Unsupported |
cuda_add | 103.33 ms | 129.26 ms | 156.83 ms | 186.99 ms | 320.96 ms | 528.15 ms |
cuda_bitand | 26.11 ms | 26.21 ms | 26.63 ms | 27.24 ms | 43.07 ms | 65.01 ms |
cuda_bitor | 26.1 ms | 26.21 ms | 26.57 ms | 27.23 ms | 43.05 ms | 65.0 ms |
cuda_bitxor | 26.08 ms | 26.21 ms | 26.57 ms | 27.25 ms | 43.06 ms | 65.07 ms |
cuda_eq | 52.82 ms | 53.0 ms | 79.4 ms | 79.58 ms | 96.37 ms | 145.25 ms |
cuda_ge | 104.7 ms | 130.23 ms | 156.19 ms | 183.2 ms | 213.43 ms | 288.76 ms |
cuda_gt | 104.93 ms | 130.2 ms | 156.33 ms | 183.38 ms | 213.47 ms | 288.8 ms |
cuda_le | 105.14 ms | 130.47 ms | 156.48 ms | 183.44 ms | 213.33 ms | 288.75 ms |
cuda_lt | 104.73 ms | 130.23 ms | 156.2 ms | 183.14 ms | 213.33 ms | 288.74 ms |
cuda_max | 156.7 ms | 182.65 ms | 210.74 ms | 251.78 ms | 316.9 ms | 442.71 ms |
cuda_min | 156.85 ms | 182.67 ms | 210.39 ms | 252.02 ms | 316.96 ms | 442.95 ms |
cuda_mul | 219.73 ms | 302.11 ms | 465.91 ms | 955.66 ms | 2.71 s | 9.15 s |
cuda_ne | 52.72 ms | 52.91 ms | 79.28 ms | 79.59 ms | 96.37 ms | 145.36 ms |
cuda_neg | 103.26 ms | 129.4 ms | 157.19 ms | 187.09 ms | 321.27 ms | 530.11 ms |
cuda_sub | 103.34 ms | 129.42 ms | 156.87 ms | 187.01 ms | 321.04 ms | 528.13 ms |
Min | | Binary |
Max | | Binary |
Ternary operator | | Ternary |
Booleans | | Booleans |
ShortInts | | Short integers |
Integers | | Arbitrary-sized integers |
Sometimes, the server side needs to initialize a value. For example, when computing the sum of a list of ciphertexts, one might want to initialize the sum variable to 0.
Instead of asking the client to send a real encryption of zero, the server can do a trivial encryption.
A trivial encryption creates a ciphertext that contains the desired value; however, the 'encryption' is trivial, that is, it is not really encrypted: anyone, with any key, can decrypt it.
Note that when you want to do an operation that involves a ciphertext and a clear value, you should only use a trivial encryption of the clear value if the ciphertext/clear-value operation (often called scalar operation) you want to run is not supported.
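In LWE terms, a trivial ciphertext can be pictured as one whose mask is all zeros and whose body is the encoded value with no noise. The toy model below illustrates why any key then decrypts it; the dimension, the 4-bit encoding, and all names are assumptions of this sketch rather than TFHE-rs code.

```rust
const N: usize = 4; // toy LWE dimension
const SHIFT: u32 = 60; // 4-bit messages stored in the most significant bits

pub struct Ciphertext {
    pub mask: [u64; N],
    pub body: u64,
}

/// Trivial encryption: zero mask, noiseless body. No key is involved.
pub fn trivial_encrypt(message: u64) -> Ciphertext {
    Ciphertext { mask: [0; N], body: message << SHIFT }
}

/// Usual LWE decryption: subtract the mask/key inner product, then round.
pub fn decrypt(ct: &Ciphertext, key: &[u64; N]) -> u64 {
    let inner = ct
        .mask
        .iter()
        .zip(key)
        .fold(0u64, |acc, (a, s)| acc.wrapping_add(a.wrapping_mul(*s)));
    ct.body.wrapping_sub(inner).wrapping_add(1 << (SHIFT - 1)) >> SHIFT
}
```

Since the mask is zero, the inner product with any key is zero, so decrypt(&trivial_encrypt(5), &key) returns 5 for every possible key. This is what makes the value usable as a server-side initial value while offering no confidentiality.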
TFHE-rs includes features to reduce the size of both keys and ciphertexts, by compressing them. Most TFHE-rs entities contain random numbers generated by a Pseudo Random Number Generator (PRNG). A PRNG is deterministic, therefore storing only the random seed used to generate those numbers is enough to keep all the required information: using the same PRNG and the same seed, the full chain of random values can be reconstructed when decompressing the entity.
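The seed-based idea can be sketched with a toy ciphertext whose random mask is regenerated from a stored seed. The generator below is a small linear congruential generator used purely for illustration (it is not a secure PRNG), and the type and function names are hypothetical, not the TFHE-rs Compressed types.

```rust
const N: usize = 8; // toy number of random mask elements

/// Tiny deterministic generator (NOT cryptographically secure).
pub struct Lcg(pub u64);

impl Lcg {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        self.0
    }
}

/// Full form: every random mask element is stored.
pub struct Ciphertext {
    pub mask: [u64; N],
    pub body: u64,
}

/// Compressed form: only the seed and the non-random part are stored.
pub struct CompressedCiphertext {
    pub seed: u64,
    pub body: u64,
}

/// Replays the generator from the seed to rebuild the mask.
pub fn mask_from_seed(seed: u64) -> [u64; N] {
    let mut rng = Lcg(seed);
    let mut mask = [0u64; N];
    for a in mask.iter_mut() {
        *a = rng.next_u64();
    }
    mask
}

pub fn decompress(compressed: &CompressedCiphertext) -> Ciphertext {
    Ciphertext { mask: mask_from_seed(compressed.seed), body: compressed.body }
}
```

Because the generator is deterministic, decompressing twice from the same seed rebuilds exactly the same mask, while the compressed form stores 2 words instead of N + 1.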
In the library, entities that can be compressed are prefixed by Compressed. For instance, the type of a compressed FheUint256 is CompressedFheUint256.
In the following example code, we use the bincode crate dependency to serialize in a binary format and compare serialized sizes.
This example shows how to compress a ciphertext encrypting messages over 16 bits.
This example shows how to compress the server keys.
This example shows how to compress the classical public keys.
It is not currently recommended to use the CompressedPublicKey to encrypt ciphertexts without first decompressing it. If the resulting PublicKey is too large to fit in memory, the encryption with the CompressedPublicKey will be very slow. This is a known problem and will be addressed in future releases.
This example shows how to use compressed compact public keys.
In what follows, the process to manage data when upgrading the TFHE-rs version (starting from the 0.5.5 release) is given. This page details the methods to make data, which have initially been generated with an older version of TFHE-rs, usable with a newer version.
The current strategy that has been adopted for TFHE-rs is the following:
TFHE-rs has a global SERIALIZATION_VERSION constant;
When breaking serialization changes are introduced, this global version is bumped;
Safe serialization primitives check this constant upon deserialization; if the data is incompatible, these primitives return an error.
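The general pattern behind such a check can be sketched as follows. This is a generic illustration and not the TFHE-rs serialization format: the 4-byte header layout, the constant's value, and the names write_versioned and read_versioned are all assumptions made for this example.

```rust
/// Hypothetical value; the real constant lives in TFHE-rs.
pub const SERIALIZATION_VERSION: u32 = 2;

#[derive(Debug, PartialEq)]
pub enum DeserializeError {
    TooShort,
    IncompatibleVersion { found: u32, expected: u32 },
}

/// Prefixes the payload with the current version (little-endian).
pub fn write_versioned(payload: &[u8]) -> Vec<u8> {
    let mut out = SERIALIZATION_VERSION.to_le_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

/// Returns the payload only if the stored version matches the current one.
pub fn read_versioned(data: &[u8]) -> Result<&[u8], DeserializeError> {
    if data.len() < 4 {
        return Err(DeserializeError::TooShort);
    }
    let (header, payload) = data.split_at(4);
    let found = u32::from_le_bytes([header[0], header[1], header[2], header[3]]);
    if found != SERIALIZATION_VERSION {
        return Err(DeserializeError::IncompatibleVersion {
            found,
            expected: SERIALIZATION_VERSION,
        });
    }
    Ok(payload)
}
```

Data written by write_versioned is read back unchanged, whereas data carrying another version number is rejected with an error instead of being silently misinterpreted, which is the behavior described in the list above.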
To be able to use older serialized data with newer versions, the following is done on new major TFHE-rs releases:
A minor update is done to the previously released branch to add the new release as an optional dependency;
Conversion code is added to the previous branch to be able to load old data and convert it to the new data format.
In practice, if we take the 0.6 release as a concrete example, here is what will happen:
0.6.0 is released with breaking changes to the serialization;
0.5.5 has tfhe@0.6.0 as an optional dependency gated by the forward_compatibility feature;
Conversion code is added to 0.5.5, if possible without any user input, but some data migration will likely require some information to be provided by the developer writing the migration code;
0.5.5 is released.
Note that if you do not need forward compatibility, 0.5.5 will be equivalent to 0.5.3 from a usability perspective and you can safely update. Note also that 0.6.0 has no knowledge of previous releases.
A set of generic tooling is given to allow migrating data by using several workflows. The data migration is considered to be an application/protocol layer concern to avoid imposing design choices.
Examples to migrate data:
An Application uses TFHE-rs 0.5.3 and needs/wants to upgrade to 0.6.0 to benefit from various improvements.
Example timeline of the data migration or Bulk Data Migration:
A new transition version of the Application is compiled with the 0.5.5 release of TFHE-rs;
The transition version of the Application adds code to read previously stored data, convert it to the proper format for 0.6.0 and save it back to disk;
The service enters a maintenance period (if relevant);
Migration of data from 0.5.5 to 0.6.0 is done with the transition version of the Application. Note that, depending on the volume of data, this transition can take a significant amount of time;
The updated version of the Application is compiled with the 0.6.0 release of TFHE-rs and put in production;
Service is resumed with the updated Application (if relevant).
The above describes a simple use case, where only a single version of data has to be managed. Moreover, this strategy is not suitable when the data is so large that migrating it in one go is not doable, or when the service cannot suffer any interruption.
In order to manage more complicated cases, another method called Migrate On Read can be used.
Here is an example timeline where data is migrated only as needed with the Migrate On Read approach:
A new version of the Application is compiled. It has tfhe@0.5.5 as a dependency (the dependency will have to be renamed to avoid conflicts; a possible name uses the major version, like tfhe_0_5) and tfhe@0.6.0, which is not renamed and can be accessed as tfhe;
Code to manage reading the data is added to the Application:
The code determines whether the data was saved with the 0.5 Application or the 0.6 Application. If the data is already in the 0.6 format, it can be loaded right away. If it is in the 0.5 format, the Application checks whether an updated version of the data is already available in the 0.6 format and loads that if so; otherwise it converts the data to 0.6, saves the converted data to avoid converting it on every access, and continues processing with the 0.6 data.
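The read path described above can be sketched with plain Rust types. This is only an illustration of the control flow: `DataV05`, `DataV06`, `Stored`, `Store` and `convert_0_5_to_0_6` are hypothetical stand-ins, and a real application would call the conversion tooling shipped with the relevant TFHE-rs release instead.

```rust
use std::collections::HashMap;

/// Stand-in for data serialized with the 0.5 format (hypothetical).
#[derive(Clone, Debug, PartialEq)]
struct DataV05 {
    value: u32,
}

/// Stand-in for data serialized with the 0.6 format (hypothetical).
#[derive(Clone, Debug, PartialEq)]
struct DataV06 {
    value: u64,
}

/// What is found in primary storage: either format may be present.
#[derive(Clone, Debug, PartialEq)]
enum Stored {
    V05(DataV05),
    V06(DataV06),
}

/// Placeholder for the real conversion code.
fn convert_0_5_to_0_6(old: &DataV05) -> DataV06 {
    DataV06 { value: u64::from(old.value) }
}

#[derive(Default)]
struct Store {
    /// Data as originally written, in whichever format was current.
    primary: HashMap<String, Stored>,
    /// Converted copies, saved so each item is converted at most once.
    migrated: HashMap<String, DataV06>,
    /// Counts how many conversions were actually performed.
    conversions: usize,
}

impl Store {
    /// Loads `key` in the 0.6 format, migrating on read if needed.
    fn load(&mut self, key: &str) -> Option<DataV06> {
        match self.primary.get(key)? {
            // Already up to date: load right away.
            Stored::V06(data) => Some(data.clone()),
            Stored::V05(old) => {
                // Reuse a previously converted copy if one exists.
                if let Some(data) = self.migrated.get(key) {
                    return Some(data.clone());
                }
                // Otherwise convert, save the result, and continue with it.
                let new = convert_0_5_to_0_6(old);
                self.conversions += 1;
                self.migrated.insert(key.to_string(), new.clone());
                Some(new)
            }
        }
    }
}
```

Loading the same 0.5 item twice converts it only once; the second read is served from the saved 0.6 copy.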
The above is more complicated to manage, as data will be present on disk in several versions. However, it allows the service to run continuously or near-continuously once the new Application is deployed (it will require careful routing or error handling, as nodes with an outdated Application won't be able to process the 0.6 data).
Also, if required, several versions of TFHE-rs can be "chained" to upgrade very old data to newer formats. The above pattern can be extended to have tfhe_0_5 (tfhe@0.5.5 renamed), tfhe_0_6 (tfhe@0.6.0 renamed) and tfhe being tfhe@0.7.0. This will require special handling from the developers so that their protocol can handle data from 0.5.5, 0.6.0 and 0.7.0 using all the conversion tooling from the relevant version.
E.g., if some computation requires data from version 0.5.5, a conversion function could be called upgrade_data_from_0_5_to_0_7 and do:
read data from 0.5.5
convert to 0.6.0 format using tfhe_0_6
convert to 0.7.0 format using tfhe_0_7
save to disk in 0.7.0 format
process 0.7.0 data with tfhe, which is tfhe@0.7.0
Public key encryption refers to the cryptographic paradigm where the encryption key can be publicly distributed, whereas the decryption key remains secret to the owner. This differs from the usual case where the same secret key is used to encrypt and decrypt the data. In TFHE-rs, there are two methods for public key encryption. The first is the usual one, where the public key contains many encryptions of zero. More details can be found in Guide to Fully Homomorphic Encryption over the [Discretized] Torus, Appendix A. The second method is based on the paper entitled TFHE Public-Key Encryption Revisited. The main advantage of the latter method over the former lies in the key sizes, which are drastically reduced.
Note that public keys can be compressed.
This example shows how to use public keys.
This example shows how to use compact public keys. The main difference is in the ConfigBuilder, where the parameter set has been changed.
Programmable Bootstrapping (PBS) is a sequential operation by nature. However, some recent results showed that parallelism could be added at the cost of having larger keys. Overall, the performance of the PBS is improved. This new PBS is called a multi-bit PBS. In TFHE-rs, since integer homomorphic operations are already parallelized, activating this feature may improve performance on high core count CPUs if enough cores are available, or for small input message precision.
The following example shows how to use the parallelized bootstrapping by choosing multi-bit PBS parameters:
By construction, the parallelized PBS might not be deterministic: the resulting ciphertext will always decrypt to the same plaintext, but the order of the operations could differ so the output ciphertext might differ. In order to activate the deterministic version, the suffix 'with_deterministic_execution()' should be added to the parameters, as shown in the following example:
As explained in the Introduction, most types are meant to be shared with the server that performs the computations.
The easiest way to send these data to a server is to use the serialization and deserialization features. tfhe uses the serde framework. Serde's Serialize and Deserialize traits are implemented on TFHE-rs types.
To serialize our data, a data format should be picked. Here, bincode is a good choice, mainly because it is a binary format.
For some types, safe serialization and deserialization functions are available. Bincode is used internally.
Safe-deserialization must take as input the output of a safe-serialization. On this condition, validation of the following is done:
type: trying to deserialize type A from a serialized type B raises an error along the lines of "On deserialization, expected type A, got type B" instead of a generic deserialization error (or, less likely, a meaningless result of type A)
version: trying to deserialize type A (version 0.2) from a serialized type A (incompatible version 0.1) raises an error along the lines of "On deserialization, expected serialization version 0.2, got version 0.1" instead of a generic deserialization error (or, less likely, a meaningless result of type A (version 0.2))
parameter compatibility: trying to deserialize into an object of type A with some crypto parameters from an object of type A with other crypto parameters raises an error along the lines of "Deserialized object of type A not conformant with given parameter set". If parameter sets 1 and 2 have the same LWE dimension for ciphertexts, a ciphertext from parameter set 1 may not fail this deserialization check with parameter set 2, even if doing this deserialization does not make sense. Also, this check cannot distinguish ciphertexts/server keys from independent client keys with the same parameters (which it makes no sense to combine in homomorphic operations). This check is meant to prevent runtime errors in server homomorphic operations by checking that server keys and ciphertexts are compatible with the same parameter set.
Moreover, a size limit (in number of bytes) for the serialized data is expected on both serialization and deserialization. On serialization, an error is raised if the serialized output would be bigger than the given limit. On deserialization, an error is raised if the serialized input is bigger than the given limit. It is meant to gracefully return an error in case of an attacker trying to cause an out of memory error on deserialization.
A standalone is_conformant method is also available on those types to do a parameter compatibility check.
The parameter compatibility check is done by the safe_deserialize_conformant function, but a safe_deserialize function without this check is also available.
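To make the type, version and size-limit checks more concrete, here is a minimal standalone sketch. The header layout, the `toy_safe_serialize` / `toy_safe_deserialize` names and the `SERIALIZATION_VERSION` value are all assumptions made for illustration; they do not reflect the actual TFHE-rs wire format or function signatures, and the parameter conformance check is left out.

```rust
/// Hypothetical format version written into every header.
const SERIALIZATION_VERSION: u32 = 2;

/// Toy layout: [name length: 1 byte][type name][version: 4 bytes LE][payload].
fn toy_safe_serialize(
    type_name: &str,
    payload: &[u8],
    size_limit: usize,
) -> Result<Vec<u8>, String> {
    let name = type_name.as_bytes();
    if name.len() > u8::MAX as usize {
        return Err("type name too long".to_string());
    }
    let total = 1 + name.len() + 4 + payload.len();
    // Refuse to produce an output bigger than the given limit.
    if total > size_limit {
        return Err(format!("serialized size {} exceeds limit {}", total, size_limit));
    }
    let mut out = Vec::with_capacity(total);
    out.push(name.len() as u8);
    out.extend_from_slice(name);
    out.extend_from_slice(&SERIALIZATION_VERSION.to_le_bytes());
    out.extend_from_slice(payload);
    Ok(out)
}

fn toy_safe_deserialize(
    bytes: &[u8],
    expected_type: &str,
    size_limit: usize,
) -> Result<Vec<u8>, String> {
    // Reject oversized inputs before doing any further work.
    if bytes.len() > size_limit {
        return Err(format!("input size {} exceeds limit {}", bytes.len(), size_limit));
    }
    let (&name_len, rest) = bytes.split_first().ok_or("empty input")?;
    let name_len = name_len as usize;
    if rest.len() < name_len + 4 {
        return Err("truncated header".to_string());
    }
    let (name, rest) = rest.split_at(name_len);
    let (version_bytes, payload) = rest.split_at(4);

    // Type check: report a specific error instead of decoding garbage.
    let found_type = String::from_utf8_lossy(name);
    if found_type.as_ref() != expected_type {
        return Err(format!(
            "On deserialization, expected type {}, got type {}",
            expected_type, found_type
        ));
    }

    // Version check against the global constant.
    let mut raw = [0u8; 4];
    raw.copy_from_slice(version_bytes);
    let version = u32::from_le_bytes(raw);
    if version != SERIALIZATION_VERSION {
        return Err(format!(
            "On deserialization, expected serialization version {}, got version {}",
            SERIALIZATION_VERSION, version
        ));
    }
    Ok(payload.to_vec())
}
```

Checking the size first means an attacker-supplied oversized buffer is rejected with an error before anything is parsed or allocated from it.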
Since tfhe-rs 0.5, trivial ciphertexts have another application. They can be used to allow debugging via a debugger or print statements as well as speeding-up execution time so that you won't have to spend minutes waiting for execution to progress.
This can greatly improve the pace at which one develops FHE applications.
Keep in mind that trivial ciphertexts are not secure at all, thus an application released/deployed in production must never receive trivial ciphertext from a client.
To use this feature, simply call your circuits/functions with trivially encrypted values (made using encrypt_trivial) instead of real encryptions (made using encrypt).
This example is going to print.
If any input to mul_all is not a trivial ciphertext, the computations would be done 100% in FHE, and the program would output:
Using trivial encryptions as input, the example runs in 980 ms on a standard 12-core laptop; using real encryptions, it would run in 7.5 seconds on a 128-core machine.
In tfhe::boolean, the available operations are mainly related to their equivalent Boolean gates (i.e., AND, OR, etc.). What follows are examples of a unary gate (NOT) and a binary gate (XOR). The last one is about the ternary MUX gate, which allows homomorphic computation of conditional statements of the form If..Then..Else.
This library is meant to be used both on the server side and the client side. The typical use case should follow the subsequent steps:
On the client side, generate the client and server keys.
Send the server key to the server.
Then any number of times:
On the client side, encrypt the input data with the client key.
Transmit the encrypted input to the server.
On the server side, perform homomorphic computation with the server key.
Transmit the encrypted output to the client.
On the client side, decrypt the output data with the client key.
In the first step, the client creates two keys, the client key and the server key, with the tfhe::boolean::gen_keys function:
The client_key is of type ClientKey. It is secret and must never be transmitted. This key will only be used to encrypt and decrypt data.
The server_key is of type ServerKey. It is a public key and can be shared with any party. This key has to be sent to the server because it is required for homomorphic computation.
Note that both the client_key and server_key implement the Serialize and Deserialize traits. This way you can use any compatible serializer to store/send the data. To store the server_key in a binary file, you can use the bincode library:
Once the server key is available on the server side, it is possible to perform some homomorphic computations. The client needs to encrypt some data and send it to the server. Again, the Ciphertext type implements the Serialize and the Deserialize traits, so that any serializer and communication tool suiting your use case can be employed:
Anyone (the server or a third party) with the public key can also encrypt some (or all) of the inputs. The public key can only be used to encrypt, not to decrypt.
Once the encrypted inputs are on the server side, the server_key can be used to homomorphically execute the desired Boolean circuit:
Once the encrypted output is on the client side, the client_key can be used to decrypt it:
rayon is a popular crate to easily write multi-threaded code in Rust.
It is possible to use rayon to write multi-threaded TFHE-rs code. However, due to internal details of rayon and TFHE-rs, some special setup is needed.
The high-level API requires calling set_server_key on each thread where computations need to be done. So a first attempt at using rayon with TFHE-rs might look like this:
However, due to rayon's work-stealing mechanism and TFHE-rs's internals, this may cause a BorrowMutError.
The correct way is to call rayon::broadcast:
If your application needs to operate on data from different clients concurrently, and you want each client to use multiple threads, you will need to create different rayon thread pools:
This can be useful if you have some rust #[test]
TFHE-rs supports WASM for the client api, that is, it supports key generation, encryption, decryption but not doing actual computations.
TFHE-rs supports 3 WASM 'targets':
nodejs: to be used in a nodejs app/package
web: to be used in a web browser
web-parallel: to be used in a web browser with multi-threading support
In all cases, the core of the API is the same; only a few initialization functions change.
When using the Web WASM target, there is an additional init function to call.
When using the Web WASM target with parallelism enabled, there is one more initialization function to call: initThreadPool.
The TFHE-rs repo has a Makefile that contains targets for each of the 3 possible variants of the API:
make build_node_js_api to build the nodejs API
make build_web_js_api to build the browser API
make build_web_js_api_parallel to build the browser API with parallelism
The compiled WASM package will be in tfhe/pkg.
The sequential browser API and the nodejs API are published as npm packages. You can add the browser API to your project using the command npm i tfhe. You can add the nodejs API to your project using the command npm i node-tfhe.
TFHE-rs uses WASM to expose a JS binding to the client-side primitives, like key generation and encryption, of the Boolean and shortint modules.
There are several limitations at this time. Due to a lack of threading support in WASM, key generation can be too slow to be practical for bigger parameter sets.
Some parameter sets lead to FHE keys that are too big to fit in the 2GB memory space of WASM. This means that some parameter sets are virtually unusable.
To build the JS on WASM bindings for TFHE-rs, you need to install wasm-pack in addition to a compatible (>= 1.67) rust toolchain.
Then, in a shell, run the following to clone the TFHE-rs repo (one may want to checkout a specific tag; here the default branch is used for the build):
The command above targets nodejs. A binding for a web browser can be generated as well using --target=web. This use case will not be discussed in this tutorial.
Both Boolean and shortint features are enabled here, but it's possible to use one without the other.
After the build, a new directory pkg is present in the tfhe directory.
Be sure to update the path of the required clause in the example below for the TFHE package that was just built.
The example.js script can then be run using node, like so:
This library makes it possible to execute homomorphic operations over encrypted data, where the data are either Booleans, short integers (named shortint in the rest of this documentation), or integers up to 256 bits. It allows you to execute a circuit on an untrusted server because both circuit inputs and outputs are kept private. Data are indeed encrypted on the client side, before being sent to the server. On the server side, every computation is performed on ciphertexts.
The server, however, has to know the circuit to be evaluated. At the end of the computation, the server returns the encryption of the result to the user. Then the user can decrypt it with the secret key.
The overall process to write a homomorphic program is the same for all types. The basic steps for using the TFHE-rs library are the following:
Choose a data type (Boolean, shortint, integer)
Import the library
Create client and server keys
Encrypt data with the client key
Compute over encrypted data using the server key
Decrypt data with the client key
This library has different modules, with different levels of abstraction.
There is the core_crypto module, which is the lowest level API with the primitive functions and types of the TFHE scheme.
Above the core_crypto module, there are the Boolean, shortint, and integer modules, which contain easy to use APIs enabling evaluation of Boolean, short integer, and integer circuits.
Finally, there is the high-level module built on top of the Boolean, shortint, integer modules. This module is meant to abstract cryptographic complexities: no cryptographical knowledge is required to start developing an FHE application. Another benefit of the high-level module is the drastically simplified development process compared to lower level modules.
TFHE-rs exposes a high-level API by default that includes datatypes that try to match Rust's native types by having overloaded operators (+, -, ...).
Here is an example of how the high-level API is used:
Use the --release flag to run this example (e.g., cargo run --release).
Here is an example of how the library can be used to evaluate a Boolean circuit:
Use the --release flag to run this example (e.g., cargo run --release).
Here is a full example using shortint:
Use the --release flag to run this example (e.g., cargo run --release).
The library is simple to use and can evaluate homomorphic circuits of arbitrary length. The description of the algorithms can be found in the TFHE paper (also available as ePrint 2018/421).
This library exposes a C binding to the high-level TFHE-rs primitives to implement Fully Homomorphic Encryption (FHE) programs.
TFHE-rs C API can be built on a Unix x86_64 machine using the following command:
or on a Unix aarch64 machine using the following command:
The tfhe.h header as well as the static (.a) and dynamic (.so) libtfhe binaries can then be found in "${REPO_ROOT}/target/release/".
The tfhe-c-api-dynamic-buffer.h header and the static (.a) and dynamic (.so) libraries will be found in "${REPO_ROOT}/target/release/deps/".
The build system needs to be set up so that the C or C++ program links against TFHE-rs C API binaries and the dynamic buffer library.
Here is a minimal CMakeLists.txt to do just that:
TFHE-rs C API.
WARNING: The following example does not have proper memory management in the error case, to make it easier to fit the code on this page.
To run the example below, the above CMakeLists.txt and main.c files need to be in the same directory. The commands to run are:
name | symbol | type |
---|---|---|
Neg | `-` | Unary |
Add | `+` | Binary |
Sub | `-` | Binary |
Mul | `*` | Binary |
Div | `/` | Binary |
Rem | `%` | Binary |
Not | `!` | Unary |
BitAnd | `&` | Binary |
BitOr | `\|` | Binary |
BitXor | `^` | Binary |
Shr | `>>` | Binary |
Shl | `<<` | Binary |
Rotate right | `rotate_right` | Binary |
Rotate left | `rotate_left` | Binary |
Min | `min` | Binary |
Max | `max` | Binary |
Greater than | `gt` | Binary |
Greater or equal than | `ge` | Binary |
Lower than | `lt` | Binary |
Lower or equal than | `le` | Binary |
Equal | `eq` | Binary |
Cast (into dest type) | `cast_into` | Unary |
Cast (from src type) | `cast_from` | Unary |
Ternary operator | `if_then_else` | Ternary |
tfhe::integer is dedicated to integers smaller than 256 bits. The steps to homomorphically evaluate an integer circuit are described here.
integer provides 3 basic key types:
ClientKey
ServerKey
PublicKey
The ClientKey is the key that encrypts and decrypts messages, thus this key is meant to be kept private and should never be shared. This key is created from parameter values that will dictate both the security and efficiency of computations. The parameters also set the maximum number of bits of message encrypted in a ciphertext.
The ServerKey is the key that is used to actually do the FHE computations. It contains a bootstrapping key and a keyswitching key. This key is created from a ClientKey and needs to be shared with the server, so it is not meant to be kept private. A user with a ServerKey can compute on the encrypted data sent by the owner of the associated ClientKey.
To reflect this, computation/operation methods are tied to the ServerKey type.
The PublicKey is a key used to encrypt messages. It can be publicly shared to allow users to encrypt data such that only the ClientKey holder will be able to decrypt. Encrypting with the PublicKey does not alter the homomorphic capabilities associated with the ServerKey.
To generate the keys, a user needs two parameters:
A set of shortint cryptographic parameters.
The number of ciphertexts used to encrypt an integer (we call them "shortint blocks").
We are now going to build a pair of keys that can encrypt 8-bit integers (signed or unsigned) by using 4 shortint blocks that store 2 bits of message each.
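As a plaintext illustration of the block layout just described, the sketch below splits an 8-bit value into 4 blocks of 2 bits and recombines them. The `to_blocks` and `from_blocks` helpers are hypothetical names used only here, and the least-significant-block-first order is an assumption of this sketch; no encryption is involved.

```rust
/// Splits `value` into `num_blocks` blocks of `bits_per_block` bits each,
/// least significant block first. Assumes `bits_per_block` is between 1 and 8.
fn to_blocks(value: u8, bits_per_block: u32, num_blocks: usize) -> Vec<u8> {
    let modulus = 1u16 << bits_per_block;
    let mut remaining = u16::from(value);
    let mut blocks = Vec::with_capacity(num_blocks);
    for _ in 0..num_blocks {
        blocks.push((remaining % modulus) as u8);
        remaining /= modulus;
    }
    blocks
}

/// Recombines blocks produced by `to_blocks` into the original value.
/// Assumes the blocks together hold at most 8 bits.
fn from_blocks(blocks: &[u8], bits_per_block: u32) -> u8 {
    blocks
        .iter()
        .rev()
        .fold(0u16, |acc, &block| (acc << bits_per_block) | u16::from(block)) as u8
}
```

For instance, `to_blocks(0b1110_0100, 2, 4)` yields the blocks 0, 1, 2, 3, each of which fits in a 2-bit message.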
Once we have our keys, we can encrypt values:
Once the client key is generated, the public key can be derived and used to encrypt data.
With our server_key and encrypted values, we can now do an addition and then decrypt the result.
If you wish to write generic functions which use operators with mixed reference and non-reference, it might get tricky at first to specify the trait bounds. This page should serve as a cookbook to help you.
Operators (+, *, >>, etc.) are tied to traits in std::ops, e.g. + is std::ops::Add, so to write a generic function which uses the + operator, you need to add std::ops::Add as a trait bound.
Then, depending on whether the left-hand side / right-hand side is an owned value or a reference, the trait bound is slightly different. The table below shows the possibilities.
operation | trait bound |
---|---|
The for<'a> syntax is something called Higher-Rank Trait Bounds, often shortened as HRTB.
Writing generic functions will also allow you to call them using clear inputs, allowing easier debugging.
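As a sketch of the four owned/reference combinations for the + operator, the functions below use only std::ops::Add. The function names are made up for this illustration; the bounds themselves are standard Rust and are exercised here with clear u64 inputs, which is the debugging use mentioned above.

```rust
use std::ops::Add;

/// `T + T`: both operands are owned.
fn add_owned_owned<T>(a: T, b: T) -> T
where
    T: Add<T, Output = T>,
{
    a + b
}

/// `T + &T`: owned left-hand side, borrowed right-hand side.
fn add_owned_ref<T>(a: T, b: &T) -> T
where
    T: for<'a> Add<&'a T, Output = T>,
{
    a + b
}

/// `&T + T`: borrowed left-hand side, owned right-hand side.
fn add_ref_owned<T>(a: &T, b: T) -> T
where
    for<'a> &'a T: Add<T, Output = T>,
{
    a + b
}

/// `&T + &T`: both operands are borrowed.
fn add_ref_ref<T>(a: &T, b: &T) -> T
where
    for<'a> &'a T: Add<&'a T, Output = T>,
{
    a + b
}
```

The HRTB form is needed whenever a reference appears in the bound, because the function must accept references of any lifetime chosen by the caller.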
Since the ServerKey and ClientKey types both implement the Serialize and Deserialize traits, you are free to use any serializer that suits you to save and load the keys to disk.
Here is an example using the bincode serialization library, which serializes to a binary format:
As explained in the introduction, some types (ServerKey, Ciphertext) are meant to be shared with the server that performs the computations.
The easiest way to send these data to a server is to use the serialization and deserialization features. tfhe::shortint uses the serde framework. Serde's Serialize and Deserialize are then implemented on the tfhe::shortint types.
To serialize the data, we need to pick a data format. For our use case, bincode is a good choice, mainly because it is a binary format.
This contains the operations available in tfhe::boolean, along with code examples.
Let ct_1, ct_2, ct_3 be three Boolean ciphertexts. Then, the MUX gate (abbreviation of MUltipleXer) is equivalent to the operation:
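On clear Booleans, the selection performed by a MUX can be sketched as follows. The two functions are plaintext illustrations written for this page (they take `bool` values, not ciphertexts): the first is the branching form, the second is the same selection written with AND, OR and NOT gates only, which is the shape that can be evaluated without branching on an encrypted value.

```rust
/// Branching form: returns `ct_2` when `ct_1` is true, `ct_3` otherwise.
fn mux_with_branch(ct_1: bool, ct_2: bool, ct_3: bool) -> bool {
    if ct_1 {
        ct_2
    } else {
        ct_3
    }
}

/// Branch-free form built from AND, OR and NOT gates.
fn mux_with_gates(ct_1: bool, ct_2: bool, ct_3: bool) -> bool {
    (ct_1 & ct_2) | (!ct_1 & ct_3)
}
```

Both functions agree on all eight input combinations.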
This example shows how to use the MUX ternary gate:
The structure and operations related to short integers are described in this section.
In shortint, the encrypted data is stored in an LWE ciphertext.
Conceptually, the message stored in an LWE ciphertext is divided into a carry buffer and a message buffer.
The message buffer is the space where the actual message is stored. This represents the modulus of the input messages (denoted by MessageModulus in the code). When doing computations on a ciphertext, the encrypted message can overflow the message modulus. The part of the message which exceeds the message modulus is stored in the carry buffer. The size of the carry buffer is defined by another modulus, called CarryModulus.
Together, the message modulus and the carry modulus form the plaintext space that is available in a ciphertext. This space cannot be overflowed, otherwise the computation may result in an incorrect output.
In order to ensure the correctness of the computation, we track the maximum value encrypted in a ciphertext via an associated attribute called the degree. When the degree reaches a defined threshold, the carry buffer may be emptied to safely resume the computations. In shortint the carry modulus is considered useful as a means to do more computations.
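The message buffer, carry buffer and degree can be modeled on clear values as in the sketch below. `ToyParams` and `ToyCiphertext` are hypothetical types written for this illustration: nothing is encrypted, and the model only tracks the degree bound rather than simulating what happens when the plaintext space is actually overflowed.

```rust
/// Clear-value stand-in for a shortint ciphertext (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
struct ToyCiphertext {
    /// Value held in the plaintext space (message and carry together).
    value: u64,
    /// Upper bound on `value`, tracked alongside it.
    degree: u64,
}

/// Hypothetical parameters: the two moduli described above.
struct ToyParams {
    message_modulus: u64,
    carry_modulus: u64,
}

impl ToyParams {
    /// Total plaintext space: message modulus times carry modulus.
    fn plaintext_space(&self) -> u64 {
        self.message_modulus * self.carry_modulus
    }

    /// A fresh value lives entirely in the message buffer.
    fn encode(&self, message: u64) -> ToyCiphertext {
        ToyCiphertext {
            value: message % self.message_modulus,
            degree: self.message_modulus - 1,
        }
    }

    /// Part of the value held in the message buffer.
    fn message(&self, ct: &ToyCiphertext) -> u64 {
        ct.value % self.message_modulus
    }

    /// Part of the value that spilled into the carry buffer.
    fn carry(&self, ct: &ToyCiphertext) -> u64 {
        ct.value / self.message_modulus
    }

    /// Addition grows both the value and its degree.
    fn add(&self, a: &ToyCiphertext, b: &ToyCiphertext) -> ToyCiphertext {
        ToyCiphertext {
            value: a.value + b.value,
            degree: a.degree + b.degree,
        }
    }

    /// True while a value bounded by `degree` still fits the plaintext space.
    fn fits(&self, degree: u64) -> bool {
        degree < self.plaintext_space()
    }

    /// Emptying the carry buffer keeps the message and resets the degree.
    fn clear_carry(&self, ct: &ToyCiphertext) -> ToyCiphertext {
        self.encode(ct.value)
    }
}
```

With a message modulus and carry modulus of 4 each, adding 3 and 2 gives a value of 5 stored as message 1 with carry 1, and a degree of 6.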
The operations available via a ServerKey may come in different variants:
operations that take their inputs as encrypted values
scalar operations that take at least one non-encrypted value as input
For example, the addition has two variants:
ServerKey::unchecked_add, which takes two encrypted values and adds them.
ServerKey::unchecked_scalar_add, which takes an encrypted value and a clear value (a so-called scalar) and adds them.
Each operation may come in different 'flavors':
unchecked: always does the operation, without checking if the result may exceed the capacity of the plaintext space. Using this operation might have an impact on the correctness of the following operations;
checked: checks are done before computing the operation, returning an error if the operation cannot be done safely;
smart: always does the operation. If the operation cannot be computed safely, the smart operation will clear the carry to make the operation possible. Some of those will require a mutable reference as input: this is to allow the modification of the carry, but this will not change the underlying encrypted value;
default: always does the operation and always clears the carry. Could be slower than smart, but it ensures that the timings are consistent from one call to another.
Not all operations have these 4 flavors, as some of them are implemented in a way that the operation is always possible without ever exceeding the plaintext space capacity.
If you don't know which flavor to use, you should use the default one.
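The difference between the four flavors can be sketched on clear values with a degree counter. The `Toy` type, the constants and the four `*_add` functions below are hypothetical and only mimic the behavior described in the list above; they are not the ServerKey methods themselves.

```rust
const MESSAGE_MODULUS: u64 = 4;
const CARRY_MODULUS: u64 = 4;
/// Largest degree that still fits the plaintext space.
const MAX_DEGREE: u64 = MESSAGE_MODULUS * CARRY_MODULUS - 1;

/// Clear-value stand-in for a ciphertext with its tracked degree.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Toy {
    value: u64,
    degree: u64,
}

fn fresh(message: u64) -> Toy {
    Toy { value: message % MESSAGE_MODULUS, degree: MESSAGE_MODULUS - 1 }
}

/// Empties the carry buffer: the message is kept, the degree is reset.
fn clear_carry(ct: &mut Toy) {
    ct.value %= MESSAGE_MODULUS;
    ct.degree = MESSAGE_MODULUS - 1;
}

/// unchecked: always adds, even if the degree bound is exceeded.
fn unchecked_add(a: &Toy, b: &Toy) -> Toy {
    Toy { value: a.value + b.value, degree: a.degree + b.degree }
}

/// checked: returns an error when the result might not fit.
fn checked_add(a: &Toy, b: &Toy) -> Result<Toy, String> {
    if a.degree + b.degree > MAX_DEGREE {
        Err("addition could overflow the plaintext space".to_string())
    } else {
        Ok(unchecked_add(a, b))
    }
}

/// smart: clears the carries of the inputs only when needed, hence `&mut`.
fn smart_add(a: &mut Toy, b: &mut Toy) -> Toy {
    if a.degree + b.degree > MAX_DEGREE {
        clear_carry(a);
        clear_carry(b);
    }
    unchecked_add(a, b)
}

/// default: leaves the inputs untouched and always returns a clean output.
fn default_add(a: &Toy, b: &Toy) -> Toy {
    let (mut a, mut b) = (*a, *b);
    let mut result = smart_add(&mut a, &mut b);
    clear_carry(&mut result);
    result
}
```

Note how `smart_add` may modify the carry of its inputs while leaving their message (the value modulo the message modulus) unchanged.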
Let's try to do a circuit evaluation using the different flavors of operations that we have already introduced. For a very small circuit, the unchecked flavor may be enough to do the computation correctly. Otherwise, checked and smart are the best options.
Let's do a scalar multiplication, a subtraction, and a multiplication.
During this computation, the carry buffer has been overflowed and, as all the operations were unchecked, the output may be incorrect.
If we redo this same circuit with the checked flavor, a panic will occur:
The checked flavor permits manual management of the overflow of the carry buffer by raising an error if correctness is not guaranteed.
Using the smart flavor will output the correct result all the time. However, the computation may be slower as the carry buffer may be cleaned during the computations.
The main advantage of the default flavor is to ensure predictable timings as long as this is the only kind of operation which is used.
Using default could slow down computations.
List of available operations
Certain operations can only be used if the parameter set chosen is compatible with the bivariate programmable bootstrapping, meaning the carry buffer is larger than or equal to the message buffer. These operations are marked with a star (*).
The list of implemented operations for shortint is:
addition between two ciphertexts
addition between a ciphertext and an unencrypted scalar
comparisons <, <=, >, >=, ==, != between a ciphertext and an unencrypted scalar
division of a ciphertext by an unencrypted scalar
LSB multiplication between two ciphertexts returning the result truncated to fit in the message buffer
multiplication of a ciphertext by an unencrypted scalar
bitwise shift <<, >>
subtraction of a ciphertext by another ciphertext
subtraction of a ciphertext by an unencrypted scalar
negation of a ciphertext
bitwise and, or and xor (*)
comparisons <, <=, >, >=, ==, != between two ciphertexts (*)
division between two ciphertexts (*)
MSB multiplication between two ciphertexts returning the part overflowing the message buffer (*)
TFHE-rs supports both private and public key encryption methods. The only difference between the two lies in the encryption step: with public key encryption, the encryption method is called using public_key instead of client_key.
Here is a small example on how to use public encryption:
Classical arithmetic operations are supported by shortint:
Short homomorphic integer types support some bitwise operations.
A simple example on how to use these operations:
Short homomorphic integer types support comparison operations.
A simple example on how to use these operations:
A simple example on how to use this operation to homomorphically compute the hamming weight (i.e., the number of bits equal to one) of an encrypted number.
Using the shortint types offers the possibility to evaluate bi-variate functions, or functions that take two ciphertexts as input. This requires choosing a parameter set such that the carry buffer size is at least as large as the message (i.e., PARAM_MESSAGE_X_CARRY_Y with X <= Y).
Here is a simple code example:
All parameter sets provide at least 128-bits of security according to the , with an error probability equal to when using programmable bootstrapping. This error probability is due to the randomness added at each encryption (see for more details about the encryption process).
shortint comes with sets of parameters that permit the use of the library functionalities securely and efficiently. Each parameter set is associated with the message and carry precisions. Therefore, each key pair is tied to a precision.
The user is allowed to choose which set of parameters to use when creating the pair of keys.
The difference between the parameter sets is the total amount of space dedicated to the plaintext, how it is split between the message buffer and the carry buffer, and the order in which the keyswitch (KS) and bootstrap (PBS) are computed. The syntax chosen for the name of a parameter is: PARAM_MESSAGE_{number of message bits}_CARRY_{number of carry bits}_{KS_PBS | PBS_KS}. For example, the set of parameters for a message buffer of 5 bits, a carry buffer of 2 bits and where the keyswitch is computed before the bootstrap is PARAM_MESSAGE_5_CARRY_2_KS_PBS.
Note that the KS_PBS order should have better performance at the expense of ciphertext size; PBS_KS is the opposite.
This example contains keys that are generated to have messages encoded over 2 bits (i.e., computations are done modulo 2^2 = 4) with 2 bits of carry.
The PARAM_MESSAGE_2_CARRY_2_KS_PBS parameter set is the default shortint parameter set that you can also use through the tfhe::shortint::prelude::DEFAULT_PARAMETERS constant.
The computation of bi-variate functions is based on a trick: concatenating two ciphertexts into one. When the carry buffer is not at least as large as the message buffer, this trick no longer works. In this case, many bi-variate operations, such as comparisons, cannot be correctly computed. The only exception concerns multiplication.
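On clear values, the concatenation trick can be sketched as follows. The constants and the `concat` and `bivariate` helpers are hypothetical names for this illustration: the left operand is shifted into the carry buffer, the right operand stays in the message buffer, and a single lookup table indexed by the combined value evaluates the two-input function.

```rust
const MESSAGE_MODULUS: u64 = 4;
const CARRY_MODULUS: u64 = 4;

/// Packs two messages into one value: `lhs` goes to the carry buffer,
/// `rhs` stays in the message buffer.
fn concat(lhs: u64, rhs: u64) -> u64 {
    lhs * MESSAGE_MODULUS + rhs
}

/// Evaluates a two-input function through a one-input lookup table.
/// Both inputs are assumed to be smaller than `MESSAGE_MODULUS`.
fn bivariate<F: Fn(u64, u64) -> u64>(lhs: u64, rhs: u64, f: F) -> u64 {
    // The trick needs room for a whole message in the carry buffer.
    assert!(CARRY_MODULUS >= MESSAGE_MODULUS);
    assert!(lhs < MESSAGE_MODULUS && rhs < MESSAGE_MODULUS);

    // One table entry per possible packed value.
    let table: Vec<u64> = (0..MESSAGE_MODULUS * CARRY_MODULUS)
        .map(|packed| {
            f(packed / MESSAGE_MODULUS, packed % MESSAGE_MODULUS) % MESSAGE_MODULUS
        })
        .collect();

    table[concat(lhs, rhs) as usize]
}
```

If the carry modulus were smaller than the message modulus, `concat` could produce values outside the plaintext space, which is why the trick stops working in that case.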
It is possible to define new parameter sets. To do so, it is sufficient to use the function unsecure_parameters() or to manually fill the ClassicPBSParameters structure fields.
For instance:
tfhe::shortint is dedicated to unsigned integers smaller than 8 bits. The steps to homomorphically evaluate a circuit are described below.
tfhe::shortint provides 3 key types:
ClientKey
ServerKey
PublicKey
The ClientKey
is the key that encrypts and decrypts messages (integer values up to 8 bits here). It is meant to be kept private and should never be shared. This key is created from parameter values that will dictate both the security and efficiency of computations. The parameters also set the maximum number of bits of message encrypted in a ciphertext.
The ServerKey is the key that is used to evaluate the FHE computations. Most importantly, it contains a bootstrapping key and a keyswitching key. This key is created from a ClientKey and needs to be shared with the server (it is not meant to be kept private). A user with a ServerKey can compute on the encrypted data sent by the owner of the associated ClientKey.
Computation/operation methods are tied to the ServerKey
type.
The PublicKey is the key used to encrypt messages. It can be publicly shared to allow users to encrypt data such that only the ClientKey holder will be able to decrypt it. Encrypting with the PublicKey does not alter the homomorphic capabilities associated with the ServerKey.
Once the keys have been generated, the client key is used to encrypt data:
Using the server_key
, addition is possible over encrypted values. The resulting plaintext is recovered after the decryption via the secret client key.
The TFHE cryptographic scheme relies on a variant of the Regev cryptosystem and is based on a problem so difficult that it is even post-quantum resistant.
Some cryptographic parameters will require tuning to ensure both the correctness of the result and the security of the computation.
To make it simpler, we've provided two sets of parameters, which ensure correct computations for a certain probability with the standard security of 128 bits. There exists an error probability due to the probabilistic nature of the encryption, which requires adding randomness (noise) following a Gaussian distribution. If this noise is too large, the decryption will not give a correct result. There is a trade-off between efficiency and correctness: generally, using a less efficient parameter set (in terms of computation time) leads to a smaller risk of having an error during homomorphic evaluation.
In the two proposed sets of parameters, the only difference lies in this error probability. The default parameter set ensures an error probability of at most when computing a programmable bootstrapping (i.e., any gates but the not
). The other one is closer to the error probability claimed in the original , namely , but it is up-to-date regarding security requirements.
The following array summarizes this:
Parameter set | Error probability |
---|
You can also create your own set of parameters. This is an unsafe
operation as failing to properly fix the parameters will result in an incorrect and/or insecure computation:
As shown above, the choice of the parameter set impacts the operations available and their efficiency.
In the case of multiplication, two algorithms are implemented: the first one relies on the bivariate function trick, while the other one is based on the quarter square method. To correctly compute a multiplication with the latter, the only requirement is to have at least one bit of carry (i.e., using parameter sets PARAM_MESSAGE_X_CARRY_Y with Y>=1). This method is slower than the first one. Using the smart version of the multiplication automatically chooses which algorithm is used depending on the chosen parameters.
Operation | Trait bound |
---|---|
T $op T | T: $Op<T, Output=T> |
T $op &T | T: for<'a> $Op<&'a T, Output=T> |
&T $op T | for<'a> &'a T: $Op<T, Output=T> |
&T $op &T | for<'a> &'a T: $Op<&'a T, Output=T> |
integer
does not come with its own set of parameters. Instead, it relies on parameters from shortint
. Currently, parameter sets having the same space dedicated to the message and the carry (i.e. PARAM_MESSAGE_{X}_CARRY_{X}
with X
in [1,4]) are recommended. See here for more details about cryptographic parameters, and here to see how to properly instantiate integers depending on the chosen representation.
As explained in the introduction, some types (ServerKey, Ciphertext) are meant to be shared with the server that does the computations.
The easiest way to send these data to a server is to use the serialization and deserialization features. TFHE-rs
uses the serde framework, so serde's Serialize and Deserialize are implemented.
To be able to serialize our data, a data format needs to be picked. Here, bincode is a good choice, mainly because it is a binary format.
The core_crypto
module from TFHE-rs
is dedicated to the implementation of the cryptographic tools related to TFHE. To construct an FHE application, the shortint and/or Boolean modules (based on core_crypto
) are recommended.
The core_crypto
module offers an API to low-level cryptographic primitives and objects, like lwe_encryption
or rlwe_ciphertext
. The goal is to propose an easy-to-use API for cryptographers.
The overall code architecture is split in two parts: one for entity definitions and another focused on algorithms. The entities contain the definition of useful types, like LWE ciphertext or bootstrapping keys. The algorithms are then naturally defined to work using these entities.
The API makes it convenient to add or modify algorithms, or to have direct access to the raw data. Even though the LWE ciphertext object is defined, along with functions giving access to the body, it is also possible to bypass these and directly access the elements of the LWE mask.
For instance, the code to encrypt and then decrypt a message looks like:
core_crypto primitives
Welcome to this tutorial about the TFHE-rs core_crypto module.
core_crypto module
To use TFHE-rs, it first has to be added as a dependency in the Cargo.toml:
This enables the x86_64-unix
feature to have efficient implementations of various algorithms for x86_64
CPUs on a Unix-like system. The 'unix' suffix indicates that the UnixSeeder
, which uses /dev/random
to generate random numbers, is activated as a fallback if no hardware number generator is available (like rdseed
on x86_64
or if the Randomization Services
on Apple platforms are not available). To avoid having the UnixSeeder
as a potential fallback or to run on non-Unix systems (e.g., Windows), the x86_64
feature is sufficient.
For Apple Silicon, the aarch64-unix
or aarch64
feature should be enabled. aarch64
is not supported on Windows as it's currently missing an entropy source required to seed the CSPRNGs used in TFHE-rs
.
In short: For x86_64
-based machines running Unix-like OSes:
For Apple Silicon or aarch64-based machines running Unix-like OSes:
For x86_64
-based machines with the rdseed instruction
running Windows:
core_crypto module.
As a complete example showing the usage of some common primitives of the core_crypto APIs, the following Rust code homomorphically computes 2 * 3 using two different methods: first using a cleartext multiplication, and then using a PBS.
There are two ways to contribute to TFHE-rs
. You can:
open issues to report bugs and typos and to suggest ideas;
ask to become an official contributor by emailing hello@zama.ai. Only approved contributors can send pull requests, so get in touch before you do.
DEFAULT_PARAMETERS |
TFHE_LIB_PARAMETERS |
In this tutorial, we are going to build a dark market application using TFHE-rs. A dark market is a marketplace where buy and sell orders are not visible to the public before they are filled. Different algorithms aim to solve this problem; we are going to implement the algorithm defined in this paper with TFHE-rs.
We will first implement the algorithm in plain Rust and then we will see how to use TFHE-rs to implement the same algorithm with FHE.
In addition, we will also implement a modified version of the algorithm that allows for more concurrent operations, which improves performance on hardware with multiple cores.
A list of sell orders, where each sell order is only defined in volume terms; it is assumed that the price is fetched from a different source.
A list of buy orders, where each buy order is only defined in volume terms; it is assumed that the price is fetched from a different source.
The sell and buy orders are within the range [1,100].
The maximum number of sell orders and the maximum number of buy orders are 500 each.
There is no output returned at the end of the algorithm. Instead, the algorithm makes changes on the given input lists. The number of filled orders is written over the original order count in the respective lists. If it is not possible to fill the orders, the order count is set to zero.
Example 1:
Sell | Buy | |
---|---|---|
The last three indices of the filled sell orders are zero because there are no buy orders to match them.
Example 2:
The last three indices of the filled buy orders are zero because there are no sell orders to match them.
Calculate the total sell volume and the total buy volume.
Find the total volume that will be transacted. In the paper, this amount is calculated with the formula:
When closely observed, we can see that this formula can be replaced with the min
function. Therefore, we calculate this value by taking the minimum of the total sell volume and the total buy volume.
Beginning with the first item, start filling the sell orders one by one. We apply the min
function replacement also here.
The number of orders that are filled is indicated by modifying the input list. For example, if the first sell order is 1000 and the total volume is 500, then the first sell order will be modified to 500 and the second sell order will be modified to 0.
Do the fill operation also for the buy orders.
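The steps above can be sketched in plain Rust as follows. This is a minimal sketch, assuming `u16` volumes and in-place updates of the input slices; the function names `fill_orders` and `volume_match` are chosen for illustration and may differ from the tutorial's own listing.

```rust
/// Fills `orders` in place, first come first served, until `total_volume` is used up.
fn fill_orders(orders: &mut [u16], total_volume: u16) {
    let mut volume_left_to_transact = total_volume;
    for order in orders.iter_mut() {
        // `min` replaces the conditional: fill the whole order or whatever is left.
        let filled = (*order).min(volume_left_to_transact);
        *order = filled;
        volume_left_to_transact -= filled;
    }
}

/// Runs the volume matching on both lists; the results overwrite the inputs.
fn volume_match(sell_orders: &mut [u16], buy_orders: &mut [u16]) {
    let total_sell_volume: u16 = sell_orders.iter().sum();
    let total_buy_volume: u16 = buy_orders.iter().sum();
    // The transacted volume is the minimum of the two totals.
    let total_volume = total_sell_volume.min(total_buy_volume);
    fill_orders(sell_orders, total_volume);
    fill_orders(buy_orders, total_volume);
}
```

For sell orders `[10, 20, 30]` and buy orders `[15, 25]`, the totals are 60 and 40, so 40 units are transacted: the sell list becomes `[10, 20, 10]` and the buy list is filled entirely.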
For the FHE implementation, we first start with finding the right bit size for our algorithm to work without overflows.
The variables that are declared in the algorithm and their maximum values are described in the table below:
As we can observe from the table, we need 16 bits of message space to be able to run the algorithm without overflows: with at most 500 orders of at most 100 each, the largest total volume is 500 × 100 = 50,000, which is below 2^16 = 65,536. TFHE-rs provides different presets for the different bit sizes. Since we need 16 bits of message, we are going to use the integer module to implement the algorithm.
Here are the input types of our algorithm:
sell_orders is of type Vec<tfhe::integer::RadixCiphertext>
buy_orders is of type Vec<tfhe::integer::RadixCiphertext>
server_key is of type tfhe::integer::ServerKey
Now, we can start implementing the algorithm with FHE:
Calculate the total sell volume and the total buy volume.
Find the total volume that will be transacted by taking the minimum of the total sell volume and the total buy volume.
Beginning with the first item, start filling the sell and buy orders one by one. We can create a fill_orders closure to reduce code duplication, since the code for filling buy orders and sell orders is the same.
TFHE-rs provides parallelized implementations of the operations. We can use these parallelized implementations to speed up the algorithm. For example, we can use smart_add_assign_parallelized
instead of smart_add_assign
.
We can parallelize the vector sum with Rayon and its reduce operation.
We can run vector summation on buy_orders
and sell_orders
in parallel since these operations do not depend on each other.
We can fill the sell and buy orders in parallel since the two fill operations do not depend on each other.
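The independence of the two summations can be illustrated on plain integers. The sketch below is not the tutorial's code: it swaps Rayon and the TFHE-rs parallelized operations for standard-library scoped threads and `u16` values, and `total_volumes` is a hypothetical name. It only shows that the two sums share no state and can therefore run concurrently.

```rust
use std::thread;

/// Sums both order lists concurrently, one scoped thread per list.
/// Plain `u16` values stand in for encrypted volumes.
fn total_volumes(sell_orders: &[u16], buy_orders: &[u16]) -> (u16, u16) {
    thread::scope(|scope| {
        // The two sums share no state, so they can run at the same time.
        let sell_handle = scope.spawn(|| sell_orders.iter().sum::<u16>());
        let buy_handle = scope.spawn(|| buy_orders.iter().sum::<u16>());
        (
            sell_handle.join().expect("sell sum thread panicked"),
            buy_handle.join().expect("buy sum thread panicked"),
        )
    })
}
```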
When observed closely, there is only a small amount of concurrency introduced in the fill_orders
part of the algorithm. The reason is that the volume_left_to_transact
is shared between all the orders and should be modified sequentially. This means that the orders cannot be filled in parallel. If we can somehow remove this dependency, we can fill the orders in parallel.
In order to do so, we closely observe the role of the volume_left_to_transact variable in the algorithm. We can see that it is being used to check whether we can fill the current order or not. Instead of subtracting the current order value from volume_left_to_transact in each loop iteration, we can add this value to the next order index and check the availability by comparing the current order value with the total volume. If the current order value (now representing the sum of values before this order plus this order) is smaller than the total number of matching orders, we can safely fill all the orders and continue the loop. If not, we should partially fill the orders with what is left from matching orders.
We will call the new list the "prefix sum" of the array.
The new version for the plain fill_orders
is as follows:
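A minimal sketch consistent with the description above, assuming plain `u16` volumes and an inclusive prefix sum; the helper `prefix_sum` and the exact signature of `fill_orders` are assumptions made for illustration.

```rust
/// Inclusive prefix sum: element `i` holds the sum of `values[0..=i]`.
fn prefix_sum(values: &[u16]) -> Vec<u16> {
    values
        .iter()
        .scan(0u16, |running_total, &value| {
            *running_total += value;
            Some(*running_total)
        })
        .collect()
}

/// Fills `orders` in place using the prefix sums instead of a shared running counter.
fn fill_orders(orders: &mut [u16], prefix_sum: &[u16], total_volume: u16) {
    for (i, order) in orders.iter_mut().enumerate() {
        let previous = if i == 0 { 0 } else { prefix_sum[i - 1] };
        if prefix_sum[i] > total_volume {
            // Not everything up to this order fits: fill with what is left, if anything.
            *order = total_volume.saturating_sub(previous);
        }
        // Otherwise the order is completely filled and stays unchanged.
    }
}
```

Each iteration now depends only on `prefix_sum` and `total_volume`, not on the result of the previous iteration, which is what makes a parallel fill possible.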
To write this new function with FHE, we need to transform the conditional code into a mathematical expression, since FHE does not support branching on encrypted values.
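One way to picture this transformation is to replace every `if` with an arithmetic selection, where a condition is turned into 0 or 1 and multiplied into both candidate values. The sketch below works on plain `u16` values; `select` and `fill_orders_branchless` are hypothetical names and this is not the TFHE-rs implementation, only the shape of the expression that an FHE version would evaluate.

```rust
/// Arithmetic selection: returns `if_true` when `condition` holds, `if_false` otherwise,
/// without branching on the condition.
fn select(condition: bool, if_true: u16, if_false: u16) -> u16 {
    let c = condition as u16; // 1 or 0
    c * if_true + (1 - c) * if_false
}

/// Same result as a prefix-sum-based fill, written without any `if` on the data.
fn fill_orders_branchless(orders: &mut [u16], prefix_sum: &[u16], total_volume: u16) {
    for (i, order) in orders.iter_mut().enumerate() {
        // Indices are public, so selecting the previous prefix sum by index is fine.
        let previous = if i == 0 { 0 } else { prefix_sum[i - 1] };
        let fully_filled = prefix_sum[i] <= total_volume;
        let something_left = total_volume > previous;
        // `wrapping_sub` avoids a panic when nothing is left; that value is then discarded.
        let partial = select(something_left, total_volume.wrapping_sub(previous), 0);
        *order = select(fully_filled, *order, partial);
    }
}
```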
The new fill_orders function requires a prefix sum array. We are going to calculate this prefix sum array in parallel with the algorithm described here.
The sample code in the paper is written in CUDA. When we try to implement the algorithm in Rust, we see that the compiler does not allow us to do so. The reason is that, while the algorithm never accesses the same array element from two threads (the index calculations using the d and k values never overlap), the Rust compiler cannot prove this and does not let us share the same array between threads. So we modify how the algorithm is implemented, but we don't change the algorithm itself.
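To illustrate the kind of restructuring the borrow checker asks for, here is a sketch of a parallel inclusive prefix sum on plain `u32` values. It is not the tutorial's implementation and not necessarily the scan variant from the referenced paper: it assumes a double-buffered, Hillis-Steele-style scan and uses standard-library scoped threads. Each round reads only from the previous buffer and writes disjoint chunks of a fresh buffer, which Rust accepts.

```rust
use std::thread;

/// Inclusive prefix sum in about log2(n) rounds.
/// Each round reads `current` and writes a fresh `next` buffer, split into
/// disjoint chunks, so no two threads ever write to the same element.
fn parallel_prefix_sum(values: &[u32], threads: usize) -> Vec<u32> {
    let n = values.len();
    let threads = threads.max(1);
    let mut current = values.to_vec();
    let mut offset = 1;
    while offset < n {
        let mut next = vec![0u32; n];
        let chunk_len = (n + threads - 1) / threads;
        let previous = &current;
        thread::scope(|scope| {
            for (chunk_index, chunk) in next.chunks_mut(chunk_len).enumerate() {
                scope.spawn(move || {
                    let start = chunk_index * chunk_len;
                    for (j, slot) in chunk.iter_mut().enumerate() {
                        let i = start + j;
                        *slot = if i >= offset {
                            previous[i] + previous[i - offset]
                        } else {
                            previous[i]
                        };
                    }
                });
            }
        });
        current = next;
        offset *= 2;
    }
    current
}
```

The extra buffer per round trades memory for the guarantee that reads and writes never alias.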
Here is the modified version of the algorithm in TFHE-rs:
The plain, FHE and parallel FHE implementations can be run by providing respective arguments as described below.
In this tutorial, we've learned how to implement the volume matching algorithm described in this paper in plain Rust and in TFHE-rs. We've identified the right bit size for our problem at hand, used operations defined in TFHE-rs
, and introduced concurrency to the algorithm to increase its performance.
The structure and operations related to integers are described in this section.
In integer, the encrypted data is split amongst many ciphertexts encrypted with the shortint library. Below is a diagram representing an integer composed of k shortint ciphertexts.
This crate implements two ways to represent an integer:
the Radix representation
the CRT (Chinese Remainder Theorem) representation
The first possibility to represent a large integer is to use a Radix-based decomposition on the plaintexts. Let $B$ be a basis such that the size of $B$ is smaller than (or equal to) 4 bits. Then, an integer $m$ can be written as $m = m_0 + m_1 B + m_2 B^2 + \dots$, where each $m_i$ is strictly smaller than $B$. Each $m_i$ is then independently encrypted. In the end, an Integer ciphertext is defined as a set of shortint ciphertexts.
The definition of an integer requires a basis and a number of blocks. These parameters are chosen at key generation. Below, the keys are dedicated to integers encrypting messages over 8 bits, using a basis over 2 bits (i.e., $B = 2^2$) and 4 blocks.
In this representation, the correctness of operations requires the carries to be propagated throughout the ciphertext. This operation is costly, since it relies on the computation of many programmable bootstrapping operations over shortints.
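The radix decomposition and the need for carry propagation can be seen on plain integers. In this sketch, plain `u64` values stand in for shortint blocks; all function names are hypothetical and nothing here is encrypted.

```rust
/// Splits `value` into `num_blocks` digits in the given basis, least significant first.
fn radix_decompose(mut value: u64, basis: u64, num_blocks: usize) -> Vec<u64> {
    let mut blocks = Vec::with_capacity(num_blocks);
    for _ in 0..num_blocks {
        blocks.push(value % basis);
        value /= basis;
    }
    blocks
}

/// Rebuilds the integer from its digits (least significant first).
fn radix_recompose(blocks: &[u64], basis: u64) -> u64 {
    blocks.iter().rev().fold(0, |acc, &block| acc * basis + block)
}

/// Block-wise addition: each block may now exceed the basis (it spills into the carry).
fn add_blockwise(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter().zip(b).map(|(&x, &y)| x + y).collect()
}

/// Moves the excess of each block into the next one; a final carry out is dropped.
fn propagate_carries(blocks: &mut [u64], basis: u64) {
    let mut carry = 0;
    for block in blocks.iter_mut() {
        let total = *block + carry;
        *block = total % basis;
        carry = total / basis;
    }
}
```

With a basis of 4 and 4 blocks, 203 decomposes into `[3, 2, 0, 3]` and 27 into `[3, 2, 1, 0]`. Their block-wise sum `[6, 4, 1, 3]` holds digits that are too large, and propagating the carries gives `[2, 1, 2, 3]`, which is 230. In the encrypted setting, each of these propagation steps costs bootstrapping operations.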
The second approach to represent large integers is based on the Chinese Remainder Theorem. In this case, the basis $B$ is composed of several integers $b_i$, such that they are pairwise coprime and each $b_i$ has a size smaller than 4 bits. The CRT-based integers are defined modulo $\prod_i b_i$. For an integer $m$, its CRT decomposition is simply defined as $m \bmod b_0, m \bmod b_1, \dots$. Each part is then encrypted as a shortint ciphertext. In the end, an Integer ciphertext is defined as a set of shortint ciphertexts.
In the following example, the chosen basis is . The integer is defined modulus . There is no need to pre-size the number of blocks since it is determined from the number of values composing the basis. Here, the integer is split over three blocks.
This representation has many advantages: no carry propagation is required, cleaning the carry buffer of each ciphertext block is enough. This implies that operations can easily be parallelized. It also allows the efficient computation of PBS in the case where the function is CRT-compliant.
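A plain-integer sketch of the CRT representation follows. The basis `[3, 5, 7]` used in the usage note is an arbitrary illustration and is not taken from the example above; the function names are hypothetical. It shows that addition acts on each residue independently, with no carry moving between blocks.

```rust
/// Residues of `value` modulo each element of a pairwise-coprime basis.
fn crt_decompose(value: u64, basis: &[u64]) -> Vec<u64> {
    basis.iter().map(|&b| value % b).collect()
}

/// Addition is done independently on each residue: no carry crosses blocks.
fn crt_add(a: &[u64], b: &[u64], basis: &[u64]) -> Vec<u64> {
    a.iter()
        .zip(b)
        .zip(basis)
        .map(|((&x, &y), &modulus)| (x + y) % modulus)
        .collect()
}

/// Recovers the integer by brute-force search below the product of the basis.
/// Fine for a small illustration; real code would use the CRT reconstruction formula.
fn crt_recompose(residues: &[u64], basis: &[u64]) -> Option<u64> {
    let modulus: u64 = basis.iter().product();
    (0..modulus).find(|candidate| {
        basis
            .iter()
            .zip(residues)
            .all(|(&b, &r)| candidate % b == r)
    })
}
```

With the basis `[3, 5, 7]` (modulus 105), 52 becomes `[1, 2, 3]` and 40 becomes `[1, 0, 5]`; adding residue by residue gives `[2, 2, 1]`, which recomposes to 92.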
A variant of the CRT is proposed where each block might be associated with a different key pair. Here, a keychain is required for the computations, but this may result in a performance improvement.
The list of operations available in integer
depends on the type of representation:
Much like shortint
, the operations available via a ServerKey
may come in different variants:
operations that take their inputs as encrypted values.
scalar operations take at least one non-encrypted value as input.
For example, the addition has both variants:
ServerKey::unchecked_add
, which takes two encrypted values and adds them.
ServerKey::unchecked_scalar_add
, which takes an encrypted value and a clear value (the so-called scalar) and adds them.
Each operation may come in different 'flavors':
unchecked: always does the operation, without checking if the result may exceed the capacity of the plaintext space.
checked: checks are done before computing the operation, and an error is returned if the operation cannot be done safely.
smart: always does the operation. If the operation cannot be computed safely, the smart operation will propagate the carry buffer to make the operation possible. Some of these operations require a mutable reference as input: this is because the inputs' carries might be cleaned, but this will not change the underlying encrypted value.
default: always computes the operation and always clears the carry. It could be slower than smart, but it ensures that the timings are consistent from one call to another.
Not all operations have these 4 flavors, as some of them are implemented in a way that the operation is always possible without ever exceeding the plaintext space capacity.
If you don't know which flavor to use, you should use the default
one.
Let's try to do a circuit evaluation using the different flavors of already introduced operations. For a very small circuit, the unchecked
flavor may be enough to do the computation correctly. Otherwise, checked
and smart
are the best options.
As an example, let's do a scalar multiplication, a subtraction, and an addition.
During this computation the carry buffer has been overflowed, and the output may be incorrect as all the operations were unchecked
.
If the same circuit is done but using the checked
flavor, a panic will occur:
The checked
flavor permits the manual management of the overflow of the carry buffer by raising an error if correctness is not guaranteed.
Using the smart
flavor will output the correct result all the time. However, the computation may be slower as the carry buffer may be propagated during the computations.
You must avoid cloning the inputs when calling smart operations to preserve performance. For instance, you SHOULD NOT have this kind of pattern in the code:
The main advantage of the default flavor is to ensure predictable timings, as long as only this kind of operation is used. Only the parallelized version of the operations is provided.
Using default
could slow down computations.
This tutorial explains how to build a regex Pattern Matching Engine (PME) where ciphertext is the content that is evaluated.
A regex PME is an essential tool for programmers. It allows you to perform complex searches on content. A less powerful simple search on string can only find matches of the exact given sequence of characters (e.g., your browser's default search function). Regex PMEs are more powerful, allowing searches on certain structures of text, where a structure may take any form in multiple possible sequences of characters. The structure to be searched is defined with the regex, a very concise language.
Here are some example regexes to give you an idea of what is possible:
Regex | Semantics |
---|---|
Regexes are powerful enough to be able to express structures like email address formats. This capability is what makes regexes useful for many programming solutions.
There are two main components identifiable in a PME:
The pattern that is to be matched has to be parsed, translated from a textual representation into a recursively structured object (an Abstract Syntax Tree, or AST).
This AST must then be applied to the text that it is to be matched against, resulting in a 'yes' or 'no' to whether the pattern has matched (in the case of our FHE implementation, this result is an encrypted 'yes' or an encrypted 'no').
Parsing is a well-understood problem. There are a couple of different approaches possible here. Regardless of the approach chosen, it starts with figuring out what language we want to support. That is, what are the kinds of sentences we want our regex language to include? A few example sentences we definitely want to support are: /a/, /a?bc/, /^ab$/, /ab|cd/. However, example sentences don't suffice as a specification because they can never be exhaustive (they're endless). We need something to specify exactly the full set of sentences our language supports. There exists a language that can help us describe our own language's structure exactly: Grammar.
It is useful to start with defining the Grammar before starting to write code for the parser because the code structure follows directly from the Grammar. A Grammar consists of a generally small set of rules. For example, a very basic Grammar could look like this:
This describes a language that only contains the sentence "a". Not a very interesting language.
We can make it more interesting though by introducing choice into the Grammar with | (called a 'pipe') operators. If we want the above Grammar to accept either "a" or "b":
So far, only Grammars with a single rule have been shown. However, a Grammar can consist of multiple rules. Most languages require it. So let's consider a more meaningful language, one that accepts sentences consisting of one or more digits. We could describe such a language with the following Grammar:
The +
after Digit
is another Grammar operator. With it, we specify that Digit must be matched one or more times. Here are all the Grammar operators that are relevant for this tutorial:
In the case of the example PME, the Grammar is as follows (notice the unquoted ? and quoted ?, etc. The unquoted characters are Grammar operators, and the quoted are characters we are matching in the parsing).
We will refer occasionally to specific parts in the Grammar listed above by <rule name>.<variant index> (where the first rule variant has index 1).
With the Grammar defined, we can start defining a type to parse into. In Rust, we have the enum
kind of type that is perfect for this, as it allows you to define multiple variants that may recurse. I prefer to start by defining variants that do not recurse (i.e., that don't contain nested regex expressions):
With this, we can translate the following basic regexes:
Notice we're not yet able to sequence multiple components together. Let's define the first variant that captures recursive RegExpr for this:
With this Seq (short for sequence) variant, we allow translating patterns that contain multiple components:
Let's finish the RegExpr datastructure by adding variants for 'Optional' matching, 'Not' logic in a range, and 'Either' left or right matching:
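A sketch of what such a datastructure could look like is given below. Only the variant names `Char`, `AnyChar`, `Seq`, `Optional`, `Not` and `Either` and the field `re_xs` are taken from this tutorial; the remaining variant and field names (`Between`, `c`, `opt_re`, `not_re`, `l_re`, `r_re`) are assumptions made for illustration and may differ from the example implementation.

```rust
/// Parsed form of a regex. Recursive variants box their children.
#[derive(Debug, Clone, PartialEq)]
enum RegExpr {
    // Non-recursive variants.
    Char { c: u8 },
    AnyChar,
    Between { from: u8, to: u8 },
    // Recursive variants.
    Seq { re_xs: Vec<RegExpr> },
    Optional { opt_re: Box<RegExpr> },
    Not { not_re: Box<RegExpr> },
    Either { l_re: Box<RegExpr>, r_re: Box<RegExpr> },
}

/// The value that the pattern /a?bc/ could be parsed into.
fn optional_a_then_bc() -> RegExpr {
    RegExpr::Seq {
        re_xs: vec![
            RegExpr::Optional {
                opt_re: Box::new(RegExpr::Char { c: b'a' }),
            },
            RegExpr::Char { c: b'b' },
            RegExpr::Char { c: b'c' },
        ],
    }
}
```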
Some features may make the most sense being implemented during post-processing of the parsed datastructure. For example, the case insensitivity feature (the i
Modifier) is implemented in the example implementation by taking the parsed RegExpr and mutating every character mentioned inside to cover both the lower case as well as the upper case variant (see function case_insensitive
in parser.rs
for the example implementation).
The modifier i
in our Grammar (for enabling case insensitivity) was easiest to implement by applying a post-processing step to the parser.
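Such a post-processing step could be sketched as follows. This is not the `case_insensitive` function from parser.rs: it is an illustration on a reduced, hypothetical `RegExpr` with only three variants, and it assumes that a case-insensitive character is expressed as an 'Either' of its lower-case and upper-case forms.

```rust
/// Reduced datastructure, just enough to show the rewrite (field names are assumptions).
#[derive(Debug, Clone, PartialEq)]
enum RegExpr {
    Char { c: u8 },
    Either { l_re: Box<RegExpr>, r_re: Box<RegExpr> },
    Seq { re_xs: Vec<RegExpr> },
}

/// Rewrites every alphabetic character into "lower case or upper case".
fn case_insensitive(re: RegExpr) -> RegExpr {
    match re {
        RegExpr::Char { c } if c.is_ascii_alphabetic() => RegExpr::Either {
            l_re: Box::new(RegExpr::Char { c: c.to_ascii_lowercase() }),
            r_re: Box::new(RegExpr::Char { c: c.to_ascii_uppercase() }),
        },
        // Digits, punctuation, etc. have no case and stay untouched.
        RegExpr::Char { c } => RegExpr::Char { c },
        RegExpr::Either { l_re, r_re } => RegExpr::Either {
            l_re: Box::new(case_insensitive(*l_re)),
            r_re: Box::new(case_insensitive(*r_re)),
        },
        RegExpr::Seq { re_xs } => RegExpr::Seq {
            re_xs: re_xs.into_iter().map(case_insensitive).collect(),
        },
    }
}
```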
We are now able to translate any complex regex into a RegExpr value. For example:
With both the Grammar and the datastructure to parse into defined, we can now start implementing the actual parsing logic. There are multiple ways this can be done. For example, there exist tools that can automatically generate parser code from the Grammar definition (these are called parser generators). However, you might prefer to write parsers with a parser combinator library. This may be the better option for you because the runtime behavior of parsers constructed with a parser combinator library is easier to understand than that of parsers generated with a parser generator tool.
Rust offers a number of popular parser combinator libraries. This tutorial used combine
, but any other library would work just as well. Choose whichever appeals the most to you (including any parser generator tool). The implementation of our regex parser will differ significantly depending on the approach you choose, so we will not cover this in detail here. You may look at the parser code in the example implementation to get an idea of how this could be done. In general though, the Grammar and the datastructure are the important components, while the parser code follows directly from these.
The next challenge is to build the execution engine, where we take a RegExpr value and recurse into it to apply the necessary actions on the encrypted content. We first have to define how we actually encode our content into an encrypted state. Once that is defined, we can start working on how we will execute our RegExpr onto the encrypted content.
It is not possible to encrypt the entire content into a single encrypted value. We can only encrypt numbers and perform operations on those encrypted numbers with FHE. Therefore, we have to find a scheme where we encode the content into a sequence of numbers that are then encrypted individually to form a sequence of encrypted numbers.
We recommend the following two strategies:
to map each character of the content into the u8 ascii value, and then encrypt each bit of these u8 values individually.
to, instead of encrypting each bit individually, encrypt each u8 ascii value in its entirety.
Strategy 1 requires more high-level TFHE-rs operations to check for a simple character match (we have to check each bit individually for equality as opposed to checking the entire byte in one high-level TFHE-rs operation), though some experimentation did show that both strategies performed equally well on a regex like /a/. This is likely because bitwise FHE operations are relatively cheap compared to u8 FHE operations. However, strategy 1 falls apart as soon as you introduce [a-z] regex logic. With strategy 2, it is possible to complete this match with just three TFHE-rs operations: ge, le, and bitand.
If, on the other hand, we had encrypted the content with the first strategy, there would be no way to test for 'greater than or equal to' the lower bound (from) and 'less than or equal to' the upper bound (to). We'd have to check for the potential equality of each character between from and to, and then join the results together with a sequence of sk.bitor operations; that would require far more cryptographic operations than in strategy 2.
Because FHE operations are computationally expensive, and strategy 1 requires significantly more FHE operations for matching on [a-z]
regex logic, we should opt for strategy 2.
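The difference in operation counts can be made concrete with a plain-Rust model, where ordinary `u8` comparisons stand in for the encrypted ge, le, eq, bitand and bitor operations. The function names are hypothetical, and the count for the equality-based variant is deliberately generous: it counts each equality as a single operation, whereas a bit-by-bit encoding would need several operations per equality.

```rust
/// Range test on whole bytes: ge, le and bitand, i.e. three operations.
/// Returns the result together with the number of operations used.
fn in_range_bytewise(c: u8, from: u8, to: u8) -> (bool, usize) {
    let ge = c >= from;
    let le = c <= to;
    (ge & le, 3)
}

/// Range test without ordering comparisons: one equality per candidate character,
/// all results folded together with bitor.
fn in_range_by_equality(c: u8, from: u8, to: u8) -> (bool, usize) {
    let mut operations = 0;
    let mut result = false;
    for candidate in from..=to {
        let is_equal = c == candidate;
        operations += 1; // the equality test
        result |= is_equal;
        operations += 1; // the bitor that folds it into the result
    }
    (result, operations)
}
```

For `[a-z]`, the byte-wise test uses 3 operations, while the equality-based test in this model uses 52.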
There are a lot of regex PMEs. They have been built many times and researched thoroughly. There are different strategies possible here. A straightforward strategy is to directly recurse into our RegExpr value and apply the necessary matching operations onto the content. In a way, this is nice because it allows us to link the RegExpr structure directly to the matching semantics, resulting in code that is easier to understand and maintain.
Alternatively, there exists an algorithm that transforms the AST (i.e., the RegExpr, in our case) into a Deterministic Finite Automaton (DFA). Normally, this is a favorable approach in terms of efficiency because the derived DFA can be walked over without needing to backtrack (whereas the former strategy cannot prevent backtracking). This means that the content can be walked over from character to character, and depending on what the character is at this cursor, the DFA is traversed in a definite direction which ultimately leads us to either 'yes, there is a match' or 'no, there is no match'. There is a small upfront cost of having to translate the AST into the DFA, but the lack of backtracking during matching generally makes up for this, especially if the content that it is matched against is large.
In our case though, we are matching on encrypted content. We have no way to know what the character at our cursor is, and therefore no way to find this definite direction to go forward in the DFA. Therefore, translating the AST into the DFA does not help us as it does in normal regex PMEs. For this reason, consider opting for the former strategy because it allows for matching logic that is easier to understand.
In the previous section, we decided we'll match by traversing into the RegExpr value. This section will explain exactly how to do that. Similarly to defining the Grammar, it is often best to start with working out the non-recursive RegExpr variants.
We'll start by defining the function that will recursively traverse into the RegExpr value:
sk is the server key (aka the public key), content is what we'll be matching against, re is the RegExpr value we built when parsing the regex, and c_pos is the cursor position (the index in content we are currently matching against).
The result is a vector of tuples, with the first value of the tuple being the computed ciphertext result, and the second value being the content position after the regex components were applied. It's a vector because certain RegExpr variants require the consideration of a list of possible execution paths. For example, RegExpr::Optional might succeed either by applying or by not applying the optional regex (notice that in the former case, c_pos moves forward whereas in the latter case it stays put).
On first call, a match
of the entire regex pattern starts with c_pos=0
. Then match
is called again for the entire regex pattern with c_pos=1
, etc. until c_pos
exceeds the length of the content. Each of these alternative match results are then joined together with sk.bitor
operations (this works because if one of them results in 'true' then, in general, our matching algorithm should return 'true').
The ...
within the match statement above is what we will be working out for some of the RegExpr variants now. Starting with RegExpr::Char
:
Let's consider an example of the variant above. If we apply /a/
to content bac
, we'll have the following list of match
calls re
and c_pos
values (for simplicity, re
is denoted in regex pattern instead of in RegExpr value):
And we would arrive at the following sequence of ciphertext operations:
AnyChar requires no FHE operation; it only moves the cursor forward:
The sequence iterates over its re_xs
, increasing the content position accordingly, and joins the results with bitand
operations:
Other variants are similar, as they recurse and manipulate re
and c_pos
accordingly. Hopefully, the general idea is already clear.
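The recursion described above can be modeled on cleartext, with `bool` standing in for an encrypted boolean and `&`/`|` standing in for bitand/bitor. This is a sketch under those assumptions and not the example implementation: the function is called `match_at` because `match` is a Rust keyword, the server key parameter is omitted, and only four variants are covered (field names other than `re_xs` are assumptions).

```rust
#[derive(Debug, Clone)]
enum RegExpr {
    Char { c: u8 },
    AnyChar,
    Optional { opt_re: Box<RegExpr> },
    Seq { re_xs: Vec<RegExpr> },
}

/// Returns every execution path as (result so far, cursor position after the match).
fn match_at(content: &[u8], re: &RegExpr, c_pos: usize) -> Vec<(bool, usize)> {
    match re {
        RegExpr::Char { c } => match content.get(c_pos) {
            // One `eq` operation; the cursor advances by one.
            Some(&actual) => vec![(actual == *c, c_pos + 1)],
            None => vec![],
        },
        RegExpr::AnyChar => {
            // No comparison at all; positions are public, so this check is allowed.
            if c_pos < content.len() { vec![(true, c_pos + 1)] } else { vec![] }
        }
        RegExpr::Optional { opt_re } => {
            // Two kinds of paths: the optional part applied, or skipped.
            let mut paths = match_at(content, opt_re, c_pos);
            paths.push((true, c_pos));
            paths
        }
        RegExpr::Seq { re_xs } => {
            let mut paths = vec![(true, c_pos)];
            for re_x in re_xs {
                let mut next_paths = Vec::new();
                for &(so_far, pos) in &paths {
                    for (result, next_pos) in match_at(content, re_x, pos) {
                        // Sequencing joins results with `bitand`.
                        next_paths.push((so_far & result, next_pos));
                    }
                }
                paths = next_paths;
            }
            paths
        }
    }
}

/// Tries every starting position and folds all paths together with `bitor`.
fn has_match(content: &[u8], re: &RegExpr) -> bool {
    (0..content.len())
        .flat_map(|c_pos| match_at(content, re, c_pos))
        .fold(false, |acc, (result, _)| acc | result)
}
```

Note that no result is ever inspected to decide what to do next: every path is evaluated and only the final fold produces the answer, which is what makes the same structure usable on encrypted values.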
Ultimately the entire pattern-matching logic unfolds into a sequence of the following set of FHE operations:
eq (tests for an exact character match)
ge (tests for 'greater than' or 'equal to' a character)
le (tests for 'less than' or 'equal to' a character)
bitand (bitwise AND, used for sequencing multiple regex components)
bitor (bitwise OR, used for folding multiple possible execution variants' results into a single result)
bitxor (bitwise XOR, used for the 'not' logic in ranges)
Generally, the included example PME follows the approach outlined above. However, there were two additional optimizations applied. Both of these optimizations involved reducing the number of unnecessary FHE operations. Given how computationally expensive these operations are, it makes sense to optimize for this (and to ignore any suboptimal memory usage of our PME, etc.).
The first optimization involved delaying the execution of FHE operations until after all possible execution paths have been generated. This allows us to prune, during path construction, execution paths that are provably going to result in an encrypted false value, without having performed any FHE operations up to the point of pruning. Consider the regex /^a+b$/ applied to a content of size 4. Executing paths naively, we would check all possible numbers of a repetitions: ab, aab, aaab. However, while building the execution paths, we can use the fact that a+ must begin at the beginning of the content, and that b must be the final character of the content. It follows that we only have to check a single candidate: aaab. In this example, delaying execution of the FHE operations until after the possible execution paths are built reduces the number of FHE operations applied by approximately half.
The second optimization involved preventing the same FHE conditions from being re-evaluated. Consider the regex /^a?ab/. This would give us the following possible execution paths to consider:
content[0] == a && content[1] == a && content[2] == b (we match the a in a?)
content[0] == a && content[1] == b (we don't match the a in a?)
Notice that, for both execution paths, we are checking for content[0] == a. Even though we cannot see what the encrypted result is, we do know that it's either going to be an encrypted false for both cases or an encrypted true for both cases. Therefore, we can skip the re-evaluation of content[0] == a and simply copy the result from the first evaluation over. This optimization involves maintaining a cache of known expression evaluation results and reusing those where possible.
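The caching idea can be illustrated with a small plaintext sketch. It assumes conditions of the form content[pos] == c only, uses bool where the real engine holds ciphertexts, and the names Condition and CachedEvaluator are hypothetical.

```rust
use std::collections::HashMap;

/// One condition of the form `content[pos] == c`.
type Condition = (usize, u8);

/// Evaluates conditions through a cache and counts how many comparisons
/// are actually computed (each one would be an FHE `eq` when encrypted).
struct CachedEvaluator<'a> {
    content: &'a [u8],
    cache: HashMap<Condition, bool>,
    evaluations: usize,
}

impl<'a> CachedEvaluator<'a> {
    fn new(content: &'a [u8]) -> Self {
        Self { content, cache: HashMap::new(), evaluations: 0 }
    }

    /// Returns the cached result if the condition was seen before,
    /// otherwise computes it once and stores it.
    fn eq(&mut self, cond: Condition) -> bool {
        if let Some(&known) = self.cache.get(&cond) {
            return known;
        }
        self.evaluations += 1;
        let (pos, c) = cond;
        let result = self.content.get(pos) == Some(&c);
        self.cache.insert(cond, result);
        result
    }

    /// ANDs all conditions of one execution path, without short-circuiting.
    fn eval_path(&mut self, path: &[Condition]) -> bool {
        let mut acc = true;
        for &cond in path {
            acc &= self.eq(cond);
        }
        acc
    }
}
```

For the two execution paths of /^a?ab/, five conditions are looked up in total but only four distinct ones are computed, since content[0] == a is reused.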
The implementation that guided the writing of this tutorial can be found under tfhe/examples/regex_engine.
When compiling with --example regex_engine, a binary is produced that serves as a basic demo. Simply call it with the content string as a first argument and the pattern string as a second argument. For example, cargo run --release --features=x86_64-unix,integer --example regex_engine -- 'this is the content' '/^pattern$/'; note that it's advised to compile the executable with the --release flag, as key generation and homomorphic operations otherwise suffer a heavy performance penalty.
On execution, a private (client) and public (server) key pair is created. Then, the content is encrypted with the client key, and the regex pattern is applied to the encrypted content string, with access given only to the server key. Finally, the program decrypts the encrypted result using the client key and prints the verdict to the console.
To get more information on exact computations and performance, set the RUST_LOG environment variable to debug or to trace.
This section specifies the supported set of regex patterns in the regex engine.
A regex is described by a sequence of components surrounded by /. The following components are supported:
Modifiers are mode selectors that affect the entire regex behavior. One modifier is currently supported:
Case insensitive matching, by appending an i after the regex pattern. For example: /abc/i
These components and modifiers can be combined to form any desired regex pattern. To give some idea of what is possible, here is a non-exhaustive list of supported regex patterns:
In this tutorial we will go through the steps to turn a regular sha256 implementation into its homomorphic version. We explain the basics of the sha256 function first, and then how to implement it homomorphically with performance considerations.
The first step in this experiment is actually implementing the sha256 function. We can find the specification here, but let's summarize the three main sections of the document.
The sha256 function processes the input data in blocks or chunks of 512 bits. Before actually performing the hash computations we have to pad the input in the following way:
Append a single "1" bit
Append a number of "0" bits such that exactly 64 bits are left to make the message length a multiple of 512
Append the last 64 bits as a binary encoding of the original input length
Or visually:
Where the numbers on the top represent the length of the padded input at each position, and L+1+k+64 is a multiple of 512 (the length of the padded input).
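The three padding steps can be sketched on plain bytes as follows. This is a minimal illustration assuming a byte-aligned input, so the single "1" bit and the first seven "0" bits together form the byte 0x80; the function name pad_sha256_input is hypothetical.

```rust
/// Pads `input` so that its length becomes a multiple of 512 bits (64 bytes).
fn pad_sha256_input(input: &[u8]) -> Vec<u8> {
    // Original length in bits, encoded later as a 64-bit big-endian integer.
    let bit_len = (input.len() as u64) * 8;

    let mut padded = input.to_vec();
    // Step 1: a single "1" bit (followed by seven "0" bits).
    padded.push(0x80);
    // Step 2: "0" bits until exactly 64 bits (8 bytes) are missing.
    while padded.len() % 64 != 56 {
        padded.push(0x00);
    }
    // Step 3: the original input length in the last 64 bits.
    padded.extend_from_slice(&bit_len.to_be_bytes());
    padded
}
```

For the 3-byte input abc this yields a single 64-byte chunk whose last byte is 24, the input length in bits.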
Let's take a look at the operations that we will use as building blocks for functions inside the sha256 computation. These are bitwise AND, XOR, NOT, addition modulo 2^32 and the Rotate Right (ROTR) and Shift Right (SHR) operations, all working with 32-bit words and producing a new word.
We combine these operations inside the sigma (with 4 variations), Ch and Maj functions. At the end of the day, when we change the sha256 to be computed homomorphically, we will mainly change the isolated code of each operation.
Here is the definition of each function:
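Written on plain u32 words and following the SHA-256 specification, these definitions could be sketched as below. The lowercase names sigma0 and sigma1 are naming assumptions of this sketch; sigma_upper_case_0 and sigma_upper_case_1 reuse the names that appear later in this tutorial.

```rust
/// Ch: for each bit, choose y where x is 1 and z where x is 0.
fn ch(x: u32, y: u32, z: u32) -> u32 {
    (x & y) ^ (!x & z)
}

/// Maj: for each bit, the majority value among x, y and z.
fn maj(x: u32, y: u32, z: u32) -> u32 {
    (x & y) ^ (x & z) ^ (y & z)
}

/// Upper-case sigma 0, used in the compression loop.
fn sigma_upper_case_0(x: u32) -> u32 {
    x.rotate_right(2) ^ x.rotate_right(13) ^ x.rotate_right(22)
}

/// Upper-case sigma 1, used in the compression loop.
fn sigma_upper_case_1(x: u32) -> u32 {
    x.rotate_right(6) ^ x.rotate_right(11) ^ x.rotate_right(25)
}

/// Lower-case sigma 0, used when computing the 64 words of a chunk.
fn sigma0(x: u32) -> u32 {
    x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3)
}

/// Lower-case sigma 1, used when computing the 64 words of a chunk.
fn sigma1(x: u32) -> u32 {
    x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10)
}
```

Each sigma variation is a XOR of three rotated or shifted copies of the same word, which is why ROTR, SHR and XOR are enough to build all four.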
There are some things to note about the functions. Firstly we see that Maj can be simplified by applying the boolean distributive law (x AND y) XOR (x AND z) = x AND (y XOR z). So the new Maj function looks like this:
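As a plaintext check of this simplification on 32-bit words, the simplified form can be written next to the original one for comparison. The function names are placeholders for this sketch.

```rust
/// Maj as defined in the specification: three ANDs and two XORs.
fn maj_original(x: u32, y: u32, z: u32) -> u32 {
    (x & y) ^ (x & z) ^ (y & z)
}

/// Maj after applying (x AND y) XOR (x AND z) = x AND (y XOR z):
/// two ANDs and two XORs.
fn maj_simplified(x: u32, y: u32, z: u32) -> u32 {
    (x & (y ^ z)) ^ (y & z)
}
```

The inputs 0xF0, 0xCC and 0xAA cover all eight combinations of the three input bits in their low byte, and both functions return 0xE8 for them.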
Next we can also see that Ch can be simplified by using a single bitwise multiplexer. Let's take a look at the truth table of the Ch expression.
When x = 0 the result is identical to z, but when x = 1 the result is identical to y. This is the same as saying if x {y} else {z}. Hence we can replace the 4 bitwise operations of Ch by a single bitwise multiplexer.
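The equivalence can be checked on plaintext words with a short sketch. Here mux_bit is a hypothetical stand-in for a homomorphic multiplexer on a single bit, and ch_mux applies it to each of the 32 bit positions.

```rust
/// Ch as defined in the specification, using 4 bitwise operations.
fn ch_original(x: u32, y: u32, z: u32) -> u32 {
    (x & y) ^ (!x & z)
}

/// Single-bit multiplexer: `if cond { then_bit } else { else_bit }`.
fn mux_bit(cond: bool, then_bit: bool, else_bit: bool) -> bool {
    if cond {
        then_bit
    } else {
        else_bit
    }
}

/// Ch computed with one multiplexer per bit position.
fn ch_mux(x: u32, y: u32, z: u32) -> u32 {
    let mut result = 0u32;
    for i in 0..32 {
        let bit = |word: u32| (word >> i) & 1 == 1;
        if mux_bit(bit(x), bit(y), bit(z)) {
            result |= 1 << i;
        }
    }
    result
}
```

With x = 0xF0, y = 0xCC and z = 0xAA, both versions return 0xCA: the upper nibble comes from y and the lower nibble from z.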
Note that all these operations can be evaluated homomorphically. ROTR and SHR can be evaluated by changing the index of each individual bit of the word, even if each bit is encrypted, without using any homomorphic operation. Bitwise AND, XOR and multiplexer can be computed homomorphically and addition modulo 2^32 can be broken down into boolean homomorphic operations as well.
As we have mentioned, the sha256 function works with chunks of 512 bits. For each chunk, we will compute 64 32-bit words. 16 will come from the 512 bits and the rest will be computed using the previous functions. After computing the 64 words, and still within the same chunk iteration, a compression loop will compute a hash value (8 32-bit words), again using the previous functions and some constants to mix everything up. When we finish the last chunk iteration, the resulting hash values will be the output of the sha256 function.
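The first half of this process, computing the 64 words of one chunk, can be sketched on plain u32 values as below. The recurrence follows the SHA-256 specification; the names message_schedule, sigma0 and sigma1 are assumptions of this sketch, and the compression loop with its constants is left out.

```rust
fn sigma0(x: u32) -> u32 {
    x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3)
}

fn sigma1(x: u32) -> u32 {
    x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10)
}

/// Computes the 64 32-bit words of one 512-bit chunk.
fn message_schedule(chunk: &[u8; 64]) -> [u32; 64] {
    let mut w = [0u32; 64];
    // The first 16 words come directly from the 512 bits of the chunk.
    for (i, bytes) in chunk.chunks_exact(4).enumerate() {
        w[i] = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    }
    // The remaining 48 words are computed from earlier ones,
    // with additions modulo 2^32.
    for i in 16..64 {
        w[i] = sigma1(w[i - 2])
            .wrapping_add(w[i - 7])
            .wrapping_add(sigma0(w[i - 15]))
            .wrapping_add(w[i - 16]);
    }
    w
}
```

For the padded input abc, the first word is 0x61626380 and the sixteenth is 0x18 (the input length in bits); every later word depends only on words already computed.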
Here is what this function looks like using arrays of 32 bools to represent words:
The key idea is that we can replace each bit of padded_input with a Fully Homomorphic Encryption of the same bit value, and operate over the encrypted values using homomorphic operations. To achieve this we need to change the function signatures and deal with the borrowing rules of the Ciphertext type (which represents an encrypted bit) but the structure of the sha256 function remains the same. The part of the code that requires more consideration is the implementation of the sha256 operations, since they will use homomorphic boolean operations internally.
Homomorphic operations are really expensive, so we have to remove their unnecessary use and maximize parallelization in order to speed up the program. To simplify our code we use the Rayon crate which provides parallel iterators and efficiently manages threads.
The final code is available at https://github.com/zama-ai/tfhe-rs/tree/main/tfhe/examples/sha256_bool
Let's now take a look at each sha256 operation!
As we have highlighted, Rotate Right and Shift Right can be evaluated by changing the position of each encrypted bit in the word, thereby requiring 0 homomorphic operations. Here is our implementation:
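A generic sketch of the index-shuffling idea is shown below; it is not the example's actual code. It assumes that index 0 of the array holds the most significant bit, and that the caller supplies a zero value to fill vacated positions (for encrypted bits, this would be some encryption of false). The helpers u32_to_bits and bits_to_u32 exist only to compare against the native u32 operations.

```rust
/// Rotates a 32-element word right by `n` positions by moving elements only.
/// Index 0 is assumed to hold the most significant bit.
fn rotate_right<T: Clone>(word: &[T; 32], n: usize) -> [T; 32] {
    let mut out = word.clone();
    out.rotate_right(n % 32);
    out
}

/// Shifts right by `n` positions, filling the vacated top positions with `zero`.
fn shift_right<T: Clone>(word: &[T; 32], n: usize, zero: T) -> [T; 32] {
    let mut out = rotate_right(word, n);
    for slot in out.iter_mut().take(n.min(32)) {
        *slot = zero.clone();
    }
    out
}

/// Helper: most-significant-bit-first expansion of a u32.
fn u32_to_bits(x: u32) -> [bool; 32] {
    let mut bits = [false; 32];
    for (i, bit) in bits.iter_mut().enumerate() {
        *bit = (x >> (31 - i)) & 1 == 1;
    }
    bits
}

/// Helper: inverse of `u32_to_bits`.
fn bits_to_u32(bits: &[bool; 32]) -> u32 {
    bits.iter().fold(0, |acc, &b| (acc << 1) | b as u32)
}
```

Because the functions are generic over the element type, no operation is ever applied to an element's value: elements are only cloned and moved, which is why no homomorphic operation is needed.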
To implement the bitwise XOR, AND and multiplexer operations we will use the xor, and, and mux methods provided by the tfhe library to evaluate each boolean operation homomorphically. It's important to note that, since we will operate bitwise, we can parallelize the homomorphic computations. In other words, we can homomorphically XOR the bits at index 0 of two words using a thread, while XORing the bits at index 1 using another thread, and so on. This means we could compute these bitwise operations using up to 32 concurrent threads (since we work with 32-bit words).
Here is our implementation of the bitwise homomorphic XOR operation. The par_iter and par_iter_mut methods create a parallel iterator that we use to compute each individual XOR efficiently. The other two bitwise operations are implemented in the same way.
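The same per-bit parallel structure can be illustrated with the standard library alone. The sketch below swaps Rayon's parallel iterators for std::thread::scope and uses plain bool values instead of ciphertexts, so it shows the shape of the computation rather than the example's real code; xor_bit is a hypothetical stand-in for one homomorphic XOR.

```rust
use std::thread;

/// Stand-in for one (expensive) homomorphic XOR on a single bit.
fn xor_bit(a: bool, b: bool) -> bool {
    a ^ b
}

/// XORs two 32-bit words bit by bit, one scoped thread per bit position.
fn parallel_xor(a: &[bool; 32], b: &[bool; 32]) -> [bool; 32] {
    let mut out = [false; 32];
    thread::scope(|s| {
        for ((o, x), y) in out.iter_mut().zip(a.iter()).zip(b.iter()) {
            // Each output bit depends only on the input bits at the
            // same index, so all 32 computations are independent.
            s.spawn(move || *o = xor_bit(*x, *y));
        }
    });
    out
}
```

Spawning one OS thread per bit is only reasonable here because it is a sketch; a work-stealing pool such as Rayon's avoids that overhead and adapts to the number of available cores.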
Addition modulo 2^32 is perhaps the trickiest operation to implement efficiently in a homomorphic fashion. A naive implementation could use the Ripple Carry Adder algorithm, which is straightforward but cannot be parallelized because each step depends on the previous one.
A better choice would be the Carry Lookahead Adder, which allows us to use the parallelized AND and XOR bitwise operations. With this design, our adder is around 50% faster than the Ripple Carry Adder.
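The propagate and generate signals behind this design can be sketched on plain u32 words. In the sketch below, the two signals are each computed with one bitwise operation over the whole word, while the carries are still derived sequentially for clarity; that sequential loop is the part that parallel prefix algorithms restructure. The function name is hypothetical.

```rust
/// Adds two words modulo 2^32 using propagate/generate signals.
fn add_with_carry_signals(a: u32, b: u32) -> u32 {
    // Bitwise (hence parallelizable) signals for all 32 positions.
    let propagate = a ^ b; // position passes an incoming carry along
    let generate = a & b; // position creates a carry on its own

    // Bit i of `carries` is the carry entering position i.
    let mut carries = 0u32;
    let mut carry = 0u32;
    for i in 0..32 {
        carries |= carry << i;
        let g = (generate >> i) & 1;
        let p = (propagate >> i) & 1;
        // Carry out of position i: generated here, or propagated through.
        carry = g | (p & carry);
    }

    // Each sum bit is the propagate bit XOR the incoming carry.
    propagate ^ carries
}
```

The carry leaving the last position is simply dropped, which gives the reduction modulo 2^32 for free.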
To improve performance even more, the function that computes the carry signals can also be parallelized using parallel prefix algorithms. These algorithms involve more boolean operations (so homomorphic operations for us) but may be faster because of their parallel nature. We have implemented the Brent-Kung and Ladner-Fischer algorithms, which entail different tradeoffs.
Brent-Kung has the fewest boolean operations we could find (140 when using grey cells, for 32-bit numbers), which makes it suitable when we can't process many operations concurrently and fast. Our results confirm that it's indeed faster than both the sequential algorithm and Ladner-Fischer when run on regular computers.
On the other hand, Ladner-Fischer performs more boolean operations (209 using grey cells) than Brent-Kung, but they are performed in larger batches. Hence we can compute more operations in parallel and finish earlier, but we need more fast threads available or they will slow down the carry signals computation. Ladner-Fischer can be suitable when using cloud-based computing services, which offer many high-speed threads.
Our implementation uses Brent-Kung by default, but Ladner-Fischer can be enabled when needed by using the --ladner-fischer command line argument.
For more information about parallel prefix adders you can read this paper or this other paper.
Finally, with all these sha256 operations working homomorphically, our functions will be homomorphic as well, along with the whole sha256 function (after adapting the code to work with the Ciphertext type). Let's talk about other performance improvements we can make before we finish.
If we inspect the main sha256_fhe function, we will find operations that can be performed in parallel. For instance, within the compression loop, temp1 and temp2 can be computed concurrently. An efficient way to parallelize computations here is using the rayon::join() function, which uses parallel processing only when there are available CPUs. Recall that the two temporary values in the compression loop are the result of several additions, so we can use nested calls to rayon::join() to potentially parallelize more operations.
Another way to speed up consecutive additions would be using the Carry Save Adder, a very efficient adder that takes 3 numbers and returns a sum and carry sequence. If our inputs are A, B and C, we can construct a CSA with our previously implemented Maj function and the bitwise XOR operation as follows:
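On plain u32 words, this construction can be sketched as follows. The function name carry_save_adder is an assumption of this sketch; maj is the specification's majority function.

```rust
/// Majority of three bits, applied to every bit position of the words.
fn maj(x: u32, y: u32, z: u32) -> u32 {
    (x & y) ^ (x & z) ^ (y & z)
}

/// Carry Save Adder: reduces three numbers to a sum and a carry sequence
/// whose (conventional) addition equals a + b + c modulo 2^32.
fn carry_save_adder(a: u32, b: u32, c: u32) -> (u32, u32) {
    // Per-position sum, ignoring carries.
    let sum = a ^ b ^ c;
    // A position produces a carry when at least two of its three bits
    // are set; the carry applies to the next position, hence the shift.
    let carry = maj(a, b, c) << 1;
    (sum, carry)
}
```

No carry ever travels across positions inside the CSA, so it costs only bitwise operations; the single carry-propagating addition is deferred to the very end.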
By chaining CSAs, we can input the sum and carry from a preceding stage along with another number into a new CSA. Finally, to get the result of the additions we add the sum and carry sequences using a conventional adder. At the end we are performing the same number of additions, but some of them are now CSAs, speeding up the process. Let's see all this together in the temp1 and temp2 computations.
The first closure of the outer call to join will return temp1 and the second temp2. Inside the first outer closure we call join recursively until we reach the addition of the value h, the current word w[i] and the current constant K[i] by using the CSA, while potentially computing in parallel the ch function. Then we take the sum, carry and ch values and add them again using the CSA.
All this is done while potentially computing the sigma_upper_case_1 function. Finally we input the previous sum, carry and sigma values to the CSA and perform the final addition with add. Once again, this is done while potentially computing sigma_upper_case_0 and maj and adding them to get temp2, in the second outer closure.
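The order of operations described above can be sketched on plain u32 words. This is an illustration under stated assumptions, not the example's code: std::thread::scope stands in for the nested rayon::join() calls, wrapping_add stands in for the homomorphic add, and the names compute_temps and carry_save_adder are hypothetical. The working variables a to h are passed as one array.

```rust
use std::thread;

fn ch(x: u32, y: u32, z: u32) -> u32 {
    (x & y) ^ (!x & z)
}

fn maj(x: u32, y: u32, z: u32) -> u32 {
    (x & y) ^ (x & z) ^ (y & z)
}

fn sigma_upper_case_0(x: u32) -> u32 {
    x.rotate_right(2) ^ x.rotate_right(13) ^ x.rotate_right(22)
}

fn sigma_upper_case_1(x: u32) -> u32 {
    x.rotate_right(6) ^ x.rotate_right(11) ^ x.rotate_right(25)
}

fn carry_save_adder(a: u32, b: u32, c: u32) -> (u32, u32) {
    (a ^ b ^ c, maj(a, b, c) << 1)
}

/// Computes (temp1, temp2) for one compression round, given the working
/// variables [a, b, c, d, e, f, g, h], the constant `k` and the word `w`.
fn compute_temps(vars: &[u32; 8], k: u32, w: u32) -> (u32, u32) {
    let [a, b, c, _d, e, f, g, h] = *vars;
    thread::scope(|s| {
        // Independent pieces run concurrently with the CSA chain.
        let temp2 = s.spawn(move || sigma_upper_case_0(a).wrapping_add(maj(a, b, c)));
        let ch_val = s.spawn(move || ch(e, f, g));
        let sigma_val = s.spawn(move || sigma_upper_case_1(e));

        // h + w[i] + K[i], then + ch, then + sigma, all as CSAs.
        let (sum, carry) = carry_save_adder(h, w, k);
        let (sum, carry) = carry_save_adder(sum, carry, ch_val.join().expect("ch thread panicked"));
        let (sum, carry) =
            carry_save_adder(sum, carry, sigma_val.join().expect("sigma thread panicked"));

        // One conventional addition at the end of the chain.
        let temp1 = sum.wrapping_add(carry);
        (temp1, temp2.join().expect("temp2 thread panicked"))
    })
}
```

Four of the five additions that make up temp1 collapse into three CSAs plus a single conventional addition, and the values fed into the chain are computed while earlier stages are still running.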
With some changes of this type, we finally get a homomorphic sha256 function that doesn't leave unused computational resources.
First of all, the most important thing when running the program is using the --release flag. The use of sha256_bool would look like this, given the implementation of encrypt_bools and decrypt_bools:
By using stdin we can supply the data to hash using a file instead of the command line. For example, if our file input.txt is in the same directory as the project, we can use the following shell command after building with cargo build --release:
Our implementation also accepts hexadecimal inputs. To be considered as such, the input must start with "0x" and contain only valid hex digits (otherwise it's interpreted as text).
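The rule above could be sketched as follows. The function name parse_input is hypothetical, and two details are assumptions of this sketch rather than documented behavior: an odd number of hex digits is left-padded with a zero, and a bare "0x" with no digits is treated as text.

```rust
/// Interprets `input` as hexadecimal bytes if it starts with "0x" and
/// contains only valid hex digits; otherwise returns its bytes as text.
fn parse_input(input: &str) -> Vec<u8> {
    match input.strip_prefix("0x") {
        Some(digits) if !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_hexdigit()) => {
            // Assumption: pad odd-length input with a leading zero digit.
            let padded = if digits.len() % 2 == 0 {
                digits.to_string()
            } else {
                format!("0{digits}")
            };
            padded
                .as_bytes()
                .chunks(2)
                .map(|pair| {
                    let s = std::str::from_utf8(pair).expect("hex digits are ASCII");
                    u8::from_str_radix(s, 16).expect("digits were validated above")
                })
                .collect()
        }
        // Anything else is hashed as plain text, prefix included.
        _ => input.as_bytes().to_vec(),
    }
}
```

With this rule, 0x6162 is hashed as the two bytes of ab, while 0xZZ is hashed as the four characters 0xZZ.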
Finally, note that padding is executed on the client side. This has the advantage of hiding the exact length of the input from the server, which knows nothing about the contents but could otherwise extract information from the length.
Another option would be to perform padding on the server side. The padding function would receive the encrypted input and pad it with trivial bit encryptions. We could then integrate the padding function inside the sha256_fhe function computed by the server.
 | Sell | Buy |
---|---|---|
Input | [ 5, 12, 7, 4, 3 ] | [ 19, 2 ] |
Output | [ 5, 12, 4, 0, 0 ] | [ 19, 2 ] |
Input | [ 3, 1, 1, 4, 2 ] | [ 5, 3, 3, 2, 4, 1 ] |
Output | [ 3, 1, 1, 4, 2 ] | [ 5, 3, 3, 0, 0, 0 ] |

Variable | Maximum Value | Bit Size |
---|---|---|
total_sell_volume | 50000 | 16 |
total_buy_volume | 50000 | 16 |
total_volume | 50000 | 16 |
volume_left_to_transact | 50000 | 16 |
sell_order | 100 | 7 |
buy_order | 100 | 7 |

Pattern | Description |
---|---|
/abc/ | Searches for the sequence abc (equivalent to a simple text search) |
/^abc/ | Searches for the sequence abc at the beginning of the content |
/a?bc/ | Searches for sequences abc, bc |
/ab\|c+d/ | Searches for sequences of ab, c repeated 1 or more times, followed by d |

Operator | Example | Semantics |
---|---|---|
\| | a \| b | we first try matching on 'a' - if no match, we try to match on 'b' |
+ | a+ | match 'a' one or more times |
* | a* | match 'a' any amount of times (including zero times) |
? | a? | optionally match 'a' (match zero or one time) |
. | . | match any character |
.. | a .. b | match on a range of alphabetically ordered characters from 'a', up to and including 'b' |
 | a b | sequencing; match on 'a' and then on 'b' |

Pattern | RegExpr value |
---|---|
/a/ | RegExpr::Char { c: 'a' } |
/\^/ | RegExpr::Char { c: '^' } |
/./ | RegExpr::AnyChar |
/^/ | RegExpr::SOF |
/$/ | RegExpr::EOF |
/[acd]/ | RegExpr::Range { cs: vec!['a', 'c', 'd'] } |
/[a-g]/ | RegExpr::Between { from: 'a', to: 'g' } |

Pattern | RegExpr value |
---|---|
/ab/ | RegExpr::Seq { re_xs: vec![RegExpr::Char { c: 'a' }, RegExpr::Char { c: 'b' }] } |
/^a.$/ | RegExpr::Seq { re_xs: vec![RegExpr::SOF, RegExpr::Char { c: 'a' }, RegExpr::AnyChar, RegExpr::EOF] } |
/a[f-l]/ | RegExpr::Seq { re_xs: vec![RegExpr::Char { c: 'a' }, RegExpr::Between { from: 'f', to: 'l' }] } |

Pattern | RegExpr value |
---|---|
/a?/ | RegExpr::Optional { opt_re: Box::new(RegExpr::Char { c: 'a' }) } |
/[a-d]?/ | RegExpr::Optional { opt_re: Box::new(RegExpr::Between { from: 'a', to: 'd' }) } |
/[^ab]/ | RegExpr::Not { not_re: Box::new(RegExpr::Range { cs: vec!['a', 'b'] }) } |
/av\|d?/ | RegExpr::Either { l_re: Box::new(RegExpr::Seq { re_xs: vec![RegExpr::Char { c: 'a' }, RegExpr::Char { c: 'v' }] }), r_re: Box::new(RegExpr::Optional { opt_re: Box::new(RegExpr::Char { c: 'd' }) }) } |
/(av\|d)?/ | RegExpr::Optional { opt_re: Box::new(RegExpr::Either { l_re: Box::new(RegExpr::Seq { re_xs: vec![RegExpr::Char { c: 'a' }, RegExpr::Char { c: 'v' }] }), r_re: Box::new(RegExpr::Char { c: 'd' }) }) } |

re | c_pos | Ciphertext operation |
---|---|---|
/a/ | 0 | sk.eq(content[0], a) |
/a/ | 1 | sk.eq(content[1], a) |
/a/ | 2 | sk.eq(content[2], a) |

Name | Notation | Examples |
---|---|---|
Character | Simply the character itself | /a/, /b/, /Z/, /5/ |
Character range | [<character>-<character>] | /[a-d]/, /[C-H]/ |
Any character | . | /a.c/ |
Escaped symbol | \<symbol> | /\^/, /\$/ |
Parenthesis | (<regex>) | /(abc)*/, /d(ab)?/ |
Optional | <regex>? | /a?/, /(az)?/ |
Zero or more | <regex>* | /a*/, /ab*c/ |
One or more | <regex>+ | /a+/, /ab+c/ |
Exact repeat | <regex>{<number>} | /ab{2}c/ |
At least repeat | <regex>{<number>,} | /ab{2,}c/ |
At most repeat | <regex>{,<number>} | /ab{,2}c/ |
Repeat between | <regex>{<number>,<number>} | /ab{2,4}c/ |
Either | <regex>\|<regex> | /a\|b/, /ab\|cd/ |
Start matching | /^<regex> | /^abc/ |
End matching | <regex>$/ | /abc$/ |

Pattern | Description |
---|---|
/^abc$/ | Matches with content that equals exactly abc (case sensitive) |
/^abc$/i | Matches with content that equals abc (case insensitive) |
/abc/ | Matches with content that contains somewhere abc |
/ab?c/ | Matches with content that contains somewhere abc or somewhere ac |
/^ab*c$/ | For example, matches with: ac, abc, abbbbc |
/^[a-c]b\|cd$/ | Matches with: ab, bb, cb, cd |
/^[a-c]b\|cd$/i | Matches with: ab, Ab, aB, ..., cD, CD |
/^d(abc)+d$/ | For example, matches with: dabcd, dabcabcd, dabcabcabcd |
/^a.*d$/ | Matches with any content that starts with a and ends with d |

x | y | z | Result |
---|---|---|---|
0 | 0 | 0 | 0 |
0 | 0 | 1 | 1 |
0 | 1 | 0 | 0 |
0 | 1 | 1 | 1 |
1 | 0 | 0 | 0 |
1 | 0 | 1 | 0 |
1 | 1 | 0 | 1 |
1 | 1 | 1 | 1 |

Operation name | Radix-based | CRT-based |
---|---|---|
Negation | | |
Addition | | |
Scalar Addition | | |
Subtraction | | |
Scalar Subtraction | | |
Multiplication | | |
Scalar Multiplication | | |
Bitwise OR, AND, XOR | | |
Equality | | |
Left/Right Shift | | |
Comparisons <, <=, >, >= | | |
Min, Max | | |