This tutorial guides you to convert a regular SHA-256 function to its homomorphic version, with considerations of optimal performances. You will learn:
The basics of the SHA-256 function.
The steps to implement SHA-256 homomorphically.
First, you need to implement the SHA-256 function. You can find the official specification for SHA-256 here. We summarize the three key aspects of SHA-256 outlined in the document:
The SHA-256 function processes the input data in blocks or chunks of 512 bits. Before performing the hash computations, prepare the data as follows:
Append a single "1" bit
Append "0" bits until exactly 64 bits remain to make the message length a multiple of 512
Append the last 64 bits as a binary encoding of the original input length
In this diagram, the numbers on the top represent the length of the padded input at each position. The formula L+1+k+64 ensures that the length reaches a multiple of 512, matching the required length of the padded input.
We will use bitwise AND, XOR, NOT, addition modulo 2^32, the Rotate Right (ROTR) and Shift Right (SHR) operations as building blocks for functions inside the SHA-256 computation. These operations all use 32-bit words and produce new words.
We combine these operations inside the sigma (with 4 variations), Ch,
and Maj
functions. When changing SHA-256 to the homomorphic computation, we will mainly change the code of each operation.
Here is the definition of each function:
We simplify Maj
using the Boolean distributive law: (x AND y) XOR (x AND z) = x AND (y XOR z), as shown below:
We simplify Ch
using a single bitwise multiplexer. Here's the truth table of the Ch
expression.
0
0
0
0
0
0
1
1
0
1
0
0
0
1
1
1
1
0
0
0
1
0
1
0
1
1
0
1
1
1
1
1
This table shows that the result equals to z
when x = 0
, and the result equals to y
when x = 1
, which means if x {y} else {z}
. Hence we can replace the 4 bitwise operations of Ch
by a single bitwise multiplexer.
All these operations can be evaluated homomorphically:
ROTR and SHR: They can be evaluated by changing the index of each ecrypted bit of the word without using any homomorphic operation.
Bitwise AND, XOR and multiplexer: They can be computed homomorphically
Addition modulo 2^32: It can be broken down into boolean homomorphic operations.
The SHA-256 function processes data in 512-bit chunks. Here is what happens during computation:
The 512-bit chunk is computed into 16 words, each containing 32 bits.
Another 48 words are computed using the previous function.
After computing the 64 words, within the same chunk, a compression loop will compute a hash value (8 32-bit words) using the previous functions and some constants to mix everything up.
This entire process iterate through each 512-bit chunk of your data.
When we finish the last chunk iteration, the resulting hash values will be the output of the SHA-256 function.
Here is an example of this function using arrays of 32 bools to represent words:
To convert SHA-256 to a homomorphic version, you can replace each bit of padded_input
with a fully homomorphic encryption of the same bit value and operate on the encrypted value using homomorphic operations.
While the structure of the SHA-256 function remains the same, there are some important considerations in the code:
The function signature and the borrowing rules should adapt to the ciphertext type (representing the encrypted bits).
Implementing SHA-256 operations with homomorphic encryption uses homomorphic boolean operations internally.
Homomorphic operations on encrypted data can be very expensive. Consider these options for better speed:
Remove unnecessary use of homomorphic operations and maximize parallelization.
Simplify the code with Rayon crate that parallelizes iterators and manages threads efficiently.
The final code is available here.
Now let's dive into details of each SHA256 operation.
Rotate Right and Shift Right can be evaluated by changing the position of each encrypted bit in the word, requiring no homomorphic operations. Here is the implementation:
To implement these operations, we will use the xor
, and mux
methods from the TFHE-rs library to perform each boolean operation homomorphically.
For better efficiency, we can parallelize the homomorphic computations because we operate bitwise. It means that we can homomorphically XOR the bits at index 0 of two words using one thread while XORing the bits at index 1 using another thread, and so on. This approach allows for the computation of bitwise operations using up to 32 concurrent threads, corresponding to the 32-bit words used.
Here is the implementation of the bitwise homomorphic XOR operation. The par_iter
and par_iter_mut
methods create a parallel iterator that we use to compute each XOR efficiently. The other two bitwise operations are implemented in the same way.
This might be the trickiest operation to efficiently implement in a homomorphic manner. A naive implementation could use the Ripple Carry Adder algorithm, which is straightforward but cannot be parallelized because each step depends on the previous one.
A better choice is to use Carry Lookahead Adder, which allows us to use the parallelized AND and XOR bitwise operations. With this design, our adder is around 50% faster than the Ripple Carry Adder.
To further optimize performance, we use parallel prefix algorithms to parallelize the function that computes the carry signals. These algorithms involve more (homomorphic) boolean operations and their parallel nature speeds up the processing. We have implemented the Brent-Kung and Ladner-Fischer algorithms with different tradeoffs:
Brent-Kung has the least amount of boolean operations we could find (140 when using grey cells, for 32-bit numbers), which makes it suitable when we can't process many operations concurrently and fast. Our results confirm that it's indeed faster than both the sequential algorithm and Ladner-Fischer when run on regular computers.
On the other hand, Ladner-Fischer performs more boolean operations (209 using grey cells) than Brent-Kung, but they are performed in larger batches. Hence we can compute more operations in parallel and finish earlier, but we need more fast threads available or they will slow down the carry signals computation. Ladner-Fischer can be suitable when using cloud-based computing services, which offer many high-speed threads.
Our implementation uses Brent-Kung by default, but you can enable Ladner-Fischer by using the --ladner-fischer
command line argument.
For more information about parallel prefix adders, you can read this paper or this other paper.
Finally, with all these SHA-256 operations working homomorphically, our functions will be homomomorphic as well along with the whole SHA-256 function (after adapting the code to work with the Ciphertext type).
Let's talk about other performance improvements we can make before we finish.
In the main sha256_fhe
, you can perform some functions in parallel. For example, in the compression loop, temp1
and temp2
can be computed in parallel by using the rayon::join()
function when there is a CPU available. The two temporary values in the compression loop are the result of multiple additions, so you can use nested calls to rayon::join()
to parallelize more operations.
Another way to speed up consecutive additions would be using the Carry Save Adder, a very efficient adder that takes 3 numbers and returns a sum and a carry sequence. If our inputs are A, B, and C, we can construct a CSA with our previously implemented Maj function and the bitwise XOR operation as follows:
By chaining CSAs, we can input the sum and carry from a preceding stage along with another number into a new CSA. Finally, to get the result of the additions we add the sum and carry sequences using a conventional adder. In the end, we are performing the same number of additions, but some of them are now CSAs, speeding up the process. Below is the illustration of this process in the temp1
and temp2
computations.
The first closure of the outer call to join will return temp1
and the second temp2
.
Inside the first outer closure, we call join recursively until we add the value h
, the current word w[i],
and the current constant K[i]
by using the CSA, while potentially computing the ch
function in parallel. Then we take the sum, carry, and ch values and add them again using the CSA.
All this is done while potentially computing the sigma_upper_case_1
function. Finally we input the previous sum, carry, and sigma values to the CSA and perform the final addition with add
. Once again, this is done while potentially computing sigma_upper_case_0
and maj
and adding them to get temp2
, in the second outer closure.
With these types of changes, we finally get a homomorphic SHA256 function that doesn't leave unused computational resources.
First, use the --release
flag when running the program. Considering the implementation of encrypt_bools
and decrypt_bools
, the use of SHA-256 will be as follows:
We can supply the data to hash using a file instead of the command line by using stdin
. For example, if the file input.txt
is in the same directory as the project, we can use the following shell command after building with cargo build --release
:
The program accepts hexadecimal inputs. The input must start with "0x" and contain only valid hex digits, otherwise it will be interpreted as text.
Finally, padding is performed on the client side. This has the advantage of hiding the exact length of the input content from the server, thus avoiding the server extracting information from the length, even though the content is fully encrypted.
It is also feasible to perform padding on the server side. The padding function would take the encrypted input and pad it with trivial bit encryptions. We can then integrate the padding function into the sha256_fhe
function computed by the server.