Comment on page

# Performance

The most important operation in Concrete-Numpy is the table lookup operation. All operations except addition, subtraction, multiplication with non-encrypted values, and a few operations built with those primitive operations (e.g. matmul, conv) are converted to table lookups under the hood:

import concrete.numpy as cnp

@cnp.compiler({"x": "encrypted"})

def f(x):

return x ** 2

inputset = range(2 ** 4)

circuit = f.compile(inputset)

is exactly the same as

import concrete.numpy as cnp

table = cnp.LookupTable([x ** 2 for x in range(2 ** 4)])

@cnp.compiler({"x": "encrypted"})

def f(x):

return table[x]

inputset = range(2 ** 4)

circuit = f.compile(inputset)

Table lookups are very flexible, and they allow Concrete Numpy to support many operations, but they are expensive! Therefore, you should try to avoid them as much as possible. In most cases, it's not possible to avoid them completely, but you might remove the number of TLUs or replace some of them with other primitive operations.

The exact cost depend on many variables (machine configuration, error probability, etc.), but you can develop some intuition for single threaded CPU execution performance using:

import time

import concrete.numpy as cnp

import numpy as np

WARMUP = 3

SAMPLES = 8

BITWIDTHS = range(1, 15)

CONFIGURATION = cnp.Configuration(

enable_unsafe_features=True,

use_insecure_key_cache=True,

insecure_key_cache_location=".keys",

)

timings = {}

for n in BITWIDTHS:

@cnp.compiler({"x": "encrypted"})

def base(x):

return x

table = cnp.LookupTable([np.sqrt(x).round().astype(np.int64) for x in range(2 ** n)])

@cnp.compiler({"x": "encrypted"})

def tlu(x):

return table[x]

inputset = [0, 2**n - 1]

base_circuit = base.compile(inputset, CONFIGURATION)

tlu_circuit = tlu.compile(inputset, CONFIGURATION)

print()

print(f"Generating keys for n={n}...")

base_circuit.keygen()

tlu_circuit.keygen()

timings[n] = []

for i in range(SAMPLES + WARMUP):

sample = np.random.randint(0, 2 ** n)

encrypted_sample = base_circuit.encrypt(sample)

start = time.time()

encrypted_result = base_circuit.run(encrypted_sample)

end = time.time()

assert base_circuit.decrypt(encrypted_result) == sample

base_time = end - start

encrypted_sample = tlu_circuit.encrypt(sample)

start = time.time()

encrypted_result = tlu_circuit.run(encrypted_sample)

end = time.time()

assert tlu_circuit.decrypt(encrypted_result) == np.sqrt(sample).round().astype(np.int64)

tlu_time = end - start

if i >= WARMUP:

timings[n].append(tlu_time - base_time)

print(f"Sample #{i - WARMUP + 1} took {timings[n][-1] * 1000:.3f}ms")

print()

for n, times in timings.items():

print(f"{n}-bits -> {np.mean(times) * 1000:.3f}ms")

Concrete Numpy automatically parallelize execution if TLUs are applied to tensors.

Last modified 9mo ago