Nearest Neighbors
Last updated
Last updated
Concrete ML offers nearest neighbors non-parametric classification models with a scikit-learn interface through the KNeighborsClassifier
class.
The KNeighborsClassifier
class quantizes the training data-set that is given to .fit
with the specified number of bits, n_bits
. As this value must be kept low to comply with accumulator size constraints the accuracy of the model will depend heavily a well-chosen value n_bits
and the dimensionality of the data.
The predict
method of the KNeighborsClassifier
performs the following steps:
quantization of the test vectors, performed in the clear
computation of the top-k class indices of the closest training set vector, on encrypted data
majority vote of the top-k class labels to find the class for each test vector, performed in the clear
The FHE inference latency of this model is heavily influenced by the n_bits
, the dimensionality of the data. Furthermore, the size of the data-set has a linear impact on the complexity of the data and the number of nearest neighbors, n_neighbors
, also plays a role.
The KNN computation executes in FHE in steps, where is the training data-set size and is n_neighbors
. Each step requires several PBS, but the run-time of each of these PBS is influenced by the factors listed above. These factors combine to give the precision required to represent the distances between test vectors and the training data-set vectors. The PBS input precision required by the circuit is related to the one of the distance values.