This document introduces the nearest neighbors non-parametric classification models that Concrete ML provides with a scikit-learn interface through the KNeighborsClassifier
class.
The KNeighborsClassifier
class quantizes the training data-set provided to .fit
using the specified number of bits (n_bits
). To comply with accumulator size constraints, you must keep this value low. The model's accuracy will depend significantly on a well-chosen n_bits
value and the dimensionality of the data.
The predict
method of the KNeighborsClassifier
performs the following steps:
Quantize the test vectors on clear data
Compute the top-k class indices of the closest training set vector on encrypted data
Vote for the top-k class labels to find the class for each test vector, performed on clear data
The FHE inference latency of this model is heavily influenced by the n_bits
and the dimensionality of the data. Additionally, the data-set size has a linear impact on the data complexity. The number of nearest neighbors (n_neighbors
) also affects performance.
The KNN computation executes in FHE in steps, where is the training data-set size and is n_neighbors
. Each step requires several PBS operations, with their runtime affected by the factors listed above. These factors determine the precision needed to represent the distances between test vectors and training data-set vectors. The PBS input precision required by the circuit is related to the precision of the distance values.