Nearest neighbors
This document introduces the nearest neighbors non-parametric classification models that Concrete ML provides with a scikit-learn interface through the `KNeighborsClassifier` class.
These models only support Concrete ciphertexts. See documentation for more details.
The `KNeighborsClassifier` class quantizes the training data-set provided to `.fit` using the specified number of bits (`n_bits`). To comply with accumulator size constraints, you must keep this value low. The model's accuracy depends significantly on a well-chosen `n_bits` value and on the dimensionality of the data.
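To give intuition for the `n_bits` trade-off, here is a minimal sketch of uniform quantization with numpy. It illustrates the concept only and is not Concrete ML's actual quantizer: a small `n_bits` gives few levels (coarser values, faster FHE), a larger one gives more levels (better fidelity, slower FHE).

```python
import numpy as np

def quantize_uniform(x, n_bits):
    # Illustrative uniform quantizer, NOT Concrete ML's internal one:
    # map real values onto 2**n_bits integer levels.
    levels = 2 ** n_bits
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (levels - 1)
    q = np.round((x - lo) / scale).astype(np.int64)
    return q, scale, lo

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))

q3, _, _ = quantize_uniform(X_train, n_bits=3)  # coarse: 8 levels
q8, _, _ = quantize_uniform(X_train, n_bits=8)  # finer: 256 levels
```

Fewer levels mean larger rounding error on the training vectors, which is why accuracy is sensitive to `n_bits`.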
The `predict` method of the `KNeighborsClassifier` performs the following steps:

1. Quantize the test vectors, on clear data
1. Compute the top-k class indices of the closest training set vectors, on encrypted data
1. Vote for the top-k class labels to find the class for each test vector, on clear data
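The steps above can be sketched in cleartext numpy. This is a conceptual sketch only: in the real model the distance and top-k computations run under FHE, and only the final vote happens on clear data. The helper name `knn_predict_clear` is hypothetical.

```python
import numpy as np

def knn_predict_clear(X_train_q, y_train, x_test_q, k):
    # Cleartext sketch of the predict pipeline on quantized inputs.
    # Squared L2 distance to every training vector
    # (runs on encrypted data in the real model).
    dists = np.sum((X_train_q - x_test_q) ** 2, axis=1)
    # Indices of the k closest training vectors.
    topk = np.argsort(dists)[:k]
    # Majority vote over the top-k labels (done on clear data).
    labels, counts = np.unique(y_train[topk], return_counts=True)
    return labels[np.argmax(counts)]

X_train_q = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
pred = knn_predict_clear(X_train_q, y_train, np.array([5, 6]), k=3)
# The test point [5, 6] sits next to the class-1 cluster.
```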
The FHE inference latency of this model is heavily influenced by `n_bits` and the dimensionality of the data. Additionally, the data-set size has a linear impact on the computational complexity. The number of nearest neighbors (`n_neighbors`) also affects performance.
The KNN computation executes in FHE in O(N·log²k) steps, where N is the training data-set size and k is `n_neighbors`. Each step requires several programmable bootstrapping (PBS) operations, with their runtime affected by the factors listed above. These factors determine the precision needed to represent the distances between test vectors and training data-set vectors. The PBS input precision required by the circuit is related to the precision of the distance values.
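A back-of-envelope estimate shows why `n_bits` and dimensionality drive the required precision. If each coordinate is quantized to `n_bits`, a squared L2 distance over `dim` coordinates needs roughly `2*n_bits + log2(dim)` bits. The helper below is a rough upper bound for illustration, not Concrete ML's exact precision accounting:

```python
import math

def distance_bits(n_bits, dim):
    # Rough upper bound on the bit-width of a squared L2 distance
    # between two vectors quantized to n_bits, in dim dimensions
    # (illustrative estimate, not Concrete ML's exact accounting).
    max_diff = 2 ** n_bits - 1       # largest per-coordinate difference
    max_dist = dim * max_diff ** 2   # worst-case squared distance
    return math.ceil(math.log2(max_dist + 1))

# Raising either n_bits or the dimensionality increases the precision
# the PBS inputs must carry, and with it the FHE latency.
print(distance_bits(3, 4))  # 8 bits for n_bits=3, 4 dimensions
```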