Nearest neighbors

Concrete ML offers nearest neighbors non-parametric classification models with a scikit-learn interface through the KNeighborsClassifier class.

Concrete ML

scikit-learn

KNeighborsClassifier

Example usage

from concrete.ml.sklearn import KNeighborsClassifier

concrete_classifier = KNeighborsClassifier(n_bits=2, n_neighbors=3)

The KNeighborsClassifier class quantizes the training data-set that is given to .fit with the specified number of bits, n_bits. As this value must be kept low to comply with accumulator size constraints the accuracy of the model will depend heavily a well-chosen value n_bits and the dimensionality of the data.

The predict method of the KNeighborsClassifier performs the following steps:

quantization of the test vectors, performed in the clear
computation of the top-k class indices of the closest training set vector, on encrypted data
majority vote of the top-k class labels to find the class for each test vector, performed in the clear

Inference time considerations

The FHE inference latency of this model is heavily influenced by the n_bits, the dimensionality of the data. Furthermore, the size of the data-set has a linear impact on the complexity of the data and the number of nearest neighbors, n_neighbors, also plays a role.

The KNN computation executes in FHE in $O(Nlog^2k)$ steps, where $N$ is the training data-set size and $k$ is n_neighbors. Each step requires several PBS, but the run-time of each of these PBS is influenced by the factors listed above. These factors combine to give the precision required to represent the distances between test vectors and the training data-set vectors. The PBS input precision required by the circuit is related to the one of the distance values.

PreviousNeural networks NextEncrypted dataframe

Last updated 1 year ago

Was this helpful?