Encrypted training
This document explains how to train on encrypted data.
Training on encrypted data is done through an FHE program that Concrete ML generates from the characteristics of the data given to the fit function. Once the FHE program associated with the SGDClassifier object has been built on that encrypted data, it is specialized to the data's distribution and dimensionality.
When deploying encrypted training services, you need to consider the type of data that future users of your services will train on:
The distribution of the data should match that of the data the FHE program was built on, to achieve good accuracy.
The dimensionality of the data must match, since deployed FHE programs are compiled for a fixed number of dimensions.
See the section for more details.
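Because a deployed training program is compiled for a fixed number of features and built around the data it was generated from, a service may want to reject incompatible batches before they are encrypted and sent. The helper below is a hypothetical, clear-data sketch: check_training_batch and its range arguments are illustrative assumptions, not part of the Concrete ML API, and a simple range check is only a rough proxy for a matching distribution.

```python
from typing import List, Sequence


def check_training_batch(
    X: Sequence[Sequence[float]],
    n_features: int,
    feature_min: float,
    feature_max: float,
) -> List[str]:
    """Return the problems found in X; an empty list means X looks compatible.

    Hypothetical helper: n_features, feature_min and feature_max describe the
    data that the FHE training program was built for.
    """
    problems = []
    for i, row in enumerate(X):
        # Dimensionality must match exactly: the circuit has a fixed input shape.
        if len(row) != n_features:
            problems.append(f"row {i}: expected {n_features} features, got {len(row)}")
            continue
        # Values outside the expected range hint at a distribution mismatch.
        for j, value in enumerate(row):
            if not feature_min <= value <= feature_max:
                problems.append(
                    f"row {i}, feature {j}: {value} outside [{feature_min}, {feature_max}]"
                )
    return problems
```

For example, check_training_batch([[0.1, 0.5], [0.2]], 2, -1.0, 1.0) reports that row 1 has one feature instead of two.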
These models only support Concrete ciphertexts. See documentation for more details.
The example shows logistic regression training on encrypted data in action.
The following snippet shows how to instantiate a logistic regression model that trains on encrypted data:
To activate encrypted training, set fit_encrypted=True in the constructor. When this value is set, Concrete ML generates an FHE program which, when called through the fit function, takes encrypted training data, labels, and initial weights as input and outputs the trained model weights. If this value is not set, training is performed on clear data using scikit-learn gradient descent.
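To make the inputs and outputs of such a training program concrete, here is a plain-Python, clear-data analogue of one logistic-regression gradient step: it takes a batch of data, labels, and current weights, and returns updated weights. This is an illustrative sketch only; it is not Concrete ML's FHE implementation, which additionally encrypts and quantizes every quantity, and sgd_step and learning_rate are names assumed for this example.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    weights: List[float],
    bias: float,
    learning_rate: float = 0.1,
) -> Tuple[List[float], float]:
    """One clear-data gradient step of logistic regression on a single batch."""
    n = len(X)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for row, label in zip(X, y):
        # Prediction error for this example (predicted probability minus label).
        error = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - label
        for j, x in enumerate(row):
            grad_w[j] += error * x
        grad_b += error
    # Average the gradient over the batch and move against it.
    new_weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    new_bias = bias - learning_rate * grad_b / n
    return new_weights, new_bias
```

Starting from zero weights on the batch [[1.0], [-1.0]] with labels [1, 0], one step moves the single coefficient to 0.05 and leaves the bias at 0.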
Next, to perform the training on encrypted data, call the fit function with the fhe="execute" argument.
The max_iter parameter controls the number of batches that are processed by the training algorithm.
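Since max_iter counts batches rather than full passes over the dataset, the number of training examples seen is roughly max_iter times the batch size. The sketch below illustrates that relationship; iterate_batches is a hypothetical helper, and taking consecutive slices that wrap around the data is an assumption made for clarity, since the actual training algorithm may select batches differently.

```python
from typing import Iterator, List


def iterate_batches(n_samples: int, batch_size: int, max_iter: int) -> Iterator[List[int]]:
    """Yield exactly max_iter batches of example indices, wrapping around the data.

    Assumption for illustration: batches are consecutive slices of the dataset.
    """
    start = 0
    for _ in range(max_iter):
        # Take batch_size indices starting at `start`, wrapping past the end.
        yield [(start + k) % n_samples for k in range(batch_size)]
        start = (start + batch_size) % n_samples
```

With 5 examples, a batch size of 2, and max_iter=4, this yields [0, 1], [2, 3], [4, 0], [1, 2]: four batches and eight processed examples, which is more than one pass over the data.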
The trainable logistic model uses Stochastic Gradient Descent (SGD) and quantizes the data, weights, gradients, and error measure. It currently supports training 6-bit models, including both the coefficients and the bias.
The SGDClassifier does not currently support training models with other bit-width values. The execution time to train a model is proportional to the number of features and to the number of training examples in the batch. Training with the SGDClassifier does not currently support client/server deployment.
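To give an idea of what 6-bit quantization means, the sketch below maps floating-point values to integers in 0..63 (2^6 levels) with a uniform affine scheme and maps them back. The exact quantization scheme Concrete ML uses for training is not described here; this generic quantizer, and the quantize and dequantize names, are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

N_BITS = 6  # bit-width supported for training, per the text above


def quantize(values: Sequence[float], n_bits: int = N_BITS) -> Tuple[List[int], float, float]:
    """Uniformly quantize floats to unsigned n_bits integer codes.

    Returns the codes plus the scale and offset needed to dequantize:
    value ~= code * scale + offset.
    """
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1  # 63 steps for 6 bits, so codes range over 0..63
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes: Sequence[int], scale: float, offset: float) -> List[float]:
    """Map integer codes back to approximate floating-point values."""
    return [c * scale + offset for c in codes]
```

Each value is recovered to within half a quantization step (scale / 2), which is the precision loss that a fixed 6-bit width implies.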
The parameters_range parameter determines the initialization of the coefficients and the bias of the logistic regression. It is recommended to give values that are close to the min/max of the training data. It is also possible to normalize the training data so that it lies in the range .
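The two options in the paragraph above can be sketched in plain Python: min-max scaling each feature into a chosen target interval, and drawing initial coefficients and a bias from parameters_range. Both helpers are hypothetical; in particular, uniform random initialization is an assumption about how a range could be used, not a description of Concrete ML's internals, and the interval (-1.0, 1.0) below is only an example choice.

```python
import random
from typing import List, Sequence, Tuple


def min_max_scale(
    X: Sequence[Sequence[float]], target: Tuple[float, float]
) -> List[List[float]]:
    """Linearly rescale each feature column of X into the target interval."""
    t_lo, t_hi = target
    columns = list(zip(*X))
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    scaled = []
    for row in X:
        new_row = []
        for x, lo, hi in zip(row, mins, maxs):
            # Constant columns are mapped to the middle of the target interval.
            frac = (x - lo) / (hi - lo) if hi > lo else 0.5
            new_row.append(t_lo + frac * (t_hi - t_lo))
        scaled.append(new_row)
    return scaled


def init_parameters(
    n_features: int, parameters_range: Tuple[float, float], seed: int = 0
) -> Tuple[List[float], float]:
    """Draw initial coefficients and bias uniformly from parameters_range (assumed scheme)."""
    rng = random.Random(seed)
    lo, hi = parameters_range
    coefs = [rng.uniform(lo, hi) for _ in range(n_features)]
    bias = rng.uniform(lo, hi)
    return coefs, bias
```

For instance, scaling [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]] into (-1.0, 1.0) gives [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]], so a parameters_range of (-1.0, 1.0) would then sit close to the min/max of the training data.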
Once you have tested an SGDClassifier that trains on encrypted data, you can build an FHE training service by deploying the FHE training program of the SGDClassifier. See the page for more details on how to use the Concrete ML deployment utility classes. To deploy an FHE training program, you must pass the mode='training' parameter to the FHEModelDev class.