This section explains how to fine-tune neural networks and large language models (LLMs) on private data. Small models can be fine-tuned using a single-client/single-server setup. For larger models (e.g., GPT-2 and bigger), consider distributed computation, with multiple worker nodes performing the training on encrypted data, in order to keep latency manageable.
For a tutorial on applying FHE LoRA fine-tuning to a small neural network, see this notebook.
Concrete ML supports LoRA, a parameter-efficient fine-tuning (PEFT) approach, in the hybrid model paradigm. LoRA adds adapters, which contain a small number of fine-tunable weights, to the linear layers of the original model.
Concrete ML outsources the logic of the model's original forward and backward passes to one or more remote servers, while the forward and backward passes over the LoRA weights, the loss computation, and the weight updates are performed on the client side. Since the number of LoRA weights is small, this adds little computation time on the client machine that drives the training. For large LLMs, more than 99% of a model's weights can be outsourced.
The main benefit of hybrid-model LoRA training is outsourcing the computation of the linear layers. In LLMs these layers are large, and computing their inference and gradients requires significant hardware. With Concrete ML, these computations can be securely outsourced, removing the memory bottleneck that would otherwise constrain the client.
Concrete ML integrates with the peft package, which adds LoRA layer adapters to a model's linear layers. Here are the steps to convert a model to hybrid FHE LoRA training.
First, apply the peft LoRA layers. The LoraConfig class from the peft package contains the various LoRA parameters. It is possible to specify which layers have LoRA adapters through the target_modules argument. Please refer to the LoraConfig documentation for a reference on the various config options.
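For example, a GPT-2-style model could be wrapped with peft as follows. The model name and the target_modules value are illustrative and must match the actual architecture being fine-tuned:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the base model (GPT-2 is used here only as an example).
model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA configuration: rank, scaling, and which linear layers receive adapters.
# "c_attn" targets GPT-2's attention projections; other architectures use
# different module names (e.g., "q_proj"/"v_proj" for LLaMA-style models).
peft_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],
    bias="none",
    task_type="CAUSAL_LM",
)

# Wrap the base model: only the LoRA adapter weights remain trainable.
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()
```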
Next, Concrete ML requires a conversion step for the peft model, adding FHE-compatible layers. In this step, several fine-tuning parameters can be configured (see the sketch after the list below):
the number of gradient accumulation steps: for LoRA, it is common to accumulate gradients over several gradient descent steps before updating the weights
the optimizer parameters
the loss function
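A minimal sketch of this conversion step is shown below. It assumes the LoraTrainer helper from concrete.ml.torch.lora and its argument names (optimizer, loss_fn, training_args); the exact class and parameters may differ across Concrete ML versions, so check the API reference:

```python
import torch

# Assumed import path and class name; verify against your Concrete ML version.
from concrete.ml.torch.lora import LoraTrainer

# The optimizer and loss are owned by the client and apply only to the LoRA weights.
optimizer = torch.optim.AdamW(peft_model.parameters(), lr=2e-4)
loss_fn = torch.nn.CrossEntropyLoss()

# Hypothetical configuration: gradient accumulation steps, optimizer, and loss
# function are passed to the trainer that wraps the peft model with
# FHE-compatible layers.
lora_trainer = LoraTrainer(
    peft_model,
    optimizer=optimizer,
    loss_fn=loss_fn,
    training_args={"gradient_accumulation_steps": 2},
)
```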
Next, a hybrid FHE model must be compiled in order to convert the selected outsourced layers to use FHE. Other layers will be executed on the client side. The back-and-forth communication of encrypted activations and gradients may require significant bandwidth.
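Compilation uses a representative input set to quantize the outsourced layers. The snippet below is a sketch under the same assumptions as above; the inputset format and the n_bits parameter should be checked against the documentation:

```python
import torch

# Representative calibration data (hypothetical shapes: 4 sequences of 128 token ids).
input_ids = torch.randint(0, model.config.vocab_size, (4, 128))

# For causal-LM training, labels are typically the input ids themselves.
inputset = (input_ids, input_ids)

# Convert the selected outsourced layers to FHE; the remaining layers stay client-side.
lora_trainer.compile(inputset, n_bits=16)
```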
Finally, the hybrid model can be trained, much in the same way a PyTorch model is trained. The client is responsible for generating and iterating on training data batches.
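For instance, assuming the trainer sketched above, a training run could look like the following. The num_epochs and fhe parameter names are assumptions; fhe="execute" would run the outsourced layers under encryption, while fhe="disable" allows a fast plaintext dry run:

```python
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical training data prepared by the client.
train_dataset = TensorDataset(input_ids, input_ids)
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)

# The linear-layer forward/backward computations run on the server over
# encrypted data, while LoRA weight updates happen locally on the client.
# num_epochs and fhe are assumed parameter names.
lora_trainer.train(train_loader, num_epochs=2, fhe="execute")
```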
Once fine-tuned, the LoRA hybrid FHE model can perform inference only, through the model.inference_model attribute of the hybrid FHE model.
To compare against the original model, it is possible to disable the LoRA weights so that inference uses the original weights.
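For example, with peft the adapters can be temporarily switched off using the disable_adapter context manager; the comparison below is illustrative:

```python
import torch

peft_model.eval()
with torch.no_grad():
    # Inference with the fine-tuned LoRA adapters active.
    tuned_logits = peft_model(input_ids).logits

    # Temporarily disable the LoRA weights to run the original model.
    with peft_model.disable_adapter():
        original_logits = peft_model(input_ids).logits

# Quantify how much fine-tuning changed the model's predictions.
print(torch.mean(torch.abs(tuned_logits - original_logits)))
```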