Encrypted fine-tuning
This document explains how to fine-tune neural network models and large language models (LLMs) on private data.
Small models can be fine-tuned using a single-client/single-server setup. For larger models (GPT-2 and above), consider distributing the encrypted training computation across multiple worker nodes to keep latency manageable.
Overview
Refer to this notebook for a tutorial on applying FHE LoRA fine-tuning to a small neural network.
Concrete ML supports LoRA, a parameter-efficient fine-tuning (PEFT) approach, in the hybrid model paradigm. LoRA adds adapter layers, which contain a small number of trainable parameters, to the linear layers of a base model.
In this setup, Concrete ML outsources the computationally intensive parts of forward and backward passes for large models to one or more remote servers. The training client machine only handles the LoRA-adapter forward/backward passes, loss computation, and adapter weight updates. Since the LoRA adapters are small, this additional computation on the client side is minimal. For large LLMs, over 99% of the model's weights can remain outsourced.
The main benefit of hybrid-model LoRA training is outsourcing the computation of linear layers, which are typically large in LLMs. These layers require substantial hardware for inference and gradient computation. By securely outsourcing this work to a server, Concrete ML removes the client-side memory bottleneck that previously limited such operations.
Usage
Concrete ML integrates with the peft package to add LoRA adapters to a model's linear layers. Below are the steps to convert a model into a hybrid FHE LoRA training setup.
1. Apply the peft LoRA layers
The LoraConfig class from the peft package contains the various LoRA parameters. You can specify which layers have LoRA adapters through the target_modules argument. For a detailed reference of the various configuration options, refer to the LoraConfig documentation.
2. Convert the LoRA model to use custom Concrete ML layers
Next, we need to integrate the LoRA-adapted peft_model into the Concrete ML hybrid FHE training framework. This is done using the LoraTrainer class, which handles the logic of encrypting outsourced computations, running the forward and backward passes, and updating the LoRA adapter weights.
You can configure:
The loss function.
The optimizer and its parameters.
Gradient accumulation steps (if needed).
3. Compile a hybrid FHE model for the LoRA-adapted PyTorch model
Before training in FHE, we need to compile the model. Compilation calibrates and converts the outsourced linear layers to their FHE equivalents. The compile method uses representative data for this step.
At this point, the trainer has a hybrid FHE model ready for encrypted execution of the outsourced layers. The LoRA layers remain on the client side in the clear.
4. Train the model on private data
You can now train the hybrid FHE model with your private data. The train method will run forward and backward passes, updating only the LoRA adapter weights locally while securely outsourcing the main layers’ computations.
Additional options
Inference
Once fine-tuned, the LoRA hybrid FHE model can be used for inference through the peft_model attribute of the hybrid FHE model.
Toggle LoRA layers
To compare to the original model, you can disable the LoRA weights to use the original model for inference.