# Tree-based Models
Concrete ML provides several of the most popular classification and regression tree models that can be found in scikit-learn:

| Concrete ML | scikit-learn |
| --- | --- |
| DecisionTreeClassifier | DecisionTreeClassifier |
| DecisionTreeRegressor | DecisionTreeRegressor |
| RandomForestClassifier | RandomForestClassifier |
| RandomForestRegressor | RandomForestRegressor |
Concrete ML also supports XGBoost's XGBClassifier:

| Concrete ML | XGBoost |
| --- | --- |
| XGBClassifier | XGBClassifier |
For a formal explanation of the mechanisms that enable FHE-compatible decision trees, please see the following paper: *Privacy-Preserving Tree-Based Inference with Fully Homomorphic Encryption*, [arXiv:2303.01254](https://arxiv.org/abs/2303.01254).
Here's an example of how to use this model in FHE on a popular data-set using some of scikit-learn's pre-processing tools. A more complete example can be found in the XGBClassifier notebook.
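A minimal sketch of that workflow is shown below. As an assumption, it uses the scikit-learn breast cancer data-set as a stand-in and illustrative hyper-parameter values; the notebook may use a different data-set and settings:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from concrete.ml.sklearn import XGBClassifier

# Load a small tabular data-set and scale the features
# (data-set choice is an assumption, for illustration only)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train a quantized XGBoost classifier: n_bits controls the
# quantization bit-width, the other hyper-parameters are illustrative
model = XGBClassifier(n_bits=6, n_estimators=50, max_depth=4)
model.fit(X_train, y_train)

# Compile the model to an FHE circuit, using the training data
# to calibrate quantization
model.compile(X_train)

# Run encrypted inference on a few test samples
y_pred_fhe = model.predict(X_test[:10], fhe="execute")
print(y_pred_fhe)
```

In this sketch, `fhe="execute"` runs true encrypted inference; during development, `fhe="simulate"` returns the quantized model's predictions without the FHE runtime cost.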
Similarly, the decision boundaries of the Concrete ML model can be plotted and compared to the results of the classical XGBoost model executed in the clear. A 6-bit model is shown in order to illustrate the impact of quantization on classification. Similar plots can be found in the Classifier Comparison notebook.
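Below is a hedged sketch of how such a comparison could be produced. It assumes a two-feature synthetic data-set (`make_moons`) so the boundary can be plotted, and uses FHE simulation to evaluate the grid quickly; the notebook's exact data and styling may differ:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from xgboost import XGBClassifier as XGBClassifierClear

from concrete.ml.sklearn import XGBClassifier

# Two-feature data-set so the decision boundary can be visualized
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Train the floating point reference model and the quantized 6-bit model
clear_model = XGBClassifierClear(n_estimators=50, max_depth=4).fit(X, y)
fhe_model = XGBClassifier(n_bits=6, n_estimators=50, max_depth=4)
fhe_model.fit(X, y)
fhe_model.compile(X)

# Evaluate both models on a grid covering the feature space
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 100),
)
grid = np.c_[xx.ravel(), yy.ravel()]

# FHE simulation returns the quantized model's predictions quickly
z_clear = clear_model.predict(grid).reshape(xx.shape)
z_fhe = fhe_model.predict(grid, fhe="simulate").reshape(xx.shape)

# Plot the two decision boundaries side by side
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, z, title in zip(axes, [z_clear, z_fhe], ["XGBoost (clear)", "Concrete ML (6-bit)"]):
    ax.contourf(xx, yy, z, alpha=0.4)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k", s=15)
    ax.set_title(title)
plt.show()
```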
The graph above shows that, when using a sufficiently high bit-width, quantization has little impact on the decision boundaries of the Concrete ML FHE decision tree models. Because quantization is done individually on each input feature, its impact is strongly reduced, and FHE tree-based models thus reach accuracy similar to that of their floating point equivalents. Using 6 bits for quantization makes the Concrete ML model reach or exceed the floating point accuracy. The number of bits used for quantization can be adjusted through the `n_bits` parameter.
When `n_bits` is set low, the quantization process may introduce artifacts that decrease model performance, but FHE execution becomes faster. This makes it possible to adjust the accuracy/speed trade-off, and some accuracy can be recovered by increasing `n_estimators`.
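One way to explore this trade-off is to sweep `n_bits` and measure accuracy using FHE simulation. The sketch below is illustrative only; it assumes the breast cancer data-set and example hyper-parameters:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from concrete.ml.sklearn import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Sweep the quantization bit-width and measure accuracy in simulation
for n_bits in [2, 3, 4, 5, 6]:
    model = XGBClassifier(n_bits=n_bits, n_estimators=50, max_depth=4)
    model.fit(X_train, y_train)
    model.compile(X_train)
    acc = accuracy_score(y_test, model.predict(X_test, fhe="simulate"))
    print(f"n_bits={n_bits}: accuracy={acc:.3f}")
```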
The following graph shows that using 5-6 bits of quantization is usually sufficient to reach the performance of a non-quantized XGBoost model on floating point data. The metrics plotted are accuracy and F1-score on the `spambase` data-set.
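As a sketch of how such metrics could be reproduced, the snippet below fetches `spambase` from OpenML and compares a 6-bit Concrete ML model (evaluated in simulation) against floating point XGBoost. The train/test split and hyper-parameters are assumptions and may differ from those used for the plotted graph:

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier as XGBClassifierClear

from concrete.ml.sklearn import XGBClassifier

# Fetch the spambase data-set from OpenML; labels arrive as strings
X, y = fetch_openml(name="spambase", version=1, return_X_y=True, as_frame=False)
y = y.astype(np.int64)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Floating point reference model
clear_model = XGBClassifierClear(n_estimators=50, max_depth=4).fit(X_train, y_train)

# Quantized 6-bit model, evaluated with FHE simulation
fhe_model = XGBClassifier(n_bits=6, n_estimators=50, max_depth=4)
fhe_model.fit(X_train, y_train)
fhe_model.compile(X_train)

# Compare accuracy and F1-score for both models
for name, y_pred in [
    ("XGBoost (clear)", clear_model.predict(X_test)),
    ("Concrete ML (6-bit)", fhe_model.predict(X_test, fhe="simulate")),
]:
    print(f"{name}: accuracy={accuracy_score(y_test, y_pred):.3f}, "
          f"F1={f1_score(y_test, y_pred):.3f}")
```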