Tree-based Models

Concrete-ML provides several of the most popular tree-based `classification` models that can be found in scikit-learn, and it also supports models from XGBoost, such as the XGBClassifier used in the example below.

Example

Here's an example of how to use the XGBClassifier model in FHE on a popular dataset, using some of scikit-learn's preprocessing tools. A more complete example can be found in the XGBClassifier notebook.

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from concrete.ml.sklearn.xgb import XGBClassifier

# Load the dataset
X, y = load_breast_cancer(return_X_y=True)

# Split it into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=8)

# Define our model
model = XGBClassifier(n_jobs=1, n_bits=3)

# Define the pipeline
# We will normalize the data and apply a PCA before fitting the model
pipeline = Pipeline([("standard_scaler", StandardScaler()), ("pca", PCA()), ("model", model)])

# Define the parameters to tune
param_grid = {
    "pca__n_components": [2, 5, 10, 15],
    "model__max_depth": [2, 3, 5],
    "model__n_estimators": [5, 10, 20],
}

# Instantiate the grid search with 5-fold cross validation on all available cores
grid = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1, scoring="accuracy")

# Launch the grid search
grid.fit(X_train, y_train)

# Print the best parameters found
print(f"Best parameters found: {grid.best_params_}")

# Output:
# Best parameters found: {'model__max_depth': 5, 'model__n_estimators': 10, 'pca__n_components': 5}

# Currently, we only focus on model inference in FHE.
# The data transformation is done in clear (on the client machine),
# while the model inference is done in FHE on a server.
# The pipeline can be split into 2 parts:
# 1. data transformation
# 2. estimator
best_pipeline = grid.best_estimator_
data_transformation_pipeline = best_pipeline[:-1]
model = best_pipeline[-1]

# Transform the train and test sets
X_train_transformed = data_transformation_pipeline.transform(X_train)
X_test_transformed = data_transformation_pipeline.transform(X_test)

# Evaluate the model on the test set in clear
y_pred_clear = model.predict(X_test_transformed)
print(f"Test accuracy in clear: {(y_pred_clear == y_test).mean():0.2f}")

# Output:
# Test accuracy in clear: 0.98

# Compile the model to FHE
model.compile(X_train_transformed)

# Perform the inference in FHE
# Warning: this will take a while. It is recommended to run this with a very small batch of
# examples first (e.g. N_TEST_FHE = 1)
# Note that here the encryption and decryption are done behind the scenes.
N_TEST_FHE = 1
y_pred_fhe = model.predict(X_test_transformed[:N_TEST_FHE], execute_in_fhe=True)

# Check that the FHE predictions are the same as the clear predictions
print(f"{(y_pred_fhe == y_pred_clear[:N_TEST_FHE]).sum()} "
      f"examples over {N_TEST_FHE} have a FHE inference equal to the clear inference.")

# Output:
# 1 examples over 1 have a FHE inference equal to the clear inference.
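When more than one sample is run in FHE, the final comparison is easier to read as a ratio. The sketch below is plain Python and does not depend on Concrete-ML; `agreement_rate` is a hypothetical helper name, and the two label lists are made-up stand-ins for `y_pred_fhe` and `y_pred_clear[:N_TEST_FHE]`.

```python
def agreement_rate(preds_a, preds_b):
    """Return the fraction of positions where two prediction sequences match."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction sequences must have the same length")
    if len(preds_a) == 0:
        raise ValueError("prediction sequences must not be empty")
    matches = sum(1 for a, b in zip(preds_a, preds_b) if a == b)
    return matches / len(preds_a)


# Hypothetical labels standing in for FHE and clear predictions
fhe_like = [1, 0, 1, 1]
clear_like = [1, 0, 0, 1]

# Three of the four positions match
print(f"Agreement: {agreement_rate(fhe_like, clear_like):.2f}")
```

The helper only needs `len` and iteration, so it should also accept the NumPy arrays returned by `predict`.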

Using the XGBClassifier example above, we can plot how the model classifies the inputs and compare the results with the XGBoost model executed in clear. A 6-bit model is also shown to better illustrate the impact of quantization on classification. Similar plots can be found in the Classifier Comparison notebook.

Comparison of classification decision boundaries between FHE and plaintext models

This shows the impact of quantization on the decision boundaries of the FHE models. The effect is most visible with the 3-bit model, where only three main decision boundaries can be observed, which results in an accuracy decrease of about 7% compared to the initial XGBoost classifier. With 6 bits of quantization, the model reaches 93% accuracy and the difference shrinks to only 1.7%.
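To give a feel for why the number of bits matters, here is a minimal sketch of uniform quantization in plain Python. It is an illustration under stated assumptions, not Concrete-ML's actual quantization scheme: the `quantize` and `dequantize` helpers and the evenly spaced `feature` values are hypothetical.

```python
def quantize(values, n_bits):
    """Map floats onto 2**n_bits evenly spaced integer levels (assumed uniform scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    scale = (hi - lo) / (2 ** n_bits - 1)
    return [round((v - lo) / scale) for v in values]


def dequantize(levels, lo, hi, n_bits):
    """Map integer levels back to approximate float values."""
    scale = (hi - lo) / (2 ** n_bits - 1)
    return [lo + q * scale for q in levels]


def max_quantization_error(values, n_bits):
    """Largest absolute gap between a value and its quantized reconstruction."""
    levels = quantize(values, n_bits)
    restored = dequantize(levels, min(values), max(values), n_bits)
    return max(abs(v - r) for v, r in zip(values, restored))


# 101 evenly spaced hypothetical feature values in [0, 1]
feature = [i / 100 for i in range(101)]

for n_bits in (3, 6):
    n_levels = len(set(quantize(feature, n_bits)))
    error = max_quantization_error(feature, n_bits)
    print(f"{n_bits} bits: {n_levels} distinct levels, max error {error:.4f}")
```

With 3 bits, the feature can take at most 8 distinct values, against 64 with 6 bits, so a tree has far fewer places where it can put a split.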

The quantization process can introduce artifacts that reduce performance. For small tree-based models, however, the impact of these artifacts is often minor, so FHE models reach scores similar to those of their clear equivalents.
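One kind of artifact can be illustrated with a single split. The sketch below compares a decision stump applied to float inputs with the same stump applied after both the inputs and the threshold are uniformly quantized. This is a toy model under assumptions: `to_level`, `stump_agreement`, the threshold 0.37, and the sample grid are all hypothetical, and Concrete-ML may quantize inputs and thresholds differently.

```python
def to_level(value, lo, hi, n_bits):
    """Uniformly quantize one value in [lo, hi] to an integer level."""
    scale = (hi - lo) / (2 ** n_bits - 1)
    return round((value - lo) / scale)


def stump_agreement(samples, threshold, n_bits, lo=0.0, hi=1.0):
    """Fraction of samples where a float stump and its quantized version agree."""
    q_threshold = to_level(threshold, lo, hi, n_bits)
    agree = 0
    for x in samples:
        float_decision = x <= threshold
        quantized_decision = to_level(x, lo, hi, n_bits) <= q_threshold
        agree += float_decision == quantized_decision
    return agree / len(samples)


# Hypothetical inputs: a fine grid over [0, 1] and an arbitrary split point
samples = [i / 1000 for i in range(1001)]
threshold = 0.37

for n_bits in (3, 6):
    rate = stump_agreement(samples, threshold, n_bits)
    print(f"{n_bits} bits: stump decisions agree on {rate:.1%} of samples")
```

Samples that fall between the float threshold and the nearest quantized level boundary are the ones that flip, which is why the disagreement shrinks as the number of bits grows.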
