Pruning

Pruning is a method to reduce neural network complexity, usually applied in order to reduce computation cost or memory size. Pruning is used in Concrete-ML to control the size of accumulators in neural networks, thus making them FHE-compatible. See here for an explanation of the accumulator bit-width constraints.

In neural networks, a neuron computes a linear combination of inputs and learned weights, then applies an activation function.

[Figure: Artificial Neuron (from: Wikipedia)]

The neuron computes:

$$y_k = \phi\left(\sum_i w_i x_i\right)$$
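As a minimal sketch (plain NumPy, not the Concrete-ML API), this computation can be written as follows, assuming a ReLU activation for $\phi$:

```python
import numpy as np

def relu(v):
    # Activation function phi: rectified linear unit
    return np.maximum(v, 0)

def neuron_output(weights, inputs):
    # y_k = phi(sum_i w_i * x_i)
    v_k = np.sum(weights * inputs)  # linear combination (the accumulator)
    return relu(v_k)

y = neuron_output(np.array([0.5, -1.0, 2.0]), np.array([1.0, 3.0, 0.5]))
print(y)
```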

When building a full neural network, each layer will contain multiple neurons, which are connected to the neuron outputs of a previous layer or to the inputs.

[Figure: Fully Connected Neural Network]

For every neuron shown in each layer of the figure above, the linear combination of inputs and learned weights is computed. Depending on the values of the inputs and weights, the sum $v_k = \sum_i w_i x_i$ - which for Concrete-ML neural networks is computed with integers - can take a range of different values.
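For illustration, the following sketch (with assumed 2-bit integer weights and 2-bit integer inputs, not Concrete-ML defaults) shows the integer accumulator of one neuron and its worst-case magnitude, which grows with the number of connected inputs:

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs = 100
# Integer weights in [-2, 1] (2-bit signed) and inputs in [0, 3] (2-bit unsigned)
w = rng.integers(-2, 2, size=n_inputs)
x = rng.integers(0, 4, size=n_inputs)

v_k = np.sum(w * x)  # the integer accumulator for this neuron
worst_case = np.sum(np.abs(w) * 3)  # largest possible |v_k| for these weights
print(v_k, worst_case)
```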

Pruning a neural network entails fixing some of the weights $w_k$ to be zero during training. This is advantageous to meet FHE constraints, as irrespective of the distribution of $x_i$, multiplying these input values by 0 does not increase the accumulator value.

Fixing some of the weights to 0 makes the network graph look more similar to the following:

[Figure: Pruned Fully Connected Neural Network]
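Concrete-ML performs this pruning automatically for its built-in neural networks. As a rough sketch of the underlying idea, PyTorch's pruning utilities can force a fraction of a layer's weights to zero:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(100, 10)

# Force 60% of this layer's weights to zero (L1 magnitude criterion)
prune.l1_unstructured(layer, name="weight", amount=0.6)

# The pruned weights stay at zero: they do not contribute to the accumulator
n_zero = int((layer.weight == 0).sum())
print(f"{n_zero} of {layer.weight.numel()} weights are zero")
```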

To respect the bit-width constraint of the FHE Table Lookup, the values of the accumulator $v_k$ must remain small to be representable with only 8 bits. In other words, the values must be between 0 and 255.
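A back-of-the-envelope check (a sketch under the assumed 2-bit quantization above, not the exact Concrete-ML computation) shows how zeroing weights reduces the accumulator bit-width:

```python
import numpy as np

def accum_bits(n_active, w_max=2, x_max=3):
    # Bits needed for the worst-case accumulator magnitude when n_active
    # weights are non-zero, with |w_i| <= w_max and 0 <= x_i <= x_max
    worst = n_active * w_max * x_max
    return int(np.ceil(np.log2(worst + 1)))

print(accum_bits(100))  # 100 active weights: 10 bits, too many for the constraint
print(accum_bits(40))   # with 60% of weights pruned to zero: 8 bits
```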

While pruning weights can reduce the prediction performance of the neural network, studies show that a high level of pruning (above 50%) can often be applied. See here for how Concrete-ML uses pruning in Fully Connected Neural Networks.
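As a sketch, a built-in Concrete-ML quantized neural network, where pruning is driven by the accumulator bit-width setting, might be configured as follows; the parameter names are taken from the Concrete-ML quantized neural network documentation and may differ between versions:

```python
import torch.nn as nn
from concrete.ml.sklearn.qnn import NeuralNetClassifier

model = NeuralNetClassifier(
    module__n_layers=3,
    module__n_w_bits=2,       # weight quantization bits
    module__n_a_bits=2,       # activation quantization bits
    module__n_accum_bits=8,   # accumulator bit-width, enforced through pruning
    module__input_dim=23,
    module__n_outputs=2,
    module__activation_function=nn.ReLU,
    max_epochs=10,
)
```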
