Pruning
Pruning is a method to reduce neural network complexity, usually applied in order reduce the computation cost or memory size. Pruning is used in Concrete-ML to control the size of accumulators in neural networks, thus making them FHE compatible. See here for an explanation of the accumulator bitwidth constraints.
In neural networks, a neuron computes a linear combination of inputs and learned weights, then applies an activation function.
The neuron computes:
When building a full neural network, each layer will contain multiple neurons, which are connected to the neuron outputs of a previous layer or to the inputs.
Fixing some of the weights to 0 makes the network graph look more similar to the following:
While pruning weights can reduce the prediction performance of the neural network, studies show that a high level of pruning (above 50%) can often be applied. See here how Concrete-ML uses pruning in Fully Connected Neural Networks.
Last updated