This guide explains dataflow parallelism and how it can improve the execution time of Concrete circuits.
Dataflow parallelism is particularly useful when the circuit performs computations that are neither completely independent (as in loop/doall parallelism) nor fully dependent (e.g. sequential, non-parallelizable code). In such cases, dataflow tasks can execute as soon as their inputs are available, thus minimizing over-synchronization.
Without dataflow parallelism, a circuit is executed operation by operation, as in an imperative language. If the operations themselves are not tensorized, loop parallelism is not utilized and the entire execution happens in a single thread. Dataflow parallelism changes this by analyzing the operations and their dependencies within the circuit to determine what can be done in parallel and what cannot. It then distributes the tasks that can be done in parallel to different threads.
For example:
This prints:
The reason for that is:
To summarize, dataflow analyzes the circuit to determine which parts of the circuit can be run at the same time, and tries to run as many operations as possible in parallel.
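The scheduling idea summarized above can be sketched in plain Python. This is only an illustration of the principle, not how Concrete or HPX are implemented: `run_dataflow`, the task table, and the names `a`, `b`, `c` are all hypothetical, and Python threads stand in for the real runtime's workers.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_dataflow(tasks, inputs, max_workers=4):
    """Run a dependency graph, starting each task once its inputs are ready.

    `tasks` maps a result name to (function, names of its arguments).
    `inputs` maps the names of the initial values to those values.
    """
    values = dict(inputs)   # results that are already available
    pending = dict(tasks)   # tasks not yet submitted
    running = {}            # future -> name of the result it produces
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # Submit every pending task whose inputs are all available.
            ready = [
                name
                for name, (_, deps) in pending.items()
                if all(dep in values for dep in deps)
            ]
            for name in ready:
                fn, deps = pending.pop(name)
                future = pool.submit(fn, *(values[dep] for dep in deps))
                running[future] = name
            if not running:
                raise ValueError("some tasks have unsatisfiable dependencies")
            # Block only until the first task finishes, then re-check.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                values[running.pop(future)] = future.result()
    return values


# `a` and `b` are independent, so they may run at the same time.
# `c` needs both, so it starts only after they have finished.
tasks = {
    "a": (lambda x: x + 1, ["x"]),
    "b": (lambda y: y * 2, ["y"]),
    "c": (lambda a, b: a + b, ["a", "b"]),
}
result = run_dataflow(tasks, {"x": 3, "y": 4})
print(result["c"])  # 12
```

The sketch waits only for the first finished task rather than for a whole batch, which is what avoids over-synchronization: a task is never held back by work it does not depend on.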
When the circuit is tensorized, dataflow might slow execution down, since the tensor operations already use multiple threads and adding dataflow on top creates contention on the CPU between HPX (the dataflow parallelism runtime) and OpenMP (the loop parallelism runtime). Try both settings before deciding whether to use dataflow.
This guide explains tensorization and how it can improve the execution time of Concrete circuits.
Tensors should be used instead of scalars when possible to maximize loop parallelism.
For example:
This prints:
Enabling dataflow parallelism effectively lets the runtime do this for you, so it would also help in this specific case.
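As a rough illustration of why tensors help: a single tensor-wide operation exposes all of its elements to the runtime at once, so the work can be split across threads, whereas scalar operations written one by one give the runtime nothing to split. The sketch below mimics this in plain Python. `square` and `apply_elementwise` are hypothetical names, not part of Concrete, and the thread pool only stands in for the loop-parallel runtime.

```python
from concurrent.futures import ThreadPoolExecutor


def square(value):
    # Stand-in for one costly per-element operation (hypothetical).
    return value ** 2


def apply_elementwise(op, tensor, max_workers=4):
    """Apply `op` to every element, spreading the elements across threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(op, tensor))


x, y, z = 1, 2, 3

# Scalar style: three separate operations, evaluated one after another.
scalar_total = square(x) + square(y) + square(z)

# Tensor style: one operation over a packed tensor, which can be split up.
tensor_total = sum(apply_elementwise(square, [x, y, z]))

print(scalar_total, tensor_total)  # 14 14
```

Both styles compute the same value. The difference is only in how much parallel work is visible in a single operation.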
This guide introduces the different options for parallelism in Concrete and how to utilize them to improve the execution time of Concrete circuits.
Modern CPUs have multiple cores to perform computation, and utilizing them is a great way to boost performance.
There are two kinds of parallelism in Concrete:
Loop parallelism to make tensor operations parallel, achieved by using OpenMP
Dataflow parallelism to make independent operations parallel, achieved by using HPX
Loop parallelism is enabled by default, as it's supported on all platforms. Dataflow parallelism, however, is only supported on Linux and is therefore not enabled by default.
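The platform rule above can be expressed as a small helper. This is a minimal sketch that assumes the Concrete configuration options are named `loop_parallelize` and `dataflow_parallelize`; check the names against your installed version. `parallelism_options` itself is a hypothetical helper, not part of Concrete.

```python
import sys


def parallelism_options(platform=sys.platform):
    """Pick parallelism settings following the platform rule described above."""
    return {
        # Loop parallelism is supported on all platforms.
        "loop_parallelize": True,
        # Dataflow parallelism is only supported on Linux.
        "dataflow_parallelize": platform.startswith("linux"),
    }


print(parallelism_options("linux"))
print(parallelism_options("darwin"))
```

If the option names match, such a dictionary could be passed as keyword arguments when building a Concrete configuration (for example `fhe.Configuration(**options)`). Because dataflow can slow down tensorized circuits, it is still worth timing the circuit with `dataflow_parallelize` both on and off.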