For developers using FPGAs for the implementation of floating-point DSP functions, one key challenge is how to decompose the computation algorithm into sequences of parallel hardware processes while efficiently managing data flow through the parallel pipelines of these processes.