Hi Asif
Interesting proposal. The bulk of the work in a backup is transferring files
from the source data directory to the destination. Your patch breaks this task
down into multiple sets of files and transfers each set in parallel. That seems
correct; however, the patch also creates a new process to handle each set. Is
that necessary? I think we should try to achieve this with multiple
asynchronous libpq connections from a single basebackup process, that is, by
using the PQconnectStartParams() interface instead of PQconnectdbParams(),
which basebackup currently uses. On the server side this may still result in
multiple backend processes, one per connection, and an attempt should be made
to avoid that as well, but that seems more complicated.
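To make the idea concrete, here is a rough, untested sketch of how a single
process could drive several connection attempts with PQconnectStartParams()
and PQconnectPoll() in one select() loop. The keyword/value pairs and
NUM_CONNS are only placeholders, not what basebackup would actually pass:

#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include "libpq-fe.h"

#define NUM_CONNS 4             /* arbitrary for the example */

int
main(void)
{
    /* Placeholder options; basebackup would pass its real keywords. */
    const char *keywords[] = {"dbname", NULL};
    const char *values[] = {"postgres", NULL};
    PGconn     *conns[NUM_CONNS];
    PostgresPollingStatusType status[NUM_CONNS];
    int         pending = NUM_CONNS;

    /* Start all connection attempts without blocking. */
    for (int i = 0; i < NUM_CONNS; i++)
    {
        conns[i] = PQconnectStartParams(keywords, values, 0);
        if (conns[i] == NULL || PQstatus(conns[i]) == CONNECTION_BAD)
        {
            fprintf(stderr, "could not start connection %d\n", i);
            exit(1);
        }
        /* Per the libpq docs, treat the initial state as "writing". */
        status[i] = PGRES_POLLING_WRITING;
    }

    /* Multiplex all the handshakes in one select() loop. */
    while (pending > 0)
    {
        fd_set      rfds, wfds;
        int         maxfd = -1;

        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        for (int i = 0; i < NUM_CONNS; i++)
        {
            if (conns[i] == NULL)
                continue;       /* already finished */
            int sock = PQsocket(conns[i]);

            if (status[i] == PGRES_POLLING_READING)
                FD_SET(sock, &rfds);
            else
                FD_SET(sock, &wfds);
            if (sock > maxfd)
                maxfd = sock;
        }

        if (select(maxfd + 1, &rfds, &wfds, NULL, NULL) < 0)
        {
            perror("select");
            exit(1);
        }

        for (int i = 0; i < NUM_CONNS; i++)
        {
            if (conns[i] == NULL)
                continue;
            int sock = PQsocket(conns[i]);

            if (!FD_ISSET(sock, &rfds) && !FD_ISSET(sock, &wfds))
                continue;       /* this socket was not ready yet */

            status[i] = PQconnectPoll(conns[i]);
            if (status[i] == PGRES_POLLING_OK ||
                status[i] == PGRES_POLLING_FAILED)
            {
                if (status[i] == PGRES_POLLING_FAILED)
                    fprintf(stderr, "connection %d failed: %s",
                            i, PQerrorMessage(conns[i]));
                /*
                 * Real code would hand an established connection to the
                 * copy logic here; the sketch just closes it.
                 */
                PQfinish(conns[i]);
                conns[i] = NULL;
                pending--;
            }
        }
    }
    return 0;
}

In principle the same kind of loop could then multiplex the data streams for
the individual file sets, though I have not tried that part.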
What do you think?
The main question is what we really want to solve here. What is the
bottleneck, and which piece of hardware do we want to saturate? I ask because
several hardware resources are involved in taking a backup (network, CPU,
disk). If the disk is already saturated, there is no point in adding
parallelism, because we will be blocked on disk I/O anyway. I implemented
parallel backup in a separate application and it gave very good results. I
have only skimmed through the code, but I have some reservations: creating a
separate process just for copying data seems like overkill. There are two
alternatives: non-blocking calls, or a pool of worker threads (see the rough
sketch at the end of this mail). But before doing either, we need to identify
pg_basebackup's actual bottleneck; after that we can decide on the best way to
address it. Some numbers would help us understand the actual benefit.
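Just to sketch the worker-thread alternative (this is not from your patch;
copy_one_file() and the file names are made-up placeholders), a thread pool
pulling work items from a shared list could look roughly like this:

#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4           /* arbitrary for the example */

/* Made-up file list; the real list would come from the server. */
static const char *files[] = {
    "base/1/1255",
    "base/1/2604",
    "pg_wal/000000010000000000000001"
};
static const int nfiles = sizeof(files) / sizeof(files[0]);

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int  next_file = 0;

/* Placeholder for whatever transfer routine would actually be used. */
static void
copy_one_file(const char *path)
{
    printf("copying %s\n", path);
}

/* Each worker repeatedly claims the next file and copies it. */
static void *
worker(void *arg)
{
    (void) arg;
    for (;;)
    {
        int     idx;

        pthread_mutex_lock(&lock);
        idx = next_file++;
        pthread_mutex_unlock(&lock);
        if (idx >= nfiles)
            break;              /* no work left */
        copy_one_file(files[idx]);
    }
    return NULL;
}

int
main(void)
{
    pthread_t   workers[NUM_WORKERS];

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&workers[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL);
    return 0;
}

Whether threads or non-blocking connections win probably depends on where the
bottleneck turns out to be, which is why I think we need those numbers first.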