
Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers

From	Ibrar Ahmed
Subject	Re: WIP/PoC for parallel backup
Date	August 23, 2019 16:03:10
Msg-id	CALtqXTcsT5aoxcKK1shs+r37LO6ToPJ8feztSH6w-R0zMuQD2g@mail.gmail.com
In response to	Re: WIP/PoC for parallel backup (Asim R P <apraveen@pivotal.io>)
List	pgsql-hackers


On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:

Hi Asif

Interesting proposal. The bulk of the work in a backup is transferring files from the source data directory to the destination. Your patch breaks this task down into multiple sets of files and transfers each set in parallel. This seems correct; however, your patch also creates a new process to handle each set. Is that necessary? I think we should try to achieve this using multiple asynchronous libpq connections from a single basebackup process. That is, use the PQconnectStartParams() interface instead of PQconnectdbParams(), which is currently used by basebackup. On the server side, it may still result in multiple backend processes per connection, and an attempt should be made to avoid that as well, but it seems complicated.

What do you think?
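
To make the single-process idea above concrete, here is a minimal sketch in Python. It does not use libpq: local socket pairs stand in for backup connections, and the names drain_all and demo are hypothetical. It only illustrates the general pattern that a non-blocking interface enables, where one process services several connections from a single loop instead of starting one process per file set.

```python
import selectors
import socket


def drain_all(conns):
    """Read every connection to EOF from one thread using non-blocking sockets.

    conns maps a label to a connected socket. Returns label -> bytes received.
    """
    sel = selectors.DefaultSelector()
    received = {name: bytearray() for name in conns}
    for name, sock in conns.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)

    open_count = len(conns)
    while open_count:
        # Wait until at least one connection has data (or has reached EOF).
        for key, _events in sel.select():
            try:
                chunk = key.fileobj.recv(4096)
            except BlockingIOError:
                continue  # spurious wakeup; try again on the next select()
            if chunk:
                received[key.data] += chunk
            else:
                # Empty read means the peer closed the connection.
                sel.unregister(key.fileobj)
                key.fileobj.close()
                open_count -= 1
    sel.close()
    return {name: bytes(buf) for name, buf in received.items()}


def demo():
    """Simulate three 'file set' streams and drain them from one loop."""
    conns, expected = {}, {}
    for i in range(3):
        server_end, client_end = socket.socketpair()
        payload = f"file-set-{i}".encode() * 100  # small, fits the socket buffer
        server_end.sendall(payload)
        server_end.close()
        conns[f"conn{i}"] = client_end
        expected[f"conn{i}"] = payload
    return drain_all(conns) == expected
```

Calling demo() drains all three simulated streams from one thread and checks that each arrived intact. A real client would replace the socket pairs with actual server connections and write each stream to disk rather than buffering it in memory.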

The main question is what we really want to solve here. What is the
bottleneck, and which hardware do we want to saturate? I ask because
several hardware resources are involved in taking a backup (network, CPU,
disk). If the disk is already saturated, there is no need to add
parallelism, because we will be blocked on disk I/O anyway. I implemented
parallel backup in a separate application and got wonderful results. I have
only skimmed through the code, and I have some reservations: creating a
separate process only for copying data is overkill. There are two options:
non-blocking calls, or worker threads. But before doing that, we need to
identify the pg_basebackup bottleneck; after that, we can see what the best
way to solve it is. Some numbers would help to understand the actual benefit.
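
One way to gather such numbers, sketched here under stated assumptions: the Python script below copies a directory tree with a configurable number of worker threads and reports the bytes copied and the elapsed time for each worker count. It is not pg_basebackup and does not talk to a server; copy_tree_parallel and benchmark are hypothetical helper names, and a local file copy only approximates the disk side of a backup. If the elapsed time stops improving as workers are added, the storage is probably already saturated.

```python
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(src, dst, workers):
    """Copy every file under src to dst using a pool of worker threads.

    Returns (total_bytes_copied, elapsed_seconds).
    """
    src, dst = Path(src), Path(dst)
    files = [p for p in src.rglob("*") if p.is_file()]

    def copy_one(path):
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(path, target)
        return path.stat().st_size

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(copy_one, files))
    return total, time.perf_counter() - start


def benchmark(src, worker_counts=(1, 2, 4, 8)):
    """Run the copy once per worker count into a fresh temporary directory."""
    results = {}
    for n in worker_counts:
        with tempfile.TemporaryDirectory() as dst:
            results[n] = copy_tree_parallel(src, dst, n)
    return results


if __name__ == "__main__":
    import sys

    for n, (total, elapsed) in benchmark(sys.argv[1]).items():
        print(f"{n} workers: {total} bytes in {elapsed:.3f}s")
```

Note that after the first run the operating system's page cache will likely serve the source files from memory, so later runs may look faster than a cold backup would. Measurements meant to guide a design decision should control for that.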

Ibrar Ahmed
