Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers

From Asif Rehman
Subject Re: WIP/PoC for parallel backup
Date
Msg-id CADM=Jej4UxVHkR-gxV0eY0TkFeZAMKkYqQhY+kwbO12iSam+0Q@mail.gmail.com
In response to Re: WIP/PoC for parallel backup  (Asif Rehman <asifr.rehman@gmail.com>)
Responses Re: WIP/PoC for parallel backup
List pgsql-hackers
Hi,

I have created a commitfest entry.

On Mon, Feb 17, 2020 at 1:39 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
Thanks Jeevan. Here is the documentation patch.

On Mon, Feb 10, 2020 at 6:49 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
Hi Asif,

On Thu, Jan 30, 2020 at 7:10 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

Here are the updated patches, taking care of the issues pointed out
earlier. This patch adds the following commands (with the specified options):

START_BACKUP [LABEL '<label>'] [FAST]
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X']
                            [NOVERIFY_CHECKSUMS]


Parallel backup does not make any use of the tablespace map, so I have
removed that option from the above commands. There is a patch pending
to remove the exclusive backup; at that time we can further refactor the
do_pg_start_backup function to remove the tablespace information and move
the creation of the tablespace_map file to the client.


I have disabled the maxrate option for parallel backup. I intend to send
out a separate patch for it. Robert previously suggested implementing
throttling on the client side. I found the original email thread [1]
where throttling was proposed and added to the server. In that thread,
it was originally implemented on the client side, but following many
suggestions, it was moved to the server side.
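
As a rough illustration of what client-side throttling could look like
(this is not code from the patch; the function name, the units and the
choice of an average-rate check are assumptions made for this sketch), a
worker can compare the bytes it has received against the time the
configured rate would allow, and sleep for the difference:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: return how many microseconds
 * a worker should sleep so that its average rate stays at or below
 * maxrate_bytes_per_sec. Returns 0 if the worker is not ahead of schedule
 * or if throttling is disabled (rate <= 0).
 */
static int64_t
throttle_delay_usec(int64_t bytes_received, int64_t elapsed_usec,
                    int64_t maxrate_bytes_per_sec)
{
    int64_t     expected_usec;

    assert(bytes_received >= 0 && elapsed_usec >= 0);

    if (maxrate_bytes_per_sec <= 0)
        return 0;

    /*
     * Time the transfer should have taken at the configured rate. Whole
     * seconds and the remainder are converted separately to avoid
     * overflowing int64 on very large backups.
     */
    expected_usec = (bytes_received / maxrate_bytes_per_sec) * INT64_C(1000000)
        + (bytes_received % maxrate_bytes_per_sec) * INT64_C(1000000)
        / maxrate_bytes_per_sec;

    return (expected_usec > elapsed_usec) ? expected_usec - elapsed_usec : 0;
}
```

A worker would call this after each chunk it receives and sleep for the
returned duration before reading more data.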

So, I have a few suggestions on how we can implement this:

1- Have another option for pg_basebackup (i.e. per-worker-maxrate) where
the user could choose the bandwidth allocation for each worker. This approach
can be implemented on the client side as well as on the server side.

2- Divide the maxrate equally among the workers at first, and then let
the main thread keep adjusting it whenever one of the workers finishes.
I believe this would only be possible if we handle throttling on the client.
Also, as I understand it, implementing this will introduce an additional mutex
to protect the bandwidth consumption data, so that the rate may be adjusted
according to the data received by the threads.
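
To make the second suggestion more concrete, here is a minimal sketch of
the shared state it would need. Nothing here comes from the patch: the
RateShare struct and the rate_share_* functions are hypothetical names, and
the sketch only covers the equal split and the re-split when a worker
finishes, not adjustment based on the data each thread has actually received.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared state; none of these names come from the patch. */
typedef struct RateShare
{
    pthread_mutex_t lock;           /* protects the fields below */
    int64_t     maxrate;            /* total allowed rate, e.g. in kB/s */
    int         active_workers;     /* workers still transferring data */
} RateShare;

static void
rate_share_init(RateShare *rs, int64_t maxrate, int nworkers)
{
    assert(nworkers > 0);
    pthread_mutex_init(&rs->lock, NULL);
    rs->maxrate = maxrate;
    rs->active_workers = nworkers;
}

/* Current per-worker limit: an equal split among the active workers. */
static int64_t
rate_share_current(RateShare *rs)
{
    int64_t     share;

    pthread_mutex_lock(&rs->lock);
    share = (rs->active_workers > 0) ? rs->maxrate / rs->active_workers : 0;
    pthread_mutex_unlock(&rs->lock);
    return share;
}

/* Called when a worker finishes; the remaining workers get a larger share. */
static void
rate_share_worker_done(RateShare *rs)
{
    pthread_mutex_lock(&rs->lock);
    if (rs->active_workers > 0)
        rs->active_workers--;
    pthread_mutex_unlock(&rs->lock);
}
```

With a maxrate of 1200 and four workers, each worker would start at 300;
once one worker finishes, the remaining three would each see 400 the next
time they ask for their share.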


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca



The latest changes look good to me. However, the patch set is missing the
documentation. Please add it.

Thanks

--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company



