Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers

From Asif Rehman
Subject Re: WIP/PoC for parallel backup
Date
Msg-id CADM=JeiGCVEzLY4WmAX6PxtzZ8c5oo46F7rKOxXuOsMTSELCjQ@mail.gmail.com
In response to Re: WIP/PoC for parallel backup  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: WIP/PoC for parallel backup  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Thu, Apr 2, 2020 at 4:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 27, 2020 at 1:34 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Yes, we are fetching a single file. However, SEND_FILES is still capable of fetching multiple files in one
> go, that's why the name.

I don't see why it should work that way. If we're fetching individual
files, why have an unused capability to fetch multiple files?

Okay, I will rename it and modify the function to send only a single file.


> 1- parallel backup does not work with a standby server. In parallel backup, the server
> spawns multiple processes and there is no shared state being maintained. So currently,
> no way to tell multiple processes if the standby was promoted during the backup since
> the START_BACKUP was called.

Why would you need to do that? As long as the process where
STOP_BACKUP runs can do the check, that seems good enough.


Yes, but the user will get the error only after STOP_BACKUP, not while the backup is
in progress. So if the backup is a large one, early error detection would be very beneficial.
This is the current behavior of non-parallel backup as well.
 

> 2- throttling. Robert previously suggested that we implement throttling on the client-side.
> However, I found a previous discussion where it was advocated to be added to the
> backend instead[1].
>
> So, it was better to have a consensus before moving the throttle function to the client.
> That’s why for the time being I have disabled it and have asked for suggestions on it
> to move forward.
>
> It seems to me that we have to maintain a shared state in order to support taking backup
> from standby. Also, there is a new feature recently committed for backup progress
> reporting in the backend (pg_stat_progress_basebackup). This functionality was recently
> added via this commit ID: e65497df. For parallel backup to update these stats, a shared
> state will be required.

I've come around to the view that a shared state is a good idea and
that throttling on the server-side makes more sense. I'm not clear on
whether we need shared state only for throttling or whether we need it
for more than that. Another possible reason might be for the
progress-reporting stuff that just got added.

Okay, then I will add the shared state. And since we are adding the shared state, we can use
that for throttling, progress-reporting and standby early error checking.
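
To make the three uses concrete, here is a rough model of what one backup's shared state might track. It is written in Python only to show the bookkeeping; the real thing would be a struct in shared memory in the backend. Every name here (SharedBackupState, account, mark_promoted and so on) is made up for illustration, and the throttling rule (sleep until the cumulative byte count fits under the rate limit) is just one simple possibility, not what the existing throttle code does.

```python
import threading


class BackupPromotedError(Exception):
    """Raised to a worker when the standby was promoted mid-backup."""


class SharedBackupState:
    """Toy stand-in for the per-backup state shared by all workers."""

    def __init__(self, total_bytes, max_rate_bytes_per_sec, start_time):
        self._lock = threading.Lock()
        self.total_bytes = total_bytes        # estimated backup size
        self.bytes_sent = 0                   # summed over all workers
        self.max_rate = max_rate_bytes_per_sec
        self.start_time = start_time
        self.promoted = False                 # set if the standby is promoted

    def mark_promoted(self):
        with self._lock:
            self.promoted = True

    def account(self, nbytes, now):
        """Record nbytes sent by one worker; return seconds it should sleep.

        Raises BackupPromotedError so the worker can fail early instead of
        waiting for STOP_BACKUP to report the problem.
        """
        with self._lock:
            if self.promoted:
                raise BackupPromotedError("standby was promoted during backup")
            self.bytes_sent += nbytes
            # Earliest time by which this much data may have been sent.
            allowed_at = self.start_time + self.bytes_sent / self.max_rate
            return max(0.0, allowed_at - now)

    def progress(self):
        """Fraction of the backup sent so far, for progress reporting."""
        with self._lock:
            return self.bytes_sent / self.total_bytes
```

Because the byte counter is shared, the rate limit applies to the backup as a whole rather than per worker, and the same counter can feed the progress view.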


> Since multiple pg_basebackup can be running at the same time, maintaining a shared state
> can become a little complex, unless we disallow taking multiple parallel backups.

I do not see why it would be necessary to disallow taking multiple
parallel backups. You just need to have multiple copies of the shared
state and a way to decide which one to use for any particular backup.
I guess that is a little complex, but only a little.

There are two possible options:

(1) The server generates a unique ID, i.e. BackupID=<unique_string>, or
(2) (Preferred option) the WAL start location is used as the BackupID.


This BackupID should be returned in the response to the start backup command. All client workers
must append this ID to all parallel backup replication commands, so that the server can use the
identifier to look up the shared state for that particular backup. Does that sound good?
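
A minimal sketch of the lookup I have in mind, again in Python only to model the idea. BackupRegistry and its methods are hypothetical names, the per-backup state is reduced to a small dict, and the LSN strings are sample values. The sketch assumes the WAL start location is unique among concurrently running backups; it raises on a collision rather than guessing what should happen.

```python
import threading


class BackupRegistry:
    """Toy model: one shared-state slot per running backup, keyed by BackupID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._backups = {}

    def start_backup(self, start_lsn):
        """Register a backup and return the ID the client must send back."""
        backup_id = start_lsn  # option (2): WAL start location as the ID
        with self._lock:
            if backup_id in self._backups:
                raise ValueError("backup ID already in use: " + backup_id)
            self._backups[backup_id] = {"bytes_sent": 0}
        return backup_id

    def lookup(self, backup_id):
        """Find the state for the ID a worker appended to its command."""
        with self._lock:
            return self._backups[backup_id]  # KeyError if unknown

    def stop_backup(self, backup_id):
        """Release the slot once the backup has finished."""
        with self._lock:
            return self._backups.pop(backup_id)
```

Two backups running at the same time simply get two entries, and each worker finds its own backup through the ID it was handed at start.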
 

 
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
