Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers

From Robert Haas
Subject Re: WIP/PoC for parallel backup
Date
Msg-id CA+Tgmoap42j7Z296OA9OcOzFZtNMgBHNN851pk5rgJmVPSbQ1Q@mail.gmail.com
In response to Re: WIP/PoC for parallel backup  (Asif Rehman <asifr.rehman@gmail.com>)
Responses Re: WIP/PoC for parallel backup  (Asif Rehman <asifr.rehman@gmail.com>)
List pgsql-hackers
On Fri, Mar 27, 2020 at 1:34 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Yes, we are fetching a single file. However, SEND_FILES is still capable of fetching multiple files in one
> go, that's why the name.

I don't see why it should work that way. If we're fetching individual
files, why have an unused capability to fetch multiple files?

> 1- parallel backup does not work with a standby server. In parallel backup, the server
> spawns multiple processes and there is no shared state being maintained. So currently,
> no way to tell multiple processes if the standby was promoted during the backup since
> the START_BACKUP was called.

Why would you need to do that? As long as the process that runs
STOP_BACKUP can do the check, that seems good enough.
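
To make that concrete, here is a minimal sketch in Python rather than
server C. All names (Server, start_backup, stop_backup) are hypothetical
and are not taken from the patch. It only illustrates that the session
running STOP_BACKUP can compare the recovery state recorded at
START_BACKUP with the current one, so the worker processes that send
files need no shared state for this particular check.

```python
class Server:
    """Toy stand-in for a server that may be a standby (hypothetical)."""

    def __init__(self, in_recovery):
        self.in_recovery = in_recovery

    def promote(self):
        # Promotion ends recovery; the server is now a primary.
        self.in_recovery = False


def start_backup(server):
    # Remember whether the backup began while the server was in recovery.
    return {"started_in_recovery": server.in_recovery}


def stop_backup(server, state):
    # The only place the promotion check happens: file-sending workers
    # never need to look at this state.
    if state["started_in_recovery"] and not server.in_recovery:
        raise RuntimeError("standby was promoted during the backup")
    return "backup complete"
```

If the standby is promoted between the two calls, stop_backup() raises
and the backup is rejected as a whole, no matter how many workers
fetched files in between.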

> 2- throttling. Robert previously suggested that we implement throttling on the client-side.
> However, I found a previous discussion where it was advocated to be added to the
> backend instead[1].
>
> So, it was better to have a consensus before moving the throttle function to the client.
> That’s why for the time being I have disabled it and have asked for suggestions on it
> to move forward.
>
> It seems to me that we have to maintain a shared state in order to support taking backup
> from standby. Also, there is a new feature recently committed for backup progress
> reporting in the backend (pg_stat_progress_basebackup). This functionality was recently
> added via this commit ID: e65497df. For parallel backup to update these stats, a shared
> state will be required.

I've come around to the view that a shared state is a good idea and
that throttling on the server-side makes more sense. I'm not clear on
whether we need shared state only for throttling or whether we need it
for more than that. Another possible reason might be for the
progress-reporting stuff that just got added.
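
As a rough illustration of why throttling wants shared state, here is a
sketch in Python. SharedThrottle and delay_needed are hypothetical names,
not the server's actual throttling code. The assumption is that every
worker of one backup adds its bytes to a single shared counter, and each
worker sleeps whenever the combined total is ahead of the configured
rate.

```python
import threading


class SharedThrottle:
    """Rate limit shared by all workers of one backup (hypothetical)."""

    def __init__(self, max_bytes_per_sec):
        self.rate = max_bytes_per_sec
        self.sent = 0
        self.lock = threading.Lock()

    def delay_needed(self, nbytes, elapsed):
        """Record nbytes as sent and return how many seconds the caller
        should sleep, given the seconds elapsed since the backup began."""
        with self.lock:
            self.sent += nbytes
            # Time by which this many bytes are allowed to have been sent.
            target = self.sent / self.rate
        return max(0.0, target - elapsed)
```

With per-worker counters instead, N workers would together send at N
times the requested rate, which is the case the shared counter avoids.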

> Since multiple pg_basebackup can be running at the same time, maintaining a shared state
> can become a little complex, unless we disallow taking multiple parallel backups.

I do not see why it would be necessary to disallow taking multiple
parallel backups. You just need to have multiple copies of the shared
state and a way to decide which one to use for any particular backup.
I guess that is a little complex, but only a little.
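
One way to picture this, as a sketch only: keep the per-backup states in
a table keyed by a backup identifier, and have each worker name the
backup it belongs to. The Python below is illustrative. BackupRegistry,
BackupState, and the integer backup IDs are assumptions made for the
example, not part of the proposed protocol.

```python
import threading


class BackupState:
    """Shared state for one backup (hypothetical fields)."""

    def __init__(self, backup_id):
        self.backup_id = backup_id
        self.bytes_sent = 0
        self.lock = threading.Lock()

    def add_bytes(self, n):
        with self.lock:
            self.bytes_sent += n


class BackupRegistry:
    """Holds one BackupState per concurrently running backup."""

    def __init__(self):
        self._states = {}
        self._next_id = 1
        self._lock = threading.Lock()

    def start_backup(self):
        # Allocate a fresh state and hand its ID back to the client.
        with self._lock:
            backup_id = self._next_id
            self._next_id += 1
            self._states[backup_id] = BackupState(backup_id)
            return backup_id

    def attach(self, backup_id):
        # A worker presents the ID to find the state it should use.
        with self._lock:
            return self._states[backup_id]

    def stop_backup(self, backup_id):
        # Remove and return the state so its slot can be reused.
        with self._lock:
            return self._states.pop(backup_id)
```

Two pg_basebackup runs would then get two different IDs, and their
workers would update separate states without interfering with each
other.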

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


