Re: pg_basebackup + incremental base backups - Mailing list pgsql-general

From Stephen Frost
Subject Re: pg_basebackup + incremental base backups
Msg-id 20200521215306.GF3418@tamriel.snowman.net
In response to pg_basebackup + incremental base backups  (Christopher Pereira <kripper@imatronix.cl>)
List pgsql-general
Greetings,

* Christopher Pereira (kripper@imatronix.cl) wrote:
> On 21-May-20 08:43, Stephen Frost wrote:
> >* Christopher Pereira (kripper@imatronix.cl) wrote:
> >>Is there some way to rebuild the standby cluster by doing a differential
> >>backup of the primary cluster directly?
> >We've contemplated adding support for something like this to pgbackrest,
> >since all the pieces are there, but there hasn't been a lot of demand
> >for it and it kind of goes against the idea of having a proper backup
> >solution, really..  It'd also create quite a bit of load on the primary
> >to checksum all the files to do the comparison against what's on the
> >replica that you're trying to update, so not something you'd probably
> >want to do a lot more than necessary.
>
> We have backups of the whole server and only need a efficient way to rebuild
> the hot-standby cluster when pg_rewind is not able to do so.

Personally, I find myself more confident in what pgbackrest does to
remaster a former primary (using a delta restore), but a lot of that
really comes down to the question of: why did the primary fail?  If you
don't know that, I really wouldn't recommend using pg_rewind.

> I agree with your concerns about the increased load on the primary server,
> but this rebuilding process would only be done in case of emergency or
> during low load hours.
>
> pg_basebackup works fine but does not support differential/incremental
> backups which is a blocker.

pg_basebackup is missing an awful lot of other things: management of
backup rotation, WAL expiration, the ability to parallelize, encryption
support, the ability to push/fetch backups to/from cloud storage
solutions, the ability to resume failed backups, delta restore (which
is more-or-less what you're asking for), parallel archiving/fetching of
WAL, and so on.

> Do you know any alternative software that is able to rebuild the standby PG
> data dir using rsync or similar while the primary is still online?

pgbackrest can certainly rebuild the standby, if you're using it for
backups, and do so very quickly thanks to delta restore and its
parallelism.  I'm not aware of anything that does exactly what you're
looking for.
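
For illustration, a delta restore onto an existing standby data directory
is typically invoked along the lines sketched below.  This is only a
sketch under assumptions: the stanza name "main" and the worker count are
placeholders, and the helper function is hypothetical.  --delta,
--type=standby and --process-max are options of the pgbackrest restore
command; check them against the documentation for your version.

```python
import shlex


def pgbackrest_delta_restore_cmd(stanza, process_max=4):
    """Build the argv for a pgbackrest delta restore of a standby.

    `stanza` is whatever name the repository was created with;
    "main" in the demo below is only a placeholder.
    """
    if process_max < 1:
        raise ValueError("process_max must be >= 1")
    return [
        "pgbackrest",
        f"--stanza={stanza}",
        "--delta",                       # only rewrite files that differ from the backup
        "--type=standby",                # set up recovery so PG starts as a standby
        f"--process-max={process_max}",  # number of parallel restore workers
        "restore",
    ]


if __name__ == "__main__":
    # Print the command instead of running it; PostgreSQL has to be
    # stopped on the standby before a restore is attempted.
    print(shlex.join(pgbackrest_delta_restore_cmd("main")))
```

The function only assembles the command line, so it can be reviewed or
logged before anything touches the data directory.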

> It seems a simple pg_start_backup + rsync + pg_stop_backup (maybe combined
> with a LVM snapshot) would do, but we would prefer to use some existing
> tool.

I'd strongly recommend that you use an existing tool.  There are an awful
lot of complications, and you absolutely can *not* use rsync for that
unless you are doing it with checksums enabled, and even then it's
complicated: you probably don't want to sync across unlogged tables, but
it's not easy to exclude those, or temp files/tables; you have to make
sure to manage the WAL properly and ensure that the appropriate information
makes it into the backup_label (you shouldn't be using exclusive backup
because a reboot of the primary at the wrong time will result in PG not
starting up on the primary...), etc, etc.
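
To make the checksum point concrete: without --checksum, rsync by default
decides whether a file needs copying from its size and modification time.
The sketch below imitates that quick check in plain Python (it does not
call rsync, and the function names are made up for the example) and shows
two files with the same size and mtime but different contents, which the
quick check treats as identical while a checksum comparison does not.

```python
import hashlib
import os
import tempfile


def quick_check_same(a, b):
    """Size-and-mtime comparison (whole seconds), similar in spirit to
    rsync's default behaviour when --checksum is not given."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def checksum_same(a, b):
    """Compare two files by SHA-1 of their full contents."""
    def digest(path):
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()
    return digest(a) == digest(b)


def demo():
    """Return (quick_check_result, checksum_result) for two files that
    differ in content but share size and mtime."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src")
        dst = os.path.join(d, "dst")
        with open(src, "wb") as f:
            f.write(b"A" * 8192)   # one 8 kB "page" of As
        with open(dst, "wb") as f:
            f.write(b"B" * 8192)   # same size, different content
        # Force identical modification times on both files.
        os.utime(src, (1_000_000_000, 1_000_000_000))
        os.utime(dst, (1_000_000_000, 1_000_000_000))
        return quick_check_same(src, dst), checksum_same(src, dst)


if __name__ == "__main__":
    print(demo())
```

This covers only the file comparison; it says nothing about WAL handling,
backup_label, or excluding unlogged and temporary relations, which are the
other complications mentioned above.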

> We just tried barman, but it also seems to require a restore from the backup
> before being able to start the standby server (?), and we are afraid this
> would require double storage, IO and time for rebuilding the standby
> cluster.

I really think you should reconsider whatever backup solution you're
using today and rather than keeping it independent, make it part of the
solution to rebuilding replicas.

Maybe it isn't clear, so I'll try to explain: pgbackrest, if you use it
for your backups, will be able to restore over top of an existing PG
cluster, updating only those files which are different from what's in
the backup (based on checksums that it calculates), and is able to do so
in parallel, and then you can replay WAL from your pgbackrest repo,
right up until the replica is able to reconnect to the primary and
resume replaying WAL.  It's a pretty common approach and is supported by
HA solutions like patroni.
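
The idea behind a checksum-based delta restore can be sketched as below.
This is not pgbackrest's implementation, only a minimal illustration under
assumptions: the manifest is modelled as a plain dict of relative path to
SHA-1 digest, and plan_delta is a hypothetical helper that decides which
files would need to be fetched from the backup and which local files are
not part of the backup at all.

```python
import hashlib
import os
import tempfile


def sha1_of(path):
    """Return the SHA-1 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_delta(data_dir, manifest):
    """Compare data_dir against manifest (relative path -> SHA-1 hex).

    Returns (to_restore, to_remove): files that are missing or differ
    from the backup, and local files the backup does not contain.
    """
    to_restore = [
        rel for rel, want in sorted(manifest.items())
        if not os.path.isfile(os.path.join(data_dir, rel))
        or sha1_of(os.path.join(data_dir, rel)) != want
    ]
    present = {
        os.path.relpath(os.path.join(root, name), data_dir)
        for root, _dirs, files in os.walk(data_dir)
        for name in files
    }
    return to_restore, sorted(present - set(manifest))


def demo():
    """Build a tiny directory and manifest, then plan the delta."""
    with tempfile.TemporaryDirectory() as d:
        for name, data in (("a", b"same"), ("b", b"stale"), ("c", b"extra")):
            with open(os.path.join(d, name), "wb") as f:
                f.write(data)
        manifest = {
            "a": hashlib.sha1(b"same").hexdigest(),             # unchanged
            "b": hashlib.sha1(b"fresh").hexdigest(),            # differs locally
            "d": hashlib.sha1(b"missing locally").hexdigest(),  # absent locally
        }
        return plan_delta(d, manifest)


if __name__ == "__main__":
    print(demo())
```

Only the files in to_restore would have to be copied, which is why a delta
restore over a mostly intact data directory moves far less data than a
full restore.  Each file is independent of the others, so the comparison
and the copy can be spread across several workers.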

Thanks,

Stephen
