Re: pg_basebackup + incremental base backups - Mailing list pgsql-general

From Stephen Frost
Subject Re: pg_basebackup + incremental base backups
Date
Msg-id 20200526142030.GD3418@tamriel.snowman.net
In response to Re: pg_basebackup + incremental base backups  (Christopher Pereira <kripper@imatronix.cl>)
Responses Re: pg_basebackup + delta base backups  (Christopher Pereira <kripper@imatronix.cl>)
List pgsql-general
Greetings,

* Christopher Pereira (kripper@imatronix.cl) wrote:
> On 24-May-20 15:48, Stephen Frost wrote:
> >That really shouldn't be possible.  I'm very curious as to exactly what
> >happened that resulted in your primary/replica being 'out of sync', as
> >you say.
>
> Actually this was more a hypothetical question to find a solution in case
> some day one of our standby clusters goes out of sync and we have to rebuild
> it having a very big database.

"out of sync" is a bit of an odd concept, but having a replica fall
behind a long way is certainly something that can happen and may require
a rebuild from a backup (or from a new sync off of the primary in some
other way, as you suggest below).  In a situation where there's async
replication happening and you promote a replica to take over, that's
definitely a case where you might also have to rebuild the former
primary.

> With proper WAL archiving this shouldn't happen but we wanted to be prepared
> for this scenario just in case.

Sure, having WAL archiving and good backups is definitely my strong
recommendation for how to address any cases where the replica falls
behind, or the replica is promoted and you want to remaster the former
primary.

> We did some tests measuring IO and traffic and are very happy with the
> results. We will definitely be adding pgBackRest to our toolchain.

Glad to hear that.

> Regarding my initial question, I still believe that the world deserves a
> simple direct pg_basebackup replacement even when putting an additional
> "repo host" in the middle is a better idea in the long term.

Perhaps..

> As you said, all the pieces are there and it would be quite easy to write a
> new "pg_basebackup_delta" script that could be executed on the standby host
> to:
>
> 1) setup a pgBackRest repo on the primary host (via SSH)
>
> 2) create a backup on the primary host (via SSH)
>
> 3) do a delta restore on the standby
>
> Even when the repository on the primary host is only created temporarily
> (and require double storage, resources, etc), it may still be worth
> considering the traffic that can be saved by doing a delta restore on a
> standby host in a different region, right?

So...  There's actually a way to do this with pgbackrest, but it doesn't
support the delta capability.

We take care in pgbackrest to make the repo format for full backups
actually match exactly what a PG cluster would look like, specifically
because we wish to allow users to, if absolutely everything else fails
and pgbackrest is non-functional, rebuild from the repo.  To that point,
we even make sure that command-line tools like gzip, bzip2, openssl,
etc, work with the files we create.

For what you're asking about though, you would do something like:

- Set up a pgbackrest repo on the host you're rebuilding (the replica)
- Disable all repo compression, encryption, et al.
- Enable archive copy, so the WAL for the backup is put into the backup
- Enable parallel jobs
- Set up pgbackrest on the primary with the replica configured as the
  repo host, get WAL archiving working and such.
- Run stanza-create
- Perform a *full* backup, make sure everything works.
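As a rough sketch of the steps above (hostnames, paths, and the stanza
name "main" are placeholders, not anything pgbackrest requires; adjust
for your layout):

```ini
; On the replica being rebuilt, acting as the repository host:
;   /etc/pgbackrest/pgbackrest.conf
[global]
repo1-path=/var/lib/pgbackrest
compress-type=none        ; no repo compression, so files match PG's on-disk format
archive-copy=y            ; copy the WAL the backup needs into the backup itself
process-max=4             ; parallel jobs

[main]
pg1-host=primary.example.com
pg1-path=/var/lib/postgresql/12/main

; On the primary, pointing at the replica as the repo host:
;   /etc/pgbackrest/pgbackrest.conf
; [global]
; repo1-host=replica.example.com
; [main]
; pg1-path=/var/lib/postgresql/12/main
```

With archive_command on the primary set to something like
'pgbackrest --stanza=main archive-push %p', you'd then run
"pgbackrest --stanza=main stanza-create" followed by
"pgbackrest --stanza=main --type=full backup" on the replica to get the
full backup into the repo.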

Once that's done, you go find the pg_data directory inside the full
backup that you made in the pgbackrest repo and copy/move/rsync (with
checksums enabled!) it to where your PG data directory should be.  Then
set up a recovery.conf pointing to the primary, possibly also with a
restore_command pointing to that repo (which may or may not be needed,
depending on how far behind the replica is), and start up PG.  That
should allow PG to start, replay all of the necessary WAL, and then
connect to the primary and start streaming.
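A minimal recovery.conf sketch for that last step (connection details
and the stanza name "main" are placeholders; on PostgreSQL 12 and later
these settings go in postgresql.conf plus an empty standby.signal file
instead of a recovery.conf):

```ini
# recovery.conf in the new data directory (pre-PG12), hypothetical values
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'
restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'
```

The copy itself would be something along the lines of
"rsync -a --checksum" from the backup's pg_data directory inside the
repo to the target data directory.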

If you have any tablespaces, you'd need to deal with those too, of
course.  If you start moving things out of the pgbackrest repo, you're
corrupting it, naturally, so you wouldn't really want to continue using
it once you've gotten all of this done.

Again, this isn't an approach which I'd generally recommend.  We used to
do something similar when we were first writing pgbackrest: on a ZFS
filesystem, after a backup we'd create a snapshot off of the repo, start
PG right up on it, and then run pg_dump and other checks to make sure
the backup worked.  We've since moved to using pgbackrest delta restores
for that kind of use-case.

Thanks,

Stephen

