Re: [ADMIN] Bad recovery: no pg_xlog/RECOVERYXLOG - Mailing list pgsql-admin

From Stephen Frost
Subject Re: [ADMIN] Bad recovery: no pg_xlog/RECOVERYXLOG
Date
Msg-id 20171106133745.GA4628@tamriel.snowman.net
In response to Re: [ADMIN] Bad recovery: no pg_xlog/RECOVERYXLOG  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Responses Re: [ADMIN] Bad recovery: no pg_xlog/RECOVERYXLOG
List pgsql-admin
Mark,

* Mark Kirkwood (mark.kirkwood@catalyst.net.nz) wrote:
> On 03/11/17 00:11, Stephen Frost wrote:
> >Sure, that'll work much of the time, but that's about like saying that
> >PG could run without fsync being enabled much of the time and everything
> >will be ok.  Both are accurate, but hopefully you'll agree that PG
> >really should always be run with fsync enabled.
>
> It is completely different - this is a 'straw man' argument, and
> just serves to confuse this discussion.

I don't see it as any different at all.  The point I was trying to make
there is that there's a minimum requirement for backups, just as there
is for ACID compliance, and any solution needs to meet that minimum to
be considered.

> The crux of your argument seems to be concerning the synchronization
> between pg_basebackup finishing and being sure you have the required
> archive logs. Now just so we are all clear, when pg_basebackup ends
> it essentially calls do_pg_stop_backup (from xlog.c) which ensures
> that all required WAL files are archived, or to be precise here
> makes sure archive_command has been run successfully for each
> required WAL file.

To be clear, pg_basebackup speaks the replication protocol and sends a
BASE_BACKUP message, one of whose options is 'NOWAIT', which controls
whether the server waits until all of the WAL has been archived.
Typically, pg_basebackup does send 'NOWAIT' to tell the server not to
hold up the final message until all of the WAL has been archived,
because pg_basebackup itself handles verifying that the WAL has been
archived.  In the unusual case that WAL isn't included with the
pg_basebackup, it looks like the server would wait for the
archive_command to complete, which is better than I had thought (and
hadn't noticed on my first glance through the code), though that does
depend on a functional and perfect archive_command, and there's no
shortage of reasons why that might not be the case at the time the
backup is happening.

That's an awful lot of action-at-a-distance hope for me to be
comfortable with, however.  A backup solution really does need to verify
that the WAL has been completely and reliably stored, as discussed in
the documentation, before claiming a backup is valid, and there's
basically no reason not to unless the tool you've chosen to use makes
that particularly difficult (even if not *technically* impossible, given
enough effort).  If your solution is built on the assumption that WAL
archiving is always working and there's no check happening during backup
to verify that you've got all the WAL then I have serious doubts about
it being reliable.  If you're independently monitoring that all WAL has
been archived, that's certainly helpful, but I don't consider that to be
a complete substitute for making sure that you've got all of the WAL for
a given backup.
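
To make "verify that you've got all the WAL" concrete, here is a minimal
Python sketch of that one check.  It is an illustration only and not a
substitute for a real backup tool.  The function names are hypothetical,
and it assumes PG 9.3 or later, the default 16MB WAL segment size, a
backup that stays on one timeline, and an archive that is a plain
directory of uncompressed segment files.  It only checks that the files
exist, not that their contents are intact or durably stored.

```python
import os

# Assumption: default 16MB WAL segments, so 256 segments per 4GB "xlogid".
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE


def parse_wal_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, segment number)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment file name: %r" % name)
    timeline = int(name[0:8], 16)
    xlogid = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, xlogid * SEGMENTS_PER_XLOGID + seg


def format_wal_name(timeline, segno):
    """Inverse of parse_wal_name."""
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def required_segments(start_name, stop_name):
    """List every segment name from the backup's start to its stop segment."""
    start_tl, start_seg = parse_wal_name(start_name)
    stop_tl, stop_seg = parse_wal_name(stop_name)
    if start_tl != stop_tl:
        raise ValueError("this sketch does not handle timeline switches")
    if stop_seg < start_seg:
        raise ValueError("stop segment precedes start segment")
    return [format_wal_name(start_tl, n) for n in range(start_seg, stop_seg + 1)]


def missing_segments(archive_dir, start_name, stop_name):
    """Return the required segments that are not present in archive_dir."""
    present = set(os.listdir(archive_dir))
    return [n for n in required_segments(start_name, stop_name) if n not in present]
```

A backup would only be declared valid if missing_segments() returns an
empty list for the start and stop segments that the backup reported.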

> Your entire argument seems about whether said WAL is fsync'ed to
> disk, and how this is impossible to ensure in a shell script.
[...]
> So it is clearly *possible*.

Yes, it's possible, but it's not something I'd recommend doing and none
of your arguments have made me any more likely to recommend trying to
ensure a proper backup has completed using shell scripts.  What I fail
to understand is your insistence on it being a good idea.  I've seen
lots and lots of attempts at it, even made some myself, and have come to
the generally agreed upon conclusion that it's both a bad idea to hack
together your own backup solution for PG and that, even if you do want
to try, using shell scripts to attempt to accomplish it is a bad idea.
There are much better solutions out there which are really what folks
should be using.  I'm not against using pg_basebackup either, but if
you're using it, let it handle the archiving because it does verify that
all of the WAL has been archived properly.

> Actually I was helping him get a *reliable* backup system, I think
> you misunderstood how swift changes the picture compared to a single
> server/single disk design.

I do understand the goals of things like swift and s3 and the intent
behind them to provide a better store than local disks, and I'm not
against using them, to be clear, but they only address one of the
requirements that I outlined for a reliable backup solution.  I mention
both requirements consistently so that, hopefully, those coming along
later to read these threads remember that it's not enough to verify
during a backup that all of the WAL has been archived; the WAL must
also have actually been fsync'd or otherwise written out to reliable
storage.
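
For readers wondering what "actually fsync'd" involves when the archive
is a local or mounted POSIX filesystem, the following Python sketch
shows the usual sequence: write to a temporary name, fsync the file,
rename it into place, then fsync the directory.  The function name and
the ".tmp" suffix are hypothetical, and none of this applies directly to
object stores such as swift or s3, which have their own durability
semantics.  Again, this illustrates the requirement and is not a
recommendation to build your own archiver.

```python
import os
import shutil


def archive_file_durably(src_path, archive_dir):
    """Copy src_path into archive_dir and flush it to stable storage."""
    name = os.path.basename(src_path)
    final_path = os.path.join(archive_dir, name)
    tmp_path = final_path + ".tmp"  # hypothetical temporary-name convention

    # Never overwrite a segment that has already been archived.
    if os.path.exists(final_path):
        raise FileExistsError(final_path)

    with open(src_path, "rb") as src, open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()             # push Python's buffer to the OS
        os.fsync(dst.fileno())  # push the OS cache to the storage device

    # The rename makes the file visible under its real name in one step.
    os.rename(tmp_path, final_path)

    # fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(archive_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return final_path
```

Only after this function returns without raising would it be reasonable
to report the segment as archived.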

Thanks!

Stephen
