Re: Missing important information in backup.sgml - Mailing list pgsql-docs
From | Gunnar \"Nick\" Bluth |
---|---|
Subject | Re: Missing important information in backup.sgml |
Date | |
Msg-id | d1b49a36-6e90-d8c6-6242-66d26ae32b6f@pro-open.de Whole thread Raw |
In response to | Re: Missing important information in backup.sgml (Stephen Frost <sfrost@snowman.net>) |
List | pgsql-docs |
Am 29.11.2016 um 14:43 schrieb Stephen Frost: > Greetings, > > * Gunnar "Nick" Bluth (gunnar.bluth@pro-open.de) wrote: >> Am 23.11.2016 um 21:41 schrieb Stephen Frost: >>> * Gunnar "Nick" Bluth (gunnar.bluth@pro-open.de) wrote: >>> One of the very important things that should be done as part of a backup >>> is to ensure that all of the archive files required to restore the >>> database to a consistent state are safely stored in the archive. If >>> that isn't done then it's possible that an incomplete archive may also >>> render backups invalid. >> >> Well, the need to have a complete archive is described in the docs >> already. Maybe the potential consequences of an incomplete archive >> should be pointed or more drastically...? > > We should probably add to "step 5" something about "verify that all WAL > files have been archived between the start and stop backup" or similar. +1 >> Now, the main purpose of my patch was to document a behaviour that many >> of us have run into, namely that FATAL error showing up in the log when >> the archive_command exits with RC > 127. It's a nuisance only, but it >> does send people on false tracks and should at least be mentioned in the >> documentation. > > I agree that we should add information to the documentation that certain > error codes are handled differently. > >> And since a couple of people does use rsync (or some wrappers around it) >> for archiving, and that is notoriously giving RCs > 127, it seems legit >> to at least mention it, no? > > The low-level API documentation should be focused on the API and not how > to (mis)use the API using common unix commands. > >> diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml >> index 6eaed1e..a8f574e 100644 >> --- a/doc/src/sgml/backup.sgml >> +++ b/doc/src/sgml/backup.sgml >> @@ -587,7 +587,8 @@ tar -cf backup.tar /usr/local/pgsql/data >> the administrator specify a shell command to be executed to copy a >> completed segment file to wherever it needs to go. The command could be >> as simple as a <literal>cp</>, or it could invoke a complex shell >> - script — it's all up to you. >> + script — it's all up to you. There however are some things to consider >> + when creating such a command, most of which are covered below. >> </para> > > If we're changing this then we should remove the notion that it could be > "as simple as a cp" because it really should *not* be. That is a > dangerous and very poor recommendation to be making to our users. That thought creeped on me as well... ;-) >> <para> >> @@ -636,7 +637,11 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0 >> <productname>PostgreSQL</> will assume that the file has been >> successfully archived, and will remove or recycle it. However, a nonzero >> status tells <productname>PostgreSQL</> that the file was not archived; >> - it will try again periodically until it succeeds. >> + it will try again periodically until it succeeds. Note that an exit >> + status of 128 or higher will cause the archiver to exit, resulting in a >> + <literal>FATAL</> error in the server log. It will be restarted by the >> + postmaster and continue where it left. E.g., <command>rsync</> is known >> + for returning exit statuses of 255 on network issues. >> </para> > > I agree with adding documentation about what happens with different exit > status values. I don't believe we should make any mention of rsync. A > sophisticated enough user to understand exit codes should be able to > work out what the exit code is for whatever tool or tools they're using, > it's not on us to document what the exit codes are for every possible > tool (nor should one even use something as simple as a bare cp or rsync > in archive_command). Fair enough. > >> <para> >> @@ -696,6 +701,16 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0 >> preserve the file name (<literal>%f</>). >> </para> >> >> + <para> >> + Depending on your specific requirements (and datacenter layout), you may >> + want to make sure that the archived WAL segments have been written out to >> + persistent storage before the <command>archive_command</> returns. >> + Otherwise, a WAL segment that is assumed by <productname>PostgreSQL</> >> + to be archived could be recycled or removed prematurely, rendering your >> + archive incomplete (and thus disabling a recovery) in case of an outage of >> + the archiving destination. >> + </para> > > I dislike the tone of this. The wording I would suggest would be more > along the lines of: > > The archive_command should only return success once the WAL segment has > been completely copied, written out, and synced to persistent storage as > PostgreSQL will recycle or remove the segment shortly after the > archive_command returns. If the WAL segment is lost due to an outage of > the archive server or other issue, any backup which was performed during > the time that the WAL segment was written will be unusable and > Point-In-Time-Recovery from an earlier backup will not be usable past > the missing WAL segment. Sounds good to me. I'll incorparate that and send -v6 soon! -- Gunnar "Nick" Bluth RHCE/SCLA Mobil +49 172 8853339 Email: gunnar.bluth@pro-open.de _____________________________________________________________ In 1984 mainstream users were choosing VMS over UNIX. Ten years later they are choosing Windows over UNIX. What part of that message aren't you getting? - Tom Payne
Attachment
pgsql-docs by date: