Re: pg_stop_backup does not complete - Mailing list pgsql-hackers

From Greg Smith
Subject Re: pg_stop_backup does not complete
Msg-id 4B85D41A.1070904@2ndquadrant.com
In response to Re: pg_stop_backup does not complete  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> The value of the HINT I think would be to make them (a) not afraid to
> hit control-C and (b) aware of the fact that their archiver has got
> a problem.
>
Agreed on both points.  Patch attached that implements something similar
to Josh's wording, tweaking the original warning too.  Here's what it
looks like when you run into the bad situation (which I easily simulated
with "archive_command='/bin/false'") from the client's perspective:

gsmith@meddle:~/pgwork/src/master/src$ psql -c "select pg_start_backup('test')"
 pg_start_backup
-----------------
 0/5000020
(1 row)

gsmith@meddle:~/pgwork/src/master/src$ psql
psql (9.0devel)
Type "help" for help.

gsmith=# select pg_stop_backup();
NOTICE:  pg_stop_backup cleanup done, waiting for required segments to archive
WARNING:  pg_stop_backup still waiting for all required segments to archive (60 seconds elapsed)
HINT:  Confirm your archive_command is executing successfully.  pg_stop_backup can be aborted safely, but the resulting backup will not be usable.
^CCancel request sent
ERROR:  canceling statement due to user request

And this is the sort of thing that shows up in the logs with default
logging behavior while all this is happening; you don't see the NOTICE,
but the WARNING and HINT are both there, which I think is good:

LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: /bin/false
WARNING:  transaction log file "000000010000000000000000" could not be archived: too many failures
WARNING:  pg_stop_backup still waiting for all required segments to archive (60 seconds elapsed)
HINT:  Confirm your archive_command is executing successfully.  pg_stop_backup can be aborted safely, but the resulting backup will not be usable.

Does this solve the logging side of this?  You can still make a case for
a more forceful pg_stop_backup, but this seems to at least remove much of
the mystery and frustration from the whole exercise.  This patch plus a
little documentation suggesting how to recover from this issue might be
enough.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ca088b0..c09ede9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8125,6 +8125,9 @@ pg_stop_backup(PG_FUNCTION_ARGS)
     BackupHistoryFileName(histfilename, ThisTimeLineID, _logId, _logSeg,
                           startpoint.xrecoff % XLogSegSize);

+    ereport(NOTICE,
+            (errmsg("pg_stop_backup cleanup done, waiting for required segments to archive")));
+
     seconds_before_warning = 60;
     waits = 0;

@@ -8139,8 +8142,10 @@ pg_stop_backup(PG_FUNCTION_ARGS)
         {
             seconds_before_warning *= 2;        /* This wraps in >10 years... */
             ereport(WARNING,
-                    (errmsg("pg_stop_backup still waiting for archive to complete (%d seconds elapsed)",
-                            waits)));
+                    (errmsg("pg_stop_backup still waiting for all required segments to archive (%d seconds elapsed)",
+                            waits),
+                      errhint("Confirm your archive_command is executing successfully.  "
+                             "pg_stop_backup can be aborted safely, but the resulting backup will not be usable.")));
         }
     }
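
The hunk above shows only the inside of the wait loop, so how often the new WARNING and HINT repeat is not obvious from the patch alone. The following is a minimal standalone sketch of that schedule, not code from xlog.c. It assumes the surrounding loop sleeps about one second per iteration, increments waits each time, and emits the warning once waits reaches seconds_before_warning; the function name simulate_warning_schedule and its parameters are hypothetical.

```c
#include <stddef.h>

/*
 * Sketch of the warning schedule implied by the patched loop.
 *
 * Assumptions (not visible in the diff context): the real loop sleeps
 * roughly one second per iteration, increments "waits" each time, and
 * warns when waits >= seconds_before_warning.
 *
 * Records in out[] the elapsed-seconds value at which each warning would
 * be emitted during the first max_seconds seconds of waiting, storing at
 * most out_len entries.  Returns the total number of warnings.
 */
size_t
simulate_warning_schedule(int max_seconds, int *out, size_t out_len)
{
    int     seconds_before_warning = 60;    /* same starting value as the patch */
    int     waits = 0;
    size_t  nwarnings = 0;

    while (waits < max_seconds)
    {
        /* stands in for the one-second sleep between archive status checks */
        if (++waits >= seconds_before_warning)
        {
            /* double the threshold, as the patched code does */
            seconds_before_warning *= 2;

            if (nwarnings < out_len)
                out[nwarnings] = waits;
            nwarnings++;
        }
    }

    return nwarnings;
}
```

Under those assumptions, the first warning appears at 60 seconds, matching the transcript, and each later one arrives after twice the previous elapsed time, so a stuck archive_command produces only a handful of log entries per hour instead of one per minute.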

