Re: Archive recovery won't be completed on some situation. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Archive recovery won't be completed on some situation.
Date
Msg-id 20140319.172806.193015541.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: Archive recovery won't be completed on some situation.  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Archive recovery won't be completed on some situation.
Re: Archive recovery won't be completed on some situation.
List pgsql-hackers
Hello, thank you for the suggestions.

The *problematic* operation sequence I saw was performed by
pgsql-RA/Pacemaker. It stops a server in immediate mode,
then first starts the Master as a Standby, and then promotes
it. Focusing on this situation, it would be reasonable to
reset the backup positions. 9.4 cancels backup mode even on
immediate shutdown, so the operation causes no problem there, but 9.3
and earlier do not. In summary, the amendments needed per version
are:

9.4: Nothing more is needed (but resetting backup mode with pg_resetxlog is acceptable).

9.3: Can be recovered without resetting the backup positions in the control file (but it is smarter with it).

9.2: Same as 9.3.

9.1: Cannot be recovered without directly resetting the backup position in the control file. A resetting feature is needed.


At Mon, 17 Mar 2014 15:59:09 +0200, Heikki Linnakangas wrote
> On 03/15/2014 05:59 PM, Fujii Masao wrote:
> > What about adding new option into pg_resetxlog so that we can
> > reset the pg_control's backup start location? Even after we've
> > accidentally entered into the situation that you described, we can
> > exit from that by resetting the backup start location in pg_control.
> > Also this option seems helpful to salvage the data as a last resort
> > from the corrupted backup.
> 
> Yeah, seems reasonable. After you run pg_resetxlog, there's no hope
> that the backup end record would arrive any time later. And if it
> does, it won't really do much good after you've reset the WAL.
> 
> We probably should just clear out the backup start/stop location
> always when you run pg_resetxlog. Your database is potentially broken
> if you reset the WAL before reaching consistency, but if forcibly do
> that with "pg_resetxlog -f", you've been warned.

Agreed. The attached patches do that, and I could "recover" the
database state with the following steps:

(1) Remove recovery.conf and do pg_resetxlog -bf (the option name 'b' is arguable)
(2) Start the server (with crash recovery)
(3) Stop the server (in any mode)
(4) Create recovery.conf and start the server with archive recovery.
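
To make the effect of -b in step (1) concrete, here is a minimal standalone sketch in C of what the attached patches do to the control file. It is only an illustration under assumptions: BackupState, RecPtr, INVALID_REC_PTR, reset_backup_positions() and backup_pending() are hypothetical names, not part of pg_resetxlog. The real fields live in ControlFile, 9.2 and earlier store positions as xlogid/xrecoff pairs, and the 9.1 patch resets only backupStartPoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for XLogRecPtr; 0 plays the role of InvalidXLogRecPtr. */
typedef uint64_t RecPtr;
#define INVALID_REC_PTR ((RecPtr) 0)

/* Hypothetical subset of the control file fields touched by the patches. */
typedef struct
{
    RecPtr backupStartPoint;    /* set while an online backup is unfinished */
    RecPtr backupEndPoint;
    bool   backupEndRequired;
} BackupState;

/* Mirrors the "if (resetbackuppos)" block added by the 9.2+ patches. */
static void
reset_backup_positions(BackupState *st)
{
    assert(st != NULL);
    st->backupStartPoint = INVALID_REC_PTR;
    st->backupEndPoint = INVALID_REC_PTR;
    st->backupEndRequired = false;
}

/* Illustrative helper: true while a backup start position is still recorded. */
static bool
backup_pending(const BackupState *st)
{
    assert(st != NULL);
    return st->backupStartPoint != INVALID_REC_PTR;
}
```

After reset_backup_positions() the sketch reports no pending backup, which is the state the crash recovery in step (2) needs in order to complete.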

Steps 2 and 3 are somewhat annoying, but I don't want to go any
further in supporting Pacemaker's in-a-sense broken sequence :(

For 9.2 and later, this can be replaced by the following steps
suggested in Masao's previous mail, but 9.1 needs backupStartPoint
to be reset forcibly.

At Sun, 16 Mar 2014 00:59:01 +0900, Fujii Masao wrote
> Though this is formal way, you can exit from that situation by
> 
> (1) Remove recovery.conf and start the server with crash recovery
> (2) Execute pg_start_backup() after crash recovery ends
> (3) Copy backup_label to somewhere
> (4) Execute pg_stop_backup() and shutdown the server
> (5) Copy backup_label back to $PGDATA
> (6) Create recovery.conf and start the server with archive recovery

This worked for 9.2, 9.3 and HEAD but failed for 9.1 at step 1.

| 2014-03-19 15:53:02.512 JST FATAL:  WAL ends before end of online backup
| 2014-03-19 15:53:02.512 JST HINT:  Online backup started with pg_start_backup() must be ended with pg_stop_backup(), and all WAL up to that point must be available at recovery.
 

This seems inevitable, given the following check:

| if (InRecovery &&
|     (XLByteLT(EndOfLog, minRecoveryPoint) ||
|      !XLogRecPtrIsInvalid(ControlFile->backupStartPoint)))
| {
...
|     /*
|      * Ran off end of WAL before reaching end-of-backup WAL record, or
|      * minRecoveryPoint.
|      */
|     if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
|         ereport(FATAL,
|                 (errmsg("WAL ends before end of online backup"),

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 28a4f19..7d9cf6d 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -85,6 +85,7 @@ main(int argc, char *argv[])
     int            c;
     bool        force = false;
     bool        noupdate = false;
+    bool        resetbackuppos = false;
     MultiXactId set_oldestmxid = 0;
     char       *endptr;
     char       *endptr2;
@@ -110,7 +111,7 @@ main(int argc, char *argv[])
     }
 
 
-    while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1)
+    while ((c = getopt(argc, argv, "fl:m:no:O:x:e:b")) != -1)
     {
         switch (c)
         {
@@ -122,6 +123,10 @@ main(int argc, char *argv[])
                 noupdate = true;
                 break;
 
+            case 'b':
+                resetbackuppos = true;
+                break;
+
             case 'e':
                 set_xid_epoch = strtoul(optarg, &endptr, 0);
                 if (endptr == optarg || *endptr != '\0')
@@ -350,6 +355,13 @@ main(int argc, char *argv[])
         ControlFile.checkPointCopy.PrevTimeLineID = minXlogTli;
     }
 
+    if (resetbackuppos)
+    {
+        ControlFile.backupStartPoint = InvalidXLogRecPtr;
+        ControlFile.backupEndPoint = InvalidXLogRecPtr;
+        ControlFile.backupEndRequired = false;
+    }
+
     if (minXlogSegNo > newXlogSegNo)
         newXlogSegNo = minXlogSegNo;
 
@@ -1098,6 +1110,7 @@ usage(void)
     printf(_("  -O OFFSET        set next multitransaction offset\n"));
     printf(_("  -V, --version    output version information, then exit\n"));
     printf(_("  -x XID           set next transaction ID\n"));
+    printf(_("  -b               reset backup positions\n"));
     printf(_("  -?, --help       show this help, then exit\n"));
     printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
 }
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index cd003f4..8b578c8 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -82,6 +82,7 @@ main(int argc, char *argv[])
     int            c;
     bool        force = false;
     bool        noupdate = false;
+    bool        resetbackuppos = false;
     uint32        set_xid_epoch = (uint32) -1;
     TransactionId set_xid = 0;
     Oid           set_oid = 0;
@@ -114,7 +115,7 @@ main(int argc, char *argv[])
     }
 
 
-    while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1)
+    while ((c = getopt(argc, argv, "fl:m:no:O:x:e:b")) != -1)
     {
         switch (c)
         {
@@ -126,6 +127,10 @@ main(int argc, char *argv[])
                 noupdate = true;
                 break;
 
+            case 'b':
+                resetbackuppos = true;
+                break;
+
             case 'e':
                 set_xid_epoch = strtoul(optarg, &endptr, 0);
                 if (endptr == optarg || *endptr != '\0')
@@ -347,6 +352,13 @@ main(int argc, char *argv[])
         ControlFile.checkPointCopy.PrevTimeLineID = minXlogTli;
     }
 
+    if (resetbackuppos)
+    {
+        ControlFile.backupStartPoint = InvalidXLogRecPtr;
+        ControlFile.backupEndPoint = InvalidXLogRecPtr;
+        ControlFile.backupEndRequired = false;
+    }
+
     if (minXlogSegNo > newXlogSegNo)
         newXlogSegNo = minXlogSegNo;
 
@@ -1042,6 +1054,7 @@ usage(void)
     printf(_("  -O OFFSET        set next multitransaction offset\n"));
     printf(_("  -V, --version    output version information, then exit\n"));
     printf(_("  -x XID           set next transaction ID\n"));
+    printf(_("  -b               reset backup positions\n"));
     printf(_("  -?, --help       show this help, then exit\n"));
     printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
 }
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 80e8268..149639b 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -82,6 +82,7 @@ main(int argc, char *argv[])
     int            c;
     bool        force = false;
     bool        noupdate = false;
+    bool        resetbackuppos = false;
     uint32        set_xid_epoch = (uint32) -1;
     TransactionId set_xid = 0;
     Oid           set_oid = 0;
@@ -115,7 +116,7 @@ main(int argc, char *argv[])
     }
 
 
-    while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1)
+    while ((c = getopt(argc, argv, "fl:m:no:O:x:e:b")) != -1)
     {
         switch (c)
         {
@@ -127,6 +128,10 @@ main(int argc, char *argv[])
                 noupdate = true;
                 break;
 
+            case 'b':
+                resetbackuppos = true;
+                break;
+
             case 'e':
                 set_xid_epoch = strtoul(optarg, &endptr, 0);
                 if (endptr == optarg || *endptr != '\0')
@@ -333,6 +338,15 @@ main(int argc, char *argv[])
     if (minXlogTli > ControlFile.checkPointCopy.ThisTimeLineID)
         ControlFile.checkPointCopy.ThisTimeLineID = minXlogTli;
 
+    if (resetbackuppos)
+    {
+        ControlFile.backupStartPoint.xlogid = 0;
+        ControlFile.backupStartPoint.xrecoff = 0;
+        ControlFile.backupEndPoint.xlogid = 0;
+        ControlFile.backupEndPoint.xrecoff = 0;
+        ControlFile.backupEndRequired = false;
+    }
+
     if (minXlogId > newXlogId ||
         (minXlogId == newXlogId &&
          minXlogSeg > newXlogSeg))
@@ -1035,6 +1049,7 @@ usage(void)
     printf(_("  -O OFFSET        set next multitransaction offset\n"));
     printf(_("  -V, --version    output version information, then exit\n"));
     printf(_("  -x XID           set next transaction ID\n"));
+    printf(_("  -b               reset backup start position\n"));
     printf(_("  -?, --help       show this help, then exit\n"));
     printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
 }
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 54cc5b0..3ecfef8 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -82,6 +82,7 @@ main(int argc, char *argv[])
     int            c;
     bool        force = false;
     bool        noupdate = false;
+    bool        resetbackuppos = false;
     uint32        set_xid_epoch = (uint32) -1;
     TransactionId set_xid = 0;
     Oid           set_oid = 0;
@@ -115,7 +116,7 @@ main(int argc, char *argv[])
     }
 
 
-    while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1)
+    while ((c = getopt(argc, argv, "fl:m:no:O:x:e:b")) != -1)
     {
         switch (c)
         {
@@ -127,6 +128,10 @@ main(int argc, char *argv[])
                 noupdate = true;
                 break;
 
+            case 'b':
+                resetbackuppos = true;
+                break;
+
             case 'e':
                 set_xid_epoch = strtoul(optarg, &endptr, 0);
                 if (endptr == optarg || *endptr != '\0')
@@ -333,6 +338,12 @@ main(int argc, char *argv[])
     if (minXlogTli > ControlFile.checkPointCopy.ThisTimeLineID)
         ControlFile.checkPointCopy.ThisTimeLineID = minXlogTli;
 
+    if (resetbackuppos)
+    {
+        ControlFile.backupStartPoint.xlogid = 0;
+        ControlFile.backupStartPoint.xrecoff = 0;
+    }
+
     if (minXlogId > newXlogId ||
         (minXlogId == newXlogId &&
          minXlogSeg > newXlogSeg))
@@ -1028,6 +1039,7 @@ usage(void)
     printf(_("  -o OID          set next OID\n"));
     printf(_("  -O OFFSET       set next multitransaction offset\n"));
     printf(_("  -x XID          set next transaction ID\n"));
+    printf(_("  -b              reset backup start position\n"));
     printf(_("  --help          show this help, then exit\n"));
     printf(_("  --version       output version information, then exit\n"));
     printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
