Re: Archive recovery won't be completed on some situation. - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Archive recovery won't be completed on some situation. |
Date | |
Msg-id | 20140319.172806.193015541.horiguchi.kyotaro@lab.ntt.co.jp |
In response to | Re: Archive recovery won't be completed on some situation. (Fujii Masao <masao.fujii@gmail.com>) |
Responses |
Re: Archive recovery won't be completed on some situation.
Re: Archive recovery won't be completed on some situation. |
List | pgsql-hackers |
Hello, thank you for the suggestions.

The *problematic* operation sequence I saw was performed by
pgsql-RA/Pacemaker. It stops the server in immediate mode, starts
the master as a standby at first, and then promotes it. Focusing on
this situation, it would be reasonable to reset the backup positions.

9.4 cancels backup mode even on immediate shutdown, so the operation
causes no problem there, but 9.3 and earlier do not.

In summary, the amendments needed per version are:

9.4: Nothing more is needed (but resetting backup mode by
     pg_resetxlog is acceptable).
9.3: Can be recovered without resetting the backup positions in the
     control file (but it is smarter with it).
9.2: Same as 9.3.
9.1: Cannot be recovered without directly resetting the backup
     position in the control file. The resetting feature is needed.

At Mon, 17 Mar 2014 15:59:09 +0200, Heikki Linnakangas wrote
> On 03/15/2014 05:59 PM, Fujii Masao wrote:
> > What about adding new option into pg_resetxlog so that we can
> > reset the pg_control's backup start location? Even after we've
> > accidentally entered into the situation that you described, we can
> > exit from that by resetting the backup start location in pg_control.
> > Also this option seems helpful to salvage the data as a last resort
> > from the corrupted backup.
>
> Yeah, seems reasonable. After you run pg_resetxlog, there's no hope
> that the backup end record would arrive any time later. And if it
> does, it won't really do much good after you've reset the WAL.
>
> We probably should just clear out the backup start/stop location
> always when you run pg_resetxlog. Your database is potentially broken
> if you reset the WAL before reaching consistency, but if forcibly do
> that with "pg_resetxlog -f", you've been warned.

Agreed.
Attached patches do that, and I could "recover" the database state
with the following steps:

(1) Remove recovery.conf and run pg_resetxlog -bf
    (the option name 'b' would be arguable)
(2) Start the server (with crash recovery)
(3) Stop the server (in any mode)
(4) Create recovery.conf and start the server with archive recovery.

Steps 2 and 3 are somewhat annoying, but I don't want to support
Pacemaker's in-a-sense broken sequence any further :(

For 9.2 and later, this can be replaced by the following steps
suggested in Masao's previous mail, but 9.1 needs backupStartPoint
to be forcibly reset.

At Sun, 16 Mar 2014 00:59:01 +0900, Fujii Masao wrote
> Though this is formal way, you can exit from that situation by
>
> (1) Remove recovery.conf and start the server with crash recovery
> (2) Execute pg_start_backup() after crash recovery ends
> (3) Copy backup_label to somewhere
> (4) Execute pg_stop_backup() and shutdown the server
> (5) Copy backup_label back to $PGDATA
> (6) Create recovery.conf and start the server with archive recovery

This worked for 9.2, 9.3 and HEAD but failed for 9.1 at step 1.

| 2014-03-19 15:53:02.512 JST FATAL:  WAL ends before end of online backup
| 2014-03-19 15:53:02.512 JST HINT:  Online backup started with pg_start_backup() must be ended with pg_stop_backup(), and all WAL up to that point must be available at recovery.

This seems inevitable.

|     if (InRecovery &&
|         (XLByteLT(EndOfLog, minRecoveryPoint) ||
|          !XLogRecPtrIsInvalid(ControlFile->backupStartPoint)))
|     {
...
|         /*
|          * Ran off end of WAL before reaching end-of-backup WAL record, or
|          * minRecoveryPoint.
|          */
|         if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
|             ereport(FATAL,
|                     (errmsg("WAL ends before end of online backup"),

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 28a4f19..7d9cf6d 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -85,6 +85,7 @@ main(int argc, char *argv[])
 	int c;
 	bool force = false;
 	bool noupdate = false;
+	bool resetbackuppos = false;
 	MultiXactId set_oldestmxid = 0;
 	char *endptr;
 	char *endptr2;
@@ -110,7 +111,7 @@ main(int argc, char *argv[])
 	}
 
 
-	while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1)
+	while ((c = getopt(argc, argv, "fl:m:no:O:x:e:b")) != -1)
 	{
 		switch (c)
 		{
@@ -122,6 +123,10 @@ main(int argc, char *argv[])
 				noupdate = true;
 				break;
 
+			case 'b':
+				resetbackuppos = true;
+				break;
+
 			case 'e':
 				set_xid_epoch = strtoul(optarg, &endptr, 0);
 				if (endptr == optarg || *endptr != '\0')
@@ -350,6 +355,13 @@ main(int argc, char *argv[])
 		ControlFile.checkPointCopy.PrevTimeLineID = minXlogTli;
 	}
 
+	if (resetbackuppos)
+	{
+		ControlFile.backupStartPoint = InvalidXLogRecPtr;
+		ControlFile.backupEndPoint = InvalidXLogRecPtr;
+		ControlFile.backupEndRequired = false;
+	}
+
 	if (minXlogSegNo > newXlogSegNo)
 		newXlogSegNo = minXlogSegNo;
 
@@ -1098,6 +1110,7 @@ usage(void)
 	printf(_(" -O OFFSET set next multitransaction offset\n"));
 	printf(_(" -V,--version output version information, then exit\n"));
 	printf(_(" -x XID set next transaction ID\n"));
+	printf(_(" -b reset backup positions\n"));
 	printf(_(" -?, --help show this help, then exit\n"));
 	printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
 }

diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index cd003f4..8b578c8 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -82,6 +82,7 @@ main(int argc, char *argv[])
 	int c;
 	bool force = false;
 	bool noupdate = false;
+	bool resetbackuppos = false;
 	uint32 set_xid_epoch = (uint32) -1;
 	TransactionId set_xid = 0;
 	Oid set_oid = 0;
@@ -114,7 +115,7 @@ main(int argc, char *argv[])
 	}
 
 
-	while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1)
+	while ((c = getopt(argc, argv, "fl:m:no:O:x:e:b")) != -1)
 	{
 		switch (c)
 		{
@@ -126,6 +127,10 @@ main(int argc, char *argv[])
 				noupdate = true;
 				break;
 
+			case 'b':
+				resetbackuppos = true;
+				break;
+
 			case 'e':
 				set_xid_epoch = strtoul(optarg, &endptr, 0);
 				if (endptr == optarg || *endptr != '\0')
@@ -347,6 +352,13 @@ main(int argc, char *argv[])
 		ControlFile.checkPointCopy.PrevTimeLineID = minXlogTli;
 	}
 
+	if (resetbackuppos)
+	{
+		ControlFile.backupStartPoint = InvalidXLogRecPtr;
+		ControlFile.backupEndPoint = InvalidXLogRecPtr;
+		ControlFile.backupEndRequired = false;
+	}
+
 	if (minXlogSegNo > newXlogSegNo)
 		newXlogSegNo = minXlogSegNo;
 
@@ -1042,6 +1054,7 @@ usage(void)
 	printf(_(" -O OFFSET set next multitransaction offset\n"));
 	printf(_(" -V,--version output version information, then exit\n"));
 	printf(_(" -x XID set next transaction ID\n"));
+	printf(_(" -b reset backup positions\n"));
 	printf(_(" -?, --help show this help, then exit\n"));
 	printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
 }

diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 80e8268..149639b 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -82,6 +82,7 @@ main(int argc, char *argv[])
 	int c;
 	bool force = false;
 	bool noupdate = false;
+	bool resetbackuppos = false;
 	uint32 set_xid_epoch = (uint32) -1;
 	TransactionId set_xid = 0;
 	Oid set_oid = 0;
@@ -115,7 +116,7 @@ main(int argc, char *argv[])
 	}
 
 
-	while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1)
+	while ((c = getopt(argc, argv, "fl:m:no:O:x:e:b")) != -1)
 	{
 		switch (c)
 		{
@@ -127,6 +128,10 @@ main(int argc, char *argv[])
 				noupdate = true;
 				break;
 
+			case 'b':
+				resetbackuppos = true;
+				break;
+
 			case 'e':
 				set_xid_epoch = strtoul(optarg, &endptr, 0);
 				if (endptr == optarg || *endptr != '\0')
@@ -333,6 +338,15 @@ main(int argc, char *argv[])
 	if (minXlogTli > ControlFile.checkPointCopy.ThisTimeLineID)
 		ControlFile.checkPointCopy.ThisTimeLineID = minXlogTli;
 
+	if (resetbackuppos)
+	{
+		ControlFile.backupStartPoint.xlogid = 0;
+		ControlFile.backupStartPoint.xrecoff = 0;
+		ControlFile.backupEndPoint.xlogid = 0;
+		ControlFile.backupEndPoint.xrecoff = 0;
+		ControlFile.backupEndRequired = false;
+	}
+
 	if (minXlogId > newXlogId ||
 		(minXlogId == newXlogId &&
 		 minXlogSeg > newXlogSeg))
@@ -1035,6 +1049,7 @@ usage(void)
 	printf(_(" -O OFFSET set next multitransaction offset\n"));
 	printf(_(" -V,--version output version information, then exit\n"));
 	printf(_(" -x XID set next transaction ID\n"));
+	printf(_(" -b reset backup start position\n"));
 	printf(_(" -?, --help show this help, then exit\n"));
 	printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
 }

diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 54cc5b0..3ecfef8 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -82,6 +82,7 @@ main(int argc, char *argv[])
 	int c;
 	bool force = false;
 	bool noupdate = false;
+	bool resetbackuppos = false;
 	uint32 set_xid_epoch = (uint32) -1;
 	TransactionId set_xid = 0;
 	Oid set_oid = 0;
@@ -115,7 +116,7 @@ main(int argc, char *argv[])
 	}
 
 
-	while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1)
+	while ((c = getopt(argc, argv, "fl:m:no:O:x:e:b")) != -1)
 	{
 		switch (c)
 		{
@@ -127,6 +128,10 @@ main(int argc, char *argv[])
 				noupdate = true;
 				break;
 
+			case 'b':
+				resetbackuppos = true;
+				break;
+
 			case 'e':
 				set_xid_epoch = strtoul(optarg, &endptr, 0);
 				if (endptr == optarg || *endptr != '\0')
@@ -333,6 +338,12 @@ main(int argc, char *argv[])
 	if (minXlogTli > ControlFile.checkPointCopy.ThisTimeLineID)
 		ControlFile.checkPointCopy.ThisTimeLineID = minXlogTli;
 
+	if (resetbackuppos)
+	{
+		ControlFile.backupStartPoint.xlogid = 0;
+		ControlFile.backupStartPoint.xrecoff = 0;
+	}
+
 	if (minXlogId > newXlogId ||
 		(minXlogId == newXlogId &&
 		 minXlogSeg > newXlogSeg))
@@ -1028,6 +1039,7 @@ usage(void)
 	printf(_(" -o OID set next OID\n"));
 	printf(_(" -O OFFSET set next multitransaction offset\n"));
 	printf(_(" -x XID set next transaction ID\n"));
+	printf(_(" -b reset backup start position\n"));
 	printf(_(" --help show this help, then exit\n"));
 	printf(_(" --version output version information, then exit\n"));
 	printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
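
Editorial illustration (not part of the attached patches): the
interaction between the quoted end-of-recovery check and the proposed
reset can be sketched in a few lines of standalone C. This is only a
rough model under stated assumptions. The Sketch* types and sketch_*
functions are hypothetical stand-ins, a plain 64-bit integer replaces
XLogRecPtr, and 0 plays the role of InvalidXLogRecPtr. The first
function follows the shape of the condition quoted in the message, and
the second follows the field assignments in the patches with the three
backup fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a flat integer stands in for XLogRecPtr; 0 means "invalid". */
typedef uint64_t SketchRecPtr;
#define SKETCH_INVALID_PTR ((SketchRecPtr) 0)

/* Hypothetical, reduced model of the backup-related fields in pg_control. */
typedef struct SketchControlFile
{
    SketchRecPtr backupStartPoint;
    SketchRecPtr backupEndPoint;
    bool         backupEndRequired;
} SketchControlFile;

typedef enum
{
    SKETCH_OK,                      /* recovery may finish */
    SKETCH_FATAL_BACKUP_NOT_ENDED,  /* "WAL ends before end of online backup" */
    SKETCH_FATAL_OTHER              /* branch not shown in the quoted excerpt */
} SketchResult;

/*
 * Same shape as the quoted condition: recovery fails if WAL ends before
 * minRecoveryPoint, or if a backup start position is still recorded.
 */
static SketchResult
sketch_end_of_recovery_check(const SketchControlFile *cf, bool in_recovery,
                             SketchRecPtr end_of_log,
                             SketchRecPtr min_recovery_point)
{
    if (in_recovery &&
        (end_of_log < min_recovery_point ||
         cf->backupStartPoint != SKETCH_INVALID_PTR))
    {
        if (cf->backupStartPoint != SKETCH_INVALID_PTR)
            return SKETCH_FATAL_BACKUP_NOT_ENDED;
        return SKETCH_FATAL_OTHER;
    }
    return SKETCH_OK;
}

/* Mirrors what the proposed -b option does to the control file fields. */
static void
sketch_reset_backup_positions(SketchControlFile *cf)
{
    cf->backupStartPoint = SKETCH_INVALID_PTR;
    cf->backupEndPoint = SKETCH_INVALID_PTR;
    cf->backupEndRequired = false;
}
```

In this model, a control file that still carries a backup start
position fails the check even when WAL has been replayed up to
minRecoveryPoint, and passes once sketch_reset_backup_positions() has
been called on it. That corresponds to the 9.1 case described above,
where crash recovery alone cannot clear the recorded position.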