Re: Switching timeline over streaming replication - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Switching timeline over streaming replication |
Date | |
Msg-id | 50C2108E.9020103@vmware.com |
In response to | Re: Switching timeline over streaming replication (Amit Kapila <amit.kapila@huawei.com>) |
Responses | Re: Switching timeline over streaming replication |
List | pgsql-hackers |
On 06.12.2012 15:39, Amit Kapila wrote:
> On Thursday, December 06, 2012 12:53 AM Heikki Linnakangas wrote:
>> On 05.12.2012 14:32, Amit Kapila wrote:
>>> On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:
>>>> After some diversions to fix bugs and refactor existing code, I've
>>>> committed a couple of small parts of this patch, which just add some
>>>> sanity checks to notice incorrect PITR scenarios. Here's a new
>>>> version of the main patch based on current HEAD.
>>>
>>> After testing with the new patch, the following problems are observed.
>>>
>>> Defect - 1:
>>>
>>> 1. start primary A
>>> 2. start standby B following A
>>> 3. start cascade standby C following B.
>>> 4. start another standby D following C.
>>> 5. Promote standby B.
>>> 6. After successful time line switch in cascade standby C& D, stop D.
>>> 7. Restart D, Startup is successful and connecting to standby C.
>>> 8. Stop C.
>>> 9. Restart C, startup is failing.
>>
>> Ok, the error I get in that scenario is:
>>
>> C 2012-12-05 19:55:43.840 EET 9283 FATAL: requested timeline 2 does not contain minimum recovery point 0/3023F08 on timeline 1
>> C 2012-12-05 19:55:43.841 EET 9282 LOG: startup process (PID 9283) exited with exit code 1
>> C 2012-12-05 19:55:43.841 EET 9282 LOG: aborting startup due to startup process failure
>>
>
>>
>> That mismatch causes the error. I'd like to fix this by always treating
>> the checkpoint record to be part of the new timeline. That feels more
>> correct. The most straightforward way to implement that would be to peek
>> at the xlog record before updating replayEndRecPtr and replayEndTLI. If
>> it's a checkpoint record that changes TLI, set replayEndTLI to the new
>> timeline before calling the redo-function. But it's a bit of a
>> modularity violation to peek into the record like that.
>>
>> Or we could just revert the sanity check at beginning of recovery that
>> throws the "requested timeline 2 does not contain minimum recovery point
>> 0/3023F08 on timeline 1" error. The error I added to redo of checkpoint
>> record that says "unexpected timeline ID %u in checkpoint record, before
>> reaching minimum recovery point %X/%X on timeline %u" checks basically
>> the same thing, but at a later stage. However, the way
>> minRecoveryPointTLI is updated still seems wrong to me, so I'd like to
>> fix that.
>>
>> I'm thinking of something like the attached (with some more comments
>> before committing). Thoughts?
>
> This has fixed the problem reported.
> However, I am not able to think will there be any problem if we remove check
> "requested timeline 2 does not contain minimum recovery point
> 0/3023F08 on timeline 1" at beginning of recovery and just update
> replayEndTLI with ThisTimeLineID?

Well, it seems wrong for the control file to contain a situation like
this:

pg_control version number:            932
Catalog version number:               201211281
Database system identifier:           5819228770976387006
Database cluster state:               shut down in recovery
pg_control last modified:             pe 7. joulukuuta 2012 17.39.57
Latest checkpoint location:           0/3023EA8
Prior checkpoint location:            0/2000060
Latest checkpoint's REDO location:    0/3023EA8
Latest checkpoint's REDO WAL file:    000000020000000000000003
Latest checkpoint's TimeLineID:       2
...
Time of latest checkpoint:            pe 7. joulukuuta 2012 17.39.49
Min recovery ending location:         0/3023F08
Min recovery ending loc's timeline:   1

Note the latest checkpoint location and its TimeLineID, and compare them
with the min recovery ending location. The min recovery ending location
is ahead of the latest checkpoint's location; the min recovery ending
location actually points to the end of the checkpoint record. But how
come the min recovery ending location's timeline is 1, while the
checkpoint record's timeline is 2?

Now maybe that would happen to work if we remove the sanity check, but
it still seems horribly confusing. I'm afraid that discrepancy will come
back to haunt us later if we leave it like that. So I'd like to fix that.

After mulling this over some more, I propose the attached patch. With
the patch, we peek into the checkpoint record, and actually perform the
timeline switch (by changing ThisTimeLineID) before replaying it. That
way the checkpoint record is really considered to be on the new timeline
for all purposes. At the moment, the only difference that makes in
practice is that we set replayEndTLI, and thus minRecoveryPointTLI, to
the new TLI, but it feels logically more correct to do it that way.

- Heikki
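
As a rough illustration of the ordering issue described above, here is a toy model in Python. It is not PostgreSQL code and not the attached patch: the names `Record`, `ReplayState`, `replay` and `peek_before_redo` are hypothetical, and the "old" behaviour (the timeline switch only taking effect inside the redo step, after the replay-end position has been recorded) is an assumption inferred from the description in this message. The sketch only shows why peeking at the checkpoint record first makes the recorded replay-end timeline agree with the checkpoint's timeline.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Record:
    """Toy WAL record (hypothetical; not a PostgreSQL structure)."""
    end_ptr: int                    # position just past the end of the record
    new_tli: Optional[int] = None   # set only for a checkpoint that switches timeline


@dataclass
class ReplayState:
    """Toy stand-in for ThisTimeLineID / replayEndRecPtr / replayEndTLI."""
    this_tli: int
    replay_end_ptr: int = 0
    replay_end_tli: int = 0


def replay(state: ReplayState, record: Record, peek_before_redo: bool) -> Tuple[int, int]:
    """Replay one record and return the (position, timeline) pair that would
    become the minimum recovery point in this model."""
    # Proposed ordering: look at the record first and switch timeline
    # before the replay-end position is recorded.
    if peek_before_redo and record.new_tli is not None:
        state.this_tli = record.new_tli

    # Record where replay will end, tagged with the current timeline.
    state.replay_end_ptr = record.end_ptr
    state.replay_end_tli = state.this_tli

    # "Redo" step: in the assumed old ordering the switch only happens here,
    # too late to affect replay_end_tli for this record.
    if record.new_tli is not None:
        state.this_tli = record.new_tli

    return state.replay_end_ptr, state.replay_end_tli


# Checkpoint record ending at 0/3023F08 that switches from timeline 1 to 2.
checkpoint = Record(end_ptr=0x3023F08, new_tli=2)
old_result = replay(ReplayState(this_tli=1), checkpoint, peek_before_redo=False)
new_result = replay(ReplayState(this_tli=1), checkpoint, peek_before_redo=True)
```

In this model `old_result` pairs the end of the checkpoint record with timeline 1, which matches the confusing control file contents shown above, while `new_result` pairs the same position with timeline 2.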