Re: Race condition in recovery? - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Race condition in recovery? |
Date | |
Msg-id | CA+TgmoZcfxEFyxZYkwoiQpq6y602gdoYw4_zeRiiP=jo7fqd2g@mail.gmail.com |
In response to | Re: Race condition in recovery? (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Race condition in recovery? |
List | pgsql-hackers |
On Fri, May 21, 2021 at 12:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I had trouble following it completely, but I didn't really spot
> anything that seemed definitely wrong. However, I don't understand
> what it has to do with where we are now. What I want to understand is:
> under exactly what circumstances does it matter that
> WaitForWALToBecomeAvailable(), when currentSource == XLOG_FROM_STREAM,
> will stream from receiveTLI rather than recoveryTargetTLI?

Ah ha! I think I figured it out. To hit this bug, you need to meet the following conditions:

1. Both streaming and archiving have to be configured.
2. You have to promote a new primary.
3. After promoting the new primary you have to start a new standby that doesn't have local WAL and for which the backup was taken from the previous timeline. In Dilip's original scenario, this new standby is actually the old primary, but that's not required.
4. The new standby has to be able to find the history file it needs in the archive but not the WAL files.
5. The new standby needs to have recovery_target_timeline='latest' (which is the default).

When you start the new standby, it will fetch the current TLI from its control file. Then, since recovery_target_timeline=latest, the system will try to figure out the latest timeline, which only works because archiving is configured. There seems to be no provision for detecting the latest timeline via streaming. With archiving enabled, though, findNewestTimeLine() will be able to restore the history file created by the promotion of the new primary, which will cause validateRecoveryParameters() to change recoveryTargetTLI. Then we'll try to read the WAL segment containing the checkpoint record and fail because, by stipulation, only history files are available from the archive. Now, because streaming is also configured, we'll try streaming. That will work, so we'll be able to read the checkpoint record, but now, because WaitForWALToBecomeAvailable() initialized expectedTLEs using receiveTLI instead of recoveryTargetTLI, we can't switch to the correct timeline and it all goes wrong.

The attached test script, test.sh, seems to reliably reproduce this. Put that file and the recalcitrant_cp script, also attached, into an empty directory, cd to that directory, and run test.sh. Afterwards examine pgcascade.log. Basically, these scripts just set up the scenario described above. We set up a primary and a standby that use recalcitrant_cp as the archive command, and because it's recalcitrant, it's only willing to copy history files, and always fails for WAL files. Then we create a cascading standby by taking a base backup from the standby, but before actually starting it, we promote the original standby. So now it meets all the conditions described above.

I tried a couple variants of this test. If I switch the archive command from recalcitrant_cp to just regular cp, then there's no problem. And if I switch it to something that always fails, then there's also no problem. That's because, with either of those changes, condition (4) above is no longer met. In the first case, both files end up in the archive, and in the second case, neither file.

What about hitting this in real life, with a real archive command? Well, you'd probably need the archive command to be kind of slow and get unlucky on the timing, but there's nothing to prevent it from happening. But, it will be WAY more likely if you have Dilip's original scenario, where you try to repurpose an old primary as a standby. It would normally be unlikely that the backup used to create a new standby would have an older TLI, because you typically wouldn't switch masters in between taking a base backup and using it to create a new standby. But the old master always has an older TLI. So (3) is satisfied. For (4) to be satisfied, you need the old master to fail to archive all of its WAL when it shuts down.

--
Robert Haas
EDB: http://www.enterprisedb.com
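
The recalcitrant_cp attachment is not reproduced on this page. As a rough illustration of the behavior the message describes (history files get copied, WAL files always fail), an archive command along these lines would probably do. This is only a sketch and not the attached script: the function body and the ".history" suffix check are assumptions based on PostgreSQL's usual naming of timeline history files.

```shell
#!/bin/sh
# Hypothetical stand-in for the recalcitrant_cp attachment (NOT the real script).
# Assumed calling convention: recalcitrant_cp <source path> <destination path>,
# e.g. from archive_command with %p as the source and an archive
# directory plus %f as the destination.
recalcitrant_cp() {
    src=$1
    dest=$2
    case "$src" in
        *.history)
            # Timeline history files are archived normally.
            cp "$src" "$dest"
            ;;
        *)
            # Anything else (WAL segments in particular) is refused.
            # A nonzero status tells the server the archive attempt failed.
            return 1
            ;;
    esac
}
```

To use this as a standalone script, one would add a final line that calls the function with "$@". Replacing the second branch with a plain cp, or making the first branch fail too, gives the two variants mentioned above in which condition (4) is no longer met.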