Re: needless complexity in StartupXLOG - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: needless complexity in StartupXLOG |
Date | |
Msg-id | CA+TgmoaKy9RRm-zaeFF=UCwR7J=R0xEbVvQ-WLR=0mAbOZ2P+A@mail.gmail.com Whole thread Raw |
In response to | Re: needless complexity in StartupXLOG (Stephen Frost <sfrost@snowman.net>) |
Responses |
Re: needless complexity in StartupXLOG
Re: needless complexity in StartupXLOG |
List | pgsql-hackers |
On Mon, Jul 26, 2021 at 1:32 PM Stephen Frost <sfrost@snowman.net> wrote: > Yeah, tend to agree with this too ... but something I find a bit curious > is the comment: > > * Insert a special WAL record to mark the end of > * recovery, since we aren't doing a checkpoint. > > ... immediately after setting promoted = true, and then at the end of > StartupXLOG() having: > > if (promoted) > RequestCheckpoint(CHECKPOINT_FORCE); > > maybe I'm missing something, but seems like that comment isn't being > terribly clear. Perhaps we aren't doing a full checkpoint *there*, but > sure looks like we're going to do one moments later regardless of > anything else since we've set promoted to true..? Yep. So it's a question of whether we allow operations that might write WAL in the meantime. When we write the checkpoint record right here, there can't be any WAL from the new server lifetime until the checkpoint completes. When we write an end-of-recovery record, there can. And there could actually be quite a bit, because if we do the checkpoint right in this section of code, it will be a fast checkpoint, whereas in the code you quoted above, it's a spread checkpoint, which takes a lot longer. So the question is whether it's reasonable to give the checkpoint some time to complete or whether it needs to be completed right now. > Agreed that simpler logic is better, provided it's correct logic, of > course. Finding better ways to test all of this would be really nice. Yeah, and there again, it's a lot easier to test if we don't have as many cases. Now no single change is going to fix that, but the number of flag variables in xlog.c is simply bonkers. This particular stretch of code uses 3 of them to even decide whether to attempt the test in question, and all of those are set in complex ways depending on the values of still more flag variables. The comments where bgwriterLaunched is set claim that we only do this during archive recovery, not crash recovery, but the code depends on the value of ArchiveRecoveryRequested, not InArchiveRecovery. So I wonder if we can't get the bgwriter to run even during crash recovery in the scenario described by: * It's possible that archive recovery was requested, but we don't * know how far we need to replay the WAL before we reach consistency. * This can happen for example if a base backup is taken from a * running server using an atomic filesystem snapshot, without calling * pg_start/stop_backup. Or if you just kill a running primary server * and put it into archive recovery by creating a recovery signal * file. If we ran the bgwriter all the time during crash recovery, we'd know for sure whether that causes any problems. If we only do it like this in certain corner cases, it's much more likely that we have bugs. Grumble, grumble. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: