Re: Split xlog.c - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Split xlog.c |
Date | |
Msg-id | CA+Tgmoag1Tzw+aYB+px5QBG35kPXX-QbVqbtiae3L6o=VrUHig@mail.gmail.com Whole thread Raw |
In response to | Re: Split xlog.c (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: Split xlog.c
|
List | pgsql-hackers |
On Thu, Sep 16, 2021 at 4:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Here is another rebase. Like probably everyone else who has an opinion on the topic, I like the idea of splitting xlog.c. I don't have a fully formed opinion on the changes yet, but it seems to be a surprisingly equal split, which seems good. Since I just spent a bunch of time being frustrated by ThisTimeLineID, I'm pleased to see that the giant amount of code that moves to xlogrecovery.c apparently ends up not needing that global variable, which I think is excellent. Perhaps the amount of code that needs that global variable can be further reduced in the future, maybe even to zero. I think that the small reorderings that you mention in your original post are the scary part: if we do stuff in a different order, maybe things won't work. In the rest of this email I'm going to try to go through and analyze that. I think it might have been a bit easier if you'd outlined the things you moved and the reasons why you thought that was OK; as it is, I have to reverse-engineer it. But I'd like to see this go forward, either as-is or with whatever modifications seem to be needed, so I'm going to give it a try. - RelationCacheInitFileRemove() moves later. The code over which it moves seems to include sanity checks and initializations of various bits of in-memory state, but nothing that touches anything on disk. Therefore I don't see how this can break anything. I also agree that the new placement of the call is more logical than the old one, since in the current code it's kind of in the middle of a bunch of things that, as your patch highlights, are really all about initializing WAL recovery, and this is a separate kind of a thing. Post-patch, it ends up near where we initialize a bunch of other subsystems. Cool. - Some logic to (a) sanity-check the control file's REDO pointer, (b) set InRecovery = true, and (c) update various bits of control file state in memory has been moved substantially earlier. The actual update of the control file on disk stays where it was before. At least on first reading, I don't really like this. On the one hand, I don't see a reason why it's necessary prerequisite for splitting xlog.c. On the other hand, it seems a bit dangerous. There's now ~8 calls to functions in other modules between the time you change things in memory and the time that you call UpdateControlFile(). Perhaps none of those functions can call anything that might in turn call UpdateControlFile() but I don't know why we should take the chance. Is there some advantage to having the in-memory state out of sync with the on-disk state across all that code? - Renaming backup_label and tablespace_map to .old is now done slightly earlier, just before pg_reset_all() and adjusting our notion of the minimum recovery point rather than just after. Seems OK. - The rm_startup() functions are now called later, only once we're sure that we have a WAL record to apply. Seems fine; slightly more efficient. Looks like the functions in question are just arranging to set up private memory contexts for the AMs that want them for WAL replay, so they won't care if we skip that in some corner cases where there's nothing to replay. - ResetUnloggedRelations(UNLOGGED_RELATION_INIT) is moved later. We'll now do a few minor bookkeeping tasks like setting EndOfLog and EndOfLogTLI first, and we'll also now check whether we reached the minimum recovery point OK before doing this. This appears to me to be a clear improvement, since checking whether the minimum recovery point has been reached is fast, and resetting unlogged relations might be slow, and is pointless if we're just going to error out. - The recoveryWakeupLatch is disowned far later than before. I can't see why this would hurt anything, but my first inclination was to prefer the existing placement of the call. We're only going to wait on the latch while applying WAL, and the existing code seems to release it fairly promptly after it's done applying WAL, which seems to make sense. On the other hand, I can see that your intent was (I believe, anyway) to group it together with shutting down the xlog reader and removing RECOVERYXLOG and RECOVERYHISTORY, and there doesn't seem to be anything wrong with that idea. - The code to clear InArchiveRecovery and close the WAL segment we had open moves earlier. I think it might be possible to fail Assert(InArchiveRecovery), because where you've moved this code, we haven't yet verified that we reached the minimum recovery point. See the comment which begins "It's possible that archive recovery was requested, but we don't know how far we need to replay the WAL before we reach consistency." What if we reach that point, then fail the big hairy if-test and don't set InArchiveRecovery = true? In that case, we can still do it later, in ReadRecord. But maybe that will never happen. Actually it's not entirely clear to me that the assertion is bulletproof even where it is right now, but moving it earlier makes me even less confident. Possibly I just don't understand this well enough. It's a little tempting, too, to see if you could somehow consolidate the two places that do if (readFile >= 0) { close(readFile); readFile = -1 } down to one. - getRecoveryStopReason() is now called earlier than before, and is now called whether or not ArchiveRecoveryRequested. This seems to just move the point of initialization further from the point of use to no real advantage, and I also think that the function is only designed to do something useful for archive recovery, so calling it in other cases just seems confusing. - RECOVERYXLOG and RECOVERYHISTORY are now removed later than before. It's now the last thing that happens before we enabled WAL writes. Doesn't seem like it should hurt anything. - The "archive recovery complete" message is now logged after rather than before writing and archiving a timeline history file. I think that's likely an improvement. That's all I have on 0001. Is this kind of review helpful? Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: