Re: Split xlog.c - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Split xlog.c
Date
Msg-id CA+Tgmoag1Tzw+aYB+px5QBG35kPXX-QbVqbtiae3L6o=VrUHig@mail.gmail.com
Whole thread Raw
In response to Re: Split xlog.c  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Split xlog.c
List pgsql-hackers
On Thu, Sep 16, 2021 at 4:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Here is another rebase.

Like probably everyone else who has an opinion on the topic, I like
the idea of splitting xlog.c. I don't have a fully formed opinion on
the changes yet, but it seems to be a surprisingly equal split, which
seems good. Since I just spent a bunch of time being frustrated by
ThisTimeLineID, I'm pleased to see that the giant amount of code that
moves to xlogrecovery.c apparently ends up not needing that global
variable, which I think is excellent. Perhaps the amount of code that
needs that global variable can be further reduced in the future, maybe
even to zero.

I think that the small reorderings that you mention in your original
post are the scary part: if we do stuff in a different order, maybe
things won't work. In the rest of this email I'm going to try to go
through and analyze that. I think it might have been a bit easier if
you'd outlined the things you moved and the reasons why you thought
that was OK; as it is, I have to reverse-engineer it. But I'd like to
see this go forward, either as-is or with whatever modifications seem
to be needed, so I'm going to give it a try.

- RelationCacheInitFileRemove() moves later. The code over which it
moves seems to include sanity checks and initializations of various
bits of in-memory state, but nothing that touches anything on disk.
Therefore I don't see how this can break anything. I also agree that
the new placement of the call is more logical than the old one, since
in the current code it's kind of in the middle of a bunch of things
that, as your patch highlights, are really all about initializing WAL
recovery, and this is a separate kind of a thing. Post-patch, it ends
up near where we initialize a bunch of other subsystems. Cool.

- Some logic to (a) sanity-check the control file's REDO pointer, (b)
set InRecovery = true, and (c) update various bits of control file
state in memory has been moved substantially earlier. The actual
update of the control file on disk stays where it was before. At least
on first reading, I don't really like this. On the one hand, I don't
see a reason why it's necessary prerequisite for splitting xlog.c. On
the other hand, it seems a bit dangerous. There's now ~8 calls to
functions in other modules between the time you change things in
memory and the time that you call UpdateControlFile(). Perhaps none of
those functions can call anything that might in turn call
UpdateControlFile() but I don't know why we should take the chance. Is
there some advantage to having the in-memory state out of sync with
the on-disk state across all that code?

- Renaming backup_label and tablespace_map to .old is now done
slightly earlier, just before pg_reset_all() and adjusting our notion
of the minimum recovery point rather than just after. Seems OK.

- The rm_startup() functions are now called later, only once we're
sure that we have a WAL record to apply. Seems fine; slightly more
efficient. Looks like the functions in question are just arranging to
set up private memory contexts for the AMs that want them for WAL
replay, so they won't care if we skip that in some corner cases where
there's nothing to replay.

- ResetUnloggedRelations(UNLOGGED_RELATION_INIT) is moved later. We'll
now do a few minor bookkeeping tasks like setting EndOfLog and
EndOfLogTLI first, and we'll also now check whether we reached the
minimum recovery point OK before doing this. This appears to me to be
a clear improvement, since checking whether the minimum recovery point
has been reached is fast, and resetting unlogged relations might be
slow, and is pointless if we're just going to error out.

- The recoveryWakeupLatch is disowned far later than before. I can't
see why this would hurt anything, but my first inclination was to
prefer the existing placement of the call. We're only going to wait on
the latch while applying WAL, and the existing code seems to release
it fairly promptly after it's done applying WAL, which seems to make
sense. On the other hand, I can see that your intent was (I believe,
anyway) to group it together with shutting down the xlog reader and
removing RECOVERYXLOG and RECOVERYHISTORY, and there doesn't seem to
be anything wrong with that idea.

- The code to clear InArchiveRecovery and close the WAL segment we had
open moves earlier. I think it might be possible to fail
Assert(InArchiveRecovery), because where you've moved this code, we
haven't yet verified that we reached the minimum recovery point. See
the comment which begins "It's possible that archive recovery was
requested, but we don't know how far we need to replay the WAL before
we reach consistency." What if we reach that point, then fail the big
hairy if-test and don't set InArchiveRecovery = true? In that case, we
can still do it later, in ReadRecord. But maybe that will never
happen. Actually it's not entirely clear to me that the assertion is
bulletproof even where it is right now, but moving it earlier makes me
even less confident. Possibly I just don't understand this well
enough.

It's a little tempting, too, to see if you could somehow consolidate
the two places that do if (readFile >= 0) { close(readFile); readFile
= -1 } down to one.

- getRecoveryStopReason() is now called earlier than before, and is
now called whether or not ArchiveRecoveryRequested. This seems to just
move the point of initialization further from the point of use to no
real advantage, and I also think that the function is only designed to
do something useful for archive recovery, so calling it in other cases
just seems confusing.

- RECOVERYXLOG and RECOVERYHISTORY are now removed later than before.
It's now the last thing that happens before we enabled WAL writes.
Doesn't seem like it should hurt anything.

- The "archive recovery complete" message is now logged after rather
than before writing and archiving a timeline history file. I think
that's likely an improvement.

That's all I have on 0001. Is this kind of review helpful?

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Mark Dilger
Date:
Subject: Re: Extending amcheck to check toast size and compression
Next
From: Jeff Davis
Date:
Subject: Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)