Attached is the rebase version on top of the latest master head
includes refactoring patches posted by Robert.
On Thu, Jul 29, 2021 at 9:46 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jul 28, 2021 at 7:33 AM Amul Sul <sulamul@gmail.com> wrote:
> > I was too worried about how I could miss that & after thinking more
> > about that, I realized that the operation for ArchiveRecoveryRequested
> > is never going to be skipped in the startup process and that never
> > left for the checkpoint process to do that later. That is the reason
> > that assert was added there.
> >
> > When ArchiveRecoveryRequested, the server will no longer be in
> > the wal prohibited mode, we implicitly change the state to
> > wal-permitted. Here is the snip from the 0003 patch:
>
> Ugh, OK. That makes sense, but I'm still not sure that I like it. I've
> kind of been wondering: why not have XLogAcceptWrites() be the
> responsibility of the checkpointer all the time, in every case? That
> would require fixing some more things, and this is one of them, but
> then it would be consistent, which means that any bugs would be likely
> to get found and fixed. If calling XLogAcceptWrites() from the
> checkpointer is some funny case that only happens when the system
> crashes while WAL is prohibited, then we might fail to notice that we
> have a bug.
>
Unfortunately, I didn't get much time to think about this and don't
have a strong opinion on it either.
> This is especially true given that we have very little test coverage
> in this area. Andres was ranting to me about this earlier this week,
> and I wasn't sure he was right, but then I noticed that we have
> exactly zero tests in the entire source tree that make use of
> recovery_end_command. We really need a TAP test for that, I think.
> It's too scary to do much reorganization of the code without having
> any tests at all for the stuff we're moving around. Likewise, we're
> going to need TAP tests for the stuff that is specific to this patch.
> For example, we should have a test that crashes the server while it's
> read only, brings it back up, checks that we still can't write WAL,
> then re-enables WAL, and checks that we now can write WAL. There are
> probably a bunch of other things that we should test, too.
>
Yes, my next plan is to work on the TAP tests and look into the patch
posted by Prabhat to improve test coverage.
Regards,
Amul Sul