Re: needless complexity in StartupXLOG - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: needless complexity in StartupXLOG
Msg-id: 20210728192800.zxlpa7df3t6i3guv@alap3.anarazel.de
In response to: needless complexity in StartupXLOG (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

Hi,

On 2021-07-26 12:12:53 -0400, Robert Haas wrote:
> My first thought was that we should do the check unconditionally,
> rather than just when bgwriterLaunched && LocalPromoteIsTriggered, and
> ERROR if it fails. But then I wondered what the point of that would be
> exactly. If we have such a bug -- and to the best of my knowledge
> there's no evidence that we do -- there's no particular reason it
> should only happen at the end of recovery. It could happen any time
> the system -- or the user, or malicious space aliens -- remove files
> from pg_wal, and we have no real idea about the timing of malicious
> space alien activity, so doing the test here rather than anywhere else
> just seems like a shot in the dark.

Yea. The history of that code being added doesn't suggest that there was
a concrete issue being addressed, from what I can tell.


> So at the moment I am leaning toward the view that we should just
> remove this check entirely, as in the attached, proposed patch.

+1


> Really, I think we should consider going further. If it's safe to
> write an end-of-recovery record rather than a checkpoint, why not do
> so all the time?

+many. The current split doesn't make much sense. For one, a long crash
recovery is often a huge problem, so why should we incur a cost there that
we are OK avoiding during promotions? For another, end-of-recovery is a
crucial path for correctness, so reducing the number of non-trivial paths is
good.


> Imagine if instead of
> all the hairy logic we have now we just replaced this whole if
> (IsInRecovery) stanza with this:
> 
> if (InRecovery)
>     CreateEndOfRecoveryRecord();
> 
> That would be WAY easier to reason about than the rat's nest we have
> here today. Now, I am not sure what it would take to get there, but I
> think that is the direction we ought to be heading.

What are we going to do in the single-user ([1]) case in this awesome future?
I guess we could just not create a checkpoint until single-user mode is shut
down or creates a checkpoint for other reasons?


Greetings,

Andres Freund


[1] I really wish somebody had the energy to just remove single-user and
bootstrap modes. The degree to which they increase complexity in the rest of
the system is entirely unreasonable. There's not actually any reason
bootstrapping can't happen with checkpointer et al running; it's just
autovacuum that'd need to be blocked.


