Re: when the startup process doesn't - Mailing list pgsql-hackers

From Andres Freund
Subject Re: when the startup process doesn't
Date
Msg-id 20210421193605.cubavrq7q5bqkf4i@alap3.anarazel.de
In response to Re: when the startup process doesn't  (Stephen Frost <sfrost@snowman.net>)
Responses Re: when the startup process doesn't  (Stephen Frost <sfrost@snowman.net>)
Re: when the startup process doesn't  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
List pgsql-hackers
Hi,

On 2021-04-21 14:36:24 -0400, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > Unfortunately I think something like a percentage is hard to calculate
> > right now.  Even just looking at crash recovery (vs replication or
> > PITR), we don't currently know where the WAL ends without reading all
> > the WAL. The easiest thing to return would be something in LSNs or
> > bytes and I suspect that we don't want to expose either unauthenticated?
> 
> While it obviously wouldn't be exactly accurate, I wonder if we couldn't
> just look at the WAL files we have to replay and then guess that we'll go
> through about half of them before we reach the end..?  I mean, wouldn't
> exactly be the first time that a percentage progress report wasn't
> completely accurate. :)

I don't think that'd work well, due to WAL segment recycling. Instead of
deleting old WAL files we rename them into place as future segments, and
sometimes that can be a *lot* of files. It's one thing for there to be a
~20% inaccuracy in the estimated amount of work, another to have
misestimates of orders of magnitude.



> > I wonder if we ought to occasionally update something like
> > ControlFileData->minRecoveryPoint on primaries, similar to what we do on
> > standbys? Then we could actually calculate a percentage, and it'd have
> > the added advantage of allowing to detect more cases where the end of
> > the WAL was lost. Obviously we'd have to throttle it somehow, to avoid
> > adding a lot of fsyncs, but that seems doable?
> 
> This seems to go against Tom's concerns wrt rewriting pg_control.

I don't think that concern equally applies for what I am proposing
here. For one, we already have minRecoveryPoint in ControlFileData, and we
already use it for the purpose of determining where we need to recover
to, albeit only during crash recovery. Imo that's substantially
different from adding actual recovery progress status information to the
control file.

I also think that it'd actually be a significant reliability improvement
if we maintained an approximate minRecoveryPoint during normal running:
I've seen way too many cases where WAL files were lost / removed and
crash recovery just started up happily, only hitting problems months
down the line. Yes, it'd obviously not be bulletproof, since we'd not
want to add a significant stream of new fsyncs, but IME such WAL files
lost/removed issues tend not to be about a few hundred bytes of WAL but
many segments missing.
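
A rough sketch of what an approximate recorded point would enable,
assuming recovery knows its starting LSN, the LSN replayed so far, and a
throttled (hence slightly stale) minRecoveryPoint. The function names
are hypothetical and this is not how the server implements anything;
only the textual "hi/lo" hexadecimal LSN format is taken from
PostgreSQL.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/3000000' into a 64-bit byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def recovery_progress(redo_start: int, replayed: int, min_recovery_point: int) -> float:
    """Percentage of the WAL between redo_start and the recorded point replayed so far.

    The recorded point is only updated occasionally, so replay may run
    past it; the result is capped at 100.
    """
    total = min_recovery_point - redo_start
    if total <= 0:
        return 100.0
    done = min(max(replayed - redo_start, 0), total)
    return 100.0 * done / total


def wal_end_lost(end_of_wal: int, min_recovery_point: int) -> bool:
    """True if the WAL ran out before the point known to have been written."""
    return end_of_wal < min_recovery_point


if __name__ == "__main__":
    start = parse_lsn("0/1000000")
    target = parse_lsn("0/3000000")
    print(recovery_progress(start, parse_lsn("0/2000000"), target))
    print(wal_end_lost(parse_lsn("0/2000000"), target))
```

Because the recorded point lags behind the real end of WAL, the check
only catches losses larger than the lag, which fits the case of many
missing segments rather than a few hundred missing bytes.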


Greetings,

Andres Freund


