Re: Make mesage at end-of-recovery less scary. - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Make mesage at end-of-recovery less scary.
Msg-id CA+TgmoZBqiN5VpuE6E0iCU_P+r8yo-EKuedx2RNr0Uz_cBHfSg@mail.gmail.com
In response to Re: Make mesage at end-of-recovery less scary.  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: Make mesage at end-of-recovery less scary.  (James Coleman <jtc331@gmail.com>)
List pgsql-hackers
On Wed, Mar 25, 2020 at 8:53 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> HINT:  This is to be expected if this is the end of the WAL.  Otherwise,
> it could indicate corruption.

First, I agree that this general issue is a problem, because it's come
up for me in quite a number of customer situations. Either people get
scared when they shouldn't, because the message is innocuous, or they
don't get scared about things that actually are scary, because once
some scary-looking messages turn out to be innocuous, people come to
believe that the same is true in other cases.

Second, I don't really like the particular formulation you have above,
because the user still doesn't know whether or not to be scared. Can
we figure that out? I think that if we're in crash recovery, we
should not be scared, because we have no alternative to assuming
that we've reached the end of WAL, so all crash recoveries will end
like this. If we're in archive recovery, we should definitely be
scared if we haven't yet reached the minimum recovery point, because
more WAL than that should certainly exist. After that, it depends on
how we got the WAL. If it's being streamed, the question is whether
we've reached the end of what got streamed. If it's being copied from
the archive, we ought to have the whole segment, but maybe not more.
Can we pass enough context down to the point where the error is
reported to know whether we hit the error at the end of the WAL that
was streamed? If not, can we somehow rejigger things so that we only
make it sound scary if we keep getting stuck at the same point when we
would've expected to make progress in the meantime?
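As a rough illustration only, the decision procedure in the paragraph above might look like the following Python sketch. Everything in it is hypothetical: the `WalSource` and `RecoveryState` names, their fields, and the treatment of LSNs as plain integers are assumptions made for illustration, not PostgreSQL's actual recovery code.

```python
from dataclasses import dataclass
from enum import Enum, auto


class WalSource(Enum):
    # Hypothetical labels for where the failing WAL came from.
    CRASH_RECOVERY = auto()  # replaying local WAL after a crash
    STREAM = auto()          # archive recovery, WAL streamed from a primary
    ARCHIVE = auto()         # archive recovery, segment copied from the archive


@dataclass
class RecoveryState:
    # All field names are hypothetical; LSNs are plain integers here.
    source: WalSource
    error_lsn: int               # where the invalid record was found
    min_recovery_point: int = 0  # only meaningful in archive recovery
    streamed_upto: int = 0       # end of WAL received so far (streaming)
    segment_end: int = 0         # end of the segment restored from the archive


def end_of_wal_is_expected(st: RecoveryState) -> bool:
    """Return True if an invalid record here is just the normal end of WAL."""
    if st.source is WalSource.CRASH_RECOVERY:
        # We cannot know where WAL ends, so every crash recovery stops this way.
        return True
    if st.error_lsn < st.min_recovery_point:
        # More WAL must exist up to the minimum recovery point: be scared.
        return False
    if st.source is WalSource.STREAM:
        # Benign only if we ran off the end of what has been streamed.
        # Simplification: ignores a record that straddles streamed_upto.
        return st.error_lsn >= st.streamed_upto
    # Restored from the archive: the whole segment should be there,
    # but nothing after it is guaranteed.
    return st.error_lsn >= st.segment_end
```

For example, an archive-recovery failure at `0x3000` with `segment_end=0x4000` is reported as suspicious, because the restored segment should have been complete at that point.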

I'm just spitballing here, but it would be really good if there were a
way to know definitively whether or not you should be scared. Corrupted
WAL segments are definitely a thing that happens, but retries are a
lot more common.
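The fallback floated above, making the message sound scary only when recovery keeps getting stuck at the same point where progress should have been possible, could be sketched as a small retry counter. `StuckDetector`, its `more_wal_available` argument, and the default threshold of 3 are hypothetical choices for illustration; how a real implementation would learn that more WAL ought to be available is left open.

```python
from typing import Optional


class StuckDetector:
    """Hypothetical helper: escalate only after repeated stalls at one LSN."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold          # arbitrary illustrative value
        self.last_lsn: Optional[int] = None
        self.stuck_retries = 0

    def record_failure(self, lsn: int, more_wal_available: bool) -> bool:
        """Record an invalid-record failure; return True if it looks scary."""
        if lsn != self.last_lsn:
            # We got somewhere since the last failure, so start counting again.
            self.last_lsn = lsn
            self.stuck_retries = 0
        if more_wal_available:
            # We should have been able to get past this point by now.
            self.stuck_retries += 1
        return self.stuck_retries >= self.threshold
```

Failures with no reason to expect progress never escalate, which matches the observation that ordinary retries are far more common than genuine corruption.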

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


