Thread: possible bug in handling of contrecords in dd38ff28ad (Fix recovery_prefetch with low maintenance_io_concurrency)

Hi,

I think there's some sort of bug in how dd38ff28ad deals with
contrecords. Consider something as simple as

  pgbench -i -s 100

followed by pg_waldump on the resulting WAL segments. I get this for
every single one:

  pg_waldump: error: error in WAL record at 0/1FFFF98: missing
                     contrecord at 0/1FFFFE0

This only happens since dd38ff28ad, and reverting it makes the error
disappear.

It's possible we still have some issue with contrecords, but IIUC we
fixed those. So unless there's some unknown one (and given that this
seems to happen for *every* WAL segment, that's hard to believe), this
looks more like an issue in the error detection itself.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Sun, Jul 2, 2023 at 1:40 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I think there's some sort of bug in how dd38ff28ad deals with
> contrecords. Consider something as simple as
>
>   pgbench -i -s 100
>
> followed by pg_waldump on the resulting WAL segments. I get this for
> every single one:
>
>   pg_waldump: error: error in WAL record at 0/1FFFF98: missing
>                      contrecord at 0/1FFFFE0
>
> This only happens since dd38ff28ad, and reverting it makes the error
> disappear.
>
> It's possible we still have some issue with contrecords, but IIUC we
> fixed those. So unless there's some unknown one (and given that this
> seems to happen for *every* WAL segment, that's hard to believe), this
> looks more like an issue in the error detection itself.

Yeah.  That message is due to this part of dd38ff28ad's change:

    Also add an explicit error message for missing contrecords.  It was a
    bit strange that we didn't report an error already, and became a latent
    bug with prefetching, since the internal state that tracks aborted
    contrecords would not survive retrying, as revealed by
    026_overwrite_contrecord.pl with this adjustment.  Reporting an error
    prevents that.

We can change 'missing contrecord' back to silent end-of-decoding (as
it was in 14) with the attached.  Here [1] is some analysis of this
error that I wrote last year.  The reason for my hesitation in pushing
a fix was something like this:

1.  For pg_waldump, it's "you told me to decode only this WAL segment,
so decoding failed here", which is useless noise
2.  For walsender, it's "you told me to shut down, so decoding failed
here", which is useless noise
3.  For crash recovery, "I ran out of data, so decoding failed here",
which seems like a report-worthy condition, I think?

When I added that new error I was thinking about that third case.  We
generally expect to detect the end of WAL replay after a crash with an
error ("invalid record length ...: wanted 24, got 0" + several other
possibilities), but in this rare case it would be silent.  The effect
on the first two cases is cosmetic, but certainly annoying.  Perhaps I
should go ahead and back-patch the attached change, and then we can
discuss whether/how we should do a better job of distinguishing "user
requested artificial end of decoding" from "unexpectedly ran out of
data" for v17?

[1] https://www.postgresql.org/message-id/CA+hUKG+WKsZpdoryeqM8_rk5uQPCqS2HGY92WiMGFsK2wVkcig@mail.gmail.com


On 7/2/23 04:09, Thomas Munro wrote:
> On Sun, Jul 2, 2023 at 1:40 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> I think there's some sort of bug in how dd38ff28ad deals with
>> contrecords. Consider something as simple as
>>
>>   pgbench -i -s 100
>>
>> followed by pg_waldump on the resulting WAL segments. I get this for
>> every single one:
>>
>>   pg_waldump: error: error in WAL record at 0/1FFFF98: missing
>>                      contrecord at 0/1FFFFE0
>>
>> This only happens since dd38ff28ad, and reverting it makes the error
>> disappear.
>>
>> It's possible we still have some issue with contrecords, but IIUC we
>> fixed those. So unless there's some unknown one (and given that this
>> seems to happen for *every* WAL segment, that's hard to believe), this
>> looks more like an issue in the error detection itself.
> 
> Yeah.  That message is due to this part of dd38ff28ad's change:
> 
>     Also add an explicit error message for missing contrecords.  It was a
>     bit strange that we didn't report an error already, and became a latent
>     bug with prefetching, since the internal state that tracks aborted
>     contrecords would not survive retrying, as revealed by
>     026_overwrite_contrecord.pl with this adjustment.  Reporting an error
>     prevents that.
> 
> We can change 'missing contrecord' back to silent end-of-decoding (as
> it was in 14) with the attached.  Here [1] is some analysis of this
> error that I wrote last year.  The reason for my hesitation in pushing
> a fix was something like this:
> 
> 1.  For pg_waldump, it's "you told me to decode only this WAL segment,
> so decoding failed here", which is useless noise
> 2.  For walsender, it's "you told me to shut down, so decoding failed
> here", which is useless noise
> 3.  For crash recovery, "I ran out of data, so decoding failed here",
> which seems like a report-worthy condition, I think?
> 
> When I added that new error I was thinking about that third case.  We
> generally expect to detect the end of WAL replay after a crash with an
> error ("invalid record length ...: wanted 24, got 0" + several other
> possibilities), but in this rare case it would be silent.  The effect
> on the first two cases is cosmetic, but certainly annoying.  Perhaps I
> should go ahead and back-patch the attached change, and then we can
> discuss whether/how we should do a better job of distinguishing "user
> requested artificial end of decoding" from "unexpectedly ran out of
> data" for v17?
> 

Yeah, I think that'd be sensible. IMHO we have a habit of scaring users
with stuff that might be dangerous/bad, but 99% of the time it's
actually fine and perhaps even expected. It's almost as if we're
conditioning people to ignore errors.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Mon, Jul 3, 2023 at 6:12 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> On 7/2/23 04:09, Thomas Munro wrote:
> > When I added that new error I was thinking about that third case.  We
> > generally expect to detect the end of WAL replay after a crash with an
> > error ("invalid record length ...: wanted 24, got 0" + several other
> > possibilities), but in this rare case it would be silent.  The effect
> > on the first two cases is cosmetic, but certainly annoying.  Perhaps I
> > should go ahead and back-patch the attached change, and then we can
> > discuss whether/how we should do a better job of distinguishing "user
> > requested artificial end of decoding" from "unexpectedly ran out of
> > data" for v17?
> >
>
> Yeah, I think that'd be sensible. IMHO we have a habit of scaring users
> with stuff that might be dangerous/bad, but 99% of the time it's
> actually fine and perhaps even expected. It's almost as if we're
> conditioning people to ignore errors.

Done.

There is CF #2490 "Make message at end-of-recovery less scary".
Perhaps we should think about how to classify this type of failure in
the context of that proposal.