Home > mailing lists

Re: logical decoding and replication of sequences - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: logical decoding and replication of sequences
Date	August 8, 2022 06:15:46
Msg-id	CA+hUKGLptJc_k=cEqGtyRb-2pyF++uUMNi-+VEhrUV6pmezJmg@mail.gmail.com Whole thread Raw
In response to	Re: logical decoding and replication of sequences (Thomas Munro <thomas.munro@gmail.com>)
Responses	Re: logical decoding and replication of sequences Re: logical decoding and replication of sequences
List	pgsql-hackers

Tree view

On Mon, Aug 8, 2022 at 9:09 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Thanks for the repro patch and bisection work.  Looking...

I don't have the complete explanation yet, but it's something like
this.  We hit the following branch in xlogrecovery.c...

        if (StandbyMode &&
                !XLogReaderValidatePageHeader(xlogreader,
targetPagePtr, readBuf))
        {
                /*
                 * Emit this error right now then retry this page
immediately. Use
                 * errmsg_internal() because the message was already translated.
                 */
                if (xlogreader->errormsg_buf[0])
                        ereport(emode_for_corrupt_record(emode,
xlogreader->EndRecPtr),
                                        (errmsg_internal("%s",
xlogreader->errormsg_buf)));

                /* reset any error XLogReaderValidatePageHeader()
might have set */
                xlogreader->errormsg_buf[0] = '\0';
                goto next_record_is_invalid;
        }

... but, even though there was a (suppressed) error, nothing
invalidates the reader's page buffer.  Normally,
XLogReadValidatePageHeader() failure or any other kind of error
encountered by xlogreader.c'd decoding logic would do that, but here
the read_page callback is directly calling the header validation.
Without prefetching, that doesn't seem to matter, but reading ahead
can cause us to have the problem page in our buffer at the wrong time,
and then not re-read it when we should.  Or something like that.

The attached patch that simply moves the cache invalidation into
report_invalid_record(), so that it's reached by the above code and
everything else that reports an error, seems to fix the problem in
src/bin/pg_ctl/t/003_promote.pl with Noah's spanner-in-the-works patch
applied, and passes check-world without it.  I need to look at this
some more, though, and figure out if it's the right fix.

Attachment

0001-WIP-Fix-XLogPageRead-cache-invalidation.patch

pgsql-hackers by date:

From: Kyotaro Horiguchi
Date: 08 August 2022, 06:12:08
Subject: Re: Patch to address creation of PgStat* contexts with null parent context

From: Bharath Rupireddy
Date: 08 August 2022, 06:29:19
Subject: Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

Re: logical decoding and replication of sequences - Mailing list pgsql-hackers

Attachment

Previous

Next