
Re: Infinite loop in XLogPageRead() on standby - Mailing list pgsql-hackers

From	Alexander Kukushkin
Subject	Re: Infinite loop in XLogPageRead() on standby
Date	February 29, 2024 16:36:29
Msg-id	CAFh8B=nPSERv7NyYHmjVXK4xK3va1XzU3-rhOswjgEZMWkV=RQ@mail.gmail.com
In response to	Re: Infinite loop in XLogPageRead() on standby (Michael Paquier <michael@paquier.xyz>)
List	pgsql-hackers


Hi Michael,

On Thu, 29 Feb 2024 at 06:05, Michael Paquier <michael@paquier.xyz> wrote:

> Wow. Have you seen that in an actual production environment?

Yes, we see it regularly, and it is reproducible in test environments as well.

> > my $start_page = start_of_page($end_lsn);
> > my $wal_file = write_wal($primary, $TLI, $start_page,
> >                          "\x00" x $WAL_BLOCK_SIZE);
> > # copy the file we just "hacked" to the archive
> > copy($wal_file, $primary->archive_dir);
>
> So you are emulating a failure by filling with zeros the second page
> where the last emit_message() generated a record, and the page before
> that includes the continuation record. Then abuse of WAL archiving to
> force the replay of the last record. That's kind of cool.

Right, at this point that is easier than causing an artificial crash on the primary right after it has finished writing just one page.
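
For illustration, the page arithmetic behind the snippet quoted above can be sketched as follows. This is a minimal, self-contained sketch and not the test's actual code: sketch_start_of_page() and sketch_zero_page() are hypothetical stand-ins for what start_of_page() and write_wal() appear to do, the block size of 8192 is assumed (it is the default XLOG_BLCKSZ), and the demo runs on a scratch file rather than on a real WAL segment.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Assumed value: 8192 is the default WAL block size; a real test would
# take it from the cluster instead of hard-coding it.
my $WAL_BLOCK_SIZE = 8192;

# Round a byte position down to the start of the page that contains it.
# Hypothetical stand-in for the start_of_page() helper used in the test.
sub sketch_start_of_page
{
	my ($pos) = @_;
	return $pos - ($pos % $WAL_BLOCK_SIZE);
}

# Overwrite one whole page of a file with zero bytes, starting at $offset.
# Hypothetical stand-in for the way write_wal() is used in the test; the
# real helper is called with a node, a TLI and an LSN.
sub sketch_zero_page
{
	my ($path, $offset) = @_;
	open(my $fh, '+<:raw', $path) or die "cannot open $path: $!";
	seek($fh, $offset, SEEK_SET) or die "cannot seek in $path: $!";
	print $fh "\x00" x $WAL_BLOCK_SIZE;
	close($fh) or die "cannot close $path: $!";
	return;
}

# Demo on a scratch file of three pages filled with 0xFF bytes.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print $tmp_fh "\xFF" x (3 * $WAL_BLOCK_SIZE);
close($tmp_fh) or die "cannot close $tmp_path: $!";

# A position 100 bytes into the second page, playing the role of $end_lsn.
my $end_pos = $WAL_BLOCK_SIZE + 100;
my $page_start = sketch_start_of_page($end_pos);

# Zero the page that contains $end_pos; the other pages stay untouched.
sketch_zero_page($tmp_path, $page_start);
```

In the actual test the zeroed segment is then copied into the archive, so the standby restores a page that fails validation.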

> > To be honest, I don't know yet how to fix it nicely. I am thinking about
> > returning XLREAD_FAIL from XLogPageRead() if it suddenly switched to a new
> > timeline while trying to read a page and if this page is invalid.
>
> Hmm. I suspect that you may be right on a TLI change when reading a
> page. There are a bunch of side cases with continuation records and
> header validation around XLogReaderValidatePageHeader(). Perhaps you
> have an idea of patch to show your point?

Not yet, but hopefully I will get something done next week.

> Nit. In your test, it seems to me that you should not call directly
> set_standby_mode and enable_restoring, just rely on has_restoring with
> the standby option included.

Thanks, I'll look into it.

Regards,

Alexander Kukushkin
