Hi,
The buildfarm reported an issue at [1] where logical decoding could not
detect that only a partial WAL record, spanning two pages, had been
written because of an immediate shutdown.
Consider a scenario where a WAL record is split across two WAL pages.
If the server crashes before the entire record is written, crash
recovery detects the incomplete (broken) record, and a special flag,
XLP_FIRST_IS_OVERWRITE_CONTRECORD, is set in the header of the page
where the record would have continued to indicate this condition. When
changes are retrieved through a logical replication slot, the WAL is
read based on the record header. In this particular instance, a total
of 8133 bytes were required to reconstruct the WAL record, of which 277
bytes were expected on the subsequent WAL page. However, only 248 bytes
were available there, because no operations were performed after the
crash and hence no new WAL was generated. As a result,
pg_logical_slot_get_changes() keeps checking whether the current WAL
flush position has reached 0x29004115 (the LSN covering the 277 bytes
needed on the next page). Since no new WAL is being generated, the
flush position stays at 0x290040F8 (the end of the available 248
bytes), and the call waits indefinitely in read_local_xlog_page_guts().
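To make the hang concrete, here is a minimal standalone C sketch (not
PostgreSQL code: the GetFlushRecPtr() stub and the iteration cap are
mine) of the wait that read_local_xlog_page_guts() ends up in with the
LSNs above; the position the reader needs, 0x29004115, is permanently
ahead of the stuck flush position 0x290040F8, so the real loop never
exits:

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

/* Stub: the flush position is stuck at the end of the 248 available bytes. */
static XLogRecPtr
GetFlushRecPtr(void)
{
	return 0x290040F8;
}

int
main(void)
{
	XLogRecPtr	targetPagePtr = 0x29004000;	/* start of the next WAL page */
	int			reqLen = 277;			/* bytes requested on that page */
	XLogRecPtr	loc = targetPagePtr + reqLen;	/* 0x29004115 */
	int			iterations = 0;

	/* The real loop has no cap; it sleeps and retries forever. */
	while (GetFlushRecPtr() < loc && iterations < 5)
	{
		printf("waiting: need up to %llX, flushed only up to %llX\n",
			   (unsigned long long) loc,
			   (unsigned long long) GetFlushRecPtr());
		iterations++;
	}
	return 0;
}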
Currently, the logic attempts to read the complete WAL record based on
the record size obtained before the crash, even though only a partial
record was actually written. The check for
XLP_FIRST_IS_OVERWRITE_CONTRECORD in XLogDecodeNextRecord() is reached
only after the page read has returned the full requested length, and
since that much WAL data never becomes available we never get to the
check. A more robust approach is to read the page header first, check
whether XLP_FIRST_IS_OVERWRITE_CONTRECORD is set, and only then read
the rest of the WAL record if it is not marked as an overwritten
partial record. This prevents the system from waiting for WAL data
that will never arrive. The attached partial_wal_record_fix.patch
implements this.
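To illustrate the proposed ordering, here is a minimal standalone C
sketch (simplified stand-in types, not the code from the attached
patch; the real change is in XLogDecodeNextRecord()): read only the
page header of the continuation page, which is always available, and
decide from XLP_FIRST_IS_OVERWRITE_CONTRECORD whether it is safe to
wait for the rest of the record:

#include <stdint.h>
#include <stdio.h>

/* Flag bits as defined in access/xlog_internal.h */
#define XLP_FIRST_IS_CONTRECORD				0x0001
#define XLP_FIRST_IS_OVERWRITE_CONTRECORD	0x0008

/* Simplified stand-in for XLogPageHeaderData */
typedef struct SimplePageHeader
{
	uint16_t	xlp_info;
} SimplePageHeader;

/*
 * Stand-in for reading just the short page header of the continuation
 * page; unlike the full continuation data, the header is available.
 */
static SimplePageHeader
read_page_header(void)
{
	SimplePageHeader hdr = {XLP_FIRST_IS_OVERWRITE_CONTRECORD};

	return hdr;
}

int
main(void)
{
	SimplePageHeader hdr = read_page_header();

	if (hdr.xlp_info & XLP_FIRST_IS_OVERWRITE_CONTRECORD)
	{
		/* The continuation was overwritten; stop waiting for the old record. */
		printf("overwrite contrecord: restart decoding, don't wait for WAL\n");
	}
	else if (hdr.xlp_info & XLP_FIRST_IS_CONTRECORD)
	{
		/* The continuation really exists; read the remaining bytes. */
		printf("continuation present: read the rest of the record\n");
	}
	return 0;
}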
I don't have a reliable way to reproduce this issue; currently it can
be reproduced by running the 046_checkpoint_logical_slot test about 50
times. That test was reverted recently after it caused a few buildfarm
failures discussed at [2]; the reverted test is attached as
reverted_test_046_checkpoint_logical_slot.patch.
I have added the people who were involved in the discussion of this
issue.
Thoughts?
[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2025-06-19+23%3A47%3A08
[2] - https://www.postgresql.org/message-id/CAPpHfdsO9s5he3xHWNFtwvXtvsftu3nNz%3DPV9fdN32UOh-Z3tA%40mail.gmail.com
Regards,
Vignesh