Re: pgsql: Improve runtime and output of tests for replication slots checkp - Mailing list pgsql-committers

From Alexander Korotkov
Subject Re: pgsql: Improve runtime and output of tests for replication slots checkp
Msg-id CAPpHfdurV-j_e0pb=UFENAy3tyzxfF+yHveNDNQk2gM82WBU5A@mail.gmail.com
In response to pgsql: Improve runtime and output of tests for replication slots checkp  (Alexander Korotkov <akorotkov@postgresql.org>)
Responses Re: pgsql: Improve runtime and output of tests for replication slots checkp
List pgsql-committers
On Sat, Jun 21, 2025 at 2:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > And I see the following variable values.
>
> > (lldb) p/x targetPagePtr
> > (XLogRecPtr) 0x0000000029004000
> > (lldb) p/x RecPtr
> > (XLogRecPtr) 0x0000000029002138
>
> > I hardly understand how is this possible given it was compiled with "-O0".
> > I'm planning to continue investigating this tomorrow.
>
> Yeah, I see
>
> (lldb) p/x targetPagePtr
> (XLogRecPtr) 0x0000000029004000
> (lldb) p/x RecPtr
> (XLogRecPtr) 0x0000000029002138
> (lldb) p/x RecPtr - (RecPtr % 8192)
> (XLogRecPtr) 0x0000000029002000
>
> We're here:
>
>             /* Calculate pointer to beginning of next page */
>             targetPagePtr += XLOG_BLCKSZ;
>
>             /* Wait for the next page to become available */
>             readOff = ReadPageInternal(state, targetPagePtr,
>                                        Min(total_len - gotlen + SizeOfXLogShortPHD,
>                                            XLOG_BLCKSZ));
>
> so that's where the increment of targetPagePtr came from.
> But "Wait for the next page to become available" seems awfully
> trusting that there will be another page.  Should this be
> using the no-wait code path?

Thank you for the help.  It seems to me that the problem is deeper.  The
code seems to be only trying to read to the end of the given WAL record,
but can't reach it.  According to the values I've seen in XLogCtl, it
seems that RedoRecPtr points somewhere inside that record's body.
I'm not confident that I understand what's going on or how to
fix it.
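
For reference, the page arithmetic behind the lldb values quoted above can
be reproduced standalone.  This is only a minimal sketch, not the actual
xlogreader code: the XLogRecPtr typedef and the XLOG_BLCKSZ value of 8192
are local stand-ins (the block size is assumed from the "% 8192" in the
debugger session), and page_start(), next_page_start() and
check_lldb_values() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins, not taken from the PostgreSQL headers. */
typedef uint64_t XLogRecPtr;
#define XLOG_BLCKSZ 8192		/* assumed: matches "% 8192" in the lldb session */

/* Start of the WAL page that contains ptr. */
static XLogRecPtr
page_start(XLogRecPtr ptr)
{
	return ptr - (ptr % XLOG_BLCKSZ);
}

/* Start of the page following the one that contains ptr. */
static XLogRecPtr
next_page_start(XLogRecPtr ptr)
{
	return page_start(ptr) + XLOG_BLCKSZ;
}

/* Re-derive the two addresses seen in the debugger. */
static void
check_lldb_values(void)
{
	const XLogRecPtr RecPtr = UINT64_C(0x29002138);

	/* RecPtr - (RecPtr % 8192), as printed by lldb */
	assert(page_start(RecPtr) == UINT64_C(0x29002000));

	/* value of targetPagePtr after "targetPagePtr += XLOG_BLCKSZ" */
	assert(next_page_start(RecPtr) == UINT64_C(0x29004000));
}
```

So the targetPagePtr value is exactly one page past the page where the
record starts, which is consistent with the reader trying to continue a
record that crosses a page boundary.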

I've tried two things.
1) slot_tests_wait_for_checkpoint.patch
Make the tests wait for checkpoint completion (as I think they were
originally intended to).  However, the problem still persists.
2) revert_slot_last_saved_restart_lsn.patch
Revert ca307d5cec90 and make the new tests reserve WAL using the
wal_keep_size GUC.  The problem still persists.  It seems to be a problem
independent of my attempts to fix retaining WAL files with the slot's
restart_lsn.  The new tests just exposed an existing bug.

------
Regards,
Alexander Korotkov
Supabase
