On Sat, Jun 21, 2025 at 2:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > And I see the following variable values.
>
> > (lldb) p/x targetPagePtr
> > (XLogRecPtr) 0x0000000029004000
> > (lldb) p/x RecPtr
> > (XLogRecPtr) 0x0000000029002138
>
> > I hardly understand how is this possible given it was compiled with "-O0".
> > I'm planning to continue investigating this tomorrow.
>
> Yeah, I see
>
> (lldb) p/x targetPagePtr
> (XLogRecPtr) 0x0000000029004000
> (lldb) p/x RecPtr
> (XLogRecPtr) 0x0000000029002138
> (lldb) p/x RecPtr - (RecPtr % 8192)
> (XLogRecPtr) 0x0000000029002000
>
> We're here:
>
> /* Calculate pointer to beginning of next page */
> targetPagePtr += XLOG_BLCKSZ;
>
> /* Wait for the next page to become available */
> readOff = ReadPageInternal(state, targetPagePtr,
> Min(total_len - gotlen + SizeOfXLogShortPHD,
> XLOG_BLCKSZ));
>
> so that's where the increment of targetPagePtr came from.
> But "Wait for the next page to become available" seems awfully
> trusting that there will be another page. Should this be
> using the no-wait code path?
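
As a quick check of the arithmetic in the quoted debugger output, here is a minimal C sketch. It assumes the default 8 kB XLOG_BLCKSZ, and page_start() and pages_ahead() are hypothetical helpers, not PostgreSQL functions. It shows that targetPagePtr is exactly one page past the page containing RecPtr, which matches the single `targetPagePtr += XLOG_BLCKSZ` step quoted above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;  /* mirrors PostgreSQL's 64-bit WAL position type */
#define XLOG_BLCKSZ 8192      /* assumption: default WAL page size */

/* Hypothetical helper: start of the WAL page containing ptr,
 * i.e. the same value as "RecPtr - (RecPtr % 8192)" in lldb. */
static XLogRecPtr page_start(XLogRecPtr ptr)
{
    return ptr - (ptr % XLOG_BLCKSZ);
}

/* Hypothetical helper: how many whole pages the page of target
 * lies beyond the page that contains rec. */
static uint64_t pages_ahead(XLogRecPtr rec, XLogRecPtr target)
{
    assert(target >= rec);    /* avoid unsigned underflow */
    return (page_start(target) - page_start(rec)) / XLOG_BLCKSZ;
}
```

With the values above, page_start(0x29002138) yields 0x29002000 (the same result lldb printed), and pages_ahead(0x29002138, 0x29004000) yields 1.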

Thank you for the help. It seems to me that the problem is deeper. The
code seems to be trying only to read up to the end of the given WAL
record, but it can't reach it. According to the values I've seen in
XLogCtl, RedoRecPtr points somewhere inside that record's body. I'm not
confident that I understand what's going on or how to fix it.
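
To illustrate why the reader keeps stepping to further pages while it tries to reach the end of a record, here is a simplified C model of the continuation loop quoted above. The helper last_page_needed() is hypothetical. It assumes XLOG_BLCKSZ = 8192 and a 24-byte short page header, and it ignores long page headers at segment boundaries. It computes the last page a reader must fetch to assemble a record of total_len bytes starting at rec_ptr. In this model, if WAL is never written up to that page, a blocking read for it has nothing to return.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define XLOG_BLCKSZ 8192
#define SHORT_PHD_SIZE 24  /* assumption: MAXALIGN'd short page header on 64-bit */

/* Hypothetical helper: page address of the last WAL page needed to
 * assemble a record of total_len bytes that starts at rec_ptr. */
static XLogRecPtr last_page_needed(XLogRecPtr rec_ptr, uint32_t total_len)
{
    assert(total_len > 0);

    /* The first page contributes everything from rec_ptr to its end. */
    XLogRecPtr target = rec_ptr - (rec_ptr % XLOG_BLCKSZ);
    uint64_t got = XLOG_BLCKSZ - (rec_ptr % XLOG_BLCKSZ);

    while (got < total_len)
    {
        /* Step to the beginning of the next page, as the quoted code does. */
        target += XLOG_BLCKSZ;
        /* Each continuation page holds data after its short header. */
        got += XLOG_BLCKSZ - SHORT_PHD_SIZE;
    }
    return target;
}
```

For a record at 0x29002138, the first page supplies 7880 bytes. Any record longer than that needs the page at 0x29004000, and one longer than 16048 bytes needs 0x29006000 as well.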
I've tried two things.
1) slot_tests_wait_for_checkpoint.patch
Make the tests wait for checkpoint completion (as I think they were
originally intended to). However, the problem still persists.
2) revert_slot_last_saved_restart_lsn.patch
Revert ca307d5cec90 and make the new tests reserve WAL using the
wal_keep_size GUC. The problem still persists. It seems to be a problem
independent of my attempts to fix retaining WAL files with the slot's
restart_lsn. The new tests just exposed an existing bug.
------
Regards,
Alexander Korotkov
Supabase