Re: pg_logical_slot_get_changes waits continously for a partial WAL record spanning across 2 pages - Mailing list pgsql-hackers
| From | Alexander Korotkov |
|---|---|
| Subject | Re: pg_logical_slot_get_changes waits continously for a partial WAL record spanning across 2 pages |
| Date | |
| Msg-id | CAPpHfdtguXBVnCF=oFsWeFGa7AdG0XnnofcLXLTBOiMHAOFyrQ@mail.gmail.com |
| In response to | Re: pg_logical_slot_get_changes waits continously for a partial WAL record spanning across 2 pages (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: pg_logical_slot_get_changes waits continously for a partial WAL record spanning across 2 pages |
| List | pgsql-hackers |
On Sat, Jul 19, 2025 at 10:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > I went through the patchset. Everything looks good to me. I only made
> > some improvements to comments and commit messages. I'm going to push
> > this if there are no objections.
>
> There's apparently something wrong in the v17 branch, as three
> separate buildfarm members have now hit timeout failures in
> 046_checkpoint_logical_slot.pl [1][2][3]. I tried to reproduce
> this locally, and didn't have much luck initially. However,
> if I build with a configuration similar to grassquit's, it
> will hang up maybe one time in ten:
>
> export ASAN_OPTIONS='print_stacktrace=1:disable_coredump=0:abort_on_error=1:detect_leaks=0:detect_stack_use_after_return=0'
>
> export UBSAN_OPTIONS='print_stacktrace=1:disable_coredump=0:abort_on_error=1'
>
> ./configure ... usual flags plus ... CFLAGS='-O1 -ggdb -g3 -fno-omit-frame-pointer -Wall -Wextra -Wno-unused-parameter -Wno-sign-compare -Wno-missing-field-initializers -fsanitize=address -fno-sanitize-recover=all' --enable-injection-points
>
> The fact that 046_checkpoint_logical_slot.pl is skipped in
> non-injection-point builds is probably reducing the number
> of buildfarm failures, since only a minority of animals
> have that turned on yet.
>
> I don't see anything obviously wrong in the test changes, and the
> postmaster log from the failures looks pretty clearly like what is
> hanging up is the pg_logical_slot_get_changes call:
>
> 2025-07-19 16:10:07.276 CEST [3458309][client backend][0/2:0] LOG: statement: select count(*) from pg_logical_slot_get_changes('slot_logical',null, null);
> 2025-07-19 16:10:07.278 CEST [3458309][client backend][0/2:0] LOG: starting logical decoding for slot "slot_logical"
> 2025-07-19 16:10:07.278 CEST [3458309][client backend][0/2:0] DETAIL: Streaming transactions committing after 0/290000F8, reading WAL from 0/1540F40.
> 2025-07-19 16:10:07.278 CEST [3458309][client backend][0/2:0] STATEMENT: select count(*) from pg_logical_slot_get_changes('slot_logical',null, null);
> 2025-07-19 16:10:07.278 CEST [3458309][client backend][0/2:0] LOG: logical decoding found consistent point at 0/1540F40
> 2025-07-19 16:10:07.278 CEST [3458309][client backend][0/2:0] DETAIL: There are no running transactions.
> 2025-07-19 16:10:07.278 CEST [3458309][client backend][0/2:0] STATEMENT: select count(*) from pg_logical_slot_get_changes('slot_logical',null, null);
> 2025-07-19 16:59:56.828 CEST [3458140][postmaster][:0] LOG: received immediate shutdown request
> 2025-07-19 16:59:56.841 CEST [3458309][client backend][0/2:0] LOG: could not send data to client: Broken pipe
> 2025-07-19 16:59:56.841 CEST [3458309][client backend][0/2:0] STATEMENT: select count(*) from pg_logical_slot_get_changes('slot_logical',null, null);
> 2025-07-19 16:59:56.851 CEST [3458140][postmaster][:0] LOG: database system is shut down
>
> So my impression is that the bug is not reliably fixed in 17.
>
> One other interesting thing is that once it's hung, the test does
> not stop after PG_TEST_TIMEOUT_DEFAULT elapses. You can see
> above that olingo took nearly 50 minutes to give up, and in
> manual testing it doesn't seem to stop either (though I've not
> got the patience to wait 50 minutes...)
Thank you for pointing this out!
Apparently I backpatched d3917d8f13e7 to every branch except
REL_17_STABLE. I'll fix that now.
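
As a side note on the "nearly 50 minutes" figure, the gap between the last decoding message and the shutdown request in the quoted log can be checked with a small sketch like the one below. The helper names (parse_log_timestamp, hang_seconds) are made up for illustration, and it assumes every log line starts with a 'YYYY-MM-DD HH:MM:SS.mmm' prefix as in the excerpt above.

```python
from datetime import datetime

# Assumed prefix format of the quoted postmaster log lines.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_log_timestamp(line: str) -> datetime:
    """Parse the leading date and time fields of a log line (time zone field ignored)."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", LOG_TS_FORMAT)


def hang_seconds(start_line: str, end_line: str) -> float:
    """Return the number of seconds elapsed between two log lines."""
    delta = parse_log_timestamp(end_line) - parse_log_timestamp(start_line)
    return delta.total_seconds()


# Two lines taken from the quoted log: the last decoding message and the shutdown request.
start = ("2025-07-19 16:10:07.278 CEST [3458309][client backend][0/2:0] "
         "LOG: logical decoding found consistent point at 0/1540F40")
end = ("2025-07-19 16:59:56.828 CEST [3458140][postmaster][:0] "
       "LOG: received immediate shutdown request")

elapsed = hang_seconds(start, end)
print(f"{elapsed / 60:.1f} minutes")  # prints "49.8 minutes"
```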
------
Regards,
Alexander Korotkov
Supabase