Re: BUG #17928: Standby fails to decode WAL on termination of primary - Mailing list pgsql-bugs

From Alexander Lakhin
Subject Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date
Msg-id 72bd036d-4f2a-8d50-b56e-6b1e3b9ba0a9@gmail.com
In response to BUG #17928: Standby fails to decode WAL on termination of primary  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #17928: Standby fails to decode WAL on termination of primary
List pgsql-bugs
11.05.2023 11:00, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      17928
> ...
> `git bisect` for this behavior blames 3f1ce9734 (where
> XLogDecodeNextRecord() -> XLogReadRecordAlloc() call was introduced).
>
> A reproducer for the anomaly to follow.
The TAP test that demonstrates the issue is attached. To catch the failure
faster, I copy it into several directories src/test/recoveryX/t, add a
minimal Makefile to each, and run the copies in parallel (on tmpfs):
for ((i=1;i<=10;i++)); do echo "iteration $i"; NO_TEMP_INSTALL=1 parallel --halt now,fail=1 -j7 --linebuffer --tag make -s check -C src/test/{} ::: recovery1 recovery2 recovery3 recovery4 recovery5 recovery6 recovery7 || break; done

iteration 1
recovery1       +++ tap check in src/test/recovery1 +++
recovery2       +++ tap check in src/test/recovery2 +++
recovery3       +++ tap check in src/test/recovery3 +++
recovery4       +++ tap check in src/test/recovery4 +++
recovery5       +++ tap check in src/test/recovery5 +++
recovery6       +++ tap check in src/test/recovery6 +++
recovery7       +++ tap check in src/test/recovery7 +++
...
recovery5       # Restarting primary instance (49)
recovery3       # Restarting primary instance (49)
recovery7       # Restarting primary instance (49)
recovery2       Bailout called.  Further testing stopped:  pg_ctl stop failed
recovery2       FAILED--Further testing stopped: pg_ctl stop failed
recovery2       make: *** [Makefile:6: check] Error 255
parallel: This job failed:
make -s check -C src/test/recovery2

tail src/test/recovery2/tmp_check/log/099_restart_with_stanby_standby.log
2023-05-11 20:19:22.247 MSK [2046385] DETAIL:  End of WAL reached on timeline 1 at 3/64BDFF8.
2023-05-11 20:19:22.247 MSK [2046385] FATAL:  could not send end-of-streaming message to primary: server closed the connection unexpectedly
                 This probably means the server terminated abnormally
                 before or while processing the request.
         no COPY in progress
2023-05-11 20:19:22.248 MSK [2037134] FATAL:  invalid memory alloc request size 2021163525
2023-05-11 20:19:22.248 MSK [2037114] LOG:  startup process (PID 2037134) exited with exit code 1
2023-05-11 20:19:22.248 MSK [2037114] LOG:  terminating any other active server processes
2023-05-11 20:19:22.248 MSK [2037114] LOG:  shutting down due to startup process failure
2023-05-11 20:19:22.249 MSK [2037114] LOG:  database system is shut down
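
The directory layout used for the parallel run (copies of the test under
src/test/recoveryX/t, each with a minimal Makefile) could be prepared
roughly as follows. This is only a sketch: the function name
setup_recovery_copies and its arguments are hypothetical, and the Makefile
contents are an assumption modeled on the stock src/test/recovery/Makefile,
not necessarily the exact files used for the run above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): copy one TAP test
# into N sibling directories src/test/recovery1..N under a source tree.
# Usage: setup_recovery_copies <source-root> <test-file.pl> [copies]
setup_recovery_copies() {
    src_root=$1
    test_file=$2
    copies=${3:-7}          # the run above uses seven copies

    i=1
    while [ "$i" -le "$copies" ]; do
        dir="$src_root/src/test/recovery$i"
        mkdir -p "$dir/t"
        cp "$test_file" "$dir/t/"
        # Assumed minimal Makefile; the recipe line must begin with a tab,
        # which printf emits for \t. Single quotes keep $(...) literal.
        printf 'subdir = src/test/recovery%s\ntop_builddir = ../../..\ninclude $(top_builddir)/src/Makefile.global\n\ncheck:\n\t$(prove_check)\n' \
            "$i" > "$dir/Makefile"
        i=$((i + 1))
    done
}
```

For example, from the top of a configured and built source tree, something
like `setup_recovery_copies . /path/to/attached_test.pl 7` would create
src/test/recovery1 through src/test/recovery7; the path to the test file is
a placeholder for wherever the attachment was saved.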

Best regards,
Alexander
