BUG #17928: Standby fails to decode WAL on termination of primary - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #17928: Standby fails to decode WAL on termination of primary |
Date | |
Msg-id | 17928-aa92416a70ff44a2@postgresql.org Whole thread Raw |
Responses |
Re: BUG #17928: Standby fails to decode WAL on termination of primary
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 17928 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 15.2 Operating system: Ubuntu 22.04 Description: When a primary server is stopped, standby might fail to replay WAL with the memory allocation error, for example: 2023-05-11 08:42:38.254 MSK [1706654] LOG: recovery restart point at 7/12012E88 2023-05-11 08:42:38.254 MSK [1706654] DETAIL: Last completed transaction was at log time 2023-05-11 08:42:38.245399+03. 2023-05-11 08:42:38.420 MSK [1706654] LOG: restartpoint starting: wal 2023-05-11 08:42:38.451 MSK [1725929] FATAL: could not receive data from WAL stream: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. 2023-05-11 08:42:38.468 MSK [1706656] FATAL: invalid memory alloc request size 2021163525 2023-05-11 08:42:38.469 MSK [1706643] LOG: startup process (PID 1706656) exited with exit code 1 2023-05-11 08:42:38.469 MSK [1706643] LOG: terminating any other active server processes 2023-05-11 08:42:38.469 MSK [1706643] LOG: shutting down due to startup process failure 2023-05-11 08:42:38.470 MSK [1706643] LOG: database system is shut down The call stack of the error: ... #6 0x000055cfabab8e97 in palloc_extended (size=2021163525, flags=2) at mcxt.c:1143 #7 0x000055cfab456ae0 in XLogReadRecordAlloc (state=0x55cfacef4a08, xl_tot_len=2021161080, allow_oversized=true) at xlogreader.c:524 #8 0x000055cfab456dee in XLogDecodeNextRecord (state=0x55cfacef4a08, nonblocking=false) at xlogreader.c:689 #9 0x000055cfab4575c0 in XLogReadAhead (state=0x55cfacef4a08, nonblocking=false) at xlogreader.c:960 #10 0x000055cfab45519d in XLogPrefetcherNextBlock (pgsr_private=94350447959632, lsn=0x55cfacf4bc58) at xlogprefetcher.c:496 #11 0x000055cfab454b1c in lrq_prefetch (lrq=0x55cfacf4bb08) at xlogprefetcher.c:256 #12 0x000055cfab454cdf in lrq_complete_lsn (lrq=0x55cfacf4bb08, lsn=2976923632) at xlogprefetcher.c:294 #13 0x000055cfab455df2 in XLogPrefetcherReadRecord (prefetcher=0x55cfacef5e50, errmsg=0x7ffc2e410458) at xlogprefetcher.c:1039 #14 0x000055cfab45f136 in ReadRecord (xlogprefetcher=0x55cfacef5e50, emode=15, fetching_ckpt=false, replayTLI=1) at xlogrecovery.c:3047 #15 0x000055cfab45c922 in PerformWalRecovery () at xlogrecovery.c:1754 #16 0x000055cfab448869 in StartupXLOG () at xlog.c:5300 #17 0x000055cfab7d151a in StartupProcessMain () at startup.c:267 #18 0x000055cfab7c3f80 in AuxiliaryProcessMain (auxtype=StartupProcess) at auxprocess.c:141 #19 0x000055cfab7cfb4b in StartChildProcess (type=StartupProcess) at postmaster.c:5429 #20 0x000055cfab7ca5c0 in PostmasterMain (argc=4, argv=0x55cfacef3e40) at postmaster.c:1470 #21 0x000055cfab6be302 in main (argc=4, argv=0x55cfacef3e40) at main.c:202 (gdb) frame 8 #8 0x000055cfab456dee in XLogDecodeNextRecord (state=0x55cfacef4a08, nonblocking=false) at xlogreader.c:689 689 decoded = XLogReadRecordAlloc(state, (gdb) print/x *record $5 = {xl_tot_len = 0x78787878, xl_xid = 0x78787878, xl_prev = 0x7878787878787878, xl_info = 0x0, xl_rmid = 0x10, xl_crc = 0x0} (gdb) print/x state->NextRecPtr $6 = 0xb1703ff0 hexdump -C .../standby_data/pgdata/pg_wal/0000000100000000000000B1 00703fe0 00 00 04 40 00 00 0c 56 00 00 00 00 00 00 00 00 ...@...V........ 00703ff0 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx 00704000 10 d1 05 00 01 00 00 00 00 40 70 30 00 00 00 00 .........@p0.... 00704010 db 00 00 00 00 00 00 00 78 78 78 78 78 78 78 78 ........xxxxxxxx 00704020 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx * 007040f0 09 00 00 00 00 00 00 00 33 02 00 00 1b 0d 00 00 ........3....... hexdump -C .../primary_data/pgdata/pg_wal/0000000100000000000000B1 00703fe0 00 00 04 40 00 00 0c 56 00 00 00 00 00 00 00 00 ...@...V........ 00703ff0 33 02 00 00 54 2a 00 00 e0 3e 70 b1 00 00 00 00 3...T*...>p..... 00704000 10 d1 05 00 01 00 00 00 00 40 70 b1 00 00 00 00 .........@p..... 00704010 23 02 00 00 00 00 00 00 80 0a 00 00 55 0f 07 2c #...........U.., 00704020 00 60 02 02 7f 06 00 00 05 00 00 00 f7 85 01 00 .`.............. 00704030 00 00 00 00 ff 03 02 00 02 08 18 00 01 00 00 00 ................ 00704040 e0 07 00 00 78 78 78 78 78 78 78 78 78 78 78 78 ....xxxxxxxxxxxx 00704050 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx * 00704230 78 78 78 78 78 78 78 78 01 00 00 00 00 00 00 00 xxxxxxxx........ (In this case, rows with 500 'x' were inserted when the primary was stopped.) `git bisect` for this behavior blames 3f1ce9734 (where XLogDecodeNextRecord() -> XLogReadRecordAlloc() call was introduced). A reproducer for the anomaly to follow.
pgsql-bugs by date: