Hi,
One CI run for the meson branch just failed in a way I hadn't seen before on
Windows, when nothing had changed on Windows:
https://cirrus-ci.com/task/6111743586861056
027_stream_regress.pl ended up failing due to a timeout, which in turn was
caused by the standby crashing:
2022-08-10 01:46:20.731 GMT [2212][startup] PANIC: hash_xlog_split_allocate_page: failed to acquire cleanup lock
2022-08-10 01:46:20.731 GMT [2212][startup] CONTEXT: WAL redo at 0/7A6EED8 for Hash/SPLIT_ALLOCATE_PAGE: new_bucket 31, meta_page_masks_updated F, issplitpoint_changed F; blkref #0: rel 1663/16384/24210, blk 23; blkref #1: rel 1663/16384/24210, blk 45; blkref #2: rel 1663/16384/24210, blk 0
abort() has been called
2022-08-10 01:46:31.919 GMT [7560][checkpointer] LOG: restartpoint starting: time
2022-08-10 01:46:32.430 GMT [8304][postmaster] LOG: startup process (PID 2212) was terminated by exception 0xC0000354
stack dump:
https://api.cirrus-ci.com/v1/artifact/task/6111743586861056/crashlog/crashlog-postgres.exe_21c8_2022-08-10_01-46-28-215.txt
The relevant code triggering it:
    newbuf = XLogInitBufferForRedo(record, 1);
    _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
                  xlrec->new_bucket_flag, true);
    if (!IsBufferCleanupOK(newbuf))
        elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
Why do we just crash if we don't already have a cleanup lock? That can't be
right. Or is there supposed to be a guarantee this can't happen?
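For anyone not familiar with the term: as far as I understand it, a cleanup
lock is an exclusive content lock held while we are the only process with a
pin on the buffer. Below is a toy model in C of that condition. It is not
PostgreSQL code: ToyBuffer, toy_pin(), toy_unpin() and toy_is_cleanup_ok() are
made-up names for illustration, and the real check lives in the buffer
manager. The only point is that a single extra pin from any other process is
enough to make such a check fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a shared buffer header; NOT PostgreSQL's real struct. */
typedef struct ToyBuffer
{
    int  pin_count;        /* pins held by all processes, ours included */
    bool exclusive_locked; /* true while we hold the content lock exclusively */
} ToyBuffer;

/* Take one more pin on the buffer. */
void toy_pin(ToyBuffer *buf)
{
    buf->pin_count++;
}

/* Release one pin; releasing a pin nobody holds is a caller bug. */
void toy_unpin(ToyBuffer *buf)
{
    assert(buf->pin_count > 0);
    buf->pin_count--;
}

/*
 * Hypothetical analogue of the check in the redo routine: the caller
 * already holds the exclusive lock, and cleanup is OK only if the caller's
 * pin is the sole pin on the buffer.
 */
bool toy_is_cleanup_ok(const ToyBuffer *buf)
{
    return buf->exclusive_locked && buf->pin_count == 1;
}
```

In this model, pinning once and taking the exclusive lock makes
toy_is_cleanup_ok() return true. A second toy_pin(), standing in for some
other process looking at the page, makes it return false until the matching
toy_unpin().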
Greetings,
Andres Freund