Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED |
Date | |
Msg-id | 20230218010649.vlksvn3o7jtshp4y@awork3.anarazel.de |
In response to | Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
|
List | pgsql-hackers |
Hi,

On 2023-02-18 13:27:04 +1300, Thomas Munro wrote:
> I still have no theory for how this condition was reached despite a
> lot of time thinking about it and searching for more clues.  As far as
> I can tell, the recent improvements to postmaster's signal and event
> handling shouldn't be related: the state management and logic was
> unchanged.

Yea, it's all very odd.

If you look at the log:

2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name DETAIL:  No valid identifier after ".".
2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name STATEMENT:  SELECT parse_ident('xxx.1020');
...
TRAP: failed Assert("PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED"), File: "../src/backend/storage/ipc/pmsignal.c",Line: 329, PID: 5948
abort() has been called
...
2023-02-08 00:53:27.420 GMT postmaster[872] LOG:  server process (PID 5948) was terminated by exception 0xC0000354
2023-02-08 00:53:27.420 GMT postmaster[872] HINT:  See C include file "ntstatus.h" for a description of the hexadecimal value.
2023-02-08 00:53:27.420 GMT postmaster[872] LOG:  terminating any other active server processes
2023-02-08 00:53:27.434 GMT postmaster[872] LOG:  all server processes terminated; reinitializing

and that it's indeed the money test that failed:

     money                        ... FAILED (test process exited with exit code 2)     7337 ms

it's very hard to understand how this stack can come to be:

00000085`f03ffa40 00007ff6`fd89faa8     ucrtbased!abort(void)+0x5a [minkernel\crts\ucrt\src\appcrt\startup\abort.cpp @ 77]
00000085`f03ffa80 00007ff6`fd6474dc     postgres!ExceptionalCondition(
            char * conditionName = 0x00007ff6`fdd03ca8 "PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED",
            char * fileName = 0x00007ff6`fdd03c80 "../src/backend/storage/ipc/pmsignal.c",
            int lineNumber = 0n329)+0x78 [c:\cirrus\src\backend\utils\error\assert.c @ 67]
00000085`f03ffac0 00007ff6`fd676eff     postgres!MarkPostmasterChildActive(void)+0x7c [c:\cirrus\src\backend\storage\ipc\pmsignal.c@ 329]
00000085`f03ffb00 00007ff6`fd59aa3a     postgres!InitProcess(void)+0x2ef [c:\cirrus\src\backend\storage\lmgr\proc.c @ 375]
00000085`f03ffb60 00007ff6`fd467689     postgres!SubPostmasterMain(
            int argc = 0n3,
            char ** argv = 0x000001c6`f3814e80)+0x33a [c:\cirrus\src\backend\postmaster\postmaster.c @ 4962]
00000085`f03ffd90 00007ff6`fda0e1c9     postgres!main(
            int argc = 0n3,
            char ** argv = 0x000001c6`f3814e80)+0x2f9 [c:\cirrus\src\backend\main\main.c @ 192]

How can a process that we did notify crashing, that has already executed SQL
statements, end up in MarkPostmasterChildActive()?

> While failing to understand this, I worked[1] on CI log indexing tool
> with public reports that highlight this sort of thing[2], so I'll be
> watching out for more evidence.  Unfortunately I have no data from
> before 1 Feb (cfbot previously wasn't interested in the past at all;
> I'd need to get my hands on the commit IDs for earlier testing but I
> can't figure out how to get those out of Cirrus or Github -- anyone
> know how?).  FWIW I have a thing I call bfbot for slurping up similar
> data from the build farm.  It's not pretty enough for public
> consumption, but I do know that this assertion hasn't failed there,
> except the cases I mentioned earlier, and a load of failures on
> lorikeet which was completely b0rked until recently.

> [1] https://xkcd.com/974/
> [2] http://cfbot.cputube.org/highlights/assertion-90.html

I think this is extremely useful.

Greetings,

Andres Freund
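For readers following along, here is a minimal, self-contained model of the per-child slot state that the failed assertion checks. It is a sketch under stated assumptions, not the code in pmsignal.c: only PM_CHILD_ASSIGNED appears in the thread, and the other state names, the array size, and the helper names (assign_slot, mark_active, release_slot) are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Slot states. Only PM_CHILD_ASSIGNED is named in the thread; the other
 * two names are assumptions made for this sketch.
 */
enum
{
    PM_CHILD_UNUSED = 0,
    PM_CHILD_ASSIGNED = 1,
    PM_CHILD_ACTIVE = 2
};

#define NUM_SLOTS 4             /* hypothetical size, for illustration only */

/* Stand-in for the shared PMChildFlags array. */
static int child_flags[NUM_SLOTS];

/*
 * Parent side: reserve a free slot before launching a child.
 * Returns a 1-based slot number, or 0 if no slot is free.
 */
static int
assign_slot(void)
{
    for (int i = 0; i < NUM_SLOTS; i++)
    {
        if (child_flags[i] == PM_CHILD_UNUSED)
        {
            child_flags[i] = PM_CHILD_ASSIGNED;
            return i + 1;
        }
    }
    return 0;
}

/*
 * Child side, during startup: move the slot from ASSIGNED to ACTIVE.
 * Returns false in the situations where an assertion would fire.
 */
static bool
mark_active(int slot)
{
    if (slot < 1 || slot > NUM_SLOTS)
        return false;
    if (child_flags[slot - 1] != PM_CHILD_ASSIGNED)
        return false;
    child_flags[slot - 1] = PM_CHILD_ACTIVE;
    return true;
}

/*
 * Parent side, after the child has exited: free the slot.
 * Returns false if the slot was not in use.
 */
static bool
release_slot(int slot)
{
    if (slot < 1 || slot > NUM_SLOTS)
        return false;
    if (child_flags[slot - 1] == PM_CHILD_UNUSED)
        return false;
    child_flags[slot - 1] = PM_CHILD_UNUSED;
    return true;
}
```

In this model, mark_active() only fails if it runs when the slot is not in the assigned state, for example when it is called a second time for the same slot or after the slot has been released. Whether anything comparable happened in the CI run is exactly the open question in this thread; the sketch only shows which transitions the check rules out.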