Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED - Mailing list pgsql-hackers

From Andres Freund
Subject Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
Date
Msg-id 20230218010649.vlksvn3o7jtshp4y@awork3.anarazel.de
Whole thread Raw
In response to Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
List pgsql-hackers
Hi,

On 2023-02-18 13:27:04 +1300, Thomas Munro wrote:
> I still have no theory for how this condition was reached despite a
> lot of time thinking about it and searching for more clues.  As far as
> I can tell, the recent improvements to postmaster's signal and event
> handling shouldn't be related: the state management and logic was
> unchanged.

Yea, it's all very odd.

If you look at the log:

2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name DETAIL:  No valid identifier after ".".
2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name STATEMENT:  SELECT parse_ident('xxx.1020');
...
TRAP: failed Assert("PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED"), File:
"../src/backend/storage/ipc/pmsignal.c",Line: 329, PID: 5948
 
abort() has been called
...
2023-02-08 00:53:27.420 GMT postmaster[872] LOG:  server process (PID 5948) was terminated by exception 0xC0000354
2023-02-08 00:53:27.420 GMT postmaster[872] HINT:  See C include file "ntstatus.h" for a description of the hexadecimal
value.
2023-02-08 00:53:27.420 GMT postmaster[872] LOG:  terminating any other active server processes
2023-02-08 00:53:27.434 GMT postmaster[872] LOG:  all server processes terminated; reinitializing


and that it's indeed the money test that failed:
     money                        ... FAILED (test process exited with exit code 2)     7337 ms

it's very hard to understand how this stack can come to be:

00000085`f03ffa40 00007ff6`fd89faa8     ucrtbased!abort(void)+0x5a [minkernel\crts\ucrt\src\appcrt\startup\abort.cpp @
77]
00000085`f03ffa80 00007ff6`fd6474dc     postgres!ExceptionalCondition(
            char * conditionName = 0x00007ff6`fdd03ca8 "PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED", 
            char * fileName = 0x00007ff6`fdd03c80 "../src/backend/storage/ipc/pmsignal.c", 
            int lineNumber = 0n329)+0x78 [c:\cirrus\src\backend\utils\error\assert.c @ 67]
00000085`f03ffac0 00007ff6`fd676eff     postgres!MarkPostmasterChildActive(void)+0x7c
[c:\cirrus\src\backend\storage\ipc\pmsignal.c@ 329]
 
00000085`f03ffb00 00007ff6`fd59aa3a     postgres!InitProcess(void)+0x2ef [c:\cirrus\src\backend\storage\lmgr\proc.c @
375]
00000085`f03ffb60 00007ff6`fd467689     postgres!SubPostmasterMain(
            int argc = 0n3, 
            char ** argv = 0x000001c6`f3814e80)+0x33a [c:\cirrus\src\backend\postmaster\postmaster.c @ 4962]
00000085`f03ffd90 00007ff6`fda0e1c9     postgres!main(
            int argc = 0n3, 
            char ** argv = 0x000001c6`f3814e80)+0x2f9 [c:\cirrus\src\backend\main\main.c @ 192]

How can a process that we did notify crashing, that has already executed SQL
statements, end up in MarkPostmasterChildActive()?



> While failing to understand this, I worked[1] on CI log indexing tool
> with public reports that highlight this sort of thing[2], so I'll be
> watching out for more evidence.  Unfortunately I have no data from
> before 1 Feb (cfbot previously wasn't interested in the past at all;
> I'd need to get my hands on the commit IDs for earlier testing but I
> can't figure out how to get those out of Cirrus or Github -- anyone
> know how?).  FWIW I have a thing I call bfbot for slurping up similar
> data from the build farm.  It's not pretty enough for public
> consumption, but I do know that this assertion hasn't failed there,
> except the cases I mentioned earlier, and a load of failures on
> lorikeet which was completely b0rked until recently.

> [1] https://xkcd.com/974/
> [2] http://cfbot.cputube.org/highlights/assertion-90.html

I think this extremely useful.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: File* argument order, argument types
Next
From: "Imseih (AWS), Sami"
Date:
Subject: Re: Add index scan progress to pg_stat_progress_vacuum