Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
Date
Msg-id CA+hUKGJUH_UN2G1EHpmvKBaJMbyuhrrxORw8yzmO4BHwUdqEMw@mail.gmail.com
Whole thread Raw
In response to Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
List pgsql-hackers
I still have no theory for how this condition was reached despite a
lot of time thinking about it and searching for more clues.  As far as
I can tell, the recent improvements to postmaster's signal and event
handling shouldn't be related: the state management and logic was
unchanged.

While failing to understand this, I worked[1] on CI log indexing tool
with public reports that highlight this sort of thing[2], so I'll be
watching out for more evidence.  Unfortunately I have no data from
before 1 Feb (cfbot previously wasn't interested in the past at all;
I'd need to get my hands on the commit IDs for earlier testing but I
can't figure out how to get those out of Cirrus or Github -- anyone
know how?).  FWIW I have a thing I call bfbot for slurping up similar
data from the build farm.  It's not pretty enough for public
consumption, but I do know that this assertion hasn't failed there,
except the cases I mentioned earlier, and a load of failures on
lorikeet which was completely b0rked until recently.

[1] https://xkcd.com/974/
[2] http://cfbot.cputube.org/highlights/assertion-90.html



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: O(n) tasks cause lengthy startups and checkpoints
Next
From: Amin
Date:
Subject: Share variable between psql backends in CustomScan