Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED |
Date | |
Msg-id | CA+hUKGL6SrmFV7j1kTkfxyQ0ed_V1pcC36PwYpudCynHRRD32g@mail.gmail.com |
In response to | Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED (Andrew Dunstan <andrew@dunslane.net>) |
List | pgsql-hackers |
On Mon, Feb 20, 2023 at 2:46 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> On 2023-02-17 Fr 19:27, Thomas Munro wrote:
>> FWIW I have a thing I call bfbot for slurping up similar
>> data from the build farm. It's not pretty enough for public
>> consumption, but I do know that this assertion hasn't failed there,
>> except the cases I mentioned earlier, and a load of failures on
>> lorikeet which was completely b0rked until recently.
>
> Are there things we need to do on the server side to make data extraction easier?

It's a good question. One thought Andres mentioned to me is whether we might want to have an in-tree tool to find interesting stuff. That is, even locally during development, but also in the CI + buildfarm, a common tool could find and spit out human- and machine-readable highlights (backtraces, PANICs, assertions, ... like cfbot is now doing). Then the knowledge of what's interesting would be maintained and extended by all of us.

On the other hand, as we think of new patterns over time to look out for, it's also nice to be able to re-scan old data to see if the new patterns occurred in the past (I've done this several times with cfbot's new highlight analyser as I corrected mistakes and added patterns). So maybe that's also a good idea, but a separate thing. Even if the analyser logic is not in-tree, we could try to make something that works pretty much the same across CI and BF.

Perhaps we could think about some of those ideas once the BF is using meson? Aside from having just one system to think about, the meson build system is a bit more structured: it has a more formal concept of test suites and tests with machine readable results from the top level (JSON files etc), with names strictly corresponding to directories where the output is, etc. I think I'd basically want a complete list of available files (= like the artifacts on CI), and then I'd pull down the meson test result file and then decide which other files I also want to pull down (ie stuff relating to failed tests) to analyse. (Not that any of that is intractable with the autoconf or handrolled perl/MSVC stuff, it's just messier, and hard to get motivated when its days are numbered.)

One little thing I remembered while looking into this general topic is the noise you get when we crash during pg_regress, which it'd be nice to fix:

https://www.postgresql.org/message-id/flat/CA%2BhUKGL7hxqbadkto7e1FCOLQhuHg%3DwVn_PDZd6fDMbQrrZisA%40mail.gmail.com

Another topic I'm interested in is how to find useful signals in the timing data. For example, when Nathan and I worked on walreceiver wakeup improvements, we didn't notice that we'd caused some tests to become dramatically slower, because of a pre-existing bug/thinko we hadn't noticed. I want a computer to tell me about this stuff. That's somewhat tricky because of all the noise, but hopefully it's not beyond the powers of statistics to notice that a test unexpectedly took a nap for 10s.
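To make the timing idea concrete, here is a rough sketch of one way a tool might flag a test that "took a nap". It is not how cfbot or bfbot works; it is only an illustration using a median and median-absolute-deviation baseline, which tolerates noisy runs better than a mean. The function name `find_slow_tests`, its parameters, the default thresholds, and the test names in the usage note are all hypothetical, and the input is assumed to be per-test durations in seconds already collected from past runs.

```python
from statistics import median


def find_slow_tests(history, latest, threshold=5.0, min_runs=5,
                    min_extra_seconds=1.0):
    """Flag tests whose latest duration is far above their usual duration.

    history: dict mapping test name -> list of past durations (seconds).
    latest:  dict mapping test name -> duration of the newest run (seconds).
    Returns a dict mapping test name -> (latest duration, baseline median).
    All names and default thresholds are illustrative assumptions.
    """
    flagged = {}
    for name, duration in latest.items():
        past = history.get(name, [])
        if len(past) < min_runs:
            # Too little data to say what "normal" looks like.
            continue
        baseline = median(past)
        # Median absolute deviation: a spread estimate that is not
        # dominated by a few unusually slow or fast runs.
        mad = median([abs(d - baseline) for d in past])
        # 1.4826 makes the MAD comparable to a standard deviation for
        # normally distributed data; the 5% floor avoids a near-zero
        # spread when past timings happen to be almost identical.
        spread = max(1.4826 * mad, 0.05 * baseline, 1e-9)
        score = (duration - baseline) / spread
        # Require both a statistical outlier and a meaningful absolute
        # slowdown, so very short tests do not trigger on jitter alone.
        if score > threshold and duration - baseline >= min_extra_seconds:
            flagged[name] = (duration, baseline)
    return flagged
```

With made-up numbers, a test that usually takes about 4s and suddenly takes 14s would be flagged, while a 60s test that takes 61.5s would not. A real tool would presumably also need to group timings by machine or CI task, since durations are not comparable across hosts.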