Hi,

On 2025-03-05 11:19:46 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Post-commit issues due to debug_parallel_query=regress seem rather common,
> > surely not helped by CI/cfbot not flagging them. I wonder if we ought to make
> > one of the CI tasks use debug_parallel_query=regress, to avoid that problem?
> 
> Yeah, it certainly seems like a test coverage gap.

I decided to use FreeBSD, as it's a relatively fast task.  Additionally, I
thought it might be interesting to do this testing on the test run that also
does debug_write_read_parse_plan_trees etc.

I tested it by intentionally not including the revert, and it indeed finds the
problem (not that that was really in doubt, but it seemed worth verifying).

https://cirrus-ci.com/task/5782413399293952?logs=test_world#L214

> However, we seem to be moving towards a situation where each type of CI run
> is a special snowflake that differs in multiple dimensions from other types.
> That might make it difficult to figure out which dimension is responsible
> for a particular failure.

True, but I don't really see an alternative. Having dedicated tasks for
testing just debug_parallel_query=regress (and about half a dozen other
things) on their own seems like a *lot* of resource usage for the gain.

> (OTOH, the same can be said of the buildfarm, and we've survived
> regardless.  So maybe I'm worried over nothing.)

The alternative seems to be to figure out the problem after commit, with
similar issues, as you point out. I'd much rather have to spend a minute
analyzing why one task triggers an issue than doing so under pressure after
commit.

I guess we could add a "standardized" section at the top of each task
describing its oddities? Not sure it's worth it.

Greetings,
Andres Freund