On Mon, Apr 6, 2020 at 9:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> > I don't know, I've tried running the tests on a number of machines,
> > similar to those failing. Raspberry Pi, Fedora 31, ... and it worked
> > everywhere while the failures seem consistent.
>
> On my machine, it reproduces about one time in six with
> force_parallel_mode = regress. It seems possible given your
> results that reducing max_parallel_workers would make it more
> likely, but I've not tried that.
>
> What I'm seeing, after adding some debug printouts, is that sortMethod is
> frequently zero when we reach the EXPLAIN output for a worker. In many of
> the tests this happens even though there is no visible failure, because
> we've got a filter function hiding the output :-(
>
> So I concur with James' conclusion that the existing code is relying on
> sortMethod initializing to zeroes, and that we did the wrong thing by
> trying to give SORT_TYPE_STILL_IN_PROGRESS a nonzero representation.
> I do not like his patch though, particularly not the type pun with NULL.
Sentinel and NULL? I hadn't caught that at all.
> I think the correct fix is to change the enum declaration.
Hmm. I don't really like that, because it means the value here isn't
semantically correct. That is, the sort type is not "in progress";
it's "we never started a sort at all". I don't love the conflation of
those two states that the old enum declaration had (even if it had had
a helpful comment). It seems to me that "we don't have a type" and "we
have a type" should be distinct.
We could add a new enum value SORT_TYPE_UNINITIALIZED or similar though.
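To make the distinction concrete, here is a rough sketch with made-up
names. DemoSortMethod, DemoWorkerSortStats and the demo_* helpers are
hypothetical stand-ins, not the actual tuplesort.h declarations, and the
bit-flag values are illustrative only. Zero is reserved for "no sort was
ever started", so a zeroed per-worker slot reads as uninitialized instead
of being conflated with "still in progress".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sort-method enum: value 0 is the "never started" sentinel. */
typedef enum
{
	DEMO_SORT_UNINITIALIZED = 0,	/* no sort was ever started */
	DEMO_SORT_STILL_IN_PROGRESS = 1 << 0,
	DEMO_SORT_TOP_N_HEAPSORT = 1 << 1,
	DEMO_SORT_QUICKSORT = 1 << 2,
	DEMO_SORT_EXTERNAL_SORT = 1 << 3,
	DEMO_SORT_EXTERNAL_MERGE = 1 << 4
} DemoSortMethod;

/* Hypothetical per-worker stats slot, e.g. one entry in shared memory. */
typedef struct
{
	DemoSortMethod sortMethod;
	long		spaceUsed;
} DemoWorkerSortStats;

/* Zero a stats array the way freshly allocated shared memory would be. */
static void
demo_reset_stats(DemoWorkerSortStats *stats, size_t n)
{
	memset(stats, 0, n * sizeof(*stats));
	/* With the sentinel at 0, a zeroed slot reads as "never started". */
	assert(n == 0 || stats[0].sortMethod == DEMO_SORT_UNINITIALIZED);
}

/* True only if the worker recorded some real sort state. */
static bool
demo_worker_did_sort(const DemoWorkerSortStats *s)
{
	return s->sortMethod != DEMO_SORT_UNINITIALIZED;
}

/* Human-readable name, keeping "never started" separate from "in progress". */
static const char *
demo_sort_method_name(DemoSortMethod m)
{
	switch (m)
	{
		case DEMO_SORT_UNINITIALIZED:
			return "(no sort started)";
		case DEMO_SORT_STILL_IN_PROGRESS:
			return "still in progress";
		case DEMO_SORT_TOP_N_HEAPSORT:
			return "top-N heapsort";
		case DEMO_SORT_QUICKSORT:
			return "quicksort";
		case DEMO_SORT_EXTERNAL_SORT:
			return "external sort";
		case DEMO_SORT_EXTERNAL_MERGE:
			return "external merge";
	}
	return "unknown";
}
```

With this layout, EXPLAIN-style code could skip workers for which
demo_worker_did_sort() is false, rather than printing a bogus
"in progress" for a worker that never ran a sort.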
James