My guess is that the consequences of that bad estimate are sensitive to arbitrary other parameters moving around, as you can see from the big jump in execution time I showed in that message, measured on unpatched master of the day:
4 workers = 9.5s
3 workers = 39.7s
That's why both parallel hash join and partition-wise join are showing regressions on Q21: it's just flip-flopping between various badly costed plans. Note that even without parallelism, the fix that Tom Lane suggested gives a much better plan:
Following the discussion at [1], with the patch Thomas posted there, Q21 now completes in some 160 seconds. The plan has changed for the better but does not use partition-wise join. The output of EXPLAIN ANALYZE is attached.
Not just the join order but the join strategy itself changed. With the patch, the hash semi join that was consuming most of the time is no longer picked; a nested loop semi join is used instead. The estimates are still way off, but the change in join order took them from horrible to terrible. It appears the query performs efficiently now mainly because the hash join was more badly under-estimated than the nested loop join is.
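
To make the flip-flopping concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the two cost formulas, the constants, the row counts and the names plan_a/plan_b are hypothetical and are not PostgreSQL's cost model or the actual Q21 plans. It only shows the mechanism: when the row estimate is too low, two plans can look nearly equal in cost, so a small change such as the number of workers decides which one is picked, even though their real costs differ widely.

```python
# Toy model only: the formulas and constants are invented for illustration
# and are NOT PostgreSQL's cost model.

def cost_plan_a(rows, workers):
    # Hypothetical plan A: fixed startup cost plus per-row work that is
    # divided among the workers and the leader.
    return 100.0 + rows / (workers + 1)

def cost_plan_b(rows, workers):
    # Hypothetical plan B: cost grows quadratically with the row count
    # and does not benefit from extra workers.
    return rows * rows / 3000.0

PLANS = {"plan_a": cost_plan_a, "plan_b": cost_plan_b}

def choose_plan(estimated_rows, workers):
    # A planner can only compare costs computed from its row estimate.
    return min(PLANS, key=lambda name: PLANS[name](estimated_rows, workers))

def true_cost(plan, actual_rows, workers):
    # What the chosen plan costs once the real row count shows up.
    return PLANS[plan](actual_rows, workers)

if __name__ == "__main__":
    estimated_rows, actual_rows = 1_000, 10_000  # hypothetical 10x underestimate
    for workers in (3, 4):
        plan = choose_plan(estimated_rows, workers)
        print(workers, plan, round(true_cost(plan, actual_rows, workers)))
```

In this sketch the two plans are estimated within a few percent of each other, so going from 3 to 4 workers switches the choice, and the plan chosen with 3 workers turns out to be more than ten times as expensive at the real row count. With the correct row count as input, plan_a is chosen for either worker count, which is the sense in which fixing the estimate removes the flip-flop.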