Home > mailing lists

Re: BUG #16627: union all with partioned table yields random aggregate results - Mailing list pgsql-bugs

From	David Rowley
Subject	Re: BUG #16627: union all with partioned table yields random aggregate results
Date	September 23, 2020 06:10:08
Msg-id	CAApHDvrOj=Wgy16ZRXgKN7ZrzuSfosx58GvY1Bqn804pMu=13w@mail.gmail.com Whole thread Raw
In response to	BUG #16627: union all with partioned table yields random aggregate results (PG Bug reporting form <noreply@postgresql.org>)
Responses	RE: BUG #16627: union all with partioned table yields random aggregate results
List	pgsql-bugs

Tree view

On Wed, 23 Sep 2020 at 03:30, PG Bug reporting form
<noreply@postgresql.org> wrote:
>
>                         -- these columns will be missing in some tbls
>                         avg(cast(relevance_score as float4)) as relevance_score,
>                         avg(cast(positive_feedback as float4)) as positive_feedback,
>                         avg(cast(negative_feedback as float4)) as negative_feedback,
>

In addition to Tom's request, it would also be good to understand more
about these "random" results.

I see you have a number of aggregate functions run on the results. It
would be good to understand if all of them are incorrect in the
parallel version of the query or if it's just some of them, e.g just
the float4 ones I've quoted above.  Essentially there is some sense of
randomness to the results of floating-point aggregation in all cases
due to the lossy precision of floating points. This is much more
likely to be seen during parallel aggregation as how each tuple is
aggregated is highly likely to differ for each run of the query. That
order is much more likely to be stable in the serial query.

If you are just seeing the floating-point aggregates vary then it
might be worth casting floating-point values to NUMERIC to see if that
gets rid of the randomness.

If that's not the issue then you may also like to help try to narrow
the scope of the problem. Does the problem still occur when you
disable enable_parallel_hash? How about enable_parallel_append?

There was also some mangling of the query plans you posted. I took the
time to format those again and attached the result.

Looking over the row counts between each version of the plan I didn't
find anything that would indicate additional rows were seen in the
parallel version of the plan. However, there's some precision loss
which I couldn't account for around the actual rows being divided by
the parallel workers with integer division, so that process is not
100% accurate.

David

Attachment

explain.txt

pgsql-bugs by date:

From: PG Bug reporting form
Date: 23 September 2020, 03:30:40
Subject: BUG #16629: Repo Error During Upgrade

From: PG Bug reporting form
Date: 23 September 2020, 11:26:55
Subject: BUG #16630: TupleDesc reference leak: TupleDesc 0x7fad89c9c928 (16386,-1) still referenced

Re: BUG #16627: union all with partioned table yields random aggregate results - Mailing list pgsql-bugs

Attachment

Previous

Next