Re: BUG #16627: union all with partioned table yields random aggregate results - Mailing list pgsql-bugs

From David Rowley
Subject Re: BUG #16627: union all with partioned table yields random aggregate results
Date
Msg-id CAApHDvrSeaVsJ_Dbj8CZroCda+4Q1tCAoxOvBsL-Rbfgk0Li8Q@mail.gmail.com
Whole thread Raw
In response to Re: BUG #16627: union all with partioned table yields random aggregate results  (Brian Kanaga <kanaga@consumeracquisition.com>)
List pgsql-bugs
Hi Brian,

On Sat, 26 Sep 2020 at 05:58, Brian Kanaga
<kanaga@consumeracquisition.com> wrote:
> Here's a dump file that is able to reproduce the issue.  I would ask that this message and dump file url be kept
non-public.
> Please let me know when you've retrieved it and I will remove.  Thank you!

Thanks for sending me the updated link.

I can confirm that I can recreate this issue on 11.4. The two attached
explain files show a variance on actual rows on the parallel index
scan;

->  Parallel Index Scan using fb_ad_activity_daily_archive_2019_pkey
on fb_ad_activity_daily_archive_2019  (cost=0.56..37964.90 rows=49
width=1618) (actual time=1.909..8.301 rows=430 loops=3)

from one execution, and;

->  Parallel Index Scan using fb_ad_activity_daily_archive_2019_pkey
on fb_ad_activity_daily_archive_2019  (cost=0.56..37964.90 rows=49
width=1618) (actual time=2.039..9.043 rows=472 loops=3)

from another.

The minimum case to reproduce that I could find is:

create table t (a int not null);
insert into t select x from generate_Series(1,100) x,
generate_Series(1,4000000);
create index on t (a);
analyze t;
set parallel_tuple_cost = 0;
set parallel_setup_cost = 0;
alter table t set (parallel_workers=8);
explain select count(*) from t where a in(10,20,30,40,50,60) and a in
(10,20,30,40,50,60);

The problem only seems to occur with the redundant IN clause added.
Your query is getting that as the outer one was pushed down into the
inner query, but it already existed there.

Although, I was testing on 11.3. On trying the latest, yet to be
released v12 code on REL_12_STABLE, I can't reproduce.

On looking a bit further as to why, I found a fix has already been
pushed [1], but the commit message there does not really mention the
wrong results issue. There's some discussion in  [2]. There's another
case to reproduce it on that thread too.

If you have the ability to build from source away from production,
feel free to try on the REL_11_STABLE branch and confirm that it's now
working ok.

I don't know the exact dates, but what will become 11.10 already has
that fix backpatched.  I expect that will be released around
mid-November.

In the meantime, you could remove the inner WHERE clause items which
are duplicated on the outer query. These should get pushed down into
the inner scans anyway. However, I'm not sure exactly how realiable
that will be as a fix.

(Copying in Amit, to let him know that someone did stumble upon this
in the wild.)

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4bc63462d9d8dd12a5564c3d4ca06c99449d2d07
[2] https://www.postgresql.org/message-id/flat/4248CABC-25E3-4809-B4D0-128E1BAABC3C%40amazon.com

Attachment

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #16638: Some tables in the database cannot be opened
Next
From: PG Bug reporting form
Date:
Subject: BUG #16639: Syntax error in pgsql when executing local \copy instead of SQL copy