Parallel Append can break run-time partition pruning - Mailing list pgsql-hackers

From David Rowley
Subject Parallel Append can break run-time partition pruning
Date
Msg-id CAApHDvqSchs+ubdybcfFaSPB++EA7kqMaoqajtP0GtZvzOOR3g@mail.gmail.com
Whole thread Raw
Responses Re: Parallel Append can break run-time partition pruning  (Amit Langote <amitlangote09@gmail.com>)
List pgsql-hackers
I had a report from the wilds that run-time partition pruning was not
working in certain cases.

After some investigation and obtaining the mockup of the actual case,
I discovered that the problem was down to accumulate_append_subpath()
hitting the case where it does not pullup a Parallel Append where the
first parallel node is > 0.

What's actually happening is that the plan is left with a nested
Append, and in this particular case, the top-level Append only has a
single subpath, to which the code for 8edd0e794 (Suppress Append and
MergeAppend plan nodes that have a single child) causes the nested
Append to be pulled up to become the main Append.   This causes
run-time pruning to break since we only attach the pruning information
to the top-level Append.

The most simplified test case I can find to demonstrate this issue is:

create table list (a int, b int) partition by list(a);
create table list_12 partition of list for values in(1,2) partition by list(a);
create table list_12_1 partition of list_12 for values in(1);
create table list_12_2 partition of list_12 for values in(2);

insert into list select 2,0 from generate_Series(1,1000000) x;
vacuum analyze list;

explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;

-- force the 2nd subnode of the Append to be non-parallel.
alter table list_12_1 set (parallel_workers=0);

explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;


The results of this in master are:

postgres=# explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;
                                QUERY PLAN
---------------------------------------------------------------------------
 Gather (actual rows=0 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Result (actual rows=1 loops=1)
   ->  Parallel Append (actual rows=0 loops=3)
         ->  Parallel Seq Scan on list_12_2 list_2 (never executed)
               Filter: ((b > 0) AND (a = $0))
         ->  Parallel Seq Scan on list_12_1 list_1 (actual rows=0 loops=1)
               Filter: ((b > 0) AND (a = $0))
(11 rows)


postgres=# alter table list_12_1 set (parallel_workers=0);
ALTER TABLE
postgres=# explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;
                                QUERY PLAN
---------------------------------------------------------------------------
 Gather (actual rows=0 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Result (actual rows=1 loops=1)
   ->  Parallel Append (actual rows=0 loops=3)
         ->  Seq Scan on list_12_1 list_1 (actual rows=0 loops=1)
               Filter: ((b > 0) AND (a = $0))
         ->  Parallel Seq Scan on list_12_2 list_2 (actual rows=0 loops=3)
               Filter: ((b > 0) AND (a = $0))
               Rows Removed by Filter: 333333
(12 rows)

Notice that we don't get "(never executed)" for list_12_2 in the 2nd case.

I'm a bit divided on what the correct fix is.  If I blame Parallel
Append for not trying hard enough to pull up the lower Append in
accumulate_append_subpath(), then clearly the parallel append code is
to blame. However, perhaps run-time pruning should be tagging on
PartitionPruneInfo to more than top-level Appends. Fixing the latter
case, code-wise is about as simple as removing the "rel->reloptkind ==
RELOPT_BASEREL &&" line from create_append_plan(). Certainly, if the
outer Append hadn't been a single subpath Append, then we wouldn't
have pulled up the lower-level Append, so perhaps we should be
run-time pruning lower-level ones too.

What do other people think?

(copying in Robert and Amit K due to their work on Parallel Append,
Tom as I seem to remember him complaining about
accumulate_append_subpath() at some point and Amit L because...
partitioning...)

David



pgsql-hackers by date:

Previous
From: "Andrey M. Borodin"
Date:
Subject: Allow pg_read_all_stats to read pg_stat_progress_*
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Race condition in SyncRepGetSyncStandbysPriority