Re: Parallel Append can break run-time partition pruning - Mailing list pgsql-hackers

From David Rowley
Subject Re: Parallel Append can break run-time partition pruning
Date
Msg-id CAApHDvqv9s_6__JnoGru9Hq0yE_40eOzPGeHXDatr29qECE3yg@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Append can break run-time partition pruning  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, 23 Apr 2020 at 14:37, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Apr 22, 2020 at 7:36 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > If there was some reason that a Parallel Append could come out more
> > expensive, then maybe we could just create a non-parallel Append using
> > the same subpath list and add_partial_path() it. I just don't quite
> > see how that would ever win though. I'm willing to be proven wrong
> > though.
>
> I think you're talking about the thing that this comment is trying to explain:
>
>     /*
>      * Consider a parallel-aware append using a mix of partial and non-partial
>      * paths.  (This only makes sense if there's at least one child which has
>      * a non-partial path that is substantially cheaper than any partial path;
>      * otherwise, we should use the append path added in the previous step.)
>      */
>
> Like, suppose there are two relations A and B, and we're appending
> them. A has no indexes, so we can only choose between a Seq Scan and
> an Index Scan. B has a GIST index that is well-suited to the query,
> but GIST indexes don't support parallel scans. So we've got three
> choices:
>
> 1. Don't use parallelism at all. Then, we can do a normal Append with
> a Seq Scan on a and an Index Scan on b. ("If we found unparameterized
> paths for all children, build an unordered, unparameterized Append
> path for the rel.")
>
> 2. Use parallelism for both relations. Then, we can do a Gather over a
> Parallel Append (or a regular Append, if Parallel Append is disabled)
> with a Parallel Seq Scan on a and a Parallel Seq Scan on b. As
> compared with #1, this should win for a, but it might lose heavily for
> b, because switching from an index scan to a Seq Scan could be a big
> loser. ("Consider an append of unordered, unparameterized partial
> paths.  Make it parallel-aware if possible.")
>
> 3. Use parallelism for a but not for b. The only way to do this is a
> Parallel Append, because there's no other way to mix partial and
> non-partial paths at present. This lets us get the benefit of a
> Parallel Seq Scan on a while still being able to do a non-parallel
> GIST Index Scan on b. This has a good chance of being better than #2,
> but it's fundamentally a costing decision, because if the table is
> small enough or the query isn't very selective, #2 will actually be
> faster, just on the raw power of more workers and less random I/O
> ("Consider a parallel-aware append using a mix of partial and
> non-partial paths.")

Thanks for those examples.

I ran this situation through the code but used a hash index instead of
GIST. The 3 settings which give us control over this plan are
enable_parallel_append, enable_indexscan, enable_bitmapscan.
enable_bitmapscan must be included since we can still get a parallel
bitmap scan with a hash index.

For completeness, I just tried with each of the 8 combinations of the
GUCs, but I'd detailed below which of your cases I'm testing as a
comment.  There are 4 cases since #2 works with parallel and
non-parallel Append.  Naturally, the aim is that the patched version
does not change the behaviour.

-- Test case
create table listp (a int, b int) partition by list(a);
create table listp1 partition of listp for values in(1);
create table listp2 partition of listp for values in(2);
insert into listp select x,y from generate_Series(1,2) x,
generate_Series(1,1000000) y;
create index on listp2 using hash(b);
vacuum analyze listp;

explain (costs off) select * from listp where b = 1;
SET enable_indexscan = off;
explain (costs off) select * from listp where b = 1;
SET enable_indexscan = on;
SET enable_bitmapscan = off;
explain (costs off) select * from listp where b = 1; -- case #3, Mixed
scan of parallel and non-parallel paths with a Parallel Append
SET enable_indexscan = off;
explain (costs off) select * from listp where b = 1; -- case #2 with
Parallel Append
SET enable_indexscan = on;
SET enable_bitmapscan = on;
SET enable_parallel_append = off;
explain (costs off) select * from listp where b = 1;
SET enable_indexscan = off;
explain (costs off) select * from listp where b = 1; -- case #2 with
non-Parallel Append
SET enable_indexscan = on;
SET enable_bitmapscan = off;
explain (costs off) select * from listp where b = 1; -- case #1, best
serial plan
SET enable_indexscan = off;
explain (costs off) select * from listp where b = 1;

The results, patched/unpatched, are the same.

David



pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: Re: [PATCH] Fix buffer not null terminated on (ecpg lib)
Next
From: Craig Ringer
Date:
Subject: [PATCH] Fix install-tests target for vpath builds