Re: Parallel Append can break run-time partition pruning - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Parallel Append can break run-time partition pruning |
Date | |
Msg-id | CAApHDvqv9s_6__JnoGru9Hq0yE_40eOzPGeHXDatr29qECE3yg@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Append can break run-time partition pruning (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Thu, 23 Apr 2020 at 14:37, Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Apr 22, 2020 at 7:36 PM David Rowley <dgrowleyml@gmail.com> wrote: > > If there was some reason that a Parallel Append could come out more > > expensive, then maybe we could just create a non-parallel Append using > > the same subpath list and add_partial_path() it. I just don't quite > > see how that would ever win though. I'm willing to be proven wrong > > though. > > I think you're talking about the thing that this comment is trying to explain: > > /* > * Consider a parallel-aware append using a mix of partial and non-partial > * paths. (This only makes sense if there's at least one child which has > * a non-partial path that is substantially cheaper than any partial path; > * otherwise, we should use the append path added in the previous step.) > */ > > Like, suppose there are two relations A and B, and we're appending > them. A has no indexes, so we can only choose between a Seq Scan and > an Index Scan. B has a GIST index that is well-suited to the query, > but GIST indexes don't support parallel scans. So we've got three > choices: > > 1. Don't use parallelism at all. Then, we can do a normal Append with > a Seq Scan on a and an Index Scan on b. ("If we found unparameterized > paths for all children, build an unordered, unparameterized Append > path for the rel.") > > 2. Use parallelism for both relations. Then, we can do a Gather over a > Parallel Append (or a regular Append, if Parallel Append is disabled) > with a Parallel Seq Scan on a and a Parallel Seq Scan on b. As > compared with #1, this should win for a, but it might lose heavily for > b, because switching from an index scan to a Seq Scan could be a big > loser. ("Consider an append of unordered, unparameterized partial > paths. Make it parallel-aware if possible.") > > 3. Use parallelism for a but not for b. The only way to do this is a > Parallel Append, because there's no other way to mix partial and > non-partial paths at present. This lets us get the benefit of a > Parallel Seq Scan on a while still being able to do a non-parallel > GIST Index Scan on b. This has a good chance of being better than #2, > but it's fundamentally a costing decision, because if the table is > small enough or the query isn't very selective, #2 will actually be > faster, just on the raw power of more workers and less random I/O > ("Consider a parallel-aware append using a mix of partial and > non-partial paths.") Thanks for those examples. I ran this situation through the code but used a hash index instead of GIST. The 3 settings which give us control over this plan are enable_parallel_append, enable_indexscan, enable_bitmapscan. enable_bitmapscan must be included since we can still get a parallel bitmap scan with a hash index. For completeness, I just tried with each of the 8 combinations of the GUCs, but I'd detailed below which of your cases I'm testing as a comment. There are 4 cases since #2 works with parallel and non-parallel Append. Naturally, the aim is that the patched version does not change the behaviour. -- Test case create table listp (a int, b int) partition by list(a); create table listp1 partition of listp for values in(1); create table listp2 partition of listp for values in(2); insert into listp select x,y from generate_Series(1,2) x, generate_Series(1,1000000) y; create index on listp2 using hash(b); vacuum analyze listp; explain (costs off) select * from listp where b = 1; SET enable_indexscan = off; explain (costs off) select * from listp where b = 1; SET enable_indexscan = on; SET enable_bitmapscan = off; explain (costs off) select * from listp where b = 1; -- case #3, Mixed scan of parallel and non-parallel paths with a Parallel Append SET enable_indexscan = off; explain (costs off) select * from listp where b = 1; -- case #2 with Parallel Append SET enable_indexscan = on; SET enable_bitmapscan = on; SET enable_parallel_append = off; explain (costs off) select * from listp where b = 1; SET enable_indexscan = off; explain (costs off) select * from listp where b = 1; -- case #2 with non-Parallel Append SET enable_indexscan = on; SET enable_bitmapscan = off; explain (costs off) select * from listp where b = 1; -- case #1, best serial plan SET enable_indexscan = off; explain (costs off) select * from listp where b = 1; The results, patched/unpatched, are the same. David
pgsql-hackers by date: