Re: Parallel Append can break run-time partition pruning - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Parallel Append can break run-time partition pruning |
Date | |
Msg-id | CA+TgmoYeAOArB5erBtw8BXSF1XThKtB6nG-g5NEjM5rva1TbjQ@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Append can break run-time partition pruning (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Parallel Append can break run-time partition pruning
|
List | pgsql-hackers |
On Wed, Apr 22, 2020 at 7:36 PM David Rowley <dgrowleyml@gmail.com> wrote: > If there was some reason that a Parallel Append could come out more > expensive, then maybe we could just create a non-parallel Append using > the same subpath list and add_partial_path() it. I just don't quite > see how that would ever win though. I'm willing to be proven wrong > though. I think you're talking about the thing that this comment is trying to explain: /* * Consider a parallel-aware append using a mix of partial and non-partial * paths. (This only makes sense if there's at least one child which has * a non-partial path that is substantially cheaper than any partial path; * otherwise, we should use the append path added in the previous step.) */ Like, suppose there are two relations A and B, and we're appending them. A has no indexes, so we can only choose between a Seq Scan and an Index Scan. B has a GIST index that is well-suited to the query, but GIST indexes don't support parallel scans. So we've got three choices: 1. Don't use parallelism at all. Then, we can do a normal Append with a Seq Scan on a and an Index Scan on b. ("If we found unparameterized paths for all children, build an unordered, unparameterized Append path for the rel.") 2. Use parallelism for both relations. Then, we can do a Gather over a Parallel Append (or a regular Append, if Parallel Append is disabled) with a Parallel Seq Scan on a and a Parallel Seq Scan on b. As compared with #1, this should win for a, but it might lose heavily for b, because switching from an index scan to a Seq Scan could be a big loser. ("Consider an append of unordered, unparameterized partial paths. Make it parallel-aware if possible.") 3. Use parallelism for a but not for b. The only way to do this is a Parallel Append, because there's no other way to mix partial and non-partial paths at present. This lets us get the benefit of a Parallel Seq Scan on a while still being able to do a non-parallel GIST Index Scan on b. This has a good chance of being better than #2, but it's fundamentally a costing decision, because if the table is small enough or the query isn't very selective, #2 will actually be faster, just on the raw power of more workers and less random I/O ("Consider a parallel-aware append using a mix of partial and non-partial paths.") It seems to me that all three strategies are viable. The third one is much less likely to be used now that we have parallel index scans for btree and parallel bitmap heap scans, but I believe it can be a winner if you have the right case. You want to think about cases where there are parallel plans available for everything in the tree, but at least some of the children have much better non-parallel plans. Note that for strategy #2 we always prefer Parallel Append to non-Parallel Append on the optimistic assumption that Parallel Append will always be better; we only use regular Append if Parallel Append is disabled. But for strategy #3 there is no such choice to be made: a regular Append would not be valid. If placed under a Gather, it would execute the non-partial paths more than once; if not placed under a Gather, we'd have partial paths without any Gather above them, which is an invalid plan shape. So you can't "just use a regular Append" in case #3. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: