Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parallel Seq Scan
Date
Msg-id CA+TgmoZFprsXkYdmSeu_ZnOwv48DVpVW66wT5LYJpocNA4kLNA@mail.gmail.com
In response to Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Parallel Seq Scan
List pgsql-hackers
On Tue, Oct 13, 2015 at 5:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> - Although the changes in parallelpaths.c are in a good direction, I'm
> pretty sure this is not yet up to scratch.  I am less sure exactly
> what needs to be fixed, so I'll have to give some more thought to
> that.

Please find attached a proposed set of changes that I think are
better.  These changes compute a consider_parallel flag for each
RelOptInfo, which is true if it's a non-temporary relation whose
baserestrictinfo references no PARAM_EXEC parameters, sublinks, or
parallel-restricted functions.  Actually, I made an effort to set the
flag correctly even for baserels other than plain tables, and for
joinrels, though we don't technically need that stuff until we get to
the point of pushing joins beneath Gather nodes.  When we get there,
it will be important - any joinrel for which consider_parallel = false
needn't even try to generate parallel paths, while if
consider_parallel = true then we can consider it, if the costing makes
sense.
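
As a rough illustration of the flag computation described above, here is a
minimal sketch in C.  It is not the attached patch: RelSketch, QualSummary,
and set_consider_parallel are hypothetical, simplified stand-ins, and the
three booleans assume that each restriction clause has already been
inspected for PARAM_EXEC parameters, sublinks, and parallel-restricted
functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one restriction clause (not a PostgreSQL struct). */
typedef struct QualSummary
{
    bool has_param_exec;               /* references a PARAM_EXEC parameter */
    bool has_sublink;                  /* contains a sublink */
    bool has_parallel_restricted_func; /* calls a parallel-restricted function */
} QualSummary;

/* Hypothetical, stripped-down stand-in for a per-relation planner struct. */
typedef struct RelSketch
{
    bool               is_temporary;      /* temporary relation? */
    const QualSummary *quals;             /* restriction clauses, may be NULL */
    size_t             nquals;            /* number of entries in quals */
    bool               consider_parallel; /* cached result */
} RelSketch;

/* A clause is safe only if it has none of the three problem features. */
bool
qual_is_parallel_safe(const QualSummary *q)
{
    return !q->has_param_exec &&
           !q->has_sublink &&
           !q->has_parallel_restricted_func;
}

/* Compute the flag once and cache it on the relation. */
void
set_consider_parallel(RelSketch *rel)
{
    rel->consider_parallel = false;

    if (rel->is_temporary)
        return;

    for (size_t i = 0; i < rel->nquals; i++)
    {
        if (!qual_is_parallel_safe(&rel->quals[i]))
            return;
    }

    rel->consider_parallel = true;
}
```

In this sketch a relation with no restriction clauses at all ends up with
consider_parallel = true as long as it is not temporary.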

The advantage of this is that the logic is centralized.  If we have
parallel seq scan and also, say, parallel bitmap heap scan, your
approach would require that we duplicate the logic to check for
parallel-restricted functions for each path generation function.  By
caching it in the RelOptInfo, we don't have to do that.  The function
you wrote to generate parallel paths can just check the flag; if it's
false, return without generating any paths.  If it's true, then
parallel paths can be considered.
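
To make the centralization point concrete, here is a small sketch, again
with hypothetical names (RelPathsSketch and both add_* functions are
invented for illustration and are not PostgreSQL functions).  Each path
generation function only consults the cached flag; neither repeats the
check for parallel-restricted functions.

```c
#include <stdbool.h>

/* Hypothetical stand-in: the cached flag plus a count of generated paths. */
typedef struct RelPathsSketch
{
    bool consider_parallel; /* computed once, elsewhere */
    int  n_parallel_paths;  /* how many parallel paths were generated */
} RelPathsSketch;

/* Generator for a parallel sequential scan path. */
void
add_parallel_seqscan_path(RelPathsSketch *rel)
{
    if (!rel->consider_parallel)
        return;              /* flag is false: generate nothing */
    rel->n_parallel_paths++; /* stands in for building the actual path */
}

/* Generator for a parallel bitmap heap scan path; same one-line check. */
void
add_parallel_bitmap_heap_path(RelPathsSketch *rel)
{
    if (!rel->consider_parallel)
        return;
    rel->n_parallel_paths++;
}
```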

Ultimately, I think that each RelOptInfo should have a new List *
member containing a list of partial paths for that relation.  For a
baserel, we generate a partial path (e.g. Partial Seq Scan).  Then, we
can consider turning each partial path into a complete path by pushing
a Gather path on top of it.  For a joinrel, we can consider generating
a partial hash join or partial nested loop path by taking an outer
partial path and an ordinary inner path and putting the appropriate
join path on top.  In theory it would also be correct to generate merge
join paths this way, but it's difficult to believe that such a plan
would ever be anything but a disaster.  These partial join paths can
then be used to generate a complete path by putting a Gather node on
top of them, or they can bubble up to the next level of the join tree
in the same way.
However, I think for the first version of this we can keep it simple:
if the consider_parallel flag is set on a relation, consider Gather ->
Partial Seq Scan.  If not, forget it.
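
The partial-path idea above can be sketched as follows.  This is only an
illustration under stated assumptions: PathSketch, PathKind, and the make_*
functions are hypothetical and do not correspond to existing planner code.
The rules it encodes are the ones described in the text: a partial join
takes a partial outer path and an ordinary inner path, only hash join and
nested loop are considered, and a Gather on top of a partial path yields a
complete path.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical path kinds for this sketch. */
typedef enum PathKind
{
    PATH_SEQSCAN,
    PATH_PARTIAL_SEQSCAN,
    PATH_HASHJOIN,
    PATH_NESTLOOP,
    PATH_GATHER
} PathKind;

/* Hypothetical path node; "partial" means each worker sees a subset of rows. */
typedef struct PathSketch
{
    PathKind           kind;
    bool               partial;
    struct PathSketch *outer; /* outer child, or the only child of a Gather */
    struct PathSketch *inner; /* inner child, NULL for scans and Gather */
} PathSketch;

/* Allocate a node; returns NULL if allocation fails. */
PathSketch *
make_path(PathKind kind, bool partial, PathSketch *outer, PathSketch *inner)
{
    PathSketch *p = malloc(sizeof *p);

    if (p == NULL)
        return NULL;
    p->kind = kind;
    p->partial = partial;
    p->outer = outer;
    p->inner = inner;
    return p;
}

/* Partial join: partial outer, ordinary inner, hash join or nested loop only. */
PathSketch *
make_partial_join(PathKind join_kind, PathSketch *outer, PathSketch *inner)
{
    if (outer == NULL || inner == NULL)
        return NULL;
    if (join_kind != PATH_HASHJOIN && join_kind != PATH_NESTLOOP)
        return NULL;
    if (!outer->partial || inner->partial)
        return NULL;
    return make_path(join_kind, true, outer, inner); /* result stays partial */
}

/* Gather turns a partial path into a complete one. */
PathSketch *
make_gather(PathSketch *sub)
{
    if (sub == NULL || !sub->partial)
        return NULL;
    return make_path(PATH_GATHER, false, sub, NULL);
}
```

The simple first version corresponds to calling make_gather directly on a
PATH_PARTIAL_SEQSCAN node; the join case only matters once joins are pushed
beneath Gather.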

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
