Re: Choosing parallel_degree - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Choosing parallel_degree
Date
Msg-id CA+TgmoaKz3VZb16tdDeiRZOE2qwkOxqp==dkqOOi+gZJxUr3fw@mail.gmail.com
In response to Re: Choosing parallel_degree  (Paul Ramsey <pramsey@cleverelephant.ca>)
Responses Re: Choosing parallel_degree  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Fri, Apr 8, 2016 at 12:27 PM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
> Insofar as the patch is throttling how many parallel workers you get
> based solely on your relsize, it does concern this patch, but it's a
> general issue in both the extreme and not obviously related costings
> needed to trip parallel sequence and parallel aggregate plans. The
> parallel join seems to not take function/operator costs into account
> at all [3], at least I couldn't plump up a high enough cost to trip it
> without also adjusting the global parallel tuple cost configuration.
>
> I've seen a number of asides to the effect that "yes, costs are
> important, but we probably can't do anything about that for 9.6" in
> parallel patch threads, including this one, so I'm getting concerned
> that the core improvement we've been hoping for for years won't
> actually address our use cases when it is first released. That may
> just be the way it is, c'est la vie, but it would be unfortunate.

So, I don't think that the patch makes anything worse for PostGIS.  At
most, it doesn't make anything better for PostGIS.  Without the patch,
the number of workers in a parallel plan is based strictly on the size
of the relation upon which the parallel plan performs a parallel
sequential scan.  With the patch, you can - if you wish - substitute
some other number for the one the planner comes up with.  But you
don't have to do that, so it can't be worse than before.  To the
contrary, I'm fairly confident that it is useful, if not everything
you might ever have wanted.

Now, I think what you really want is for the planner to be much smarter on
a broad level in determining when to use parallelism and how many
workers to pick on a per-query basis, and to take function and
operator costs into account in making that determination.  I agree
that's something that the planner should do, but it's definitely not
going to happen in 9.6.  It's going to require major new planner
infrastructure that will probably take years to build and get right,
unless Tom gets interested (he said hopefully).

There are basically two ways that a query can use too many workers.
First, at some point, as you add workers to a plan tree, the new,
larger number of workers doesn't produce tuples any faster in the
aggregate than a smaller number of workers would have done.  For
example, if a scan becomes I/O bound, adding more workers probably
won't help much.  Second, at some point, the workers you already have
are producing tuples as fast as the leader can consume them, and
producing them faster doesn't do you any good because the extra
workers will just spend time waiting for the leader to get around to
servicing them.
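
To put some made-up numbers on that second limit (nothing here is a
real planner field, it's just the arithmetic):

#include <limits.h>
#include <math.h>

/*
 * Back-of-the-envelope arithmetic for the second limit.  Every name here
 * is invented for illustration: if the leader needs leader_us microseconds
 * of CPU per gathered tuple, it can drain at most 1e6 / leader_us tuples
 * per second, so any workers beyond roughly worker_us / leader_us just sit
 * waiting to be serviced.
 */
static int
max_useful_workers(double leader_us_per_tuple, double worker_us_per_tuple)
{
    if (leader_us_per_tuple <= 0.0)
        return INT_MAX;         /* leader cost negligible; no cap from this */

    return (int) floor(worker_us_per_tuple / leader_us_per_tuple);
}

/*
 * E.g. if the leader needs 10us per gathered tuple and each worker emits
 * one every 40us, max_useful_workers(10.0, 40.0) == 4; a fifth worker
 * would only wait.
 */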

I have an idea about how to solve the first problem, which I've
started to mock up as a PoC.  What it does is track the portion of the
cost of each node that is believed to represent non-parallelizable
work, hereinafter serial_cost.  So if the query involves a lot of
CPU-intensive functions, then the ratio of total_cost / serial_cost
will be high, suggesting that many workers will be useful.  If that
ratio is low, it suggests that the query is mostly I/O-bound or
IPC-bound anyway, and adding workers isn't going to do much.  Details
are still a bit fuzzy, and I don't have this developed enough that
it's actually testable in any way yet, but that's my idea.
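
Very roughly, and with every name and the exact formula invented here
just for illustration (this is not what the PoC actually does), the
shape of the idea is something like:

#include <math.h>

/*
 * Sketch of the ratio idea only.  serial_cost is the part of the node's
 * cost thought to be non-parallelizable (I/O, IPC, startup); once you have
 * about total_cost / serial_cost processes, the parallelizable work per
 * process has shrunk to roughly the size of the serial work, so adding
 * more stops paying off.
 */
static int
workers_from_costs(double total_cost, double serial_cost, int max_workers)
{
    double  ratio;
    int     nworkers;

    if (serial_cost <= 0.0)
        return max_workers;         /* nothing is inherently serial */

    ratio = total_cost / serial_cost;

    /* the leader is one of the processes, so subtract it */
    nworkers = (int) floor(ratio) - 1;
    if (nworkers < 0)
        nworkers = 0;
    if (nworkers > max_workers)
        nworkers = max_workers;

    return nworkers;
}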

The second problem is substantially harder, from what I can see.  To
judge whether the leader can keep up, you need to know how much
CPU work it will have to do per gathered tuple.  In order to know
that, you'd need to know the costs of the plan nodes above the Gather
before knowing the cost of the Gather itself.  That's obviously a bit
challenging since our costing code doesn't work anything like that.
To some extent, this circularity is also an issue for the first
problem, since the costing of the lower nodes depends on how many
workers you have, but the number of workers you want at the point
where you Gather is dependent on the costing of those same lower
nodes.
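
Spelled out with invented signatures (Cost is just a double), the
cycle looks like this:

typedef double Cost;

/*
 * Invented signatures, only to spell the cycle out: the cost of the paths
 * below the Gather depends on how many workers divide up the work ...
 */
Cost    cost_below_gather(int nworkers);

/*
 * ... but the number of workers worth asking for depends on that cost and
 * on the cost of the nodes above the Gather, which the current costing
 * code hasn't computed yet at that point.
 */
int     workers_wanted(Cost below_gather, Cost above_gather);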

I think there's probably a way through this mess - other people have
written good parallel optimizers before - but it is not like there is
some reasonably small tweak that will give you what you are hoping for
here.  Figuring out how all of this needs to work is a massive project
in its own right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


