Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
Date
Msg-id CAEepm=1KoDvMPGussumtdW=eaj5G_ZANfpBiwionVyCY36Zf5Q@mail.gmail.com
In response to Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
On Fri, May 5, 2017 at 2:23 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 5 May 2017 at 13:37, Andres Freund <andres@anarazel.de> wrote:
>> On 2017-05-02 15:13:58 -0400, Robert Haas wrote:
>>> Multiple people (including David Rowley
>>> as well as folks here at EnterpriseDB) have demonstrated that for
>>> certain queries, we can actually use a lot more workers and everything
>>> works great.  The problem is that for other queries, using a lot of
>>> workers works terribly.  The planner doesn't know how to figure out
>>> which it'll be - and honestly, I don't either.
>>
>> Have those benchmarks, even in a very informal form, been shared /
>> collected / referenced centrally?  I'd be very interested to know where
>> the different contention points are. Possibilities:
>
> I posted mine on [1], although the post does not go into much detail
> about the contention points. I only really briefly mention it at the
> end.

Just for fun, check out pages 42 and 43 of Wei Hong's thesis.  He
worked on Berkeley POSTGRES parallel query and a spin-off called XPRS,
and they got linear seq scan scaling up to number of spindles:

http://db.cs.berkeley.edu/papers/ERL-M93-28.pdf

I gather from flicking through the POSTGRES 4.2 sources and this
stuff about XPRS that they switched from a "launch N workers!" model
to a "generate tasks and schedule them" model somewhere between these
systems.  Chapters 2 and 3 cover the problem of avoiding excessive
parallelism that reduces performance, adjusting the degree of
parallelism dynamically for maximum throughput.  I suspect we're going
that way too at some point, and it would certainly fix some problems I
ran into with Parallel Shared Hash.

XPRS's cost model included resource consumption, not just 'timerons'.
This is something I grappled with when trying to put a price tag on
Parallel Shared Hash plans where just one worker builds the hash table
while the others wait.  I removed that plan from the patch because it
became mostly redundant, but when it was there Postgres thought it was
the same cost as a plan where every worker hammers your system
building the same hash table, whereas XPRS would have considered such
a plan ludicrously expensive (depending on his 'w' term, see page 28,
which determines whether you care more about resource usage or
response time).

-- 
Thomas Munro
http://www.enterprisedb.com


