Re: [HACKERS] Parallel Hash take II - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [HACKERS] Parallel Hash take II
Date
Msg-id CAEepm=0GBJVFRdsjbhtjCWuQk=QzxrTUhhySnezq6FvYTdb=1A@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Parallel Hash take II  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Nov 16, 2017 at 8:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Nov 15, 2017 at 1:35 PM, Andres Freund <andres@anarazel.de> wrote:
>> But this does bug me, and I think it's what made me pause here to make a
>> bad joke.  The way that parallelism treats work_mem makes it even more
>> useless of a config knob than it was before.  Parallelism, especially
>> after this patch, shouldn't compete / be benchmarked against a
>> single-process run with the same work_mem. To make it "fair" you need to
>> compare parallelism against a single threaded run with work_mem *
>> max_parallelism.
>
> I don't really know how to do a fair comparison between a parallel
> plan and a non-parallel plan.  Even if the parallel plan contains zero
> nodes that use work_mem, it might still use more memory than the
> non-parallel plan, because a new backend uses a bunch of memory.  If
> you really want a comparison that is fair on the basis of memory
> usage, you have to take that into account somehow.
>
> But even then, the parallel plan is also almost certainly consuming
> more CPU cycles to produce the same results.  Parallelism is all about
> trading away efficiency for execution time.  Not just because of
> current planner and executor limitations, but intrinsically, parallel
> plans are less efficient.  The globally optimal solution on a system
> that is short on either memory or CPU cycles is to turn parallelism
> off.

The guys who worked on the first attempt at Parallel Query for
Berkeley POSTGRES (and then ripped that out, moving to another project
called XPRS which I have found no trace of, perhaps it finished up in
some commercial RDBMS) wrote this[1]:

"The objective function that XPRS uses for query optimization is a
combination of resource consumption and response time as follows:
 cost = resource consumption + w * response time

Here w is a system-specifc weighting factor. A small w mostly
optimizes resource consumption, while a large w mostly optimizes
response time. Resource consumption is measured by the number of disk
pages accessed and number of tuples processed, while response time is
the elapsed time for executing the query."

http://db.cs.berkeley.edu/papers/ERL-M93-28.pdf

-- 
Thomas Munro
http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: [HACKERS] Parallel Hash take II
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Re: protocol version negotiation (Re: LibpqPGRES_COPY_BOTH - version compatibility)