Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
Date
Msg-id CAH2-Wzkhso9LTHKHW+KxZt=CEV1=T0wHpptkSz29oJcpDs02UQ@mail.gmail.com
In response to Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, May 5, 2017 at 12:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> One idea that crossed my mind is to just have workers write all of
> their output tuples to a temp file and have the leader read them back
> in.  At some cost in I/O, this would completely eliminate the overhead
> of workers waiting for the leader.  In some cases, it might be worth
> it.  At the least, it could be interesting to try a prototype
> implementation of this with different queries (TPC-H, maybe) and see
> what happens.  It would give us some idea how much of a problem
> stalling on the leader is in practice.  Wait event monitoring could
> possibly also be used to figure out an answer to that question.

The use of temp files in all cases was effective in my parallel
external sort patch, relative to what I imagine an approach built on a
Gather node would get you, but not because a Gather node is inherently
slow. I'm not so sure that Gather actually is inherently slow, given
the interface it supports.

While incremental, retail processing of each tuple is flexible and
composable, it will tend to be slow compared to an approach based on
batch processing (for tasks where you happen to be able to get away
with batch processing). This is true for all the usual reasons --
better locality of access, better branch prediction properties, lower
"effective instruction count" due to having very tight inner loops,
and so on.
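
To make the retail-versus-batch contrast concrete, here is a toy sketch
in C. It is not PostgreSQL executor code: ToyTuple, ToyScan,
toy_scan_next, sum_retail and sum_batch are all hypothetical names made
up for illustration. The first consumer pulls one tuple per call through
a function pointer (flexible and composable); the second runs one tight
loop over a contiguous array, which is the shape that tends to get the
locality and branch prediction benefits described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy "tuple" with a single integer column. */
typedef struct ToyTuple
{
    int64_t value;
} ToyTuple;

/* Hypothetical scan state for the tuple-at-a-time interface. */
typedef struct ToyScan
{
    const ToyTuple *tuples;
    size_t      ntuples;
    size_t      pos;
} ToyScan;

/* Retail interface: return the next tuple, or NULL when exhausted. */
typedef const ToyTuple *(*toy_next_fn) (void *state);

const ToyTuple *
toy_scan_next(void *state)
{
    ToyScan    *scan = state;

    if (scan->pos >= scan->ntuples)
        return NULL;
    return &scan->tuples[scan->pos++];
}

/* Retail consumer: one indirect call and one NULL test per tuple. */
int64_t
sum_retail(toy_next_fn next, void *state)
{
    int64_t     sum = 0;
    const ToyTuple *tup;

    while ((tup = next(state)) != NULL)
        sum += tup->value;
    return sum;
}

/* Batch consumer: a single tight loop over a contiguous array. */
int64_t
sum_batch(const ToyTuple *tuples, size_t ntuples)
{
    int64_t     sum = 0;

    for (size_t i = 0; i < ntuples; i++)
        sum += tuples[i].value;
    return sum;
}
```

Both functions compute the same sum. Any difference is in how much
per-tuple overhead the loop carries, and how large that difference is on
a given machine would have to be measured rather than assumed.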

I agree with Andres that we shouldn't put too much effort into
modeling concurrency ahead of optimizing serial performance. The
machine's *aggregate* memory bandwidth should be used as efficiently
as possible, and parallelism is just one (very important) tool for
making that happen.

-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/
