
Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
Date	May 6, 2017 03:42:32
Msg-id	CAH2-Wzkhso9LTHKHW+KxZt=CEV1=T0wHpptkSz29oJcpDs02UQ@mail.gmail.com
In response to	Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers


On Fri, May 5, 2017 at 12:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> One idea that crossed my mind is to just have workers write all of
> their output tuples to a temp file and have the leader read them back
> in.  At some cost in I/O, this would completely eliminate the overhead
> of workers waiting for the leader.  In some cases, it might be worth
> it.  At the least, it could be interesting to try a prototype
> implementation of this with different queries (TPC-H, maybe) and see
> what happens.  It would give us some idea how much of a problem
> stalling on the leader is in practice.  Wait event monitoring could
> possibly also be used to figure out an answer to that question.

Using temp files in all cases was effective in my parallel external
sort patch, relative to what I imagine an approach built on a Gather
node would get you, but not because a Gather node is inherently slow.
I'm not convinced that Gather is inherently slow at all, given the
interface it supports.
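For readers following along, here is a minimal sketch of the spool-to-temp-file idea from the quoted text, written in Python rather than PostgreSQL's C. Everything in it is an illustrative assumption, not PostgreSQL code: the fixed-width tuple layout, the function names worker/leader/run, and the use of threads to stand in for parallel worker processes. Each worker writes only to its own private temp file, so it never blocks waiting for the leader to drain a shared queue. The leader reads everything back afterwards, at the cost of the extra I/O Robert mentions.

```python
import struct
import tempfile
import threading

# Hypothetical fixed-width "tuple": two little-endian 64-bit ints (key, value).
TUPLE = struct.Struct("<qq")


def worker(rows, spool):
    """Write every output tuple to this worker's private spool file.

    No shared queue is involved, so the worker never waits on the leader."""
    for key, value in rows:
        spool.write(TUPLE.pack(key, value))
    spool.flush()


def leader(spools):
    """Once all workers are done, read each spool file back in turn."""
    results = []
    for spool in spools:
        spool.seek(0)
        while chunk := spool.read(TUPLE.size):
            results.append(TUPLE.unpack(chunk))
    return results


def run(partitions):
    """Spool each partition from its own thread, then gather in the leader."""
    spools = [tempfile.TemporaryFile() for _ in partitions]  # binary w+b mode
    threads = [
        threading.Thread(target=worker, args=(rows, spool))
        for rows, spool in zip(partitions, spools)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    try:
        return leader(spools)
    finally:
        for spool in spools:
            spool.close()
```

For example, `run([[(1, 10), (2, 20)], [(3, 30)]])` returns `[(1, 10), (2, 20), (3, 30)]`. In this simplest form the leader consumes nothing until every worker has finished. That trade-off is what a prototype would need to measure against the cost of workers stalling on the leader.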

While incremental, retail processing of each tuple is flexible and
composable, it will tend to be slow compared to an approach based on
batch processing (for tasks where you happen to be able to get away
with batch processing). This is true for all the usual reasons --
better locality of access, better branch prediction properties, lower
"effective instruction count" due to having very tight inner loops,
and so on.
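To make the retail-versus-batch distinction concrete, here is a toy sketch contrasting a Volcano-style scan that hands back one tuple per call with one that returns a batch per call. The class and function names are hypothetical and are not PostgreSQL's executor API. Being Python, the sketch only shows how per-call overhead is amortized across a batch. The locality and branch-prediction benefits mentioned above are properties of compiled inner loops, and this code does not demonstrate them.

```python
from typing import List, Optional


class TupleScan:
    """Retail, Volcano-style scan: one next() call per tuple."""

    def __init__(self, values: List[int]) -> None:
        self._values = values
        self._pos = 0
        self.calls = 0  # how many times the consumer had to call in

    def next(self) -> Optional[int]:
        self.calls += 1
        if self._pos >= len(self._values):
            return None  # end of scan
        value = self._values[self._pos]
        self._pos += 1
        return value


class BatchScan:
    """Batch scan: one next_batch() call returns up to batch_size tuples."""

    def __init__(self, values: List[int], batch_size: int = 1024) -> None:
        self._values = values
        self._batch_size = batch_size
        self._pos = 0
        self.calls = 0

    def next_batch(self) -> List[int]:
        self.calls += 1
        batch = self._values[self._pos:self._pos + self._batch_size]
        self._pos += len(batch)
        return batch  # an empty list signals end of scan


def sum_retail(scan: TupleScan) -> int:
    total = 0
    while (value := scan.next()) is not None:
        total += value
    return total


def sum_batch(scan: BatchScan) -> int:
    total = 0
    while batch := scan.next_batch():
        # Per-call overhead is paid once per batch; this inner loop runs over
        # a contiguous list without re-entering the scan.
        for value in batch:
            total += value
    return total
```

With 10,000 values and `batch_size=256`, both consumers compute the same sum. The retail consumer calls into the scan 10,001 times, and the batch consumer does so 41 times: 40 batches plus the final empty one.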

I agree with Andres that we shouldn't put too much effort into
modelling concurrency ahead of optimizing serial performance. The
machine's *aggregate* memory bandwidth should be used as efficiently
as possible, and parallelism is just one (very important) tool for
making that happen.

-- 
Peter Geoghegan

