Re: [HACKERS] Parallel Hash take II - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [HACKERS] Parallel Hash take II
Date
Msg-id CAEepm=09rr65VN+cAV5FgyM_z=D77Xy8Fuc9CDDDYbq3pQUezg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Parallel Hash take II  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] Parallel Hash take II  (Andres Freund <andres@anarazel.de>)
Re: [HACKERS] Parallel Hash take II  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Sat, Dec 2, 2017 at 3:46 PM, Andres Freund <andres@anarazel.de> wrote:
> Looking at 0004-Add-shared-tuplestores.patch
>
> Comments:
> - I'd rename mutex to lock. Seems quite possible that we end up with shared
>   lockers too.

Done.

> - Expand "Simple mechanism for sharing tuples between backends." intro
>   comment - that doesn't really seem like a meaningful description of
>   the mechanism. Should probably mention that it's similar to
>   tuplestores etc...

Done.

> - I'm still concerned about the chunking mechanism. How about this
>   sketch of an alternative solution?
>
>   Chunks are always the same length. To avoid having to read the length
>   from disk while holding a lock, introduce continuation chunks which
>   store the amount of space needed by the overlarge tuple at the
>   start. The reading process stays largely the same, except that if a
>   backend reads a chunk that's a continuation, it reads the length of
>   the continuation and skips ahead. That could mean that more than one
>   backend read continuation chunks, but the window is small and there's
>   normally not goign to be that many huge tuples (otherwise things are
>   slow anyway).

Done.

I've also included a simple test harness that can be used to drive
SharedTuplestore independently of Parallel Hash, but that patch is not
for commit.  See example of usage in the commit message.
(Incidentally I noticed that ParallelWorkerNumber is not marked
PGDLLIMPORT so that fails to build on Windows CI.)

> - why are we using a separate hardcoded 32 for sts names? Why not just
>   go for NAMEDATALEN or such?

Done.

> - I'd replace most of the "should's" in comments with "need".

Done.

Another problem I discovered is that v27's way of installing a
different function into ExecProcNode in ExecInitHashJoin() was broken:
it didn't allow for the possibility that there is no DSA area
available due to lack of resources.  ExecInitNode() is too soon to
decide.  My solution is to provide a way for executor nodes to change
their ExecProcNode functions at any later time, which requires a way
for execProcnode.c to redo any wrapper functions.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Boolean partitions syntax
Next
From: Rushabh Lathia
Date:
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)