Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From: Thomas Munro
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Msg-id: CAEepm=38obdA4xgOmfOEuj9PCzPX_xzNgHwBhvqkYEXyzqVbKQ@mail.gmail.com
In response to: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
On Sat, Feb 11, 2017 at 1:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Feb 9, 2017 at 6:38 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Yes, potentially unbounded in rare cases.  If we plan for N batches,
>> and then run out of work_mem because our estimates were just wrong or
>> the distribution of keys is sufficiently skewed, we'll run
>> HashIncreaseNumBatches, and that could happen more than once.  I have
>> a suite of contrived test queries that hits all the various modes and
>> code paths of hash join, and it includes a query that plans for one
>> batch but finishes up creating many, and then the leader exits.  I'll
>> post that to the other thread along with my latest patch series soon.
>
> Hmm, OK.  So that's going to probably require something where a fixed
> amount of DSM can describe an arbitrary number of temp file series.
> But that also means this is an even-more-special-purpose tool that
> shouldn't be deeply tied into parallel.c so that it can run before any
> errors happen.
>
> Basically, I think the "let's write the code between here and here so
> it throws no errors" technique is, for 99% of PostgreSQL programming,
> difficult and fragile.  We shouldn't rely on it if there is some other
> reasonable option.

I'm testing a patch that lets you set up a fixed-size SharedBufFileSet
object in a DSM segment, with its own refcount for the reason you
explained.  It supports a dynamically expandable set of numbered files,
so each participant gets to export file 0, file 1, file 2 and so on as
required, in any order.  I think this should suit both Parallel
Tuplesort, which needs to export just one file from each participant,
and Parallel Shared Hash, which doesn't know up front how many batches
it will produce.  It's not quite ready, but I will post a version
tomorrow to get Peter's reaction.
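
To make the shape of that interface concrete, here is a rough sketch.
Only the SharedBufFileSet name (and the existing BufFile type) come
from the description above; the function names and signatures are
illustrative placeholders rather than the actual patch.

#include "storage/buffile.h"

/* Fixed-size control object that lives in a DSM segment (opaque here). */
typedef struct SharedBufFileSet SharedBufFileSet;

/* Leader: initialize the set inside a DSM segment for nparticipants. */
extern void SharedBufFileSetInit(SharedBufFileSet *set, int nparticipants);

/*
 * Attach/detach maintain the set's own reference count, so the
 * underlying temporary files are cleaned up when the last backend
 * detaches, even if that happens during error cleanup.
 */
extern void SharedBufFileSetAttach(SharedBufFileSet *set);
extern void SharedBufFileSetDetach(SharedBufFileSet *set);

/*
 * Participant 'participant' exports file 0, file 1, file 2, ... in any
 * order.  The idea is that file numbers map to on-disk names
 * deterministically, so the set of files can grow without consuming
 * any more DSM space.
 */
extern BufFile *SharedBufFileCreate(SharedBufFileSet *set,
                                    int participant, int filenum);

/* Another participant can later open a file exported under that number. */
extern BufFile *SharedBufFileOpen(SharedBufFileSet *set,
                                  int participant, int filenum);

In a scheme like this, Parallel Tuplesort would only ever create file 0
in each participant, while Parallel Shared Hash could keep creating
higher-numbered files as it splits batches.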

-- 
Thomas Munro
http://www.enterprisedb.com


