Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From: Thomas Munro
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Msg-id: CAEepm=38obdA4xgOmfOEuj9PCzPX_xzNgHwBhvqkYEXyzqVbKQ@mail.gmail.com
In response to: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
On Sat, Feb 11, 2017 at 1:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Feb 9, 2017 at 6:38 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Yes, potentially unbounded in rare cases.  If we plan for N batches,
>> and then run out of work_mem because our estimates were just wrong or
>> the distribution of keys is sufficiently skewed, we'll run
>> HashIncreaseNumBatches, and that could happen more than once.  I have
>> a suite of contrived test queries that hits all the various modes and
>> code paths of hash join, and it includes a query that plans for one
>> batch but finishes up creating many, and then the leader exits.  I'll
>> post that to the other thread along with my latest patch series soon.
>
> Hmm, OK.  So that's going to probably require something where a fixed
> amount of DSM can describe an arbitrary number of temp file series.
> But that also means this is an even-more-special-purpose tool that
> shouldn't be deeply tied into parallel.c so that it can run before any
> errors happen.
>
> Basically, I think the "let's write the code between here and here so
> it throws no errors" technique is, for 99% of PostgreSQL programming,
> difficult and fragile.  We shouldn't rely on it if there is some other
> reasonable option.

I'm testing a patch that lets you set up a fixed-size SharedBufFileSet
object in a DSM segment, with its own refcount for the reason you
explained.  It supports a dynamically expandable set of numbered files,
so each participant gets to export file 0, file 1, file 2 and so on as
required, in any order.  I think this should suit both Parallel
Tuplesort, which needs to export just one file from each participant,
and Parallel Shared Hash, which doesn't know up front how many batches
it will produce.  It's not quite ready, but I will post a version
tomorrow to get Peter's reaction.
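
To make the shape of that interface concrete, here is a rough sketch.
Only the SharedBufFileSet name (and the existing BufFile type) come
from the description above; the function names and signatures are
illustrative placeholders rather than the actual patch.

#include "storage/buffile.h"

/* Fixed-size control object that lives in a DSM segment (opaque here). */
typedef struct SharedBufFileSet SharedBufFileSet;

/* Leader: initialize the set inside a DSM segment for nparticipants. */
extern void SharedBufFileSetInit(SharedBufFileSet *set, int nparticipants);

/*
 * Attach/detach maintain the set's own reference count, so the
 * underlying temporary files are cleaned up when the last backend
 * detaches, even if that happens during error cleanup.
 */
extern void SharedBufFileSetAttach(SharedBufFileSet *set);
extern void SharedBufFileSetDetach(SharedBufFileSet *set);

/*
 * Participant 'participant' exports file 0, file 1, file 2, ... in any
 * order.  The idea is that file numbers map to on-disk names
 * deterministically, so the set of files can grow without consuming
 * any more DSM space.
 */
extern BufFile *SharedBufFileCreate(SharedBufFileSet *set,
                                    int participant, int filenum);

/* Another participant can later open a file exported under that number. */
extern BufFile *SharedBufFileOpen(SharedBufFileSet *set,
                                  int participant, int filenum);

In a scheme like this, Parallel Tuplesort would only ever create file 0
in each participant, while Parallel Shared Hash could keep creating
higher-numbered files as it splits batches.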

-- 
Thomas Munro
http://www.enterprisedb.com


