Home > mailing lists

Re: Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Parallel tuplesort (for parallel B-Tree index creation)
Date	September 6, 2016 07:08:59
Msg-id	286647c2-761a-0e8b-7db9-ea508c2316ba@iki.fi Whole thread Raw
In response to	Re: Parallel tuplesort (for parallel B-Tree index creation) (Peter Geoghegan <pg@heroku.com>)
Responses	Re: Parallel tuplesort (for parallel B-Tree index creation)
List	pgsql-hackers

Tree view

On 08/16/2016 03:33 AM, Peter Geoghegan wrote:
> I attach a patch that changes how we maintain the heap invariant
> during tuplesort merging. I already mentioned this over on the
> "Parallel tuplesort, partitioning, merging, and the future" thread. As
> noted already on that thread, this patch makes merging clustered
> numeric input about 2.1x faster overall in one case, which is
> particularly useful in the context of a serial final/leader merge
> during a parallel CREATE INDEX. Even *random* non-C-collated text
> input is made significantly faster. This work is totally orthogonal to
> parallelism, though; it's just very timely, given our discussion of
> the merge bottleneck on this thread.

Nice!

> The patch makes tuplesort merging shift down and displace the root
> tuple with the tape's next preread tuple, rather than compacting and
> then inserting into the heap anew. This approach to maintaining the
> heap as tuples are returned to caller will always produce fewer
> comparisons overall. The new approach is also simpler. We were already
> shifting down to compact the heap within the misleadingly named [2]
> function tuplesort_heap_siftup() -- why not instead just use the
> caller tuple (the tuple that we currently go on to insert) when
> initially shifting down (not the heap's preexisting last tuple, which
> is guaranteed to go straight to the leaf level again)? That way, we
> don't need to enlarge the heap at all through insertion, shifting up,
> etc. We're done, and are *guaranteed* to have performed less work
> (fewer comparisons and swaps) than with the existing approach (this is
> the reason for my optimism about getting this stuff out of the way
> early).

Makes sense.

> This new approach is more or less the *conventional* way to maintain
> the heap invariant when returning elements from a heap during k-way
> merging. Our existing approach is convoluted; merging was presumably
> only coded that way because the generic functions
> tuplesort_heap_siftup() and tuplesort_heap_insert() happened to be
> available. Perhaps the problem was masked by unrelated bottlenecks
> that existed at the time, too.

Yeah, this seems like a very obvious optimization. Is there a standard 
name for this technique in the literature? I'm OK with "displace", or 
perhaps just "replace" or "siftup+insert", but if there's a standard 
name for this, let's use that.

- Heikki

pgsql-hackers by date:

From: Abhijit Menon-Sen
Date: 06 September 2016, 07:07:04
Subject: Re: Proposal for changes to recovery.conf API

From: Michael Paquier
Date: 06 September 2016, 07:11:58
Subject: Re: pgsql: Add putenv support for msvcrt from Visual Studio 2013

Re: Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

Previous

Next