Re: Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Parallel tuplesort (for parallel B-Tree index creation)
Date
Msg-id CAM3SWZTtaxXn3P0hkDQh3S7WA_-9_3qfhqRhpZfcBCAQd9OhsA@mail.gmail.com
Whole thread Raw
In response to Re: Parallel tuplesort (for parallel B-Tree index creation)  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-hackers
On Fri, Sep 9, 2016 at 5:22 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> Since it is true that doing so would make it impossible to keep the
> asserts about tupindex in tuplesort_heap_root_displace, I guess it
> depends on how useful those asserts are (ie: how likely it is that
> those conditions could be violated, and how damaging it could be if
> they were). If it is decided the refactor is desirable, I'd suggest
> making the common siftup producedure static inline, to allow
> tuplesort_heap_root_displace to inline and specialize it, since it
> will be called with checkIndex=False and that simplifies the resulting
> code considerably.

Right. I want to keep it as a separate function for all these reasons.
I also think that I'll end up further optimizing what I've called
tuplesort_heap_root_displace in the future, to adopt to clustered
input. I'm thinking of something like Timsort's "galloping mode". What
I've come up with here still needs 2 comparisons and a swap per call
for presorted input. There is still a missed opportunity for clustered
or (inverse) correlated input -- we can make merging opportunistically
skip ahead to determine that the root tape's 100th tuple (say) would
still fit in the root position of the merge minheap. So, immediately
return 100 tuples from the root's tape without bothering to compare
them to anything. Do a binary search to find the best candidate
minheap root before the 100th tuple if a guess of 100 doesn't work
out. Adapt to trends. Stuff like that.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Use LEFT JOINs in some system views in case referenced row doesn
Next
From: Andrew Borodin
Date:
Subject: Re: GiST penalty functions [PoC]