Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort" - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"
Date
Msg-id CAM3SWZQ-ZfoYjCfJoA2w6CCR3xgAqY0of=-DXAfW9Lpgn5UA5g@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Tue, Aug 4, 2015 at 1:24 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Yeah, something like that. To paraphrase, if I'm now understanding it
> correctly, Peter's idea is:
>
> When all the tuples have been fed to tuplesort, and it's time to perform the
> sort, quicksort all the tuples currently in the heap, ignoring the run
> numbers, and turn the resulting array into another tape. That tape is
> special: it's actually stored completely in memory. It is merged with the
> "real" tapes when tuples are returned from the tuplesort, just like regular
> tapes in TSS_FINALMERGE.

Yeah. I imagine that we'll want to put memory prefetch hints for the
new case, since I've independently shown that that works well for the
in-memory case, which this can be very close to.

My next patch will also include quicksorting of runs after we give up
on heapification (after there is more than one run and it is
established that we cannot use my "quicksort with spillover"
optimization, so there are two or more "real" runs on tape). Once
there is clearly not going to be one huge run (which can happen due to
everything largely being in order, even when work_mem is small), and
once incrementally spilling does not end in time to do a "quicksort
with spillover", then the replacement selection thing isn't too
valuable. Especially with large memory sizes but memory bandwidth +
latency as a bottleneck, which is the norm these days.

This seems simpler than my earlier idea of reusing half the memtuples
array only, and resorting the entire array each time, to have
something that consistently approximates replacement selection but
with quicksorting + batching, which I discussed before.

I have this working, and it takes about a good chunk of the runtime
off a sort that merges 3 runs on one reasonable case tested where
work_mem was 300MB. It went from about 56.6 seconds with master to
35.8 seconds with this new approach when tested just now (this
approach saves no writing of tuples, so it's not as effective as the
original "quicksort with spillover" patch can be, but covers a
fundamentally different case). I just need to clean up the patch, and
see if I missed any further optimizations, but this feels like the way
forward multi-run wise. I think it's worth giving up on replacement
selection style runs after the first run is produced, because that's
where the benefits are, if anywhere.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: FSM versus GIN pending list bloat
Next
From: Shigeru Hanada
Date:
Subject: Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)