Home > mailing lists

Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort" - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"
Date	August 4, 2015 11:40:45
Msg-id	CAM3SWZQ-ZfoYjCfJoA2w6CCR3xgAqY0of=-DXAfW9Lpgn5UA5g@mail.gmail.com Whole thread Raw
In response to	Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort" (Heikki Linnakangas <hlinnaka@iki.fi>)
List	pgsql-hackers

Tree view

On Tue, Aug 4, 2015 at 1:24 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Yeah, something like that. To paraphrase, if I'm now understanding it
> correctly, Peter's idea is:
>
> When all the tuples have been fed to tuplesort, and it's time to perform the
> sort, quicksort all the tuples currently in the heap, ignoring the run
> numbers, and turn the resulting array into another tape. That tape is
> special: it's actually stored completely in memory. It is merged with the
> "real" tapes when tuples are returned from the tuplesort, just like regular
> tapes in TSS_FINALMERGE.

Yeah. I imagine that we'll want to put memory prefetch hints for the
new case, since I've independently shown that that works well for the
in-memory case, which this can be very close to.

My next patch will also include quicksorting of runs after we give up
on heapification (after there is more than one run and it is
established that we cannot use my "quicksort with spillover"
optimization, so there are two or more "real" runs on tape). Once
there is clearly not going to be one huge run (which can happen due to
everything largely being in order, even when work_mem is small), and
once incrementally spilling does not end in time to do a "quicksort
with spillover", then the replacement selection thing isn't too
valuable. Especially with large memory sizes but memory bandwidth +
latency as a bottleneck, which is the norm these days.

This seems simpler than my earlier idea of reusing half the memtuples
array only, and resorting the entire array each time, to have
something that consistently approximates replacement selection but
with quicksorting + batching, which I discussed before.

I have this working, and it takes about a good chunk of the runtime
off a sort that merges 3 runs on one reasonable case tested where
work_mem was 300MB. It went from about 56.6 seconds with master to
35.8 seconds with this new approach when tested just now (this
approach saves no writing of tuples, so it's not as effective as the
original "quicksort with spillover" patch can be, but covers a
fundamentally different case). I just need to clean up the patch, and
see if I missed any further optimizations, but this feels like the way
forward multi-run wise. I think it's worth giving up on replacement
selection style runs after the first run is produced, because that's
where the benefits are, if anywhere.

-- 
Peter Geoghegan

pgsql-hackers by date:

From: Simon Riggs
Date: 04 August 2015, 11:39:28
Subject: Re: FSM versus GIN pending list bloat

From: Shigeru Hanada
Date: 04 August 2015, 11:50:56
Subject: Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort" - Mailing list pgsql-hackers

Previous

Next