Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort" - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"
Date
Msg-id CAM-w4HOSZ_a1jN0SciHs2KDJqMNFyiqb+mMVZrRyvojXG2qeLQ@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"
Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"
List pgsql-hackers

On Thu, Jul 30, 2015 at 12:09 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

True, you need a heap to hold the next tuple from each tape in the merge step. To avoid exceeding work_mem, you'd need to push some tuples from the in-memory array to the tape to make room for that. In practice, though, the memory needed for the merge step's heap is tiny. Even if you merge 1000 tapes, you only need memory for 1000 tuples in the heap. But yeah, you'll need some logic to share/divide the in-memory array between the heap and the "in-memory tail" of the last tape.

It's a bit worse than that because we buffer up a significant chunk of the tape to avoid randomly seeking between tapes after every tuple read. But I think in today's era of large memory we don't need anywhere near the entire work_mem just to buffer to avoid random access. Something simple like a fixed buffer size per tape probably much less than 1MB/tape.

I'm a bit confused where the big win comes from though. Is what's going on that the external sort only exceeded memory by a small amount  so nearly all the tuples are still in memory?


--
greg

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"
Next
From: Simon Riggs
Date:
Subject: Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"