On 30 July 2015 at 12:26, Greg Stark <stark@mit.edu> wrote:
On Thu, Jul 30, 2015 at 12:09 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
True, you need a heap to hold the next tuple from each tape in the merge step. To avoid exceeding work_mem, you'd need to push some tuples from the in-memory array to the tape to make room for that. In practice, though, the memory needed for the merge step's heap is tiny. Even if you merge 1000 tapes, you only need memory for 1000 tuples in the heap. But yeah, you'll need some logic to share/divide the in-memory array between the heap and the "in-memory tail" of the last tape.
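(Purely for illustration, not taken from tuplesort.c: a minimal standalone C sketch of that merge step. The heap holds exactly one element per tape, so its memory footprint grows with the number of tapes rather than with work_mem; plain int arrays stand in for the on-disk runs.)

#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    int     value;      /* current element from this tape */
    int     tape;       /* which tape it came from */
} HeapEntry;

static void
sift_down(HeapEntry *heap, int n, int i)
{
    for (;;)
    {
        int smallest = i;
        int l = 2 * i + 1;
        int r = 2 * i + 2;

        if (l < n && heap[l].value < heap[smallest].value)
            smallest = l;
        if (r < n && heap[r].value < heap[smallest].value)
            smallest = r;
        if (smallest == i)
            break;
        HeapEntry tmp = heap[i];
        heap[i] = heap[smallest];
        heap[smallest] = tmp;
        i = smallest;
    }
}

int
main(void)
{
    /* three pre-sorted "tapes" standing in for on-disk runs */
    int tape0[] = {1, 4, 9};
    int tape1[] = {2, 3, 8};
    int tape2[] = {5, 6, 7};
    int *tapes[] = {tape0, tape1, tape2};
    int lens[] = {3, 3, 3};
    int pos[] = {0, 0, 0};
    int ntapes = 3;

    /* heap holds one entry per tape: memory is O(ntapes), not O(input) */
    HeapEntry *heap = malloc(ntapes * sizeof(HeapEntry));
    int heapsize = 0;

    for (int t = 0; t < ntapes; t++)
        heap[heapsize++] = (HeapEntry) {tapes[t][pos[t]++], t};
    for (int i = heapsize / 2 - 1; i >= 0; i--)
        sift_down(heap, heapsize, i);

    while (heapsize > 0)
    {
        HeapEntry top = heap[0];

        printf("%d ", top.value);

        /* refill from the tape the minimum came from, or shrink the heap */
        if (pos[top.tape] < lens[top.tape])
            heap[0] = (HeapEntry) {tapes[top.tape][pos[top.tape]++], top.tape};
        else
            heap[0] = heap[--heapsize];
        sift_down(heap, heapsize, 0);
    }
    putchar('\n');
    free(heap);
    return 0;
}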
It's a bit worse than that, because we buffer up a significant chunk of each tape to avoid seeking randomly between tapes after every tuple read. But I think in today's era of large memory we don't need anywhere near the entire work_mem just for buffering to avoid random access. Something simple like a fixed buffer size per tape would do, probably much less than 1 MB per tape.
MERGE_BUFFER_SIZE is currently 0.25 MB, but benefit has been seen above that. I'd say we should scale it up to 1 MB if memory allows.
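(A hypothetical sketch of that sizing, not the actual tuplesort.c logic: divide the memory reserved for the merge evenly across the tapes, then clamp the per-tape buffer between the current 0.25 MB and the 1 MB ceiling suggested here. The function and constant names below are made up for illustration.)

#include <stdio.h>

#define MIN_MERGE_BUFFER_SIZE   (256 * 1024)        /* 0.25 MB, current value */
#define MAX_MERGE_BUFFER_SIZE   (1024 * 1024)       /* 1 MB, proposed ceiling */

static long
per_tape_buffer_size(long merge_mem_bytes, int ntapes)
{
    long size = merge_mem_bytes / ntapes;

    if (size < MIN_MERGE_BUFFER_SIZE)
        size = MIN_MERGE_BUFFER_SIZE;
    if (size > MAX_MERGE_BUFFER_SIZE)
        size = MAX_MERGE_BUFFER_SIZE;
    return size;
}

int
main(void)
{
    /* e.g. 64 MB spread across 100 tapes gives ~0.64 MB each, within the clamp */
    long work_mem = 64L * 1024 * 1024;
    int  ntapes = 100;

    printf("buffer per tape: %ld bytes\n",
           per_tape_buffer_size(work_mem, ntapes));
    return 0;
}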
Yes, that could be tiny for a small number of runs. I mention it because Heikki's comment that we could use this in the general case would not hold for larger numbers of runs. The number of runs decreases quickly with more memory anyway, so the technique is mostly useful for larger memory allocations, but certainly not for small ones.
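(Back-of-the-envelope sketch of that point, assuming each initial run is roughly work_mem in size; replacement selection typically does about twice as well, but the trend is the same.)

#include <stdio.h>

int
main(void)
{
    long input_mb = 10240;              /* 10 GB of input to sort */
    long work_mem_mb[] = {4, 16, 64, 256, 1024};

    for (int i = 0; i < 5; i++)
    {
        /* runs to merge if each run is about work_mem in size */
        long runs = (input_mb + work_mem_mb[i] - 1) / work_mem_mb[i];

        printf("work_mem = %4ld MB -> ~%ld runs\n", work_mem_mb[i], runs);
    }
    return 0;
}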
--
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services