Re: Memory usage during sorting - Mailing list pgsql-hackers
| From | Tom Lane |
|---|---|
| Subject | Re: Memory usage during sorting |
| Date | |
| Msg-id | 8425.1332084300@sss.pgh.pa.us |
| In response to | Re: Memory usage during sorting (Jeff Janes <jeff.janes@gmail.com>) |
| Responses | Re: Memory usage during sorting |
| List | pgsql-hackers |
Jeff Janes <jeff.janes@gmail.com> writes:
> On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> Anyway, I think the logtape could use redoing.

> The problem there is that none of the files can be deleted until it
> was entirely read, so you end up with all the data on disk twice.  I
> don't know how often people run their databases so close to the edge
> on disk space that this matters, but someone felt that that extra
> storage was worth avoiding.

Yeah, that was me, and it came out of actual user complaints ten or more
years back.  (It's actually not 2X growth but more like 4X growth
according to the comments in logtape.c, though I no longer remember the
exact reasons why.)  We knew when we put in the logtape logic that we
were trading off speed for space, and we accepted that.  It's possible
that with the growth of hard drive sizes, real-world applications would
no longer care that much about whether the space required to sort is 4X
data size rather than 1X.  Or then again, maybe their data has grown
just as fast and they still care.

> As a desirable side effect, I think it would mean that we could
> dispense with retail palloc and pfree altogether.  We could just
> allocate a big chunk of memory, copy tuples into it until it's full,
> using a pointer to keep track of the next unused byte, and then,
> after writing the run, reset the allocation pointer back to the
> beginning of the buffer.  That would not only avoid the cost of going
> through palloc/pfree, but also the memory overhead imposed by
> bookkeeping and power-of-two rounding.

That would be worthwhile, probably.  The power-of-2 rounding in palloc
is not nice at all for tuplesort's purposes.  We've occasionally talked
about inventing a different memory context type that is less willing to
waste space ... but on the other hand, such an allocator would run
slower, so you're right back to making a speed for space tradeoff.  If
the tuples could be deallocated in bulk then that catch-22 goes away.

>> No, it's about reducing the number of comparisons needed to maintain
>> the heap property.

> That sounds very interesting.  I didn't know it was even theoretically
> possible to do that.

So has somebody found a hole in the n log n lower bound on the cost of
comparison-based sorting?  I thought that had been proven pretty
rigorously.

			regards, tom lane
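For reference, the n log n bound Tom alludes to is the standard decision-tree counting argument, textbook material rather than anything from this thread. A comparison sort on n distinct keys is a binary decision tree that must distinguish all n! input orderings, so it needs at least n! leaves, while a binary tree of height h has at most 2^h leaves:

```latex
% Decision-tree lower bound for comparison sorting (standard textbook
% argument, not from this thread); the last step is Stirling's
% approximation, using \log_2 e = 1/\ln 2.
\[
  2^h \ge n!
  \quad\Longrightarrow\quad
  h \ge \log_2 n! = n \log_2 n - n \log_2 e + O(\log n) = \Omega(n \log n)
\]
```

Presumably what Robert describes, reducing the comparisons needed to maintain the heap property, improves the per-operation constant factor rather than the asymptotic bound, so there is no contradiction.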
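As an illustration of the bulk-allocation scheme Jeff sketches above, here is a minimal bump-allocator sketch in C. It is not PostgreSQL code: `RunBuffer`, `runbuf_init`, `runbuf_alloc`, and `runbuf_reset` are hypothetical names, and real tuple storage would also need MAXALIGN-style alignment, which is omitted here.

```c
/*
 * Minimal sketch of the bulk "bump" allocation scheme described above.
 * Illustrative only, not PostgreSQL code: all names are hypothetical,
 * and alignment of tuple starts is deliberately ignored.
 */
#include <stdlib.h>
#include <string.h>

typedef struct RunBuffer
{
	char   *base;				/* start of the single big chunk */
	size_t	size;				/* total capacity in bytes */
	size_t	used;				/* offset of the next unused byte */
} RunBuffer;

/* Grab one big chunk up front (e.g., a work_mem-sized buffer). */
static int
runbuf_init(RunBuffer *buf, size_t size)
{
	buf->base = malloc(size);
	buf->size = size;
	buf->used = 0;
	return buf->base != NULL;
}

/*
 * Copy a tuple into the buffer and advance the pointer.  No per-chunk
 * header, no power-of-two rounding; returns NULL when the buffer is
 * full, which is the caller's signal to write out the current run.
 */
static void *
runbuf_alloc(RunBuffer *buf, const void *tuple, size_t len)
{
	void   *dst;

	if (buf->size - buf->used < len)
		return NULL;			/* buffer full: dump the run to disk */
	dst = buf->base + buf->used;
	memcpy(dst, tuple, len);
	buf->used += len;
	return dst;
}

/* "Free" every tuple at once by resetting the allocation pointer. */
static void
runbuf_reset(RunBuffer *buf)
{
	buf->used = 0;
}

int
main(void)
{
	RunBuffer	buf;
	const char	tuple[] = "example tuple bytes";

	if (!runbuf_init(&buf, 1024 * 1024))
		return 1;
	while (runbuf_alloc(&buf, tuple, sizeof(tuple)) != NULL)
		;						/* fill until full... */
	runbuf_reset(&buf);			/* ...then recycle the whole run at once */
	free(buf.base);
	return 0;
}
```

The catch-22 Tom mentions is visible in the structure: `runbuf_alloc` is cheap precisely because no tuple can be freed individually; the whole run has to die together in `runbuf_reset`.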