Re: Memory usage during sorting - Mailing list pgsql-hackers
From: Jeff Janes
Subject: Re: Memory usage during sorting
Date:
Msg-id: CAMkU=1y_4UOtJ4GkTec=ZEZZqu23R4VCCwVPJW0=s26-WJ=XCw@mail.gmail.com
In response to: Re: Memory usage during sorting (Greg Stark <stark@mit.edu>)
List: pgsql-hackers
On Sat, Mar 17, 2012 at 10:47 AM, Greg Stark <stark@mit.edu> wrote:
> On Wed, Mar 7, 2012 at 7:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> But it would mean we have about 1.7x more runs that need to be merged
>>> (for initially random data). Considering the minimum merge order is
>>> 6, that increase in runs is likely not to lead to an additional level
>>> of merging, in which case the extra speed of building the runs would
>>> definitely win. But if it does cause an additional level of merge, it
>>> could end up being a loss.
>>
>> That's true, but the real hit to the run size should be quite a bit
>> less than 1.7x, because we'd also be using memory more efficiently,
>> and therefore more tuples should fit.
>
> I'm not sure I believe the 1.7x. Keep in mind that even after
> starting a new tape we can continue to read in new tuples that belong
> on the first tape. So even if you have tuples that are more than N
> positions out of place (where N is the size of your heap) as long as
> you don't have very many you can keep writing out the first tape for
> quite a while.
>
> I suspect analyzing truly random inputs is also a bit like saying no
> compression algorithm can work on average.

I'm not sure what you mean here. The "also" suggests the previous
discussion was about something other than the assumption of randomness,
but that discussion doesn't make much sense unless both paragraphs are
about that assumption.

Anyway, I think keeping best and worst cases in mind is certainly a good
thing to do. I've certainly seen train wrecks unfold when people assumed
anything could be compressed without verifying it.

> Partly sorted data is quite
> common and the tapesort algorithm would be able to do a lot of cases
> in a single merge that the quicksort and merge would generate a large
> number of merges for.

This is where some common and credible corpus would be invaluable.

> All that said, quicksort and merge would always do no more i/o in
> cases where the total number of tuples to be sorted is significantly
> less than N^2 since that would be guaranteed to be possible to process
> with a single merge pass. (Where "significantly less" has something to
> do with how many tuples you have to read in one i/o to be efficient).

Unless the tuples are quite large, the number of tuples needed for
efficient i/o is quite high.

> That probably covers a lot of cases, and Postgres already has the
> stats to predict when it would happen, more or less.

Currently the sort code makes no attempt to use the planner stats. Would
that be a good thing to work on?

> Fwiw when I was doing the top-k sorting I did a bunch of experiments
> and came up with a rule-of-thumb that our quicksort was about twice as
> fast as our heapsort. I'm not sure whether that's a big deal or not in
> this case.

I've been seeing around 3-fold, and that might go up more with some of
the work being done that speeds up quicksort but not the tape sort.

> Keep in mind that the only win would be reducing the cpu
> time on a sort where every tuple was being written to disk and read
> back. For most users that won't run noticeably faster, just reduce cpu
> time consumption.

But in many (perhaps most) tape sorts, CPU time is most of it. A lot of
tape sorts are entirely memory-backed: the tuples either never reach
disk, or are partially written but never read, instead being served back
from the filesystem's cache.
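Going back to the run-formation question above, here is a toy
replacement-selection loop, just to make concrete the point about the
heap continuing to feed the current run after next-run tuples start
arriving. Plain ints stand in for tuples, a 4-element array stands in
for work_mem, and none of the names come from tuplesort.c:

/*
 * Toy replacement selection: heap order is (run number, key), so tuples
 * assigned to the next run sink below the current run's tuples.
 */
#include <stdio.h>

#define HEAPSIZE 4

typedef struct { int key; int run; } HeapItem;

static HeapItem heap[HEAPSIZE];
static int heapcount = 0;

static int item_lt(HeapItem a, HeapItem b)
{
    if (a.run != b.run)
        return a.run < b.run;
    return a.key < b.key;
}

static void sift_down(int i)
{
    for (;;)
    {
        int l = 2 * i + 1, r = l + 1, s = i;

        if (l < heapcount && item_lt(heap[l], heap[s]))
            s = l;
        if (r < heapcount && item_lt(heap[r], heap[s]))
            s = r;
        if (s == i)
            break;
        HeapItem tmp = heap[i];
        heap[i] = heap[s];
        heap[s] = tmp;
        i = s;
    }
}

int main(void)
{
    int input[] = {5, 1, 9, 2, 8, 3, 7, 4, 6, 0};
    int n = sizeof(input) / sizeof(input[0]);
    int next = 0;
    int currun = -1;

    /* load the first HEAPSIZE tuples and heapify */
    while (heapcount < HEAPSIZE && next < n)
        heap[heapcount++] = (HeapItem){ input[next++], 0 };
    for (int i = heapcount / 2 - 1; i >= 0; i--)
        sift_down(i);

    while (heapcount > 0)
    {
        HeapItem top = heap[0];

        if (top.run > currun)
        {
            currun = top.run;
            printf("--- run %d ---\n", currun);
        }
        printf("emit %d\n", top.key);

        if (next < n)
        {
            int k = input[next++];

            /*
             * A tuple smaller than the one just written cannot extend the
             * current run, so it is tagged for the next run; anything >=
             * the emitted key keeps feeding the current run.
             */
            heap[0] = (HeapItem){ k, (k >= top.key) ? top.run : top.run + 1 };
        }
        else
            heap[0] = heap[--heapcount];
        sift_down(0);
    }
    return 0;
}

With that input the first run comes out at 7 tuples from a 4-tuple heap;
on random input replacement selection averages runs of roughly twice the
heap size, which is the behavior behind the run-count figures being
debated upthread.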
By changing to a tape sort, you are buying insurance against a bunch of
other sorts also running simultaneously, but often the insurance doesn't
pay off because those numerous other sorts don't exist and the kernel
manages to keep the few that do exist in RAM anyway.

One avenue for major surgery on the sorts would be an admission step
where jobs can negotiate over memory. Something like allocation by
halves, but with the possibility that a job could decide it would rather
wait for another sort to finish and have its memory freed up than make
do with what is currently available. A toy sketch of what I mean is
below.
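This is purely illustrative (the shared pool, its size, and the function
names are all made up; nothing like this exists in the current code):

#include <stdio.h>

static long pool_kb = 1024 * 1024;   /* hypothetical 1 GB shared sort pool */

/* allocation by halves: each arriving sort takes half of what is left */
static long grant_sort_memory(void)
{
    long grant = pool_kb / 2;

    pool_kb -= grant;
    return grant;
}

static void release_sort_memory(long grant)
{
    pool_kb += grant;
}

int main(void)
{
    long a = grant_sort_memory();    /* first sort gets 512 MB */
    long b = grant_sort_memory();    /* second concurrent sort gets 256 MB */

    printf("sort A: %ld kB, sort B: %ld kB, pool left: %ld kB\n",
           a, b, pool_kb);

    /*
     * The negotiation part would be deciding, at this point, whether a
     * third sort should take 128 MB now or wait for A or B to release.
     */
    release_sort_memory(a);
    printf("after A finishes, pool is back to %ld kB\n", pool_kb);
    return 0;
}

Cheers,

Jeff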