Re: Memory usage during sorting - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Memory usage during sorting
Date
Msg-id CAEYLb_Uhwz0BRQR42GJvmcUtSxDW1WYOzD5fOzE-3b1jPrCGpQ@mail.gmail.com
Whole thread Raw
In response to Re: Memory usage during sorting  (Greg Stark <stark@mit.edu>)
Responses Re: Memory usage during sorting  (Peter Geoghegan <peter@2ndquadrant.com>)
List pgsql-hackers
On 14 April 2012 13:32, Greg Stark <stark@mit.edu> wrote:
> On Fri, Apr 13, 2012 at 7:01 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> Well, timsort is specifically designed to take advantage of pre-sorted
>> data. It does appear to have a lot of traction, as wikipedia points
>> out:
>
> I hadn't heard of it. But reading up on it it does seem like a good
> fit for us. It trades some additional storage -- an array of pointers
> into the sort array where in our case the pointers would be much
> smaller than a whole SortTuple -- for fewer comparisons -- which in
> our case are often much slower than a simple integer comparison.

I wouldn't go so far as to suggest getting rid of quicksort, of
course. Quicksort is generally faster than other average case O(n log
n) algorithms in practice, for various reasons, principal among them
that it takes advantage of memory locality so well. I don't think that
it's a coincidence that timsort is popular in Python and Java land.
Even the Python C implementation has to sort integers through all that
PyObject reference indirection, I think. I'd now speculate that an
appropriate use of this algorithm might be to simply use it with types
that don't have a SortSupport function, that are largely passed by
reference, and have expensive comparisons. FWIW, I started playing
with adding timsort to Postgres last night:

https://github.com/Peter2ndQuadrant/postgres/tree/timsort

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: column name of pg_stat_replication.backend_start
Next
From: Peter Geoghegan
Date:
Subject: Re: Patch: add timing of buffer I/O requests