Re: Memory usage during sorting - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Memory usage during sorting
Date
Msg-id CAEYLb_WuP-POiDykip0Dcdy07k3GC5Dn4riK5BFY-JWg8Eq9RA@mail.gmail.com
Whole thread Raw
In response to Re: Memory usage during sorting  (Greg Stark <stark@mit.edu>)
Responses Re: Memory usage during sorting  (Peter Geoghegan <peter@2ndquadrant.com>)
List pgsql-hackers
On 13 April 2012 16:03, Greg Stark <stark@mit.edu> wrote:
> Fwiw it also only holds for comparison sorts. If you can sort your
> items without actually comparing each item with the others then you
> aren't bound by it at all. Notably algorithms like radix sort and
> direct sort aren't bound by it and are O(n). I'm hoping to look at
> trying some of these for integer sorts when they apply using the new
> sort specialization code Peter added.

Actually, though a later revision of the patch did nominally allow for
user-defined sorting functions through the SortSupport infrastructure
(I didn't intend that it would be complete/useful enough to be really
worth documenting), the version that was finally committed did not.
However, there is a fairly obvious entry point for a radix sort, which
is here:
        /*         * We were able to accumulate all the tuples within the allowed         * amount of memory.  Just
qsort'em and we're done.         */        if (state->memtupcount > 1)        {            /* Can we use the single-key
sortfunction? */            if (state->onlyKey != NULL)                qsort_ssup(state->memtuples, state->memtupcount,
                         state->onlyKey);            else                qsort_tuple(state->memtuples,
         state->memtupcount,                            state->comparetup,                            state);        } 

The only specialisation that occurs here is between tuple sorts of a
single key and multiple (type-specific specialisations were ultimately
not judged to be worth the increase in binary size relative to their
diminishing benefits).  You'd probably set-up a type-specific
positional notation machinery within the state's SortSupport struct
(the type-specific abstraction through which we elide the SQL function
call machinery for types that support it).

One insight that I had at the time was that text comparisons where the
c locale isn't used are really rather expensive, and I doubt that
there is too much that can be done to address that directly.  However,
if we were to support timsort, that could substantially cut down on
the number of comparisons for text columns, which could be a real win.
Maybe that would happen through some explicit mechanism, or maybe the
planner could actually infer that it was likely to be optimal to use
timsort based on a high statistical correlation between physical row
ordering and logical ordering of a key column's values.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Improving our clauseless-join heuristics
Next
From: Robert Haas
Date:
Subject: Re: [trivial patch] grammar fixes in doc/src/sgml/high-availability.sgml