Re: Memory usage during sorting - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Memory usage during sorting
Date
Msg-id CA+TgmoZy-sYZSbHFdNAE6ZN44FGcaPb4mBuP1fJ-2R0xOghTkQ@mail.gmail.com
Whole thread Raw
In response to Re: Memory usage during sorting  (Peter Geoghegan <peter@2ndquadrant.com>)
Responses Re: Memory usage during sorting
List pgsql-hackers
On Fri, Apr 13, 2012 at 1:15 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 13 April 2012 17:42, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> One insight that I had at the time was that text comparisons where the
>> c locale isn't used are really rather expensive, and I doubt that
>> there is too much that can be done to address that directly.  However,
>> if we were to support timsort, that could substantially cut down on
>> the number of comparisons for text columns, which could be a real win.
>> Maybe that would happen through some explicit mechanism, or maybe the
>> planner could actually infer that it was likely to be optimal to use
>> timsort based on a high statistical correlation between physical row
>> ordering and logical ordering of a key column's values.
>
> Further thoughts:
>
> At the time we committed our own quicksort implementation, based on
> the NetBSD one, we eschewed the optimisation of using insertion sort
> when n is fairly low. This happens to be a very common optimisation,
> so I'm not really super-confident that that was a good decision.
> However, we also added our own optimisation, which is to attempt,
> regardless of the size of n, to ascertain that the array is
> pre-sorted, in which case we don't quicksort at all.

We do use insertion sort for partitions of size < 7.  I assume you are
referring to something else.

One thing that has struck me as odd is that we check for presorted
arrays at every level of recursion.  That is, if we start out with 100
elements, we'll check whether the input is presorted.  Then, if we
partition it into a block of 60 elements and a block of 40 elements,
we'll check each of those partitions over again to see whether they
are presorted.  There is definitely some potential upside here,
because if the input is mostly sorted it may contain runs where
everything is in order, and we'll detect that and avoid some work.
But it's also kind of expensive; I've wondered whether we should, for
example, move the test for presorted input down a bit, so that it only
happens in the n > 40 case.  Another idea is to arrange things so that
we remember the offset at which the last presort check failed.  When
we tail-recurse into the left half of the array, there's no need to
recheck - if the presort check fell out after our partition boundary,
it's presorted; if it fell out before the partition boundary, it
isn't.

> So if we attempt to quicksort an array which is almost pre-sorted, but
> say only has its very last element out-of-order, we'll do fairly
> horribly, because we waste the effort of almost an entire "bubble sort
> iteration". So almost-sorted data would seem to be a compelling thing
> to optimise for that reason, as well as for the simple observation
> that it isn't exactly uncommon in a relational database.

Heap-sorting could also benefit from some optimization in this area.
Right now, if you heap-sort data that's already in order, it gets
progressively slower as you crank up work_mem, because the heap gets
deeper and so extraction and siftup get more expensive, for no real
benefit.  We could do something like check, every time we use up our
available memory, whether the heap is already entirely in order.  If
so, we could dump all but one tuple to the current run and the start
reading tuples again.  Or maybe dump some percentage of the array,
though that might require a bit more bookkeeping.  Or maybe there's a
smarter algorithm that would also cater to mostly-sorted data.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: 9.2 release notes, beta time?
Next
From: Ozgun Erdogan
Date:
Subject: Command counter increment vs updating an active snapshot