Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Date
Msg-id CA+Tgmoa8_u=HQjpWvfrBdqDRCidyZsF05RC24ogtvPxWHuhCWQ@mail.gmail.com
Whole thread Raw
In response to Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)  (Peter Geoghegan <peter@2ndquadrant.com>)
Responses Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)  (Peter Geoghegan <peter@2ndquadrant.com>)
List pgsql-hackers
On Tue, Apr 24, 2012 at 12:51 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 24 April 2012 16:17, Robert Haas <robertmhaas@gmail.com> wrote:
>> If they are in sorted order with an empty string
>> appended onto the end, it takes about 25 seconds.
>
> That's exactly what I'd have expected, but was surprised to have not
> found with my own test. Perhaps it was same kind of fluke (i.e. a
> re-creatable one - I'm quite confident that my benchmark was not
> methodologically flawed, at least in execution).

Oh, I read your results as showing something quite similar.

>> Based on that, I'm inclined to propose rejiggering things so that the
>> presorted-input check runs only at the top level, and not during any
>> recursive steps.  The theory is that it won't cost much because if the
>> data is randomly ordered we won't do many comparisons before falling
>> out, and that seems to be true.  But the only point of doing it in the
>> recursive steps is to win when the data is partially ordered, and we
>> actually seem to be losing big-time in that case, perhaps because when
>> the data is partially ordered, the presort check will frequently to
>> run through a significant part of the array - wasting cycles - but
>> fall out before it actually gets to the end.
>
> That makes sense to me, but obviously more data is needed here.

What more data do you think is needed?  I've been suspicious of that
code since the first time I looked at it, and I'm now fairly well
convinced that it's full of suckitude.  Honestly, I'm not sure I could
manage to contrive a case where that code wins if I set out to do so.

>> Of course, even if we did that, it might not be as good as your
>> timsort numbers, but that doesn't mean we shouldn't do it...
>
> The frustrating thing about my timsort numbers, as I mentioned in
> reply to Dimitri (He modified the subject a bit, so that might appear
> to be a different thread to you), is that they appear to be almost
> consistently winning when you consider the number of comparisons, but
> in fact lose when you measure the duration of execution or TPS or
> whatever. I expected a certain amount of this, but not enough to
> entirely derail the case for replacing quicksort with timsort when
> sorting a single key of text, which is obviously the really compelling
> case for optimisation here. This situation is only going to be made
> "worse" by the work you've done on SortSupport for text, ...

That is quite baffling.  Have you profiled it at all?

> ...which,
> incidentally, I agree is worthwhile.

Cool, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Next
From: Peter Geoghegan
Date:
Subject: Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)