Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Date
Msg-id CAEYLb_UaXsPcBRq4N=h+Ddq03UX438nPxvOr=6CAhmwQuVJWZQ@mail.gmail.com
Whole thread Raw
In response to Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 24 April 2012 16:17, Robert Haas <robertmhaas@gmail.com> wrote:
> If they are in sorted order with an empty string
> appended onto the end, it takes about 25 seconds.

That's exactly what I'd have expected, but was surprised to have not
found with my own test. Perhaps it was same kind of fluke (i.e. a
re-creatable one - I'm quite confident that my benchmark was not
methodologically flawed, at least in execution).

> Based on that, I'm inclined to propose rejiggering things so that the
> presorted-input check runs only at the top level, and not during any
> recursive steps.  The theory is that it won't cost much because if the
> data is randomly ordered we won't do many comparisons before falling
> out, and that seems to be true.  But the only point of doing it in the
> recursive steps is to win when the data is partially ordered, and we
> actually seem to be losing big-time in that case, perhaps because when
> the data is partially ordered, the presort check will frequently to
> run through a significant part of the array - wasting cycles - but
> fall out before it actually gets to the end.

That makes sense to me, but obviously more data is needed here.

> Of course, even if we did that, it might not be as good as your
> timsort numbers, but that doesn't mean we shouldn't do it...

The frustrating thing about my timsort numbers, as I mentioned in
reply to Dimitri (He modified the subject a bit, so that might appear
to be a different thread to you), is that they appear to be almost
consistently winning when you consider the number of comparisons, but
in fact lose when you measure the duration of execution or TPS or
whatever. I expected a certain amount of this, but not enough to
entirely derail the case for replacing quicksort with timsort when
sorting a single key of text, which is obviously the really compelling
case for optimisation here. This situation is only going to be made
"worse" by the work you've done on SortSupport for text, which,
incidentally, I agree is worthwhile.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Next
From: Josh Berkus
Date:
Subject: Welcome 2012 GSOC students