Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Date
Msg-id CAEYLb_Xr3vj-x7baHGcgymmdarwpuKJeVEYiOi+KQ+vwcit0zA@mail.gmail.com
Whole thread Raw
In response to Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
List pgsql-hackers
On 24 April 2012 22:01, Robert Haas <robertmhaas@gmail.com> wrote:
>> That's exactly what I'd have expected, but was surprised to have not
>> found with my own test. Perhaps it was same kind of fluke (i.e. a
>> re-creatable one - I'm quite confident that my benchmark was not
>> methodologically flawed, at least in execution).
>
> Oh, I read your results as showing something quite similar.

Sorry, I think my quick reading of your e-mail somehow left me with
the impression that the differences were not so pronounced for your
experiment, but indeed they are. Blame it on the fact that I've been
off work for a week, I suppose.

>> That makes sense to me, but obviously more data is needed here.
>
> What more data do you think is needed?  I've been suspicious of that
> code since the first time I looked at it, and I'm now fairly well
> convinced that it's full of suckitude.  Honestly, I'm not sure I could
> manage to contrive a case where that code wins if I set out to do so.

Yeah, I thought that the rationale for introducing the pre-sorted
check, as it appeared in a commit message, was a little weak. I don't
know that I'd go as far as to say that it was full of suckitude. The
worst thing about that code to my mind is that despite being highly
performance critical, it has exactly no comments beyond a brief
description, and the names of variables are arcanely curt.

>>> Of course, even if we did that, it might not be as good as your
>>> timsort numbers, but that doesn't mean we shouldn't do it...
>>
>> The frustrating thing about my timsort numbers, as I mentioned in
>> reply to Dimitri (He modified the subject a bit, so that might appear
>> to be a different thread to you), is that they appear to be almost
>> consistently winning when you consider the number of comparisons, but
>> in fact lose when you measure the duration of execution or TPS or
>> whatever. I expected a certain amount of this, but not enough to
>> entirely derail the case for replacing quicksort with timsort when
>> sorting a single key of text, which is obviously the really compelling
>> case for optimisation here. This situation is only going to be made
>> "worse" by the work you've done on SortSupport for text, ...
>
> That is quite baffling.  Have you profiled it at all?

Yes, I have profiled it, but didn't get much further than actually
producing the figures. gprof output for a backend that had a few of
these queries executed against it is attached, FWIW. I'm not sure that
the information is really very actionable, and I'm aware that oprofile
is preferred here (besides the known deficiencies of gprof, oprofile's
annotated source feature could be particularly useful here), which is
something I might get around to. Just before I decided that this might
not be the best way to spend my time off, but after profiling, I got
as far as doing this:

https://github.com/Peter2ndQuadrant/postgres/commit/cbbbd7b1d4ef82773eb272ee35a483cbf6678756

Experimentally, this appeared to make quite a big difference (at least
for the limited cases that I tried), but not so much as to alter the
practical implications of the outcome. It's not as if I've gone to any
great lengths to try and optimise the code yet though. CPython puts
the value of MIN_MERGE at 64, whereas for Java it's 32. A comment
within the Java implementation for this value reads:

  * This is the minimum sized sequence that will be merged.  Shorter
  * sequences will be lengthened by calling binarySort.  If the entire
  * array is less than this length, no merges will be performed.
  *
  * This constant should be a power of two.  It was 64 in Tim Peter's C
  * implementation, but 32 was empirically determined to work better in
  * this implementation.  In the unlikely event that you set this constant
  * to be a number that's not a power of two, you'll need to change the
...

 * See listsort.txt for a discussion
 * of the minimum stack length required as a function of the length
 * of the array being sorted and the minimum merge sequence length.

I did not determine why, and I do not have any theories as to why the
even lower value of 8 (and consequently, a greater tendency towards
merging rather than performing insertion sorts) was a win for me.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Next
From: Robert Haas
Date:
Subject: Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)