Home > mailing lists

Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Date	April 24, 2012 15:17:26
Msg-id	CA+TgmobKFcb6ajVEN8eSnBa78sB3FSwuEWTHXd2x9JC5DOh5OA@mail.gmail.com Whole thread Raw
In response to	Timsort performance, quicksort (was: Re: Memory usage during sorting) (Peter Geoghegan <peter@2ndquadrant.com>)
Responses	Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
List	pgsql-hackers

Tree view

On Wed, Apr 18, 2012 at 9:31 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> Thoughts?

Interesting work.  I thought about trying to code up timsort at one
point, but I've been running short of round tuits.

I did some quick tests of quicksort using half a million random
strings.  On my MacBook Pro, if the strings are in random order,
quicksort takes about 12 seconds.  If they are presorted, it takes
about 800 ms.  If they are in sorted order with an empty string
appended onto the end, it takes about 25 seconds.

If I modify gen_qsort_tuple.pl to perform the "presorted" input check
only when n > 40, the for the
presorted-except-the-last-element-is-small test drops from 25 seconds
to about 21.5 seconds, without apparently harming either of the other
two cases.  If I remove the check altogether, the time further drops
to about 13.5 seconds, the time to sort completely-ordered data rises
to about 10-10.5 seconds, and the time to sort randomly ordered data
still doesn't seem to change much.

Based on that, I'm inclined to propose rejiggering things so that the
presorted-input check runs only at the top level, and not during any
recursive steps.  The theory is that it won't cost much because if the
data is randomly ordered we won't do many comparisons before falling
out, and that seems to be true.  But the only point of doing it in the
recursive steps is to win when the data is partially ordered, and we
actually seem to be losing big-time in that case, perhaps because when
the data is partially ordered, the presort check will frequently to
run through a significant part of the array - wasting cycles - but
fall out before it actually gets to the end.

Of course, even if we did that, it might not be as good as your
timsort numbers, but that doesn't mean we shouldn't do it...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Tom Lane
Date: 24 April 2012, 14:11:54
Subject: Re: Usage of planner_ctx

From: Robert Haas
Date: 24 April 2012, 16:00:38
Subject: Re: New sync commit mode remote_write

Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) - Mailing list pgsql-hackers

Previous

Next