Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Date
Msg-id CA+TgmoY0j_js_qmkqJS+5eFwAQykVrQq9D5kOYp_u6-Ok1+UiQ@mail.gmail.com
Whole thread Raw
In response to Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On Tue, Apr 24, 2012 at 12:30 PM, Greg Stark <stark@mit.edu> wrote:
> On Tue, Apr 24, 2012 at 4:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Based on that, I'm inclined to propose rejiggering things so that the
>> presorted-input check runs only at the top level, and not during any
>> recursive steps.
>
> Just a thought. What about running only every nth step. Maybe
> something like every 4th step.

If there's actually some advantage to that, sure.  But so far, it
looks like less is more.

> But actually I'm confused. This seems to be misguided to me. Quicksort
> isn't stable so even if you have a partially sorted data set the
> recursive partitions are going to be at best partially sorted after a
> pivot.

Exactly.  That's why I think we should do it only at the topmost
level, before we've done any pivoting.  Doing it at any lower level is
apparently a waste of energy and counterproductive.

> I haven't walked through it but suspect even your
> all-but-one-sorted data set is not finding
> the data sorted in either partition on the next iteration.

I suspect that, too.  Actually, I'm a bit confused about why it's
picking such terrible pivots.  Our median-of-medians optimization
should be doing better than this, I would think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: New sync commit mode remote_write
Next
From: Robert Haas
Date:
Subject: Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)