Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour) - Mailing list pgsql-hackers

From Jonah H. Harris
Subject Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)
Date
Msg-id 36e682920603021050s4d8a6a02j3c0acf54f19e4881@mail.gmail.com
Whole thread Raw
In response to Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
My introsort is almost complete and its the fastest variant of quicksort I can find, I'll submit it to -patches in the next couple days as-well.

On 3/2/06, Bruce Momjian <pgman@candle.pha.pa.us> wrote:

Added to TODO:

        * Improve port/qsort() to handle sorts with 50% unique and 50% duplicate
          value [qsort]

          This involves choosing better pivot points for the quicksort.


---------------------------------------------------------------------------

Dann Corbit wrote:
>
>
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> > owner@postgresql.org] On Behalf Of Tom Lane
> > Sent: Wednesday, February 15, 2006 5:22 PM
> > To: Ron
> > Cc: pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
> Index
> > behaviour)
> >
> > Ron <rjpeace@earthlink.net> writes:
> > > How are we choosing our pivots?
> >
> > See qsort.c: it looks like median of nine equally spaced inputs (ie,
> > the 1/8th points of the initial input array, plus the end points),
> > implemented as two rounds of median-of-three choices.  With half of
> the
> > data inputs zero, it's not too improbable for two out of the three
> > samples to be zeroes in which case I think the med3 result will be
> zero
> > --- so choosing a pivot of zero is much more probable than one would
> > like, and doing so in many levels of recursion causes the problem.
>
> Adding some randomness to the selection of the pivot is a known
> technique to fix the oddball partitions problem.  However, Bentley and
> Sedgewick proved that every quick sort algorithm has some input set that
> makes it go quadratic (hence the recent popularity of introspective
> sort, which switches to heapsort if quadratic behavior is detected.  The
> C++ template I submitted was an example of introspective sort, but
> PostgreSQL does not use C++ so it was not helpful).
>
> > I think.  I'm not too sure if the code isn't just being sloppy about
> the
> > case where many data values are equal to the pivot --- there's a
> special
> > case there to switch to insertion sort, and maybe that's getting
> invoked
> > too soon.
>
> Here are some cases known to make qsort go quadratic:
> 1. Data already sorted
> 2. Data reverse sorted
> 3. Data organ-pipe sorted or ramp
> 4. Almost all data of the same value
>
> There are probably other cases.  Randomizing the pivot helps some, as
> does check for in-order or reverse order partitions.
>
> Imagine if 1/3 of the partitions fall into a category that causes
> quadratic behavior (have one of the above formats and have more than
> CUTOFF elements in them).
>
> It is doubtful that the switch to insertion sort is causing any sort of
> problems.  It is only going to be invoked on tiny sets, for which it has
> a fixed cost that is probably less that qsort() function calls on sets
> of the same size.
>
> >It'd be useful to get a line-level profile of the behavior of
> > this code in the slow cases...
>
> I guess that my in-order or presorted tests [which often arise when
> there are very few distinct values] may solve the bad partition
> problems.  Don't forget that the algorithm is called recursively.
>
> >                     regards, tom lane
> >
> > ---------------------------(end of
> broadcast)---------------------------
> > TIP 3: Have you checked our extensive FAQ?
> >
> >                http://www.postgresql.org/docs/faq
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

--
  Bruce Momjian   http://candle.pha.pa.us
  SRA OSS, Inc.   http://www.sraoss.com

  + If your life is a hard drive, Christ can be your backup. +

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend



--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

pgsql-hackers by date:

Previous
From: Martijn van Oosterhout
Date:
Subject: Re: [SQL] Interval subtracting
Next
From: "Zeugswetter Andreas DCP SD"
Date:
Subject: Re: Automatic free space map filling