Re: finding medians - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: finding medians
Date
Msg-id D90A5A6C612A39408103E6ECDD77B82906F460@voyager.corporate.connx.com
Whole thread Raw
In response to finding medians  (Josh Burdick <jburdick@gradient.cis.upenn.edu>)
List pgsql-hackers
ACK!  Sorting to find a median is criminal.

"Introduction to Algorithms" by Thomas H. Cormen, Charles E. Leiserson,
Ronald L. Rivest
ISBN: 0262031418
explains the better algorithm very well.

Here is a freely available C++ template (written by me) for a bunch of
statistics (everything *but* the selection problem):
ftp://cap.connx.com/pub/tournament_software/STATS.HPP

It uses this template for improved summation accuracy:
ftp://cap.connx.com/pub/tournament_software/Kahan.Hpp

Here is an outline for selection.  I wrote it in C++, but a rewrite to C
is trivial:

// Quickselect: find Kth smallest of first N items in array A
// recursive routine finds Kth smallest in A[Low..High]
// Etype: must have copy constructor, oeprator=, and operator<
// Nonrecursive driver is omitted.

template < class Etype >
void
QuickSelect (Etype A[], int Low, int High, int k)
{   if (Low + Cutoff > High)       InsertionSort (&A[Low], High - Low + 1);   else   {
// Sort Low, Middle, High       int Middle = (Low + High) / 2;
       if (A[Middle] < A[Low])           Swap (A[Low], A[Middle]);       if (A[High] < A[Low])           Swap (A[Low],
A[High]);      if (A[High] < A[Middle])           Swap (A[Middle], A[High]); 

// Place pivot at Position High-1       Etype Pivot = A[Middle];       Swap (A[Middle], A[High - 1]);

// Begin partitioning       int i, j;       for (i = Low, j = High - 1;;)       {           while (A[++i] < Pivot);
     while (Pivot < A[--j]);           if (i < j)               Swap (A[i], A[j]);           else               break;
    } 

// Restore pivot       Swap (A[i], A[High - 1]);

// Recurse: only this part changes       if (k < i)           QuickSelect (A, Low, i - 1, k);       else if (k > i)
     QuickSelect (A, i + 1, High, k);   } 
}


template < class Etype >
void
QuickSelect (Etype A[], int N, int k)
{   QuickSelect (A, 0, N - 1, k - 1);
}

If you want to use this stuff to improve statistics with vacuum, be my
guest.


pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: finding medians
Next
From: "Dann Corbit"
Date:
Subject: Re: finding medians