Re: No merge sort? - Mailing list pgsql-hackers

From cbbrowne@cbbrowne.com
Subject Re: No merge sort?
Date
Msg-id 20030414165257.D226D56FE4@cbbrowne.com
Whole thread Raw
In response to Re: No merge sort?  ("Ron Peacetree" <rjpeace@earthlink.net>)
List pgsql-hackers
Ron Peacetree wrote:
> ""Dann Corbit"" <DCorbit@connx.com> wrote in message
> news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.co
> m...
> > > From: Ron Peacetree [mailto:rjpeace@earthlink.net]=20
> > > Sent: Wednesday, April 09, 2003 12:38 PM
> > > You only need as many bins as you have unique key values=20
> > > silly ;-) Remember, at its core Radix sort is just a=20
> > > distribution counting sort (that's the name I learned for the=20
> > > general technique).  The simplest implementation uses bits as=20
> > > the atomic unit, but there's nothing saying you have to...=20=20
> > > In a DB, we know all the values of the fields we currently=20
> > > have stored in the DB.  We can take advantage of that.
> >
> > By what magic do we know this?  If a database knew a-priori what all
> the
> > distinct values were, it would indeed be excellent magic.
> For any table already in the DB, this is self evident.  If you are
> talking about sorting data =before= it gets put into the DB (say for a
> load), then things are different, and the best you can do is used
> comparison based methods (and probably some reasonably sophisticated
> external merging routines in addition if the data set to be sorted and
> loaded is big enough).  The original question as I understood it was
> about a sort as part of a query.  That means everything to be sorted
> is in the DB, and we can take advantage of what we know.
> 
> > With 80 bytes, you have 2^320 possible values.  There is no way
> > around that.  If you are going to count them or use them as a
> > radix, you will have to classify them.  The only way you will
> > know how many unique values you have in
> > "Company+Division" is to ...
> > Either sort them or by some means discover all that are distinct

> Ummm, Indexes, particularly Primary Key Indexes, anyone?  Finding the
> unique values in an index should be perceived as trivial... ...and you
> often have the index in memory for other reasons already.

But this does not remove the fact that a radix sort on an 80 byte field
requires O(2^320) space, that is, O(N), where N is the size of the state
space of the key...

Perhaps you need to reread Gonnet; it tells you that...
--
output = reverse("moc.enworbbc@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/multiplexor.html
Rules of  the Evil  Overlord #28. "My  pet monster  will be kept  in a
secure cage  from which it  cannot escape and  into which I  could not
accidentally stumble." <http://www.eviloverlord.com/>



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [GENERAL] Problem about pgsql's column alias
Next
From: "John Gray"
Date:
Subject: Re: How do you execute a postgresql function from perl?