Re: [HACKERS] A Better External Sort? - Mailing list pgsql-performance

From Jeffrey W. Baker
Subject Re: [HACKERS] A Better External Sort?
Date
Msg-id 1127968040.8954.12.camel@noodles
Whole thread Raw
In response to Re: [HACKERS] A Better External Sort?  (Ron Peacetree <rjpeace@earthlink.net>)
Responses Re: [HACKERS] A Better External Sort?  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
On Wed, 2005-09-28 at 12:03 -0400, Ron Peacetree wrote:
> >From: "Jeffrey W. Baker" <jwbaker@acm.org>
> >Sent: Sep 27, 2005 1:26 PM
> >To: Ron Peacetree <rjpeace@earthlink.net>
> >Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
> >
> >On Tue, 2005-09-27 at 13:15 -0400, Ron Peacetree wrote:
> >
> >>That Btree can be used to generate a physical reordering of the data
> >>in one pass, but that's the weakest use for it.  The more powerful
> >>uses involve allowing the Btree to persist and using it for more
> >>efficient re-searches or combining it with other such Btrees (either as
> >>a step in task distribution across multiple CPUs or as a more efficient
> >>way to do things like joins by manipulating these Btrees rather than
> >>the actual records.)
> >
> >Maybe you could describe some concrete use cases.  I can see what
> >you are getting at, and I can imagine some advantageous uses, but
> >I'd like to know what you are thinking.
> >
> >Specifically I'd like to see some cases where this would beat sequential
> >scan.  I'm thinking that in your example of a terabyte table with a
> >column having only two values, all the queries I can think of would be
> >better served with a sequential scan.
> >
> In my original example, a sequential scan of the 1TB of 2KB or 4KB
> records, => 250M or 500M records of data, being sorted on a binary
> value key will take ~1000x more time than reading in the ~1GB Btree
> I described that used a Key+RID (plus node pointers) representation
> of the data.

You are engaging in a length and verbose exercise in mental
masturbation, because you have not yet given a concrete example of a
query where this stuff would come in handy.  A common, general-purpose
case would be the best.

We can all see that the method you describe might be a good way to sort
a very large dataset with some known properties, which would be fine if
you are trying to break the terasort benchmark.  But that's not what
we're doing here.  We are designing and operating relational databases.
So please explain the application.

Your main example seems to focus on a large table where a key column has
constrained values.  This case is interesting in proportion to the
number of possible values.  If I have billions of rows, each having one
of only two values, I can think of a trivial and very fast method of
returning the table "sorted" by that key: make two sequential passes,
returning the first value on the first pass and the second value on the
second pass.  This will be faster than the method you propose.

I think an important aspect you have failed to address is how much of
the heap you must visit after the sort is complete.  If you are
returning every tuple in the heap then the optimal plan will be very
different from the case when you needn't.

-jwb

PS: Whatever mailer you use doesn't understand or respect threading nor
attribution.  Out of respect for the list's readers, please try a mailer
that supports these 30-year-old fundamentals of electronic mail.


pgsql-performance by date:

Previous
From: Gavin Sherry
Date:
Subject: Re: Comparative performance
Next
From: "Jeffrey W. Baker"
Date:
Subject: Sequential I/O Cost (was Re: A Better External Sort?)