Re: [HACKERS] A Better External Sort? - Mailing list pgsql-performance
| From | Jeffrey W. Baker |
|---|---|
| Subject | Re: [HACKERS] A Better External Sort? |
| Date | |
| Msg-id | 1127968040.8954.12.camel@noodles |
| In response to | Re: [HACKERS] A Better External Sort? (Ron Peacetree <rjpeace@earthlink.net>) |
| Responses | Re: [HACKERS] A Better External Sort? |
| List | pgsql-performance |
On Wed, 2005-09-28 at 12:03 -0400, Ron Peacetree wrote:
> >From: "Jeffrey W. Baker" <jwbaker@acm.org>
> >Sent: Sep 27, 2005 1:26 PM
> >To: Ron Peacetree <rjpeace@earthlink.net>
> >Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
> >
> >On Tue, 2005-09-27 at 13:15 -0400, Ron Peacetree wrote:
> >
> >>That Btree can be used to generate a physical reordering of the data
> >>in one pass, but that's the weakest use for it. The more powerful
> >>uses involve allowing the Btree to persist and using it for more
> >>efficient re-searches or combining it with other such Btrees (either as
> >>a step in task distribution across multiple CPUs or as a more efficient
> >>way to do things like joins by manipulating these Btrees rather than
> >>the actual records.)
> >
> >Maybe you could describe some concrete use cases. I can see what
> >you are getting at, and I can imagine some advantageous uses, but
> >I'd like to know what you are thinking.
> >
> >Specifically I'd like to see some cases where this would beat sequential
> >scan. I'm thinking that in your example of a terabyte table with a
> >column having only two values, all the queries I can think of would be
> >better served with a sequential scan.
>
> In my original example, a sequential scan of the 1TB of 2KB or 4KB
> records, => 250M or 500M records of data, being sorted on a binary
> value key will take ~1000x more time than reading in the ~1GB Btree
> I described that used a Key+RID (plus node pointers) representation
> of the data.

You are engaging in a lengthy and verbose exercise in mental masturbation, because you have not yet given a concrete example of a query where this stuff would come in handy. A common, general-purpose case would be best. We can all see that the method you describe might be a good way to sort a very large dataset with some known properties, which would be fine if you were trying to break the terasort benchmark. But that's not what we're doing here.
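For what it's worth, Ron's ~1000x figure above is simply the ratio of bytes touched, not a measured result. A back-of-envelope sketch using the sizes from the thread (and assuming I/O time scales linearly with bytes read, which ignores seek patterns and caching):

```python
# Sizes taken from the thread; linear-I/O assumption is mine.
TB = 2**40
GB = 2**30

table_bytes = 1 * TB   # full 1TB heap in Ron's example
btree_bytes = 1 * GB   # his ~1GB Key+RID (plus node pointers) Btree

ratio = table_bytes / btree_bytes
print(ratio)  # 1024.0 -- roughly the "~1000x" claimed
```

The claim stands or falls on whether the query can be answered from the Key+RID structure alone; the moment you must fetch the heap tuples, the comparison changes entirely.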
We are designing and operating relational databases. So please explain the application.

Your main example seems to focus on a large table where a key column has constrained values. This case is interesting in proportion to the number of possible values. If I have billions of rows, each having one of only two values, I can think of a trivial and very fast method of returning the table "sorted" by that key: make two sequential passes, returning the first value on the first pass and the second value on the second pass. This will be faster than the method you propose.

I think an important aspect you have failed to address is how much of the heap you must visit after the sort is complete. If you are returning every tuple in the heap, the optimal plan will be very different from the case when you needn't.

-jwb

PS: Whatever mailer you use doesn't understand or respect threading or attribution. Out of respect for the list's readers, please try a mailer that supports these 30-year-old fundamentals of electronic mail.
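The two-pass trick Jeff describes above can be sketched in a few lines. This is an illustrative toy (the function name and in-memory "heap" are mine, standing in for sequential scans of an on-disk table):

```python
def two_pass_sorted(rows, key, values):
    """Return rows 'sorted' by a k-valued key using one full
    sequential pass per distinct value -- no sort, no index."""
    out = []
    for v in values:          # one pass per distinct key value
        for row in rows:      # each pass is a pure sequential scan
            if row[key] == v:
                out.append(row)
    return out

heap = [{"k": "b"}, {"k": "a"}, {"k": "b"}, {"k": "a"}]
print(two_pass_sorted(heap, "k", ["a", "b"]))
# [{'k': 'a'}, {'k': 'a'}, {'k': 'b'}, {'k': 'b'}]
```

With two distinct values this reads the table twice at full sequential bandwidth, which on rotating disks of that era easily beats any plan that does random heap fetches in index order.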