Re: [PERFORM] A Better External Sort? - Mailing list pgsql-hackers
From | Jonah H. Harris |
---|---|
Subject | Re: [PERFORM] A Better External Sort? |
Date | |
Msg-id | 36e6829205092705492b0c9fdd@mail.gmail.com Whole thread Raw |
In response to | Re: [PERFORM] A Better External Sort? (Ron Peacetree <rjpeace@earthlink.net>) |
List | pgsql-hackers |
Ron,
Again, if you feel strongly enough about the theory to argue it, I recommend that you spend your time constructively; create an implemenation of it. Citing academics is cool and all, but code speaks louder than theory in this case. As Tom mentioned, this has to be portable. Making assumptions about computing architectures (especially those in the future), is fine for theory, but not practical for something that needs to be maintained in the real-world. Go forth and write thy code.
-Jonah
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
Again, if you feel strongly enough about the theory to argue it, I recommend that you spend your time constructively; create an implemenation of it. Citing academics is cool and all, but code speaks louder than theory in this case. As Tom mentioned, this has to be portable. Making assumptions about computing architectures (especially those in the future), is fine for theory, but not practical for something that needs to be maintained in the real-world. Go forth and write thy code.
-Jonah
On 9/27/05, Ron Peacetree <rjpeace@earthlink.net> wrote:
SECOND ATTEMPT AT POST. Web mailer appears to have
eaten first one. I apologize in advance if anyone gets two
versions of this post.
=r
>From: Tom Lane <tgl@sss.pgh.pa.us >
>Sent: Sep 26, 2005 9:42 PM
>Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
>
>So far, you've blithely assumed that you know the size of a cache line,
>the sizes of L1 and L2 cache,
>
NO. I used exact values only as examples. Realistic examples drawn
from an extensive survey of past, present, and what I could find out
about future systems; but only examples nonetheless. For instance,
Hennessy and Patterson 3ed points out that 64B cache lines are
optimally performing for caches between 16KB and 256KB. The same
source as well as sources specifically on CPU memory hierarchy
design points out that we are not likely to see L1 caches larger than
256KB in the forseeable future.
The important point was the idea of an efficient Key, rather than
Record, sort using a CPU cache friendly data structure with provably
good space and IO characteristics based on a reasonable model of
current and likely future single box computer architecture (although
it would be fairly easy to extend it to include the effects of
networking.)
No apriori exact or known values are required for the method to work.
>and that you are working with sort keys that you can efficiently pack
>into cache lines.
>
Not "pack". "map". n items can not take on more than n values. n
values can be represented in lgn bits. Less efficient mappings can
also work. Either way I demonstrated that we have plenty of space in
a likely and common cache line size. Creating a mapping function
to represent m values in lgm bits is a well known hack, and if we keep
track of minimum and maximum values for fields during insert and
delete operations, we can even create mapping functions fairly easily.
(IIRC, Oracle does keep track of minimum and maximum field
values.)
>And that you know the relative access speeds of the caches and
>memory so that you can schedule transfers,
>
Again, no. I created a reasonable model of a computer system that
holds remarkably well over a _very_ wide range of examples. I
don't need the numbers to be exactly right to justify my approach
to this problem or understand why other approaches may have
downsides. I just have to get the relative performance of the
system components and the relative performance gap between them
reasonably correct. The stated model does that very well.
Please don't take my word for it. Go grab some random box:
laptop, desktop, unix server, etc and try it for yourself. Part of the
reason I published the model was so that others could examine it.
>and that the hardware lets you get at that transfer timing.
>
Never said anything about this, and in fact I do not need any such.
>And that the number of distinct key values isn't very large.
>
Quite the opposite in fact. I went out of my way to show that the
method still works well even if every Key is distinct. It is _more
efficient_ when the number of distinct keys is small compared to
the number of data items, but it works as well as any other Btree
would when all n of the Keys are distinct. This is just a CPU cache
and more IO friendly Btree, not some magical and unheard of
technique. It's just as general purpose as Btrees usually are.
I'm simply looking at the current and likely future state of computer
systems architecture and coming up with a slight twist on how to use
already well known and characterized techniques. not trying to start
a revolution.
I'm trying very hard NOT to waste anyone's time around here.
Including my own
Ron
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
pgsql-hackers by date: