Re: Sorting Improvements for 8.4 - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: Sorting Improvements for 8.4
Date
Msg-id D425483C2C5C9F49B5B7A41F8944154701000B4A@postal.corporate.connx.com
Whole thread Raw
In response to Re: Sorting Improvements for 8.4  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: Sorting Improvements for 8.4  (Greg Smith <gsmith@gregsmith.com>)
Re: Sorting Improvements for 8.4  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Gregory Stark
> Sent: Wednesday, December 19, 2007 5:26 PM
> To: Jeff Davis
> Cc: Mark Mielke; Michał Zaborowski; Simon Riggs; Ron Mayer; pgsql-
> hackers@postgresql.org
> Subject: Re: [HACKERS] Sorting Improvements for 8.4
>
>
> "Jeff Davis" <pgsql@j-davis.com> writes:
>
> > test=> explain analyze select * from sorter order by t;
> > test=> explain analyze select * from sorter order by b;
> > test=> explain analyze select * from sorter order by f;
> >
> > On my machine this table fits easily in memory (so there aren't any disk
> > reads at all). Sorting takes 7 seconds for floats, 9 seconds for binary
> > data, and 20 seconds for localized text. That's much longer than it
> > would take to read that data from disk, since it's only 70MB (which
> > takes a fraction of a second on my machine).
> >
> > I think this disproves your hypothesis that sorting happens at disk
> > speed.
>
> I suspect most of that is spent just copying the data around. Which would
> not
> be helped by having multiple threads doing the copying -- and in fact
> might be
> exacerbated if it required an extra copy to consolidate all the data in
> the
> end.

Benchmarking a single system will really only explain that system.
Someone may have a disk farm with 2GB/Sec throughput:
http://www.sql-server-performance.com/articles/per/system_storage_configuration_p1.aspx

But such a configuration is very unlikely.

Someone may have 10GB/S NIC cards, but those too, are rare.

So for any benchmark, we will really just end up with a number for that system.

Typically, disk is the bottleneck.
I found this on the net somewhere, but it's quite a useful table for capacity planning (to find the weak link in the
chainusing back of the envelope calculations): 

Interface    Width    Frequency    Bytes/Sec    Bits/Sec
4-way interleaved PC1600 (DDR200) SDRAM    4 x 64bits    100 MHz DDR    6.4 GB/s    51 Gbps
Opteron HyperTransport memory bus    128bits    200 MHz DDR    6.4 GB/s    51 Gbps
Pentium 4 "800 MHz" FSB    64bits    200 MHz QDR    6.4 GB/s    51 Gbps
PC2 6400 (DDR-II 800) SDRAM    64bits    400 MHz DDR    6.4 GB/s    51 Gbps
PC2 5300 (DDR-II 667) SDRAM    64bits    333 MHz DDR    5.3 GB/s    43 Gbps
Pentium 4 "533 MHz" FSB    64bits    133 MHz QDR    4.3 GB/s    34 Gbps
PC2 4300 (DDR-II 533) SDRAM    64bits    266 MHz DDR    4.3 GB/s    34 Gbps
2-channel PC1066 RDRAM    2 x 16bits    533 MHz DDR    4.3 GB/s    34 Gbps
PCI-X 533    64bits    533 MHz    4.3 GB/s    34 Gbps
PCI-Express x16    serial/16lanes    2.5 GHz    4 GB/s    32 Gbps
Pentium 4 "400 MHz" FSB    64bits    100 MHz QDR    3.2 GB/s    25.6 Gbps
2-channel PC800 RDRAM    2 x 16bits    400 MHz DDR    3.2 GB/s    25.6 Gbps
2-way interleaved PC1600 (DDR200) SDRAM    2 x 64bits    100 MHz DDR    3.2 GB/s    25.6 Gbps
PC2 3200 (DDR-II 400) SDRAM    64bits    200 MHz DDR    3.2 GB/s    25.6 Gbps
PC3200 (DDR400) SDRAM    64bits    200 MHz DDR    3.2 GB/s    25.6 Gbps
PC2700 (DDR333) SDRAM    64bits    167 MHz DDR    2.7 GB/s    21 Gbps
PC2100 (DDR266) SDRAM    64bits    133 MHz DDR    2.1 GB/s    17 Gbps
AGP 8x    32bits    533 MHz    2.1 GB/s    17 Gbps
PCI-X 266    64bits    266 MHz    2.1 GB/s    17 Gbps
PCI-Express x8    serial/8lanes    2.5 GHz    2 GB/s    16 Gbps
EV6 bus (Athlon/Duron FSB)    64bits    100 MHz DDR    1.6 GB/s    13 Gbps
PC1600 (DDR200) SDRAM    64bits    100 MHz DDR    1.6 GB/s    13 Gbps
PC800 RDRAM    16bits    400 MHz DDR    1.6 GB/s    13 Gbps
PC150 SDRAM    64bits    150 MHz    1.3 GB/s    10.2 Gbps
10 gigabit ethernet    serial    10 GHz    1.25 GB/s    10 Gbps
OC-192    serial    9.953 GHz    1.24 GB/s    9.953 Gbps
133 MHz FSB    64bits    133 MHz    1.06 GB/s    8.5 Gbps
PC133 SDRAM    64bits    133 MHz    1.06 GB/s    8.5 Gbps
AGP 4x    32bits    266 MHz    1.06 GB/s    8.5 Gbps
PCI-X    64bits    133 MHz    1.06 GB/s    8.5 Gbps
PCI-Express x4    serial/4lanes    2.5 GHz    1 GB/s    8 Gbps
100 MHz FSB    64bits    100 MHz    800 MB/s    6.4 Gbps
PC100 SDRAM    64bits    100 MHz    800 MB/s    6.4 Gbps
PC66 SDRAM    64bits    66 MHz    533 MB/s    4.3 Gbps
fast/wide PCI    64bits    66 MHz    533 MB/s    4.3 Gbps
AGP 2x    32bits    133 MHz    533 MB/s    4.3 Gbps
single-link DVI    12bits    165 MHz DDR    495 MB/s    3.96 Gbps
Ultra-320 SCSI    16bits    160 MHz    320 MB/s    2.6 Gbps
OC-48 network    serial    2.488 GHz    311 MB/s    2.488 Gbps
AGP    32bits    66 MHz    266 MB/s    2.1 Gbps
PCI-Express x1    serial    2.5 GHz    250 MB/s    2 Gbps
Serial ATA/1500 disk    serial    1.5 GHz    187 MB/s    1.5 Gbps
Ultra-160 SCSI    16bits    80 MHz    160 MB/s    1.3 Gbps
OC-24 network    serial    1.244 GHz    155 MB/s    1.244 Gbps
PCI    32bits    33 MHz    133 MB/s    1.06 Gbps
ATA/133 disk    8bits    66 MHz DDR    133 MB/s    1.06 Gbps
gigabit ethernet    serial    1 GHz    125 MB/s    1 Gbps
ATA/100 disk    8bits    50 MHz DDR    100 MB/s    800 Mbps
IEEE 1394b    serial    800 MHz    100 MB/s    800 Mbps
Ultra-2 Wide SCSI    16bits    40 MHz    80 MB/s    640 Mbps
OC-12 network    serial    622.08 MHz    77.7 MB/s    622.08 Mbps
ATA/66 disk    8bits    33 MHz DDR    66 MB/s    533 Mbps
USB-2    serial    480 MHz    60 MB/s    480 Mbps
IEEE 1394    serial    400 MHz    50 MB/s    400 Mbps
Ultra Wide SCSI    16bits    20 MHz    40 MB/s    320 Mbps
ATA/33 disk    8bits    16.6 MHz DDR    33 MB/s    266 Mbps
Fast Wide SCSI    16bits    10 MHz    20 MB/s    160 Mbps
OC-3 network    serial    155.52 MHz    19.4 MB/s    155.52 Mbps
100baseT ethernet    serial    100 MHz    12.5 MB/s    100 Mbps
OC-1 network    serial    51.84 MHz    6.5 MB/s    51.84 Mbps
T-3 network    serial    45 MHz    5.6 MB/s    44.736 Mbps
USB    serial    12 MHz    1.5 MB/s    12 Mbps
10baseT ethernet    serial    10 MHz    1.25 MB/s    10 Mbps
IrDA-2    serial    4 MHz    500 KB/s    4 Mbps
T-1 network    serial    1.5 MHz    193 KB/s    1.544 Mbps

> How long does a "explain analyze sinmple select * from sorter" take?
>
> And assuming you're doing disk sorts (in disk cache) you're doing quite a
> lot
> of copying to temporary files (in disk cache) and then back to memory.
>
>
> Note that speeding up a query from 20s to 5s isn't terribly useful. If
> it's
> OLTP you can't be using all your cores for each user anyways. And if it's
> DSS
> 20s isn't a problem.

Unless (of course) there are 20,000 users doing the queries that would take 20 seconds but now they take 5 (when run
single-user). They will still have a bit of a wait, of course. 
> Where parallel processing like this becomes attractive is when you're
> running
> a 2 hour query on a machine sequentially running scheduled batch jobs
> which
> can be sped up to 30 minutes. But in that case you're almost certainly
> being
> limited by your disk bandwidth, not your cpu speed.

A linear speedup of 2 or more is always worth while[*].  Since sorting (e.g. for 'group by' and 'order by') and sort
joinsare a major database task, I guess that a linear speedup by a factor of 2 might make the database operations on
thewhole be 10% faster or so {OK, it's a SWAG}.  I guess it would look good on the benchmarks, if nothing else. 

[*] unless it is already fast enough.  If, at peak load a query takes 1 ms, then making the query take 0.5 ms is not
goingto win you any medals, especially if the improvement costs $10,000. 



pgsql-hackers by date:

Previous
From: Decibel!
Date:
Subject: Re: New style of hash join proposal
Next
From: Greg Smith
Date:
Subject: Re: Sorting Improvements for 8.4