Re: [HACKERS] sort on huge table - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] sort on huge table
Date
Msg-id 22958.941730614@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] sort on huge table  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-hackers
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> Now that's a close to linear as you are going to get.  Pretty good I think:
>> a sort of one billion rows in half an hour.

> Oops! It's not one billion but 10 millions. Sorry.

Actually, the behavior ought to be O(N log N) not O(N).

With a little bit of arithmetic, we get

MTuples    Time    Sec    Delta    Sec/    MTuple/sec
            sec    MTuple

1    1:31    91    91    91    0.010989
2    4:24    264    173    132    0.00757576
3    7:27    447    183    149    0.00671141
4    11:11    671    224    167.75    0.00596125
5    14:01    841    170    168.2    0.0059453
6    18:31    1111    270    185.167    0.00540054
7    22:24    1344    233    192    0.00520833
8    24:36    1476    132    184.5    0.00542005
9    28:12    1692    216    188    0.00531915
10    32:14    1934    242    193.4    0.00517063

which is obviously nonlinear.  Column 5 should theoretically be a log(N)
curve, and it's not too hard to draw one that matches it pretty well
(see attached plot).

It's pretty clear that we don't have any specific problem with the
one-versus-two-segment issue, which is good (I looked again at the code
and couldn't see any reason for such a problem to exist).  But there's
still the question of whether the old code is faster.

Tatsuo, could you run another set of numbers using 6.5.* and the same
test conditions, as far up as you can get with 6.5?  (I think you ought
to be able to reach 2Gb, though not pass it, so most of this curve
can be compared to 6.5.)

            regards, tom lane


Attachment

pgsql-hackers by date:

Previous
From: Thomas Lockhart
Date:
Subject: Re: [HACKERS] pgaccess 0.98
Next
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] pgaccess 0.98