Thread: RE: [HACKERS] sort on huge table

RE: [HACKERS] sort on huge table

From: "Ansley, Michael"
Now that's as close to linear as you are going to get.  Pretty good, I think:
a sort of one billion rows in half an hour.

Mikea

>> -----Original Message-----
>> From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
>> Sent: Thursday, November 04, 1999 10:30 AM
>> To: Tom Lane
>> Cc: t-ishii@sra.co.jp; pgsql-hackers@postgreSQL.org
>> Subject: Re: [HACKERS] sort on huge table 
>> 
>> 
>> >
>> >Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> >> I have compared current with 6.5 using a 1,000,000-tuple table (243MB)
>> >> (I wanted to try a 2GB+ table, but 6.5 does not work in that case). The
>> >> result was strange in that current is *faster* than 6.5!
>> >
>> >> RAID5
>> >>     current    2:29
>> >>     6.5.2    3:15
>> >
>> >> non-RAID
>> >>     current    1:50
>> >>     6.5.2    2:13
>> >
>> >> It seems my previous testing was done the wrong way, or perhaps the
>> >> behavior of sorting differs once the table size changes?
>> >
>> >Well, I feel better now, anyway ;-).  I thought that my first cut
>> >ought to have been about the same speed as 6.5, and after I added
>> >the code to slurp up multiple tuples in sequence, it should've been
>> >faster than 6.5.  The above numbers seem to be in line with that
>> >theory.  Next question: is there some additional effect that comes
>> >into play once the table size gets really huge?  I am thinking maybe
>> >there's some glitch affecting performance once the temp file size
>> >goes past one segment (1Gb).  Tatsuo, can you try sorts of say
>> >0.9 and 1.1 Gb to see if something bad happens at 1Gb?  I could
>> >try rebuilding here with a small RELSEG_SIZE, but right at the
>> >moment I'm not certain I'd see the same behavior you do...
>> 
>> Ok. I have run some tests with various amounts of data.
>> 
>> RedHat Linux 6.0
>> Kernel 2.2.5-smp
>> 512MB RAM
>> Sort mem: 80MB
>> RAID5
>> 
>> 100 million tuples    1:31
>> 200            4:24
>> 300            7:27
>> 400            11:11  <-- 970MB
>> 500            14:01  <-- 1.1GB (segmented files)
>> 600            18:31
>> 700            22:24
>> 800            24:36
>> 900            28:12
>> 1000            32:14
>> 
>> I didn't see anything bad at 1.1GB (500 million).
>> --
>> Tatsuo Ishii
>> 

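Tom's 1GB question refers to PostgreSQL splitting every relation (and, per his message, the sort's temp files) into segments of RELSEG_SIZE blocks, 1GB by default, so a 1.1GB run is the first point where a second segment file comes into play. A minimal sketch of how one might confirm that a table has spilled into a second segment, assuming a modern server with pg_relation_filepath(), the psycopg2 driver, and filesystem access to the data directory; the table name, DSN, and PGDATA path are made up for illustration:

```python
# Hypothetical helper: checks whether a table has crossed the 1GB segment
# boundary by looking for <relfilenode>.1, <relfilenode>.2, ... on disk.
# Assumes psycopg2, a modern PostgreSQL with pg_relation_filepath(), and
# that the script runs on the database host with read access to PGDATA.
import glob
import os
import psycopg2

def segment_files(table, dsn="dbname=sorttest", pgdata="/var/lib/pgsql/data"):
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT pg_relation_filepath(%s)", (table,))
            relpath = cur.fetchone()[0]          # e.g. base/16384/16385
    base = os.path.join(pgdata, relpath)
    # The first segment has no suffix; later segments are named .1, .2, ...
    return sorted([base] + glob.glob(base + ".[0-9]*"))

if __name__ == "__main__":
    for f in segment_files("sort_bench"):        # hypothetical table name
        print(f, os.path.getsize(f))
```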

Re: [HACKERS] sort on huge table

From: Tatsuo Ishii
>Now that's as close to linear as you are going to get.  Pretty good, I think:
>a sort of one billion rows in half an hour.

Oops! It's not one billion but 10 million. Sorry.

1 million tuples    1:31
2            4:24
3            7:27
4            11:11  <-- 970MB
5            14:01  <-- 1.1GB (segmented files)
6            18:31
7            22:24
8            24:36
9            28:12
10            32:14
--
Tatsuo Ishii
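For context, the corrected sizes (1 through 10 million tuples, roughly 240MB per million) are easy to regenerate. The following is a hypothetical reconstruction of the kind of benchmark driver involved, not the script actually used in the thread: it assumes the psycopg2 driver and a modern PostgreSQL (generate_series, EXPLAIN ANALYZE, work_mem), with made-up table and column names:

```python
# Hypothetical benchmark driver (not the script used in the thread): builds a
# table of N million rows and times a full external sort for each N.
import time
import psycopg2

DSN = "dbname=sorttest"          # made-up connection string
SIZES_MILLIONS = range(1, 11)    # 1 .. 10 million tuples, as in the thread

conn = psycopg2.connect(DSN)
conn.autocommit = True
cur = conn.cursor()
cur.execute("SET work_mem = '80MB'")   # roughly Tatsuo's "Sort mem: 80MB"

for n in SIZES_MILLIONS:
    cur.execute("DROP TABLE IF EXISTS sort_bench")
    # ~240 bytes of filler per row approximates the 243MB-per-million figure.
    cur.execute(
        "CREATE TABLE sort_bench AS "
        "SELECT (random() * 1e9)::int AS k, repeat('x', 230) AS filler "
        "FROM generate_series(1, %s)",
        (n * 1000000,),
    )
    start = time.time()
    # EXPLAIN ANALYZE runs the sort on the server without shipping rows back.
    cur.execute("EXPLAIN ANALYZE SELECT k FROM sort_bench ORDER BY k")
    cur.fetchall()
    print("%2d Mtuples  %6.1f s" % (n, time.time() - start))
```

EXPLAIN ANALYZE is used here so the server performs the full sort without transferring millions of rows to the client, which would otherwise dominate the timing.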


Re: [HACKERS] sort on huge table

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> Now that's as close to linear as you are going to get.  Pretty good, I think:
>> a sort of one billion rows in half an hour.

> Oops! It's not one billion but 10 million. Sorry.

Actually, the behavior ought to be O(N log N) not O(N).

With a little bit of arithmetic, we get

MTuples    Time     Sec    Delta sec    Sec/MTuple    MTuple/sec

 1         1:31       91        91         91          0.010989
 2         4:24      264       173        132          0.00757576
 3         7:27      447       183        149          0.00671141
 4        11:11      671       224        167.75       0.00596125
 5        14:01      841       170        168.2        0.0059453
 6        18:31     1111       270        185.167      0.00540054
 7        22:24     1344       233        192          0.00520833
 8        24:36     1476       132        184.5        0.00542005
 9        28:12     1692       216        188          0.00531915
10        32:14     1934       242        193.4        0.00517063

which is obviously nonlinear.  Column 5 (Sec/MTuple) should theoretically
follow a log(N) curve, and it's not too hard to draw one that matches it
pretty well (see attached plot).
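Since total time should grow as N·log(N), the per-million-tuple figure should be roughly a + b·ln(N). A quick least-squares check of that reading against the seconds column above, in plain Python (the coefficients are whatever the fit yields; they are not figures from the thread):

```python
# Sanity check of the O(N log N) claim: fit sec/MTuple = a + b*ln(N) to the
# measured totals from the table above and compare fitted vs. measured values.
import math

seconds = [91, 264, 447, 671, 841, 1111, 1344, 1476, 1692, 1934]  # 1..10 Mtuples

xs = [math.log(n) for n in range(1, 11)]
ys = [t / n for n, t in zip(range(1, 11), seconds)]   # seconds per million tuples

# Ordinary least squares for y = a + b*x.
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

print("fit: sec/MTuple = %.1f + %.1f * ln(N)" % (a, b))
for n, y in zip(range(1, 11), ys):
    print("N=%2d  measured %6.1f  fitted %6.1f" % (n, y, a + b * math.log(n)))
```

If the fitted values track the measured Sec/MTuple column closely, that supports reading column 5 as a log(N) curve; a large residual at one particular size would instead point to something size-specific, such as the segment boundary discussed earlier.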

It's pretty clear that we don't have any specific problem with the
one-versus-two-segment issue, which is good (I looked again at the code
and couldn't see any reason for such a problem to exist).  But there's
still the question of whether the old code is faster.

Tatsuo, could you run another set of numbers using 6.5.* and the same
test conditions, as far up as you can get with 6.5?  (I think you ought
to be able to reach 2Gb, though not pass it, so most of this curve
can be compared to 6.5.)

            regards, tom lane


Attachment