Hi,
Tom Lane wrote:
Gary Cowell <gary_cowell@yahoo.co.uk> writes:
-> Sort (cost=117865.77..119220.13 rows=541741
width=132) (actual time=63623.417..66127.641
rows=541741 loops=1)
This is clearly where the time is going.
sort_mem = 16384
Probably not enough for this problem. The estimated data size is
upwards of 60 meg (132 bytes * half a mil rows); allowing for per-row
overhead I suspect that you'd need sort_mem approaching 100 meg for
a fully-in-memory sort. (Also I'd take the width=132 with a *big*
grain of salt, unless you have reason to know that it's accurate.)
The on-disk sorting algorithm that we use is designed to favor minimum
disk space consumption over speed. It has a fairly nonrandom access
pattern that can be pretty slow if your disks don't have good seek-time
specs.
I don't know whether Oracle's performance advantage is because they're
not swapping the sort to disk at all, or because they use a different
on-disk sort method with a more sequential access pattern.
[... thinks for awhile ...] It seems possible that they may use sort
code that knows it is performing a DISTINCT operation and discards
duplicates on sight. Given that there are only 534 distinct values,
the sort would easily stay in memory if that were happening.
It would be interesting to compare Oracle and PG times for a straight
sort of half a million rows, without the DISTINCT part; that would
give us a clue whether they simply have much better sort technology,
or whether they have a special optimization for sort+unique.
I was tested this situation and found that oracle is working also in this case much faster (in some cases x10 ) compared to pg.
Also by in memory sort oracle is faster but the diferenc is not so big.
So I have oracle 8 and oracle 10 (also pg - it is my primary platform) installed and can run some tests.
I am ready to help in this direction or if you can send any example I will run it and post the result .
regards,
ivan.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend