Thread: Major differences between oracle and postgres performance - what can I do ?
I'm trying to migrate an application from an Oracle backend to PostgreSQL and have a performance question. The hardware for the database is the same, a SunFire v120, 2x73GB U2W SCSI disks, 1GB RAM, 650MHz US-IIe CPU. Running Solaris 8. The table in question has 541741 rows. Under Oracle, the query ' select distinct version from vers where version is not null ' returns 534 rows in 6.14 seconds, with an execution plan showing a table scan of vers followed by a sort. The explain output on postgres shows the same execution with a scan on vers and a sort but the query time is 78.6 seconds. The explain output from PostgreSQL is: QUERY PLAN --------------------------------------------------------------------------------- Unique (cost=117865.77..120574.48 rows=142 width=132) -> Sort (cost=117865.77..119220.13 rows=541741 width=132) Sort Key: "version" -> Seq Scan on vers (cost=0.00..21367.41 rows=541741 width=132) Filter: ("version" IS NOT NULL) I do have an index on the column in question but neither oracle nor postgresql choose to use it (which given that we're visiting all rows is perhaps not surprising). I'm not as familiar with postgresql as I am with Oracle but I think I've configured comparible buffering and sort area sizes, certainly there isn't much physical IO going on in either case. What can I do to speed up this query? Other queries are slightly slower than under Oracle on the same hardware but nothing like this. Thanks! G ___________________________________________________________ALL-NEW Yahoo! Messenger - sooooo many all-new ways to expressyourself http://uk.messenger.yahoo.com
Gary Cowell wrote: > > I'm not as familiar with postgresql as I am with > Oracle but I think I've configured comparible > buffering and sort area sizes, certainly there isn't > much physical IO going on in either case. People are going to want to know: 1. version of PG 2. explain analyse output, rather than just explain 3. What values you've used for the postgresql.conf file The actual plan from explain analyse isn't going to be much use - as you say, a scan of the whole table followed by sorting is the best you'll get. However, the actual costs of these steps might say something useful. -- Richard Huxton Archonet Ltd
Re: Major differences between oracle and postgres performance - what can I do ?
From
Paul Thomas
Date:
On 18/06/2004 12:31 Gary Cowell wrote: > [snip] > I'm not as familiar with postgresql as I am with > Oracle but I think I've configured comparible > buffering and sort area sizes, certainly there isn't > much physical IO going on in either case. > > What can I do to speed up this query? Other queries > are slightly slower than under Oracle on the same > hardware but nothing like this. Usual questions: have you vacuumed the table recently? what are your postgresql.conf settings? can you show us explain ANALYZE output rather than just explain output? -- Paul Thomas +------------------------------+---------------------------------------------+ | Thomas Micro Systems Limited | Software Solutions for Business | | Computer Consultants | http://www.thomas-micro-systems-ltd.co.uk | +------------------------------+---------------------------------------------+
On Jun 18, 2004, at 7:31 AM, Gary Cowell wrote: > The explain output on postgres shows the same > execution with a scan on vers and a sort but the query > time is 78.6 seconds. > Does it run just as slow if you run it again? It could be a case of the caches being empty > Oracle but I think I've configured comparible > buffering and sort area sizes, certainly there isn't > much physical IO going on in either case. > Configuring PG like Oracle isn't the best thing in the world. The general PG philosophy is to let the OS do all the caching & buffering - this is reversed in the Oracle world. In 7.4 the rule of thumb is no more than 10k shared_buffers.. beyond that the overhead of maintaining it becomes excessive. (This isn't really the case in 7.5) Curiously, what are your sort_mem and shared_buffers settings? -- Jeff Trout <jeff@jefftrout.com> http://www.jefftrout.com/ http://www.stuarthamm.net/
Gary Cowell wrote: > The explain output on postgres shows the same > execution with a scan on vers and a sort but the query > time is 78.6 seconds. > > The explain output from PostgreSQL is: > QUERY PLAN > --------------------------------------------------------------------------------- > Unique (cost=117865.77..120574.48 rows=142 > width=132) > -> Sort (cost=117865.77..119220.13 rows=541741 > width=132) > Sort Key: "version" > -> Seq Scan on vers (cost=0.00..21367.41 > rows=541741 width=132) > Filter: ("version" IS NOT NULL) > > I do have an index on the column in question but > neither oracle nor postgresql choose to use it (which > given that we're visiting all rows is perhaps not > surprising). Can you post explain analyze for the same query? It contains actual numbers alond side the chosen plan. > > I'm not as familiar with postgresql as I am with > Oracle but I think I've configured comparible > buffering and sort area sizes, certainly there isn't > much physical IO going on in either case. Well, for postgresql you should check out http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html HTH Shridhar
Re: Major differences between oracle and postgres performance - what can I do ?
From
Gary Cowell
Date:
--- lnd@hnit.is wrote: > You can roughly estimate time spent for just scaning > the table using > something like this: > > select sum(version) from ... where version is not > null > > and just > > select sum(version) from ... > > The results would be interesting to compare. To answer (I hope) everyones questions at once: 1) Oracle and postmaster were not running at the same time 2) The queries were run once, to cache as much as possible then run again to get the timing 3) Distinct vs. no distinct (i.e. sort performance). select length(version) from vers where version is not null; Time: 9748.174 ms select distinct(version) from vers where version is not null; Time: 67988.972 ms So about an extra 60 seconds with the distinct on. Here is the explain analyze output from psql: # explain analyze select distinct version from vers where version is not null; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Unique (cost=117865.77..120574.48 rows=142 width=132) (actual time=63623.428..68269.111 rows=536 loops=1) -> Sort (cost=117865.77..119220.13 rows=541741 width=132) (actual time=63623.417..66127.641 rows=541741 loops=1) Sort Key: "version" -> Seq Scan on vers (cost=0.00..21367.41 rows=541741 width=132) (actual time=0.218..7214.903 rows=541741 loops=1) Filter: ("version" IS NOT NULL) Total runtime: 68324.215 ms (6 rows) Time: 68326.062 ms And the non-default .conf parameters: tcpip_socket = true max_connections = 100 password_encryption = true shared_buffers = 2000 sort_mem = 16384 vacuum_mem = 8192 effective_cache_size = 4000 syslog = 2 postgresql version is 7.4.3 compiled with GCC 3.3.2 on sun4u architecture. ___________________________________________________________ALL-NEW Yahoo! Messenger - sooooo many all-new ways to expressyourself http://uk.messenger.yahoo.com
Hi ,
I have similare problem and found that the problem is by pg sort.
It is extremly slow by me.
Also in my case I tryed to migrate one db from oracle to pg .
To solve this problem I dinamicaly set sort_mem to some big value.
In this case the sort is working into RAM and is relative fast.
You can try this and remember sort mem is per sort, not per connection.
In my migration I found the only advantage for oracle is the very good sort.
regards,
ivan.
Gary Cowell wrote:
I have similare problem and found that the problem is by pg sort.
It is extremly slow by me.
Also in my case I tryed to migrate one db from oracle to pg .
To solve this problem I dinamicaly set sort_mem to some big value.
In this case the sort is working into RAM and is relative fast.
You can try this and remember sort mem is per sort, not per connection.
In my migration I found the only advantage for oracle is the very good sort.
regards,
ivan.
Gary Cowell wrote:
--- lnd@hnit.is wrote: > You can roughly estimate time spent for just scaningthe table using something like this: select sum(version) from ... where version is not null and just select sum(version) from ... The results would be interesting to compare.To answer (I hope) everyones questions at once: 1) Oracle and postmaster were not running at the same time 2) The queries were run once, to cache as much as possible then run again to get the timing 3) Distinct vs. no distinct (i.e. sort performance). select length(version) from vers where version is not null; Time: 9748.174 ms select distinct(version) from vers where version is not null; Time: 67988.972 ms So about an extra 60 seconds with the distinct on. Here is the explain analyze output from psql: # explain analyze select distinct version from vers where version is not null; QUERY PLAN -----------------------------------------------------------------------------------------------------------------------------------Unique (cost=117865.77..120574.48 rows=142 width=132) (actual time=63623.428..68269.111 rows=536 loops=1) -> Sort (cost=117865.77..119220.13 rows=541741 width=132) (actual time=63623.417..66127.641 rows=541741 loops=1) Sort Key: "version" -> Seq Scan on vers (cost=0.00..21367.41 rows=541741 width=132) (actual time=0.218..7214.903 rows=541741 loops=1) Filter: ("version" IS NOT NULL)Total runtime: 68324.215 ms (6 rows) Time: 68326.062 ms And the non-default .conf parameters: tcpip_socket = true max_connections = 100 password_encryption = true shared_buffers = 2000 sort_mem = 16384 vacuum_mem = 8192 effective_cache_size = 4000 syslog = 2 postgresql version is 7.4.3 compiled with GCC 3.3.2 on sun4u architecture. ___________________________________________________________ALL-NEW Yahoo! Messenger - sooooo many all-new ways to express yourself http://uk.messenger.yahoo.com ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Gary Cowell wrote: > --- lnd@hnit.is wrote: > You can roughly estimate time > spent for just scaning > >>the table using >>something like this: >> >> select sum(version) from ... where version is not >>null >> >> and just >> >> select sum(version) from ... >> >>The results would be interesting to compare. > > > To answer (I hope) everyones questions at once: > > 1) Oracle and postmaster were not running at the same > time > 2) The queries were run once, to cache as much as > possible then run again to get the timing > > 3) Distinct vs. no distinct (i.e. sort performance). > > select length(version) from vers where version is not > null; > > Time: 9748.174 ms > > select distinct(version) from vers where version is > not null; > > Time: 67988.972 ms > > So about an extra 60 seconds with the distinct on. Which is basically the sorting time... > Here is the explain analyze output from psql: > > # explain analyze select distinct version from vers > where version is not null; > > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------- > Unique (cost=117865.77..120574.48 rows=142 > width=132) (actual time=63623.428..68269.111 rows=536 > loops=1) > -> Sort (cost=117865.77..119220.13 rows=541741 > width=132) (actual time=63623.417..66127.641 > rows=541741 loops=1) > Sort Key: "version" > -> Seq Scan on vers (cost=0.00..21367.41 > rows=541741 width=132) (actual time=0.218..7214.903 > rows=541741 loops=1) > Filter: ("version" IS NOT NULL) > Total runtime: 68324.215 ms > (6 rows) > > Time: 68326.062 ms Yep - the seq-scan takes 7214.903 ms, there's a huge setup time for the sort (63623.417) and it's not finished until 66127.641ms have elapsed. > > And the non-default .conf parameters: > > tcpip_socket = true > max_connections = 100 > password_encryption = true > shared_buffers = 2000 > sort_mem = 16384 > vacuum_mem = 8192 > effective_cache_size = 4000 > syslog = 2 Well, I'd probably up vacuum_mem, and check how much RAM is being used for disk cache - I'd guess it's more than 32MB (4000 * 8kb). You might want to up the shared_buffers, but that's going to depend on the load. Try increasing sort_mem temporarily, and see if that makes a difference: SET sort_mem = 64000; EXPLAIN ANALYSE ... The only thing I can think is that you're getting disk activity to get a sort that slow. I'd be expecting a hash-sort if PG thought it could fit the distinct values in memory. -- Richard Huxton Archonet Ltd
=?iso-8859-1?q?Gary=20Cowell?= <gary_cowell@yahoo.co.uk> writes: > -> Sort (cost=117865.77..119220.13 rows=541741 > width=132) (actual time=63623.417..66127.641 > rows=541741 loops=1) This is clearly where the time is going. > sort_mem = 16384 Probably not enough for this problem. The estimated data size is upwards of 60 meg (132 bytes * half a mil rows); allowing for per-row overhead I suspect that you'd need sort_mem approaching 100 meg for a fully-in-memory sort. (Also I'd take the width=132 with a *big* grain of salt, unless you have reason to know that it's accurate.) The on-disk sorting algorithm that we use is designed to favor minimum disk space consumption over speed. It has a fairly nonrandom access pattern that can be pretty slow if your disks don't have good seek-time specs. I don't know whether Oracle's performance advantage is because they're not swapping the sort to disk at all, or because they use a different on-disk sort method with a more sequential access pattern. [... thinks for awhile ...] It seems possible that they may use sort code that knows it is performing a DISTINCT operation and discards duplicates on sight. Given that there are only 534 distinct values, the sort would easily stay in memory if that were happening. It would be interesting to compare Oracle and PG times for a straight sort of half a million rows, without the DISTINCT part; that would give us a clue whether they simply have much better sort technology, or whether they have a special optimization for sort+unique. regards, tom lane
Hi,
Tom Lane wrote:
Also by in memory sort oracle is faster but the diferenc is not so big.
So I have oracle 8 and oracle 10 (also pg - it is my primary platform) installed and can run some tests.
I am ready to help in this direction or if you can send any example I will run it and post the result .
regards,
ivan.
Tom Lane wrote:
I was tested this situation and found that oracle is working also in this case much faster (in some cases x10 ) compared to pg.Gary Cowell <gary_cowell@yahoo.co.uk> writes:-> Sort (cost=117865.77..119220.13 rows=541741 width=132) (actual time=63623.417..66127.641 rows=541741 loops=1)This is clearly where the time is going.sort_mem = 16384Probably not enough for this problem. The estimated data size is upwards of 60 meg (132 bytes * half a mil rows); allowing for per-row overhead I suspect that you'd need sort_mem approaching 100 meg for a fully-in-memory sort. (Also I'd take the width=132 with a *big* grain of salt, unless you have reason to know that it's accurate.) The on-disk sorting algorithm that we use is designed to favor minimum disk space consumption over speed. It has a fairly nonrandom access pattern that can be pretty slow if your disks don't have good seek-time specs. I don't know whether Oracle's performance advantage is because they're not swapping the sort to disk at all, or because they use a different on-disk sort method with a more sequential access pattern. [... thinks for awhile ...] It seems possible that they may use sort code that knows it is performing a DISTINCT operation and discards duplicates on sight. Given that there are only 534 distinct values, the sort would easily stay in memory if that were happening. It would be interesting to compare Oracle and PG times for a straight sort of half a million rows, without the DISTINCT part; that would give us a clue whether they simply have much better sort technology, or whether they have a special optimization for sort+unique.
Also by in memory sort oracle is faster but the diferenc is not so big.
So I have oracle 8 and oracle 10 (also pg - it is my primary platform) installed and can run some tests.
I am ready to help in this direction or if you can send any example I will run it and post the result .
regards,
ivan.
regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
Re: Major differences between oracle and postgres performance - what can I do ?
From
Gary Cowell
Date:
> Try increasing sort_mem temporarily, and see if that > makes a difference: > SET sort_mem = 64000; > EXPLAIN ANALYSE ... I did this (actualy 65536) and got the following: pvcsdb=# explain analyze select distinct version from vers where version is not null; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Unique (cost=117865.77..120574.48 rows=142 width=132) (actual time=81595.178..86573.228 rows=536 loops=1) -> Sort (cost=117865.77..119220.13 rows=541741 width=132) (actual time=81595.169..84412.069 rows=541741 loops=1) Sort Key: "version" -> Seq Scan on vers (cost=0.00..21367.41 rows=541741 width=132) (actual time=10.068..7397.374 rows=541741 loops=1) Filter: ("version" IS NOT NULL) Total runtime: 86647.495 ms (6 rows) In response to Tom Lane, I have compared a select/order by on the same data in Oracle and PG to see if this changes things: PG: Time: 67438.536 ms 541741 rows Oracle: After an hour and a half I canned it So it seems the idea that oracle is dropping duplicate rows prior to the sort when using distinct may indeed be the case. From what I've seen here, it seems that PGs on-disk sort performance is exceeding that of Oracle - it's just that oracle sorts fewer rows for distinct. ___________________________________________________________ALL-NEW Yahoo! Messenger - sooooo many all-new ways to expressyourself http://uk.messenger.yahoo.com
Re: Major differences between oracle and postgres performance - what can I do ?
From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > [... thinks for awhile ...] It seems possible that they may use sort > code that knows it is performing a DISTINCT operation and discards > duplicates on sight. Given that there are only 534 distinct values, > the sort would easily stay in memory if that were happening. Could this optimization be added to PostgreSQL? It sounds like a very reasonable thing to do. Hopefully there wouldn't be too much complexity needed to add it. Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> [... thinks for awhile ...] It seems possible that they may use sort >> code that knows it is performing a DISTINCT operation and discards >> duplicates on sight. Given that there are only 534 distinct values, >> the sort would easily stay in memory if that were happening. > Could this optimization be added to PostgreSQL? It sounds like a very > reasonable thing to do. That's what I was wondering about too. But first I'd like to get some kind of reading on how effective it would be. If someone can demonstrate that Oracle can do sort-and-drop-dups a lot faster than it can do a straight sort of the same amount of input data, that would be a strong indication that it's worth doing. At this point we don't know if that's the source of their win or not. regards, tom lane
Re: Major differences between oracle and postgres performance - what can I do ?
From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> [... thinks for awhile ...] It seems possible that they may use sort > >> code that knows it is performing a DISTINCT operation and discards > >> duplicates on sight. Given that there are only 534 distinct values, > >> the sort would easily stay in memory if that were happening. > > > Could this optimization be added to PostgreSQL? It sounds like a very > > reasonable thing to do. > > That's what I was wondering about too. But first I'd like to get > some kind of reading on how effective it would be. If someone can > demonstrate that Oracle can do sort-and-drop-dups a lot faster than > it can do a straight sort of the same amount of input data, that > would be a strong indication that it's worth doing. At this point > we don't know if that's the source of their win or not. Alright, I did a couple tests, these are different systems with different hardware, but in the end I think the difference is clear: tsf=# explain analyze select distinct access_type_id from p_gen_dom_dedicated_swc_access ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=321591.00..333205.56 rows=16 width=10) (actual time=32891.141..37420.429 rows=16 loops=1) -> Sort (cost=321591.00..327398.28 rows=2322912 width=10) (actual time=32891.137..35234.810 rows=2322912 loops=1) Sort Key: access_type_id -> Seq Scan on p_gen_dom_dedicated_swc_access (cost=0.00..55492.12 rows=2322912 width=10) (actual time=0.013..3743.470rows=2322912 loops=1) Total runtime: 37587.519 ms (5 rows) tsf=# explain analyze select access_type_id from p_gen_dom_dedicated_swc_access order by access_type_id; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=321591.00..327398.28 rows=2322912 width=10) (actual time=32926.696..35278.847 rows=2322912 loops=1) Sort Key: access_type_id -> Seq Scan on p_gen_dom_dedicated_swc_access (cost=0.00..55492.12 rows=2322912 width=10) (actual time=0.014..3753.443rows=2322912 loops=1) Total runtime: 36737.628 ms (4 rows) So, about the same from postgres in each case. From Oracle: (select access_type_id from p_gen_dom_dedicated_swc_access order by access_type_id) sauron:/home/sfrost> time sqlplus mci_vendor/mci @test.sql > /dev/null real 3m55.12s user 2m25.87s sys 0m10.59s (select distinct access_type_id from p_gen_dom_dedicated_swc_access) sauron:/home/sfrost> time sqlplus mci_vendor/mci @test.sql > /dev/null real 0m5.08s user 0m3.86s sys 0m0.95s All the queries were run multiple times, though there wasn't all that much difference in the times. Both systems are pretty speedy, but I tend to feel the Postgres box is faster in CPU/disk access time, which is probably why the Oracle system took 4 minutes to do what the Postgres systems does in 40 seconds. My only other concern is the Oracle system having to do the write I/O while the postgres one doesn't... I don't see an obvious way to get around that though, and I'm not sure if it'd really make *that* big of a difference. Stephen
Attachment
Re: Major differences between oracle and postgres performance - what can I do ?
From
Stephen Frost
Date:
* Stephen Frost (sfrost@snowman.net) wrote: > systems does in 40 seconds. My only other concern is the Oracle system > having to do the write I/O while the postgres one doesn't... I don't > see an obvious way to get around that though, and I'm not sure if it'd > really make *that* big of a difference. Alright, after talking with some people on #postgresql I found that in Oracle you can do 'set autotrace traceonly', which removes the I/O factor from the Oracle query. Doing this I also discovered that it appears Oracle actually uses an index on that field that it knows about to derive what the distinct results would be. That probably invalidates this test for what we were specifically looking for, but, hey, using the index to figure out what the distinct values for the key are isn't exactly a bad idea. :) Here's the new results: (select access_type_id from p_gen_dom_dedicated_swc_access order by access_type_id;) ----------------------------------------------------------------------------------- sauron:/home/sfrost> time sqlplus mci_vendor/mci @test.sql SQL*Plus: Release 9.2.0.1.0 - Production on Fri Jun 18 14:10:12 2004 Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved. Connected to: Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Oracle Data Mining options JServer Release 9.2.0.1.0 - Production 2322912 rows selected. Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=11459 Card=1303962 B ytes=16951506) 1 0 SORT* (ORDER BY) (Cost=11459 Card=1303962 Bytes=16951506) :Q457001 2 1 TABLE ACCESS* (FULL) OF 'P_GEN_DOM_DEDICATED_SWC_ACCESS' :Q457000 (Cost=1550 Card=1303962 Bytes=16951506) 1 PARALLEL_TO_SERIAL SELECT A1.C0 C0 FROM :Q457000 A1 ORDER BY A1 .C0 2 PARALLEL_TO_PARALLEL SELECT /*+ NO_EXPAND ROWID(A1) */ A1."ACCESS _TYPE_ID" C0 FROM "P_GEN_DOM_DEDICAT Statistics ---------------------------------------------------------- 32 recursive calls 1594 db block gets 64495 consistent gets 105975 physical reads 0 redo size 40109427 bytes sent via SQL*Net to client 1704111 bytes received via SQL*Net from client 154862 SQL*Net roundtrips to/from client 2 sorts (memory) 4 sorts (disk) 2322912 rows processed Disconnected from Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Oracle Data Mining options JServer Release 9.2.0.1.0 - Production real 1m38.55s user 0m23.36s sys 0m9.61s ----------------------------------------------------------------------------------- (select distinct access_type_id from p_gen_dom_dedicated_swc_access) ----------------------------------------------------------------------------------- sauron:/home/sfrost> time sqlplus mci_vendor/mci @test.sql SQL*Plus: Release 9.2.0.1.0 - Production on Fri Jun 18 14:13:54 2004 Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved. Connected to: Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Oracle Data Mining options JServer Release 9.2.0.1.0 - Production 16 rows selected. Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=44874 Card=1303962 B ytes=16951506) 1 0 SORT (UNIQUE) (Cost=44874 Card=1303962 Bytes=16951506) 2 1 INDEX (FAST FULL SCAN) OF 'TABLE_8111_DUPLICATE_CHECK' ( UNIQUE) (Cost=4 Card=1303962 Bytes=16951506) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 47069 consistent gets 47067 physical reads 0 redo size 841 bytes sent via SQL*Net to client 662 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 16 rows processed Disconnected from Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Oracle Data Mining options JServer Release 9.2.0.1.0 - Production real 0m5.36s user 0m0.04s sys 0m0.07s ----------------------------------------------------------------------------------- Stephen
Attachment
=?iso-8859-1?q?Gary=20Cowell?= <gary_cowell@yahoo.co.uk> writes: > So it seems the idea that oracle is dropping duplicate > rows prior to the sort when using distinct may indeed > be the case. Okay. We won't have any short-term solution for making DISTINCT do that, but if you are on PG 7.4 you could get the same effect from using GROUP BY: instead of select distinct version from vers where version is not null try select version from vers where version is not null group by version You should get a HashAggregate plan out of that, and I'd think it'd be pretty quick when there are not many distinct values of version. regards, tom lane
Re: Major differences between oracle and postgres performance - what can I do ?
From
Gary Cowell
Date:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote: > =?iso-8859-1?q?Gary=20Cowell?= > <gary_cowell@yahoo.co.uk> writes: > > So it seems the idea that oracle is dropping > duplicate > > rows prior to the sort when using distinct may > indeed > > be the case. > > Okay. We won't have any short-term solution for > making DISTINCT do that, > but if you are on PG 7.4 you could get the same > effect from using > GROUP BY: instead of > select distinct version from vers where version is > not null > try > select version from vers where version is not null > group by version > You should get a HashAggregate plan out of that, and > I'd think it'd be > pretty quick when there are not many distinct values > of version. > Yeah out of the half million rows there are only ever going to be 500 or so distinct values. I do indeed get such a plan. It's much faster that way. Down to 16 seconds. I'll get the chap to rewrite his app to use group by instead of distinct. Thanks (everyone) for the top class help! ___________________________________________________________ALL-NEW Yahoo! Messenger - sooooo many all-new ways to expressyourself http://uk.messenger.yahoo.com
Re: Major differences between oracle and postgres performance - what can I do ?
From
"Mischa Sandberg"
Date:
Don't know about Oracle, but select-distinct in MSSQL2K will indeed throw away duplicates, which chops the CPU time. Very easy to see in the graphic query plan, both in terms of CPU and the number of rows retrieved from a single-node or nested-loop subtree. Definitely a worthwhile optimization. "Tom Lane" <tgl@sss.pgh.pa.us> wrote in message news:22514.1087578111@sss.pgh.pa.us... > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> [... thinks for awhile ...] It seems possible that they may use sort > >> code that knows it is performing a DISTINCT operation and discards > >> duplicates on sight. Given that there are only 534 distinct values, > >> the sort would easily stay in memory if that were happening. > > > Could this optimization be added to PostgreSQL? It sounds like a very > > reasonable thing to do. > > That's what I was wondering about too. But first I'd like to get > some kind of reading on how effective it would be. If someone can > demonstrate that Oracle can do sort-and-drop-dups a lot faster than > it can do a straight sort of the same amount of input data, that > would be a strong indication that it's worth doing. At this point > we don't know if that's the source of their win or not. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >