Huge Performance Difference on Similar Query in Pg7.2 - Mailing list pgsql-general
From | Heiko Klein |
---|---|
Subject | Huge Performance Difference on Similar Query in Pg7.2 |
Date | |
Msg-id | 15514.62356.723349.852577@polar.oslo.dnmi.no Whole thread Raw |
Responses |
Re: Huge Performance Difference on Similar Query in Pg7.2
|
List | pgsql-general |
Hi, I have two views: LowHiSectorGridEmissionsView: > 1.000.000 rows DataSetsView: > 30.000 rows When I now perform the two similar (results equal) queries on those rows: 1) select count(*) from EmissionsView, DataSetsView where DataSetsView.setid = EmissionsView.setid and EmissionsView.setid = '4614' ; ------------- 2) select count(*) from EmissionsView, DataSetsView where DataSetsView.setid = EmissionsView.setid and DataSetsView.setid = '4614' ; ------------ I have a huge performance difference. 1) Makes a index search on EmissionsView and a Seq scan on DataSetsView: time: 1-2s 2) Makes a index search on DataSetsView and a Seq scan on EmissionsView: time: ~50s The count(*) isn't the point here, the same is true for every other type of select. I'm just porting the database from a Ora7 on 2x200Mhz SGI machine to a Pg7.2 on a 850Mhz Athlon/Linux PC. On Ora7, both queries take about the same time: 1-2s. I cannot easily change all queries, since I got only time to work on the database, not on the frontend, which creates the queries. Why is Pg7.2 using a Seq scan at all? (Even after a vacuum analyze!) And if a Seq scan is needed: Why doesn't the query-planer change the query to the much faster version itself? And most important: How should I work on, is there a Pg internal solution for it, or do I have to go the hard way and modify the frontend? Heiko Below are the explain results: 1) emep=> explain select count(*) from EmissionsView , DataSetsView where DataSetsView.setid = EmissionsView.setid and EmissionsView.setid = '4614' ; NOTICE: QUERY PLAN: Aggregate (cost=5975.66..5975.66 rows=1 width=131) -> Hash Join (cost=5958.18..5975.58 rows=35 width=131) -> Hash Join (cost=5956.30..5972.99 rows=35 width=127) -> Hash Join (cost=5954.41..5970.40 rows=35 width=123) -> Seq Scan on reports (cost=0.00..11.78 rows=378 width=8) -> Hash (cost=5954.30..5954.30 rows=42 width=115) -> Nested Loop (cost=4375.32..5954.30 rows=42 width=115) -> Nested Loop (cost=4375.32..5812.95 rows=42 width=103) -> Seq Scan on footnotes (cost=0.00..1.01 rows=1 width=4) -> Materialize (cost=5811.31..5811.31 rows=42 width=99) -> Hash Join (cost=4375.32..5811.31 rows=42 width=99) -> Hash Join (cost=4371.95..5796.47 rows=307 width=77) -> Merge Join (cost=4370.73..5789.88 rows=307 width=50) -> Sort (cost=4370.73..4370.73 rows=28192 width=38) -> Hash Join (cost=24.06..1834.10 rows=28192 width=38) -> Hash Join (cost=22.17..1268.37 rows=28192width=34) -> Seq Scan on datasets (cost=0.00..724.86rows=33786 width=22) -> Hash (cost=21.23..21.23 rows=378 width=12) -> Hash Join (cost=1.89..21.23 rows=378width=12) -> Seq Scan on reports (cost=0.00..11.78rows=378 width=8) -> Hash (cost=1.71..1.71 rows=71width=4) -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) -> Hash (cost=1.71..1.71 rows=71 width=4) -> Seq Scan on areas (cost=0.00..1.71 rows=71width=4) -> Index Scan using lhsecgriemis_setid_idx on emissions (cost=0.00..1343.91rows=368 width=12) -> Hash (cost=1.18..1.18 rows=18 width=27) -> Seq Scan on datatypes (cost=0.00..1.18 rows=18 width=27) -> Hash (cost=3.09..3.09 rows=109 width=22) -> Seq Scan on sectordefinitions (cost=0.00..3.09 rows=109 width=22) -> Index Scan using datasets_setid_idx on datasets (cost=0.00..3.37 rows=1 width=12) -> Hash (cost=1.71..1.71 rows=71 width=4) -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) -> Hash (cost=1.71..1.71 rows=71 width=4) -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) EXPLAIN 2) emep=> explain select count(*) from EmissionsView , DataSetsView where DataSetsView.setid = EmissionsView.setid and DataSetsView.setid = '4614' ; NOTICE: QUERY PLAN: Aggregate (cost=91660.36..91660.36 rows=1 width=131) -> Hash Join (cost=2732.23..91660.35 rows=3 width=131) -> Nested Loop (cost=2719.53..91058.61 rows=117798 width=103) -> Seq Scan on footnotes (cost=0.00..1.01 rows=1 width=4) -> Materialize (cost=89290.63..89290.63 rows=117798 width=99) -> Hash Join (cost=2719.53..89290.63 rows=117798 width=99) -> Hash Join (cost=2716.17..56957.45 rows=863754 width=77) -> Seq Scan on emissions (cost=0.00..18502.34 rows=1035134 width=12) -> Hash (cost=2328.69..2328.69 rows=28192 width=65) -> Hash Join (cost=25.29..2328.69 rows=28192 width=65) -> Hash Join (cost=24.06..1834.10 rows=28192 width=38) -> Hash Join (cost=22.17..1268.37 rows=28192 width=34) -> Seq Scan on datasets (cost=0.00..724.86 rows=33786 width=22) -> Hash (cost=21.23..21.23 rows=378 width=12) -> Hash Join (cost=1.89..21.23 rows=378 width=12) -> Seq Scan on reports (cost=0.00..11.78 rows=378 width=8) -> Hash (cost=1.71..1.71 rows=71 width=4) -> Seq Scan on areas (cost=0.00..1.71 rows=71width=4) -> Hash (cost=1.71..1.71 rows=71 width=4) -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) -> Hash (cost=1.18..1.18 rows=18 width=27) -> Seq Scan on datatypes (cost=0.00..1.18 rows=18 width=27) -> Hash (cost=3.09..3.09 rows=109 width=22) -> Seq Scan on sectordefinitions (cost=0.00..3.09 rows=109 width=22) -> Hash (cost=12.70..12.70 rows=1 width=28) -> Hash Join (cost=10.62..12.70 rows=1 width=28) -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) -> Hash (cost=10.62..10.62 rows=1 width=24) -> Nested Loop (cost=0.00..10.62 rows=1 width=24) -> Nested Loop (cost=0.00..5.96 rows=1 width=16) -> Index Scan using datasets_setid_idx on datasets (cost=0.00..3.37 rows=1 width=12) -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) -> Index Scan using reports_pkey on reports (cost=0.00..4.64 rows=1 width=8) EXPLAIN
pgsql-general by date: