Huge Performance Difference on Similar Query in Pg7.2 - Mailing list pgsql-general

From Heiko Klein
Subject Huge Performance Difference on Similar Query in Pg7.2
Date
Msg-id 15514.62356.723349.852577@polar.oslo.dnmi.no
Whole thread Raw
Responses Re: Huge Performance Difference on Similar Query in Pg7.2
List pgsql-general
Hi,

I have two views:
LowHiSectorGridEmissionsView: > 1.000.000 rows
DataSetsView: > 30.000 rows

When I now perform the two similar (results equal) queries on those
rows:
1)
select count(*) from EmissionsView, DataSetsView
where DataSetsView.setid = EmissionsView.setid
  and EmissionsView.setid = '4614' ;
      -------------
2)
select count(*) from EmissionsView, DataSetsView
where DataSetsView.setid = EmissionsView.setid
  and DataSetsView.setid = '4614' ;
      ------------

I have a huge performance difference. 1) Makes a index search on
EmissionsView and a Seq scan on DataSetsView: time: 1-2s
2) Makes a index search on DataSetsView and a Seq scan on EmissionsView:
time: ~50s

The count(*) isn't the point here, the same is true for every other type
of select.

I'm just porting the database from a Ora7 on 2x200Mhz SGI machine to a
Pg7.2 on a 850Mhz Athlon/Linux PC. On Ora7, both queries take about the
same time: 1-2s. I cannot easily change all queries, since I got only
time to work on the database, not on the frontend, which creates the
queries.


Why is Pg7.2 using a Seq scan at all? (Even after a vacuum analyze!) And
if a Seq scan is needed: Why doesn't the query-planer change the query
to the much faster version itself? And most important: How should I work
on, is there a Pg internal solution for it, or do I have to go the hard
way and modify the frontend?

Heiko


Below are the explain results:
1)
emep=> explain select  count(*) from  EmissionsView ,    DataSetsView where  DataSetsView.setid = EmissionsView.setid
and    EmissionsView.setid = '4614' ; 
NOTICE:  QUERY PLAN:

Aggregate  (cost=5975.66..5975.66 rows=1 width=131)
  ->  Hash Join  (cost=5958.18..5975.58 rows=35 width=131)
        ->  Hash Join  (cost=5956.30..5972.99 rows=35 width=127)
              ->  Hash Join  (cost=5954.41..5970.40 rows=35 width=123)
                    ->  Seq Scan on reports  (cost=0.00..11.78 rows=378 width=8)
                    ->  Hash  (cost=5954.30..5954.30 rows=42 width=115)
                          ->  Nested Loop  (cost=4375.32..5954.30 rows=42 width=115)
                                ->  Nested Loop  (cost=4375.32..5812.95 rows=42 width=103)
                                      ->  Seq Scan on footnotes  (cost=0.00..1.01 rows=1 width=4)
                                      ->  Materialize  (cost=5811.31..5811.31 rows=42 width=99)
                                            ->  Hash Join  (cost=4375.32..5811.31 rows=42 width=99)
                                                  ->  Hash Join  (cost=4371.95..5796.47 rows=307 width=77)
                                                        ->  Merge Join  (cost=4370.73..5789.88 rows=307 width=50)
                                                              ->  Sort  (cost=4370.73..4370.73 rows=28192 width=38)
                                                                    ->  Hash Join  (cost=24.06..1834.10 rows=28192
width=38)
                                                                          ->  Hash Join  (cost=22.17..1268.37
rows=28192width=34) 
                                                                                ->  Seq Scan on datasets
(cost=0.00..724.86rows=33786 width=22) 
                                                                                ->  Hash  (cost=21.23..21.23 rows=378
width=12)
                                                                                      ->  Hash Join  (cost=1.89..21.23
rows=378width=12) 
                                                                                            ->  Seq Scan on reports
(cost=0.00..11.78rows=378 width=8) 
                                                                                            ->  Hash  (cost=1.71..1.71
rows=71width=4) 
                                                                                                  ->  Seq Scan on areas
(cost=0.00..1.71 rows=71 width=4) 
                                                                          ->  Hash  (cost=1.71..1.71 rows=71 width=4)
                                                                                ->  Seq Scan on areas  (cost=0.00..1.71
rows=71width=4) 
                                                              ->  Index Scan using lhsecgriemis_setid_idx on emissions
(cost=0.00..1343.91rows=368 width=12) 
                                                        ->  Hash  (cost=1.18..1.18 rows=18 width=27)
                                                              ->  Seq Scan on datatypes  (cost=0.00..1.18 rows=18
width=27)
                                                  ->  Hash  (cost=3.09..3.09 rows=109 width=22)
                                                        ->  Seq Scan on sectordefinitions  (cost=0.00..3.09 rows=109
width=22)
                                ->  Index Scan using datasets_setid_idx on datasets  (cost=0.00..3.37 rows=1 width=12)
              ->  Hash  (cost=1.71..1.71 rows=71 width=4)
                    ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4)
        ->  Hash  (cost=1.71..1.71 rows=71 width=4)
              ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4)

EXPLAIN





2)
emep=> explain select  count(*) from  EmissionsView ,    DataSetsView where  DataSetsView.setid = EmissionsView.setid
and    DataSetsView.setid = '4614' ; 
NOTICE:  QUERY PLAN:

Aggregate  (cost=91660.36..91660.36 rows=1 width=131)
  ->  Hash Join  (cost=2732.23..91660.35 rows=3 width=131)
        ->  Nested Loop  (cost=2719.53..91058.61 rows=117798 width=103)
              ->  Seq Scan on footnotes  (cost=0.00..1.01 rows=1 width=4)
              ->  Materialize  (cost=89290.63..89290.63 rows=117798 width=99)
                    ->  Hash Join  (cost=2719.53..89290.63 rows=117798 width=99)
                          ->  Hash Join  (cost=2716.17..56957.45 rows=863754 width=77)
                                ->  Seq Scan on emissions  (cost=0.00..18502.34 rows=1035134 width=12)
                                ->  Hash  (cost=2328.69..2328.69 rows=28192 width=65)
                                      ->  Hash Join  (cost=25.29..2328.69 rows=28192 width=65)
                                            ->  Hash Join  (cost=24.06..1834.10 rows=28192 width=38)
                                                  ->  Hash Join  (cost=22.17..1268.37 rows=28192 width=34)
                                                        ->  Seq Scan on datasets  (cost=0.00..724.86 rows=33786
width=22)
                                                        ->  Hash  (cost=21.23..21.23 rows=378 width=12)
                                                              ->  Hash Join  (cost=1.89..21.23 rows=378 width=12)
                                                                    ->  Seq Scan on reports  (cost=0.00..11.78 rows=378
width=8)
                                                                    ->  Hash  (cost=1.71..1.71 rows=71 width=4)
                                                                          ->  Seq Scan on areas  (cost=0.00..1.71
rows=71width=4) 
                                                  ->  Hash  (cost=1.71..1.71 rows=71 width=4)
                                                        ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4)
                                            ->  Hash  (cost=1.18..1.18 rows=18 width=27)
                                                  ->  Seq Scan on datatypes  (cost=0.00..1.18 rows=18 width=27)
                          ->  Hash  (cost=3.09..3.09 rows=109 width=22)
                                ->  Seq Scan on sectordefinitions  (cost=0.00..3.09 rows=109 width=22)
        ->  Hash  (cost=12.70..12.70 rows=1 width=28)
              ->  Hash Join  (cost=10.62..12.70 rows=1 width=28)
                    ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4)
                    ->  Hash  (cost=10.62..10.62 rows=1 width=24)
                          ->  Nested Loop  (cost=0.00..10.62 rows=1 width=24)
                                ->  Nested Loop  (cost=0.00..5.96 rows=1 width=16)
                                      ->  Index Scan using datasets_setid_idx on datasets  (cost=0.00..3.37 rows=1
width=12)
                                      ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4)
                                ->  Index Scan using reports_pkey on reports  (cost=0.00..4.64 rows=1 width=8)

EXPLAIN


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Postmaster processes running out of control?
Next
From: "Luis Alberto Amigo Navarro"
Date:
Subject: Re: Huge Performance Difference on Similar Query in Pg7.2