Re: Huge Performance Difference on Similar Query in Pg7.2 - Mailing list pgsql-general

From Heiko Klein
Subject Re: Huge Performance Difference on Similar Query in Pg7.2
Date
Msg-id 15515.22769.371394.645257@polar.oslo.dnmi.no
Whole thread Raw
In response to Re: Huge Performance Difference on Similar Query in Pg7.2  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general

Tom Lane writes:
 > Unfortunately the planner is not very bright about transitivity of
 > equals and so does not realize that it could derive EmissionsView.setid
 > = '4614' from the other two conditions.  I imagine the reason Oracle
 > gives similar behavior for both queries is that it does expend the
 > cycles to make that deduction.  (This is a guess though; it'd be
 > interesting to see their query plans.)

I don't know how to retrieve the query plan from oracle7. Any hints and
I will send them

 >
 > Given the complexity of the plans, it may just be that the planner is
 > switching into GEQO search mode and is failing to find the best plan.
 > You might consider setting geqo_threshold larger than the number of
 > base tables in the query (looks like 12, here) to see if better plans
 > emerge.

Wow, thats impressive. By changing the geqo_threshold from 11 to 12, I
reduced time from 70000ms to 700ms. Increasing step by step it stay
constant until 15, then I have 70000ms at 16 and 17, and drop to 200ms
at 18 and stay between 100 and 150 when I increase it further. So why
not using a very high geqo_threshold (maybe 100) as threshold? (See
new query-plan below with geqo_threshold = 20)

 > It'd also be worth asking whether you couldn't simplify your views ;-).
 > Do you really require a 12-way join to do whatever it is you're doing?
 > I'm pretty amazed that the system is able to find a good plan for either
 > query ...

Exactly that was the problems, all queries where quite slow. This one
was only one I used very often. But I cannot do everything at once. I
just convinced my boss to switch from Oracle to Postgres, and that this
wouldn't take a long time. So the frontend is important for the views
(it asks for them), and I may not change this before I haven't managed to
converert to Postgres smoothly. Short: its politics

But thanks a lot for your help, that was really usefull and a huge step
forward for me.

Heiko


emep=> explain analyze select  count(*) from  LowHiSectorGridEmissionsView ,    DataSetsView where  DataSetsView.setid
='4614' and DataSetsView.setid = LowHiSectorGridEmissionsView.setid ; 
NOTICE:  QUERY PLAN:

Aggregate  (cost=1992.56..1992.56 rows=1 width=131) (actual time=112.80..112.80 rows=1 loops=1)
  ->  Nested Loop  (cost=1989.38..1992.55 rows=4 width=131) (actual time=105.12..111.02 rows=1606 loops=1)
        ->  Seq Scan on footnotes  (cost=0.00..1.01 rows=1 width=4) (actual time=0.03..0.04 rows=1 loops=1)
        ->  Materialize  (cost=1991.49..1991.49 rows=4 width=127) (actual time=105.06..106.01 rows=1606 loops=1)
              ->  Hash Join  (cost=1989.38..1991.49 rows=4 width=127) (actual time=75.34..102.23 rows=1606 loops=1)
                    ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual time=0.02..0.19 rows=71 loops=1)
                    ->  Hash  (cost=1989.37..1989.37 rows=4 width=123) (actual time=75.15..75.15 rows=0 loops=1)
                          ->  Hash Join  (cost=1984.27..1989.37 rows=4 width=123) (actual time=42.51..71.05 rows=1606
loops=1)
                                ->  Seq Scan on sectordefinitions  (cost=0.00..3.09 rows=109 width=22) (actual
time=0.01..0.35rows=109 loops=1) 
                                ->  Hash  (cost=1984.20..1984.20 rows=26 width=101) (actual time=30.29..30.29 rows=0
loops=1)
                                      ->  Nested Loop  (cost=22.01..1984.20 rows=26 width=101) (actual time=1.02..25.55
rows=1606loops=1) 
                                            ->  Hash Join  (cost=22.01..24.08 rows=1 width=89) (actual time=0.97..1.19
rows=1loops=1) 
                                                  ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual
time=0.01..0.16rows=71 loops=1) 
                                                  ->  Hash  (cost=22.00..22.00 rows=1 width=85) (actual time=0.87..0.87
rows=0loops=1) 
                                                        ->  Nested Loop  (cost=16.07..22.00 rows=1 width=85) (actual
time=0.81..0.87rows=1 loops=1) 
                                                              ->  Hash Join  (cost=16.07..17.35 rows=1 width=77)
(actualtime=0.78..0.84 rows=1 loops=1) 
                                                                    ->  Seq Scan on datatypes  (cost=0.00..1.18 rows=18
width=27)(actual time=0.01..0.06 rows=18 loops=1) 
                                                                    ->  Hash  (cost=16.07..16.07 rows=1 width=50)
(actualtime=0.69..0.69 rows=0 loops=1) 
                                                                          ->  Nested Loop  (cost=8.02..16.07 rows=1
width=50)(actual time=0.31..0.68 rows=1 loops=1) 
                                                                                ->  Nested Loop  (cost=8.02..12.69
rows=1width=28) (actual time=0.28..0.65 rows=1 loops=1) 
                                                                                      ->  Hash Join  (cost=8.02..10.09
rows=1width=24) (actual time=0.21..0.39 rows=1 loops=1) 
                                                                                            ->  Seq Scan on areas
(cost=0.00..1.71rows=71 width=4) (actual time=0.01..0.14 rows=71 loops=1) 
                                                                                            ->  Hash  (cost=8.01..8.01
rows=1width=20) (actual time=0.11..0.11 rows=0 loops=1) 
                                                                                                  ->  Nested Loop
(cost=0.00..8.01rows=1 width=20) (actual time=0.10..0.11 rows=1 loops=1) 
                                                                                                        ->  Index Scan
usingdatasets_setid_idx on datasets  (cost=0.00..3.36 rows=1 width=12) (actual time=0.06..0.06 rows=1 loops=1) 
                                                                                                        ->  Index Scan
usingreports_pkey on reports  (cost=0.00..4.64 rows=1 width=8) (actual time=0.03..0.03 rows=1 loops=1) 
                                                                                      ->  Seq Scan on areas
(cost=0.00..1.71rows=71 width=4) (actual time=0.00..0.16 rows=71 loops=1) 
                                                                                ->  Index Scan using datasets_setid_idx
ondatasets  (cost=0.00..3.36 rows=1 width=22) (actual time=0.02..0.02 rows=1 loops=1) 
                                                              ->  Index Scan using reports_pkey on reports
(cost=0.00..4.64rows=1 width=8) (actual time=0.01..0.01 rows=1 loops=1) 
                                            ->  Index Scan using lhsecgriemis_setid_idx on lowhisectorgridemissions
(cost=0.00..1953.36rows=541 width=12) (actual time=0.04..7.30 rows=1606 loops=1) 
Total runtime: 114.50 msec

EXPLAIN

pgsql-general by date:

Previous
From: Jochem van Dieten
Date:
Subject: Re: Newbie question - Which Linux?
Next
From: Stephan Szabo
Date:
Subject: Re: Huge Performance Difference on Similar Query in Pg7.2