Re: Slow query with planner row strange estimation - Mailing list pgsql-performance

From damien hostin
Subject Re: Slow query with planner row strange estimation
Date
Msg-id 4C318983.6040600@axege.com
Whole thread Raw
In response to Slow query with planner row strange estimation  (damien hostin <damien.hostin@axege.com>)
Responses Re: Slow query with planner row strange estimation  (damien hostin <damien.hostin@axege.com>)
List pgsql-performance
Hello,

Before the week end I tried to change the index, but even with the
mono-column index on differents columns, the estimated number of rows
from dwhinv is 1.

Anyone have a suggestion, what can I check ?


thx


damien hostin a écrit :
> Hello,
>
> I try to make a query run quicker but I don't really know how to give
> hints to the planner.
>
> We are using postgresql 8.4.3 64bit on ubuntu 9.10 server. The
> hardware is a 10 SAS drive (15k) on a single RAID 10 array with 8Go RAM.
> Queries come from J2EE application (OLAP cube), but running them in
> pg_admin perform the same way.
>
> I made a short example that shows what I think is the problem. The
> real query is much longer but with only one join it already cause
> problems.
>
> Here is the short example :
>
> select rfoadv_8.rfoadvsup as c8,
>    sum(dwhinv.dwhinvqte) as m0
> from
>    dwhinv as dwhinv,
>    rfoadv as rfoadv_8
> where (dwhinv.dwhinv___rforefide = 'HPLUS'
>  and  (dwhinv.dwhinv___rfodomide = 'PMSI'  and
> dwhinv.dwhinv___rfoindrvs = '1' and
> dwhinv.dwhinv___rfoindide='recN3_BB_reel') )
>  and  dwhinv.dwhinv_p2rfodstide = rfoadv_8.rfoadvinf
>  and rfoadv_8.rfoadvsup = 'ACTI'
> group by rfoadv_8.rfoadvsup
>
> dwhinv is a table with almost 6.000.000 records
> rfoadv is a view with 800.000 records
> rfoadv is based on rfoade which is 50.000 records
>
> Here is the explain analyse :
> GroupAggregate  (cost=0.00..16.56 rows=1 width=13) (actual
> time=2028.452..2028.453 rows=1 loops=1)
>  ->  Nested Loop  (cost=0.00..16.54 rows=1 width=13) (actual
> time=0.391..1947.432 rows=42664 loops=1)
>        Join Filter: (((ade2.rfoadegch)::text >=
> (ade1.rfoadegch)::text) AND ((ade2.rfoadedrt)::text <=
> (ade1.rfoadedrt)::text))
>        ->  Nested Loop  (cost=0.00..12.54 rows=1 width=214) (actual
> time=0.304..533.281 rows=114350 loops=1)
>              ->  Index Scan using dwhinv_rdi_idx on dwhinv
> (cost=0.00..4.87 rows=1 width=12) (actual time=0.227..16.827 rows=6360
> loops=1)
>                    Index Cond: (((dwhinv___rforefide)::text =
> 'HPLUS'::text) AND ((dwhinv___rfodomide)::text = 'PMSI'::text) AND
> ((dwhinv___rfoindide)::text = 'recN3_BB_reel'::text) AND
> (dwhinv___rfoindrvs = 1))
>              ->  Index Scan using rfoade_dsi_idx on rfoade ade2
> (cost=0.00..7.63 rows=3 width=213) (actual time=0.007..0.037 rows=18
> loops=6360)
>                    Index Cond: ((ade2.rfoade_i_rfodstide)::text =
> (dwhinv.dwhinv_p2rfodstide)::text)
>        ->  Index Scan using rfoade_pk on rfoade ade1  (cost=0.00..3.98
> rows=1 width=213) (actual time=0.008..0.009 rows=0 loops=114350)
>              Index Cond: (((ade1.rfoade___rforefide)::text =
> (ade2.rfoade___rforefide)::text) AND ((ade1.rfoade_i_rfodstide)::text
> = 'ACTI'::text) AND ((ade1.rfoade___rfovdeide)::text =
> (ade2.rfoade___rfovdeide)::text) AND (ade1.rfoadervs = ade2.rfoadervs))
>
> We can see that the planner think that accessing dwhinv with the
> dwhinv_rdi_idx index will return 1 row, but in fact there are 6360. So
> the nested loop is not done with 1 loop but 6360. With only one Join,
> the query runs in about 1.5 sec which is not really long, but with 8
> join, the same mistake is repeated 8 times, the query runs in 30-60
> sec. I try to disable nested loop, hash join and merge join are done
> instead of nested loops, example query runs in 0.2 - 0.5 sec, and the
> real query no more that 1 sec ! Which is great.
>
> Here is the execution plan with nested loop off:
>
> GroupAggregate  (cost=12.56..2453.94 rows=1 width=13) (actual
> time=817.306..817.307 rows=1 loops=1)
>  ->  Hash Join  (cost=12.56..2453.93 rows=1 width=13) (actual
> time=42.583..720.746 rows=42664 loops=1)
>        Hash Cond: (((ade2.rfoade___rforefide)::text =
> (ade1.rfoade___rforefide)::text) AND ((ade2.rfoade___rfovdeide)::text
> = (ade1.rfoade___rfovdeide)::text) AND (ade2.rfoadervs = ade1.rfoadervs))
>        Join Filter: (((ade2.rfoadegch)::text >=
> (ade1.rfoadegch)::text) AND ((ade2.rfoadedrt)::text <=
> (ade1.rfoadedrt)::text))
>        ->  Hash Join  (cost=4.88..2446.21 rows=1 width=214) (actual
> time=42.168..411.962 rows=114350 loops=1)
>              Hash Cond: ((ade2.rfoade_i_rfodstide)::text =
> (dwhinv.dwhinv_p2rfodstide)::text)
>              ->  Seq Scan on rfoade ade2  (cost=0.00..2262.05
> rows=47805 width=213) (actual time=0.057..78.988 rows=47805 loops=1)
>              ->  Hash  (cost=4.87..4.87 rows=1 width=12) (actual
> time=41.632..41.632 rows=6360 loops=1)
>                    ->  Index Scan using dwhinv_rdi_idx on dwhinv
> (cost=0.00..4.87 rows=1 width=12) (actual time=0.232..28.199 rows=6360
> loops=1)
>                          Index Cond: (((dwhinv___rforefide)::text =
> 'HPLUS'::text) AND ((dwhinv___rfodomide)::text = 'PMSI'::text) AND
> ((dwhinv___rfoindide)::text = 'recN3_BB_reel'::text) AND
> (dwhinv___rfoindrvs = 1))
>        ->  Hash  (cost=7.63..7.63 rows=3 width=213) (actual
> time=0.347..0.347 rows=11 loops=1)
>              ->  Index Scan using rfoade_dsi_idx on rfoade ade1
> (cost=0.00..7.63 rows=3 width=213) (actual time=0.095..0.307 rows=11
> loops=1)
>                    Index Cond: ((rfoade_i_rfodstide)::text =
> 'ACTI'::text)
>
> Even if dwhinv row estimation is wrong, the query is quicker
>
>
> So after looking at dwhinv_rdi_idx statistics, I found that
> dwhinv___rfoindide related stats wasn't good, so I try "ALTER TABLE
> dwhinv ALTER dwhinv_p2rfodstide SET STATISTICS 2000" and launch an
> vaccum analyse to gather more impressive stats. Stats are better but
> query plan is the same and query is not optimised. So I try reindex on
> DWHINV as a last chance, but it changes nothing !
>
> Maybe I'm wrong with the interpretation of the plan but I don't really
> think so because with no nested loops this query is really fast ! I do
> not plan to disable nested loop on the whole database because
> sometimes, nested loops are greats !
>
> Now I'm stuck ! I don't know how to make the planner understand there
> are 6000 rows. Or maybe the 3 column index is a bad idea... ?!
>
> Thanks
>


--
HOSTIN Damien - Equipe R&D
Tel:+33(0)4 63 05 95 40
Société Axège
23 rue Saint Simon
63000 Clermont Ferrand
www.axege.com




pgsql-performance by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Performance issues with postgresql-8.4.0
Next
From: "Octavio Alvarez"
Date:
Subject: Two "equivalent" WITH RECURSIVE queries, one of them slow.