Re: 8.1 planner problem ? - Mailing list pgsql-performance

From Heikki Linnakangas
Subject Re: 8.1 planner problem ?
Date
Msg-id 474BED96.2020208@enterprisedb.com
Whole thread Raw
In response to 8.1 planner problem ?  (Gianluca Alberici <gianluca@abinetworks.biz>)
List pgsql-performance
Gianluca Alberici wrote:
> I have ran into an interesting problem with 8.1 and i would like anybody
> to explain me if there's a problem with the planner or there's a problem
> with myself. In both cases a solution is welcome. The following query:
>
> SELECT sum(qty) FROM
>     _abi_main_pof_r ampr
>     inner join _abi_main_items ami on ampr.itemid=ami.id
>     inner join _abi_main_pof_t ampt on ampr.poftid=ampt.poftid
>     inner join _abi_main_contacts amc on ampt.contactid=amc.codest
> WHERE
>     ampt.doctypeid in ('BCA','BSC')
>     and amc.descr ilike '%SOCO%'
>     and ampr.type=0;
>
> is explain analyze'd as follows
>
> Aggregate  (cost=1220.65..1220.66 rows=1 width=4) (actual
> time=95937.824..95937.824 rows=1 loops=1)
>   ->  Nested Loop  (cost=0.00..1220.65 rows=1 width=4) (actual
> time=3695.250..95936.292 rows=1503 loops=1)
>         ->  Nested Loop  (cost=0.00..1214.64 rows=1 width=8) (actual
> time=3695.229..95924.697 rows=1503 loops=1)
>               Join Filter: ("inner".poftid = "outer".poftid)
>               ->  Nested Loop  (cost=0.00..79.16 rows=1 width=4) (actual
> time=0.039..22.547 rows=2437 loops=1)
>                     Join Filter: ("inner".contactid = "outer".codest)
>                     ->  Seq Scan on _abi_main_contacts amc
> (cost=0.00..1.29 rows=1 width=4) (actual time=0.029..0.034 rows=1 loops=1)
>                           Filter: ((descr)::text ~~* '%SOCO%'::text)
>                     ->  Seq Scan on _abi_main_pof_t ampt
> (cost=0.00..77.53 rows=27 width=8) (actual time=0.006..15.820 rows=2702
> loops=1)
>                           Filter: (((doctypeid)::text = 'BCA'::text) OR
> ((doctypeid)::text = 'BSC'::text))
>               ->  Seq Scan on _abi_main_pof_r ampr  (cost=0.00..1132.81
> rows=214 width=12) (actual time=0.034..35.986 rows=8271 loops=2437)
>                     Filter: ("type" = 0)

The problem is in the estimate of that seq scan of _abi_main_pof_r
table. The planner estimates that there's 214 rows with type=0, but in
reality there's 8271. That misestimate throws off the rest of the plan.

> ...The same query, but with a condition change as ampr.type != 1 instead
> of ampr.type=0
>
> SELECT sum(qty) FROM
>     _abi_main_pof_r ampr
>     inner join _abi_main_items ami on ampr.itemid=ami.id
>     inner join _abi_main_pof_t ampt on ampr.poftid=ampt.poftid
>     inner join _abi_main_contacts amc on ampt.contactid=amc.codest
> WHERE
>     ampt.doctypeid in ('BCA','BSC')
>     and amc.descr ilike '%SOCO%'
>     and ampr.type != 1;
>
>
> is explain analyze'd as follows:
>
> Aggregate  (cost=1446.13..1446.14 rows=1 width=4) (actual
> time=81.609..81.609 rows=1 loops=1)
>   ->  Nested Loop  (cost=77.60..1446.12 rows=2 width=4) (actual
> time=22.597..80.944 rows=1503 loops=1)
>         ->  Nested Loop  (cost=77.60..1434.12 rows=2 width=8) (actual
> time=22.577..72.785 rows=1503 loops=1)
>               Join Filter: ("inner".contactid = "outer".codest)
>               ->  Seq Scan on _abi_main_contacts amc  (cost=0.00..1.29
> rows=1 width=4) (actual time=0.030..0.036 rows=1 loops=1)
>                     Filter: ((descr)::text ~~* '%SOCO%'::text)
>               ->  Hash Join  (cost=77.60..1427.52 rows=425 width=12)
> (actual time=22.536..69.034 rows=8271 loops=1)
>                     Hash Cond: ("outer".poftid = "inner".poftid)
>                     ->  Seq Scan on _abi_main_pof_r ampr
> (cost=0.00..1132.81 rows=42571 width=12) (actual time=0.035..37.045
> rows=8271 loops=1)
>                           Filter: ("type" <> 1)

Here you can see that the planner is estimating that there's 42571 rows
with type <> 1, which is also way off the reality in the other
direction, but this time the plan it chooses happens to be better.

Run ANALYZE. The planner needs to have statistics on how many rows of
each type there is to come up with the proper plan.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

pgsql-performance by date:

Previous
From: Gianluca Alberici
Date:
Subject: 8.1 planner problem ?
Next
From: cluster
Date:
Subject: Query only slow on first run