Re: why is postgres estimating so badly? - Mailing list pgsql-hackers
From | Luis Alberto Amigo Navarro |
---|---|
Subject | Re: why is postgres estimating so badly? |
Date | |
Msg-id | 002701c22e33$b68b7dc0$cab990c1@atc.unican.es Whole thread Raw |
In response to | why is postgres estimating so badly? ("Luis Alberto Amigo Navarro" <lamigo@atc.unican.es>) |
List | pgsql-hackers |
> > AND part.name LIKE '%green%' > > It's difficult for the planner to produce a decent estimate for the > selectivity of an unanchored LIKE clause, since there are no statistics > it can use for the purpose. We recently changed FIXED_CHAR_SEL in > src/backend/utils/adt/selfuncs.c from 0.04 to 0.20, which would make > this particular case come out better. (I believe the estimate would > work out to about 320, if part is 200K rows; that should be enough to > produce at least some change of plan.) You could try patching your > local installation likewise. Here are the results, worse than before: NOTICE: QUERY PLAN: Sort (cost=25209.88..25209.88 rows=1 width=93) (actual time=1836143.78..1836144.48 rows=175 loops=1) -> Aggregate (cost=25209.85..25209.87 rows=1 width=93) (actual time=1803559.97..1836136.47 rows=175 loops=1) -> Group (cost=25209.85..25209.86 rows=2 width=93) (actual time=1803348.04..1816093.89 rows=325302 loops=1) -> Sort (cost=25209.85..25209.85 rows=2 width=93) (actual time=1803347.97..1804795.41 rows=325302 loops=1) -> Hash Join (cost=25208.43..25209.84 rows=2 width=93) (actual time=1744714.61..1772790.19 rows=325302 loops=1) -> Seq Scan on nation (cost=0.00..1.25rows=25 width=15) (actual time=13.92..14.84 rows=25 loops=1) -> Hash (cost=25208.42..25208.42 rows=2 width=78) (actual time=1744603.74..1744603.74 rows=0 loops=1) -> Nested Loop (cost=0.00..25208.42rows=2 width=78) (actual time=139.21..1740110.04 rows=325302 loops=1) -> Nested Loop (cost=0.00..25201.19 rows=2 width=70) (actual time=122.37..1687895.49 rows=325302 loops=1) -> NestedLoop (cost=0.00..25187.93 rows=4 width=62) (actual time=121.75..856097.27 rows=325302 loops=1) -> Nested Loop (cost=0.00..17468.91 rows=1280 width=24) (actual time=78.43..19698.77 rows=43424 loops=1) -> Seq Scan on part (cost=0.00..12399.00 rows=320 width=4) (actual time=29.57..4179.70 rows=10856 loops=1) -> Index Scan using partsupp_pkey on partsupp (cost=0.00..15.79 rows=4 width=20) (actual time=1.17..1.33 rows=4 loops=10856) -> Index Scan using l_partsupp_index on lineitem (cost=0.00..6.02 rows=1 width=38) (actual time=2.83..18.97 rows=7 loops=43424) -> Index Scan using orders_pkey on orders (cost=0.00..3.23 rows=1 width=8) (actual time=2.47..2.50 rows=1 loops=325302) -> Index Scan using supplier_pkey on supplier (cost=0.00..3.01 rows=1 width=8) (actual time=0.08..0.09 rows=1 loops=325302) Total runtime: 1836375.16 msec It looks even worse, another advice?, or maybe a query change. here is the query again: SELECTnation,o_year,CAST((sum(amount))AS NUMERIC(10,2))AS sum_profit FROM(SELECT nation.name AS nation, EXTRACT(year FROM orders.orderdate) AS o_year, lineitem.extendedprice*(1-lineitem.discount)-partsupp.supplycost*lineitem.qu antity AS amountFROM part, supplier, lineitem, partsupp, orders, nationWHERE supplier.suppkey=lineitem.suppkey AND partsupp.suppkey=lineitem.suppkeyAND partsupp.partkey=lineitem.partkey AND part.partkey=lineitem.partkey AND orders.orderkey=lineitem.orderkeyAND supplier.nationkey=nation.nationkey AND part.name LIKE '%green%') AS profit GROUP BYnation,o_year ORDER BYnation,o_year DESC; Thanks and regards
pgsql-hackers by date: