Re: Increasing pattern index query speed - Mailing list pgsql-performance
From | Andrus
Subject | Re: Increasing pattern index query speed
Date |
Msg-id | 91652AA4F32448288A88CD62CF84B330@andrusnotebook
In response to | Re: Increasing pattern index query speed (Scott Carey <scott@richrelevance.com>)
Responses | Re: Increasing pattern index query speed
List | pgsql-performance
Scott,

>My first thought on the query with a pattern being faster than the query
>with an exact value is that the planner does not have good enough
>statistics on that column. Without looking at the explain plans further, I
>would suggest trying something simple. The fact that it is faster on 8.3
>but slower on 8.1 may have to do with changes between versions, or may
>simply be due to luck in the statistics sampling.
>
>See if increasing the statistics target on that column significantly does
>anything:
>
>EXPLAIN (your query);
>ALTER TABLE orders_products ALTER COLUMN product_id SET STATISTICS 2000;
>ANALYZE orders_products;
>EXPLAIN (your query);
>
>2000 is simply a guess of mine for a value much larger than the default.
>This will generally make query planning slower but the system will have a
>lot more data about that column and the distribution of data in it. This
>should help stabilize the query performance.
>
>If this has an effect, the query plans will change.
>
>Your question below really boils down to something more simple:
>
> --Why is the most optimal query plan not chosen? This is usually due to
> either insufficient statistics or quirks in how the query planner works on
> a specific data set or with certain configuration options.

Thank you very much.

I found that

AND dok.kuupaev = date'2008-11-21'

runs fast, but

AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'

runs very slow.

explain SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode like '99000010%'
AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'

Plan with default statistics:

"Aggregate  (cost=17.86..17.87 rows=1 width=0)"
"  ->  Nested Loop  (cost=0.00..17.85 rows=1 width=0)"
"        ->  Nested Loop  (cost=0.00..11.84 rows=1 width=24)"
"              Join Filter: ("outer".dokumnr = "inner".dokumnr)"
"              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..5.81 rows=1 width=4)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))"
"              ->  Index Scan using rid_toode_pattern_idx on rid  (cost=0.00..6.01 rows=1 width=28)"
"                    Index Cond: ((toode ~>=~ '99000010'::bpchar) AND (toode ~<~ '99000011'::bpchar))"
"                    Filter: (toode ~~ '99000010%'::text)"
"        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1 width=24)"
"              Index Cond: ("outer".toode = toode.toode)"

After the statistics target is changed, the query runs fast (70 ... 1000 ms):

ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000;
analyze rid;

explain analyze SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode like '99000010%'
AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'

"Aggregate  (cost=27.04..27.05 rows=1 width=0) (actual time=44.830..44.834 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.00..27.04 rows=1 width=0) (actual time=0.727..44.370 rows=108 loops=1)"
"        ->  Nested Loop  (cost=0.00..21.02 rows=1 width=24) (actual time=0.688..40.519 rows=108 loops=1)"
"              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..5.81 rows=1 width=4) (actual time=0.027..8.094 rows=1678 loops=1)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))"
"              ->  Index Scan using rid_dokumnr_idx on rid  (cost=0.00..15.20 rows=1 width=28) (actual time=0.011..0.011 rows=0 loops=1678)"
"                    Index Cond: ("outer".dokumnr = rid.dokumnr)"
"                    Filter: (toode ~~ '99000010%'::text)"
"        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1 width=24) (actual time=0.016..0.020 rows=1 loops=108)"
"              Index Cond: ("outer".toode = toode.toode)"
"Total runtime: 45.050 ms"

It seems that you are a genius.
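As a sanity check, the raised per-column target and the statistics ANALYZE actually gathered can be confirmed from the system catalogs. A minimal sketch, using the rid/toode names from the plans above; pg_attribute and the pg_stats view are standard PostgreSQL catalogs, and -1 in attstattarget means the server-wide default is in use:

-- confirm the per-column statistics target
SELECT attname, attstattarget
  FROM pg_attribute
 WHERE attrelid = 'rid'::regclass AND attname = 'toode';

-- inspect what the planner now knows about the column's distribution
SELECT n_distinct, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'rid' AND attname = 'toode';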
I used 1000 since the documentation says the maximum value is 1000.

The rid table contains 3.5 million rows, will grow by 1 million rows per year, and is updated frequently, mostly by inserts.

Is it OK to leave the SET STATISTICS 1000 setting on this column, or should I try to decrease it?

Andrus.
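A minimal follow-up sketch for this trade-off: a larger target makes ANALYZE and query planning somewhat more expensive, so if that ever becomes noticeable the per-column setting can be lowered or reverted to the server default (the value 100 below is illustrative, not a recommendation):

-- try a smaller per-column target (100 is an illustrative value)
ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 100;
ANALYZE rid;

-- or revert to the server-wide default_statistics_target
ALTER TABLE rid ALTER COLUMN toode SET STATISTICS -1;
ANALYZE rid;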