Re: Increasing pattern index query speed - Mailing list pgsql-performance

From Andrus
Subject Re: Increasing pattern index query speed
Msg-id 91652AA4F32448288A88CD62CF84B330@andrusnotebook
In response to Re: Increasing pattern index query speed  (Scott Carey <scott@richrelevance.com>)
List pgsql-performance
Scott,

>My first thought on the query with a pattern being faster than the query
>with an exact value is that the planner does not have good enough
>statistics on that column.  Without looking at the explain plans further, I
>would suggest trying something simple.  The fact that it is faster on 8.3
>but slower on 8.1 may have to do with changes between versions, or may
>simply be due to luck in the statistics sampling.
>See if increasing the statistics target on that column significantly does
>anything:
>EXPLAIN (your query);
>ALTER TABLE orders_products ALTER COLUMN product_id SET STATISTICS 2000;
>ANALYZE orders_products;
>EXPLAIN (your query);
>2000 is simply a guess of mine for a value much larger than the default.
>This will generally make query planning slower but the system will have a
>lot more data about that column and the distribution of data in it.  This
>should help stabilize the query performance.
>If this has an effect, the query plans will change.
>Your question below really boils down to something more simple:
> --Why is the most optimal query plan not chosen?  This is usually due to
> either insufficient statistics or quirks in how the query planner works on
> a specific data set or with certain configuration options.

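As a side note, the current per-column target can be checked from pg_attribute (a
sketch against the rid.toode column discussed in this thread; -1 means the column
just uses default_statistics_target):

SELECT attname, attstattarget   -- -1 = column uses default_statistics_target
  FROM pg_attribute
 WHERE attrelid = 'rid'::regclass
   AND attname  = 'toode';
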
Thank you very much.
I found that AND dok.kuupaev = date'2008-11-21' runs fast but
AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21' runs very slow.

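For comparison, the fast equality form differs only in the date predicate:

SELECT sum(1)
   FROM dok JOIN rid USING (dokumnr)
   JOIN toode USING (toode)
   WHERE rid.toode like '99000010%'
   AND dok.kuupaev = date'2008-11-21'
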
explain SELECT sum(1)
   FROM dok JOIN rid USING (dokumnr)
   JOIN toode USING (toode)
   WHERE rid.toode like '99000010%'
   AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'

plan with default statistics:

"Aggregate  (cost=17.86..17.87 rows=1 width=0)"
"  ->  Nested Loop  (cost=0.00..17.85 rows=1 width=0)"
"        ->  Nested Loop  (cost=0.00..11.84 rows=1 width=24)"
"              Join Filter: ("outer".dokumnr = "inner".dokumnr)"
"              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..5.81
rows=1 width=4)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND
(kuupaev <= '2008-11-21'::date))"
"              ->  Index Scan using rid_toode_pattern_idx on rid
(cost=0.00..6.01 rows=1 width=28)"
"                    Index Cond: ((toode ~>=~ '99000010'::bpchar) AND (toode
~<~ '99000011'::bpchar))"
"                    Filter: (toode ~~ '99000010%'::text)"
"        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1
width=24)"
"              Index Cond: ("outer".toode = toode.toode)"

After the statistics target is changed, the query runs fast (70 ... 1000 ms):

ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000;
analyze rid;
explain analyze SELECT sum(1)
   FROM dok JOIN rid USING (dokumnr)
   JOIN toode USING (toode)
   WHERE rid.toode like '99000010%'
   AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'
"Aggregate  (cost=27.04..27.05 rows=1 width=0) (actual time=44.830..44.834
rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.00..27.04 rows=1 width=0) (actual
time=0.727..44.370 rows=108 loops=1)"
"        ->  Nested Loop  (cost=0.00..21.02 rows=1 width=24) (actual
time=0.688..40.519 rows=108 loops=1)"
"              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..5.81
rows=1 width=4) (actual time=0.027..8.094 rows=1678 loops=1)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND
(kuupaev <= '2008-11-21'::date))"
"              ->  Index Scan using rid_dokumnr_idx on rid
(cost=0.00..15.20 rows=1 width=28) (actual time=0.011..0.011 rows=0
loops=1678)"
"                    Index Cond: ("outer".dokumnr = rid.dokumnr)"
"                    Filter: (toode ~~ '99000010%'::text)"
"        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1
width=24) (actual time=0.016..0.020 rows=1 loops=108)"
"              Index Cond: ("outer".toode = toode.toode)"
"Total runtime: 45.050 ms"

It seems that you are a genius.

I used 1000 since the docs say that the maximum value is 1000.

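For what it is worth, one way to see what the larger target actually gave the
planner is to look at pg_stats after the ANALYZE (a sketch; the most_common_vals
and histogram_bounds lists grow with the statistics target):

SELECT n_distinct, most_common_vals, histogram_bounds
  FROM pg_stats
 WHERE tablename = 'rid' AND attname = 'toode';
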
The rid table contains 3.5 million rows, grows by 1 million rows per year, and is
updated frequently, mostly by inserts.

Is it OK to leave the

SET STATISTICS 1000;

setting on this column, or should I try to decrease it?
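
If the higher target ever proves too costly for planning, I understand it can be
lowered or reset to the default at any time, e.g.:

ALTER TABLE rid ALTER COLUMN toode SET STATISTICS -1;  -- -1 = back to default_statistics_target
ANALYZE rid;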

Andrus.

