Re: vacuum analyze slows sql query - Mailing list pgsql-performance

From patrick ~
Subject Re: vacuum analyze slows sql query
Date
Msg-id 20041108185702.99883.qmail@web52103.mail.yahoo.com
Whole thread Raw
In response to Re: vacuum analyze slows sql query  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: vacuum analyze slows sql query  (John Meinel <john@johnmeinel.com>)
List pgsql-performance
Sorry for the late reply.  Was feeling a bit under the weather
this weekend and didn't get a chance to look at this.


--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

> patrick ~ <sidsrr@yahoo.com> writes:
> >  PREPARE pkk_00 ( integer ) <the def of pkk_offer_has_pending_purc( integer
> )
>
> This is what you want to do, but not quite like that.  The PREPARE
> determines the plan and so VACUUMing and re-EXECUTing is going to show
> the same plan.  What we need to look at is
>     - standing start
>     PREPARE pkk_00 ...
>     EXPLAIN ANALYZE EXECUTE pkk_00 ...
>     VACUUM ANALYZE;
>     PREPARE pkk_01 ...
>     EXPLAIN ANALYZE EXECUTE pkk_01 ...

But of course!  I feel a bit silly now.

This is what I get after following Tom's directions:

pkk=# prepare pkk_00 ( integer ) as select ...
PREPARE
Time: 1.753 ms
pkk=# execute pkk_00(    241 );
 case
------
 f
(1 row)

Time: 0.788 ms
pkk=# explain analyze execute pkk_00(    241 );
      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=10.73..10.74 rows=1 width=0) (actual time=0.067..0.068 rows=1
loops=1)
   InitPlan
     ->  Limit  (cost=0.00..10.73 rows=1 width=4) (actual time=0.055..0.055
rows=0 loops=1)
           ->  Index Scan using pur_offer_id_idx on pkk_purchase p0
(cost=0.00..20690.18 rows=1929 width=4) (actual time=0.052..0.052 rows=0
loops=1)
                 Index Cond: (offer_id = $1)
                 Filter: ((((expire_time)::timestamp with time zone > now()) OR
(expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR
(pending = true)))
 Total runtime: 0.213 ms
(7 rows)

Time: 24.654 ms
pkk=# vacuum analyze ;
VACUUM
Time: 128826.078 ms
pkk=# prepare pkk_01 ( integer ) as select ...
PREPARE
Time: 104.658 ms
pkk=# execute pkk_01(    241 );
 case
------
 f
(1 row)

Time: 7652.708 ms
pkk=# explain analyze execute pkk_01(    241 );
      QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=2.66..2.67 rows=1 width=0) (actual time=2872.211..2872.213
rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..2.66 rows=1 width=4) (actual
time=2872.189..2872.189 rows=0 loops=1)
           ->  Seq Scan on pkk_purchase p0  (cost=0.00..37225.83 rows=13983
width=4) (actual time=2872.180..2872.180 rows=0 loops=1)
                 Filter: ((offer_id = $1) AND (((expire_time)::timestamp with
time zone > now()) OR (expire_time IS NULL) OR (pending = true)) AND
((cancel_date IS NULL) OR (pending = true)))
 Total runtime: 2872.339 ms
(6 rows)

Time: 2873.479 ms


So it looks like after the VACCUM the planner resorts to Seq Scan
rather than Index Scan.

This is because of the value of correlation field in pg_stats
(according to PostgreSQL docs) being closer to 0 rather than
�1:

pkk=# select tablename,attname,correlation from pg_stats where tablename =
'pkk_purchase' and attname = 'offer_id' ;
  tablename   | attname  | correlation
--------------+----------+-------------
 pkk_purchase | offer_id |    0.428598
(1 row)


So I started to experiment with ALTER TABLE SET STATISTICS
values to see which gets the correlation closer to �1.  The
trend seems to indicat the higher the stat value is set it
pushes the correlation value closer to 0:

set statistics   correlation
----------------------------
         800     0.393108
         500     0.408137
         200     0.43197
          50     0.435211
           1     0.45758

And a subsequent PREPARE and EXPLAIN ANALYZE confirms that
the Planer reverts back to using the Index Scan after setting
stats to 1 (even though correlation value is still closer
to 0 than 1):

pkk=# explain analyze execute pkk_02(    241 );

 QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=2.95..2.96 rows=1 width=0) (actual time=0.068..0.069 rows=1
loops=1)
   InitPlan
     ->  Limit  (cost=0.00..2.95 rows=1 width=4) (actual time=0.056..0.056
rows=0 loops=1)
           ->  Index Scan using pur_offer_id_idx on pkk_purchase p0
(cost=0.00..35810.51 rows=12119 width=4) (actual time=0.053..0.053 rows=0
loops=1)
                 Index Cond: (offer_id = $1)
                 Filter: ((((expire_time)::timestamp with time zone > now()) OR
(expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR
(pending = true)))
 Total runtime: 0.200 ms
(7 rows)



So, is this the ultimate solution to this issue?

--patrick



__________________________________
Do you Yahoo!?
Check out the new Yahoo! Front Page.
www.yahoo.com



pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Question regarding the file system
Next
From: John Meinel
Date:
Subject: Re: vacuum analyze slows sql query