Re: Moving postgresql.conf tunables into 2003... - Mailing list pgsql-performance

From Sean Chittenden
Subject Re: Moving postgresql.conf tunables into 2003...
Date
Msg-id 20030805222609.GF94710@perrin.int.nxad.com
Whole thread Raw
In response to Re: Moving postgresql.conf tunables into 2003...  (Manfred Koizar <mkoi-pg@aon.at>)
Responses Re: Moving postgresql.conf tunables into 2003...
Use of multipart index with "IN"
List pgsql-performance
> >> is some other problem that needs to be solved.  (I'd wonder about
> >> index correlation myself; we know that that equation is pretty
> >> bogus.)
> >
> >Could be.  I had him create a multi-column index on the date and a
> >non-unique highly redundant id.
>
> Tom has already suspected index correlation to be a possible source
> of the problem and recommended to CLUSTER on the index.  A weakness
> of the current planner implementation is that a multi column index
> is always thought to have low correlation.  In your case even after
> CLUSTER the 2-column index on (date, sensorid) is treated like a
> single column index with correlation 0.5.

Howdy.  Well, I got far enough with the guy in the testing to figure
out that it wasn't a single vs multi-column index problem, however I
haven't heard back from him regarding the use of CLUSTER.  Ce est la
IRC.  :-p

> I have an experimental patch lying around somewhere that tries to
> work around these problems by offering different estimation methods
> for index scans.  If you are interested, I'll dig it out.

Sure, I'll take a gander... had my head in enough Knuth recently to
even hopefully have some kind of a useful response to the patch.

> In the meantime have him try with a single column index on date.

Been there, done that: no change.

> |That's one heck of a poor estimate for the number of rows returned.
> |
> |> ->  Seq Scan on mss_fwevent  (cost=0.00..223312.60 rows=168478 width=12)
> |                        (actual time=24253.66..24319.87 rows=320 loops=1)
>
> >  ->  Index Scan using mss_fwevent_evtime_sensorid_idx on mss_fwevent
> >             (cost=0.00..2442524.70 rows=168478 width=12)
> >            (actual time=68.36..132.84 rows=320 loops=1)
> >      Index Cond: ((evtime > (now() - '06:00'::interval)) AND (sensorid = 7))
> >      Filter: (NOT "action")
>
> Estimated number of rows being wrong by a factor 500 seems to be the
> main problem hiding everything else.  With statistics already set to
> 1000, does this mean that sensorid, evtime, and action are not
> independent?  It'd be interesting to know whether the estimation
> error comes from "Index Cond" or from "Filter".

Hrm...  sensorid is sequence and grows proportional with evtime,
obviously.  Action is a char(1) or something like that (ie: not
unique).  See the EXPLAIN ANALYZEs that I posted in msgid:
20030704000646.GV72567@perrin.int.nxad.com.. or at the bottom of this
msg.

Having spent a fair amount of time looking at the two following plans,
it seems as though an additional statistic is needed to change the
cost of doing an index lookup when the index is linearly ordered.
Whether CLUSTER does this or not, I don't know, I never heard back
from him after getting the runtime down to a few ms.  :-/ Are indexes
on linearly ordered data rebalanced somehow?  I thought CLUSTER only
reordered data on disk.  -sc



Plan for normal random_page_cost:

mss_masterlog=> SHOW random_page_cost;
 random_page_cost
------------------
 4
(1 row)

mss_masterlog=> EXPLAIN ANALYZE SELECT srca, COUNT(srca) FROM mss_fwevent WHERE
mss_masterlog-> sensorid = 7 AND evtime > (now() - '6 hours'::INTERVAL)
mss_masterlog-> AND NOT action GROUP BY srca ORDER BY COUNT DESC LIMIT 20;
                                                                      QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=240384.69..240384.74 rows=20 width=12) (actual time=24340.04..24340.39 rows=20 loops=1)
   ->  Sort  (cost=240384.69..240426.80 rows=16848 width=12) (actual time=24340.02..24340.14 rows=21 loops=1)
         Sort Key: count(srca)
         ->  Aggregate  (cost=237938.36..239201.95 rows=16848 width=12) (actual time=24322.84..24330.73 rows=23
loops=1)
               ->  Group  (cost=237938.36..238780.75 rows=168478 width=12) (actual time=24322.57..24328.45 rows=320
loops=1)
                     ->  Sort  (cost=237938.36..238359.55 rows=168478 width=12) (actual time=24322.55..24324.34
rows=320loops=1) 
                           Sort Key: srca
                           ->  Seq Scan on mss_fwevent  (cost=0.00..223312.60 rows=168478 width=12) (actual
time=24253.66..24319.87rows=320 loops=1) 
                                 Filter: ((sensorid = 7) AND (evtime > (now() - '06:00'::interval)) AND (NOT "action"))
 Total runtime: 24353.67 msec
(10 rows)


Plan for altered random_page_cost:

mss_masterlog=> SET random_page_cost = 0.32;
SET
mss_masterlog=> EXPLAIN ANALYZE SELECT srca, COUNT(srca) FROM mss_fwevent WHERE
mss_masterlog-> sensorid = 7 AND evtime > (now() - '6 hours'::INTERVAL)
mss_masterlog-> AND NOT action GROUP BY srca ORDER BY COUNT DESC LIMIT 20;
                                                                                      QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------
 Limit  (cost=227274.85..227274.90 rows=20 width=12) (actual time=28.42..28.75 rows=20 loops=1)
   ->  Sort  (cost=227274.85..227316.97 rows=16848 width=12) (actual time=28.41..28.52 rows=21 loops=1)
         Sort Key: count(srca)
         ->  Aggregate  (cost=224828.52..226092.11 rows=16848 width=12) (actual time=20.26..28.13 rows=23 loops=1)
               ->  Group  (cost=224828.52..225670.91 rows=168478 width=12) (actual time=19.99..25.86 rows=320 loops=1)
                     ->  Sort  (cost=224828.52..225249.72 rows=168478 width=12) (actual time=19.98..21.76 rows=320
loops=1)
                           Sort Key: srca
                           ->  Index Scan using mss_fwevent_evtime_sensorid_idx on mss_fwevent  (cost=0.00..210202.76
rows=168478width=12) (actual time=0.35..17.61 
rows=320 loops=1)
                                 Index Cond: ((evtime > (now() - '06:00'::interval)) AND (sensorid = 7))
                                 Filter: (NOT "action")
 Total runtime: 29.09 msec
(11 rows)

--
Sean Chittenden

pgsql-performance by date:

Previous
From: Christopher Browne
Date:
Subject: Re: Some vacuum & tuning help
Next
From: "Christopher Kings-Lynne"
Date:
Subject: Re: Some vacuum & tuning help