Re: optimizer tuning/forcing correct index use - Mailing list pgsql-sql

From Tom Lane
Subject Re: optimizer tuning/forcing correct index use
Date
Msg-id 160.1016559143@sss.pgh.pa.us
Whole thread Raw
In response to Re: optimizer tuning/forcing correct index use  (Kelly Burkhart <kelly@tradebotsystems.com>)
Responses Re: optimizer tuning/forcing correct index use  (Kelly Burkhart <kelly@tradebotsystems.com>)
List pgsql-sql
Kelly Burkhart <kelly@tradebotsystems.com> writes:
>> Offhand I am guessing that the table is fairly well ordered by fill_ts
>> and the planner is underestimating the effects of this.  There is a
>> provision in there to try to account for data ordering, but it's new
>> code in 7.2 and doubtless still needs refinement.
fill      | fill_ts               |         0 |         8 |     152655 | {"2001-10-22 15:28:07-05","2001-10-22
15:28:16-05","2001-10-2212:43:28-05","2001-08-13 08:49:19-05","2001-08-13 08:49:41-05","2001-09-25
16:13:41-05","2001-10-1009:04:33-05","2001-10-22 14:50:05-05","2001-10-31 14:05:43-06","2002-01-07 13:35:48-06"} |
{0.002,0.002,0.001,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}| {"2001-01-10
14:13:01-06","2001-07-1013:05:01-05","2001-09-10 09:26:08-05","2001-10-15 09:01:54-05","2001-11-02
09:49:58-06","2001-11-2809:36:33-06","2001-12-21 08:38:03-06","2002-01-15 09:34:59-06","2002-02-04
11:34:20-06","2002-02-2509:39:55-06","2002-03-08 14:43:10-06"} |   -0.546947
 

Hmm.  So the correlation of fill_ts with physical position is actually
negative, according to the analyze results.  Still, -0.54 represents
rather strong correlation which would reduce the cost of the index scan.

There was some discussion a couple weeks ago on the pgsql-bugs list about
changing the equation the planner uses to estimate the effects of
correlation order.  Are you interested in experimenting?  I previously
said:

: If you look in cost_index (see approx. lines 270-340 in
: src/backend/optimizer/path/costsize.c) you'll see that it computes
: access cost estimates for both the perfectly sequential case and
: the perfectly uncorrelated case, and then tries to interpolate
: between them.  I have reasonable faith in both of the endpoint
: estimation methods, but very little in the interpolation equation ---
: it was chosen on the spur of the moment and hasn't really been tested.
: 
: It might be interesting to replace csquared with just
: fabs(indexCorrelation) to see if the results are better.
        regards, tom lane


pgsql-sql by date:

Previous
From: Kelly Burkhart
Date:
Subject: Re: optimizer tuning/forcing correct index use
Next
From: Jean-Luc Lachance
Date:
Subject: How to create crude report with psql and/or plpgsql