Re: Query using SeqScan instead of IndexScan - Mailing list pgsql-performance

From Brendan Duddridge
Subject Re: Query using SeqScan instead of IndexScan
Date
Msg-id 95C1A48B-A76F-4F9B-B5EE-0BFAF8873067@clickspace.com
Whole thread Raw
In response to Re: Query using SeqScan instead of IndexScan  ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses Re: Query using SeqScan instead of IndexScan
List pgsql-performance
Hi Jim,

from SELECT * FROM pg_stats WHERE tablename='table' AND
attname='category_id'

I find correlation on category_product for category_id is 0.643703

Would setting the index on category_id to be clustered help with this?

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 |  brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 1, 2006, at 8:32 AM, Jim C. Nasby wrote:

> On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote:
>> On 4/1/06, Brendan Duddridge <brendan@clickspace.com> wrote:
>>> Hi Jim,
>>>
>>> I'm not quite sure what you mean by the correlation of category_id?
>>
>> It means how many distinct values does it have (at least that's my
>> understanding of it ;) ).
>
> Your understanding is wrong. :) What you're discussing is n_distinct.
>
> http://www.postgresql.org/docs/8.1/interactive/view-pg-stats.html
>
> correlation: "Statistical correlation between physical row ordering
> and
> logical ordering of the column values. This ranges from -1 to +1. When
> the value is near -1 or +1, an index scan on the column will be
> estimated to be cheaper than when it is near zero, due to reduction of
> random access to the disk. (This column is NULL if the column data
> type
> does not have a < operator.)"
>
> In other words, the following will have a correlation of 1:
>
> 1
> 2
> 3
> ...
> 998
> 999
> 1000
>
> And this is -1...
>
> 1000
> 999
> ...
> 2
> 1
>
> While this would have a very low correlation:
>
> 1
> 1000
> 2
> 999
> ...
>
> The lower the correlation, the more expensive an index scan is,
> because
> it's more random. As I mentioned, I believe that the current index
> scan
> cost estimator is flawed though, because it will bias heavily against
> correlations that aren't close to 1 or -1.
>
> So, what does
>
> SELECT * FROM pg_stats WHERE tablename='table' AND
> attname='category_id';
>
> show?
> --
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>


Attachment

pgsql-performance by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: Query using SeqScan instead of IndexScan
Next
From: "Merlin Moncure"
Date:
Subject: Re: [Solved] Slow performance on Windows .NET and OleDb