Re: analyzer/planner and clustered rows - Mailing list pgsql-performance

From Manfred Koizar
Subject Re: analyzer/planner and clustered rows
Date
Msg-id q0v39094mi9vrab5v0e2nmofv6f3d9l942@email.aon.at
Whole thread Raw
In response to analyzer/planner and clustered rows  (Joseph Shraibman <jks@selectacast.net>)
List pgsql-performance
On Thu, 29 Apr 2004 19:09:09 -0400, Joseph Shraibman
<jks@selectacast.net> wrote:
>How does the analyzer/planner deal with rows clustered together?

There's a correlation value per column.  Just try

    SELECT attname, correlation
      FROM pg_stats
     WHERE tablename = '...';

if you are interested.  It indicates how well the hypothetical order of
tuples if sorted by that column corresponds to the physical order.  +1.0
is perfect correlation, 0.0 is totally chaotic, -1.0 means reverse
order.  The optimizer is more willing to choose an index scan if
correlation for the first index column is near +/-1.

>  What if the data in the table happens to be close
>together because it was inserted together originally?

Having equal values close to each other is not enough, the values should
be increasing, too.  Compare

    5 5 5 4 4 4 7 7 7 2 2 2 6 6 6 3 3 3 8 8 8   low correlation
and
    2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8   correlation = 1.0


pgsql-performance by date:

Previous
From: "Gary Doades"
Date:
Subject: Re: planner/optimizer question
Next
From: Jeff
Date:
Subject: Re: planner/optimizer question