Thread: analyzer/planner and clustered rows

analyzer/planner and clustered rows

From
Joseph Shraibman
Date:
How does the analyzer/planner deal with rows clustered together?  Does
it just assume that if this col is clustered on then the actual data
will be clustered?  What if the data in the table happens to be close
together because it was inserted together originally?

Re: analyzer/planner and clustered rows

From
Manfred Koizar
Date:
On Thu, 29 Apr 2004 19:09:09 -0400, Joseph Shraibman
<jks@selectacast.net> wrote:
>How does the analyzer/planner deal with rows clustered together?

There's a correlation value per column.  Just try

    SELECT attname, correlation
      FROM pg_stats
     WHERE tablename = '...';

if you are interested.  It indicates how well the hypothetical order of
tuples if sorted by that column corresponds to the physical order.  +1.0
is perfect correlation, 0.0 is totally chaotic, -1.0 means reverse
order.  The optimizer is more willing to choose an index scan if
correlation for the first index column is near +/-1.

>  What if the data in the table happens to be close
>together because it was inserted together originally?

Having equal values close to each other is not enough, the values should
be increasing, too.  Compare

    5 5 5 4 4 4 7 7 7 2 2 2 6 6 6 3 3 3 8 8 8   low correlation
and
    2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8   correlation = 1.0