Re: Sequential Scan with LIMIT - Mailing list pgsql-performance

From Jim C. Nasby
Subject Re: Sequential Scan with LIMIT
Date
Msg-id 20041028235430.GJ55164@decibel.org
Whole thread Raw
In response to Re: Sequential Scan with LIMIT  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Thu, Oct 28, 2004 at 07:49:28PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
> > On Sun, Oct 24, 2004 at 04:11:53PM -0400, Tom Lane wrote:
> >> The test case you are showing is probably suffering from nonrandom
> >> placement of this particular data value; which is something that the
> >> statistics we keep are too crude to detect.
>
> > Isn't that exactly what pg_stats.correlation is?
>
> No.  A far-from-zero correlation gives you a clue that on average, *all*
> the data values are placed nonrandomly ... but it doesn't really tell
> you much one way or the other about a single data value.

Maybe I'm confused about what the original issue was then... it appeared
that you were suggesting PGSQL was doing a seq scan instead of an index
scan because it thought it would find it on the first page if the data
was randomly distributed. If the correlation is highly non-zero though,
shouldn't it 'play it safe' and assume that unless it's picking the min
or max value stored in statistics it will be better to do an index scan,
since the value it's looking for is probably in the middle of the table
somewhere? IE: if the values in the field are between 1 and 5 and the
table is clustered on that field then clearly an index scan would be
better to find a row with field=3 than a seq scan.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Sequential Scan with LIMIT
Next
From: Patrick Hatcher
Date:
Subject: determining max_fsm_pages