Curt Sampson <cjs@cynic.net> writes:
> On Thu, 30 May 2002, Tom Lane wrote:
>> I guess that the smaller datasets would get proportionally more benefit
>> from kernel disk caching.
> Actually, I re-did the 100m row and 500m row queries from a cold
> start of the machine, and I still get the same results: 10 sec. vs
> 70 sec. (Thus, 7x as long to query only 5x as much data.) So I
> don't think caching is an issue here.

But even from a cold start, there would be cache effects within the
query, viz. fetching the same table block more than once when it is
referenced from different places in the index. On the smaller table,
the block is more likely to still be in kernel cache when it is next
wanted.

On a pure random-chance basis, you'd not expect that fetching 5k rows
out of 100m would hit the same table block twice --- but I'm wondering
if the data was somewhat clustered. Do the system usage stats on your
machine reflect the difference between physical reads and reads
satisfied from kernel buffer cache?
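
As a rough check on the random-chance estimate, here is a minimal sketch
in Python. It assumes each of the 5k fetches lands on a table block chosen
uniformly and independently, and that rows are packed at a fixed number per
block. The rows-per-block values are hypothetical, not figures from this
thread, and the function name is made up for illustration.

```python
import math


def expected_repeat_fetches(num_fetches: int, num_blocks: int) -> float:
    """Expected number of fetches that land on a block already fetched
    earlier in the same query, assuming uniform, independent block choice."""
    # Expected distinct blocks touched is B * (1 - (1 - 1/B)^k).
    # log1p/expm1 keep the arithmetic accurate when 1/B is tiny.
    log_untouched = num_fetches * math.log1p(-1.0 / num_blocks)
    expected_distinct = -num_blocks * math.expm1(log_untouched)
    return num_fetches - expected_distinct


TOTAL_ROWS = 100_000_000   # table size from the thread
ROWS_FETCHED = 5_000       # rows fetched, from the thread

# ASSUMPTION: rows per block is not stated in the thread; these are guesses.
for rows_per_block in (50, 100, 200):
    blocks = TOTAL_ROWS // rows_per_block
    repeats = expected_repeat_fetches(ROWS_FETCHED, blocks)
    print(f"{rows_per_block:>4} rows/block: "
          f"about {repeats:.1f} repeat block hits out of {ROWS_FETCHED}")
```

Under these assumptions the expected number of repeat hits stays well under
1% of the 5,000 fetches, so caching within the query would matter noticeably
only if the rows were clustered.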
Or maybe your idea about extra seek time is correct.

regards, tom lane