On Fri, 2005-05-20 at 15:23 -0700, Josh Berkus wrote:
> > Well, that raises an interesting issue, because AFAIK none of the cost
> > estimate functions currently do that. Heck, AFAIK even the piggyback
> > seqscan code doesn't take other seqscans into account.
>
> Sure. But you're striving for greater accuracy, no?
>
> Actually, all that's really needed in the way of concurrent activity is a
> calculated factor that lets us know how likely a particular object is to be
> cached, either in the fs cache or the pg cache (with different factors for
> each, presumably) based on history. Right now, that's based on
> effective_cache_size, which is rather inaccurate: a table which is queried
> once a month gets the exact same cost factors as one which is queried every
> 2.1 seconds. This would mean an extra column in pg_stats, I suppose.
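
To make the quoted idea concrete, here is a rough, purely hypothetical
sketch of such a history-based factor, using the kind of per-relation block
hit/read counters that pg_statio_user_tables already exposes (so it only
sees the PostgreSQL buffer cache, not the kernel cache); the names are
invented for illustration, not existing backend code:

/*
 * Hypothetical sketch only: a per-relation "warmth" factor derived from
 * the block hit/read history the stats collector already keeps.
 * RelCacheStats and cache_warmth are made-up names.
 */
typedef struct RelCacheStats
{
    double      heap_blks_hit;      /* block requests satisfied from cache */
    double      heap_blks_read;     /* block requests that went to disk */
} RelCacheStats;

/* Fraction of historical block requests for this relation found in cache;
 * 0.0 = effectively never cached, 1.0 = effectively always cached. */
static double
cache_warmth(const RelCacheStats *stats)
{
    double      total = stats->heap_blks_hit + stats->heap_blks_read;

    if (total <= 0.0)
        return 0.0;             /* never touched: assume cold */

    return stats->heap_blks_hit / total;
}

A table queried every 2.1 seconds would score near 1.0, the once-a-month
table near 0.0, and the planner could scale its page-cost assumptions
accordingly.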

Hmmm... I'm not sure that would be a good thing.
effective_cache_size isn't supposed to be set according to how much of a
table happens to be in cache when the query starts. The setting is supposed
to reflect how much cache is *available* to an index scan on a table that is
not stored in clustered sequence: the more out of sequence the table is, the
more memory is needed to avoid repeating I/Os during the scan. Of course, if
there are many concurrent users, the cache actually available to any one
scan may be much reduced.
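
For reference, effective_cache_size feeds the index-scan costing through a
Mackert-Lohman style estimate of how many distinct heap page fetches an
unclustered index scan will do (see cost_index() in
src/backend/optimizer/path/costsize.c). A simplified sketch of that
calculation, not the exact source code:

/*
 * T = pages in the table, N = tuples in the table,
 * s = selectivity (fraction of tuples fetched),
 * b = buffer pages available to the scan (effective_cache_size).
 * Returns the expected number of heap page fetches for an index scan on
 * a table that is completely out of clustered sequence.
 */
static double
pages_fetched_sketch(double T, double N, double s, double b)
{
    double      Ns = N * s;     /* tuples the scan will fetch */

    if (T <= b)
    {
        /* table fits in cache: no page is ever fetched twice */
        double      pf = (2.0 * T * Ns) / (2.0 * T + Ns);

        return (pf < T) ? pf : T;
    }
    else
    {
        /* table larger than cache: past this threshold, evicted pages
         * have to be re-read, so fetches keep growing */
        double      lim = (2.0 * T * b) / (2.0 * T - b);

        if (Ns <= lim)
            return (2.0 * T * Ns) / (2.0 * T + Ns);
        else
            return b + (Ns - lim) * (T - b) / T;
    }
}

Once the buffer pages b available to the scan fall below the table size T,
the estimate starts charging for re-fetching pages that were already read
and then evicted, which is exactly the repeated-I/O effect described above;
heavy concurrent use effectively shrinks b.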

Best regards,

Simon Riggs