Tom Lane <tgl@sss.pgh.pa.us> writes:
> I see no reason to hardwire such a number. On any hardware, the
> distribution is going to be double-humped, and it will be pretty easy to
> determine a cutoff after minimal accumulation of data.
Well, my stats-fu isn't up to the task. My hunch is that the wide range the
disk reads are spread out over will throw off more sophisticated algorithms.
Eliminating hardwired numbers is great, but practically speaking it's not like
any hardware is ever going to be able to fetch the data within 100us. If it
does, it's because the device is really a solid state drive or is pulling the
data from the disk cache, and therefore really ought to be counted as part of
effective_cache_size anyway.
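For illustration, here's roughly the kind of classification I have in mind,
with the cutoff hardwired at 100us (the helper name and the threshold are
just placeholders, not a patch):

#include <sys/time.h>
#include <unistd.h>

/* Hypothetical sketch: time a read() and classify it as a cache (or SSD)
 * hit versus a real disk fetch using a hardwired cutoff.  100us is the
 * sort of number I mean; no spinning disk services a random read that
 * fast, so anything quicker came from the kernel cache. */
#define CACHE_HIT_CUTOFF_USEC 100

static ssize_t
timed_read(int fd, void *buf, size_t len, int *was_cache_hit)
{
    struct timeval start, stop;
    ssize_t     nread;
    long        elapsed_usec;

    gettimeofday(&start, NULL);
    nread = read(fd, buf, len);
    gettimeofday(&stop, NULL);

    elapsed_usec = (stop.tv_sec - start.tv_sec) * 1000000L
        + (stop.tv_usec - start.tv_usec);
    *was_cache_hit = (elapsed_usec < CACHE_HIT_CUTOFF_USEC);

    return nread;
}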
> The real question is whether we can afford a pair of gettimeofday() calls
> per read(). This isn't a big issue if the read actually results in I/O, but
> if it doesn't, the percentage overhead could be significant.
My thinking was to use gettimeofday by default but allow individual ports to
provide a replacement function that uses the CPU's TSC (via rdtsc) or
equivalent. Most processors provide such a feature. If it's not there, then we
just fall back to gettimeofday.
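Roughly what I mean, sketched for x86; the function name is made up, and any
port without a fast counter just gets the gettimeofday fallback:

#include <stdint.h>
#include <sys/time.h>

/* Hypothetical per-port fast clock.  On x86 read the TSC directly via
 * rdtsc; ports without an equivalent fall back to gettimeofday(). */
#if defined(__i386__) || defined(__x86_64__)
static inline uint64_t
fast_clock_ticks(void)
{
    uint32_t    lo, hi;

    /* rdtsc puts the cycle counter in EDX:EAX */
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t) hi << 32) | lo;
}
#else
static inline uint64_t
fast_clock_ticks(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (uint64_t) tv.tv_sec * 1000000 + tv.tv_usec;
}
#endif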
Your suggestion to sample only 1% of the reads is a good one too.
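Something as dumb as a counter would do for that; again, just a sketch:

/* Sketch: only pay for the clock calls on roughly 1 read in 100.
 * A plain counter keeps random() off the hot path. */
static unsigned int read_sample_counter = 0;

static int
should_time_this_read(void)
{
    return (++read_sample_counter % 100) == 0;
}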
My real question is different: is it worth heading down this alley at all? Or
will Postgres eventually opt to use O_DIRECT and boost the size of its own
buffer cache? If it goes the latter route, and I suspect it will one day, then
all of this is a waste of effort.
I see mmap or O_DIRECT as the only viable long-term stable states. My natural
inclination was toward the former, but after the latest thread on the subject
I suspect it'll be forever out of reach. That makes O_DIRECT and a
Postgres-managed cache the only real choice. Having both caches is just a
waste of memory and a waste of CPU cycles.
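To be concrete about what the O_DIRECT route implies (an illustrative,
Linux-flavoured sketch; the alignment and transfer sizes are made-up values):
the kernel cache drops out of the picture entirely, every miss in our own
cache is a real disk read, and the I/O buffers have to be properly aligned.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define XFER_ALIGN  4096        /* O_DIRECT wants block-aligned buffers */
#define XFER_SIZE   8192        /* one Postgres-sized block, say */

/* Open a data file bypassing the kernel cache and allocate a suitably
 * aligned transfer buffer for it.  Returns -1 on failure. */
static int
open_direct(const char *path, void **buf)
{
    int         fd = open(path, O_RDWR | O_DIRECT);

    if (fd < 0)
        return -1;
    if (posix_memalign(buf, XFER_ALIGN, XFER_SIZE) != 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}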
> Another issue is what we do with the effective_cache_size value once we
> have a number we trust. We can't readily change the size of the ARC
> lists on the fly.
Huh? I thought effective_cache_size was just used as a factor in the cost
estimation equations. My general impression was that a higher
effective_cache_size effectively lowered your random page cost by making the
system think that fewer nonsequential block reads would really incur that cost.
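That is, the effect I'm picturing is along the lines of the Mackert & Lohman
style approximation below (my paraphrase, not the actual costsize.c code),
where a bigger effective_cache_size means fewer of the repeated page fetches
get charged as physical reads:

/* Sketch of my understanding: T = table pages, D = tuples the scan will
 * fetch, b = effective_cache_size in pages.  Returns the estimated number
 * of fetches that actually hit disk; the rest are assumed cached. */
static double
pages_actually_fetched(double T, double D, double b)
{
    if (T <= b)
    {
        double  PF = 2.0 * T * D / (2.0 * T + D);

        return (PF < T) ? PF : T;
    }
    else
    {
        double  lim = 2.0 * T * b / (2.0 * T - b);

        if (D <= lim)
            return 2.0 * T * D / (2.0 * T + D);
        else
            return b + (D - lim) * (T - b) / T;
    }
}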
Is that wrong? Is it used for anything else?
--
greg