Tom, Greg, Merlin,
> But for example,
> if our buffer management algorithm recognizes an index page as being
> heavily hit and therefore keeps it in cache for a long time, then when
> it does fall out of cache you can be sure it's going to need to be read
> from disk when it's next used, because the OS-level buffer cache has not
> seen a call for that page in a long time. Contrariwise a page that we
> think is only on the fringe of usefulness is going to stay in the OS
> cache because we repeatedly drop it and then have to ask for it again.
Now you can see why other DBMSs don't use the OS disk cache. There are other
issues as well; for example, as long as we use the OS disk cache, we can't
eliminate checkpoint spikes, at least on Linux. No matter what we do with
the bgwriter, fsyncing the OS disk cache causes heavy system activity.
> It seems inevitable that Postgres will eventually eliminate that redundant
> layer of buffering. Since mmap is not workable, that means using O_DIRECT
> to read table and index data.
Why is mmap not workable? It would require far-reaching changes to our code
-- certainly -- but I don't think it can be eliminated from consideration.
> What about going the other way and simply letting the o/s do all the
> caching? How bad (or good) would the performance really be?
Pretty bad. You can simulate this easily by turning your shared_buffers way
down ...
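
For anyone who wants to try it, here is a minimal sketch of the postgresql.conf
change. The value is only an illustration (my assumption, not a recommendation):
older releases take shared_buffers as a count of 8 kB buffers and enforce a
minimum tied to max_connections, so use the smallest value your version will
accept. The change needs a server restart to take effect.

```
# postgresql.conf -- illustrative value only, an assumption for this test
# Keep the PostgreSQL buffer pool tiny so that nearly every page read
# has to be served by the OS disk cache instead.
shared_buffers = 16        # raise this if your version rejects it as too low
```

Then rerun your usual workload and compare it against your normal setting.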
--
Josh Berkus
Aglio Database Solutions
San Francisco