Thread: Larger volumes of chronologically ordered data and the planner
Hello,

What is PostgreSQL's likely behaviour when it encounters a large volume of data that is chronologically ordered (there's a btree index on a date column)? Is PostgreSQL intelligent enough to discern that, since the most frequently accessed data is invariably recent data, it should store only that in memory, and efficiently store less relevant, older data on disk? (The volume of data in production at the moment is still small enough to fit entirely in memory.) The application I maintain is not really a data warehousing app, but this is likely to be where I first encounter performance issues, if I ever do. Where can I learn more about this subject in general?

Regards,
John Moran
John Moran <johnfrederickmoran@gmail.com> writes:
> What is PostgreSQL's likely behaviour when it encounters a large
> volume of data that is chronologically ordered (there's a btree index
> on a date column)? Is postgreSQL intelligent enough to discern that
> since the most frequently accessed data is invariably recent data,
> that it should store only that in memory, and efficiently store less
> relevant, older data on disk (the volume of data in production at the
> moment is still small enough to fit entirely in memory)?

There's no dedicated intelligence about such a case, but I don't see why the ordinary cache management algorithms won't handle it perfectly well.

			regards, tom lane
John Moran wrote:
> Is postgreSQL intelligent enough to discern that
> since the most frequently accessed data is invariably recent data,
> that it should store only that in memory, and efficiently store less
> relevant, older data on disk

When a database block is read from disk into memory, its usage count is incremented; the count is incremented again each time the block is requested and turns out to already be in memory. Requests to allocate new blocks are constantly decreasing those usage counts as they "clock sweep" over the cache looking for space that hasn't been used recently. This automatically keeps blocks you've used recently in RAM, while evicting ones that aren't. The database doesn't have any intelligence about what data to keep in memory beyond that. Its sole notion of "relevant" is whether someone has accessed a block recently or not.

The operating system cache sits as a second layer on top of this, typically with its own LRU-style scheme for determining what gets cached.

I've written a long paper covering the internals here named "Inside the PostgreSQL Buffer Cache" at http://www.westnet.com/~gsmith/content/postgresql/ if you want to know exactly how this is all implemented.

--
Greg Smith    2ndQuadrant US    Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
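For anyone who wants to see the idea concretely, the usage-count/clock-sweep behaviour Greg describes can be sketched in a few lines of Python. This is a simplified toy model, not the actual shared_buffers code: the cap of 5 on usage counts matches PostgreSQL's limit, but buffer pinning, the background writer, and dirty-page handling are all omitted.

```python
class ClockSweepCache:
    """Toy model of clock-sweep buffer eviction (simplified)."""

    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.buffers = [None] * nbuffers   # block id held in each slot
        self.usage = [0] * nbuffers        # per-slot usage count
        self.index = {}                    # block id -> slot
        self.hand = 0                      # clock hand position

    def access(self, block):
        """Request a block, bumping its usage count (capped at 5)."""
        slot = self.index.get(block)
        if slot is not None:
            self.usage[slot] = min(self.usage[slot] + 1, 5)
            return "hit"
        slot = self._evict()
        old = self.buffers[slot]
        if old is not None:
            del self.index[old]
        self.buffers[slot] = block
        self.index[block] = slot
        self.usage[slot] = 1
        return "miss"

    def _evict(self):
        # Sweep the hand over the slots, decrementing each nonzero
        # usage count, until a slot with a zero count is found;
        # that slot gets reused for the incoming block.
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % self.nbuffers
            if self.buffers[slot] is None or self.usage[slot] == 0:
                return slot
            self.usage[slot] -= 1
```

The point for the chronological-data question: a block that keeps getting hit (recent data) accumulates a high usage count and survives many sweeps, while blocks that go untouched (old data) decay to zero and are evicted first, with no date-awareness anywhere in the algorithm.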
> I've written a long paper covering the internals here named "Inside the
> PostgreSQL Buffer Cache" at
> http://www.westnet.com/~gsmith/content/postgresql/ if you want to know
> exactly how this is all implemented.

Greg,

That's exactly what I was looking for.

Regards,
John