Caching (was Re: choosing the right platform) - Mailing list pgsql-performance
| From | Tom Lane |
|---|---|
| Subject | Caching (was Re: choosing the right platform) |
| Date | |
| Msg-id | 11732.1049934037@sss.pgh.pa.us |
| In response to | Re: choosing the right platform ("Jim C. Nasby" <jim@nasby.net>) |
| Responses | Re: Caching (was Re: choosing the right platform) |
| List | pgsql-performance |
"Jim C. Nasby" <jim@nasby.net> writes: > That seems odd... shouldn't pgsql be able to cache information better > since it would be cached in whatever format is best for it, rather than > the raw page format (or maybe that is the best format). There's also the > issue of having to go through more layers of software if you're relying > on the OS caching. All the tuning info I've seen for every other > database I've worked with specifically recommends giving the database as > much memory as you possibly can, the theory being that it will do a much > better job of caching than the OS will. There are a number of reasons why that's a dubious policy for PG (I won't take a position on whether these apply to other databases...) One is that because we sit on top of the OS' filesystem, we can't (portably) prevent the OS from caching blocks. So it's quite easy to get into a situation where the same data is cached twice, once in PG buffers and once in kernel disk cache. That's clearly a waste of RAM however you slice it, and it's worst when you set the PG shared buffer size to be about half of available RAM. You can minimize the duplication by skewing the allocation one way or the other: either set PG's allocation relatively small, relying heavily on the OS to do the caching; or make PG's allocation most of RAM and hope to squeeze out the OS' cache. There are partisans for both approaches on this list. I lean towards the first policy because I think that starving the kernel for RAM is a bad idea. (Especially if you run on Linux, where this policy tempts the kernel to start kill -9'ing random processes ...) Another reason is that PG uses a simplistic fixed-number-of-buffers internal cache, and therefore it can't adapt on-the-fly to varying memory pressure, whereas the kernel can and will give up disk cache space to make room when it's needed for processes. Since PG isn't even aware of the total memory pressure on the system as a whole, it couldn't do as good a job of trading off cache vs process workspace as the kernel can do, even if we had a variable-size cache scheme. A third reason is that on many (most?) Unixen, SysV shared memory is subject to swapping, and the bigger you make the shared_buffer arena, the more likely it gets that some of the arena will be touched seldom enough to make it a candidate for swapping. A disk buffer that gets swapped to disk is worse than useless (if it's dirty, the swapping is downright counterproductive, since an extra read and write cycle will be needed before the data can make it to its rightful place). PG is *not* any smarter about the usage patterns of its disk buffers than the kernel is; it uses a simple LRU algorithm that is surely no brighter than what the kernel uses. (We have looked at smarter buffer recycling rules, but failed to see any performance improvement.) So the notion that PG can do a better job of cache management than the kernel is really illusory. About the only advantage you gain from having data directly in PG buffers rather than kernel buffers is saving the CPU effort needed to move data across the userspace boundary --- which is not zero, but it's sure a lot less than the time spent for actual I/O. So my take on it is that you want shared_buffers fairly small, and let the kernel do the bulk of the heavy lifting for disk cache. That's what it does for a living, so let it do what it does best. You only want shared_buffers big enough so you don't spend too many CPU cycles shoving data back and forth between PG buffers and kernel disk cache. 
The default shared_buffers setting of 64 is surely too small :-(, but my feeling is that values in the low thousands are enough to get past the knee of that curve in most cases.

			regards, tom lane
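For scale, assuming the stock 8 kB block size (the default BLCKSZ): the default of 64 buffers amounts to only 64 x 8 kB = 512 kB of cache, while "low thousands" works out to a few tens of megabytes, e.g. 4096 buffers x 8 kB = 32 MB.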