Re: choosing the right platform - Mailing list pgsql-performance
From | scott.marlowe |
---|---|
Subject | Re: choosing the right platform |
Date | |
Msg-id | Pine.LNX.4.33.0304101042480.23398-100000@css120.ihs.com Whole thread Raw |
In response to | Re: choosing the right platform ("Jim C. Nasby" <jim@nasby.net>) |
Responses |
Re: choosing the right platform
|
List | pgsql-performance |
On Wed, 9 Apr 2003, Jim C. Nasby wrote: > On Wed, Apr 09, 2003 at 11:55:56AM -0600, scott.marlowe wrote: > > A couple more suggestions. One is to never allocate more than 50% of your > > memory to a database's shared buffers, i.e. let the OS buffer the disks en > > masse, while the database should have a smaller buffer for the most recent > > accesses. This is because kernel caching is usually faster and more > > efficient than the database doing it, and this becomes more an issue with > > large chunks of memory, which both Linux and BSD are quite good at > > caching, and postgresql, not so good. > > That seems odd... shouldn't pgsql be able to cache information better > since it would be cached in whatever format is best for it, rather than > the raw page format (or maybe that is the best format). Yes and no. The problem isn't that the data is closer to postgresql in it's buffers versus further away in kernel buffers, it's that postgresql's caching algorhythm isn't performance tweaked for very large settings, it's performance tweaked to provide good performance on smaller machines, with say 4 or 16 Megs of shared buffers. Handling large buffers requires a different approach to handling small ones, and the kernel is optimized in that direction. Also, the kernel in most Oses, i.e. Linux and BSD tends to use "spare ram" with abandon as cache memory, so if you've got 4 gigs of ram, with 200 Megs set aside for postgresql, it's quite likely that the kernel cache can hold ALL your dataset for you once it's been read in once. So, the data is already cached once. Caching it again in Postgresql only gains a little, since the speed difference of postgresql shared buffer / cache and kernel caches is very small. However, the speed going to the hard drive is much slower. What you don't want is a postgresql cache that's bigger (on average) than the kernel cache, since the kernel cache will then be "thrashing" when you access information not currently in either cache. I.e. postgresql becomes your only cache, and kernel caching stops working for you and becomes just overhead, since you never get anything from it if it's too small to cache something long enough to be used again. > There's also the > issue of having to go through more layers of software if you're relying > on the OS caching. All the tuning info I've seen for every other > database I've worked with specifically recommends giving the database as > much memory as you possibly can, the theory being that it will do a much > better job of caching than the OS will. That's old school thinking. There was a day when kernel caching was much slower, and writing directly to your devices in a raw mode was the only way to ensure good performance. Nowadays, most modern Unix kernels and their file systems are a match for most database needs. heck, with some storage systems, the performance of the file system is just not really an issue, it's the bandwidth of the connector you use. Note that this is a good thing (TM) since it frees the postgresql development team to do other things than worry about caching 1 gig of data.
pgsql-performance by date: