Re: Configuring for maximum memory usage - Mailing list pgsql-performance

From Scott Carey
Subject Re: Configuring for maximum memory usage
Date
Msg-id a1ec7d000810301127k40c3cb4x5db8df0b64d037ae@mail.gmail.com
In response to Re: Configuring for maximum memory usage  ("Joshua D. Drake" <jd@commandprompt.com>)
Responses Re: Configuring for maximum memory usage
List pgsql-performance


On Thu, Oct 30, 2008 at 9:55 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
> On Thu, 2008-10-30 at 09:46 -0700, Scott Carey wrote:
>
> >
> >         Remember that PostgreSQL doesn't cache anything on its own so
> >         if you do
> >         want to hit disk it has to be in file cache.
> >
> > By my understanding, this is absolutely false.  Postgres caches pages
> > from tables/indexes in shared_buffers. You can make this very large if
> > you wish.
>
> You can make it very large with a potentially serious performance hit.
> It is very expensive to manage large amounts of shared buffers. It can
> also nail your IO on checkpoint if you are not careful (even with
> checkpoint smoothing). You are correct that I did not explain what I
> meant very well because shared buffers are exactly that, shared
> buffers.

You can slam your I/O by having either the OS file cache or shared_buffers too large, and you have to tune both.
In the case of large shared_buffers you have to tune postgres, especially the background writer and checkpoints.
In the case of a large OS cache, you have to tune OS parameters to limit the amount of dirty pages held there and force writes out smoothly.
Both layers attempt to delay writes for their own, often similar, reasons, and both suffer when a large sync comes along with lots of dirty data.
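
On the postgres side the knobs live in postgresql.conf and look roughly like this (a sketch only -- the values are made up and depend entirely on RAM, disks, and workload):

    shared_buffers = 4GB                 # the page cache postgres manages itself
    checkpoint_segments = 64             # allow more WAL between checkpoints
    checkpoint_timeout = 15min
    checkpoint_completion_target = 0.9   # "checkpoint smoothing": spread checkpoint writes over the interval
    bgwriter_delay = 200ms               # background writer trickles dirty buffers out between checkpoints
    bgwriter_lru_maxpages = 500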

Recent ZFS changes have been made to limit this (http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle);
in earlier ZFS versions this is what usually killed databases -- ZFS would sometimes delay writes too long (even if "long" is only 5 seconds) and get into trouble.  This still has to be tuned well, combined with good checkpoint tuning in Postgres as you mention.  On Linux there are similar issues that have to be tuned on many kernels, or up to 40% of RAM can fill with dirty pages that have not been written to disk.
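
On Linux the usual knobs are the vm dirty-page sysctls, for example in /etc/sysctl.conf (values illustrative only; names and defaults vary by kernel version):

    # don't let dirty pages pile up toward the ~40% default before a flood of writeback
    vm.dirty_background_ratio = 1    # start background writeback almost immediately
    vm.dirty_ratio = 10              # hard cap before writers are forced to flush synchronously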

Letting the OS do it doesn't get rid of the problem; both levels of cache share very similar issues with large sizes and dirty pages followed by a sync.

The buffer cache in shared_buffers is a lot more efficient for large scanning queries.  A select count(*) test will be CPU bound whether it comes from shared_buffers or the OS page cache, and in the former case I have seen it execute up to 50% faster than the latter, purely from the CPU saved by not calling out to the OS to get pages.
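
A crude way to compare this yourself (the table name is just a placeholder; run each query twice and look at the second, fully cached, execution):

    \timing
    SELECT count(*) FROM big_table;   -- once, to warm whichever cache is in play
    SELECT count(*) FROM big_table;   -- again, timed from cache: compare a large-shared_buffers
                                      -- config against a small-shared_buffers / big-OS-cache config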


> However that isn't the exact same thing as a "cache" at least as I was
> trying to describe it. shared buffers are used to keep track of pages
> (as well as some other stuff) and their current status. That is not the
> same as caching a relation.
>
> It is not possible to pin a relation to memory using PostgreSQL.
> PostgreSQL relies on the operating system for that type of caching.

The OS can't pin a relation either; from its point of view it's all just a bunch of disk data blocks, not relations -- so it is all roughly equivalent.  The OS can do a somewhat better job at data prefetch on sequential scans or other predictable seek sequences than postgres currently does (ZFS's ARC stands for Adaptive Replacement Cache; postgres has no intelligent prefetch AFAIK).
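
On Linux, for example, the readahead that feeds sequential scans from the OS side can be raised per device (device name is just an example):

    blockdev --setra 4096 /dev/sdb    # readahead in 512-byte sectors, i.e. 2MB here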

So I apologize if I made it sound like Postgres caches the actual relation; it's just pages -- but it is basically the same thing as the OS cache, only kept in-process, closer to the code that needs it.  It's a cache that prevents disk reads.

My suggestion for the OP is to try it both ways and see what is better for his workload / OS / hardware combination.
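
Concretely that means benchmarking the workload under two configurations, with a postgres restart in between since shared_buffers can't be changed on the fly -- something like (sizes purely illustrative, say on a 16GB box):

    # A: small shared_buffers, rely mostly on the OS file cache
    shared_buffers = 512MB
    # B: give most of the memory to postgres
    shared_buffers = 12GB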

> Joshua D. Drake