Re: shared_buffers advice - Mailing list pgsql-performance

From Scott Carey
Subject Re: shared_buffers advice
Date
Msg-id 198A4D5C-A8F6-45F3-A69B-25A06874117D@richrelevance.com
In response to Re: shared_buffers advice  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-performance
On Mar 11, 2010, at 12:39 AM, Greg Smith wrote:

>
> Giving all the buffers to the database doesn't work for many reasons:
> -Need a bunch leftover for clients to use (i.e. work_mem)
> -Won't be enough OS cache for non-buffer data the database expects
> cached reads and writes will perform well onto (some of the non-database
> files it uses)
> -Database checkpoints will turn into a nightmare, because there will be
> so much more dirty data; the regular spooling out to the OS and then to
> disk by backends never gets a chance to happen.
> -Not having enough buffering for backend writes means fewer chances
> to do write combining and elevator seek sorting, which means average I/O
> will drop.
>
> The alternate idea is to make shared_buffers small.  I see people
> happily running away in the 128MB - 256MB range sometimes.  The benefit
> over just using the default of <32MB is obvious, but you're already past
> a good bit of the diminishing marginal returns just by the 8X increase.
>

The DB usage pattern influences this sort of decision too.  One that does large bulk inserts can prefer larger shared
buffers, provided its bgwriter is tuned well (10GB - 16GB for a 64GB server).
Temp table usage benefits from it as well -- I believe that a table created "ON COMMIT DROP" has a better chance of
never being written to the data disk before being dropped if temp_buffers is large enough.
If you have a mixed read workload with occasional very large sequential scans, you will want to make sure
shared_buffers is large enough to hold the most important indexes and randomly accessed data.
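To make that concrete, here is a postgresql.conf-style sketch along the lines above.  The numbers come from the 64GB bulk-insert example; every value is illustrative and should be benchmarked, not copied -- in particular the bgwriter_* settings are assumptions to tune against your own checkpoint behavior:

```ini
# Illustrative postgresql.conf fragment for the 64GB bulk-insert case above.
# All values are assumptions to be benchmarked, not recommendations.
shared_buffers = 12GB            # within the 10GB - 16GB range mentioned
bgwriter_delay = 50ms            # assumption: wake the bgwriter more often
bgwriter_lru_maxpages = 500      # assumption: let it write more per round
temp_buffers = 64MB              # per-session cache for temporary tables
```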

Linux is more sensitive than Postgres to letting sequential scans kick out data from the page cache.


----------
Lastly, a word of caution on Linux.  Before the recent changes to memory accounting and paging (~ kernel 2.6.28 ish?),
shared_buffers is only accounted for in part of the equations for paging.  On one hand, the system sees shared memory
as available to be swapped out (even though it won't be), and on the other hand it senses memory pressure from it.  So
if you, for example, set shared_buffers to 75% of your RAM, the system will completely freak out, and kswapd and other
processes will go through long periods of 100% CPU utilization.
An example:
32GB RAM, 16GB shared_buffers, CentOS 5.4:
With the default OS 'swappiness' of 60, the system will note that less than 60% is used by page cache and favor
swapping out postgres backends aggressively.  If you then either turn down the swappiness or open enough processes to
consume more RAM on the system (to ~80% or so), the kernel will start spending a LOT of CPU, often minutes at a time,
trying to free up memory.  From my understanding, it will keep searching the postgres shared_buffers space for pages to
swap out even though it can't do so.  So for example, there might be 16GB shared mem (which it won't page out), 10GB of
other process memory, and 6GB of actually cached files in the page cache.  It sees the ratio of 6GB of files to 26GB of
processes and heavily favors attacking the 26GB -- but it scans the whole set of process memory and finds all pages are
recently used or can't be paged out.
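For reference, the swappiness knob in that example can be inspected and lowered like this (Linux only; the value 10 below is illustrative, not a tuned recommendation):

```shell
# Show the current vm.swappiness (60 is the usual distro default)
cat /proc/sys/vm/swappiness
# To make the kernel less eager to swap out postgres backends (needs root;
# 10 is an illustrative value, not a universal recommendation):
#   sysctl vm.swappiness=10
#   echo 'vm.swappiness = 10' >> /etc/sysctl.conf   # persist across reboots
```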

Anyhow, the latest Linux kernels claim to fix this, and Solaris/OpenSolaris and the BSDs don't have this problem.  On
OpenSolaris there are benchmarks out there showing that allocating 90% of memory to shared_buffers can work well.  On
Linux, that is dangerous.  Combine the poor memory management when there is a lot of shared memory with the fact that
50% is bad for double-buffering, and the Linux suggestion becomes the typical 'at least 128MB, but never more than 25%
of RAM'.
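As a back-of-envelope sketch of that rule of thumb, assuming a Linux box with /proc/meminfo, the 'at least 128MB, never more than 25% of RAM' ceiling can be computed like so (illustrative only, not a tuner):

```shell
#!/bin/sh
# Compute the 25%-of-RAM ceiling for shared_buffers on Linux,
# clamped to the 128MB floor suggested above.  Illustrative only.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
ceiling_mb=$((total_kb / 1024 / 4))
if [ "$ceiling_mb" -lt 128 ]; then ceiling_mb=128; fi
echo "shared_buffers ceiling: ${ceiling_mb}MB"
```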


> Improvements keep coming as shared_buffers cache size increases for many
> workloads, but eventually you can expect to go too far if you try to push
> everything in there.  The only question is whether that happens at 40%,
> 60%, or something higher.
>
> --
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> greg@2ndQuadrant.com   www.2ndQuadrant.us
>
>

