Thread: Swappiness setting on a linux pg server
I just came to think about /proc/sys/vm/swappiness ...

When this one is set to a high number (say, 100 - which is the maximum), the
kernel will aggressively swap out all memory that is not being accessed, to
allow more memory for caches. For a postgres server, OS caches are good,
because postgres relies on the OS to cache indices, etc. On the other hand,
for any production server it's very bad to experience bursts of iowait
when/if the swapped-out memory becomes needed - particularly if the server
is used for interactive queries, like serving a web application.

I know there is much religion on this topic in general; I'm just curious
whether anyone has given the swappiness setting serious thought or (even
better!) experimented with it on a loaded postgres server.

I would assume that the default setting (60) is pretty OK and sane, and that
modifying it would have an insignificant effect. My religious belief is
that, however insignificant, a higher setting would have been better :-)

We're running Linux kernel 2.6.17.7 (Debian) on the postgres server, and our
memory stats look like this:

                       total    used    free  shared  buffers  cached
  Mem:                 6083M   5846M    236M       0      31M   5448M
  -/+ buffers/cache:            366M   5716M
  Swap:                2643M      2M   2640M

In addition to the postgres server we're running a few cron scripts and
miscellaneous things on it - nothing significant, though.
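For reference, the knob can be inspected and changed at runtime roughly like
this (just a sketch - the value 80 below is an example, not a recommendation,
and the change is lost on reboot unless it is also put into /etc/sysctl.conf):

  # show the current value (the kernel default is 60)
  cat /proc/sys/vm/swappiness

  # try another value at runtime (as root)
  sysctl -w vm.swappiness=80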
On Thu, Oct 19, 2006 at 03:54:28PM +0200, Tobias Brox wrote:
> I just came to think about /proc/sys/vm/swappiness ...
>
> When this one is set to a high number (say, 100 - which is the maximum),
> the kernel will aggressively swap out all memory that is not being
> accessed, to allow more memory for caches. For a postgres server, OS
> caches are good, because postgres relies on the OS to cache indices, etc.
> On the other hand, for any production server it's very bad to experience
> bursts of iowait when/if the swapped-out memory becomes needed -
> particularly if the server is used for interactive queries, like serving
> a web application.
>
> I know there is much religion on this topic in general; I'm just curious
> whether anyone has given the swappiness setting serious thought or (even
> better!) experimented with it on a loaded postgres server.

I think it'd be much better to experiment with using much larger
shared_buffers settings. The conventional wisdom there is from 7.x days,
when you really didn't want a large buffer, but that doesn't really apply
with the new buffer management we got in 8.0. I know of one site that
doubled their performance by setting shared_buffers to 50% of memory.

Something else to consider is that many people will put pg_xlog on the same
drives as the OS (and swap). It's pretty important that those drives not
have much activity other than pg_xlog, so any swap activity would have an
even larger than normal impact on performance.

> I would assume that the default setting (60) is pretty OK and sane, and
> that modifying it would have an insignificant effect. My religious belief
> is that, however insignificant, a higher setting would have been better :-)
>
> We're running Linux kernel 2.6.17.7 (Debian) on the postgres server, and
> our memory stats look like this:
>
>                        total    used    free  shared  buffers  cached
>   Mem:                 6083M   5846M    236M       0      31M   5448M
>   -/+ buffers/cache:            366M   5716M
>   Swap:                2643M      2M   2640M
>
> In addition to the postgres server we're running a few cron scripts and
> miscellaneous things on it - nothing significant, though.
--
Jim Nasby                    jim@nasby.net
EnterpriseDB                 http://enterprisedb.com    512.569.9461 (cell)
[Jim C. Nasby - Thu at 10:28:31AM -0500]
> I think it'd be much better to experiment with using much larger
> shared_buffers settings. The conventional wisdom there is from 7.x days,
> when you really didn't want a large buffer, but that doesn't really apply
> with the new buffer management we got in 8.0. I know of one site that
> doubled their performance by setting shared_buffers to 50% of memory.

Oh, that's interesting. I will give it a shot. Our config is "inherited"
from the 7.x days, so we have a fairly low setting compared to the available
memory. The logic back then was something like "a lot of careful thought
has been given to the design of the OS cache/buffer subsystem; we don't
really want to reinvent the wheel".

Sadly it's not easy to measure the overall performance impact of such
tunings in a production environment, so a setting like this tends to be
tuned by religion rather than science :-)

> Something else to consider is that many people will put pg_xlog on the
> same drives as the OS (and swap). It's pretty important that those drives
> not have much activity other than pg_xlog, so any swap activity would
> have an even larger than normal impact on performance.

Hm ... that's actually our current setup; we placed the postgres database
itself on a separate disk, not the xlog. So we should have done it the
other way around? No wonder the performance is badly affected by backups
etc ... (if we do swap them around, I guess it's the usual symlink trick -
see the sketch at the end of this mail)

Anyway, I gave the swappiness a second thought, compared to our actual
memory usage statistics ... turning the swappiness down would have no
significant effect, since we're only using 2M of swap (hardly significant),
and our total memory usage by applications (including the pg shared
buffers) is less than 400M out of 6G. Maybe we could have moved some 50M of
this to swap, but that's not really significant compared to our 6G of
memory.
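For the xlog move, I imagine it would be something roughly like this (the
paths are guesses for our Debian install, and of course the server has to
be stopped, and a fresh backup taken, first):

  # 1. stop the server (via the init script or pg_ctl)
  # 2. move the WAL directory to the dedicated spindles and leave a symlink:
  mv /var/lib/postgresql/8.1/main/pg_xlog /wal-disk/pg_xlog
  ln -s /wal-disk/pg_xlog /var/lib/postgresql/8.1/main/pg_xlog
  # 3. start the server again and check that it comes up cleanly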
On Thu, Oct 19, 2006 at 06:00:54PM +0200, Tobias Brox wrote:
> [Jim C. Nasby - Thu at 10:28:31AM -0500]
> > I think it'd be much better to experiment with using much larger
> > shared_buffers settings. The conventional wisdom there is from 7.x
> > days, when you really didn't want a large buffer, but that doesn't
> > really apply with the new buffer management we got in 8.0. I know of
> > one site that doubled their performance by setting shared_buffers to
> > 50% of memory.
>
> Oh, that's interesting. I will give it a shot. Our config is "inherited"
> from the 7.x days, so we have a fairly low setting compared to the
> available memory. The logic back then was something like "a lot of
> careful thought has been given to the design of the OS cache/buffer
> subsystem; we don't really want to reinvent the wheel".

Yeah, test setups are a good thing to have...

> Sadly it's not easy to measure the overall performance impact of such
> tunings in a production environment, so a setting like this tends to be
> tuned by religion rather than science :-)
>
> > Something else to consider is that many people will put pg_xlog on the
> > same drives as the OS (and swap). It's pretty important that those
> > drives not have much activity other than pg_xlog, so any swap activity
> > would have an even larger than normal impact on performance.
>
> Hm ... that's actually our current setup; we placed the postgres database
> itself on a separate disk, not the xlog. So we should have done it the
> other way around? No wonder the performance is badly affected by backups
> etc ...

Well, typically I see setups where people dedicate say 6 drives to the data
and 2 drives for the OS and pg_xlog. The issue with pg_xlog is you don't
need bandwidth... you need super-low latency. The best way to accomplish
that is to get a battery-backed RAID controller that you can enable write
caching on. In fact, if the controller is good enough, you can
theoretically get away with just building one big RAID10 and letting the
controller provide the low-latency fsyncs that pg_xlog depends on.
--
Jim Nasby                    jim@nasby.net
EnterpriseDB                 http://enterprisedb.com    512.569.9461 (cell)
[Jim C. Nasby - Thu at 10:28:31AM -0500]
> I think it'd be much better to experiment with using much larger
> shared_buffers settings. The conventional wisdom there is from 7.x days,
> when you really didn't want a large buffer, but that doesn't really apply
> with the new buffer management we got in 8.0. I know of one site that
> doubled their performance by setting shared_buffers to 50% of memory.

I've upped it a bit, but it requires a server restart for the new setting
to take effect, and that is relatively "expensive" for us. Does anyone else
share Nasby's viewpoint, and does anyone have a recommendation for a good
value? Our previous value was 200M, and I don't want to go to the extremes
just yet. We have 6 GB of memory in total.
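For the record, the change itself is trivial - roughly something like this
(the value is just what I happened to try, not a recommendation; note that
releases before 8.2 take a count of 8 kB buffers rather than a size):

  # what the running server actually uses right now
  psql -c "SHOW shared_buffers;"

  # in postgresql.conf (Debian typically keeps it under
  # /etc/postgresql/<version>/main/), e.g.:
  #   shared_buffers = 65536    # 65536 x 8 kB buffers = 512 MB on pre-8.2;
  #                             # 8.2+ also accepts values like '512MB'

  # then restart the server (the setting cannot be changed on the fly)
  # and re-check with SHOW shared_buffers afterwards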
[Jim C. Nasby - Thu at 11:31:26AM -0500]
> Yeah, test setups are a good thing to have...

We would need to replicate the production traffic as well to do reliable
tests. Well, we'll get to that one day ...

> The issue with pg_xlog is you don't need bandwidth... you need super-low
> latency. The best way to accomplish that is to get a battery-backed RAID
> controller that you can enable write caching on.

Sounds a bit risky to me :-)
On Thu, Oct 19, 2006 at 06:39:22PM +0200, Tobias Brox wrote:
> [Jim C. Nasby - Thu at 11:31:26AM -0500]
> > Yeah, test setups are a good thing to have...
>
> We would need to replicate the production traffic as well to do reliable
> tests. Well, we'll get to that one day ...

Marginally reliable tests are usually better than none at all. :)

> > The issue with pg_xlog is you don't need bandwidth... you need
> > super-low latency. The best way to accomplish that is to get a
> > battery-backed RAID controller that you can enable write caching on.
>
> Sounds a bit risky to me :-)

Well, you do need to understand what happens if the machine does lose
power... namely you have a limited amount of time to get power back to the
machine so that the controller can flush that data out. Other than that,
it's not very risky.

As for shared_buffers, conventional wisdom has been to use between 10% and
25% of memory, tending towards the lower end as you get into larger
quantities of memory. So in your case, 600M wouldn't be pushing things much
at all. Even 1G wouldn't be that out of the ordinary. Also remember that
the more memory for shared_buffers, the less for sorting/hashes/etc.
(work_mem)
--
Jim Nasby                    jim@nasby.net
EnterpriseDB                 http://enterprisedb.com    512.569.9461 (cell)
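(For a 6 GB box like the one in this thread, that rule of thumb works out
roughly as below - MB figures plus, for pre-8.2 releases, the equivalent
count of 8 kB buffers; purely illustrative:)

  echo $(( 6144 * 10 / 100 ))   # 614  -> ~10% of 6144 MB, about  78600 8 kB buffers
  echo $(( 6144 * 25 / 100 ))   # 1536 ->  25% of 6144 MB, exactly 196608 8 kB buffers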
[Jim C. Nasby - Thu at 11:45:32AM -0500]
> > > The issue with pg_xlog is you don't need bandwidth... you need
> > > super-low latency. The best way to accomplish that is to get a
> > > battery-backed RAID controller that you can enable write caching on.
> >
> > Sounds a bit risky to me :-)
>
> Well, you do need to understand what happens if the machine does lose
> power... namely you have a limited amount of time to get power back to
> the machine so that the controller can flush that data out. Other than
> that, it's not very risky.

We have burned ourselves more than once due to unreliable RAID
controllers ...

> quantities of memory. So in your case, 600M wouldn't be pushing things
> much at all. Even 1G wouldn't be that out of the ordinary. Also remember
> that the more memory for shared_buffers, the less for
> sorting/hashes/etc. (work_mem)

What do you mean - does a high value for shared_buffers imply that I
can/should lower the work_mem value? Or just that I should remember to
leave more than enough memory for work_mem, shared_buffers and the OS
caches? What is a sane value for work_mem? It's currently set to 8M.
On Thu, Oct 19, 2006 at 06:53:49PM +0200, Tobias Brox wrote:
> [Jim C. Nasby - Thu at 11:45:32AM -0500]
> > > > The issue with pg_xlog is you don't need bandwidth... you need
> > > > super-low latency. The best way to accomplish that is to get a
> > > > battery-backed RAID controller that you can enable write caching on.
> > >
> > > Sounds a bit risky to me :-)
> >
> > Well, you do need to understand what happens if the machine does lose
> > power... namely you have a limited amount of time to get power back to
> > the machine so that the controller can flush that data out. Other than
> > that, it's not very risky.
>
> We have burned ourselves more than once due to unreliable RAID
> controllers ...

Well, if you're buying unreliable hardware, there's not much you can do...
you're setting yourself up for problems.

> > quantities of memory. So in your case, 600M wouldn't be pushing things
> > much at all. Even 1G wouldn't be that out of the ordinary. Also
> > remember that the more memory for shared_buffers, the less for
> > sorting/hashes/etc. (work_mem)
>
> What do you mean - does a high value for shared_buffers imply that I
> can/should lower the work_mem value? Or just that I should remember to
> leave more than enough memory for work_mem, shared_buffers and the OS
> caches? What is a sane value for work_mem? It's currently set to 8M.

The key is that there's enough memory for shared_buffers and work_mem
without going into swap. If you're consuming that much work_mem I wouldn't
worry at all about OS caching.

What's reasonable for work_mem depends on your workload. If you've got some
reporting queries that you know aren't run very concurrently, they might
benefit from large values of work_mem. For stats.distributed.net, I set
work_mem to something like 2MB in the config file, but the nightly batch
routines manually set it up to 256M or more, because I know that those only
run one at a time, and having that extra memory means a lot of stuff that
would otherwise have to spill to disk now stays in memory.
--
Jim Nasby                    jim@nasby.net
EnterpriseDB                 http://enterprisedb.com    512.569.9461 (cell)
[Jim C. Nasby - Thu at 12:00:39PM -0500]
> What's reasonable for work_mem depends on your workload. If you've got
> some reporting queries that you know aren't run very concurrently, they
> might benefit from large values of work_mem. For stats.distributed.net,
> I set work_mem to something like 2MB in the config file, but the nightly
> batch routines manually set it up to 256M or more, because I know that
> those only run one at a time, and having that extra memory means a lot
> of stuff that would otherwise have to spill to disk now stays in memory.

That sounds like a good idea; indeed we do have a few heavy reporting
queries, and they are not run very concurrently (the application checks the
pg_stat_activity view and will refuse to run reports if there is too much
activity there). We would probably benefit from raising work_mem just for
those reports and keeping it lower for the rest of the connections.
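I.e. keep the modest default in postgresql.conf and let the report
connection bump it for itself - roughly something like this (the 256 MB
figure is just an example, the database name is made up, and the query is a
self-contained stand-in for a real report; pre-8.2 takes the value in kB):

  psql ourdb <<'SQL'
  SET work_mem = 262144;   -- 262144 kB = 256 MB; 8.2+ also accepts '256MB'
  -- stand-in for the real reporting query (a big sort/aggregate):
  SELECT g % 1000 AS bucket, count(*)
  FROM generate_series(1, 1000000) AS g
  GROUP BY g % 1000
  ORDER BY 1;
  RESET work_mem;          -- optional; the setting only lives for this session anyway
  SQL

Another option might be a dedicated reporting role with
ALTER USER reportuser SET work_mem = 262144, so the report scripts don't
have to remember to do the SET themselves.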
[Jim C. Nasby - Thu at 12:00:39PM -0500]
> Well, if you're buying unreliable hardware, there's not much you can
> do... you're setting yourself up for problems.

I'm luckily not responsible for the hardware, but my general experience
tells me that you never really know anything about hardware reliability
until the hardware actually breaks :-) More expensive equipment doesn't
always mean better equipment.
At 12:35 PM 10/19/2006, Tobias Brox wrote:
> [Jim C. Nasby - Thu at 10:28:31AM -0500]
> > I think it'd be much better to experiment with using much larger
> > shared_buffers settings. The conventional wisdom there is from 7.x
> > days, when you really didn't want a large buffer, but that doesn't
> > really apply with the new buffer management we got in 8.0. I know of
> > one site that doubled their performance by setting shared_buffers to
> > 50% of memory.
>
> I've upped it a bit, but it requires a server restart for the new setting
> to take effect, and that is relatively "expensive" for us. Does anyone
> else share Nasby's viewpoint, and does anyone have a recommendation for a
> good value? Our previous value was 200M, and I don't want to go to the
> extremes just yet. We have 6 GB of memory in total.

Jim is correct that traditional 7.x folklore regarding shared buffer size
is nowhere near as valid for 8.x. Jim tends to know what he is talking
about when speaking about pg operational issues.

Nonetheless, "YMMV". The only sure way to know what is best for your SW
running on your HW under your load conditions is to test, test, test.

A= Find out how much RAM your OS image needs.
   Usually 1/3 to 2/3 of a GB is plenty.

B= Find out how much RAM pg tasks need during typical peak usage and how
   much each of those tasks is using. This will tell you what work_mem
   should be. Note that you may well find out that you have not been using
   the best size for work_mem for some tasks when you investigate this.

(Total RAM) - A - B - (small amount for error margin) = 1st pass at
shared_buffers setting.

If this results in better performance than your current settings, either
declare victory and stop, or cut the number in half and see what it does to
performance. Then you can either set it to what experiment has shown to be
best so far, or use binary search to change the size of shared_buffers and
do experiments to your heart's content.

Ron
On 10/19/06, Ron <rjpeace@earthlink.net> wrote:
> Nonetheless, "YMMV". The only sure way to know what is best for your
> SW running on your HW under your load conditions is to test, test, test.

Anybody have/know of some data on shared_buffers settings on 8.1+?

merlin
[Ron - Thu at 03:10:35PM -0400]
> Jim is correct that traditional 7.x folklore regarding shared buffer size
> is nowhere near as valid for 8.x. Jim tends to know what he is talking
> about when speaking about pg operational issues.

I would not doubt it, but it's always better to hear it from more people :-)

> Nonetheless, "YMMV". The only sure way to know what is best for your SW
> running on your HW under your load conditions is to test, test, test.

Certainly. My time and possibilities for testing are not that great at the
moment, and in any case I believe small adjustments won't have really
significant effects. Our database server is not on fire at the moment, and
people are not angry about slow reports at the moment (actually, I started
this thread out of nothing but curiosity ... triggered by somebody
complaining about his desktop Windows computer swapping too much :-). So
for this round of tuning I'm more than satisfied with relying on helpful
rules of thumb.

> A= Find out how much RAM your OS image needs.
>    Usually 1/3 to 2/3 of a GB is plenty.

A quick glance at "free" already revealed that we are using less than
400 MB out of 6 GB in total (with the 7.x mindset that the OS should take
care of caching), and according to our previous settings the shared buffers
were eating 200 MB of this - so most of our memory is free.

> B= Find out how much RAM pg tasks need during typical peak usage and how
>    much each of those tasks is using.

I believe we have quite good control of the queries ... there are
safeguards to prevent most of the heavy queries from running concurrently,
and the lighter queries shouldn't use much memory, so it should be safe for
us to bump up the setting a bit.

In any case, I guess the OS is not that bad at handling the memory issue.
Unused memory will be used relatively intelligently (e.g. buffering the
temp files used by sorts), and overuse of memory will cause some swapping
(which is probably much worse than using temp files directly, but a little
bit of swapping is most probably not a disaster).
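Just to put numbers on Ron's formula for our box (all figures in MB, and A
and B below are rough guesses rather than measurements):

  TOTAL=6144; A=512; B=512; MARGIN=256   # generous guesses for OS image and pg tasks
  echo $(( TOTAL - A - B - MARGIN ))     # => 4864, a first pass at shared_buffers

That is far more than I would dare to jump to in one go, so per Ron's
advice the sensible thing would be to compare against half that value and
work from there.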
> I just came to think about /proc/sys/vm/swappiness ...
>
> When this one is set to a high number (say, 100 - which is the maximum),
> the kernel will aggressively swap out all memory that is not being
> accessed, to allow more memory for caches. For a postgres server, OS
> caches are good, because postgres relies on the OS to cache indices, etc.
> On the other hand, for any production server it's very bad to experience
> bursts of iowait when/if the swapped-out memory becomes needed -
> particularly if the server is used for interactive queries, like serving
> a web application.

This is very useful on smaller systems where memory is a scarce commodity.
I have a Xen virtual server with 128MB RAM. I noticed a big improvement in
query performance when I upped swappiness to 80. It gave just enough more
memory to the fs buffers that my common queries ran in memory.

Yes, throwing more RAM at it is usually the better solution, but it's nice
that Linux gives you that knob to turn when adding RAM isn't an option, at
least for me.
--
Kevin
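(For what it's worth, to make a tweak like that survive a reboot, the usual
place on a Debian-ish system is /etc/sysctl.conf - for example, as root:)

  sysctl -w vm.swappiness=80                      # apply immediately
  echo 'vm.swappiness = 80' >> /etc/sysctl.conf   # re-applied at every boot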
[Jim C. Nasby - Thu at 11:31:26AM -0500]
> The issue with pg_xlog is you don't need bandwidth... you need super-low
> latency. The best way to accomplish that is to get a battery-backed RAID
> controller that you can enable write caching on. In fact, if the
> controller is good enough, you can theoretically get away with just
> building one big RAID10 and letting the controller provide the
> low-latency fsyncs that pg_xlog depends on.

I was talking a bit with our system administrator. We're running 4 disks in
RAID 1+0 for the database and 2 disks in RAID 1 for the WAL and the OS. He
wasn't really sure whether we have write caching enabled on the RAID
controller or not. He pasted me some lines from dmesg:

  sda: asking for cache data failed
  sda: assuming drive cache: write through

(the "failed" line is apparently expected with these controllers) plus this
line identifying the controller:

  0000:02:0e.0 RAID bus controller: Dell PowerEdge Expandable RAID
  controller 4 (rev 06)

I think we're going to move the system logs, temporary files and backup
files from the WAL disks to the DB disks. Since our database isn't on fire
at the moment, I suppose we'll wait with moving the rest of the OS :-)
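Before we shuffle things around I'll probably just double-check which
filesystem each directory actually lives on and how busy the WAL disks get
during a backup - roughly like this (the paths are guesses from memory, and
iostat comes from Debian's sysstat package):

  df -P /var/log /var/lib/postgresql /var/backups   # which device backs which directory
  iostat -x 5                                       # per-device utilisation, 5-second samples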