Thread: Memory Allocation
Hey all,
This may be more of a Linux question than a PG question, but I’m wondering if any of you have successfully allocated more than 8 GB of memory to PG before.
I have a fairly robust server running Ubuntu Hardy Heron, 24 GB of memory, and I’ve tried to commit half the memory to PG’s shared buffer, but it seems to fail. I’m setting the kernel shared memory accordingly using sysctl, which seems to work fine, but when I set the shared buffer in PG and restart the service, it fails if it’s above about 8 GB. I actually have it currently set at 6 GB.
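In case it helps to be concrete, this is roughly the sort of thing I mean (the exact numbers here are approximate, since I don't have the box in front of me):

  # /etc/sysctl.conf -- allow a single SysV shared memory segment large enough
  # for a ~12GB shared_buffers (shmmax is in bytes, shmall is in 4kB pages)
  kernel.shmmax = 13958643712
  kernel.shmall = 3407872

  # apply without a reboot
  sudo sysctl -p

  # postgresql.conf
  shared_buffers = 12GB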
I don’t have the exact failure message handy, but I can certainly get it if that helps. Mostly I’m just looking to know if there’s any general reason why it would fail, some inherent kernel or db limitation that I’m unaware of.
If it matters, this DB is going to be hosting and processing hundreds of GB and eventually TB of data. It's a heavy read-write system -- not transactional processing, just a lot of data file parsing (python/bash) and bulk loading. Obviously the disks get hit pretty hard already, so I want to make the most of the large amount of available memory wherever possible, and I'm trying to tune in that direction.
Any info is appreciated.
Thanks!
On Wednesday 26 November 2008, "Ryan Hansen" <ryan.hansen@brightbuilders.com> wrote:
> This may be more of a Linux question than a PG question, but I'm
> wondering if any of you have successfully allocated more than 8 GB of
> memory to PG before.

CentOS 5, 24GB shared_buffers on one server here. No problems.

--
Alan
Ryan Hansen wrote:
> Hey all,
>
> This may be more of a Linux question than a PG question, but I'm
> wondering if any of you have successfully allocated more than 8 GB of
> memory to PG before.
>
> I have a fairly robust server running Ubuntu Hardy Heron, 24 GB of
> memory, and I've tried to commit half the memory to PG's shared
> buffer, but it seems to fail.

Though I'm not sure why this is happening or whether it is normal, I would suggest that such a setting is probably too high. From the Annotated postgresql.conf document at http://www.powerpostgresql.com/Downloads/annotated_conf_80.html, the suggested range is 8 to 400MB. They specifically say that it should never be set to more than 1/3 of the available memory, which in your case is precisely the 8GB figure (I guess that's just a coincidence --- I doubt that the server would be written so that it fails to start if shared_buffers is more than 1/3 of available RAM).

Another important parameter that you don't mention is effective_cache_size, which that same document suggests should be about 2/3 of available memory. (This tells the planner how much data it can "probabilistically" expect to reside in memory due to caching; with it set realistically, the planner is likely to produce more accurate estimates and thus better query plans.)

Maybe you could set shared_buffers to, say, 1 or 2GB (that's already beyond the recommended figure, but given that you have 24GB, it may not hurt), and then effective_cache_size to 16GB or so?

HTH,

Carlos
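Something along these lines in postgresql.conf, purely as an illustration of the shape of it (the exact values are yours to tune, and the GB unit suffixes need 8.2 or later; on older versions these settings are given in 8kB pages instead):

  shared_buffers = 2GB          # above the old 8-400MB guidance, but modest for 24GB of RAM
  effective_cache_size = 16GB   # ~2/3 of RAM; a planner hint only, it allocates nothing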
"Ryan Hansen" <ryan.hansen@brightbuilders.com> writes: > I have a fairly robust server running Ubuntu Hardy Heron, 24 GB of memory, > and I've tried to commit half the memory to PG's shared buffer, but it seems > to fail. I'm setting the kernel shared memory accordingly using sysctl, > which seems to work fine, but when I set the shared buffer in PG and restart > the service, it fails if it's above about 8 GB. Fails how? And what PG version is that? FWIW, while there are various schools of thought on how large to make shared_buffers, pretty much everybody agrees that half of physical RAM is not the sweet spot. What you're likely to get is maximal inefficiency with every active disk page cached twice --- once in kernel space and once in shared_buffers. regards, tom lane
Tuning for bulk loading:
Make sure the Linux kernel parameters in /proc/sys/vm related to the page cache are set well.
Set swappiness to 0 or 1.
Make sure you understand and configure /proc/sys/vm/dirty_background_ratio
and /proc/sys/vm/dirty_ratio well.
With enough RAM the default on some kernel versions is way, way off (40% of RAM with dirty pages! yuck).
http://www.westnet.com/~gsmith/content/linux-pdflush.htm
If postgres is doing a lot of the caching for you, you probably want dirty_ratio at 10% or less, and you'll want the OS to start flushing to disk sooner rather than later. A dirty_background_ratio of 3% with 24GB of RAM is 720MB -- a pretty big buffer. I would not personally want that buffer to be larger than about 5 seconds' worth of the disks' maximum write speed.
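For example, something along these lines (illustrative values only -- the right numbers depend on your RAM size and how fast your disks can write; this also covers the swappiness setting mentioned above):

  # check the current values
  cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio /proc/sys/vm/swappiness

  # try lower values on a big-RAM database box
  sudo sysctl -w vm.dirty_ratio=10
  sudo sysctl -w vm.dirty_background_ratio=2
  sudo sysctl -w vm.swappiness=1

  # to persist across reboots, put the same settings in /etc/sysctl.conf:
  #   vm.dirty_ratio = 10
  #   vm.dirty_background_ratio = 2
  #   vm.swappiness = 1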
You'll need to tune your background writer to be aggressive enough to actually write data fast enough so that checkpoints don't suck, and tune your checkpoint size and settings as well. Turn on checkpoint logging on the database and run tests while looking at the output of those. Ideally, most of your batch writes have made it to the OS before the checkpoint, and the OS has actually started moving most of it to disk.

If your settings are wrong, you'll have the data buffered twice, and most or nearly all of it will be in memory when the checkpoint happens, and the checkpoint will take a LONG time. The default Linux settings + default postgres settings + large shared_buffers will almost guarantee this situation for bulk loads.

Both have to be configured with complementary settings. If you have a large postgres buffer, the OS buffer should be small and write more aggressively. If you have a small postgres buffer, the OS can be more lazy and cache much more.
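To be concrete, the knobs I'm talking about look something like this in postgresql.conf. The values are only a starting point to test against, and this assumes 8.3 -- older versions spell the bgwriter settings differently and lack log_checkpoints and checkpoint_completion_target:

  log_checkpoints = on                  # watch checkpoint duration and volume in the log during tests
  checkpoint_segments = 64              # more WAL between checkpoints for bulk loads
  checkpoint_timeout = 15min
  checkpoint_completion_target = 0.8    # spread checkpoint writes out over the interval
  bgwriter_delay = 200ms
  bgwriter_lru_maxpages = 500           # let the background writer push dirty buffers out sooner
  bgwriter_lru_multiplier = 4.0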
>>> Scott Carey <scott@richrelevance.com> wrote:
> Set swappiness to 0 or 1.

We recently converted all 72 remote county databases from 8.2.5 to 8.3.4. In preparation we ran a test conversion of a large county over and over with different settings to see what got us the best performance. Setting swappiness below the default degraded performance for us in those tests for identical data, same hardware, no other changes. Our best guess is that code which really wasn't getting called got swapped out, leaving more space in the OS cache, but that's just a guess.

Of course, I'm sure people would not be recommending it if they hadn't done their own benchmarks to confirm that this setting actually improved things in their environments, so the lesson here is to test for your environment when possible.

-Kevin
Swappiness optimization is going to vary. Definitely test on your own.

For a bulk load database with a large page cache, swappiness = 60 (the default) is _GUARANTEED_ to force the OS to swap out some of Postgres while in heavy use. This is heavily dependent on the page cache size, work_mem size, and concurrency. I've had significantly increased performance setting this value low (1000x! -- if your DB starts swapping out postgres, your performance is dead). The default has the OS targeting close to 60% of memory for page cache. On a 32GB server, with a 7GB postgres buffer cache and several concurrent queries reading GBs of data and using 500MB+ work_mem (huge aggregates), the default swappiness will choose to page out postgres with about 19GB of disk page cache still left to evict, with disastrous results. And that is a read-only test. Tests with writes can trigger it earlier if combined with bad dirty_buffers settings.

The root of the problem is that the Linux paging algorithm estimates that I/O for file read access is as costly as I/O for paging. A reasonable assumption for a desktop, a ridiculously false assumption for a large database with high capacity DB file I/O and a much lower capability swap file. Not only that -- page-in is almost always near pure random reads, but DB I/O is often sequential. So losing 100MB of cached db file takes a lot less time to scan back in than 100MB of the application.

If you do have enough other applications that are idle that take up RAM that should be pushed out to disk from time to time (perhaps your programs that are doing the bulk loading?), a higher value is useful. Although it is not exact, think of the swappiness value as (very roughly) the percentage of RAM that the OS would prefer to use for page cache rather than applications. The more RAM you have and the larger your postgres memory usage, the lower the swappiness value should be. 60% of 24GB is ~14.4GB; if you have that much stuff in RAM that should be paged out to save space, try it.

I currently use a value of 1, on a 32GB machine, and about 600MB of 'stuff' gets paged out normally, 1400MB under heavy load. This is a dedicated machine. Higher values page out more stuff, which increases the cache size and helps performance a little, but under heavy load it hits the paging wall and falls over. The small improvement in performance when the system is not completely stressed is not worth risking hitting the wall for me.

***For a bulk load database, one is optimizing for _writes_ and extra page cache doesn't help writes like it does reads.***

When I use a machine with misc. other lower priority apps and less RAM, I have found larger values to be helpful. If your DB is configured with a low shared_buffers and small work_mem, you probably want the OS to use that much memory for disk pages, and again a higher swappiness may be more optimal.

Like all of these settings, tune to your application and test. Many of these settings go hand in hand with others, but alone don't make as much sense. Tuning Postgres to do most of the caching and making the OS get out of the way is far different than tuning the OS to do as much caching work as possible and minimizing postgres. Which of those two strategies is best is highly application dependent, somewhat O/S dependent, and also hardware dependent.
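If you want to see whether this is happening on your own box, watching swap activity while a heavy query or load is running is usually enough; nonzero si/so while postgres is busy is the symptom I'm describing. For example:

  vmstat 5     # the 'si'/'so' columns show swap-in/swap-out per second
  free -m      # how much swap is in use overall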
I'm hoping that through compare/contrast we might help someone start closer to their own best values....

>>> Scott Carey <scott@richrelevance.com> wrote:
> Tests with writes can trigger it earlier if combined with bad dirty_buffers
> settings.

We've never, ever modified dirty_buffers settings from defaults.

> The root of the problem is that the Linux paging algorithm estimates that
> I/O for file read access is as costly as I/O for paging. A reasonable
> assumption for a desktop, a ridiculously false assumption for a large
> database with high capacity DB file I/O and a much lower capability swap
> file.

Our swap file is not on lower speed drives.

> If you do have enough other applications that are idle that take up RAM that
> should be pushed out to disk from time to time (perhaps your programs that
> are doing the bulk loading?), a higher value is useful.

Bulk loading was ssh cat | psql.

> The more RAM you have and the larger your postgres memory usage, the lower
> the swappiness value should be.

I think the test environment had 8 GB RAM with 256 MB in shared_buffers. For the conversion we had high work_mem and maintenance_work_mem settings, and we turned fsync off, along with a few other settings we would never use during production.

> I currently use a value of 1, on a 32GB machine, and about 600MB of 'stuff'
> gets paged out normally, 1400MB under heavy load.

Outside of bulk load, we've rarely seen anything swap, even under load.

> ***For a bulk load database, one is optimizing for _writes_ and extra page
> cache doesn't help writes like it does reads.***

I'm thinking that it likely helps when indexing tables for which data has recently been loaded. It also might help minimize head movement and/or avoid the initial disk hit for a page which subsequently gets hint bits set.

> Like all of these settings, tune to your application and test.

We sure seem to agree on that.

-Kevin