Thread: Adding more memory = huge CPU load
Hi,

Yesterday, a customer increased the server memory from 16GB to 48GB. Today, the load of the server hit 40 ~ 50 points; with 16GB, the load never surpassed 5 ~ 8 points. The only parameters that I changed were effective_cache_size (from 14GB to 40GB) and shared_buffers (from 1GB to 5GB). Setting the values back had no effect.

This server uses CentOS 5.5 (2.6.18-194.3.1.el5.centos.plus - x86_64). Should I change some vm parameters for this specific kernel?

Thanks for any help.
alexandre - aldeia digital <adaldeia@gmail.com> wrote:

> Yesterday, a customer increased the server memory from 16GB to
> 48GB.

That's usually for the better, but be aware that on some hardware adding RAM beyond a certain point causes slower RAM access. Without knowing more details, it's impossible to say whether that's the case here.

> Today, the load of the server hit 40 ~ 50 points.
> With 16 GB, the load not surpasses 5 ~ 8 points.

Are you talking about "load average", CPU usage, or something else?

> The only parameter that I changed is effective_cache_size (from 14
> GB to 40GB) and shared_buffers (from 1 GB to 5 GB). Setting the
> values back does not take any effect.

What version of PostgreSQL is this? What settings are in effect? How many user connections are active at one time? How many cores are there, of what type? What's the storage system? What kind of load is this?

http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
http://wiki.postgresql.org/wiki/Server_Configuration

-Kevin
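For anyone gathering the details Kevin asks for, a rough way to collect the basics from a shell (generic commands, not specific to this server):

```shell
# Core count, installed RAM, and PostgreSQL version
getconf _NPROCESSORS_ONLN                                 # number of logical cores
grep '^MemTotal' /proc/meminfo 2>/dev/null || echo "MemTotal: (non-Linux)"
psql --version 2>/dev/null || echo "psql not on PATH"
```

The non-default server settings themselves are easiest to report by pasting postgresql.conf, as the second wiki link above describes.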
On 10/10/2011 08:26 AM, alexandre - aldeia digital wrote:

> Yesterday, a customer increased the server memory from 16GB to 48GB.
>
> Today, the load of the server hit 40 ~ 50 points.
> With 16 GB, the load not surpasses 5 ~ 8 points.

That's not entirely surprising. The problem with having lots of memory is... that you have lots of memory. The operating system likes to cache, and this includes writes. Normally this isn't a problem, but with 48GB of RAM, the defaults (for CentOS 5.5 in particular) are to use up to 40% of that to cache writes.

The settings you're looking for are in:

/proc/sys/vm/dirty_background_ratio
/proc/sys/vm/dirty_ratio

You can set these by putting lines in your /etc/sysctl.conf file:

vm.dirty_background_ratio = 1
vm.dirty_ratio = 10

And then calling:

sudo sysctl -p

The first number, the background ratio, tells the memory manager to start writing to disk as soon as 1% of memory is dirty. The second is a hard cap on how much memory can be held for write caching. If the amount of pending writes exceeds this, the system goes into synchronous write mode and blocks all other write activity until it can flush everything out to disk. You really, really want to avoid this.

The defaults in older Linux systems were this high mostly to optimize for desktop performance. For CentOS 5.5, the defaults are 10% and 40%, which doesn't seem like a lot. But for servers with tons of RAM, 10% of 48GB is almost 5GB. That's way bigger than all but the largest RAID or controller caches, which means IO waits, and thus high load. Those high IO waits cause a kind of cascade that slowly causes writes to back up, making it more likely you'll reach the hard 40% limit, which triggers a full system flush, and then you're in trouble.

You can actually monitor this by checking /proc/meminfo:

grep -A1 Dirty /proc/meminfo

The 'Dirty' line tells you how much memory *could* be written to disk, and the 'Writeback' line tells you how much the system is currently trying to write.
You want that second line to be 0 or as close to it as humanly possible. It's also good to keep Dirty low, because a high value can be an indicator that the system is about to start uncontrollably flushing.

Generally it's good practice to keep dirty_ratio lower than the size of your disk controller cache, but even high-end systems only have 256MB to 1GB of controller cache.

Newer kernels have introduced dirty_bytes and dirty_background_bytes, which let you set a hard byte-specified limit instead of relying on some vague integer percentage of system memory. This is better for systems with vast amounts of memory that could cause these kinds of IO spikes. Of course, in order to use those settings, your client will have to either install a custom kernel or upgrade to CentOS 6.

Try the 1% first, and it may work out. Some kernels have a hard 5% lower limit on dirty_background_ratio, but the one included in CentOS 5.5 does not. You can even set it to 0, but your IO throughput will take a nosedive, because at that point it's always writing to disk without any effective caching at all.

The reason we set dirty_ratio to 10% is to reduce the total amount of time a synchronous IO block lasts. You can probably take that as low as 5%, but be careful and test to find your best equilibrium point. You want it at a point where it rarely blocks, and if it does, it's over quickly.

There's more info here:

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

(I only went on about this because we had the same problem when we increased from 32GB to 72GB. It was a completely unexpected reaction, but a manageable one.)

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________
See http://www.peak6.com/email-disclaimer/ for terms and conditions related to this email
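On kernels that have them (2.6.29 and later, so CentOS 6 rather than 5.5), the byte-based knobs Shaun mentions might look like this in /etc/sysctl.conf -- the values here are illustrative assumptions, sized to stay near a typical controller cache, not tuned advice:

```
# /etc/sysctl.conf -- byte-based dirty limits (illustrative values)
vm.dirty_background_bytes = 268435456    # start background writeback at 256MB dirty
vm.dirty_bytes = 1073741824              # block writers beyond 1GB dirty
```

(Setting the *_bytes form of either knob automatically zeroes the corresponding *_ratio form, and vice versa.)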
> That's not entirely surprising. The problem with having lots of memory
> is... that you have lots of memory. The operating system likes to cache,
> and this includes writes. Normally this isn't a problem, but with 48GB
> of RAM, the defaults (for CentOS 5.5 in particular) are to use up to 40%
> of that to cache writes.

I don't understand: don't you want postgresql to issue the fsync calls when it "makes sense" (and configure them), rather than having the OS decide when it's best to flush to disk? That is: don't you want all the memory to be used for caching, unless postgresql says otherwise (calling fsync), instead of "as soon as 1% of memory is used"?
On 10/10/2011 10:14 AM, Leonardo Francalanci wrote:

> I don't understand: don't you want postgresql to issue the fsync
> calls when it "makes sense" (and configure them), rather than having
> the OS decide when it's best to flush to disk?

You'd think that, which is why this bites so many people. That's not quite how it works in practice, though. The OS cache holds a lot more than the altered database and WAL files, which do get fsync'd frequently. Beyond that, you need to worry about what happens *between* fsync calls.

On a highly loaded database, or even just a database experiencing heavy write volume due to some kind of ETL process, your amount of dirty memory may increase much more quickly than you expect. For example, say your checkpoint_timeout setting is the default of five minutes. An ETL process runs that loads 2GB of data into a table, and you're archiving transaction logs. So you now have three possible write vectors, not including temp tables and whatnot. And that's only for that connection; this gets more complicated if you have other OLTP connections on the same DB.

So your memory is now flooded with 2-6GB of data, which memory can absorb easily and quickly. With 48GB of RAM, that's well within caching range, so the OS never writes anything until the fsync call. Then the database makes the fsync call, and suddenly the OS wants to flush 2-6GB of data straight to disk. Without that background trickle, you now have a flood that only the highest-end disk controller or a backing-store full of SSDs or PCIe NVRAM could ever hope to absorb. That write flood massively degrades your read IOPS, and degrades future writes until it's done flushing, so all of your disk IO is saturated, further worsening the situation.
Now you're getting closer and closer to your dirty_ratio setting, at which point the OS will effectively stop responding to anything, so it can finally finish flushing everything to disk. This can take a couple of minutes, but it's not uncommon for these IO storms to last over half an hour, depending on the quality of the disks/controller in question. During this time, system load is climbing precipitously and clients are getting query timeouts.

Adding more memory can actually make your system performance worse if you don't equally increase the capability of your RAID/SAN/whatever to compensate for the increased size of write chunks. This is counter-intuitive, but completely borne out by tests. The kernel developers agree, or we wouldn't have dirty_bytes and dirty_background_bytes, and they wouldn't have changed the defaults to 5% and 10% instead of 10% and 40%. It's just one of those things nobody expected until machines with vast amounts of RAM started becoming common.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com
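The arithmetic behind those percentages is worth spelling out. A throwaway calculation for the 48GB machine in question, covering both the old (10/40) and newer (5/10) default ratios (pure arithmetic, no machine access needed):

```shell
# Dirty-memory thresholds in GiB for various dirty ratios on a 48 GiB box
ram_gib=48
for pct in 5 10 40; do
  awk -v ram="$ram_gib" -v pct="$pct" \
    'BEGIN { printf "dirty ratio %2d%% on %d GiB RAM = %4.1f GiB of dirty data\n",
             pct, ram, ram * pct / 100 }'
done
# dirty ratio  5% on 48 GiB RAM =  2.4 GiB of dirty data
# dirty ratio 10% on 48 GiB RAM =  4.8 GiB of dirty data
# dirty ratio 40% on 48 GiB RAM = 19.2 GiB of dirty data
```

Even the "small" 10% background threshold dwarfs a typical 256MB-1GB controller cache, which is why the eventual flush shows up as an IO storm.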
On 10/10/2011 10:04 AM, Shaun Thomas wrote:

> The problem with having lots of memory is... that you have lots of
> memory. The operating system likes to cache, and this includes writes.
> Normally this isn't a problem, but with 48GB of RAM, the defaults (for
> CentOS 5.5 in particular) are to use up to 40% of that to cache writes.

I make the same sort of tuning changes Shaun suggested on every CentOS 5 system I come across. That said, you should turn on log_checkpoints in your postgresql.conf and see whether the "sync=" numbers are high. That will help prove or disprove that the slowdown you're seeing is from too much write caching. You also may be able to improve that by adjusting checkpoint_segments/checkpoint_timeout, or by *decreasing* shared_buffers. More about this at http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

There are some other possibilities, too, like the memory addition actually causing average memory speed to drop, as Kevin mentioned. I always benchmark with stream-scaling, https://github.com/gregs1104/stream-scaling, before and after a RAM size change, to see whether things are still as fast or not. It's hard to do that in the position you're in now, though.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
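The logging Greg suggests is a one-line postgresql.conf change, and the checkpoint knobs are the ones he names. A sketch with placeholder values (parameter names are the 8.3-9.0 era ones used in this thread; the values are illustrative, not tuned advice):

```
# postgresql.conf (placeholder values -- tune for your workload)
log_checkpoints = on                 # log buffers written plus write/sync/total times
checkpoint_segments = 16             # WAL segments between checkpoints
checkpoint_timeout = 5min            # maximum interval between checkpoints
checkpoint_completion_target = 0.5   # fraction of the interval used to spread writes
```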
> Then the database makes the fsync call, and suddenly the OS wants to
> flush 2-6GB of data straight to disk. Without that background trickle,
> you now have a flood that only the highest-end disk controller or a
> backing-store full of SSDs or PCIe NVRAM could ever hope to absorb.

Isn't checkpoint_completion_target supposed to deal exactly with that problem?

Plus: if 2-6GB is too much, why not decrease checkpoint_segments? Or checkpoint_timeout?

> The kernel developers agree, or we wouldn't have dirty_bytes, or
> dirty_background_bytes, and they wouldn't have changed the defaults
> to 5% and 10% instead of 10% and 40%.

I'm not saying that those kernel parameters are "useless"; I'm saying they are used in the same way as checkpoint_segments, checkpoint_timeout and checkpoint_completion_target are used by postgresql; and on a postgresql-only system I would rather have postgresql look after the fsync calls, not the OS.
On 10-10-2011 11:04, Shaun Thomas wrote:

> That's not entirely surprising. The problem with having lots of memory
> is... that you have lots of memory. The operating system likes to cache,
> and this includes writes. Normally this isn't a problem, but with 48GB
> of RAM, the defaults (for CentOS 5.5 in particular) are to use up to 40%
> of that to cache writes.

Hi Shaun and all,

After changing the parameters in sysctl.conf, the load average went down for a while, but then the system load grew again. Dirty memory in meminfo is about 150MB and Writeback is mostly 0 kB.

I dropped checkpoint_timeout to 1min and turned on log_checkpoints:

<2011-10-10 14:18:48 BRT >LOG: checkpoint complete: wrote 6885 buffers (1.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=29.862 s, sync=28.466 s, total=58.651 s
<2011-10-10 14:18:50 BRT >LOG: checkpoint starting: time
<2011-10-10 14:19:40 BRT >LOG: checkpoint complete: wrote 6415 buffers (1.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=29.981 s, sync=19.960 s, total=50.111 s
<2011-10-10 14:19:50 BRT >LOG: checkpoint starting: time
<2011-10-10 14:20:45 BRT >LOG: checkpoint complete: wrote 6903 buffers (1.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=29.653 s, sync=25.504 s, total=55.477 s
<2011-10-10 14:20:50 BRT >LOG: checkpoint starting: time
<2011-10-10 14:21:45 BRT >LOG: checkpoint complete: wrote 7231 buffers (1.1%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=29.911 s, sync=24.899 s, total=55.037 s
<2011-10-10 14:21:50 BRT >LOG: checkpoint starting: time
<2011-10-10 14:22:45 BRT >LOG: checkpoint complete: wrote 6569 buffers (1.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=29.947 s, sync=25.303 s, total=55.342 s
<2011-10-10 14:22:50 BRT >LOG: checkpoint starting: time
<2011-10-10 14:23:44 BRT >LOG: checkpoint complete: wrote 5711 buffers (0.9%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=30.036 s, sync=24.299 s, total=54.507 s
<2011-10-10 14:23:50 BRT >LOG: checkpoint starting: time
<2011-10-10 14:24:50 BRT >LOG: checkpoint complete: wrote 6744 buffers (1.0%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=29.946 s, sync=29.792 s, total=60.223 s
<2011-10-10 14:24:50 BRT >LOG: checkpoint starting: time

[root@servernew data]# vmstat 1 30 -w
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
 r  b   swpd    free    buff    cache   si so  bi   bo   in   cs us sy id wa st
22 0 2696 8290280 117852 38431540 0 0 328 59 9 17 17 3 79 1 0
34 0 2696 8289288 117852 38432268 0 0 8 2757 2502 4148 80 20 0 0 0
39 1 2696 8286128 117852 38432348 0 0 24 622 2449 4008 80 20 0 0 0
41 0 2696 8291100 117852 38433792 0 0 64 553 2487 3419 83 17 0 0 0
42 1 2696 8293596 117852 38434556 0 0 232 776 2372 2779 83 17 0 0 0
44 1 2696 8291984 117852 38435252 0 0 56 408 2388 3012 82 18 0 0 0
26 0 2696 8289884 117856 38435924 0 0 64 698 2486 3283 83 17 0 0 0
31 0 2696 8286788 117856 38437052 0 0 88 664 2452 3385 82 18 0 0 0
42 0 2696 8284500 117868 38437516 0 0 176 804 2492 3876 83 17 0 0 0
44 0 2696 8281392 117868 38438860 0 0 24 504 2338 2916 80 20 0 0 0
44 0 2696 8278540 117868 38439152 0 0 32 568 2337 2937 83 17 0 0 0
45 0 2696 8280280 117868 38440348 0 0 72 402 2492 3635 84 16 0 0 0
35 2 2696 8279928 117868 38440388 0 0 184 600 2492 3835 84 16 0 0 0
41 0 2696 8275948 117872 38441712 0 0 136 620 2624 4187 79 21 0 0 0
37 0 2696 8274392 117872 38442372 0 0 24 640 2492 3824 84 16 0 0 0
40 0 2696 8268548 117872 38443120 0 0 0 624 2421 3584 81 19 0 0 0
32 0 2696 8268308 117872 38443652 0 0 16 328 2384 3767 81 19 0 0 0
38 0 2696 8281820 117872 38427472 0 0 72 344 2505 3810 81 19 0 0 0
41 0 2696 8279776 117872 38427976 0 0 16 220 2496 3428 84 16 0 0 0
27 0 2696 8283252 117872 38428508 0 0 112 312 2563 4279 81 19 0 0 0
36 0 2696 8280332 117872 38429288 0 0 48 544 2626 4406 80 20 0 0 0
30 0 2696 8274372 117872 38429372 0 0 24 472 2442 3646 80 19 0 0 0
38 0 2696 8272144 117872 38429956 0 0 152 256 2465 4039 83 16 0 0 0
41 2 2696 8266496 117872 38430324 0 0 56 304 2414 3206 82 18 0 0 0
32 0 2696 8267188 117872 38431068 0 0 64 248 2540 4211 78 22 0 0 0
37 0 2696 8278876 117872 38431324 0 0 56 264 2547 4523 81 19 0 0 0
43 1 2696 8277460 117872 38431588 0 0 40 8627 2695 4143 82 18 0 0 0
41 0 2696 8272556 117872 38431716 0 0 40 216 2495 3744 79 21 0 0 0
40 1 2696 8267292 117876 38433204 0 0 192 544 2586 4437 77 23 0 0 0
34 1 2696 8263204 117876 38433628 0 0 320 929 2841 5166 78 22 0 0 0

Notice that we have no idle % in the cpu column.

[root@servernew data]# uptime
14:26:47 up 2 days, 3:26, 4 users, load average: 48.61, 46.12, 40.47

My client wants to remove the extra memory... :/

Best regards.
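A quick way to eyeball the sync times buried in log lines like the above (the sed pattern is a sketch; against a real server you would pipe the actual log file through it):

```shell
# Pull the sync= figure out of a "checkpoint complete" log line
extract_sync() { sed -n 's/.*sync=\([0-9.]*\) s.*/\1/p'; }

# Demonstrated on one of the lines above; on a live system you would run
# something like: grep 'checkpoint complete' postgresql.log | extract_sync
echo 'LOG: checkpoint complete: wrote 6885 buffers (1.1%); write=29.862 s, sync=28.466 s, total=58.651 s' \
  | extract_sync
# prints 28.466
```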
alexandre - aldeia digital <adaldeia@gmail.com> wrote:

> Notice that we have no idle % in cpu column.

So they're making full use of all the CPUs they paid for. That in itself isn't a problem. Unfortunately you haven't given us nearly enough information to know whether there is indeed a problem, or if so, what. What was throughput before? What is it now? How has latency been affected? And all those unanswered questions from my first email....

The problem *might* be something along the lines of most of the discussion on the thread. It might not be. I just don't know yet, myself.

> 14:26:47 up 2 days, 3:26, 4 users, load average: 48.61,
> 46.12, 40.47

This has me wondering again about your core count and your user connections.

> My client wants to remove the extra memory... :/

Maybe we should identify the problem. It might be that a connection pooler is the solution. On the other hand, if critical production applications are suffering, it might make sense to take this out of production and figure out a safer place to test things and sort this out.

-Kevin
On 10-10-2011 14:46, Kevin Grittner wrote:

> So they're making full use of all the CPUs they paid for. That in
> itself isn't a problem. Unfortunately you haven't given us nearly
> enough information to know whether there is indeed a problem, or if
> so, what. What was throughput before? What is it now? How has
> latency been affected? And all those unanswered questions from my
> first email....
>
> The problem *might* be something along the lines of most of the
> discussion on the thread. It might not be. I just don't know yet,
> myself.

From the point of view of the client, the question is simple: until last Friday (with 16 GB of RAM), the load average of the server rarely surpassed 4. Nothing changed in normal database use.

Tonight, we will remove the extra memory. :/

Best regards.
alexandre - aldeia digital <adaldeia@gmail.com> wrote:

> From the point of view of the client, the question is simple: until
> last Friday (with 16 GB of RAM), the load average of the server
> rarely surpassed 4. Nothing changed in normal database use.

Really? The application still performs as well or better, and it's the load average they care about? How odd.

If they were happy with performance before the RAM was added, why did they add it? If they weren't happy with performance, what led them to believe that adding more RAM would help?

If there's a performance problem, there's generally one bottleneck which is the limit, with one set of symptoms. When you remove that bottleneck and things get faster, you may well have a new bottleneck with different symptoms. (These symptoms might include high load average or CPU usage, for example.) You then figure out what is causing *that* bottleneck, and you can make things yet faster.

In this whole thread you have yet to give enough information to know for sure whether there was or is any performance problem, or what the actual bottleneck is. I think you'll find that people are happy to help identify the problem and suggest solutions if you provide that information.

-Kevin
On 10/10/2011 12:31 PM, alexandre - aldeia digital wrote:

> <2011-10-10 14:18:48 BRT >LOG: checkpoint complete: wrote 6885 buffers
> (1.1%); 0 transaction log file(s) added, 0 removed, 1 recycled;
> write=29.862 s, sync=28.466 s, total=58.651 s

28.466s sync time?! That's horrifying.

At this point, I want to say the increase in effective_cache_size or shared_buffers triggered the planner to change one of your plans significantly enough that it's doing a ton more disk IO and starving out your writes. Except you said you changed it back and it's still misbehaving.

This also reminds me somewhat of an issue Greg mentioned a while back with xlog storms in 9.0 databases. I can't recall how he usually "fixed" these, though.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com
On 10-10-2011 16:39, Kevin Grittner wrote:

> Really? The application still performs as well or better, and it's
> the load average they care about? How odd.
>
> If they were happy with performance before the RAM was added, why
> did they add it? If they weren't happy with performance, what led
> them to believe that adding more RAM would help? If there's a
> performance problem, there's generally one bottleneck which is the
> limit, with one set of symptoms. When you remove that bottleneck
> and things get faster, you may well have a new bottleneck with
> different symptoms. (These symptoms might include high load average
> or CPU usage, for example.) You then figure out what is causing
> *that* bottleneck, and you can make things yet faster.

Calm down: if the client plans to add, for example, another database to this server in a couple of weeks, should he only upgrade the memory when that new database comes to life, adding yet another point of doubt? IMHO, the reasons for adding MEMORY do not matter in this case.

I came to the list to see if anyone else has experienced the same problem, which is not necessarily related to Postgres. Shaun and Greg apparently had the same problems on CentOS, and the information they provided helped a lot...
alexandre - aldeia digital <adaldeia@gmail.com> wrote:

> I came to the list to see if anyone else has experienced the same
> problem

A high load average or low idle CPU isn't a problem, it's a potentially useful bit of information in diagnosing a problem. I was hoping to hear what the actual problem was, since I've had a few problems in high RAM situations, but the solutions depend on what the actual problems are.

I don't suppose you saw periods where queries which normally run very quickly (say in a millisecond or less) were suddenly taking tens of seconds to run -- "stalling" and then returning to normal? Because if I knew you were having a problem like *that* I might have been able to help. Same for other sets of symptoms; it's just the suggestions would have been different. And the suggestions would have depended on what your system looked like besides the RAM.

If you're satisfied with how things are running with less RAM, though, there's no need.

-Kevin
On Mon, Oct 10, 2011 at 1:52 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

>> I came to the list to see if anyone else has experienced the same
>> problem
>
> A high load average or low idle CPU isn't a problem, it's a
> potentially useful bit of information in diagnosing a problem. I
> was hoping to hear what the actual problem was, since I've had a few
> problems in high RAM situations, but the solutions depend on what
> the actual problems are. I don't suppose you saw periods where
> queries which normally run very quickly (say in a millisecond or
> less) were suddenly taking tens of seconds to run -- "stalling" and
> then returning to normal? Because if I knew you were having a
> problem like *that* I might have been able to help. Same for other
> sets of symptoms; it's just the suggestions would have been
> different. And the suggestions would have depended on what your
> system looked like besides the RAM.
>
> If you're satisfied with how things are running with less RAM,
> though, there's no need.
The original question doesn't actually say that performance has gone down, only that cpu utilization has gone up. Presumably, with lots more RAM, it is blocking on I/O a lot less, so it isn't necessarily surprising that CPU utilization has gone up. The only problem would be if db performance has gotten worse. Maybe I missed a message where that was covered? I don't see it in the original query to the list.
On Tue, Oct 11, 2011 at 12:02 AM, Samuel Gendler <sgendler@ideasculptor.com> wrote:

> The original question doesn't actually say that performance has gone down,
> only that cpu utilization has gone up. Presumably, with lots more RAM, it is
> blocking on I/O a lot less, so it isn't necessarily surprising that CPU
> utilization has gone up. The only problem would be if db performance has
> gotten worse. Maybe I missed a message where that was covered? I don't see
> it in the original query to the list.

Load average (which is presumably the metric in question) includes both processes using the CPU and processes waiting for I/O. So it *would* be strange for load average to go up like that, if the database configuration remains the same (ie: equal query plans).
On 10/10/2011 01:31 PM, alexandre - aldeia digital wrote:

> I dropped checkpoint_timeout to 1min and turned on log_checkpoints:
>
> <2011-10-10 14:18:48 BRT >LOG: checkpoint complete: wrote 6885
> buffers (1.1%); 0 transaction log file(s) added, 0 removed, 1
> recycled; write=29.862 s, sync=28.466 s, total=58.651 s
> <2011-10-10 14:18:50 BRT >LOG: checkpoint starting: time

Sync times that go to 20 seconds suggest there's a serious problem here somewhere. But it would have been better to do these changes one at a time: turn on log_checkpoints, collect some data, then try lowering checkpoint_timeout. A checkpoint every minute is normally a bad idea, so that change may have caused this other issue.

> procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
> r b swpd free buff cache si so bi bo in cs us sy id wa st
> 34 0 2696 8289288 117852 38432268 0 0 8 2757 2502 4148 80 20 0 0 0
> 39 1 2696 8286128 117852 38432348 0 0 24 622 2449 4008 80 20 0 0 0
> 41 0 2696 8291100 117852 38433792 0 0 64 553 2487 3419 83 17 0 0 0
> ...Notice that we have no idle % in the cpu column.

You also have no waiting for I/O! This is just plain strange; checkpoint sync time spikes with no I/O waits is something I've never seen before. System time going to 20% isn't normal either.

I don't know what's going on with this server. What I would normally do in this case is use "top -c" to see what processes are taking up so much runtime, and then look at what they are doing with pg_stat_activity. You might see the slow processes in the log files by setting log_min_duration_statement instead. I'd be suspicious of Linux given your situation though.

I wonder if increasing the memory is a coincidence, and the real cause is something related to the fact that you had to reboot to install it. You might have switched to a newer kernel in the process too, for example; I'd have to put a kernel bug on the list of suspects with this unusual vmstat output.
--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
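The pg_stat_activity lookup mentioned above might look like the query below -- a sketch, using the pre-9.2 column names (procpid, waiting, current_query) that match the 8.x/9.0 servers in this thread:

```
SELECT procpid,
       now() - query_start AS runtime,
       waiting,
       current_query
  FROM pg_stat_activity
 WHERE current_query <> '<IDLE>'
 ORDER BY query_start;
```

Cross-referencing the procpid values against the busiest PIDs in "top -c" shows which queries the hot backends are running.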
On 10/10/2011 12:14 PM, Leonardo Francalanci wrote:

> Isn't checkpoint_completion_target supposed to deal exactly with that
> problem?

checkpoint_completion_target spreads out the writes to disk. PostgreSQL doesn't make any attempt yet to spread out the sync calls. On a busy server, what can happen is that the whole OS write cache fills with dirty data--none of which is written out to disk because of the high kernel threshold--and then it all slams onto disk fast once the checkpoint starts executing sync calls. Lowering the size of the Linux write cache helps with that a lot, but can't quite eliminate the problem.

> Plus: if 2-6GB is too much, why not decrease checkpoint_segments? Or
> checkpoint_timeout?

Making checkpoints really frequent increases total disk I/O, both to the database and to the WAL, significantly. You don't want to do that if there's another way to achieve the same goal without those costs, which is what some kernel tuning can do here. Just need to be careful not to go too far; some write caching at the OS level helps a lot, too.

> I'm not saying that those kernel parameters are "useless"; I'm saying
> they are used in the same way as checkpoint_segments,
> checkpoint_timeout and checkpoint_completion_target are used by
> postgresql; and on a postgresql-only system I would rather have
> postgresql look after the fsync calls, not the OS.

Except that PostgreSQL doesn't look after the fsync calls yet. I wrote a patch for 9.1 that spread out the sync calls, similarly to how the writes are spread out now. I wasn't able to prove an improvement sufficient to commit the result.
In the Linux case, the OS has more information to work with about how to schedule I/O efficiently given how the hardware is acting, and it's not possible for PostgreSQL to know all that--not without duplicating a large portion of the kernel development work, at least. Right now, relying on the kernel means that any improvements there magically apply to any PostgreSQL version. So far the results there have been beating out improvements made to the database fast enough that it's hard to innovate in this area within Postgres.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
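To make the write-versus-sync distinction concrete: the write phase of a checkpoint is spread over roughly checkpoint_timeout * checkpoint_completion_target, while the sync phase gets no such spreading. A toy calculation using the 9.0-era defaults (5min timeout, 0.5 target):

```shell
# Write phase spread = checkpoint_timeout * checkpoint_completion_target
awk -v timeout=300 -v target=0.5 \
  'BEGIN { printf "writes spread over %.0f s of a %d s checkpoint interval\n",
           timeout * target, timeout }'
# writes spread over 150 s of a 300 s checkpoint interval
```

Whatever dirty data the kernel still holds at the end of those 150 seconds is what the back-to-back fsync calls then slam onto disk at once.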
> checkpoint_completion_target spreads out the writes to disk. PostgreSQL
> doesn't make any attempt yet to spread out the sync calls. On a busy
> server, what can happen is that the whole OS write cache fills with dirty
> data--none of which is written out to disk because of the high kernel
> threshold--and then it all slams onto disk fast once the checkpoint starts
> executing sync calls. Lowering the size of the Linux write cache helps with
> that a lot, but can't quite eliminate the problem.

Ok, I understand now.

> Except that PostgreSQL doesn't look after the fsync calls yet. I wrote a
> patch for 9.1 that spread out the sync calls, similarly to how the writes
> are spread out now. I wasn't able to prove an improvement sufficient to
> commit the result. In the Linux case, the OS has more information to work
> with about how to schedule I/O efficiently given how the hardware is
> acting, and it's not possible for PostgreSQL to know all that--not without
> duplicating a large portion of the kernel development work, at least.
> Right now, relying on the kernel means that any improvements there
> magically apply to any PostgreSQL version. So far the results there have
> been beating out improvements made to the database fast enough that it's
> hard to innovate in this area within Postgres.

Ok; thank you very much for the explanation.

In fact, shouldn't those things be explained in the "WAL Configuration" section of the manual? It looks as important as configuring PostgreSQL itself...

And: that applies to Linux. What about other OSes, such as Solaris and FreeBSD?
On 11/10/2011 00:02, Samuel Gendler wrote:

> The original question doesn't actually say that performance has gone down,
> only that cpu utilization has gone up. Presumably, with lots more RAM, it is
> blocking on I/O a lot less, so it isn't necessarily surprising that CPU
> utilization has gone up.

It's Linux - it counts IO wait in the load average. Again, there are too few details to be sure of anything, but it is possible that the IO rate didn't go down.
On 10-10-2011 23:19, Claudio Freire wrote:

> On Tue, Oct 11, 2011 at 12:02 AM, Samuel Gendler
> <sgendler@ideasculptor.com> wrote:
>> The original question doesn't actually say that performance has gone down,
>> only that cpu utilization has gone up. Presumably, with lots more RAM, it is
>> blocking on I/O a lot less, so it isn't necessarily surprising that CPU
>> utilization has gone up. The only problem would be if db performance has
>> gotten worse. Maybe I missed a message where that was covered? I don't see
>> it in the original query to the list.
>
> Load average (which is presumably the metric in question) includes
> both processes using the CPU and processes waiting for I/O.
> So it *would* be strange for load average to go up like that, if
> database configuration remains the same (ie: equal query plans)

Yep, that's the point. Iostat and vmstat report very low disk use (lower than before the changes were made - perhaps because of the cache). Nothing changed in the database itself.
Em 11-10-2011 03:42, Greg Smith escreveu:
> On 10/10/2011 01:31 PM, alexandre - aldeia digital wrote:
>> I dropped checkpoint_timeout to 1min and turned on log_checkpoints:
>>
>> <2011-10-10 14:18:48 BRT >LOG: checkpoint complete: wrote 6885 buffers
>> (1.1%); 0 transaction log file(s) added, 0 removed, 1 recycled;
>> write=29.862 s, sync=28.466 s, total=58.651 s
>> <2011-10-10 14:18:50 BRT >LOG: checkpoint starting: time
>
> Sync times that go to 20 seconds suggest there's a serious problem here
> somewhere. But it would have been better to do these changes one at a
> time: turn on log_checkpoints, collect some data, then try lowering
> checkpoint_timeout. A checkpoint every minute is normally a bad idea, so
> that change may have caused this other issue.

I returned it to 5 minutes. Thanks.

>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>>  r  b  swpd    free   buff    cache   si  so  bi    bo    in    cs us sy id wa st
>> 34  0  2696 8289288 117852 38432268    0   0   8  2757  2502  4148 80 20  0  0  0
>> 39  1  2696 8286128 117852 38432348    0   0  24   622  2449  4008 80 20  0  0  0
>> 41  0  2696 8291100 117852 38433792    0   0  64   553  2487  3419 83 17  0  0  0
>>
>> Notice that we have no idle % in the cpu column.
>
> You also have no waiting for I/O! This is just plain strange; checkpoint
> sync time spikes with no I/O waits is something I've never seen before.
> System time going to 20% isn't normal either.

Is there any way to detect which process is causing the increase in
system time?

> I don't know what's going on with this server. What I would normally do
> in this case is use "top -c" to see what processes are taking up so much
> runtime, and then look at what they are doing with pg_stat_activity. You
> might see the slow processes in the log files by setting
> log_min_duration_statement instead. I'd be suspicious of Linux given
> your situation though.
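To corroborate the 80% user / 20% system split shown by vmstat without waiting for a sampling interval, you can compute the cumulative split directly from /proc/stat. A Linux-only sketch; the field order is taken from proc(5):

```shell
# Cumulative CPU time split since boot, from the aggregate "cpu" line
# in /proc/stat. Fields: user nice system idle iowait irq softirq steal.
awk '/^cpu / {
  total = $2 + $3 + $4 + $5 + $6 + $7 + $8 + $9
  printf "user=%.1f%% sys=%.1f%% idle=%.1f%% iowait=%.1f%%\n",
         100*$2/total, 100*$4/total, 100*$5/total, 100*$6/total
}' /proc/stat
```

High system time with negligible iowait, as seen here, usually points at kernel work (memory management, spinlocks) rather than the storage path, which fits Greg's suspicion of Linux itself.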
Last night, I put another disk in the server and installed Debian 6,
preserving the same structure, only pointing the old data directory at a
newly compiled PostgreSQL 9.0.5. Today, the problem persists. And for all
who asked: the performance is poor, unusable.

> I wonder if increasing the memory is a coincidence, and the real cause
> is something related to the fact that you had to reboot to install it.
> You might have switched to a newer kernel in the process too, for
> example; I'd have to put a kernel bug on the list of suspects with this
> unusual vmstat output.

I don't think it is a coincidence, because this machine has been rebooted
other times without problems.

Best regards.
On Mon, Oct 10, 2011 at 3:26 PM, alexandre - aldeia digital
<adaldeia@gmail.com> wrote:
> Hi,
>
> Yesterday, a customer increased the server memory from 16GB to 48GB.

A shot in the dark... what is the content of /proc/mtrr?

Luca
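For anyone checking this on their own box: misconfigured memory type range registers can leave part of newly added RAM marked uncacheable, which would match symptoms like these. A defensive way to dump them (x86 Linux only; the file may not exist on other platforms):

```shell
# Print the MTRR layout if the kernel exposes it. Look for large
# "uncachable" ranges overlapping RAM after a memory upgrade.
if [ -r /proc/mtrr ]; then
    cat /proc/mtrr
else
    echo "/proc/mtrr not available on this system"
fi
```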
On 10/11/2011 04:57 AM, Leonardo Francalanci wrote:
> In fact, shouldn't those things be explained in the "WAL Configuration"
> section of the manual? It looks as important as configuring PostgreSQL
> itself... And: that applies to Linux. What about other OSes, such as
> Solaris and FreeBSD?

There's at least 10 pages of stuff about this floating around my book,
including notes on the similar Solaris and FreeBSD parameters to tune. As
for why that's not in the manual instead, two reasons.

First, the PostgreSQL manual doesn't get too deep into things like
operating system adjustments for performance. It's big enough without
that, and any information that would be added is likely to get out of
sync with new OS releases. This project doesn't want to take on the job
of documenting every twist there. My book is less than a year old, and
there's already some material on this topic that's turning obsolete due
to Linux kernel changes. The PostgreSQL manual is shooting to still be
relevant for much longer than that.

Second, making an addition to the manual is the hardest possible way to
document things. The docbook toolchain used is picky and fragile with
malformed additions. And you can't get anything added there without
passing through a series of people who will offer feedback of some sort;
usually valuable, but still time-consuming to process. Things need to be
asked quite frequently before it's worth the trouble. I did a few blog
entries and mailing list posts about this earlier this year, and that was
as much documentation as I could justify at the time. If there's a
user-visible behavior change here, that's the point where an update to
the manual would be in order.

--
Greg Smith   2ndQuadrant US   greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
Hi,

About 3 hours ago, the client contacted Dell and they suggested 2 things:

1) Update the baseboard firmware (the only component that hadn't been
updated yesterday).

2) Replace all memory modules with new ones, instead of keeping the old
(16 GB) + new (32 GB).

Since doing this, the load average has not surpassed 4 points. I/O wait
has returned to reasonable values and system % is down to an average of
5%.

Unfortunately, we will never discover whether the problem was in the
baseboard firmware or in the RAM, but for now the problem is solved.

Thanks all for the help!

Best regards
On Tue, Oct 11, 2011 at 3:02 PM, alexandre - aldeia digital
<adaldeia@gmail.com> wrote:
> 2) Replace all memory modules with new ones, instead of keeping the old
> (16 GB) + new (32 GB).

Of course, mixing disables double/triple/whatuple channel, and makes your
memory subsystem correspondingly slower. By a lot.
Em 11-10-2011 15:05, Claudio Freire escreveu:
> On Tue, Oct 11, 2011 at 3:02 PM, alexandre - aldeia digital
> <adaldeia@gmail.com> wrote:
>> 2) Replace all memory modules with new ones, instead of keeping the old
>> (16 GB) + new (32 GB).
>
> Of course, mixing disables double/triple/whatuple channel, and makes
> your memory subsystem correspondingly slower.
> By a lot.

The initial change (adding more memory) was made by a Dell technician,
and he told us that he used memory modules with the same specification.
But, you know how "it works"... ;)
On Tue, Oct 11, 2011 at 5:02 PM, alexandre - aldeia digital
<adaldeia@gmail.com> wrote:
> The initial change (adding more memory) was made by a Dell technician,
> and he told us that he used memory modules with the same specification.
> But, you know how "it works"... ;)

Yeah, but different size == different specs.

I do know how it works ;-)
On 11-10-2011 20:05 Claudio Freire wrote:
> On Tue, Oct 11, 2011 at 3:02 PM, alexandre - aldeia digital
> <adaldeia@gmail.com> wrote:
>> 2) Replace all memory modules with new ones, instead of keeping the old
>> (16 GB) + new (32 GB).
>
> Of course, mixing disables double/triple/whatuple channel, and makes
> your memory subsystem correspondingly slower.
> By a lot.

That really depends on the chipset/server. The current Intel E56xx chips
(and the previous E55xx) basically just expect groups of 3 modules per
processor, but it doesn't really matter whether that's 3x2+3x4 or 6x4 in
terms of performance (unless the Linux kernel does some weirdness, of
course). It at least won't disable triple-channel just because you added
different-size modules. Only when you get to too many 'ranks' will you
see performance degradation, and that's in terms of clock speed, not in
disabling triple channel. But as said, that all depends on the memory
controller on the server's mainboard or processors.

Best regards,

Arjen
On Tue, Oct 11, 2011 at 5:33 PM, Arjen van der Meijden
<acmmailing@tweakers.net> wrote:
> That really depends on the chipset/server. The current Intel E56xx chips
> (and the previous E55xx) basically just expect groups of 3 modules per
> processor, but it doesn't really matter whether that's 3x2+3x4 or 6x4 in
> terms of performance (unless the Linux kernel does some weirdness, of
> course). It at least won't disable triple-channel just because you added
> different-size modules. Only when you get to too many 'ranks' will you
> see performance degradation, and that's in terms of clock speed, not in
> disabling triple channel.

There are too many caveats to doing that, including how the modules are
installed (in which slots), the timings, brand, internal structure, and
whatnot, with all those details not always published, only sitting in the
module's SPD.

In essence, mixing is more likely to mess up your multichannel
configuration than not. That's why matched kits are sold.
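For readers who want to check whether their own DIMMs are matched, the DMI tables list each module's size, speed, rank and part number. A sketch that degrades gracefully when the `dmidecode` tool is missing or the user isn't root (both are required for it to read the tables):

```shell
# List the attributes where mixed DIMMs usually differ. Mismatched
# part numbers or ranks across slots are the red flag discussed above.
if command -v dmidecode >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    dmidecode --type memory 2>/dev/null \
        | grep -E 'Size|Speed|Rank|Part Number' \
        || echo "no DMI memory data available"
else
    echo "dmidecode unavailable or not running as root"
fi
```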