Thread: MusicBrainz postgres performance issues
Robert Kaye <rob@musicbrainz.org> wrote:
> Hi!
>
> We at MusicBrainz have been having trouble with our Postgres install for the
> past few days. I’ve collected all the relevant information here:
>
> http://blog.musicbrainz.org/2015/03/15/postgres-troubles/
>
> If anyone could provide tips, suggestions or other relevant advice for what to
> poke at next, we would love it.

just a wild guess: raid-controller BBU faulty

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°
On Mar 15, 2015, at 12:13 PM, Josh Krupka <jkrupka@gmail.com> wrote:

It sounds like you've hit the postgres basics, what about some of the linux check list items?

What does free -m show on your db server?
If the load problem really is being caused by swapping when things really shouldn't be swapping, it could be a matter of adjusting your swappiness - what does cat /proc/sys/vm/swappiness show on your server?
There are other linux memory management things that can cause postgres and the server running it to throw fits like THP and zone reclaim. I don't have enough info about your system to say they are the cause either, but check out the many postings here and other places on the detrimental effect that those settings *can* have. That would at least give you another angle to investigate.
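In case it is useful, here is a minimal sketch of how to check those three knobs on a typical recent kernel; the THP paths below are the usual sysfs locations but can differ by distro (some RHEL kernels use redhat_transparent_hugepage instead):

    cat /proc/sys/vm/swappiness                       # 0-100, how eagerly the kernel swaps
    cat /sys/kernel/mm/transparent_hugepage/enabled   # [always] vs madvise/never
    cat /sys/kernel/mm/transparent_hugepage/defrag
    cat /proc/sys/vm/zone_reclaim_mode                # non-zero can hurt Postgres on NUMA boxes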
> On Mar 15, 2015, at 12:41 PM, Andreas Kretschmer <akretschmer@spamfence.net> wrote:
>
> just a wild guess: raid-controller BBU faulty

We don’t have a BBU in this server, but at least we have redundant power supplies.

In any case, how would a faulty battery possibly cause this?

--

--ruaok

Robert Kaye -- rob@musicbrainz.org -- http://musicbrainz.org
<div class="moz-cite-prefix">pls check this if it helps: <a class="moz-txt-link-freetext" href="http://ubuntuforums.org/showthread.php?t=2258734">http://ubuntuforums.org/showthread.php?t=2258734</a><br/><br /> 在2015/3/15 18:54, Robert Kaye 写道:<br /></div><blockquote cite="mid:B008F2EE-45D7-4535-9163-56A8CDCA553C@musicbrainz.org"type="cite"> Hi! <div class=""><br class="" /></div><div class="">Weat MusicBrainz have been having trouble with our Postgres install for the past few days. I’ve collected all therelevant information here:</div><div class=""><br class="" /></div><div class=""> <a class="" href="http://blog.musicbrainz.org/2015/03/15/postgres-troubles/" moz-do-not-send="true">http://blog.musicbrainz.org/2015/03/15/postgres-troubles/</a></div><divclass=""><br class="" /></div><divclass="">If anyone could provide tips, suggestions or other relevant advice for what to poke at next, we wouldlove it.</div><div class=""><br class="" /></div><div class="">Thanks!</div><div class=""><br class="" /></div><divclass=""><div class=""><div class="">--<br class="" /><br class="" /> --ruaok <br class="" /><br class=""/> Robert Kaye -- <a class="" href="mailto:rob@musicbrainz.org" moz-do-not-send="true">rob@musicbrainz.org</a> -- <a class="" href="http://musicbrainz.org" moz-do-not-send="true">http://musicbrainz.org</a></div></div><brclass="" /></div></blockquote><br />
On 2015-03-15 13:08:13 +0100, Robert Kaye wrote:
>> On Mar 15, 2015, at 12:41 PM, Andreas Kretschmer <akretschmer@spamfence.net> wrote:
>>
>> just a wild guess: raid-controller BBU faulty
>
> We don’t have a BBU in this server, but at least we have redundant power supplies.
>
> In any case, how would a faulty battery possibly cause this?

Many controllers disable write-back caching when the battery is dead.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> what does free -m show on your db server?

                 total       used       free     shared    buffers     cached
    Mem:         48295      31673      16622          0          5      12670
    -/+ buffers/cache:      18997      29298
    Swap:        22852       2382      20470
> If the load problem really is being caused by swapping when things really shouldn't be swapping, it could be a matter of adjusting your swappiness - what does cat /proc/sys/vm/swappiness show on your server?

0

We adjusted that too, but no effect.

(I’ve updated the blog post with these two comments)
> There are other linux memory management things that can cause postgres and the server running it to throw fits like THP and zone reclaim. I don't have enough info about your system to say they are the cause either, but check out the many postings here and other places on the detrimental effect that those settings *can* have. That would at least give you another angle to investigate.

If there are specific things you’d like to know, I’d be happy to be a human proxy. :)
On 15.3.2015 13:07, Robert Kaye wrote:
>> If the load problem really is being caused by swapping when things
>> really shouldn't be swapping, it could be a matter of adjusting your
>> swappiness - what does cat /proc/sys/vm/swappiness show on your server?
>
> 0
>
> We adjusted that too, but no effect.
>
> (I’ve updated the blog post with these two comments)

IMHO setting swappiness to 0 is way too aggressive. Just set it to something
like 10 - 20, that works better in my experience.

>> There are other linux memory management things that can cause
>> postgres and the server running it to throw fits like THP and zone
>> reclaim. I don't have enough info about your system to say they are
>> the cause either, but check out the many postings here and other
>> places on the detrimental effect that those settings *can* have.
>> That would at least give you another angle to investigate.
>
> If there are specific things you’d like to know, I’d be happy to be a
> human proxy. :)

I'd start with vm.* configuration, so the output from this:

    # sysctl -a | grep '^vm.*'

and possibly /proc/meminfo. I'm especially interested in the overcommit
settings, because per the free output you provided there's ~16GB of free RAM.

BTW what amounts of data are we talking about? How large is the database and
how large is the active set?

I also noticed you use kernel 3.2 - that's not the best kernel version for
PostgreSQL - see [1] or [2] for example.

[1] https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4
[2] http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
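A quick way to gather roughly what Tomas is asking for, as a sketch (the database name below is a placeholder; pg_database_size() and pg_size_pretty() are the standard size functions):

    sysctl -a | grep '^vm.*'                      # overcommit_*, dirty_*, swappiness, zone_reclaim_mode
    grep -E 'Commit|Swap' /proc/meminfo           # CommitLimit, Committed_AS, swap totals
    psql -d yourdb -c "SELECT pg_size_pretty(pg_database_size(current_database()));"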
> On Mar 15, 2015, at 13:45, Josh Krupka <jkrupka@gmail.com> wrote:
>
> Hmm that's definitely odd that it's swapping since it has plenty of free memory at the moment. Is it still under heavy load right now? Has the output of free consistently looked like that during your trouble times?

And it seems better to disable swappiness.
On 2015-03-15 13:07:25 +0100, Robert Kaye wrote:
>> On Mar 15, 2015, at 12:13 PM, Josh Krupka <jkrupka@gmail.com> wrote:
>>
>> It sounds like you've hit the postgres basics, what about some of the linux check list items?
>>
>> what does free -m show on your db server?
>
>              total       used       free     shared    buffers     cached
> Mem:         48295      31673      16622          0          5      12670
> -/+ buffers/cache:      18997      29298
> Swap:        22852       2382      20470

Could you post /proc/meminfo instead? That gives a fair bit more information.

Also:
* What hardware is this running on?
* Why do you need 500 connections (that are nearly all used) when you have a
  pgbouncer in front of the database? That's not going to be efficient.
* Do you have any data tracking the state connections are in? I.e. whether
  they're idle or not? The connections graph you linked doesn't give that
  information.
* You're apparently not graphing CPU usage. How busy are the CPUs? How much
  time is spent in the kernel (i.e. system)?
* Consider installing perf (linux-utils-$something) and doing a systemwide
  profile.

3.2 isn't the greatest kernel around, efficiency wise. At some point you might
want to upgrade to something newer. I've seen remarkable differences around
this.

You really should upgrade postgres to a newer major version one of these days.
Especially 9.2 can give you a remarkable improvement in performance with many
connections in a read mostly workload.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
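If you go the perf route, a minimal system-wide profile looks something like this (the package name is an assumption about the distro: linux-tools-* on Debian/Ubuntu, perf on RHEL):

    perf record -a -g -- sleep 30    # sample all CPUs with call graphs for 30 seconds
    perf report --sort=comm,dso      # see where the time goes, and in which binaries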
On Sun, Mar 15, 2015 at 7:50 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-03-15 13:07:25 +0100, Robert Kaye wrote:
>>> On Mar 15, 2015, at 12:13 PM, Josh Krupka <jkrupka@gmail.com> wrote:
>>>
>>> It sounds like you've hit the postgres basics, what about some of the linux check list items?
>>>
>>> what does free -m show on your db server?
>>
>>              total       used       free     shared    buffers     cached
>> Mem:         48295      31673      16622          0          5      12670
>> -/+ buffers/cache:      18997      29298
>> Swap:        22852       2382      20470
>
> Could you post /proc/meminfo instead? That gives a fair bit more
> information.
>
> Also:
> * What hardware is this running on?
> * Why do you need 500 connections (that are nearly all used) when you
>   have a pgbouncer in front of the database? That's not going to be
>   efficient.
> * Do you have any data tracking the state connections are in? I.e. whether
>   they're idle or not? The connections graph you linked doesn't give that
>   information.
> * You're apparently not graphing CPU usage. How busy are the CPUs? How
>   much time is spent in the kernel (i.e. system)?

htop is a great tool for watching the CPU cores live. Red == kernel btw.

> * Consider installing perf (linux-utils-$something) and doing a
>   systemwide profile.
>
> 3.2 isn't the greatest kernel around, efficiency wise. At some point you
> might want to upgrade to something newer. I've seen remarkable
> differences around this.

That is an understatement. Here's a nice article on why it's borked:

http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html

Had a 32 core machine with big RAID BBU and 512GB memory that was dying using
the 3.2 kernel. Went to 3.11 and it went from a load of 20 to 40 down to a
load of 5.

> You really should upgrade postgres to a newer major version one of these
> days. Especially 9.2 can give you a remarkable improvement in
> performance with many connections in a read mostly workload.

Agreed. Ubuntu 12.04 with kernel 3.11/3.13 with pg 9.2 has been a great
improvement over debian squeeze and pg 8.4 that we were running at work until
recently.

As for the OP: if you've got swap activity causing issues when there's plenty
of free space, just TURN IT OFF.

    swapoff -a

I do this on all my big memory servers that don't really need swap, esp when I
was using the 3.2 kernel, which seems broken as regards swap on bigger memory
machines.
On 03/15/2015 05:08 AM, Robert Kaye wrote: > > >> On Mar 15, 2015, at 12:41 PM, Andreas Kretschmer <akretschmer@spamfence.net> wrote: >> >> just a wild guess: raid-controller BBU faulty > > We don’t have a BBU in this server, but at least we have redundant power supplies. > > In any case, how would a fault batter possibly cause this? The controller would turn off the cache. JD > > -- > > --ruaok > > Robert Kaye -- rob@musicbrainz.org -- http://musicbrainz.org > > > -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, @cmdpromptinc Now I get it: your service is designed for a customer base that grew up with Facebook, watches Japanese seizure robot anime, and has the attention span of a gnat. I'm not that user., "Tyler Riddle"
On 03/15/2015 09:43 AM, Scott Marlowe wrote: >> * Consider installing perf (linux-utils-$something) and doing a >> systemwide profile. >> >> 3.2 isn't the greatest kernel around, efficiency wise. At some point you >> might want to upgrade to something newer. I've seen remarkable >> differences around this. Not at some point, now. 3.2 - 3.8 are undeniably broken for PostgreSQL. > > That is an understatement. Here's a nice article on why it's borked: > > http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html > > Had a 32 core machine with big RAID BBU and 512GB memory that was > dying using 3.2 kernel. went to 3.11 and it went from a load of 20 to > 40 to a load of 5. Yep, I can confirm this behavior. > >> You really should upgrade postgres to a newer major version one of these >> days. Especially 9.2. can give you a remarkable improvement in >> performance with many connections in a read mostly workload. Seconded. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, @cmdpromptinc Now I get it: your service is designed for a customer base that grew up with Facebook, watches Japanese seizure robot anime, and has the attention span of a gnat. I'm not that user., "Tyler Riddle"
On Sun, Mar 15, 2015 at 10:43 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> [...]

OK I've now read your blog post. A few pointers I'd make.

shared_mem of 12G is almost always too large. I'd drop it down to ~1G or so.

64MB work_mem AND max_connections = 500 is a recipe for disaster. No db can
actively process 500 queries at once without going kaboom, and having 64MB
work_mem means it will go kaboom long before it reaches 500 active
connections. Lower that and let pgbouncer handle the extra connections for
you.

Get some monitoring installed if you don't already have it so you can track
memory usage, cpu usage, disk usage etc. Zabbix or Nagios work well. Without
some kind of system monitoring you're missing half the information you need
to troubleshoot with.

Install iotop, sysstat, and htop. Configure sysstat to collect data so you
can use sar to see what the machine's been doing in the past few days etc.
Set it to 1 minute intervals in the /etc/cron.d/sysstat file.

Do whatever you have to to get kernel 3.11 or greater on that machine (or a
new one). You don't have to upgrade pg just yet but the upgrade of the kernel
is essential.

Good luck. Let us know what you find and if you can get that machine back on
its feet.

--
To understand recursion, one must first understand recursion.
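A sketch of the sysstat setup Scott describes; the collector path is distro-specific (this follows the Debian/Ubuntu layout), so treat the paths as assumptions:

    # /etc/default/sysstat:  ENABLED="true"
    # /etc/cron.d/sysstat:   run the collector every minute instead of every 10, e.g.
    #   * * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1
    sar -u 1 5      # live CPU usage (user/system/iowait)
    sar -r          # memory and swap history from the collected data
    sar -B          # paging activity (pages in/out, faults)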
On Sun, Mar 15, 2015 at 11:09 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

Clarification:

> 64MB work_mem AND max_connections = 500 is a recipe for disaster. No
> db can actively process 500 queries at once without going kaboom, and
> having 64MB work_mem means it will go kaboom long before it reaches
> 500 active connections. Lower that and let pgbouncer handle the extra
> connections for you.

Lower max_connections. work_mem 64MB is fine as long as max_connections is
something reasonable (reasonable is generally #CPU cores * 2 or so). work_mem
is per sort. A single query could easily use 2 or 4x work_mem all by itself.
You can see how having hundreds of active connections each using 64MB or more
at the same time can kill your server.
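To put rough numbers on that, a back-of-the-envelope sketch (the two sort/hash nodes per query are a hypothetical figure, not a measurement of this workload):

    # 500 connections x 2 sort/hash nodes per query x 64MB work_mem each:
    echo $((500 * 2 * 64))    # 64000 MB, i.e. roughly 62GB of potential sort memory on a 48GB box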
On 2015-03-15 11:09:34 -0600, Scott Marlowe wrote:
> shared_mem of 12G is almost always too large. I'd drop it down to ~1G or so.

I think that's outdated wisdom, i.e. not generally true. I've now seen a
significant number of systems where a larger shared_buffers can help quite
massively. The primary case where it can, in my experience, go bad is a
write-mostly database where every buffer acquisition has to write out dirty
data while holding locks. Especially during relation extension that's bad. A
new enough kernel, a sane filesystem (i.e. not ext3) and sane checkpoint
configuration takes care of most of the other disadvantages.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Mar 15, 2015 at 8:20 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-03-15 11:09:34 -0600, Scott Marlowe wrote:
>> shared_mem of 12G is almost always too large. I'd drop it down to ~1G or so.
>
> I think that's outdated wisdom, i.e. not generally true.

Quite agreed. With the note that a properly configured controller with a BBU
is needed.

> A new enough kernel, a sane filesystem (i.e. not ext3) and sane checkpoint
> configuration takes care of most of the other disadvantages.

Most likely. And better to be sure that the filesystem is mounted without
barriers.

And I agree with Scott - 64MB work_mem AND max_connections = 500 is a recipe
for disaster. The problem could be in session mode of pgbouncer. If you can
work with transaction mode - do it.

Best regards,

Ilya Kosmodemiansky,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com
On 2015-03-15 20:42:51 +0300, Ilya Kosmodemiansky wrote: > On Sun, Mar 15, 2015 at 8:20 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2015-03-15 11:09:34 -0600, Scott Marlowe wrote: > >> shared_mem of 12G is almost always too large. I'd drop it down to ~1G or so. > > > > I think that's a outdated wisdom, i.e. not generally true. > > Quite agreed. With note, that proper configured controller with BBU is needed. That imo doesn't really have anything to do with it. The primary benefit of a BBU with writeback caching is accelerating (near-)synchronous writes. Like the WAL. But, besides influencing the default for wal_buffers, a larger shared_buffers doesn't change the amount of synchronous writes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Mar 15, 2015 at 8:46 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> That imo doesn't really have anything to do with it. The primary benefit
> of a BBU with writeback caching is accelerating (near-)synchronous
> writes. Like the WAL.

My point was that without a proper RAID controller (and today a BBU is surely
needed for a controller to count as proper), plus heavy writes of any kind, it
is absolutely impossible to live with large shared_buffers and without I/O
problems.

--
Ilya Kosmodemiansky,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com
On Sun, Mar 15, 2015 at 11:46 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-03-15 20:42:51 +0300, Ilya Kosmodemiansky wrote: >> On Sun, Mar 15, 2015 at 8:20 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> > On 2015-03-15 11:09:34 -0600, Scott Marlowe wrote: >> >> shared_mem of 12G is almost always too large. I'd drop it down to ~1G or so. >> > >> > I think that's a outdated wisdom, i.e. not generally true. >> >> Quite agreed. With note, that proper configured controller with BBU is needed. > > That imo doesn't really have anything to do with it. The primary benefit > of a BBU with writeback caching is accelerating (near-)synchronous > writes. Like the WAL. But, besides influencing the default for > wal_buffers, a larger shared_buffers doesn't change the amount of > synchronous writes. Here's the problem with a large shared_buffers on a machine that's getting pushed into swap. It starts to swap BUFFERs. Once buffers start getting swapped you're not just losing performance, that huge shared_buffers is now working against you because what you THINK are buffers in RAM to make things faster are in fact blocks on a hard drive being swapped in and out during reads. It's the exact opposite of fast. :)
On 2015-03-15 12:25:07 -0600, Scott Marlowe wrote: > Here's the problem with a large shared_buffers on a machine that's > getting pushed into swap. It starts to swap BUFFERs. Once buffers > start getting swapped you're not just losing performance, that huge > shared_buffers is now working against you because what you THINK are > buffers in RAM to make things faster are in fact blocks on a hard > drive being swapped in and out during reads. It's the exact opposite > of fast. :) IMNSHO that's tackling things from the wrong end. If 12GB of shared buffers drive your 48GB dedicated OLTP postgres server into swapping out actively used pages, the problem isn't the 12GB of shared buffers, but that you require so much memory for other things. That needs to be fixed. But! We haven't even established that swapping is an actual problem here. The ~2GB of swapped out memory could just as well be the java raid controller management monstrosity or something similar. Those pages won't ever be used and thus can better be used to buffer IO. You can check what's actually swapped out using: grep ^VmSwap /proc/[0-9]*/status|grep -v '0 kB' For swapping to be actually harmful you need to have pages that are regularly swapped in. vmstat will tell. In a concurrent OLTP workload (~450 established connections do suggest that) with a fair amount of data keeping the hot data set in shared_buffers can significantly reduce problems. Constantly searching for victim buffers isn't a nice thing, and that will happen if your most frequently used data doesn't fit into s_b. On the other hand, if your data set is so large that even the hottest part doesn't fit into memory (perhaps because there's no hottest part as there's no locality at all), a smaller shared buffers can make things more efficient, because the search for replacement buffers is cheaper with a smaller shared buffers setting. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
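Concretely, that check might look like this (a sketch; vmstat's column set varies slightly between procps versions, but si/so are always the swap-in/swap-out columns):

    # which processes actually own the swapped-out pages, largest first:
    grep ^VmSwap /proc/[0-9]*/status | grep -v '0 kB' | sort -k2 -n -r | head
    # sustained non-zero 'si' (swap-in) is the harmful case:
    vmstat 1 10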
cat /proc/cpuinfo | grep processor | wc -l
I see you've got pg_stat_statements enabled - what SQL statements do you see during this heavy-load time? And does EXPLAIN on them show a lot of sorting activity that requires more work_mem?
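For example, something along these lines pulls the heaviest statements out of pg_stat_statements (calls, total_time, rows and query are the long-standing columns of the contrib view; the database name is a placeholder):

    psql -d yourdb -c "
      SELECT calls, total_time, rows, query
        FROM pg_stat_statements
       ORDER BY total_time DESC
       LIMIT 10;"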
Please enable log_checkpoints, so we can see if your checkpoint_segments is adequate.
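Enabling that only needs a reload, not a restart. A sketch (the config and log locations are assumptions about this install):

    # in postgresql.conf:  log_checkpoints = on
    psql -c "SELECT pg_reload_conf();"
    # then watch whether checkpoints are triggered by time or by xlog segments:
    grep -i checkpoint /var/log/postgresql/*.log | tail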
On 15.3.2015 18:54, Ilya Kosmodemiansky wrote:
> On Sun, Mar 15, 2015 at 8:46 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> That imo doesn't really have anything to do with it. The primary
>> benefit of a BBU with writeback caching is accelerating
>> (near-)synchronous writes. Like the WAL.
>
> My point was that without a proper RAID controller (and today a BBU is
> surely needed for a controller to count as proper), plus heavy writes of
> any kind, it is absolutely impossible to live with large shared_buffers
> and without I/O problems.

That is not really true, IMHO.

The benefit of the write cache is that it can absorb a certain amount of
writes, equal to the size of the cache (nowadays usually 512MB or 1GB),
without forcing them to the disks. It still has to flush the dirty data to
the drives later, but that side usually has much lower throughput - e.g.
while you can easily write several GB/s to the controller, the drives usually
handle only ~1MB/s of random writes each (I assume rotational drives here).

But if you do a lot of random writes (which is likely the case for
write-heavy databases), you'll fill the write cache pretty soon and will be
bounded by the drives anyway. The controller really can't help with
sequential writes much, because the drives already handle that quite well.
And SSDs are a completely different story of course.

That does not mean the write cache is useless - it can absorb short bursts of
random writes, fix the write hole with RAID5, offload the parity computation,
etc.

Whenever someone asks me whether they should buy a RAID controller with write
cache for their database server, my answer is "absolutely yes" in 95.23% of
cases ...

... but really it's not something that magically changes the limits for
write-heavy databases - the main limit is still the drives.

regards

Tomas

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2015-03-15 20:54:51 +0300, Ilya Kosmodemiansky wrote:
> On Sun, Mar 15, 2015 at 8:46 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> That imo doesn't really have anything to do with it. The primary benefit
>> of a BBU with writeback caching is accelerating (near-)synchronous
>> writes. Like the WAL.
>
> My point was that without a proper RAID controller (and today a BBU is
> surely needed for a controller to count as proper), plus heavy writes of
> any kind, it is absolutely impossible to live with large shared_buffers
> and without I/O problems.

And my point is that you're mostly wrong. What a raid controller's writeback
cache usefully accelerates is synchronous writes, i.e. writes that the
application waits for. Usually raid controllers don't have much chance to
reorder the queued writes (i.e. turning neighboring writes into one larger
sequential write). What they do excel at is making synchronous writes to disk
return faster, because the data is only written to the controller's memory,
not to actual storage. They're also good at avoiding actual writes to disk
when the *same* page is written to multiple times in a short amount of time.

In postgres, writes for data that goes through shared_buffers are usually
asynchronous. We write them to the OS's page cache when a page is needed for
other contents, is undirtied by the bgwriter, or is written out during a
checkpoint; but we do *not* wait for the write to hit the disk. The
controller's write back cache doesn't hugely help here, because it doesn't
make *that* much of a difference whether the dirty data stays in the kernel's
page cache or in the controller.

In contrast to that, writes to the WAL are often more or less synchronous. We
actually wait (via fdatasync()/fsync() syscalls) for writes to hit disk in a
bunch of scenarios, most commonly when committing a transaction. Unless
synchronous_commit = off, every COMMIT in a transaction that wrote data
implies an fdatasync() of a file in pg_xlog (well, we do optimize that in some
conditions, but let's leave that out for now).

Additionally, if there are many smaller/interleaved transactions, we will
write()/fdatasync() out the same 8kb WAL page repeatedly. Every time a
transaction commits (and some other things), the page that commit record is
on will be flushed. As the WAL records for insertions, updates, deletes and
commits are frequently much smaller than 8kb, that will often happen 20-100
times for the same page in OLTP scenarios with narrow rows.

That's why synchronous_commit = off can be such a huge win for OLTP write
workloads without a writeback cache - synchronous writes are turned into
asynchronous writes, and repetitive writes to the same page are avoided. It
also explains why synchronous_commit = off has much less of an effect for
bulk write workloads: as there are no synchronous disk writes due to WAL
flushes at commit time (there are only very few commits), synchronous commit
doesn't have much of an effect.

That said, there's a couple of reasons why you're not completely wrong:

Historically, when using ext3 with data=ordered and some other filesystems,
synchronous writes to one file forced *all* other previously dirtied data to
also be flushed. That means that if you have pg_xlog and the normal relation
files on the same filesystem, the synchronous writes to the WAL will not only
have to write out the new WAL (often not that much data), but also all the
other dirty data.
The OS will often be unable to do efficient write combining in that case,
because a) there's not yet that much data there, b) postgres doesn't order
writes during checkpoints. That means that WAL writes will suddenly have to
write out much more data => COMMITs are slow. That's where the suggestion to
keep pg_xlog on a separate partition largely comes from.

Writes going through shared_buffers are sometimes indirectly turned into
synchronous writes (from the OS's perspective at least, which means they'll
be done at a higher priority). That happens when the checkpointer fsync()s
all the files at the end of a checkpoint. When things are going well and
checkpoints are executed infrequently and completely guided by time
(i.e. triggered solely by checkpoint_timeout, and not checkpoint_segments)
that's usually not too bad. You'll see a relatively small latency spike for
transactions.

Unfortunately the ext* filesystems have an implementation problem here, which
can make this much worse: the way writes are prioritized during an fsync()
can stall out concurrent synchronous reads/writes pretty badly. That's much
less of a problem with e.g. xfs. Which is why I'd atm not suggest ext4 for
write intensive applications.

The other situation where this can lead to big problems is if your
checkpoints aren't scheduled by time (you can recognize that by enabling
log_checkpoints and checking a) that time is the trigger, b) they're actually
happening in an interval consistent with checkpoint_timeout). If the relation
files are not written out in a smoothed out fashion (configured by
checkpoint_completion_target), a *lot* of dirty buffers can exist in the OS's
page cache. Especially because the default 'dirty' settings in linux on
servers with a lot of IO are often completely insane; especially with older
kernels (pretty much everything before 3.11 is badly affected).

The important thing to do here is to configure checkpoint_timeout,
checkpoint_segments and checkpoint_completion_target in a consistent way. In
my opinion the default checkpoint_timeout is *way* too low; it just leads to
a large increase in overall writes (due to more frequent checkpoints,
repeated writes to the same page aren't coalesced) *and* an increase in WAL
volume (many more full_page_writes).

Lots more could be written about this topic; but I think I've blathered on
enough for the moment ;)

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
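A sketch of the kind of consistent checkpoint configuration described above; the values are purely illustrative (they depend on write volume and how long a crash recovery you can tolerate), and checkpoint_segments assumes a pre-9.5 server:

    grep -E 'checkpoint_(timeout|segments|completion_target)|log_checkpoints' $PGDATA/postgresql.conf
    # illustrative direction of travel, not a recommendation for this particular box:
    #   checkpoint_timeout           = 30min
    #   checkpoint_segments          = 64
    #   checkpoint_completion_target = 0.9
    #   log_checkpoints              = on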
On 15.3.2015 23:47, Andres Freund wrote:
> On 2015-03-15 12:25:07 -0600, Scott Marlowe wrote:
>> Here's the problem with a large shared_buffers on a machine that's
>> getting pushed into swap. It starts to swap BUFFERs. Once buffers
>> start getting swapped you're not just losing performance, that huge
>> shared_buffers is now working against you because what you THINK are
>> buffers in RAM to make things faster are in fact blocks on a hard
>> drive being swapped in and out during reads. It's the exact opposite
>> of fast. :)
>
> IMNSHO that's tackling things from the wrong end. If 12GB of shared
> buffers drive your 48GB dedicated OLTP postgres server into swapping
> out actively used pages, the problem isn't the 12GB of shared
> buffers, but that you require so much memory for other things. That
> needs to be fixed.

I second this opinion.

As was already pointed out, the 500 connections is rather insane (assuming
the machine does not have hundreds of cores). If there are memory pressure
issues, it's likely because many queries are performing memory-expensive
operations at the same time (might even be a bad estimate causing hashagg to
use much more than work_mem).

> But! We haven't even established that swapping is an actual problem
> here. The ~2GB of swapped out memory could just as well be the java raid
> controller management monstrosity or something similar. Those pages
> won't ever be used and thus can better be used to buffer IO.
>
> You can check what's actually swapped out using:
> grep ^VmSwap /proc/[0-9]*/status|grep -v '0 kB'
>
> For swapping to be actually harmful you need to have pages that are
> regularly swapped in. vmstat will tell.

I've already asked for vmstat logs, so let's wait.

> In a concurrent OLTP workload (~450 established connections do
> suggest that) with a fair amount of data keeping the hot data set in
> shared_buffers can significantly reduce problems. Constantly
> searching for victim buffers isn't a nice thing, and that will happen
> if your most frequently used data doesn't fit into s_b. On the other
> hand, if your data set is so large that even the hottest part doesn't
> fit into memory (perhaps because there's no hottest part as there's
> no locality at all), a smaller shared buffers can make things more
> efficient, because the search for replacement buffers is cheaper with
> a smaller shared buffers setting.

I've met many systems with max_connections values this high, and it was
mostly idle connections because of separate connection pools on each
application server. So mostly idle (90% of the time), but at peak time all
the application servers want to do stuff at the same time. And it all goes
KABOOOM! just like here.

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 16.3.2015 00:55, michael@sqlexec.com wrote:
> Why is 500 connections "insane". We got 32 CPU with 96GB and 3000
> max connections, and we are doing fine, even when hitting our max
> concurrent connection peaks around 4500. At a previous site, we were
> using 2000 max connections on 24 CPU and 64GB RAM, with about 1500
> max concurrent connections. So I wouldn't be too hasty in saying more
> than 500 is asking for trouble. Just as long as you got your kernel
> resources set high enough to sustain it (SHMMAX, SHMALL, SEMMNI, and
> ulimits), and RAM for work_mem.

If all the connections are active at the same time (i.e. running queries),
they have to share the 32 cores somehow. Or I/O, if that's the bottleneck.

In other words, you're not improving the throughput of the system, you're
merely increasing latencies. And it may easily happen that the latency
increase is not linear, but grows faster - because of locking, context
switches and other process-related management.

Imagine you have a query taking 1 second of CPU time. If you have 64 such
queries running concurrently on 32 cores, each gets only 1/2 a CPU and so
takes >=2 seconds. With 500 queries, it's >=15 seconds per query, etc.

If those queries are acquiring the same locks (e.g. updating the same rows,
or so), you can imagine what happens ...

Also, if part of the query required a certain amount of memory for part of
the plan, it now holds that memory for much longer too. That only increases
the chance of OOM issues.

It may work fine when most of the connections are idle, but it makes storms
like this possible.

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
(please quote properly)

On 2015-03-15 19:55:23 -0400, michael@sqlexec.com wrote:
> Why is 500 connections "insane". We got 32 CPU with 96GB and 3000 max
> connections, and we are doing fine, even when hitting our max concurrent
> connection peaks around 4500. At a previous site, we were using 2000 max
> connections on 24 CPU and 64GB RAM, with about 1500 max concurrent
> connections. So I wouldn't be too hasty in saying more than 500 is asking
> for trouble. Just as long as you got your kernel resources set high enough
> to sustain it (SHMMAX, SHMALL, SEMMNI, and ulimits), and RAM for work_mem.

It may work acceptably in some scenarios, but it can lead to significant
problems. Several things in postgres scale linearly (from the algorithmic
point of view; often CPU characteristics like cache sizes make it worse) with
max_connections, most notably acquiring a snapshot. It usually works ok
enough if you don't have a high number of queries per second, but if you do,
you can run into horrible contention problems. Absurdly enough that matters
*more* on bigger machines with several sockets. It's especially bad on 4+
socket systems.

The other aspect is that such a high number of full connections usually just
isn't helpful for throughput. Not even the most massive NUMA systems (~256
hardware threads is the realistic max atm IIRC) can process 4.5k queries at
the same time. It'll often be much more efficient if all connections above a
certain number aren't allocated a full postgres backend, with all its
overhead, but instead go through a much more lightweight pooler connection.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I'm still waiting to find out how many CPUs are on this DB server. Did I miss it somewhere in the email thread below?
On 16/03/15 13:07, Tomas Vondra wrote: > On 16.3.2015 00:55, michael@sqlexec.com wrote: >> Why is 500 connections "insane". We got 32 CPU with 96GB and 3000 >> max connections, and we are doing fine, even when hitting our max >> concurrent connection peaks around 4500. At a previous site, we were >> using 2000 max connections on 24 CPU and 64GB RAM, with about 1500 >> max concurrent connections. So I wouldn't be too hasty in saying more >> than 500 is asking for trouble. Just as long as you got your kernel >> resources set high enough to sustain it (SHMMAX, SHMALL, SEMMNI, and >> ulimits), and RAM for work_mem. [...] > Also, if part of the query required a certain amount of memory for part > of the plan, it now holds that memory for much longer too. That only > increases the change of OOM issues. > [...] Also you could get a situation where a small number of queries & their data, relevant indexes, and working memory etc can all just fit into RAM, but the extra queries suddenly reduce the RAM so that even these queries spill to disk, plus the time required to process the extra queries. So a nicely behaved system could suddenly get a lot worse. Even before you consider additional lock contention and other nasty things! It all depends... Cheers, Gavin
Robert Kaye wrote:
> Hi!
>
> We at MusicBrainz have been having trouble with our Postgres install for the past few days. I’ve collected all the relevant information here:
>
> If anyone could provide tips, suggestions or other relevant advice for what to poke at next, we would love it.

Robert,
Wow - You've engaged the wizards indeed.
I haven't heard or seen anything that would answer my *second* question if faced with this (my first would have been "what changed")....
What is the database actually trying to do when it spikes? e.g. what queries are running ?
Is there any pattern in the specific activity (exactly the same query, or same query different data, or even just same tables, and/or same users, same apps) when it spikes?
I know from experience that well behaved queries can stop being well behaved if underlying data changes
and for the experts... what would a corrupt index do to memory usage?
Roxanne
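Roxanne's first question, what the database is actually doing when it spikes, can be answered from pg_stat_activity. A sketch for a pre-9.2 server (which this appears to be, given the upgrade advice upthread); on 9.2+ the columns are pid, state and query instead of procpid and current_query:

    psql -c "
      SELECT procpid, usename, waiting, now() - query_start AS runtime, current_query
        FROM pg_stat_activity
       WHERE current_query <> '<IDLE>'
       ORDER BY query_start;"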
> Robert,
>
> Wow - You've engaged the wizards indeed.
>
> I haven't heard or seen anything that would answer my *second* question if faced with this (my first would have been "what changed")....
Yes, indeed — I feel honored to have so many people chime into this issue.
The problem was that nothing abnormal was happening — just the normal queries were running that hadn’t given us any problems for months. We undid everything that had been recently changed in an effort to address “what changed”. Nothing helped, which is what had us so perplexed.
However, I am glad to report that our problems are fixed and that our server is back to humming along nicely.
What we changed:
1. As it was pointed out here, max_connections of 500 was in fact insanely high, especially in light of using PGbouncer. Before we used PGbouncer we needed a lot more connections and when we started using PGbouncer, we never reduced this number.
2. Our server_lifetime was set far too high (1 hour). Josh Berkus suggested lowering that to 5 minutes.
3. We reduced the number of PGbouncer active connections to the DB.
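For reference, the knobs involved live in pgbouncer.ini and can be inspected through pgbouncer's admin console. A sketch only: the admin console on port 6432 and the values shown are assumptions for illustration, not MusicBrainz's actual configuration:

    psql -p 6432 -U pgbouncer pgbouncer -c "SHOW POOLS;"    # client vs server connections per pool
    psql -p 6432 -U pgbouncer pgbouncer -c "SHOW CONFIG;"   # effective pool_mode, default_pool_size, server_lifetime
    # pgbouncer.ini, illustrative values only:
    #   pool_mode         = transaction
    #   default_pool_size = 20
    #   server_lifetime   = 300       ; seconds (5 minutes), down from 3600
    #   max_client_conn   = 1000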
What we learned:
1. We had too many backends
2. The backends were being kept around for too long by PGbouncer.
3. This caused too many idle backends to kick around. Once we exhausted physical ram, we started swapping.
4. Linux 3.2 apparently has some less than desirable swap behaviours. Once we started swapping, everything went nuts.
Going forward we’re going to upgrade our kernel the next time we have down time for our site and the rest should be sorted now.
I wanted to thank everyone who contributed their thoughts to this thread — THANK YOU.
And as I said to Josh earlier: "Postgres rocks our world. I’m immensely pleased that once again the problems were our own stupidity and not PG’s fault. In over 10 years of us using PG, it has never been PG’s fault. Not once.”
And thus we’re one tiny bit smarter today. Thank you everyone!
P.S. If anyone would still like to get some more information about this problem for their own edification, please let me know. Given that we’ve fixed the issue, I don’t want to spam this list by responding to all the questions that were posed.
Robert Kaye wrote on 16.03.2015 at 13:59:
> However, I am glad to report that our problems are fixed and that our
> server is back to humming along nicely.
>
> And as I said to Josh earlier: "Postgres rocks our world. I’m
> immensely pleased that once again the problems were our own stupidity
> and not PG’s fault. In over 10 years of us using PG, it has never
> been PG’s fault. Not once.”
>
> And thus we’re one tiny bit smarter today. Thank you everyone!

I think it would be nice if you could amend your blog posting to include the solution that you found. Otherwise this will simply stick around as yet another unsolved performance problem.

Thomas
Robert Kaye <rob@musicbrainz.org> wrote:
> However, I am glad to report that our problems are fixed and that our server is
> back to humming along nicely.
>
> What we changed:
>
> 1. As it was pointed out here, max_connections of 500 was in fact insanely
> high, especially in light of using PGbouncer. Before we used PGbouncer we
> needed a lot more connections and when we started using PGbouncer, we never
> reduced this number.
>
> 2. Our server_lifetime was set far too high (1 hour). Josh Berkus suggested
> lowering that to 5 minutes.
>
> 3. We reduced the number of PGbouncer active connections to the DB.

Many thanks for the feedback!

Andreas
> On Mar 16, 2015, at 2:22 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
>
> I think it would be nice if you can amend your blog posting to include the solution that you found.
>
> Otherwise this will simply stick around as yet another unsolved performance problem

Good thinking:

http://blog.musicbrainz.org/2015/03/16/postgres-troubles-resolved/

I’ve also updated the original post with the link to the above. Case closed. :)

--

--ruaok

Robert Kaye -- rob@musicbrainz.org -- http://musicbrainz.org
On 03/16/2015 05:59 AM, Robert Kaye wrote:
> 4. Linux 3.2 apparently has some less than desirable swap behaviours.
> Once we started swapping, everything went nuts.

Relevant to this:

http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html

Anybody who is on Linux Kernels 3.0 to 3.8 really needs to upgrade soon.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 3/15/15 7:17 PM, michael@sqlexec.com wrote: Please avoid top-posting. > I agree with your counter argument about how high max_connections "can" > cause problems, but max_connections may not part of the problem here. > There's a bunch of "depends stuff" in there based on workload details, # > cpus, RAM, etc. Sure, but the big, huge danger with a very large max_connections is that you now have a large grenade with the pin pulled out. If *anything* happens to disturb the server and push the active connection count past the number of actual cores the box is going to fall over and not recover. In contrast, if max_connections is <= the number of cores this is far less likely to happen. Each connection will get a CPU to run on, and as long as they're not all clamoring for the same locks the server will be making forward progress. Clients may have to wait in the pool for a free connection for some time, but once they get one their work will get done. > I'm still waiting to find out how many CPUs on this DB server. Did i > miss it somewhere in the email thread below? http://blog.musicbrainz.org/2015/03/15/postgres-troubles/ might show it somewhere... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On Mon, Mar 16, 2015 at 6:59 AM, Robert Kaye <rob@musicbrainz.org> wrote:
>
> 4. Linux 3.2 apparently has some less than desirable swap behaviours. Once
> we started swapping, everything went nuts.

On older machines I used to just turn off swap altogether, especially if I
wasn't running out of memory but swap was engaging anyway. swappiness = 0
didn't help, nothing did; I just kept seeing kswapd working its butt off
doing nothing but hitting the swap partition. So glad to be off those old
kernels.