Thread: Monitoring number of backends
Hi all.

My website is about to get a little more popular. I'm trying to add in some measurements to determine an upper limit of how many concurrent database connections I'm currently using.

I've started running this:

SELECT sum(numbackends) AS count, sum(xact_commit) AS ttlcommit FROM pg_stat_database;

every 10 seconds or so. I don't think it's what I want, though. It seems way too small. I'm guessing that it's not a measure of the previous 10 seconds; it's a count of how many backends are in use at the exact moment I run the SQL.

Is there a cumulative count someplace?

Thanks for your time,

-Andy
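For context on the query above: numbackends in pg_stat_database is an instantaneous gauge, while xact_commit is a cumulative counter since the last statistics reset, so summing the two mixes different kinds of numbers. A point-in-time breakdown of backends by state (a sketch against the standard pg_stat_activity view; the state column requires PostgreSQL 9.2 or later) would look like:

```sql
-- Instantaneous snapshot: how many backends exist right now, by state.
-- pg_stat_activity.state requires PostgreSQL 9.2+.
SELECT state, count(*) AS backends
FROM pg_stat_activity
GROUP BY state
ORDER BY backends DESC;
```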
Andy,

* andy (andy@squeakycode.net) wrote:
> My website is about to get a little more popular. I'm trying to add
> in some measurements to determine an upper limit of how many
> concurrent database connections I'm currently using.

PG is really *much* happier if you have only one backend per CPU in your system. The way to get there is by using a connection pooler like pgbouncer and configuring it based on how many CPUs you have.

pgbouncer can also provide stats for you.

Thanks,

Stephen
On 10/22/2013 12:59 PM, Stephen Frost wrote:
> * andy (andy@squeakycode.net) wrote:
>> My website is about to get a little more popular. I'm trying to add
>> in some measurements to determine an upper limit of how many
>> concurrent database connections I'm currently using.
>
> PG is really *much* happier if you have only one backend per CPU in your
> system. The way to get there is by using a connection pooler like
> pgbouncer and configuring it based on how many CPUs you have.
>
> pgbouncer can also provide stats for you.

Hum.. I had not thought of that. My current setup uses 40 max connections, and I don't think I've ever hit it. I use apache and php, and my db connections are not persistent.

If I did plug in pgbouncer, is it worth switching my php from pg_connect to pg_pconnect?

I'd bet plugging in pgbouncer now, while I'm not too busy, would help me grow in the long run, huh? I like the sound of that. Thanks!

-Andy
* andy (andy@squeakycode.net) wrote:
> If I did plug in pgbouncer, is it worth switching my php from
> pg_connect to pg_pconnect?

No, let pgbouncer manage the connection pooling. Having two levels of pooling isn't a good idea (and pgbouncer does a *much* better job of it anyway, imv..).

> I'd bet plugging in pgbouncer now while I'm not too busy would help
> me grow in the long run, huh? I like the sound of that. Thanks!

Yes. Note that, as you scale, you can actually connect pgbouncers to other pgbouncers and play other fun games, like having pgbouncer switch which servers it's connecting to transparently to the app..

Thanks,

Stephen
On 10/22/2013 10:59 AM, Stephen Frost wrote:
> PG is really *much* happier if you have only one backend per CPU in your system. The way to get there is by using a connection pooler like pgbouncer and configuring it based on how many CPUs you have.
Actually, I've found peak throughput on a decent multicore server with lots of RAM and lots of disk IO parallelism (e.g., a big RAID10) is around 2X the socket*core*thread count. So, for instance, on a modern 2-socket E5-2665 kind of server, that's 2 x 8 cores with 2 threads per core: 16 total cores, 32 total hardware threads, so about 64 database connections would be peak, given a decent RAID10 of lots of SAS2 10k/15k disks.
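John's rule of thumb can be sketched as a tiny calculation (the 2x multiplier is his observation for that class of hardware, not a fixed constant; treat it as a workload-dependent starting point):

```python
# Pool-size heuristic from this thread: peak throughput around 2x the
# total hardware thread count (sockets * cores/socket * threads/core).
# The multiplier is workload-dependent -- a starting point, not a rule.

def suggested_pool_size(sockets: int, cores_per_socket: int,
                        threads_per_core: int, multiplier: int = 2) -> int:
    hardware_threads = sockets * cores_per_socket * threads_per_core
    return hardware_threads * multiplier

# The dual E5-2665 example from the thread: 2 sockets x 8 cores x 2 threads.
if __name__ == "__main__":
    print(suggested_pool_size(2, 8, 2))  # 64
```

As Stephen notes downthread, the only way to know the right number for a given workload is to benchmark on production-like gear.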
-- john r pierce 37N 122W somewhere on the middle of the left coast
On 10/22/2013 11:25 AM, andy wrote:
> Hum.. I had not thought of that. My current setup uses 40 max
> connections, and I don't think I've ever hit it. I use apache and
> php, and my db connections are not persistent.

With that style of php programming, you're getting some HUGE overhead in connect/disconnect per web page. Putting pgbouncer in the middle will make a HUGE improvement, possibly a second per page load on a busy server.

You can even set it up in parallel with things live... first install pgbouncer and configure it to listen on an alternate port, say 15432, and verify you can connect and everything looks good (psql -p 15432 ...), then tweak your PHP app's database config to use that port instead of 5432...

-- 
john r pierce 37N 122W
somewhere on the middle of the left coast
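The side-by-side setup John describes might look roughly like this in pgbouncer.ini (a sketch only: the database name, socket paths, auth_type, and pool size here are placeholders, not values from the thread):

```ini
; Minimal pgbouncer config listening on an alternate port for testing.
; Database name, paths, and auth settings are placeholders.
[databases]
mydb = host=/var/run/postgresql dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 15432
auth_type = trust
pool_mode = transaction
default_pool_size = 20
```

Once `psql -p 15432 mydb` works, the PHP connection settings can be pointed at the new port with no downtime.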
* John R Pierce (pierce@hogranch.com) wrote:
> On 10/22/2013 10:59 AM, Stephen Frost wrote:
>> PG is really *much* happier if you have only one backend per CPU in your
>> system. The way to get there is by using a connection pooler like
>> pgbouncer and configuring it based on how many CPUs you have.
>
> Actually, I've found peak throughputs on a decent multicore server
> with lots of ram, and lots of disk IO parallelism (eg, big raid10)
> is around 2X the socket*hyperthread*core count... so for instance,
> on a modern 2 socket E5-2665 kind of server, that's 2 x 8 cores with
> 2 threads per core, that's 16 total cores, 32 total hardware threads,
> so about 64 database connections would be peak, given a decent
> raid10 of lots of SAS2 10k/15k disks

Sure. As always with performance: test, test, test on gear that is as close to identical to the prod gear (or the prod gear itself, if you can get away with it) as possible.. Every workload is different.

Thanks,

Stephen
On 10/22/2013 2:18 PM, John R Pierce wrote:
> On 10/22/2013 11:25 AM, andy wrote:
>> Hum.. I had not thought of that. My current setup uses 40 max
>> connections, and I don't think I've ever hit it. I use apache and
>> php, and my db connections are not persistent.
>
> that style of php programming, you're getting some HUGE overhead in
> connect/disconnect per web page. putting pg_bouncer in the middle
> will make a HUGE improvement, possibly a second per page load on a busy
> server.

No, actually, I don't think my connect overhead is huge. My apache and postgres are on the same box, and it connects using a unix socket. Perhaps if my apache and db were on different boxes it would be a problem.

My page response time is sub-second, and I run quite a few queries to build the page. But also, my server isn't too busy at the moment. The load is around 0.3 to 0.5 when it's busy.

Stephen said:
>> If I did plug in pgbouncer, is it worth switching my php from
>> pg_connect to pg_pconnect?
>
> No, let pgbouncer manage the connection pooling. Having two levels of
> pooling isn't a good idea (and pgbouncer does a *much* better job of it
> anyway, imv..).

So you say DO use persistent connections, and Stephen says DON'T use them.

Although there are a few new players. Assuming Apache, pgbouncer, and postgres are all on the same box, and I'm using unix sockets as much as possible, it probably doesn't matter if I use non-persistent connections from php. But if I need to move the db to its own box... then should I move pgbouncer there too?

Assuming the db is on a different box, persistent connections from php to postgres might be bad. But how about persistent connections to pgbouncer?

Thinking about it, if I need to move the db, I'll leave pgbouncer on the webserver box. That way I can unix-socket from php to pgbouncer (non-persistent, cuz it's fast enough anyway), and let pgbouncer do its own thing to the database box. Seem like a reasonable sort of thing?

-Andy
andy <andy@squeakycode.net> writes:
> On 10/22/2013 2:18 PM, John R Pierce wrote:
>> that style of php programming, you're getting some HUGE overhead in
>> connect/disconnect per web page. putting pg_bouncer in the middle
>> will make a HUGE improvement, possibly a second per page load on a busy
>> server.

> No, actually, I don't think my connect overhead is huge. My apache and
> postgres are on the same box, and it connects using unix socket.

You're ignoring the fact that PG backends have a pretty considerable startup transient. By the time a backend has gotten its caches populated enough to be efficient, it's expended a lot of cycles. You might be getting away with this approach under low load, but it will bite you in painful places eventually.

regards, tom lane
On 10/22/2013 1:13 PM, andy wrote:
> No, actually, I don't think my connect overhead is huge. My apache
> and postgres are on the same box, and it connects using unix socket.
> Perhaps if my apache and db were on different boxes it would be a problem.

Each postgres connection, if you're not using a pool, requires a fork() of the postgres process. fork() is inherently an expensive operation, especially for a moderately large and fairly complex piece of software like postgresql.

-- 
john r pierce 37N 122W
somewhere on the middle of the left coast
On 10/22/2013 3:44 PM, Tom Lane wrote:
> andy <andy@squeakycode.net> writes:
>> No, actually, I don't think my connect overhead is huge. My apache and
>> postgres are on the same box, and it connects using unix socket.
>
> You're ignoring the fact that PG backends have a pretty considerable
> startup transient. By the time a backend has gotten its caches populated
> enough to be efficient, it's expended a lot of cycles. You might be
> getting away with this approach under low load, but it will bite you in
> painful places eventually.

> but it will bite you in painful places eventually.

:-) heh. Well, I think PG is even more impressive now. My server is on a VM, and I'm pretty much doing things the slow way, and I get a page back in 500ms. And this is a busy time of day. Of course, I'm right next to the server.

Anyone wanna check page times for me?

http://jasper.iowaassessors.com/parcel.php?gid=99680

I'm talking JUST parcel.php ... the maps and photos don't count.

Thanks all.

-Andy
* John R Pierce (pierce@hogranch.com) wrote:
> On 10/22/2013 1:13 PM, andy wrote:
>> No, actually, I don't think my connect overhead is huge. My
>> apache and postgres are on the same box, and it connects using
>> unix socket. Perhaps if my apache and db were on different boxes it
>> would be a problem.
>
> each postgres connection, if you're not using a pool, requires a
> fork() of the postgres process. fork is inherently an expensive
> process, especially for a moderately large and fairly complex piece
> of software like postgresql.

As Tom points out, it's really PG that makes the new connections slow; fork(), while it can be slow on some platforms, really is small potatoes compared to PG opening a database, populating caches, etc.

Thanks,

Stephen
On 10/22/2013 12:59 PM, Stephen Frost wrote:
> * andy (andy@squeakycode.net) wrote:
>> My website is about to get a little more popular. I'm trying to add
>> in some measurements to determine an upper limit of how many
>> concurrent database connections I'm currently using.
>
> PG is really *much* happier if you have only one backend per CPU in your
> system. The way to get there is by using a connection pooler like
> pgbouncer and configuring it based on how many CPUs you have.
>
> pgbouncer can also provide stats for you.

Ahh, bummer, man. PgBouncer doesn't work so well when you have lots of databases. I have about 90 databases, and the website could connect to any one of them on any request. (They are all about equally likely to be hit.)

In my pgbouncer.ini I set:

[databases]
* =
[pgbouncer]
max_client_conn = 200
default_pool_size = 20

but that made each database its own pool, so 90*20 possible connections (but then maxed out at 200, I assume). Postgres has only a max_connections of 40, so I started getting "FATAL: sorry, too many clients already" errors.

I set max_client_conn = 40 and default_pool_size = 3, so it should stop erroring out, but is this type of setup even going to help me? If pgbouncer has 40 connections open to 40 different databases, and then a request comes in for a db it doesn't have, it'll have to drop one and build a new connection, yes? Won't that be slower than just connecting right to postgres in the first place?

I need a Most-Recently-Used pool, not a per-database pool. Is there a way to do this with pgbouncer? (With a large number of databases.)

-Andy
On 10/22/2013 7:45 PM, Andy Colson wrote:
> Ahh, bummer, man. PgBouncer doesn't work so well when you have lots
> of databases. I have about 90 databases, the website could connect to
> any one of them on any request. (They are all about as equally likely
> to be hit)

That scenario would better be met by using schemas in a single database.

-- 
john r pierce 37N 122W
somewhere on the middle of the left coast
On Tue, Oct 22, 2013 at 12:41:58PM -0500, andy wrote:
- My website is about to get a little more popular. I'm trying to add in
- some measurements to determine an upper limit of how many concurrent
- database connections I'm currently using.
-
- I've started running this:
-
- SELECT sum(numbackends) AS count, sum(xact_commit) as ttlcommit FROM
- pg_stat_database;
-
- Every 10 seconds or so. I don't think its what I want though. It seems
- way too small. I'm guessing that its not a measure of the previous 10
- seconds. Its a count of how many backends are in use at the exact
- moment I run the sql.
-
- Is there a cumulative count someplace?

You've gotten good info from the other guys on how to scale your DB, but to answer your original question, I usually use this query:

select count(*) from pg_stat_activity where state <> 'idle';

That gives you the # of "active" connections to your database and is something you want to try to get good metrics on.

Idle connections have some overhead, but if Active > # of CPUs, your performance starts to degrade. Now, really, that's pretty normal, but ideally you need to know what ratio of Active Connections to # of CPUs still gives you acceptable performance. And that's really based on your app and hardware.
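One refinement worth noting for the query above (a sketch, for PostgreSQL 9.2+ where pg_stat_activity has `pid` and `state` columns): the monitoring session itself counts as a non-idle backend while the query runs, so it can be excluded:

```sql
-- Count active backends, excluding the monitoring session itself.
-- pg_backend_pid() returns the current backend's PID (9.2+ column names).
SELECT count(*) AS active
FROM pg_stat_activity
WHERE state <> 'idle'
  AND pid <> pg_backend_pid();
```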
On 10/23/2013 11:07 AM, David Kerr wrote:
> You've gotten good info from the other guys on how to scale your DB
> but to answer your original question. I usually use this query:
>
> select count(*) from pg_stat_activity where state <> 'idle';
>
> That gives you the # of "active" connections to your database and is
> something you want to try to get good metrics on.
>
> Idle connections have some overhead but if Active > # of CPUs your
> performance starts to degrade. Now, really that's pretty normal but,
> ideally, you need to know what the ratio of Active Connections to #
> CPUs still gives you acceptable performance. And that's really based
> on your app and hardware.

How often do you run that? Once a second? And graph it? I was doing it every 10 seconds, but it doesn't give me a good view of the system.

-Andy
On Wed, Oct 23, 2013 at 12:11:39PM -0500, andy wrote:
- How often do you run that? Once a second? And graph it? I was doing
- it every 10 seconds, but it doesn't give me a good view of the system.

I actually have it as a munin module, so it runs every few minutes. If I'm actually doing a performance test or something, I would run it every second or every 5 / 10 seconds.
The knowledge is only really useful if you have a good trend built up, so you know how your app responds to having a certain # of active connections.
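The trend-building David describes boils down to sampling the active-connection count and summarizing it. A minimal sketch of the summarizing half (the samples here are made up; collecting real ones would mean running the pg_stat_activity query above on a timer against a live database):

```python
# Sketch: turn a series of sampled active-connection counts into the
# numbers that matter for capacity planning: peak, 95th percentile, mean.
# The sample values below are invented for illustration.

def summarize(samples):
    ordered = sorted(samples)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "peak": ordered[-1],            # worst moment observed
        "p95": ordered[p95_index],      # crude 95th percentile
        "avg": sum(samples) / len(samples),
    }

samples = [2, 3, 1, 4, 2, 8, 3, 2, 5, 3]  # one count per sampling interval
print(summarize(samples))
```

Plotted over days (munin, or any graphing tool), these numbers show how close the app runs to the active-connections-per-CPU ceiling David mentions.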
On Tue, Oct 22, 2013 at 09:45:24PM -0500, Andy Colson wrote:
> Ahh, bummer, man. PgBouncer doesn't work so well when you have lots of
> databases. I have about 90 databases, the website could connect to any
> one of them on any request. (They are all about as equally likely to be hit)
>
> In my pgbouncer.ini I set
> [databases]
> * =
> [pgbouncer]
> max_client_conn = 200
> default_pool_size = 20
>
> but that made each database its own pool, so 90*20 possible (but then
> maxed out at 200, I assume). Postgres has only a max_connections of 40,
> so I started getting "FATAL: sorry, too many clients already" errors.
>
> I set max_client_conn = 40 and default_pool_size = 3, so it should stop
> erroring out, but is this type of setup even going to help me? If
> pgbouncer has 40 connections open to 40 different databases, and then a
> request comes in for a db it doesn't have, it'll have to drop one and
> build a new connection, yes? Won't that be slower than just connecting
> right to postgres in the first place?
>
> I need a Most-Recently-Used pool, not a per-database pool. Is there a
> way to do this with pgbouncer? (With a large number of databases)

PgBouncer does not do any cross-pool limiting, so you can still have 3*90 server connections open. max_client_conn is meant as a rescue limit for when something is completely broken, and should not be hit in normal work; 40000 would be a reasonable number...
With that many databases, you just need to accept that you need a few connections to each db open, otherwise you won't get much win from pooling. So the Postgres max_connections should indeed be >= 3*90 for you. And you can limit server connections via server_idle_timeout.

-- marko
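Marko's arithmetic as a config sketch (the 90-database and pool-size-3 numbers come from this thread; the timeout value is a placeholder): 90 pools of 3 server connections each means up to 270 server connections, so the Postgres side must allow at least that many.

```ini
; pgbouncer.ini sketch for the 90-database case.
; Per-database pools: 90 x default_pool_size 3 = up to 270 server
; connections, so postgresql.conf needs max_connections >= 270.
[pgbouncer]
max_client_conn = 40000    ; rescue limit only, per Marko
default_pool_size = 3
server_idle_timeout = 60   ; seconds; reap idle server connections
```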
On Tue, Oct 22, 2013 at 10:41 AM, andy <andy@squeakycode.net> wrote:
> Hi all.
>
> My website is about to get a little more popular. I'm trying to add in some measurements to determine an upper limit of how many concurrent database connections I'm currently using.
>
> I've started running this:
>
> SELECT sum(numbackends) AS count, sum(xact_commit) as ttlcommit FROM pg_stat_database;
>
> Every 10 seconds or so. I don't think its what I want though. It seems way too small. I'm guessing that its not a measure of the previous 10 seconds. Its a count of how many backends are in use at the exact moment I run the sql.
>
> Is there a cumulative count someplace?
You can pull it from the log files if you change postgres.conf to use:
log_connections = on
log_disconnections = on
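With those two settings on, the log records every connect and disconnect, so peak concurrency can be reconstructed after the fact by replaying the log. A rough sketch (the sample lines and the substring matching here are assumptions; the real line format depends on your log_line_prefix setting):

```python
# Sketch: replay "connection received" / "disconnection" lines from a
# PostgreSQL log to find the peak number of simultaneous connections.
# These sample lines are invented; real ones vary with log_line_prefix.

sample_log = """\
2013-10-22 12:00:01 LOG:  connection received: host=[local]
2013-10-22 12:00:02 LOG:  connection received: host=[local]
2013-10-22 12:00:03 LOG:  connection received: host=[local]
2013-10-22 12:00:04 LOG:  disconnection: session time: 0:00:02.123 host=[local]
2013-10-22 12:00:05 LOG:  connection received: host=[local]
"""

def peak_concurrency(log_text):
    current = peak = 0
    for line in log_text.splitlines():
        if "connection received:" in line:
            current += 1
            peak = max(peak, current)
        elif "disconnection:" in line:
            current = max(0, current - 1)  # guard against truncated logs
    return peak

print(peak_concurrency(sample_log))  # 3
```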
Cheers,
Jeff
On Tue, Oct 22, 2013 at 12:18 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 10/22/2013 11:25 AM, andy wrote:
>> Hum.. I had not thought of that. My current setup uses 40 max connections, and I don't think I've ever hit it. I use apache and php, and my db connections are not persistent.
>
> that style of php programming, you're getting some HUGE overhead in connect/disconnect per web page. putting pg_bouncer in the middle will make a HUGE improvement, possibly a second per page load on a busy server.
My recent experience with mediawiki is that php is such a slow beast anyway (even with APC) that connection/disconnect overhead is likely not to be significant. But it would still be a good idea for him to learn pgbouncer, in case his php code is much faster than mediawiki's is, or he runs into the spinlock contention inside postgresql that has been all the rage lately. It just isn't the first place I would look anymore.
Cheers,
Jeff
On Tue, Oct 22, 2013 at 1:13 PM, andy <andy@squeakycode.net> wrote:
> On 10/22/2013 2:18 PM, John R Pierce wrote:
>> On 10/22/2013 11:25 AM, andy wrote:
>>> Hum.. I had not thought of that. My current setup uses 40 max
>>> connections, and I don't think I've ever hit it. I use apache and
>>> php, and my db connections are not persistent.
>>
>> that style of php programming, you're getting some HUGE overhead in
>> connect/disconnect per web page. putting pg_bouncer in the middle
>> will make a HUGE improvement, possibly a second per page load on a busy
>> server.
>
> No, actually, I don't think my connect overhead is huge. My apache and postgres are on the same box, and it connects using unix socket. Perhaps if my apache and db were on different boxes it would be a problem.
I have not noticed a large difference between loopback and unix socket on any modern computer. The difference between a loopback connection and a connection between two machines in the same data center is more noticeable when benchmarked in highly optimized code, but I'm skeptical about how meaningful it would be in php. You can always use ab or ab2 and try it out for yourself.
> My page response time is sub-second, and I run quite a few queries to build the page. But also, my server isn't too busy at the moment. The load is around 0.3 to 0.5 when it's busy.
Wasn't your question to figure out how to make sure things continue to run fine when the demand increases to a higher level than it currently is? If you cite its current OK performance to reject the advice, I'm not really sure what we are going to accomplish.
> Stephen said:
>>> If I did plugin pg_bouncer, is it worth switching my php from
>>> pg_connect to pg_pconnect?
>>
>> No, let pg_bouncer manage the connection pooling. Having two levels of
>> pooling isn't a good idea (and pg_bouncer does a *much* better job of it
>> anyway, imv..).
>
> So you say DO use persistent connections, and Stephen says DON'T use them.
They both say to use persistent connections--the ones between pg_bouncer and postgres itself. But for two different reasons, one to reduce the number of connections you make and break to postgresql, the other to reduce the number of active connections at any one time. Both are valid reasons.
> Although there are a few new players. Assuming Apache, pgbouncer and postgres are all on the same box, and I'm using unix sockets as much as possible, it probably doesn't matter if I use non-persistent connections from php.
>
> But if I need to move the db to its own box... then should I move pgbouncer there too?
That depends on where the bottleneck is.
Cheers,
Jeff
On 10/23/2013 04:28 PM, Jeff Janes wrote:
>> My page response time is sub-second, and I run quite a few queries to
>> build the page. But also, my server isn't too busy at the moment. The
>> load is around 0.3 to 0.5 when it's busy.
>
> Wasn't your question to figure out how to make sure things continue to
> run fine when the demand increases to a higher level than it currently
> is? If you cite its current OK performance to reject the advice, I'm
> not really sure what we are going to accomplish.

Correct, my current load and response time are fine, but I'll be getting more load soon. I didn't reject the advice. I installed pgbouncer on my test box, played with it, then installed it on the live box and let it run live for a while, until I ran into problems, then shut it off.

>> Although there are a few new players. Assuming Apache, pgbouncer and
>> postgres are all on the same box, and I'm using unix sockets as much
>> as possible, it probably doesn't matter if I use non-persistent
>> connections from php.
>>
>> But if I need to move the db to its own box... then should I move
>> pgbouncer there too?
>
> That depends on where the bottleneck is.

That's my big problem. I don't have a bottleneck now, and I'm trying to guess (without experience) where the bottleneck will be. Judging by everyone's responses, pg_connect will be a bottleneck that I'll have to try and find a solution for.

Google did turn up some links about why you might put pgbouncer on the web box vs. the db box. That's all well and good, except I'm not sure I can even use pgbouncer as my magic pill. It doesn't work so well with lots of databases. And I have lots of code and batch processes in place, so it's not gonna be simple to use a single db with lots of schemas.

I'm still undecided on what to do, or if I should even do anything at all. I am grateful for all the advice, though.

-Andy