Thread: Estimated resources for a 500 connections instance (VM)
psql (9.6.0, server 11.3) on linux
We've ramped up usage on a PG server (a VM, I have no choice about this) and are approaching the 100 connections limit. We could increase the limit, but I've read that this can lead to a degradation in performance. If we bump it up to 500, what kind of compute resources would that require (mem/cpu)? Any estimates?
Thanks!
Hello.
Have you checked https://wiki.postgresql.org/wiki/Number_Of_Database_Connections?
For rough estimates you can also check http://pgconfigurator.cybertec.at/ or https://pgtune.leopard.in.ua/#/.
On Tue, 7 Apr 2020 at 18:25, David Gauthier <davegauthierpg@gmail.com> wrote:
> psql (9.6.0, server 11.3) on linux
>
> We've ramped up usage on a PG server (a VM, I have no choice about this) and are approaching the 100 connections limit. We could increase the limit, but I've read that this can lead to a degradation in performance. If we bump it up to 500, what kind of compute resources would that require (mem/cpu)? Any estimates ?
>
> Thanks !
On Tue, 2020-04-07 at 12:24 -0400, David Gauthier wrote:
> psql (9.6.0, server 11.3) on linux
>
> We've ramped up usage on a PG server (a VM, I have no choice about this) and are approaching the
> 100 connections limit. We could increase the limit, but I've read that this can lead to a
> degradation in performance. If we bump it up to 500, what kind of compute resources would that
> require (mem/cpu)? Any estimates ?

You should upgrade to the latest minor release.

Don't increase max_connections. The main problem is that the more connections there are, the greater the likelihood that too many of them will be active at the same time, overloading the database machine.

This can happen, for example, if a DDL statement has to wait behind a lock. Then other queries will "pile up" behind it, and as soon as the DDL is done or canceled, the avalanche will break loose.

The better solution is to use a connection pool. If your application doesn't have one, use pgBouncer.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
After looking at some of the factors that can affect this, I think it may be important to know that most of the connections will be almost idle (in terms of interacting with the DB). The "users" are perl/dbi scripts which connect to the DB and spend the vast majority of the time doing things other than interacting with the DB. So a connection is consumed, but it's not really working very hard with the DB per se. I am cleaning up some of that code by strategically connecting/disconnecting only when a DB interaction is required. But for my edification, is it roughly true that 2 connections working with the DB 100% of the time is equivalent to 20 connections @ 10% = 200 connections @ 1% (if you know what I mean)?
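For concreteness, the connect-only-when-needed pattern is roughly this (a simplified sketch; the DSN, user and table names are just placeholders):

use strict;
use warnings;
use DBI;

sub log_result_to_db {
    my ($jobid, $status) = @_;

    # Open a connection only for the duration of the DB work, then
    # disconnect, so the script does not sit on a connection while it
    # spends minutes doing non-DB work.
    my $dbh = DBI->connect(
        "dbi:Pg:dbname=flowdb;host=pgserver",          # placeholder DSN
        "flowuser", $ENV{PGPASSWORD},                  # placeholder credentials
        { RaiseError => 1, AutoCommit => 1 }
    );

    $dbh->do(
        "UPDATE jobs SET status = ? WHERE jobid = ?",  # placeholder table
        undef, $status, $jobid
    );

    $dbh->disconnect;
}

The trade-off is connection churn: every call pays the cost of establishing a new backend, which is part of why a pooler in front of the server tends to be the better long-term answer.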
On Tue, 7 Apr 2020 at 21:52, David Gauthier <davegauthierpg@gmail.com> wrote:
> After looking at some of the factors that can affect this, I think it may be important to know that most of the connections will be almost idle (in terms of interacting with the DB). The "users" are perl/dbi scripts which connect to the DB and spend the vast majority of the time doing things other than interacting with the DB. So a connection is consumed, but it's not really working very hard with the DB per-se. I am cleaning up some of that code by strategically connecting/disconnecting only when a DB interaction is required. But for my edification, is it roughly true that 2 connections working with the DB 100% of the time is equivalent to 20 connections @ 10% = 200 connections @ 1 % (if you know what I mean) ?
Hi,
Every open connection consumes a bit of resources, which is not a big deal if you keep open a few more connections than you strictly need. However, when you keep a few hundred idle connections open, those resources add up quickly. So don't do that if possible.
Likewise, establishing a new connection is a resource-costly process. So don't do that either if possible.
Long story short, if those connections don't use many different users, then (as others already suggested) connection pooling will be the best solution.
Regards,
Sándor
David Gauthier <davegauthierpg@gmail.com> writes:
> After looking at some of the factors that can affect this, I think it may
> be important to know that most of the connections will be almost idle (in
> terms of interacting with the DB). The "users" are perl/dbi scripts which
> connect to the DB and spend the vast majority of the time doing things
> other than interacting with the DB. So a connection is consumed, but it's
> not really working very hard with the DB per-se. I am cleaning up some of
> that code by strategically connecting/disconnecting only when a DB
> interaction is required. But for my edification, is it roughly true that 2
> connections working with the DB 100% of the time is equivalent to 20
> connections @ 10% = 200 connections @ 1 % (if you know what I mean) ?

Based on that additional info, I would definitely follow Laurenz's suggestion. It's been a long time since I used Perl DBI, but I'm pretty sure there is support for connection pools, or you can use one of the PG connection pooling solutions.

There is a fixed memory allocation per connection, so 2 connections at 100% is not the same as 20 connections @ 10%.

Using a connection pool is usually the first thing I will set up. If additional connections are still required, then I would increase the limit in small jumps - definitely would not go from 100 to 500.

BTW, running PG on a virtual machine is not an issue in itself - this is very common these days. However, I would ensure you are up to date wrt the latest minor release for that version and would use clients with the same version as the master.

--
Tim Cross
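On the Perl side specifically: core DBI does not pool across processes, but within a single long-running script DBI->connect_cached() (same arguments as connect) returns the already-open handle on repeated calls instead of building a new connection each time. A sketch with a placeholder DSN:

use strict;
use warnings;
use DBI;

# connect_cached() hands back the cached handle when called again with
# the same arguments, reconnecting only if the cached connection is gone.
sub dbh {
    return DBI->connect_cached(
        "dbi:Pg:dbname=flowdb;host=pgserver",   # placeholder DSN
        "flowuser", $ENV{PGPASSWORD},
        { RaiseError => 1, AutoCommit => 1 }
    );
}

dbh()->do("UPDATE jobs SET status = ? WHERE jobid = ?", undef, 'done', 42);

Note that this keeps the connection open between calls, so it saves reconnect cost but does not reduce the number of idle connections the way an external pooler such as pgBouncer does.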
On 4/7/20 2:23 PM, Sándor Daku wrote:
And from my experience pg_bouncer is very easy to include in your stack. (I haven't tried pg_pool.)
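To give an idea of how little is involved, a minimal pgbouncer.ini along these lines is often enough for transaction pooling in front of the existing server (the addresses, auth file path and pool sizes below are placeholders to adapt):

[databases]
; fall-through entry: forward every database to the real server
* = host=127.0.0.1 port=5432

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling lets a small number of server connections serve many mostly-idle clients
pool_mode = transaction
max_client_conn = 500
default_pool_size = 20

The perl/dbi scripts would then point their DSN at port 6432 instead of 5432; nothing else on the client side needs to change, apart from the usual transaction-pooling caveat that session state (prepared statements, temp tables) does not survive across transactions.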
On 4/7/20 12:51 PM, David Gauthier wrote:
> After looking at some of the factors that can affect this, I think it
> may be important to know that most of the connections will be almost
> idle (in terms of interacting with the DB). The "users" are perl/dbi
> scripts which connect to the DB and spend the vast majority of the time
> doing things other than interacting with the DB. So a connection is
> consumed, but it's not really working very hard with the DB per-se. I
> am cleaning up some of that code by strategically
> connecting/disconnecting only when a DB interaction is required. But
> for my edification, is it roughly true that 2 connections working with
> the DB 100% of the time is equivalent to 20 connections @ 10% = 200
> connections @ 1 % (if you know what I mean) ?

Well, to get a sense of the load you could use top. Within top:

1) Hit the u key and enter postgres as the user
2) Hit the c key to get the full command line

That should result in something like (though formatted better):

PID   USER      PR  NI  VIRT    RES    SHR    S  %CPU   %MEM   TIME+    COMMAND
978   postgres  20  0   72072   7688   6548   S  0.000  0.096  0:00.04  /usr/lib/systemd/systemd --user
981   postgres  20  0   117124  2704   52     S  0.000  0.034  0:00.00  (sd-pam)
1201  postgres  20  0   184292  20396  19520  S  0.000  0.254  0:01.19  /usr/local/pgsql12/bin/postmaster -D /usr/local/pgsql12/data
1255  postgres  20  0   38364   4192   3332   S  0.000  0.052  0:00.00  postgres: logger
1263  postgres  20  0   184408  5916   5016   S  0.000  0.074  0:00.00  postgres: checkpointer
1264  postgres  20  0   184424  5520   4640   S  0.000  0.069  0:00.18  postgres: background writer
1265  postgres  20  0   184292  9500   8620   S  0.000  0.118  0:00.18  postgres: walwriter
1266  postgres  20  0   185116  7756   6520   S  0.000  0.096  0:01.32  postgres: autovacuum launcher
1267  postgres  20  0   39488   5316   3780   S  0.000  0.066  0:02.86  postgres: stats collector
1268  postgres  20  0   184844  6064   4980   S  0.000  0.075  0:00.00  postgres: logical replication launcher
14478 postgres  20  0   185252  9612   8184   S  0.000  0.119  0:00.00  postgres: postgres production [local] idle
14507 postgres  20  0   185348  11380  9848   S  0.000  0.141  0:00.00  postgres: aklaver task_manager ::1(45202) idle

--
Adrian Klaver
adrian.klaver@aklaver.com
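To get the same picture from inside the database rather than from top, a query along these lines works (a sketch; it needs a role that can see all sessions in pg_stat_activity, and backend_type is available because the server is 11.x):

SELECT state, count(*) AS connections
FROM pg_stat_activity
WHERE backend_type = 'client backend'   -- drop this filter to include background processes
GROUP BY state
ORDER BY connections DESC;

With mostly idle perl/dbi scripts, expect the bulk of the connections to show up under the 'idle' state.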
On Tue, 2020-04-07 at 15:51 -0400, David Gauthier wrote:
> But for my edification, is it roughly true that 2 connections working with the DB 100%
> of the time is equivalent to 20 connections @ 10% = 200 connections @ 1 % (if you know what I mean) ?

Roughly, yes. There is a certain overhead, as each query has to take a snapshot of the database, which has to consider all active connections.

But the main problem is that you have no way to ensure that all those idle connections stay idle.

Imagine that most of these sessions issue short statements against "table1". There are also some longer SELECTs. No problem so far.

Now along comes a statement that needs an ACCESS EXCLUSIVE lock on the table, even if only for a very short time - say a short ALTER TABLE. Unfortunately, that ACCESS EXCLUSIVE lock has to wait for one of the longer SELECTs to finish.

Now all those moderately short statements cannot run any more; they have to queue behind the ACCESS EXCLUSIVE lock. For a while, nothing happens. Then, as soon as the ALTER TABLE has got its lock, done its work and finished, all hell breaks loose, because all sessions start executing their statements at the same time.

I have seen database servers go down because of such events, and the problem is the high number of connections, even if they are idle most of the time.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
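A common way to defuse exactly that pile-up (a sketch; the DDL below is only a placeholder) is to run such statements with a lock_timeout, so they give up quickly instead of making every other statement queue behind them, and retry later:

-- give up after 2 seconds rather than letting everything queue behind
-- the ACCESS EXCLUSIVE lock; retry the DDL later if it times out
SET lock_timeout = '2s';
ALTER TABLE table1 ADD COLUMN note text;   -- placeholder DDL
RESET lock_timeout;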