Thread: Estimated resources for a 500 connections instance (VM)

Estimated resources for a 500 connections instance (VM)

From: David Gauthier

psql (9.6.0, server 11.3) on linux

We've ramped up usage on a PG server (a VM, I have no choice about this) and are approaching the 100 connections limit.  We could increase the limit, but I've read that this can lead to a degradation in performance.  If we bump it up to 500, what kind of compute resources would that require (mem/cpu)?  Any estimates?

Thanks!

Re: Estimated resources for a 500 connections instance (VM)

From: Josef Šimánek

On Tue, Apr 7, 2020 at 18:25, David Gauthier <davegauthierpg@gmail.com> wrote:
> psql (9.6.0, server 11.3) on linux
> 
> We've ramped up usage on a PG server (a VM, I have no choice about this) and are approaching the 100 connections limit.  We could increase the limit, but I've read that this can lead to a degradation in performance.  If we bump it up to 500, what kind of compute resources would that require (mem/cpu)?  Any estimates?
> 
> Thanks!

Re: Estimated resources for a 500 connections instance (VM)

From: Laurenz Albe

On Tue, 2020-04-07 at 12:24 -0400, David Gauthier wrote:
> psql (9.6.0, server 11.3) on linux
> 
> We've ramped up usage on a PG server (a VM, I have no choice about this) and are approaching the
> 100 connections limit.  We could increase the limit, but I've read that this can lead to a
> degradation in performance. If we bump it up to 500, what kind of compute resources would that
> require (mem/cpu)?  Any estimates ?

You should upgrade to the latest minor release.

Don't increase max_connections.  The main problem is that the more connections
there are, the greater the likelihood that too many of them will be active,
overloading the database machine.

This can happen, for example, if a DDL statement has to wait behind a lock.
Then other queries will "pile up" behind it, and as soon as the DDL is done
or canceled, the avalanche breaks loose.

The better solution is to use a connection pool.  If your application doesn't
have one, use pgBouncer.
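
For reference, most of a pgBouncer setup is configuration.  A minimal
sketch of a pgbouncer.ini - the database name, port and pool sizes
below are illustrative placeholders, not recommendations:

  ; clients connect to pgBouncer on 6432 instead of PostgreSQL on 5432
  [databases]
  mydb = host=127.0.0.1 port=5432 dbname=mydb

  [pgbouncer]
  listen_addr = 127.0.0.1
  listen_port = 6432
  auth_type = md5
  auth_file = /etc/pgbouncer/userlist.txt
  ; transaction pooling hands the server connection back at commit,
  ; so many mostly idle clients can share a small server-side pool
  pool_mode = transaction
  max_client_conn = 500
  default_pool_size = 20

Note that transaction pooling does not preserve session state such as
prepared statements or temporary tables; if the clients need that,
use pool_mode = session instead.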

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Estimated resources for a 500 connections instance (VM)

From: David Gauthier

After looking at some of the factors that can affect this, I think it may be important to know that most of the connections will be almost idle (in terms of interacting with the DB).  The "users" are perl/dbi scripts which connect to the DB and spend the vast majority of the time doing things other than interacting with the DB.  So a connection is consumed, but it's not really working very hard with the DB per se.  I am cleaning up some of that code by strategically connecting/disconnecting only when a DB interaction is required.  But for my edification, is it roughly true that 2 connections working with the DB 100% of the time is equivalent to 20 connections @ 10%, or 200 connections @ 1% (if you know what I mean)?

Re: Estimated resources for a 500 connections instance (VM)

From: Sándor Daku

On Tue, 7 Apr 2020 at 21:52, David Gauthier <davegauthierpg@gmail.com> wrote:
> After looking at some of the factors that can affect this, I think it may be important to know that most of the connections will be almost idle (in terms of interacting with the DB).  The "users" are perl/dbi scripts which connect to the DB and spend the vast majority of the time doing things other than interacting with the DB.  So a connection is consumed, but it's not really working very hard with the DB per se.  I am cleaning up some of that code by strategically connecting/disconnecting only when a DB interaction is required.  But for my edification, is it roughly true that 2 connections working with the DB 100% of the time is equivalent to 20 connections @ 10%, or 200 connections @ 1% (if you know what I mean)?

Hi,

Every open connection consumes a bit of resources, which is not a big deal if you keep open a few more connections than you strictly need.  However, when you keep a few hundred idle connections, those resources add up quickly.  So don't do that if possible.
Likewise, establishing a new connection is a resource-costly process.  So don't do that either if possible.
Long story short, if those connections don't use many different users, then (as others have already suggested) connection pooling will be the best solution.

Regards,
Sándor   

Re: Estimated resources for a 500 connections instance (VM)

From: Tim Cross

David Gauthier <davegauthierpg@gmail.com> writes:

> After looking at some of the factors that can affect this, I think it may
> be important to know that most of the connections will be almost idle (in
> terms of interacting with the DB).  The "users" are perl/dbi scripts which
> connect to the DB and spend the vast majority of the time doing things
> other than interacting with the DB.  So a connection is consumed, but it's
> not really working very hard with the DB per-se.  I am cleaning up some of
> that code by strategically connecting/disconnecting only when a DB
> interaction is required.  But for my edification, is it roughly true that 2
> connections working with the DB 100% of the time is equivalent to 20
> connections @ 10% = 200 connections @ 1 % (if you know what I mean) ?

Based on that additional info, I would definitely follow Laurenz's
suggestion. It's a long time since I used Perl DBI, but I'm pretty sure
there is support for connection pools, or you can use one of the PG
connection pooling solutions.
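
For the Perl side, plain DBI ships connect_cached, which returns an
already-open handle when called again with the same parameters, so a
script that connects "on demand" does not pay the connection setup
cost every time.  A minimal sketch - the DSN, user and table names
are placeholders:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use DBI;

  sub count_rows {
      # connect_cached reuses an existing handle for this DSN/attribute
      # combination instead of opening a new server connection
      my $dbh = DBI->connect_cached(
          "dbi:Pg:dbname=mydb;host=dbhost",   # placeholder DSN
          "myuser", "mypassword",
          { RaiseError => 1, AutoCommit => 1 },
      );
      return $dbh->selectrow_array("SELECT count(*) FROM mytable");
  }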

There is a fixed memory allocation per connection, so 2 connections at
100% is not the same as 20 connections @ 10%.

Using a connection pool is usually the first thing I will set up. If
additional connections are still required, then I would increase the
limit in small jumps - definitely would not go from 100 to 500.
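
(If you do raise the limit, note that max_connections only takes
effect after a server restart.  Roughly:

  -- written to postgresql.auto.conf, applied at the next restart
  ALTER SYSTEM SET max_connections = 150;

followed by a restart of the server.)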

BTW, running PG on a virtual machine is not an issue in itself - this is
very common these days. However, I would ensure you are up to date with
the latest minor release for that version, and I would use clients with
the same version as the server.

-- 
Tim Cross



Re: Estimated resources for a 500 connections instance (VM)

From: Rob Sargent


On 4/7/20 2:23 PM, Sándor Daku wrote:
> On Tue, 7 Apr 2020 at 21:52, David Gauthier <davegauthierpg@gmail.com> wrote:
>> After looking at some of the factors that can affect this, I think it may be important to know that most of the connections will be almost idle (in terms of interacting with the DB).  The "users" are perl/dbi scripts which connect to the DB and spend the vast majority of the time doing things other than interacting with the DB.  So a connection is consumed, but it's not really working very hard with the DB per se.  I am cleaning up some of that code by strategically connecting/disconnecting only when a DB interaction is required.  But for my edification, is it roughly true that 2 connections working with the DB 100% of the time is equivalent to 20 connections @ 10%, or 200 connections @ 1% (if you know what I mean)?
> 
> Hi,
> 
> Every open connection consumes a bit of resources, which is not a big deal if you keep open a few more connections than you strictly need.  However, when you keep a few hundred idle connections, those resources add up quickly.  So don't do that if possible.
> Likewise, establishing a new connection is a resource-costly process.  So don't do that either if possible.
> Long story short, if those connections don't use many different users, then (as others have already suggested) connection pooling will be the best solution.
> 
> Regards,
> Sándor
And from my experience pgBouncer is very easy to include in your stack.  (I've not tried pgpool.)

Re: Estimated resources for a 500 connections instance (VM)

From: Adrian Klaver

On 4/7/20 12:51 PM, David Gauthier wrote:
> After looking at some of the factors that can affect this, I think it 
> may be important to know that most of the connections will be almost 
> idle (in terms of interacting with the DB).  The "users" are perl/dbi 
> scripts which connect to the DB and spend the vast majority of the time 
> doing things other than interacting with the DB.  So a connection is 
> consumed, but it's not really working very hard with the DB per-se.  I 
> am cleaning up some of that code by strategically 
> connecting/disconnecting only when a DB interaction is required.  But 
> for my edification, is it roughly true that 2 connections working with 
> the DB 100% of the time is equivalent to 20 connections @ 10% = 200 
> connections @ 1 % (if you know what I mean) ?

Well, to get a sense of the load you could use top; within top:

1) Hit the 'u' key and enter postgres as the user

2) Hit the 'c' key to get the full command line

That should result in something like:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  978 postgres  20   0   72072   7688   6548 S 0.000 0.096   0:00.04 /usr/lib/systemd/systemd --user
  981 postgres  20   0  117124   2704     52 S 0.000 0.034   0:00.00 (sd-pam)
 1201 postgres  20   0  184292  20396  19520 S 0.000 0.254   0:01.19 /usr/local/pgsql12/bin/postmaster -D /usr/local/pgsql12/data
 1255 postgres  20   0   38364   4192   3332 S 0.000 0.052   0:00.00 postgres: logger
 1263 postgres  20   0  184408   5916   5016 S 0.000 0.074   0:00.00 postgres: checkpointer
 1264 postgres  20   0  184424   5520   4640 S 0.000 0.069   0:00.18 postgres: background writer
 1265 postgres  20   0  184292   9500   8620 S 0.000 0.118   0:00.18 postgres: walwriter
 1266 postgres  20   0  185116   7756   6520 S 0.000 0.096   0:01.32 postgres: autovacuum launcher
 1267 postgres  20   0   39488   5316   3780 S 0.000 0.066   0:02.86 postgres: stats collector
 1268 postgres  20   0  184844   6064   4980 S 0.000 0.075   0:00.00 postgres: logical replication launcher
14478 postgres  20   0  185252   9612   8184 S 0.000 0.119   0:00.00 postgres: postgres production [local] idle
14507 postgres  20   0  185348  11380   9848 S 0.000 0.141   0:00.00 postgres: aklaver task_manager ::1(45202) idle
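
A database-side view of the same information is pg_stat_activity,
for example to count backends by state (a quick sketch; works on any
reasonably recent version):

  -- one row per server process; state is 'active', 'idle',
  -- 'idle in transaction', etc.
  SELECT state, count(*)
  FROM pg_stat_activity
  GROUP BY state
  ORDER BY count(*) DESC;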


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Estimated resources for a 500 connections instance (VM)

From: Laurenz Albe

On Tue, 2020-04-07 at 15:51 -0400, David Gauthier wrote:
> But for my edification, is it roughly true that 2 connections working with the DB 100%
> of the time is equivalent to 20 connections @ 10% = 200 connections @ 1 % (if you know what I mean) ?

Roughly, yes.
There is a certain overhead, as each query has to take a snapshot of the
database, which has to consider all active connections.

But the main problem is that you have no way to ensure that all those
idle connections stay idle.

Imagine that most of these sessions issue short statements
against "table1".  There are also some longer SELECTs.
No problem so far.
Now along comes a DDL statement that locks the table, even if only
for a very short time, like an ALTER TABLE.
Unfortunately, its ACCESS EXCLUSIVE lock has to wait for one of
the longer SELECTs to finish.
Now all those moderately short statements cannot run any more;
they have to queue behind the ACCESS EXCLUSIVE lock.  For a
while, nothing happens.  Then, as soon as the ALTER TABLE has
got its lock, done its work and finished, all hell breaks loose,
because all sessions start executing their statements at the same time.

I have seen database servers go down because of such events, and
the problem is the high number of connections, even if they are idle
most of the time.
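
When such a pile-up happens, a query along these lines shows who is
waiting on whom (pg_blocking_pids() is available from 9.6 on):

  -- sessions waiting on a lock, and the PIDs that block them
  SELECT pid,
         pg_blocking_pids(pid) AS blocked_by,
         wait_event_type,
         wait_event,
         left(query, 60) AS query
  FROM pg_stat_activity
  WHERE cardinality(pg_blocking_pids(pid)) > 0;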

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com