Thread: Connection pooling - Number of connections

Connection pooling - Number of connections

From
Guillaume Smet
Date:
Hi all,

Brett Wooldridge, the creator of HikariCP [1] - a high performance
Java connection pool - is contemplating the idea to change the way
pooling is done in HikariCP and have a fixed-size pool of connections
always open.

No maxPoolSize, no minIdle, no minPoolSize, just a poolSize parameter
which sets the size of the pool. At application startup, all the
connections are opened and maintained by the pool throughout the life
of the application.

The basic idea is that if you decide that your application might need
100 connections at a time, you set poolSize to 100 and HikariCP
maintains 100 connections open.
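
For illustration, a minimal sketch of the idea (the single poolSize
knob is hypothetical at this point; with HikariCP's existing
HikariConfig API, pinning minimumIdle to maximumPoolSize approximates
a fixed pool):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class FixedPoolSketch {
    public static void main(String[] args) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/app"); // placeholder URL
        // Hypothetical single knob from the proposal:
        //   config.setPoolSize(100);
        // Approximation with the existing API: the pool opens 100
        // connections at (or shortly after) startup and keeps them open
        // for the life of the application.
        config.setMaximumPoolSize(100);
        config.setMinimumIdle(100);
        try (HikariDataSource ds = new HikariDataSource(config)) {
            // hand ds to the application as a javax.sql.DataSource
        }
    }
}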

I recall very old posts on this list where people were talking about
code paths sensitive to the number of connections open (or even
max_connections) and that it wasn't such a good idea to keep
connections open if they were not really needed.

As a lot of scalability work has been done since those (very old) days,
I was wondering if that is still the rule of thumb, or if Brett's idea
of completely simplifying connection management is the way to go.

It seems that at least one other pool implementation is going this way,
so I thought it might be a good idea to get the opinion of the
database side of things. That way, it will be easier to make a well
informed decision.

Thanks in advance for your comments/advice.

--
Guillaume

[1] https://github.com/brettwooldridge/HikariCP


Re: Connection pooling - Number of connections

From
David Johnston
Date:
Guillaume Smet wrote
> Brett Wooldridge, the creator of HikariCP [1] - a high performance
> Java connection pool - is contemplating the idea to change the way
> pooling is done in HikariCP and have a fixed-size pool of connections
> always open.
>
> No maxPoolSize, no minIdle, no minPoolSize, just a poolSize parameter
> which sets the size of the pool. At application startup, all the
> connections are opened and maintained by the pool throughout the life
> of the application.
>
> The basic idea is that if you decide that your application might need
> 100 connections at a time, you set poolSize to 100 and HikariCP
> maintains 100 connections open.
>
> I recall very old posts on this list where people were talking about
> code paths sensitive to the number of connections open (or even
> max_connections) and that it wasn't such a good idea to keep
> connections open if they were not really needed.
>
> As a lot of scalability work has been done since those (very old) days,
> I was wondering if that is still the rule of thumb, or if Brett's idea
> of completely simplifying connection management is the way to go.
>
> It seems that at least one other pool implementation is going this way,
> so I thought it might be a good idea to get the opinion of the
> database side of things. That way, it will be easier to make a well
> informed decision.

The developer, not the pool implementer, is ultimately going to decide
which trade-offs to incur.  Having a connection open, even if idle,
costs resources and performance, however minimal.

Pool management costs cycles as well, so if one does not need dynamic
pool management, getting rid of it is probably worthwhile.  The
question is whether you want to meet only the needs of this specific
user or whether you want to provide flexibility.

If existing pool management implementations are reasonably well implemented
and efficient then focusing effort on a potentially under-served use-case
definitely has merit.

Consider this train of thought: no matter how large the pool size, if
you are constantly keeping, say, 90% of the connections actively
working, then having, on average, 10% of the connections sitting idle
is probably not going to be noticeable on the server, and the reduction
in the overhead of managing a pool is typically a net positive.  Now, I
have no idea what percentage is actually typical, or under what
conditions and pool sizes it may vary, but that is a calculation that
someone deciding between managed and unmanaged pools would need to
make.

David J.






Re: Connection pooling - Number of connections

From
Guillaume Smet
Date:
On Fri, Mar 21, 2014 at 4:49 PM, David Johnston <polobo@yahoo.com> wrote:
> Consider this train of thought: no matter how large the pool size, if
> you are constantly keeping, say, 90% of the connections actively
> working, then having, on average, 10% of the connections sitting idle
> is probably not going to be noticeable on the server, and the reduction
> in the overhead of managing a pool is typically a net positive.  Now, I
> have no idea what percentage is actually typical, or under what
> conditions and pool sizes it may vary, but that is a calculation that
> someone deciding between managed and unmanaged pools would need to
> make.

Sure.

The big question is whether it is suited for general-purpose use, or
whether having 100 connections open when only 10 are necessary at a
given time causes any unnecessary contention/spinlock
issues/performance overhead/whatever...

--
Guillaume


Re: Connection pooling - Number of connections

From
Tom Lane
Date:
Guillaume Smet <guillaume.smet@gmail.com> writes:
> On Fri, Mar 21, 2014 at 4:49 PM, David Johnston <polobo@yahoo.com> wrote:
>> Consider this train of thought: no matter how large the pool size, if
>> you are constantly keeping, say, 90% of the connections actively
>> working, then having, on average, 10% of the connections sitting idle
>> is probably not going to be noticeable on the server, and the reduction
>> in the overhead of managing a pool is typically a net positive.  Now, I
>> have no idea what percentage is actually typical, or under what
>> conditions and pool sizes it may vary, but that is a calculation that
>> someone deciding between managed and unmanaged pools would need to
>> make.

> Sure.

> The big question is whether it is suited for general-purpose use, or
> whether having 100 connections open when only 10 are necessary at a
> given time causes any unnecessary contention/spinlock
> issues/performance overhead/whatever...

It will cost you, in ProcArray scans for example.  But lots-of-idle-
connections is exactly what a pooler is supposed to prevent.  If you have
a server that can handle say 10 active queries, you should have a pool
size of 10, not 100.  (If you have a server that can actually handle
100 active queries, I'd like to have your IT budget.)

The proposed design sounds fairly reasonable to me, as long as users are
clear on how to set the pool size --- and in particular that bigger is
not better.  Clueless users could definitely shoot themselves in the
foot, though.

            regards, tom lane


Re: Connection pooling - Number of connections

From
Sethu Prasad
Date:
Growing from minPoolSize toward maxPoolSize means creating connections
at the crucial moment when the client application desperately needs to
complete an important query/transaction (its primary responsibility)
and cannot hold on to the data it has collected.

So connection creation is the costliest of all the pool management
tasks, and keeping connections ready in advance is the best option.

A poolSize parameter is very good in the sense that the application
owner knows the optimal number to set, after having analysed the
application's performance against the history of previous settings and
the improvements each one brought.  Server sizing always shows up in
this sort of analysis.




On Fri, Mar 21, 2014 at 9:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Guillaume Smet <guillaume.smet@gmail.com> writes:
> On Fri, Mar 21, 2014 at 4:49 PM, David Johnston <polobo@yahoo.com> wrote:
>> Consider this train of thought: no matter how large the pool size, if
>> you are constantly keeping, say, 90% of the connections actively
>> working, then having, on average, 10% of the connections sitting idle
>> is probably not going to be noticeable on the server, and the reduction
>> in the overhead of managing a pool is typically a net positive.  Now, I
>> have no idea what percentage is actually typical, or under what
>> conditions and pool sizes it may vary, but that is a calculation that
>> someone deciding between managed and unmanaged pools would need to
>> make.

> Sure.

> The big question is whether it is suited for general-purpose use, or
> whether having 100 connections open when only 10 are necessary at a
> given time causes any unnecessary contention/spinlock
> issues/performance overhead/whatever...

It will cost you, in ProcArray scans for example.  But lots-of-idle-
connections is exactly what a pooler is supposed to prevent.  If you have
a server that can handle say 10 active queries, you should have a pool
size of 10, not 100.  (If you have a server that can actually handle
100 active queries, I'd like to have your IT budget.)

The proposed design sounds fairly reasonable to me, as long as users are
clear on how to set the pool size --- and in particular that bigger is
not better.  Clueless users could definitely shoot themselves in the
foot, though.

                        regards, tom lane



Re: Connection pooling - Number of connections

From
Guillaume Smet
Date:
Hi Tom,

On Fri, Mar 21, 2014 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It will cost you, in ProcArray scans for example.  But lots-of-idle-
> connections is exactly what a pooler is supposed to prevent.  If you have
> a server that can handle say 10 active queries, you should have a pool
> size of 10, not 100.  (If you have a server that can actually handle
> 100 active queries, I'd like to have your IT budget.)
>
> The proposed design sounds fairly reasonable to me, as long as users are
> clear on how to set the pool size --- and in particular that bigger is
> not better.  Clueless users could definitely shoot themselves in the
> foot, though.

Yeah, well.

My understanding of what happens in the field is that people usually
set the pool size limit quite high because they don't want to
experience connection starvation even if there is a temporary slowdown
of their application/database.

Is the overhead of having 100 connections open noticeable, or is it a
case of better not to have them, but not so bad if you do?

Thanks.

--
Guillaume


Re: Connection pooling - Number of connections

From
Guillaume Smet
Date:
Hi Sethu,

On Fri, Mar 21, 2014 at 6:51 PM, Sethu Prasad <sethuprasad.in@gmail.com> wrote:
> So here the connection creation action is the costliest among all the other
> management tasks. so keeping the connections ready is the best option.

That's why you often have a minIdle parameter, which allows the pool
to create idle connections in advance.

> poolSize parameter is very good in the sense when the application owner know
> what is the optimal number to put, after having application performance
> analysed with the history of previous settings and the improvements made on
> it. server sizing always shows up in this sort of analysis.

That supposes you actually do this work. In my experience, most of the
"not so demanding" apps are put into production without this sort of
detailed analysis.

You do it for your critical high throughput applications, not for the others.

That said, interesting discussion. Not exactly what I expected.

--
Guillaume


Re: Connection pooling - Number of connections

From
David Johnston
Date:
Sethu Prasad wrote
> Growing from minPoolSize toward maxPoolSize means creating connections
> at the crucial moment when the client application desperately needs to
> complete an important query/transaction (its primary responsibility)
> and cannot hold on to the data it has collected.

One query is slowed down a little in the unusual situation where not enough
pooled connections are available.  To fix that you want to slow down the
entire server all of the time?  Really?  And even if this is sometimes
the best option, your assertion is unqualified, so do you really think
this is best for everyone, always?

I think it is good to give developers options, but if your situation
is 10 needed out of 100 open, then a fixed 100-connection pool is
probably not the best configuration.

The question I'd ask is: if you are developing a new driver, what
problem and use-case are you trying to accommodate?  For the general
case a resizing pool is probably the best bet.  It will usually
stabilize at typical volume, so that an optimum number of connections
is maintained while still allowing for some expansion in times of
excess demand.  A fixed-size pool is something an experienced user
would decide they need based upon their usage patterns and the need to
eke out every last bit of performance in extreme situations, while
trading away only a little bit of performance when the connections are
not maxed out.

David J.






Re: Connection pooling - Number of connections

From
Claudio Freire
Date:
On Fri, Mar 21, 2014 at 3:36 PM, Guillaume Smet
<guillaume.smet@gmail.com> wrote:
> On Fri, Mar 21, 2014 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It will cost you, in ProcArray scans for example.  But lots-of-idle-
>> connections is exactly what a pooler is supposed to prevent.  If you have
>> a server that can handle say 10 active queries, you should have a pool
>> size of 10, not 100.  (If you have a server that can actually handle
>> 100 active queries, I'd like to have your IT budget.)
>>
>> The proposed design sounds fairly reasonable to me, as long as users are
>> clear on how to set the pool size --- and in particular that bigger is
>> not better.  Clueless users could definitely shoot themselves in the
>> foot, though.
>
> Yeah, well.
>
> My understanding of what happens in the field is that people usually
> set the pool size limit quite high because they don't want to
> experience connection starvation even if there is a temporary slowdown
> of their application/database.

My experience is that small transaction-mode connection pools used to
serve very quick queries can sometimes fail to fully use the hardware
if the connections aren't set in autocommit mode on the client side,
because the network round trips hold onto the server slot for a
sizable portion of the transaction lifecycle.

So, my recommendation: use protocol-level autocommit for read-only
queries and (cores + spindles) workers - that will use your hardware
fully.
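
In JDBC terms, something like this (a sketch only: setAutoCommit(true)
plus setReadOnly(true) is the closest client-side approximation of
protocol-level autocommit, and the spindle count is an assumption you
would measure on your own hardware):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.DataSource;

public class ReadOnlyWorkerSketch {
    static final int SPINDLES = 2; // assumption: count your own disks
    static final int POOL_SIZE =
            Runtime.getRuntime().availableProcessors() + SPINDLES;

    static void runReadOnly(DataSource ds, String sql) throws Exception {
        try (Connection conn = ds.getConnection()) {
            conn.setAutoCommit(true); // each statement commits on its own,
            conn.setReadOnly(true);   // so no BEGIN/COMMIT round trips hold
                                      // the server slot
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    // consume rows
                }
            }
        }
    }
}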

On Fri, Mar 21, 2014 at 3:41 PM, David Johnston <polobo@yahoo.com> wrote:
>> Growing from minPoolSize toward maxPoolSize means creating connections
>> at the crucial moment when the client application desperately needs to
>> complete an important query/transaction (its primary responsibility)
>> and cannot hold on to the data it has collected.
>
> One query is slowed down a little in the unusual situation where not enough
> pooled connections are available.  To fix that you want to slow down the
> entire server all of the time?  Really?  And even if this is sometimes the
> best option, your assertion is unqualified, so do you really think this is
> best for everyone, always?

Casting deadlocks aside for a moment, I don't think a variable-size
connection pool makes any sense.

The only reason for connection starvation with a properly sized pool
is hardware overload.  When you have hardware overload, you really
don't want to throw more load at it; you want to let it cool down.

The solution, if you have many heavy, OLAP-style queries that block
the rest, is to have two pools, and to size both so that you don't
overload the server (or at least overload it in a controlled way).
Send the OLAP queries to one pool and the OLTP queries to the other,
and you guarantee a smooth flow, even if you cannot guarantee both
100% hardware utilization and maximum throughput.
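
A sketch of that split with two HikariCP data sources (names, sizes
and timeouts are illustrative assumptions, not recommendations):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class SplitPoolsSketch {
    static HikariDataSource pool(String name, int size, long timeoutMs) {
        HikariConfig c = new HikariConfig();
        c.setJdbcUrl("jdbc:postgresql://localhost:5432/app"); // placeholder
        c.setPoolName(name);
        c.setMaximumPoolSize(size);
        c.setConnectionTimeout(timeoutMs);
        return new HikariDataSource(c);
    }

    // Sized together so the two pools cannot overload the server: many
    // slots for quick OLTP work, a few for heavy OLAP queries.
    static final HikariDataSource OLTP = pool("oltp", 10, 5_000);
    static final HikariDataSource OLAP = pool("olap", 2, 60_000);
}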

Now, if we consider deadlocks, you might have connection starvation
because all of the connections are waiting for an operation that is
deadlocked in application code (the deadlock cannot occur at the DB
level if you use transaction mode, but you can have open transactions
waiting on a mutex that is itself waiting for a connection).  I've
experienced application-level deadlocks like these, and the solution
for me has always been to have a reasonable timeout and a reserve pool
of connections: when requests have been waiting for more than X
seconds (a threshold one considers abnormal given what you know about
your load characteristics), grow the pool to free up resources and try
to dislodge the deadlocked application threads.

Sometimes regular overload triggers the reserve pool too, so you
cannot be too generous with the number of connections you keep in
reserve.
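
In pseudo-Java, the reserve-pool idea looks roughly like this (entirely
illustrative; the SimplePool interface is hypothetical and stands in
for whatever resize hook a given pool implementation exposes):

import java.sql.Connection;
import java.util.concurrent.TimeUnit;

public class ReservePoolSketch {
    static final long NORMAL_WAIT_MS = 2_000; // the "X seconds" beyond which
                                              // a wait is considered abnormal
    static final int RESERVE = 5;             // keep this small, or ordinary
                                              // overload will trigger it too

    interface SimplePool { // hypothetical resizable-pool interface
        Connection tryAcquire(long timeout, TimeUnit unit) throws Exception;
        int size();
        int baseSize();
        void grow(int connections);
    }

    private final SimplePool pool;

    ReservePoolSketch(SimplePool pool) { this.pool = pool; }

    Connection acquire() throws Exception {
        Connection c = pool.tryAcquire(NORMAL_WAIT_MS, TimeUnit.MILLISECONDS);
        if (c != null) {
            return c;
        }
        // Abnormal wait: suspect an application-level deadlock, and grow
        // into the reserve to try to dislodge the stuck threads.
        if (pool.size() < pool.baseSize() + RESERVE) {
            pool.grow(1);
        }
        return pool.tryAcquire(NORMAL_WAIT_MS, TimeUnit.MILLISECONDS);
    }
}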


Re: Connection pooling - Number of connections

From
Brett Wooldridge
Date:
Hi, Brett Wooldridge here, one of the principals of HikariCP.  I thought I'd
wade into the conversation pool a little myself if you guys don't mind.

Speaking to David's point...
>> Reaching the maxPoolSize from the minPoolSize means creating the
>> connections at the crucial moment where the client application is in the
>> desperate need of completing an important query/transaction which the
>> primary responsibility since it cannot hold the data collected.

This was one of the reasons I was proposing the fixed pool design.  In
my experience, even in pools that maintain a minimum number of idle
connections, responding to spike demands is problematic.  If you have a
pool with, say, 30 max. connections and a goal of 10 minimum idle
connections, a sudden spike demand for 20 connections means the pool
can satisfy 10 instantly but is then left to [try to] establish 10 more
connections before the application's connectionTimeout (read:
acquisition timeout from the pool) is reached.  This in turn generates
a spike demand on the database, slowing down not only the connection
establishments themselves but also the completion of transactions that
might actually return connections to the pool.
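
In concrete terms, the scenario above looks like this (a sketch using
HikariCP's real property names; the numbers are just the ones from the
example):

import com.zaxxer.hikari.HikariConfig;

public class SpikeConfigSketch {
    static HikariConfig config() {
        HikariConfig c = new HikariConfig();
        c.setMaximumPoolSize(30);       // hard ceiling
        c.setMinimumIdle(10);           // idle-connection goal
        c.setConnectionTimeout(30_000); // pool acquisition timeout, in ms
        // A sudden demand for 20 connections: 10 are satisfied instantly
        // from the idle set, while the other 10 callers block and hope the
        // pool can establish new connections before connectionTimeout
        // expires - a burst of connection setup that lands on the database
        // at the worst possible moment.
        return c;
    }
}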

As I think Tom noted in a slide deck I read somewhere, there is a
"knee" in the performance curve beyond which additional connections
cause a drop in TPS.  While users think it is a good idea to have 10
idle connections but a maxPoolSize of 100, the reality is that they can
retire/reuse connections faster with a much smaller maxPoolSize.  And I
didn't see a pool of a few dozen connections actually impacting
performance much when half of them are idle and half are executing
transactions (i.e. the idle ones don't impact overall performance
much).

Finally, one of my contentions was: either your database server has
resources or it doesn't.  Either it has enough memory and processing
power for N connections or it doesn't.  If the pool is set below, near,
or at that capacity, what is the purpose of releasing connections?
Yes, it frees up memory, but that memory is not really available for
other use, given that at any instant the maximum capacity of the pool
may be demanded.  Releasing resources only to try to reallocate them
during a demand peak seems counter-productive.

-Brett



