Thread: Connection pooling - Number of connections

Connection pooling - Number of connections

From
Brett Wooldridge
Date:
Hi, Brett Wooldridge here, one of the principals of HikariCP.  I thought I'd wade into the conversation pool a little myself if you guys don't mind. 

Speaking to David's point... 
>> Reaching the maxPoolSize from the minPoolSize means creating the 
>> connections at the crucial moment where the client application is in the 
>> desperate need of completing an important query/transaction which the 
>> primary responsibility since it cannot hold the data collected. 

This was one of the reasons I was proposing the fixed pool design.  In my experience, even in pools that maintain a minimum number of idle connections, responding to spike demands is problematic.  If you have a pool with say 30 max. connections, and a 10 minimum idle connection goal, a sudden spike demand for 20 connections means the pool can satisfy 10 instantly but then is left to [try to] establish 10 connections before the application's connectionTimeout (read acquisition timeout from the pool) is reached.  This in turn generates a spike demand on the database slowing down not only the connection establishments themselves but also slowing down the completion of transactions that might actually return connections to the pool. 

As I think Tom noted is a slidestack I read somewhere, there is a "knee" in the performance curve beyond which additional connections cause a drop in TPS.  While users often think it is a good idea to have maxPoolSize of 100, the reality is they can retire/reuse connections faster with a much smaller pool.  I didn't see a pool of a 2 or 3 dozen connections actually impacting performance much when half of them are idle and half are executing transactions (ie. the idle ones don't impact the overall performance much). 

Finally, one of my contentions was, either your database server has resources or it doesn't.  Either it has enough memory and processing power for N connections or it doesn't.  If the pool is set below, near, or at that capacity what is the purpose of releasing connections in that case?  Yes, it frees up memory, but that memory is not really available for other use given that at any instant the maximum capacity of the pool may be demanded. 
Instead releasing resources only to try to reallocate them during a demand peak seems counter-productive. 

I'd appreciate any shared thoughts on my presuppositions.

-Brett 

Re: Connection pooling - Number of connections

From
Gavin Flower
Date:
On 25/03/14 02:27, Brett Wooldridge wrote:
Hi, Brett Wooldridge here, one of the principals of HikariCP.  I thought I'd wade into the conversation pool a little myself if you guys don't mind. 

Speaking to David's point... 
>> Reaching the maxPoolSize from the minPoolSize means creating the 
>> connections at the crucial moment where the client application is in the 
>> desperate need of completing an important query/transaction which the 
>> primary responsibility since it cannot hold the data collected. 

This was one of the reasons I was proposing the fixed pool design.  In my experience, even in pools that maintain a minimum number of idle connections, responding to spike demands is problematic.  If you have a pool with say 30 max. connections, and a 10 minimum idle connection goal, a sudden spike demand for 20 connections means the pool can satisfy 10 instantly but then is left to [try to] establish 10 connections before the application's connectionTimeout (read acquisition timeout from the pool) is reached.  This in turn generates a spike demand on the database slowing down not only the connection establishments themselves but also slowing down the completion of transactions that might actually return connections to the pool. 

As I think Tom noted is a slidestack I read somewhere, there is a "knee" in the performance curve beyond which additional connections cause a drop in TPS.  While users often think it is a good idea to have maxPoolSize of 100, the reality is they can retire/reuse connections faster with a much smaller pool.  I didn't see a pool of a 2 or 3 dozen connections actually impacting performance much when half of them are idle and half are executing transactions (ie. the idle ones don't impact the overall performance much). 

Finally, one of my contentions was, either your database server has resources or it doesn't.  Either it has enough memory and processing power for N connections or it doesn't.  If the pool is set below, near, or at that capacity what is the purpose of releasing connections in that case?  Yes, it frees up memory, but that memory is not really available for other use given that at any instant the maximum capacity of the pool may be demanded. 
Instead releasing resources only to try to reallocate them during a demand peak seems counter-productive. 

I'd appreciate any shared thoughts on my presuppositions.

-Brett 

Surely no code changes are required, as one can simply set the min and max pool sizes to be the same?


Cheers,
Gavin

Re: Connection pooling - Number of connections

From
Brett Wooldridge
Date:
On Tue, Mar 25, 2014 at 5:24 AM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
Surely no code changes are required, as one can simply set the min and max pool sizes to be the same?
Cheers,
Gavin

To be sure it can be implemented that way, but its a question of design targets.  For example, if a pool is allowed to grow and shrink, the design might encompass a pool of threads that try to maintain the configured-minimum idle connections to respond to spike demands.  And there is the additional setting in the pool for users to [typically] misconfigure.

However, if the pool is fixed size, and attrition from the pool is only by idle timeout (typically measured in minutes), the design does not need to account for spike demand.  Likely connections that have dropped out can either be restored on-demand rather than something running constantly in the background trying to maintain and idle level.

One of the attributes of HikariCP is a minimalistic set of configuration options with sane defaults, and minimalistic code.  There are many competitor pools, offering dozens of settings ranging form banal to unsafe.  HikariCP initially even offered some of these options, but one-ny-one they're getting the ax.  More and more we're trying to look at what is the true core functionality that user's need -- eliminate what is unnecessary and easily misconfigured.

Thus a debate started over in our group, with some similar views as expressed here (on both sides).  Guillaume Smet was concerned about the impact of idle connections on active ones in PostgreSQL (in a fixed pool scenario) and wanted to ask some of the experts over here.

Re: Connection pooling - Number of connections

From
Gavin Flower
Date:
On 25/03/14 13:23, Brett Wooldridge wrote:
On Tue, Mar 25, 2014 at 5:24 AM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
Surely no code changes are required, as one can simply set the min and max pool sizes to be the same?
Cheers,
Gavin

To be sure it can be implemented that way, but its a question of design targets.  For example, if a pool is allowed to grow and shrink, the design might encompass a pool of threads that try to maintain the configured-minimum idle connections to respond to spike demands.  And there is the additional setting in the pool for users to [typically] misconfigure.

However, if the pool is fixed size, and attrition from the pool is only by idle timeout (typically measured in minutes), the design does not need to account for spike demand.  Likely connections that have dropped out can either be restored on-demand rather than something running constantly in the background trying to maintain and idle level.

One of the attributes of HikariCP is a minimalistic set of configuration options with sane defaults, and minimalistic code.  There are many competitor pools, offering dozens of settings ranging form banal to unsafe.  HikariCP initially even offered some of these options, but one-ny-one they're getting the ax.  More and more we're trying to look at what is the true core functionality that user's need -- eliminate what is unnecessary and easily misconfigured.

Thus a debate started over in our group, with some similar views as expressed here (on both sides).  Guillaume Smet was concerned about the impact of idle connections on active ones in PostgreSQL (in a fixed pool scenario) and wanted to ask some of the experts over here.

Would it be a valid option to switch in simpler code when min = max and both could be set to the same default?  This would allow more efficient code to be run for a fixed pool size and allow a sane default, while preserving the option to have a range, though obviously not as simple as only allowing a fixed pool size in terms of code complexity.

Re: Connection pooling - Number of connections

From
Brett Wooldridge
Date:
Sure.  It's all just code.  It's not particularly a question of efficiency, I'm sure it could be made equally efficient.  But "simpler" code-wise would be not having two implementations, or not having one that is designed to try to keep up with spike demands.  The question for this group was really around PostgreSQL performance regarding the impact of idle connections.  If there is a pool of 20 connections, and half of them are idle, what is the impact performance-wise of the idle connections on the active connections?


On Tue, Mar 25, 2014 at 10:42 AM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
On 25/03/14 13:23, Brett Wooldridge wrote:
On Tue, Mar 25, 2014 at 5:24 AM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
Surely no code changes are required, as one can simply set the min and max pool sizes to be the same?
Cheers,
Gavin

To be sure it can be implemented that way, but its a question of design targets.  For example, if a pool is allowed to grow and shrink, the design might encompass a pool of threads that try to maintain the configured-minimum idle connections to respond to spike demands.  And there is the additional setting in the pool for users to [typically] misconfigure.

However, if the pool is fixed size, and attrition from the pool is only by idle timeout (typically measured in minutes), the design does not need to account for spike demand.  Likely connections that have dropped out can either be restored on-demand rather than something running constantly in the background trying to maintain and idle level.

One of the attributes of HikariCP is a minimalistic set of configuration options with sane defaults, and minimalistic code.  There are many competitor pools, offering dozens of settings ranging form banal to unsafe.  HikariCP initially even offered some of these options, but one-ny-one they're getting the ax.  More and more we're trying to look at what is the true core functionality that user's need -- eliminate what is unnecessary and easily misconfigured.

Thus a debate started over in our group, with some similar views as expressed here (on both sides).  Guillaume Smet was concerned about the impact of idle connections on active ones in PostgreSQL (in a fixed pool scenario) and wanted to ask some of the experts over here.

Would it be a valid option to switch in simpler code when min = max and both could be set to the same default?  This would allow more efficient code to be run for a fixed pool size and allow a sane default, while preserving the option to have a range, though obviously not as simple as only allowing a fixed pool size in terms of code complexity.


Re: Connection pooling - Number of connections

From
Josh Berkus
Date:
On 03/24/2014 06:27 AM, Brett Wooldridge wrote:
> This was one of the reasons I was proposing the fixed pool design.  In my
> experience, even in pools that maintain a minimum number of idle
> connections, responding to spike demands is problematic.  If you have a
> pool with say 30 max. connections, and a 10 minimum idle connection goal, a
> sudden spike demand for 20 connections means the pool can satisfy 10
> instantly but then is left to [try to] establish 10 connections before the
> application's connectionTimeout (read acquisition timeout from the pool) is
> reached.  This in turn generates a spike demand on the database slowing
> down not only the connection establishments themselves but also slowing
> down the completion of transactions that might actually return connections
> to the pool.

So what makes sense really depends on what your actual connection
pattern is.  Idle connections aren't free; aside from PostgreSQL
lock-checking overhead, they hold on to any virtual memory allocated to
them when they were working.  In the aggregate, this can add up to quite
a bit of memory, which can then cause the OS to decide not to cache some
data you could really use.

Now, if your peak is 100 connections and your median is 50, this doesn't
signify.  But I know more than a few workloads where the peak is 1000
and the median is 25, and in that case you want to drop the idle
connections gradually.  The key is to keep enough of a buffer of ready
connections to deal with the next peak when it comes in.

That also means that even if the pool is a fixed size, you want to
rotate in and out the actual sessions, so that they don't hang onto
maximum virtual memory indefinitely.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Connection pooling - Number of connections

From
Jeff Janes
Date:
On Mon, Mar 24, 2014 at 6:27 AM, Brett Wooldridge <brett.wooldridge@gmail.com> wrote:
Hi, Brett Wooldridge here, one of the principals of HikariCP.  I thought I'd wade into the conversation pool a little myself if you guys don't mind. 

Speaking to David's point... 
>> Reaching the maxPoolSize from the minPoolSize means creating the 
>> connections at the crucial moment where the client application is in the 
>> desperate need of completing an important query/transaction which the 
>> primary responsibility since it cannot hold the data collected. 

This was one of the reasons I was proposing the fixed pool design.  In my experience, even in pools that maintain a minimum number of idle connections, responding to spike demands is problematic.  If you have a pool with say 30 max. connections, and a 10 minimum idle connection goal, a sudden spike demand for 20 connections means the pool can satisfy 10 instantly but then is left to [try to] establish 10 connections before the application's connectionTimeout (read acquisition timeout from the pool) is reached.  This in turn generates a spike demand on the database slowing down not only the connection establishments themselves but also slowing down the completion of transactions that might actually return connections to the pool. 


Do you have publishable test scripts to demonstrate this?  It would be nice for people to be able to try it out for themselves, on their own hardware, to see what it does.  I have seen some database products for which I would not doubt this effect is real, having seen prodigiously long connection set up times.  But I'm skeptical that that is a meaningful problem for PostgreSQL, at least not with the md5 authentication method.
 
As I think Tom noted is a slidestack I read somewhere, there is a "knee" in the performance curve beyond which additional connections cause a drop in TPS.  While users often think it is a good idea to have maxPoolSize of 100, the reality is they can retire/reuse connections faster with a much smaller pool.  I didn't see a pool of a 2 or 3 dozen connections actually impacting performance much when half of them are idle and half are executing transactions (ie. the idle ones don't impact the overall performance much). 

I think the knee applies mostly to active connections.  I've seen no indication of completely idle connections causing observable problems in recent releases, until the number of them gets absurd.

And the location of the knee for the number of active connections is going to depend greatly on the hardware and the work load.
 
Finally, one of my contentions was, either your database server has resources or it doesn't.  Either it has enough memory and processing power for N connections or it doesn't.  If the pool is set below, near, or at that capacity what is the purpose of releasing connections in that case?  Yes, it frees up memory, but that memory is not really available for other use given that at any instant the maximum capacity of the pool may be demanded. 
Instead releasing resources only to try to reallocate them during a demand peak seems counter-productive. 

I pretty much agree with you on that.  Most of the arguments I see for getting rid of idle connections really seem to be arguments for lowering the maximum number.

That said, I don't like it when people take away my options because they think I might not be able to set them correctly. Just say the default value for the min is the same as the max, and most people don't need to change that. If you don't want to implement a feature, that is one thing, but to take out a feature that already exists just because you think I'm not quite clever enough to use it is quite another thing.

Also, often people don't have a realistic work-load generator with which to test the maximum, or a budget to run such tests.  They will have no idea where to set it.  A plausible thing to do in that case is to just set it rather high and hope the actual usage never get high enough to test whether the chosen value was correct.  That would be more problematic if the pool size is static.

Cheers,

Jeff

Re: Connection pooling - Number of connections

From
Brett Wooldridge
Date:
Sent from my iPhone
> On Mar 27, 2014, at 9:35 AM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> On 03/24/2014 06:27 AM, Brett Wooldridge wrote:
>> This was one of the reasons I was proposing the fixed pool design.  In my
>> experience, even in pools that maintain a minimum number of idle
>> connections, responding to spike demands is problematic.  If you have a
>> pool with say 30 max. connections, and a 10 minimum idle connection goal, a
>> sudden spike demand for 20 connections means the pool can satisfy 10
>> instantly but then is left to [try to] establish 10 connections before the
>> application's connectionTimeout (read acquisition timeout from the pool) is
>> reached.  This in turn generates a spike demand on the database slowing
>> down not only the connection establishments themselves but also slowing
>> down the completion of transactions that might actually return connections
>> to the pool.
>
> Now, if your peak is 100 connections and your median is 50, this doesn't
> signify.  But I know more than a few workloads where the peak is 1000
> and the median is 25, and in that case you want to drop the idle
> connections gradually.

In the end we've gone with a maxPoolSize + minIdle model where the default is that they are equal (fixed pool).

Though I won't dispute that such workloads (1000 active connections) exist, in that someone created it, I would love to
heartheir justification. Unless they have >128 CPU cores and solid state storage they are basically spinning their
wheels.

> That also means that even if the pool is a fixed size, you want to
> rotate in and out the actual sessions, so that they don't hang onto
> maximum virtual memory indefinitely.

We do this, there is a maxLifeTime setting to rotate out connections.

-Brett