Re: Connection pooling - Number of connections - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Connection pooling - Number of connections
Date
Msg-id CAMkU=1wdmeQDvVSgzaBYCo-no0Ox7TqizyrOdjg5Co0ZvHRHAw@mail.gmail.com
Whole thread Raw
In response to Connection pooling - Number of connections  (Brett Wooldridge <brett.wooldridge@gmail.com>)
List pgsql-performance
On Mon, Mar 24, 2014 at 6:27 AM, Brett Wooldridge <brett.wooldridge@gmail.com> wrote:
Hi, Brett Wooldridge here, one of the principals of HikariCP.  I thought I'd wade into the conversation pool a little myself if you guys don't mind. 

Speaking to David's point... 
>> Reaching the maxPoolSize from the minPoolSize means creating the 
>> connections at the crucial moment where the client application is in the 
>> desperate need of completing an important query/transaction which the 
>> primary responsibility since it cannot hold the data collected. 

This was one of the reasons I was proposing the fixed pool design.  In my experience, even in pools that maintain a minimum number of idle connections, responding to spike demands is problematic.  If you have a pool with say 30 max. connections, and a 10 minimum idle connection goal, a sudden spike demand for 20 connections means the pool can satisfy 10 instantly but then is left to [try to] establish 10 connections before the application's connectionTimeout (read acquisition timeout from the pool) is reached.  This in turn generates a spike demand on the database slowing down not only the connection establishments themselves but also slowing down the completion of transactions that might actually return connections to the pool. 


Do you have publishable test scripts to demonstrate this?  It would be nice for people to be able to try it out for themselves, on their own hardware, to see what it does.  I have seen some database products for which I would not doubt this effect is real, having seen prodigiously long connection set up times.  But I'm skeptical that that is a meaningful problem for PostgreSQL, at least not with the md5 authentication method.
 
As I think Tom noted is a slidestack I read somewhere, there is a "knee" in the performance curve beyond which additional connections cause a drop in TPS.  While users often think it is a good idea to have maxPoolSize of 100, the reality is they can retire/reuse connections faster with a much smaller pool.  I didn't see a pool of a 2 or 3 dozen connections actually impacting performance much when half of them are idle and half are executing transactions (ie. the idle ones don't impact the overall performance much). 

I think the knee applies mostly to active connections.  I've seen no indication of completely idle connections causing observable problems in recent releases, until the number of them gets absurd.

And the location of the knee for the number of active connections is going to depend greatly on the hardware and the work load.
 
Finally, one of my contentions was, either your database server has resources or it doesn't.  Either it has enough memory and processing power for N connections or it doesn't.  If the pool is set below, near, or at that capacity what is the purpose of releasing connections in that case?  Yes, it frees up memory, but that memory is not really available for other use given that at any instant the maximum capacity of the pool may be demanded. 
Instead releasing resources only to try to reallocate them during a demand peak seems counter-productive. 

I pretty much agree with you on that.  Most of the arguments I see for getting rid of idle connections really seem to be arguments for lowering the maximum number.

That said, I don't like it when people take away my options because they think I might not be able to set them correctly. Just say the default value for the min is the same as the max, and most people don't need to change that. If you don't want to implement a feature, that is one thing, but to take out a feature that already exists just because you think I'm not quite clever enough to use it is quite another thing.

Also, often people don't have a realistic work-load generator with which to test the maximum, or a budget to run such tests.  They will have no idea where to set it.  A plausible thing to do in that case is to just set it rather high and hope the actual usage never get high enough to test whether the chosen value was correct.  That would be more problematic if the pool size is static.

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Dave Johansen
Date:
Subject: Partitions and prepared statements?
Next
From: Brett Wooldridge
Date:
Subject: Re: Connection pooling - Number of connections