Thread: [psycopg] Caching connection pool?

[psycopg] Caching connection pool?

From: Israel Brewster
I've been using the psycopg2.pool.ThreadedConnectionPool in my projects for years, and all that time I was under the (false) impression that the logic of the pool went something like this:

1. Create a "pool" of connections, with an initial number of connections equal to the "minconn" argument
2. When getconn is called, see if there is an available connection. If so, return it. If not, open a new connection and return that (up to "maxconn" total connections)
3. When putconn is called, return the connection to the pool for re-use, but do not close it (unless the close argument is specified as True; the documentation says the default is False)
4. On the next request to getconn, this connection is now available, so no new connection will be made
5. After some time, old connections would be closed and purged from the pool to prevent large numbers of no-longer-needed connections from lying around

I have since been corrected about this impression, and now know that the pool simply keeps minconn connections open for the lifetime of the object, and if additional connections are needed, it opens them (up to maxconn), then immediately closes them again upon being returned to the pool. So my assumptions went wrong basically from point 3 onward. This means that for optimal usage of a pool object, some tweaking (probably ongoing as use changes) of minconn is needed, and if usage varies widely, there will be periods of having to open/close many connections and/or having many connections sitting unused, depending on how minconn is tuned.
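
Concretely, my understanding now is that the stock pool behaves something like this (the DSN here is just a placeholder):

    import psycopg2.pool

    pool = psycopg2.pool.ThreadedConnectionPool(2, 10, "dbname=mydb user=me")

    conns = [pool.getconn() for _ in range(5)]   # five connections are now open
    for c in conns:
        pool.putconn(c)   # once minconn (2) are back in the pool, the other 3 are closed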

To resolve this, I went ahead and wrote up an AbstractConnectionPool subclass that implements the caching logic I laid out above (https://github.com/psycopg/psycopg2/pull/565). At the moment I would consider it to be a fairly basic implementation - pruning of expired connections occurs only on a call to putconn (as opposed to running periodically), and connections time out at a (user-specified) period of time from when they were created, rather than last used, for example. It's arguable that both those elements (and perhaps others) could be improved on, but in my initial testing it appears to work. What do you guys think? Is this worth my time, or am I re-creating the wheel or engaging in premature optimization or something? Is the logic as I laid it out above good, or would there be some improvements that could be suggested for it? One thing I was thinking that could be helpful would be that if it saw the pool of available connections getting low, it could open one or more additional connections "in the background" so they are ready when needed. But maybe that's going a bit overboard?
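
In case it helps the discussion, the core of the logic is roughly this - a simplified, standalone sketch rather than the actual code in the PR (which subclasses AbstractConnectionPool); the class and parameter names here are just illustrative:

    import time
    import threading
    import psycopg2

    class CachingPool:
        """Sketch: connections beyond minconn are kept and reused, and are
        only closed once they are older than max_age seconds."""

        def __init__(self, minconn, maxconn, max_age, *args, **kwargs):
            self.minconn, self.maxconn, self.max_age = minconn, maxconn, max_age
            self._args, self._kwargs = args, kwargs
            self._lock = threading.Lock()
            self._idle = []    # (created_at, connection) pairs ready for reuse
            self._used = {}    # connection -> created_at, currently checked out
            for _ in range(minconn):
                self._idle.append((time.monotonic(), self._connect()))

        def _connect(self):
            return psycopg2.connect(*self._args, **self._kwargs)

        def getconn(self):
            with self._lock:
                if self._idle:
                    created_at, conn = self._idle.pop()
                elif len(self._used) < self.maxconn:
                    created_at, conn = time.monotonic(), self._connect()
                else:
                    raise RuntimeError("connection pool exhausted")
                self._used[conn] = created_at
                return conn

        def putconn(self, conn, close=False):
            with self._lock:
                created_at = self._used.pop(conn)
                if close:
                    conn.close()
                else:
                    conn.rollback()  # don't hand an open transaction back to the pool
                    self._idle.append((created_at, conn))
                self._prune()

        def _prune(self):
            # close idle connections past their expiration, but keep minconn around
            cutoff = time.monotonic() - self.max_age
            keep = []
            for created_at, conn in self._idle:
                if created_at >= cutoff or len(keep) + len(self._used) < self.minconn:
                    keep.append((created_at, conn))
                else:
                    conn.close()
            self._idle = keep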

Thanks!

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------




Re: [psycopg] Caching connection pool?

From: Daniele Varrazzo
On Thu, Jun 8, 2017 at 7:28 PM, Israel Brewster <israel@ravnalaska.net> wrote:

> To resolve this, I went ahead and wrote up an AbstractConnectionPool subclass
> that implements the caching logic I laid out above
> (https://github.com/psycopg/psycopg2/pull/565).

Thank you for your work, Israel

For the list: I've not been a heavy pool user for a long time, and the
previous logic always felt a bit weak (destroy immediately a
connection if > minconn?) So I welcome improvement on this object. I'd
be happy to know from other people what they expect as a reasonable
default behaviour for the connection pool.

> At the moment I would
> consider it to be a fairly basic implementation - pruning of expired
> connections occurs only on a call to putconn (as opposed to running
> periodically), and connections time out at a (user-specified) period of time
> from when they were created, rather than last used, for example.

For me it is desirable that the pool is "passive": a periodic run
would require a separate thread, which someone would then want to be
an eventlet green thread and someone else a gevent green thread... In
any case it wouldn't behave in a deterministic way.

Probably using the date of last usage would be better than the date of
creation, otherwise you would just delete all the connections every 10
minutes, or something like that.
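
As a toy illustration of the difference (none of these names come from the PR):

    import time

    def putconn(idle, expiry_stamp, conn):
        # expiry_stamp[conn] is the time used to decide when conn expires.
        # Refreshing it here gives last-use-based expiry; setting it only when
        # the connection is created (and never touching it here) would expire
        # even heavily used connections after the timeout.
        expiry_stamp[conn] = time.monotonic()
        idle.append(conn)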

> It's
> arguable that both those elements (and perhaps others) could be improved on,
> but in my initial testing it appears to work. What do you guys think? Is
> this worth my time, or am I re-creating the wheel or engaging in premature
> optimization or something? Is the logic as I laid it out above good, or
> would there be some improvements that could be suggested for it? One thing I
> was thinking that could be helpful would be that if it saw the pool of
> available connections getting low, it could open one or more additional
> connections "in the background" so they are ready when needed. But maybe
> that's going a bit overboard?

I think the pool provided by psycopg should still be a "mechanism"
rather than a "policy". It would be great to improve on what we
currently have, but not to make the pool too "intelligent": a
super-smart pool would probably be tied and tweaked to the application
using it, which may decide to use e.g. background threads to maintain
the pool at a certain level, may periodically check the state of the
connections, etc. But the pool is a weird object, it's a bit on the
fence between being mechanism and being policy, so I welcome a
discussion about what it is reasonable to expect from this object out
of the psycopg box, what may instead belong to an extension module
from PyPI, and what is too closely tied to an application/environment
to be generically useful.

Cheers,

-- Daniele


Re: [psycopg] Caching connection pool?

From: Israel Brewster
On Jun 8, 2017, at 11:02 AM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:

> On Thu, Jun 8, 2017 at 7:28 PM, Israel Brewster <israel@ravnalaska.net> wrote:
>
>> To resolve this, I went ahead and wrote up an AbstractConnectionPool subclass
>> that implements the caching logic I laid out above
>> (https://github.com/psycopg/psycopg2/pull/565).
>
> Thank you for your work, Israel
>
> For the list: I've not been a heavy pool user for a long time, and the
> previous logic always felt a bit weak (destroy immediately a
> connection if > minconn?) So I welcome improvement on this object. I'd
> be happy to know from other people what they expect as a reasonable
> default behaviour for the connection pool.
>
>> At the moment I would
>> consider it to be a fairly basic implementation - pruning of expired
>> connections occurs only on a call to putconn (as opposed to running
>> periodically), and connections time out at a (user-specified) period of time
>> from when they were created, rather than last used, for example.
>
> For me it is desirable that the pool is "passive": a periodic run
> would require a separate thread, which someone would then want to be
> an eventlet green thread and someone else a gevent green thread... In
> any case it wouldn't behave in a deterministic way.

Good point. I was thinking of the situation where you might have a "rush" of connections needed and then nothing, in which case all the connections opened during the rush would stay open indefinitely. But that's probably a corner case that's not worth worrying about - if someone is using a pool, chances are connections are being returned to it regularly, so old connections would get cleaned up regularly. If not, oh well.


> Probably using the date of last usage would be better than the date of
> creation, otherwise you would just delete all the connections every 10
> minutes, or something like that.

Right. Incidentally, I have seen some things that do pretty much exactly that: some web servers, for example, have a setting where a given process will only serve so many requests before it is killed and re-spawned. So a similar idea, only time-based rather than request-based. Agreed that date of last usage would probably be better here. Of course, that leads to the question of whether there should be some logic, such as "least recently used", applied when getting a connection, but it probably doesn't really matter.
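
For what it's worth, if the idle connections were kept in a list ordered by last use (oldest first), the two strategies would differ only in which end of the list getconn takes from - a toy fragment, not the PR code:

    # idle is a list of connections, least recently used first
    def getconn_lru(idle):
        return idle.pop(0)   # least recently used: spreads work across all connections

    def getconn_mru(idle):
        return idle.pop()    # most recently used: lets the rest sit idle and expire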


>> It's
>> arguable that both those elements (and perhaps others) could be improved on,
>> but in my initial testing it appears to work. What do you guys think? Is
>> this worth my time, or am I re-creating the wheel or engaging in premature
>> optimization or something? Is the logic as I laid it out above good, or
>> would there be some improvements that could be suggested for it? One thing I
>> was thinking that could be helpful would be that if it saw the pool of
>> available connections getting low, it could open one or more additional
>> connections "in the background" so they are ready when needed. But maybe
>> that's going a bit overboard?
>
> I think the pool provided by psycopg should still be a "mechanism"
> rather than a "policy". It would be great to improve on what we
> currently have, but not to make the pool too "intelligent": a
> super-smart pool would probably be tied and tweaked to the application
> using it, which may decide to use e.g. background threads to maintain
> the pool at a certain level, may periodically check the state of the
> connections, etc. But the pool is a weird object, it's a bit on the
> fence between being mechanism and being policy, so I welcome a
> discussion about what it is reasonable to expect from this object out
> of the psycopg box, what may instead belong to an extension module
> from PyPI, and what is too closely tied to an application/environment
> to be generically useful.

Exactly - I could probably come up with a bunch of "cool" stuff for the pool to do, like connecting to multiple servers or trying to dynamically predict the number of needed connections or something, but most of it probably wouldn't be appropriate for a "base" pool class. Discussion is good :)

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
