Thread: Why would I want to use connection pooling middleware?

Why would I want to use connection pooling middleware?

From: Kirk Strauser
I have a PostgreSQL 8.3.5 server with max_connections = 400.  At this
moment, I have 223 open connections, including 64 from a bunch of
webserver processes and about 100 from desktop machines running a
particular application.  The rest are from various scheduled processes
and other assorted things.  Now, I know there are projects like
pgpool-II that can serve to pool connections to the server.  Why would
I want to do that, though?

I understand why pooling within a process itself is a good thing.
However, say I have two users running the same program on different
desktop machines.  At present, those applications connect with the
same username/password that's tied to the program and not the actual
user.  It seems like if Abby and Barb end up sharing the same
connection from the pool, and Abby runs some giant report query, then
Barb would get held back while she waits for it to finish.  Is that
true?  Even if not, what would be the advantage in the two of them
sharing a connection?

I'm just trying to wrap my head around this.  Thanks!
--
Kirk Strauser

Re: Why would I want to use connection pooling middleware?

From: Tom Lane
Kirk Strauser <kirk@strauser.com> writes:
> I have a PostgreSQL 8.3.5 server with max_connections = 400.  At this
> moment, I have 223 open connections, including 64 from a bunch of
> webserver processes and about 100 from desktop machines running a
> particular application.  The rest are from various scheduled processes
> and other assorted things.  Now, I know there are projects like
> pgpool-II that can serve to pool connections to the server.  Why
> would I want to do that, though?

Idle backends eat resources that would be better spent elsewhere.
(I'm assuming that the majority of those 223 backends aren't actually
doing anything at any one instant.)  As an example, any system catalog
update has to be broadcast to all live backends, and they all have to
dutifully search their catalog caches to flush stale entries.  That costs
the same whether the backend is being put to use or has been sitting
idle for minutes.
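
A quick way to check how much of that is idle at a given instant on
8.3, where idle sessions show current_query = '<IDLE>':

    SELECT count(*) FROM pg_stat_activity WHERE current_query = '<IDLE>';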

There's no percentage in trying to pool connections from applications
that are constantly doing something; but webserver sessions tend to
have lots of user "think time", as do similar interactive database
clients, so often they can be pooled profitably.

            regards, tom lane

Re: Why would I want to use connection pooling middleware?

From: Alvaro Herrera
Tom Lane wrote:
> Kirk Strauser <kirk@strauser.com> writes:
> > I have a PostgreSQL 8.3.5 server with max_connections = 400.  At this
> > moment, I have 223 open connections, including 64 from a bunch of
> > webserver processes and about 100 from desktop machines running a
> > particular application.  The rest are from various scheduled processes
> > and other assorted things.  Now, I know there are projects like
> > pgpool-II that can serve to pool connections to the server.  Why
> > would I want to do that, though?
>
> Idle backends eat resources that would be better spent elsewhere.
> (I'm assuming that the majority of those 223 backends aren't actually
> doing anything at any one instant.)  As an example, any system catalog
> update has to be broadcast to all live backends, and they all have to
> dutifully search their catalog caches to flush stale entries.  That costs
> the same whether the backend is being put to use or has been sitting
> idle for minutes.

Also, memory spent on per-process overhead is memory not available for
caching.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Why would I want to use connection pooling middleware?

From: "Christopher Browne"
On Thu, Jan 15, 2009 at 10:54 AM, Kirk Strauser <kirk@strauser.com> wrote:
> I have a PostgreSQL 8.3.5 server with max_connections = 400.  At this
> moment, I have 223 open connections, including 64 from a bunch of webserver
> processes and about 100 from desktop machines running a particular
> application.  The rest are from various scheduled processes and other
> assorted things.  Now, I know there are projects like pgpool-II that can
> serve to pool connections to the server.  Why would I want to do that,
> though?
>
> I understand why pooling within a process itself is a good thing.  However,
> say I have two users running the same program on different desktop machines.
>  At present, those applications connect with the same username/password
> that's tied to the program and not the actual user.  It seems like if Abby
> and Barb end up sharing the same connection from the pool, and Abby runs
> some giant report query, then Barb would get held back while she waits for
> it to finish.  Is that true?  Even if not, what would be the advantage in
> the two of them sharing a connection?

There tend to be three effects that provide benefits:

1.  Fewer connections tend to consume fewer resources on the DBMS server.

Each connection consumes memory, generates lock entries, and so on, so
having fewer connections means that the aggregate size of the
postmaster processes is likely to be smaller.

2.  Pooling connections should mean that you can use and re-use
connections, which should reduce the amount of work done building up
and tearing down connections.

Each PostgreSQL connection is handled by a separate OS process; if the
connection pool is passing the same connection from user to user, your
system is doing less work spawning backend processes, doing
authentication, and otherwise getting from fork() to "ready to handle
queries."  (There's a crude way to measure this; see the sketch at the
end of this message.)

3.  There is only so much *genuine* concurrency that you can actually
get out of your DB server, and there is only limited value to having
more backend processes than this "emergent quantity."

For instance, if you only have a single CPU and a single disk drive,
then your computer is only ever *truly* doing one thing at a time.
Trying to make such a server service 200 connections, each trying to
do work, means that this server will be doing a great deal of work
switching from process to process, doing the context switches.

That's an extreme point, of course, but it should be reasonably
intuitive to consider that...
- If you have 10 CPUs and a RAID array of 10 disk drives, then that
host can likely cope comfortably with doing ~10 things at once;
- Change those numbers to 20/20 and the intuition continues.

If some of your 200 connections are only intermittently used, then
with a connection pool of 20 "real" connections, the 200 users would
seldom notice delays due to sharing.  And the connection pool usage
would mean that the DB server would have way fewer processes kicking
around consuming memory.

You might well be better off spending the memory used by the extra 180
backends on shared cache :-).
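
As for the setup/teardown cost in point 2, here is a crude way to feel
it yourself (the hostnames and pooler port are placeholders, adjust to
taste):

    # direct: every run pays for fork() plus authentication
    $ time psql -h dbhost -U dbuser -d mydb -c 'SELECT 1' >/dev/null

    # through a pooler: repeat runs reuse an already-warm backend
    $ time psql -h poolhost -p 6432 -U dbuser -d mydb -c 'SELECT 1' >/dev/null

The absolute numbers don't matter much; the difference between the two
is roughly the overhead you pay on every fresh connection.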
--
http://linuxfinances.info/info/linuxdistributions.html
Joe E. Lewis  - "There's only one thing money won't buy, and that is poverty."

Re: Why would I want to use connection pooling middleware?

From: Bill Moran
Kirk Strauser <kirk@strauser.com> wrote:

[snip]

> I understand why pooling within a process itself is a good thing.
> However, say I have two users running the same program on different
> desktop machines.  At present, those applications connect with the
> same username/password that's tied to the program and not the actual
> user.  It seems like if Abby and Barb end up sharing the same
> connection from the pool, and Abby runs some giant report query, then
> Barb would get held back while she waits for it to finish.  Is that
> true?

I don't believe that's true.  My understanding of pgpool is that it will
reuse an existing connection if it's free, or open a new one if required.

--
Bill Moran
http://www.potentialtech.com

Re: Why would I want to use connection pooling middleware?

From: Kirk Strauser
On Jan 15, 2009, at 10:08 AM, Tom Lane wrote:

>  As an example, any system catalog update has to be broadcast to all
> live backends, and they all have to dutifully search their catalog
> caches to flush stale entries.  That costs the same whether the
> backend is being put to use or has been sitting idle for minutes.

I didn't realize that.  I wasn't sure what types of overheads were
involved and didn't think about those sorts of things.

> There's no percentage in trying to pool connections from applications
> that are constantly doing something; but webserver sessions tend to
> have lots of user "think time", as do similar interactive database
> clients, so often they can be pooled profitably.


That makes sense.  Along those lines, how do you actually enable
connection pooling in pgpool-II?  I've RTFM a few times but it doesn't
seem to have a flag like "enable_pooling".  Is "num_init_children"
effectively the same as a hypothetical "max_children"?  If I set it to
"1" and leave "max_pool" at "4", then clients queue up while one at a
time gets to connect.
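
For reference, here's the relevant bit of my pgpool.conf, with my
current understanding of the semantics in the comments (corrections
welcome):

    num_init_children = 1   # preforked pgpool children; caps concurrent clients
    max_pool = 4            # cached backend connections per pgpool child
    # worst case, pgpool may hold num_init_children * max_pool backends open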

Sorry, I know this isn't the pgpool-II mailing list.  :-)
--
Kirk Strauser

Re: Why would I want to use connection pooling middleware?

From: Guy Rouillier
Kirk Strauser wrote:

> I understand why pooling within a process itself is a good thing.
> However, say I have two users running the same program on different
> desktop machines.  At present, those applications connect with the same
> username/password that's tied to the program and not the actual user.
> It seems like if Abby and Barb end up sharing the same connection from
> the pool, and Abby runs some giant report query, then Barb would get
> held back while she waits for it to finish.  Is that true?  Even if not,
> what would be the advantage in the two of them sharing a connection?

Connections are pooled on the client end, not on the server end.  So,
you'd be able to pool connections on your web server, and should, for
reasons documented by others.  However, since Abby and Barb are using
different computers, you won't achieve anything by introducing pooling
into your desktop application.

--
Guy Rouillier

Re: Why would I want to use connection pooling middleware?

From: Sam Mason
On Thu, Jan 15, 2009 at 11:57:13AM -0500, Guy Rouillier wrote:
> Connections are pooled on the client end, not on the server end.  So,
> you'd be able to pool connections on your web server, and should, for
> reasons documented by others.  However, since Abby and Barb are using
> different computers, you won't achieve anything by introducing pooling
> into your desktop application.

Connection pooling can function anywhere; you could use it client side
(like the connection pooling built into, say, PHP) where what you say is
true.  You can also use something like pgpool on another box (maybe the
database server itself) and point all clients at it.
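
Concretely, nothing about a client has to change except the host and
port it dials; for example (names and the pooler port below are
placeholders):

    psql -h dbhost -p 5432 mydb     # straight to PostgreSQL
    psql -h poolhost -p 6432 mydb   # the same session, via the pooler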

--
  Sam  http://samason.me.uk/

Re: Why would I want to use connection pooling middleware?

From: Kirk Strauser
On Jan 15, 2009, at 10:20 AM, Bill Moran wrote:

> I don't believe that's true.  My understanding of pgpool is that it
> will reuse an existing connection if it's free, or open a new one if
> required.


Gah!  It just made it worse!

$ ps auxwww | grep pgpool | grep dbuser | wc -l
       30
$ ps auxwww | grep postgres: | grep dbuser | wc -l
       38

So not only is it not sharing connections among clients, but it's
keeping old ones open too.  This isn't really what I had in mind.

Also, many of the applications are launched in the morning and open a
connection, then get used all day, then closed at the end of the
afternoon.  I'm starting to wonder if perhaps pgpool-II isn't what I
was looking for.
--
Kirk Strauser

Re: Why would I want to use connection pooling middleware?

From: Steve Crawford
Kirk Strauser wrote:
> ...
>
> I understand why pooling within a process itself is a good thing.
> However, say I have two users running the same program on different
> desktop machines.  At present, those applications connect with the
> same username/password that's tied to the program and not the actual
> user.  It seems like if Abby and Barb end up sharing the same
> connection from the pool, and Abby runs some giant report query, then
> Barb would get held back while she waits for it to finish.  Is that
> true?  Even if not, what would be the advantage in the two of them
> sharing a connection?

Short answer: performance.

First you need the correct mental model. The "pool" in pooling is a
pre-established pool of connections to the database. When a client
connects to the pooler, the pooler decides, based on its configuration,
which database connection will receive each statement the client sends.
The pooler can also monitor the connection pool and decide when to
increase the pool size or release database connections.

Pgbouncer (part of Skype tools) has worked very well for me. It can be
configured to handle connections in a variety of ways. At one extreme, a
connection to the database is only used for the duration of a single
statement then that database connection is available and waiting for the
next statement. At the other extreme, a connection is assigned and tied
up for the entire duration that the client is connected to the pooler.
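
For illustration, the relevant pgbouncer.ini settings look roughly like
this (the database name and addresses are placeholders):

    [databases]
    mydb = host=127.0.0.1 port=5432 dbname=mydb

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    ; one of: session / transaction / statement
    pool_mode = statement
    ; server connections kept per user/database pair
    default_pool_size = 20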

In any case, Barb and Abby will not be sharing a database connection
*simultaneously* - if they are using their desktop app which is
connected to the db through the pooler and they are both running queries
at the same time then each of them will have their own connection to the
database.

Whether or not you will benefit from pooling depends on the nature of
your application. If, for example, the application runs queries by
connecting, running a query, and disconnecting, and those queries are
long-running and/or infrequent, then pooling will probably be of little
if any benefit.

At the other extreme, say you have a very high volume of short
single-statement transactions (think web). You start seeing a
significant impact of the connection setup/teardown time. Even if the
app connects for each web-hit, connecting to a pooler like pgbouncer is
much faster than connecting directly to the database (I've done some
tests showing nearly a factor of 10 improvement in page delivery rates).
But if the pooler is set to statement-level pooling, you can have
hundreds of web processes each maintaining a persistent connection to
the pooler while the pooler only needs to maintain sufficient
connections for simultaneous statements - generally a tiny fraction of
the number of web processes.

The pooler typically has a number of other configuration parameters that
govern things like the maximum amount of time a server connection will
be used before it is dropped and reconnected as well as how long idle
database connections hang around before they are dropped.
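
In pgbouncer, if memory serves, those knobs look like the following
(check the documentation for current names and defaults):

    ; recycle a server connection after an hour of use
    server_lifetime = 3600
    ; drop server connections that sit idle this long (seconds)
    server_idle_timeout = 600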

Beware. You cannot just drop a pooler in place without understanding
the applications that will connect through it. Setting GUC variables,
for example, is a red flag. You could easily create a situation where
you increase work_mem for a specific statement but that statement ends
up assigned to a different database backend, or where a client
connection is unaware of GUC changes made by a previously connected
client that was assigned to that backend.
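
A minimal illustration of the trap under statement-level pooling (the
table name is made up):

    SET work_mem = '256MB';                -- applied on backend A only
    SELECT * FROM big_table ORDER BY col;  -- may run on backend B with the default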

But if your application is designed to work well with pooling, it can
provide dramatic performance benefits.

Cheers,
Steve


Re: Why would I want to use connection pooling middleware?

From: Kirk Strauser
On Jan 15, 2009, at 12:30 PM, Steve Crawford wrote:

> But if your application is designed to work well with pooling, it
> can provide dramatic performance benefits.

I think that's the problem.  As I mentioned at one point, a lot of our
applications have connections open for hours at a time and fire off
queries when the user does something.  I'm coming to think that
pooling wouldn't give much benefit to long-living processes like that.

On a related note, is max_connections=400 reasonably sized for a
server with 8GB of RAM?  Again, most of these are dormant at any given
time.  The database itself is currently hosted on a dual Xeon server
with 3GB of RAM and other applications so I'm sure the new 8-core/8GB
hardware is bound to do better at any rate.
--
Kirk Strauser

Re: Why would I want to use connection pooling middleware?

From: Bill Moran
In response to Kirk Strauser <kirk@strauser.com>:

> On Jan 15, 2009, at 12:30 PM, Steve Crawford wrote:
>
> > But if your application is designed to work well with pooling, it
> > can provide dramatic performance benefits.
>
> I think that's the problem.  As I mentioned at one point, a lot of our
> applications have connections open for hours at a time and fire off
> queries when the user does something.  I'm coming to think that
> pooling wouldn't give much benefit to long-living processes like that.

Actually, that's exactly the kind of app that should benefit from
pooling.

However, pgpool can't pool connections if each connection has its
own username.  Not sure what exactly is causing it not to work for you,
but that was the first thing that came to mind.

> On a related note, is max_connections=400 reasonably sized for a
> server with 8GB of RAM?  Again, most of these are dormant at any given
> time.  The database itself is currently hosted on a dual Xeon server
> with 3GB of RAM and other applications so I'm sure the new 8-core/8GB
> hardware is bound to do better at any rate.

It all depends.  We had servers with 300 - 400 connections that did
just fine.  We've looked at pgpool, but deemed it not worth the
added complexity.  Quite honestly, why attempt to solve problems that
don't even exist?

Are you having a problem?  If so, what is the problem?

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: Why would I want to use connection pooling middleware?

From: Steve Crawford
Kirk Strauser wrote:
> On Jan 15, 2009, at 12:30 PM, Steve Crawford wrote:
>
>> But if your application is designed to work well with pooling, it can
>> provide dramatic performance benefits.
>
> I think that's the problem.  As I mentioned at one point, a lot of our
> applications have connections open for hours at a time and fire off
> queries when the user does something.  I'm coming to think that
> pooling wouldn't give much benefit to long-living processes like that.
>

If you know that the application does not change GUC variables then you
will probably benefit greatly by using pgbouncer. If all the queries are
single-statements then set pool_mode=statement. If you have
multiple-statement transactions then configure pgbouncer to use
pool_mode=transaction. Either way, your app won't tie up a back-end
connection when it is sitting idle. You will probably find that you can
handle your hundreds of clients with a pretty small pool of backend
connections. Pgbouncer will give you some nice statistics to help you
adjust the pool sizing and such.
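
For example, you can connect to pgbouncer's special admin database
(with a user listed in stats_users; "stats" below is a placeholder) and
look at the counters: SHOW POOLS gives per-pool client and server
connection counts, SHOW STATS gives request totals and timing averages.

    $ psql -p 6432 -U stats -d pgbouncer
    pgbouncer=# SHOW POOLS;
    pgbouncer=# SHOW STATS;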

> On a related note, is max_connections=400 reasonably sized for a
> server with 8GB of RAM?  Again, most of these are dormant at any given
> time.  The database itself is currently hosted on a dual Xeon server
> with 3GB of RAM and other applications so I'm sure the new 8-core/8GB
> hardware is bound to do better at any rate.

Too little info (and others here can answer better anyway). But I think
you should test pooling and find out how many you really need before
jumping into tuning. I haven't tried Pgpool* but have found pgbouncer to
be easy-to-use, reliable and effective.

Cheers,
Steve


Re: Why would I want to use connection pooling middleware?

From: Kirk Strauser
On Jan 15, 2009, at 2:39 PM, Bill Moran wrote:

> However, pgpool can't pool connections if each connection has its
> own username.  Not sure what exactly is causing it not to work for
> you, but that was the first thing that came to mind.

The usernames are per-app.  Zope connects with username "zope", for
example.  However, any given application might have 30 instances
running at any time.

> Are you having a problem?  If so, what is the problem?


Honestly?  That so many people are singing the praises of connection
pooling, and I thought I'd better at least see what the excitement is
about.
--
Kirk Strauser

Re: Why would I want to use connection pooling middleware?

From: "Scott Marlowe"
On Thu, Jan 15, 2009 at 1:26 PM, Kirk Strauser <kirk@strauser.com> wrote:
> On Jan 15, 2009, at 12:30 PM, Steve Crawford wrote:
>
>> But if your application is designed to work well with pooling, it can
>> provide dramatic performance benefits.
>
> I think that's the problem.  As I mentioned at one point, a lot of our
> applications have connections open for hours at a time and fire off queries
> when the user does something.  I'm coming to think that pooling wouldn't
> give much benefit to long-living processes like that.

As mentioned, this is exactly what pooling is good for.  Imagine 200
users each accessing the database once every hour.  If on average you
have two or three users actually hitting the database, then you could
handle that many users with only 10 or 20 database connections.  The
apps each have a persistent connection up to the pool service, which
then routes the active sessions through a connection pool so the db
only thinks it has 20 or fewer users.
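
(To put rough numbers on it: if each of those 200 hourly hits keeps a
backend busy for, say, 45 seconds -- an assumption, not a measurement --
that's 200 * 45 / 3600, or about 2.5 busy connections on average.)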

> On a related note, is max_connections=400 reasonably sized for a server with
> 8GB of RAM?  Again, most of these are dormant at any given time.  The
> database itself is currently hosted on a dual Xeon server with 3GB of RAM
> and other applications so I'm sure the new 8-core/8GB hardware is bound to
> do better at any rate.

That depends very much on how many of those are ever active, or could
become active.  Things get ugly when everyone at the company needs to
log in at once.  Suddenly, the 390 connections out of the 400 that were
idle go active, and the server grinds to a crawl under the load.  A
connection pool would simply hold the connection in a wait
state until one of the 20 or so live connections to the db became
available, preventing server meltdown.

Re: Why would I want to use connection pooling middleware?

From: Kirk Strauser
On Jan 15, 2009, at 2:54 PM, Steve Crawford wrote:

> If you know that the application does not change GUC variables then
> you will probably benefit greatly by using pgbouncer.

Thanks, Steve!  That's just the kind of pointer I can use.  I've been
using PostgreSQL for years but I've never really gone far into
low-level optimizations like this.
--
Kirk Strauser

Re: Why would I want to use connection pooling middleware?

From: Bill Moran
In response to Kirk Strauser <kirk@strauser.com>:

> On Jan 15, 2009, at 2:39 PM, Bill Moran wrote:
>
> > However, pgpool can't pool connections if each connection has its
> > own username.  Not sure what exactly is causing it not to work for
> > you, but that was the first thing that came to mind.
>
> The usernames are per-app.  Zope connections with username "zope", for
> example.  However, any given application might have 30 instances
> running at any time.

You might be hitting up against pgpool being pre-emptive on startup,
i.e., establishing a bunch of connections right off the bat so they're
available right away.  If your application actually uses fewer
connections than pgpool maintains, then it's not going to be a benefit.

> > Are you having a problem?  If so, what is the problem?
>
> Honestly?  That so many people are singing the praises of connection
> pooling, and I thought I'd better at least see what the excitement is
> about.

Well, it's a good instinct to look into stuff like that.  Especially
now that you've discovered that it's not cut and dried.

Try exercising your application under load to see if pgpool helps.  If
it keeps extra connections open during idle time, that won't really
hurt much, but if it reduces server load under stress, that's worthwhile.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Followup: Here's why I want to use connection pooling middleware!

From: Kirk Strauser
On Thursday 15 January 2009 09:54:50 Kirk Strauser wrote:
> I have a PostgreSQL 8.3.5 server with max_connections = 400.  At this
> moment, I have 223 open connections, including 64 from a bunch of
> webserver processes and about 100 from desktop machines running a
> particular application.  The rest are from various scheduled processes
> and other assorted things.  Now, I know there are projects like
> pgpool-II that can serve to pool connections to the server.  Why
> would I want to do that, though?

After installing and configuring PgBouncer and then pointing all of our clients
at it, our average number of database connections has dropped from 250+ to 17.
Query times are also much better, and more RAM is going to caching than to
holding processes.  Count me as a new fan.
--
Kirk Strauser