Re: Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17) - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17)
Date
Msg-id 5045D4DE0200002500049E0A@gw.wicourts.gov
Whole thread Raw
In response to Re: Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Wiki link for max_connections? (Fwd: Re: [ADMIN] PostgreSQL oom_adj postmaster process to -17)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> wrote:
> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>>> Also, I am a bit doubtful about the advice on sizing the
>>> connection pool as applied to small servers:
>>> surely it's not sane to recommend that a single-processor system
>>> with one disk should have max_connections = 3.  At least, *I*
>>> don't think that's sane.
>>
>> I'm not sure it's wrong when combined with this: "Remember that
>> this "sweet spot" is for the number of connections that are
>> actively doing work.  ...  You should always make max_connections
>> a bit bigger than the number of connections you enable in your
>> connection pool. That way there are always a few slots available
>> for direct connections for system maintenance and monitoring." 
>> Where would you expect the knee to be for connections
>> concurrently actively doing work on a single-core, single-drive
>> system ?
> 
> I don't know.  But my experience with our customers is that people
> are often forced to set the size of the connection pool far larger
> than what that formula would suggest.  Many people are doing
> transaction-level pooling, and for those people, they've got to
> look at how many multi-statement transactions they've got and
> think about what the peak value for that quantity is.  It's still
> worth using pooling because it reduces the number of simultaneous
> connections, but it's not going to reduce it to the kind of values
> you're talking about.  Also, consider that transactions aren't all
> the same length.  Suppose 90% of your queries execute in 50ms, and
> 10% execute in 60s.  Even though it's less efficient, you've got
> to set the connection pool large enough that at least some of the
> 50 ms queries can continue to get processed even if the maximum
> number of 60s queries that you ever expect to see in parallel are
> already running.  This may seem like a theoretical problem but we
> have customers who use connection pools to get the number of
> simultaneous connections down to, say, 800.  I guarantee you that
> these people do not have 200 CPUs and 400 disks, but they're smart
> people and they find that smaller pool sizes don't work.
It is something which has to be considered, and I don't think it's
theoretical at all.  Here's how we deal with it.  We don't use a
plain FIFO queue for our transaction requests, but a prioritized
FIFO with 10 levels of priority (0 to 9).  The highest priority (9)
is reserved for "utility" requests -- where a running transaction
needs to spin off a related transaction to do some work for it.  For
the lowest level (0) we normally allocate only a single connection,
and it is used for very long-running reports which we want to queue
to run one-at-a-time.  As examples of how we categorize queries,
filling a large list in an interactive application will run at
priority 3, while translating a key which must cause a description
on the screen to display is run at priority 8.  Normal single-row
updates and deletes from an interactive application run at priority
5.
Each connection in the pool has a worker thread, and is assigned a
minimum priority that it will handle.  When all threads are busy and
transaction requests are queued, any thread completing a database
transaction pulls from the front of the highest priority queue with
a waiting request to run a transaction, looking only at priorities
which are not "beneath it".  If there are no waiting requests of
high enough priority, the thread waits for one to arrive.
We have found that the formula I presented, when combined with
transactional request queuing like I describe here gives us our best
performance.  I don't have the exact numbers in front of me at the
moment, but on a machine with 16 cores and a 40-drive array (but
heavily cached, so that the effective spindle count was lower than
that), servicing thousands of concurrent web users with hundreds of
tps, we improved performance significantly by dropping our
connection pool size from about 60 to about 30, in addition to the
separate pool of six which are handling logical replication from
about 80 sources.  That was a real-life production situation, but we
ran a series of benchmarks and found that in a pretty wide spectrum
of situations the formula I gave fits pretty neatly.
If someone is using 800 connections for, say, a 32 core machine with
a 200 drive array I would suspect that they would get a lot of
benefit from a smarter connection pool.
> Sure, we can say, well, the fine print tells you that 2*CPUs+disks
> is not REALLY the formula you should use, but it's just so far off
> what I see in the field that I have a hard time thinking it's
> really helping people to give them that as a starting point.
The point is that it *is* generally really close to the numbers we
have seen here in both benchmarks and production, and I have gotten
comments both on and off the lists from people who have told me that
they tried that formula against their benchmark results and found
that it fit well.  Now, this may be dependent on OS or hardware in
some way that I haven't recognized, and it seems likely to need
adjustment under 9.2, but so far on Linux with hard drives it has
been a useful guide for me.  It might be instructive to figure out
what the difference is in environments where much larger pools have
been found beneficial.
-Kevin



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: pg_upgrade del/rmdir path fix
Next
From: Andrew Dunstan
Date:
Subject: Re: 9.2 pg_upgrade regression tests on WIndows