Thread: Thousands of parallel connections

Thousands of parallel connections

From: Peter Eisentraut
Is there any practical limit on the number of parallel connections that a
PostgreSQL server can service?  We're in the process of setting up a system
that will require up to 10000 connections open in parallel.  The query load
is not the problem, but we're wondering about the number of connections.
Does anyone have experience with these kinds of numbers?

Re: Thousands of parallel connections

From: Michal Taborsky
Peter Eisentraut wrote:
> Is there any practical limit on the number of parallel connections that a
> PostgreSQL server can service?  We're in the process of setting up a system
> that will require up to 10000 connections open in parallel.  The query load
> is not the problem, but we're wondering about the number of connections.
> Does anyone have experience with these kinds of numbers?

No experience, but a little thinking and some elementary-school math tells
me that you'd need a huge amount of RAM to support 10000 connections,
since Postgres is multi-process. Our typical Postgres process eats 5-40
megs of memory, depending on activity. So even if it were just 5 megs,
with 10k connections we are talking about 50 GB of RAM. If these
connections are idle, it would be a plain waste of resources.
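
A back-of-envelope sketch of that arithmetic (the per-backend figures are
only the rough observation above, nothing measured):

    # Rough estimate of private memory for 10000 backends; 5 and 40 MB are
    # just the per-process range guessed above, not PostgreSQL constants.
    connections = 10000
    for per_backend_mb in (5, 40):
        total_gb = connections * per_backend_mb / 1024.0
        print("%2d MB/backend x %d connections = ~%.0f GB of private memory"
              % (per_backend_mb, connections, total_gb))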

I suggest you look into some sort of connection pooling.

--
Michal Taborsky
http://www.taborsky.cz


Re: Thousands of parallel connections

From: Csaba Nagy
Hi guys,

Peter is definitely not a newbie on this list, so I'm sure he has already
thought about some kind of pooling if applicable... but then I'm
dead-curious what kind of application could possibly rule out connection
pooling even if it means so many open connections? Please shed some
light on this, Peter...

Cheers,
Csaba.

On Mon, 2004-08-16 at 15:53, Michal Taborsky wrote:
> Peter Eisentraut wrote:
> > Is there any practical limit on the number of parallel connections that a
> > PostgreSQL server can service?  We're in the process of setting up a system
> > that will require up to 10000 connections open in parallel.  The query load
> > is not the problem, but we're wondering about the number of connections.
> > Does anyone have experience with these kinds of numbers?
>
> No experience, but a little thinking and elementary school math tells
> me, that you'd need huge amount of RAM to support 10000 connections,
> since postgres is multi-process. Our typical postgres process eats 5-40
> megs of memory, depending on activity. So even if it was just 5 megs,
> with 10k connections we are talking about 50G of RAM. If these
> connections are idle, it would be plain waste of resources.
>
> I suggest you look into some sort of connection pooling.


Re: Thousands of parallel connections

From: Peter Eisentraut
On Monday, 16 August 2004 at 16:20, Csaba Nagy wrote:
> Peter is definitely not a newby on this list, so i'm sure he already
> thought about some kind of pooling if applicable... but then I'm
> dead-curious what kind of application could possibly rule out connection
> pooling even if it means so many open connections ? Please give us some
> light Peter...

There is already a connection pool in front of the real server, but the
connection pool doesn't help if you in fact have 10000 concurrent
requests; it only saves connection-startup effort.  (You could make the
connection pool server queue the requests, but that is not the point of
this exercise.)  I didn't quite consider the RAM question, but the
machine is almost big enough that it wouldn't matter.  I'm thinking more
in terms of the practical limits of the internal structures or the
(Linux 2.6) kernel.

Re: Thousands of parallel connections

From: Tom Lane
Michal Taborsky <michal@taborsky.cz> writes:
> Peter Eisentraut wrote:
>> Is there any practical limit on the number of parallel connections that a
>> PostgreSQL server can service?  We're in the process of setting up a system
>> that will require up to 10000 connections open in parallel.  The query load
>> is not the problem, but we're wondering about the number of connections.
>> Does anyone have experience with these kinds of numbers?

> No experience, but a little thinking and elementary school math tells
> me, that you'd need huge amount of RAM to support 10000 connections,
> since postgres is multi-process. Our typical postgres process eats 5-40
> megs of memory, depending on activity. So even if it was just 5 megs,
> with 10k connections we are talking about 50G of RAM. If these
> connections are idle, it would be plain waste of resources.

5-40 megs sounds high, unless you run very complex queries.  I wonder
whether you aren't counting Postgres shared memory in that "per process"
figure.  (Most implementations of "top" are not very good about
distinguishing shared and private memory, FWIW.)

But even estimating just a meg or two of private space apiece, the total
is daunting.

You'd also have to worry about overstressing the kernel --- allowing
for 50 or so open files per process, which is not a lot, you need a
half-million-entry open files table.
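
A rough way to sanity-check that headroom on a Linux box (a sketch only:
the 50-files-per-backend figure is just the guess above, and the /proc
entries are the kernel's standard file-table counters):

    # Compare the kernel's open-file limit with a rough estimate for
    # 10000 backends at ~50 open files apiece (both figures are guesses).
    connections = 10000
    files_per_backend = 50
    needed = connections * files_per_backend

    with open("/proc/sys/fs/file-max") as f:
        file_max = int(f.read().split()[0])      # kernel-wide limit
    with open("/proc/sys/fs/file-nr") as f:
        allocated = int(f.read().split()[0])     # handles currently allocated

    print("estimated need %d, allocated now %d, kernel limit %d"
          % (needed, allocated, file_max))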

> I suggest you look into some sort of connection pooling.

Agreed.  If you are not actually *running* 10000 queries at a time,
it'd be better to try to pool the connections.

            regards, tom lane

Re: Thousands of parallel connections

From: Ben
Really, this seems like it would be a pretty strong case for a
replicated database... assuming not all 10000 clients will need to be
doing modifications. Or, if they do, that they could open up a separate,
temporary connection to the master db.

On Aug 16, 2004, at 7:37 AM, Peter Eisentraut wrote:

> On Monday, 16 August 2004 at 16:20, Csaba Nagy wrote:
>> Peter is definitely not a newby on this list, so i'm sure he already
>> thought about some kind of pooling if applicable... but then I'm
>> dead-curious what kind of application could possibly rule out connection
>> pooling even if it means so many open connections ? Please give us some
>> light Peter...
>
> There is already a connection pool in front of the real server, but the
> connection pool doesn't help you if you have in fact 10000 concurrent
> requests, it only saves connection start effort.  (You could make the
> connection pool server queue the requests, but that is not the point of this
> exercise.)  I didn't quite consider the RAM question, but the machine is
> almost big enough that it wouldn't matter.  I'm thinking more in terms of the
> practical limits of the internal structures or the (Linux 2.6) kernel.


Re: Thousands of parallel connections

From: Csaba Nagy
[snip]
> requests, it only saves connection start effort.  (You could make the
> connection pool server queue the requests, but that is not the point of this
> exercise.)  I didn't quite consider the RAM question, but the machine is
[snip]

Well, I would disagree here. If the connections are not busy all the
time, a queueing connection pool will help you drastically reduce the
number of concurrent connections you actually need. We do this with good
results, and the programming overhead is practically nil; the queueing
can be hidden in the pool itself (which of course must be well written,
but that's a one-time job).
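
A minimal sketch of the idea within one application process (assuming
psycopg2; the pool size, DSN and function name are made up, and a real
pool would also need error handling and connection health checks):

    import queue
    import psycopg2

    POOL_SIZE = 50                       # hypothetical cap on real backends
    DSN = "dbname=app user=app"          # hypothetical connection string

    _pool = queue.Queue()
    for _ in range(POOL_SIZE):
        _pool.put(psycopg2.connect(DSN))

    def run_query(sql, params=None):
        conn = _pool.get()               # blocks: extra callers queue here
        try:                             # instead of opening new backends
            with conn.cursor() as cur:
                cur.execute(sql, params)
                return cur.fetchall()    # assumes a SELECT
        finally:
            conn.commit()
            _pool.put(conn)              # hand the backend to the next waiter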

Just my 2c,
Csaba.



Re: Thousands of parallel connections

From: Christopher Browne
Centuries ago, Nostradamus foresaw when peter_e@gmx.net (Peter Eisentraut) would write:
> Is there any practical limit on the number of parallel connections that a
> PostgreSQL server can service?  We're in the process of setting up a system
> that will require up to 10000 connections open in parallel.  The query load
> is not the problem, but we're wondering about the number of connections.
> Does anyone have experience with these kinds of numbers?

We commonly have a thousand connections open on some servers, and while
it works, we consider it somewhat problematic: it tends to lead to heavy
spinlock contention.

You might want to look into pgpool:
<http://www2b.biglobe.ne.jp/~caco/pgpool/index-e.html>

Jan Wieck has tried it out with his version of the TPC-W benchmark,
and found that it allowed cutting down on the _true_ number of
connections, and was very helpful in improving performance under
conditions where the application imagined it needed a lot of
connections.
--
(reverse (concatenate 'string "gro.gultn" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/spiritual.html
"The last good thing written in C was Franz Schubert's Symphony number
9."  -- Erwin Dieterich

Re: Thousands of parallel connections

From: Gaetano Mendola
Tom Lane wrote:
> Michal Taborsky <michal@taborsky.cz> writes:
>
>>Peter Eisentraut wrote:
>>
>>>Is there any practical limit on the number of parallel connections that a
>>>PostgreSQL server can service?  We're in the process of setting up a system
>>>that will require up to 10000 connections open in parallel.  The query load
>>>is not the problem, but we're wondering about the number of connections.
>>>Does anyone have experience with these kinds of numbers?
>
>>No experience, but a little thinking and elementary school math tells
>>me, that you'd need huge amount of RAM to support 10000 connections,
>>since postgres is multi-process. Our typical postgres process eats 5-40
>>megs of memory, depending on activity. So even if it was just 5 megs,
>>with 10k connections we are talking about 50G of RAM. If these
>>connections are idle, it would be plain waste of resources.
>
> 5-40 megs sounds high, unless you run very complex queries.  I wonder
> whether you aren't counting Postgres shared memory in that "per process"
> figure.  (Most implementations of "top" are not very good about
> distinguishing shared and private memory, FWIW.)
>
> But even estimating just a meg or two of private space apiece, the total
> is daunting.

Last week I did an Ariadne+PostgreSQL evaluation for the company where I
work, and I learned that:
with 250 MB you can run up to  80 concurrent queries
with 500 MB you can run up to 120 concurrent queries
from there on, each additional 250 MB buys you ~40 more connections

If you exceed these limits, the machine thrashes...

For 10000 connections Peter would then need about 61 GB, which is quite amazing :-)


Regards
Gaetano Mendola

Re: Thousands of parallel connections

From: Tom Lane
Gaetano Mendola <mendola@bigfoot.com> writes:
> I did last week an Ariadne+Postgresql valutation for the company where I work
> and I learned that
> with 250 MB you can open up to  80 concurrent query
> with 500 MB you can open up to 120 concurrent query
> from now on for each 250MB you can have ~40 connections more

That does not add up: the graph can't have a negative y-intercept.
There should be a substantial cost to run the postmaster at all,
and then an essentially fixed cost per connection --- assuming
that all the connections are running similar queries, of course.
You're telling us the first 40 connections require zero RAM.
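
(Fitting a straight line through your two reported points makes the
problem obvious; a quick sketch:)

    # memory = a * connections + b, through (80, 250 MB) and (120, 500 MB)
    (x1, y1), (x2, y2) = (80, 250.0), (120, 500.0)
    a = (y2 - y1) / (x2 - x1)      # 6.25 MB per connection
    b = y1 - a * x1                # -250 MB: a negative y-intercept
    print("slope %.2f MB/conn, intercept %.0f MB" % (a, b))
    print("at 10000 connections: ~%.0f GB" % ((a * 10000 + b) / 1024))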

            regards, tom lane

Re: Thousands of parallel connections

From: Gaetano Mendola

Tom Lane wrote:

| Gaetano Mendola <mendola@bigfoot.com> writes:
|
|>I did last week an Ariadne+Postgresql valutation for the company where I work
|>and I learned that
|>with 250 MB you can open up to  80 concurrent query
|>with 500 MB you can open up to 120 concurrent query
|>from now on for each 250MB you can have ~40 connections more
|
| That does not add up: the graph can't have a negative y-intercept.
| There should be a substantial cost to run the postmaster at all,
| and then an essentially fixed cost per connection --- assuming
| that all the connections are running similar queries, of course.
| You're telling us the first 40 connections require zero RAM.

I was not speaking about a single process's memory consumption; I was
speaking in general. Indeed, I don't know why, but it seems the first
concurrent queries are less expensive. I was able to "confirm" this rule
up to 2 GB; I don't know what happens beyond that.

BTW the machine is a single processor with HT enabled.

Regards
Gaetano Mendola


Re: Thousands of parallel connections

From: Tom Lane
Chris Travers <chris@metatrontech.com> writes:
> Tom Lane wrote:
>> That does not add up: the graph can't have a negative y-intercept.
>> There should be a substantial cost to run the postmaster at all,
>> and then an essentially fixed cost per connection --- assuming
>> that all the connections are running similar queries, of course.
>> You're telling us the first 40 connections require zero RAM.

> That is strange.  Is it really linear or does the cost go up somewhat
> after the first few?

Well, if you have significant contention problems then the speed could
be worse than linear --- but he was talking about memory usage.  AFAICS,
a backend doing a particular query should need X amount of RAM pretty
much independently of how many others there are.  The only data structure
I can think of that would be impacted at all is QuerySnapshot, and at
4 bytes per sibling backend it's *way* down in the noise...

            regards, tom lane

Re: Thousands of parallel connections

From: Chris Travers
Tom Lane wrote:

>Chris Travers <chris@metatrontech.com> writes:
>>Tom Lane wrote:
>>>That does not add up: the graph can't have a negative y-intercept.
>>>There should be a substantial cost to run the postmaster at all,
>>>and then an essentially fixed cost per connection --- assuming
>>>that all the connections are running similar queries, of course.
>>>You're telling us the first 40 connections require zero RAM.
>>
>>That is strange.  Is it really linear or does the cost go up somewhat
>>after the first few?
>
>Well, if you have significant contention problems then the speed could
>be worse than linear --- but he was talking about memory usage.  AFAICS,
>a backend doing a particular query should need X amount of RAM pretty
>much independently of how many others there are.  The only data structure
>I can think of that would be impacted at all is QuerySnapshot, and at
>4 bytes per sibling backend it's *way* down in the noise...
>
>            regards, tom lane
>
Then my final question is whether part of the answer could be increasing
shared memory and other performance-related settings in order to
gracefully handle the larger number of connections.
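
For instance, the server settings that would have to scale with the
connection count include at least these (values purely illustrative, not
recommendations; the kernel's shared-memory and open-file limits would
have to be raised to match):

    # postgresql.conf -- illustrative fragment only
    max_connections = 10000         # each slot costs shared memory and semaphores
    shared_buffers = 50000          # in 8 kB buffers; at least 2 * max_connections in this era
    max_files_per_process = 50      # caps each backend's share of the kernel file table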

Best Wishes,
Chris Travers
Metatron Technology Consulting

Re: Thousands of parallel connections

From: Chris Travers
Tom Lane wrote:

>Gaetano Mendola <mendola@bigfoot.com> writes:
>>I did last week an Ariadne+Postgresql valutation for the company where I work
>>and I learned that
>>with 250 MB you can open up to  80 concurrent query
>>with 500 MB you can open up to 120 concurrent query
>>from now on for each 250MB you can have ~40 connections more
>
>That does not add up: the graph can't have a negative y-intercept.
>There should be a substantial cost to run the postmaster at all,
>and then an essentially fixed cost per connection --- assuming
>that all the connections are running similar queries, of course.
>You're telling us the first 40 connections require zero RAM.
>
That is strange.  Is it really linear or does the cost go up somewhat
after the first few?
