Thread: Thousands of parallel connections
Is there any practical limit on the number of parallel connections that a PostgreSQL server can service? We're in the process of setting up a system that will require up to 10000 connections open in parallel. The query load is not the problem, but we're wondering about the number of connections. Does anyone have experience with these kinds of numbers?
Peter Eisentraut wrote:
> Is there any practical limit on the number of parallel connections that a
> PostgreSQL server can service? We're in the process of setting up a system
> that will require up to 10000 connections open in parallel. The query load
> is not the problem, but we're wondering about the number of connections.
> Does anyone have experience with these kinds of numbers?

No experience, but a little thinking and some elementary-school math tells me that you'd need a huge amount of RAM to support 10000 connections, since Postgres is multi-process. Our typical Postgres process eats 5-40 megabytes of memory, depending on activity. So even at just 5 megabytes each, with 10k connections we are talking about 50 GB of RAM. If those connections are mostly idle, that is a plain waste of resources.

I suggest you look into some sort of connection pooling.

--
Michal Taborsky
http://www.taborsky.cz
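For what it's worth, that estimate is just multiplication; a minimal sketch in Python, taking the 5 MB and 40 MB per-backend figures above as assumptions rather than measurements:

    # Back-of-the-envelope RAM estimate: N backend processes times an assumed
    # amount of private memory per backend (ballpark figures, not measurements).
    def ram_needed_gb(connections, mb_per_backend):
        return connections * mb_per_backend / 1024.0

    for mb_per_backend in (5, 40):
        gb = ram_needed_gb(10000, mb_per_backend)
        print("%d connections at %d MB each -> about %.0f GB" % (10000, mb_per_backend, gb))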
Hi guys,

Peter is definitely not a newbie on this list, so I'm sure he has already thought about some kind of pooling if it were applicable... but then I'm dead curious what kind of application could possibly rule out connection pooling even if it means so many open connections? Please shed some light on this, Peter...

Cheers,
Csaba.

On Mon, 2004-08-16 at 15:53, Michal Taborsky wrote:
> Peter Eisentraut wrote:
> > Is there any practical limit on the number of parallel connections that a
> > PostgreSQL server can service? We're in the process of setting up a system
> > that will require up to 10000 connections open in parallel. The query load
> > is not the problem, but we're wondering about the number of connections.
> > Does anyone have experience with these kinds of numbers?
>
> No experience, but a little thinking and elementary school math tells
> me, that you'd need huge amount of RAM to support 10000 connections,
> since postgres is multi-process. Our typical postgres process eats 5-40
> megs of memory, depending on activity. So even if it was just 5 megs,
> with 10k connections we are talking about 50G of RAM. If these
> connections are idle, it would be plain waste of resources.
>
> I suggest you look into some sort of connection pooling.
On Monday, 16 August 2004 16:20, Csaba Nagy wrote:
> Peter is definitely not a newby on this list, so i'm sure he already
> thought about some kind of pooling if applicable... but then I'm
> dead-curious what kind of application could possibly rule out connection
> pooling even if it means so many open connections ? Please give us some
> light Peter...

There is already a connection pool in front of the real server, but the connection pool doesn't help you if you do in fact have 10000 concurrent requests; it only saves connection-start effort. (You could make the connection pool server queue the requests, but that is not the point of this exercise.) I didn't give much thought to the RAM question, but the machine is almost big enough that it wouldn't matter. I'm thinking more in terms of the practical limits of the internal structures or the (Linux 2.6) kernel.
Michal Taborsky <michal@taborsky.cz> writes:
> Peter Eisentraut wrote:
>> Is there any practical limit on the number of parallel connections that a
>> PostgreSQL server can service? We're in the process of setting up a system
>> that will require up to 10000 connections open in parallel. The query load
>> is not the problem, but we're wondering about the number of connections.
>> Does anyone have experience with these kinds of numbers?

> No experience, but a little thinking and elementary school math tells
> me, that you'd need huge amount of RAM to support 10000 connections,
> since postgres is multi-process. Our typical postgres process eats 5-40
> megs of memory, depending on activity. So even if it was just 5 megs,
> with 10k connections we are talking about 50G of RAM. If these
> connections are idle, it would be plain waste of resources.

5-40 megs sounds high, unless you run very complex queries. I wonder whether you aren't counting Postgres shared memory in that "per process" figure. (Most implementations of "top" are not very good about distinguishing shared and private memory, FWIW.)

But even estimating just a meg or two of private space apiece, the total is daunting. You'd also have to worry about overstressing the kernel --- allowing for 50 or so open files per process, which is not a lot, you need a half-million-entry open files table.

> I suggest you look into some sort of connection pooling.

Agreed. If you are not actually *running* 10000 queries at a time, it'd be better to try to pool the connections.

regards, tom lane
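A quick way to sanity-check that open-files concern on a Linux box is to compare the estimate against the kernel's global limit; a small sketch, assuming the 50-files-per-backend figure from above and a Linux /proc filesystem:

    # Compare an estimated descriptor demand against the kernel's global limit.
    # BACKENDS and FILES_PER_BACKEND are assumptions taken from the discussion.
    BACKENDS = 10000
    FILES_PER_BACKEND = 50

    needed = BACKENDS * FILES_PER_BACKEND          # ~500,000 entries
    with open("/proc/sys/fs/file-max") as f:       # Linux-specific
        file_max = int(f.read().split()[0])

    print("estimated open-file entries needed:", needed)
    print("kernel fs.file-max:                ", file_max)
    if needed > file_max:
        print("fs.file-max would have to be raised, e.g. sysctl -w fs.file-max=%d" % (needed * 2))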
Really, this seems like a pretty strong case for a replicated database... assuming not all 10000 clients will need to be doing modifications. Or, if they do, that they could open up a separate, temporary connection to the master db.

On Aug 16, 2004, at 7:37 AM, Peter Eisentraut wrote:
> On Monday, 16 August 2004 16:20, Csaba Nagy wrote:
>> Peter is definitely not a newby on this list, so i'm sure he already
>> thought about some kind of pooling if applicable... but then I'm
>> dead-curious what kind of application could possibly rule out connection
>> pooling even if it means so many open connections ? Please give us some
>> light Peter...
>
> There is already a connection pool in front of the real server, but the
> connection pool doesn't help you if you have in fact 10000 concurrent
> requests, it only saves connection start effort. (You could make the
> connection pool server queue the requests, but that is not the point of this
> exercise.) I didn't quite consider the RAM question, but the machine is
> almost big enough that it wouldn't matter. I'm thinking more in terms of the
> practical limits of the internal structures or the (Linux 2.6) kernel.
[snip]
> requests, it only saves connection start effort. (You could make the
> connection pool server queue the requests, but that is not the point of this
> exercise.) I didn't quite consider the RAM question, but the machine is
[snip]

Well, I would disagree here. If the connections are not busy all the time, a queueing connection pool will help you drastically reduce the effective number of concurrent connections you need. We do this with good results, and the programming overhead is practically nil; the queueing can be hidden in the pool itself (which of course must be well written, but that's a one-time job).

Just my 2c,
Csaba.
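For illustration, a minimal sketch of such a queueing pool in plain Python, with a generic connect_fn placeholder rather than any particular driver's API; callers beyond the cap simply wait until a connection is handed back:

    import queue
    import threading

    class QueueingPool:
        """Hand out at most max_size real connections; additional callers
        block until a connection is returned, capping the number of backends."""

        def __init__(self, connect_fn, max_size):
            self._connect_fn = connect_fn    # placeholder factory, e.g. a driver's connect()
            self._max_size = max_size
            self._created = 0
            self._lock = threading.Lock()
            self._idle = queue.Queue()

        def get(self, timeout=None):
            try:
                return self._idle.get_nowait()           # reuse an idle connection if one is free
            except queue.Empty:
                with self._lock:
                    if self._created < self._max_size:   # below the cap: open a real connection
                        self._created += 1
                        return self._connect_fn()
                return self._idle.get(timeout=timeout)   # at the cap: queue up and wait

        def put(self, conn):
            self._idle.put(conn)                         # wakes the longest-waiting caller, if any

Application code then wraps each request in get()/put(), so thousands of client threads can share a few hundred real backend connections.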
Centuries ago, Nostradamus foresaw when peter_e@gmx.net (Peter Eisentraut) would write:
> Is there any practical limit on the number of parallel connections that a
> PostgreSQL server can service? We're in the process of setting up a system
> that will require up to 10000 connections open in parallel. The query load
> is not the problem, but we're wondering about the number of connections.
> Does anyone have experience with these kinds of numbers?

We commonly have a thousand connections open on some servers, and while it works, we consider there to be something problematic about it. It tends to lead to a lot of spinlock contention.

You might want to look into pgpool: <http://www2b.biglobe.ne.jp/~caco/pgpool/index-e.html>

Jan Wieck has tried it out with his version of the TPC-W benchmark and found that it allowed cutting down on the _true_ number of connections, and was very helpful in improving performance under conditions where the application imagined it needed a lot of connections.

--
(reverse (concatenate 'string "gro.gultn" "@" "enworbbc")) http://www.ntlug.org/~cbbrowne/spiritual.html
"The last good thing written in C was Franz Schubert's Symphony number 9." -- Erwin Dieterich
Tom Lane wrote:
> Michal Taborsky <michal@taborsky.cz> writes:
>> Peter Eisentraut wrote:
>>> Is there any practical limit on the number of parallel connections that a
>>> PostgreSQL server can service? We're in the process of setting up a system
>>> that will require up to 10000 connections open in parallel. The query load
>>> is not the problem, but we're wondering about the number of connections.
>>> Does anyone have experience with these kinds of numbers?
>
>> No experience, but a little thinking and elementary school math tells
>> me, that you'd need huge amount of RAM to support 10000 connections,
>> since postgres is multi-process. Our typical postgres process eats 5-40
>> megs of memory, depending on activity. So even if it was just 5 megs,
>> with 10k connections we are talking about 50G of RAM. If these
>> connections are idle, it would be plain waste of resources.
>
> 5-40 megs sounds high, unless you run very complex queries. I wonder
> whether you aren't counting Postgres shared memory in that "per process"
> figure. (Most implementations of "top" are not very good about
> distinguishing shared and private memory, FWIW.)
>
> But even estimating just a meg or two of private space apiece, the total
> is daunting.

Last week I did an Ariadne+PostgreSQL evaluation for the company where I work, and I learned that:

- with 250 MB you can open up to 80 concurrent queries
- with 500 MB you can open up to 120 concurrent queries
- from there on, each additional 250 MB buys you roughly 40 more connections

If you exceed these limits, the machine starts to thrash... Peter would then need about 61 GB for 10000 connections, which is quite amazing :-)

Regards
Gaetano Mendola
Gaetano Mendola <mendola@bigfoot.com> writes:
> I did last week an Ariadne+Postgresql valutation for the company where I work
> and I learned that
> with 250 MB you can open up to 80 concurrent query
> with 500 MB you can open up to 120 concurrent query
> from now on for each 250MB you can have ~40 connections more

That does not add up: the graph can't have a negative y-intercept. There should be a substantial cost to run the postmaster at all, and then an essentially fixed cost per connection --- assuming that all the connections are running similar queries, of course. You're telling us the first 40 connections require zero RAM.

regards, tom lane
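Spelling out Tom's objection from the two reported data points, treating memory as a straight line in the number of connections (which is itself an assumption):

    # Fit RAM = slope * connections + intercept through (80, 250 MB) and (120, 500 MB).
    slope = (500.0 - 250.0) / (120 - 80)     # 6.25 MB per connection
    intercept = 250.0 - slope * 80           # -250 MB: a negative y-intercept

    print("per-connection cost:   %.2f MB" % slope)
    print("y-intercept:           %.0f MB" % intercept)
    print("RAM at 40 connections: %.0f MB" % (slope * 40 + intercept))   # 0 MB

The line crosses zero at 40 connections, which is exactly the "first 40 connections are free" absurdity Tom points out; a plausible model needs a positive intercept for the postmaster and shared memory plus a roughly fixed cost per backend, so the reported figures are probably picking up something other than pure per-backend memory.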
Tom Lane wrote:
> Gaetano Mendola <mendola@bigfoot.com> writes:
>> I did last week an Ariadne+Postgresql valutation for the company where I work
>> and I learned that
>> with 250 MB you can open up to 80 concurrent query
>> with 500 MB you can open up to 120 concurrent query
>> from now on for each 250MB you can have ~40 connections more
>
> That does not add up: the graph can't have a negative y-intercept.
> There should be a substantial cost to run the postmaster at all,
> and then an essentially fixed cost per connection --- assuming
> that all the connections are running similar queries, of course.
> You're telling us the first 40 connections require zero RAM.

I was not talking about a single process's memory consumption; I was speaking in general. Indeed, I don't know why, but it seems the first concurrent queries are less expensive. I was able to "confirm" this rule up to 2 GB; I don't know what happens beyond that. BTW, the machine is a single processor with HT enabled.

Regards
Gaetano Mendola
Chris Travers <chris@metatrontech.com> writes:
> Tom Lane wrote:
>> That does not add up: the graph can't have a negative y-intercept.
>> There should be a substantial cost to run the postmaster at all,
>> and then an essentially fixed cost per connection --- assuming
>> that all the connections are running similar queries, of course.
>> You're telling us the first 40 connections require zero RAM.

> That is strange. Is it really linear or does the cost go up somewhat
> after the first few?

Well, if you have significant contention problems then the speed could be worse than linear --- but he was talking about memory usage. AFAICS, a backend doing a particular query should need X amount of RAM pretty much independently of how many others there are. The only data structure I can think of that would be impacted at all is QuerySnapshot, and at 4 bytes per sibling backend it's *way* down in the noise...

regards, tom lane
Tom Lane wrote:
> Chris Travers <chris@metatrontech.com> writes:
>> Tom Lane wrote:
>>> That does not add up: the graph can't have a negative y-intercept.
>>> There should be a substantial cost to run the postmaster at all,
>>> and then an essentially fixed cost per connection --- assuming
>>> that all the connections are running similar queries, of course.
>>> You're telling us the first 40 connections require zero RAM.
>
>> That is strange. Is it really linear or does the cost go up somewhat
>> after the first few?
>
> Well, if you have significant contention problems then the speed could
> be worse than linear --- but he was talking about memory usage. AFAICS,
> a backend doing a particular query should need X amount of RAM pretty
> much independently of how many others there are. The only data structure
> I can think of that would be impacted at all is QuerySnapshot, and at
> 4 bytes per sibling backend it's *way* down in the noise...
>
> regards, tom lane

Then my final question is whether part of the answer is to increase shared memory and other performance-related settings in order to handle the larger number of connections gracefully.

Best Wishes,
Chris Travers
Metatron Technology Consulting
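The knobs Chris is alluding to live mostly in postgresql.conf plus the kernel's shared-memory limits; a hedged sketch with purely illustrative values (the parameter names are real, the numbers are not a recommendation):

    # postgresql.conf -- illustrative values only
    max_connections = 10000        # every slot reserves shared memory and semaphores at startup
    shared_buffers = 20000         # counted in 8 kB pages on releases of this era
    max_files_per_process = 50     # keeps 10000 backends from exhausting the kernel's file table

    # The kernel must allow a shared-memory segment big enough for the above,
    # e.g. on Linux: sysctl -w kernel.shmmax=<bytes>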
Tom Lane wrote:
> Gaetano Mendola <mendola@bigfoot.com> writes:
>> I did last week an Ariadne+Postgresql valutation for the company where I work
>> and I learned that
>> with 250 MB you can open up to 80 concurrent query
>> with 500 MB you can open up to 120 concurrent query
>> from now on for each 250MB you can have ~40 connections more
>
> That does not add up: the graph can't have a negative y-intercept.
> There should be a substantial cost to run the postmaster at all,
> and then an essentially fixed cost per connection --- assuming
> that all the connections are running similar queries, of course.
> You're telling us the first 40 connections require zero RAM.

That is strange. Is it really linear or does the cost go up somewhat after the first few?

> regards, tom lane