Thread: Need help in performance tuning.
Hi, I want to fine-tune my PostgreSQL setup to increase the number of connections it can handle in a minute's time, and to decrease the response time per request, etc. The exact case will be to handle around 100 concurrent requests. Can anyone please help me with this? Any hardware suggestions are also welcome. Regards Harpreet
Harpreet singh Wadhwa <harpreetsingh.wadhwa@gmail.com> wrote: > I want to fine-tune my PostgreSQL setup to increase the number of > connections it can handle in a minute's time, > and to decrease the response time per request, etc. > The exact case will be to handle around 100 concurrent requests. I have found that connection pooling is crucial. The "concurrent requests" phrase worries me a bit -- you should be focusing more on "concurrent connections" and perhaps "requests per second". With most hardware, you will get faster response time and better overall throughput by funneling 100 connections through a connection pool which limits the number of concurrent requests to just enough to keep all your hardware resources busy, queuing any requests beyond that for submission when a pending request completes. > Any hardware suggestions are also welcome. If you don't have the hardware yet, you'd need to provide a bit more information to get advice on what hardware you need. -Kevin
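As a concrete note on "just enough to keep all your hardware resources busy": one widely circulated rule of thumb (from later PostgreSQL wiki and pgbouncer discussions of connection counts -- not something stated in this thread, and only a starting point for testing) sizes the active pool at roughly

    pool_size ~= (2 * cpu_core_count) + effective_spindle_count

so 100 client connections on, say, an 8-core server with a 4-spindle array would funnel through about 20 active server connections, with the excess queued.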
On 9/07/2010 3:20 AM, Harpreet singh Wadhwa wrote: > Hi, > > > I want to fine-tune my PostgreSQL setup to increase the number of connections it > can handle in a minute's time, > and to decrease the response time per request, etc. > The exact case will be to handle around 100 concurrent requests. If you're not using a connection pool, start using one. Do you really need 100 *active* working query threads at one time? Because if you do, you're going to need a scary-big disk subsystem and a lot of processors. Most people actually only need a few queries executing simultaneously; they just need lots of connections to the database open concurrently and/or lots of queries queued up for processing. For that purpose, a connection pool is ideal. You will get BETTER performance from postgresql with FEWER connections to the "real" database that're all doing active work. If you need lots and lots of connections you should use a connection pool to save the main database the overhead of managing that. -- Craig Ringer
On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer <craig@postnewspapers.com.au> wrote: > If you're not using a connection pool, start using one. > > Do you really need 100 *active* working query threads at one time? Because > if you do, you're going to need a scary-big disk subsystem and a lot of > processors. I see this issue and subsequent advice cross this list awfully frequently. Is there an architectural reason why postgres itself cannot pool incoming connections in order to eliminate the requirement for an external pool?
Samuel Gendler <sgendler@ideasculptor.com> writes: > On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer > <craig@postnewspapers.com.au> wrote: >> If you're not using a connection pool, start using one. > I see this issue and subsequent advice cross this list awfully > frequently. Is there an architectural reason why postgres itself > cannot pool incoming connections in order to eliminate the requirement > for an external pool? Perhaps not, but there's no obvious benefit either. Since there's More Than One Way To Do It, it seems more practical to keep that as a separate problem that can be solved by a choice of add-on packages. regards, tom lane
On 09/07/10 12:42, Tom Lane wrote: > Samuel Gendler <sgendler@ideasculptor.com> writes: >> On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer >> <craig@postnewspapers.com.au> wrote: >>> If you're not using a connection pool, start using one. > >> I see this issue and subsequent advice cross this list awfully >> frequently. Is there an architectural reason why postgres itself >> cannot pool incoming connections in order to eliminate the requirement >> for an external pool? > > Perhaps not, but there's no obvious benefit either. Since there's > More Than One Way To Do It, it seems more practical to keep that as a > separate problem that can be solved by a choice of add-on packages.

Admittedly I'm relatively ignorant of the details, but I increasingly think PostgreSQL will need a different big architectural change in the coming years, as the typical performance characteristics of machines change:

It'll need to separate "running queries" from "running processes", or start threading backends, so that one way or the other a single query can benefit from the capabilities of multiple CPUs. The same separation, or a move to async I/O, might be needed to get one query to concurrently read multiple partitions of a table, or otherwise get maximum benefit from high-capacity I/O subsystems when running just a few big, expensive queries.

Otherwise I'm wondering if PostgreSQL will begin really suffering in performance on workloads where queries are big and expensive but there are relatively few of them running at a time.

My point? *if* I'm not full of hot air and there's some truth to my blather above, any change like that might be accompanied by a move to separate query execution state from connection state, so that idle connections have a much lower resource cost.

OK, that's my hand-waving for the day done. -- Craig Ringer
Thank you all for the replies. I got the gist of where I should head: rely a bit on Postgres for performance and the rest on my Tomcat and application. And I will try connection pooling on the Postgres side. And if I come back with any query (related to this topic), this time it will be more precise (with real data from my testing). ;-) Regards haps
> Otherwise I'm wondering if PostgreSQL will begin really suffering in > performance on workloads where queries are big and expensive but there > are relatively few of them running at a time. Oh, I should note at this point that I'm *not* whining that "someone" should volunteer to do this, or that "the postgresql project" should just "make it happen". I'm fully aware that Pg is a volunteer project and that even if these speculations were in a vaguely reasonable direction, that doesn't mean anyone has the time/skills/knowledge/interest to undertake such major architectural change. I certainly know I have zero right to ask/expect anyone to - I'm very, very grateful to all those who already spend time helping out and enhancing Pg. With particular props to Tom Lane for patience on the -general list and heroic bug-fixing persistence. Sorry for the reply-to-self, I just realized my post could've been taken as a whine about Pg's architecture and some kind of demand that someone do something about it. That couldn't be further from my intent. -- Craig Ringer
On Fri, 2010-07-09 at 00:42 -0400, Tom Lane wrote: > Samuel Gendler <sgendler@ideasculptor.com> writes: > > On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer > > <craig@postnewspapers.com.au> wrote: > >> If you're not using a connection pool, start using one. > > > I see this issue and subsequent advice cross this list awfully > > frequently. Is there an architectural reason why postgres itself > > cannot pool incoming connections in order to eliminate the requirement > > for an external pool? > > Perhaps not, but there's no obvious benefit either. Since there's > More Than One Way To Do It, it seems more practical to keep that as a > separate problem that can be solved by a choice of add-on packages. This sounds similar to the approach taken with replication for years before it was moved into core. Just like replication, pooling has different approaches. I do think that in both cases, having a solution that works, easily, out of the "box" will meet the needs of most users. There is also the issue of perception/adoption here as well. One of my colleagues mentioned that at PG East he repeatedly heard people talking (negatively) about the over-reliance on add-on packages to deal with core DB functionality. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Brad Nicholson <bnichols@ca.afilias.info> wrote: > Just like replication, pooling has different approaches. I do > think that in both cases, having a solution that works, easily, > out of the "box" will meet the needs of most users. Any thoughts on the "minimalist" solution I suggested a couple weeks ago?: http://archives.postgresql.org/pgsql-hackers/2010-06/msg01385.php http://archives.postgresql.org/pgsql-hackers/2010-06/msg01387.php So far, there has been no comment by anyone.... -Kevin
On Fri, 9 Jul 2010, Kevin Grittner wrote: > Any thoughts on the "minimalist" solution I suggested a couple weeks > ago?: > > http://archives.postgresql.org/pgsql-hackers/2010-06/msg01385.php > http://archives.postgresql.org/pgsql-hackers/2010-06/msg01387.php > > So far, there has been no comment by anyone.... Interesting idea. As far as I can see, you are suggesting solving the too many connections problem by allowing lots of connections, but only allowing a certain number to do anything at a time? A proper connection pool provides the following advantages over this:
1. Pool can be on a separate machine or machines, spreading load.
2. Pool has a lightweight footprint per connection, whereas Postgres doesn't.
3. A large amount of the overhead is sometimes connection setup, which this would not solve. A pool has cheap setup.
4. This could cause Postgres backends to be holding onto large amounts of memory while being prevented from doing anything, which is a bad use of resources.
5. A fair amount of the overhead is caused by context-switching between backends. The more backends, the less useful any CPU caches.
6. There are some internal workings of Postgres that involve keeping all the backends informed about something going on. The more backends, the greater this overhead is. (This was pretty bad with the sinval queue overflowing a while back, but a bit better now. It still causes some overhead.)
7. That lock would have a metric *($!-load of contention.
Matthew -- Unfortunately, university regulations probably prohibit me from eating small children in front of the lecture class. -- Computer Science Lecturer
In case there's any doubt, the questions below aren't rhetorical. Matthew Wakeling <matthew@flymine.org> wrote: > Interesting idea. As far as I can see, you are suggesting solving > the too many connections problem by allowing lots of connections, > but only allowing a certain number to do anything at a time? Right. > A proper connection pool provides the following advantages over > this: > > 1. Pool can be on a separate machine or machines, spreading load. Sure, but how would you do that with a built-in implementation? > 2. Pool has a lightweight footprint per connection, whereas > Postgres doesn't. I haven't compared footprint of, say, a pgpool connection on the database server to that of an idle PostgreSQL connection. Do you have any numbers? > 3. A large amount of the overhead is sometimes connection setup, > which this would not solve. A pool has cheap setup. This would probably be most useful where the client held a connection for a long time, not for the "login for each database transaction" approach. I'm curious how often you think application software uses that approach. > 4. This could cause Postgres backends to be holding onto large > amounts of memory while being prevented from doing anything, > which is a bad use of resources. Isn't this point 2 again? If not, what are you getting at? Again, do you have numbers for the comparison, assuming the connection pooler is running on the database server? > 5. A fair amount of the overhead is caused by context-switching > between backends. The more backends, the less useful any CPU > caches. Would this be true while a backend was blocked? Would this not be true for a connection pool client-side connection? > 6. There are some internal workings of Postgres that involve > keeping all the backends informed about something going on. The > more backends, the greater this overhead is. (This was pretty > bad with the sinval queue overflowing a while back, but a bit > better now. It still causes some overhead). Hmmm... I hadn't thought about that. Again, any numbers (e.g., profile information) on this? > 7. That lock would have a metric *($!-load of contention. Here I doubt you. It would be held for such short periods that I suspect that collisions would be relatively infrequent compared to some of the other locks we use. As noted in the email, it may actually normally be an "increment and test" within an existing locked block. Also, assuming that any "built in" connection pool would run on the database server, why would you think the contention for this would be worse than for whatever is monitoring connection count in the pooler? -Kevin
If your app is running under Tomcat, connection pooling is extremely easy to set up from there: it has connection pooling mechanisms built in. Request your db connections using said mechanisms instead of doing it manually, make a couple of changes to server.xml, and the problem goes away. Hundreds, if not thousands, of concurrent users might end up running with fewer than 10 connections.
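For illustration, a minimal sketch of the kind of change Jorge describes (all names, credentials, and sizes here are made-up placeholders; in Tomcat 5.5/6 a Resource element like this goes in context.xml or server.xml and uses the bundled Commons DBCP pool):

    <Resource name="jdbc/mydb" auth="Container"
              type="javax.sql.DataSource"
              driverClassName="org.postgresql.Driver"
              url="jdbc:postgresql://dbhost:5432/mydb"
              username="appuser" password="secret"
              maxActive="10" maxIdle="5" maxWait="10000"/>

The application then obtains connections through a JNDI lookup of java:comp/env/jdbc/mydb instead of calling DriverManager itself, and maxActive caps the number of real database connections no matter how many request threads are running.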
On Fri, 9 Jul 2010, Kevin Grittner wrote: >> Interesting idea. As far as I can see, you are suggesting solving >> the too many connections problem by allowing lots of connections, >> but only allowing a certain number to do anything at a time? > > Right. I think in some situations, this arrangement would be an advantage. However, I do not think it will suit the majority of situations, and could reduce the performance when the user doesn't need the functionality, either because they have a pool already, or they don't have many connections. No, I don't have any numbers. >> 1. Pool can be on a separate machine or machines, spreading load. > > Sure, but how would you do that with a built-in implementation? That's my point exactly. If you have an external pool solution, you can put it somewhere else - maybe on multiple somewhere elses. >> 3. A large amount of the overhead is sometimes connection setup, >> which this would not solve. A pool has cheap setup. > > This would probably be most useful where the client held a > connection for a long time, not for the "login for each database > transaction" approach. I'm curious how often you think application > software uses that approach. What you say is true. I don't know how often that is, but it seems to be those times that people come crying to the mailing list. >> 4. This could cause Postgres backends to be holding onto large >> amounts of memory while being prevented from doing anything, >> which is a bad use of resources. > > Isn't this point 2 again? Kind of. Yes. Point 2 was simple overhead. This point was that the backend may have done a load of query-related allocation, and then been stopped. >> 7. That lock would have a metric *($!-load of contention. > > Here I doubt you. It would be held for such short periods that I > suspect that collisions would be relatively infrequent compared to > some of the other locks we use. As noted in the email, it may > actually normally be an "increment and test" within an existing > locked block. Fair enough. It may be much less of a problem than I had previously thought. Matthew -- Change is inevitable, except from vending machines.
Matthew Wakeling <matthew@flymine.org> wrote: > On Fri, 9 Jul 2010, Kevin Grittner wrote: >>> Interesting idea. As far as I can see, you are suggesting >>> solving the too many connections problem by allowing lots of >>> connections, but only allowing a certain number to do anything >>> at a time? >> >> Right. > > I think in some situations, this arrangement would be an > advantage. However, I do not think it will suit the majority of > situations, and could reduce the performance when the user doesn't > need the functionality, either because they have a pool already, > or they don't have many connections. Oh, totally agreed, except that I think we can have essentially nil impact if they don't exceed a configured limit. In my experience, pooling is more effective the closer you put it to the client. I suppose the strongest argument that could be made against building in some sort of pooling is that it doesn't encourage people to look for client-side solutions. However, we seem to get a lot of posts from people who don't do this, are not able to easily manage it, and who would benefit from even a simple solution like this. -Kevin
If anything was built into the database to handle such connections, I'd recommend a big, bold warning recommending the use of client-side pooling if available. For something like, say, a web server, pooling connections to the database provides a massive performance advantage regardless of how good the database is at handling way more active queries than the hardware can support: the assignment of a connection to a thread tends to be at least an order of magnitude cheaper than establishing a new connection for each new thread, and destroying it when it dies. This is especially true if the client architecture relies on relatively short-lived threads.
There are a few cases where pooling is counterproductive, but they are relatively rare. This is why every Java application server out there will strongly recommend using its own facilities to connect to a database: the performance is almost always better, and it provides fewer headaches for the DBAs.
Now, if remote clients are accessing your database directly, setting up a pool in between might not be as straightforward or give you the same gains across the board, and that might be the only case where letting the db do its own pooling makes sense.
>>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> 7/9/2010 12:52 PM >>>
Matthew Wakeling <matthew@flymine.org> wrote:
> On Fri, 9 Jul 2010, Kevin Grittner wrote:
>>> Interesting idea. As far as I can see, you are suggesting
>>> solving the too many connections problem by allowing lots of
>>> connections, but only allowing a certain number to do anything
>>> at a time?
>>
>> Right.
>
> I think in some situations, this arrangement would be an
> advantage. However, I do not think it will suit the majority of
> situations, and could reduce the performance when the user doesn't
> need the functionality, either because they have a pool already,
> or they don't have many connections.
Oh, totally agreed, except that I think we can have essentially nil
impact if they don't exceed a configured limit. In my experience,
pooling is more effective the closer you put it to the client. I
suppose the strongest argument that could be made against building
in some sort of pooling is that it doesn't encourage people to look
for client-side solutions. However, we seem to get a lot of posts
from people who don't do this, are not able to easily manage it, and
who would benefit from even a simple solution like this.
-Kevin
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
>>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> 7/9/2010 12:52 PM >>>
Matthew Wakeling <matthew@flymine.org> wrote:
> On Fri, 9 Jul 2010, Kevin Grittner wrote:
>>> Interesting idea. As far as I can see, you are suggesting
>>> solving the too many connections problem by allowing lots of
>>> connections, but only allowing a certain number to do anything
>>> at a time?
>>
>> Right.
>
> I think in some situations, this arrangement would be an
> advantage. However, I do not think it will suit the majority of
> situations, and could reduce the performance when the user doesn't
> need the functionality, either because they have a pool already,
> or they don't have many connections.
Oh, totally agreed, except that I think we can have essentially nil
impact if they don't exceed a configured limit. In my experience,
pooling is more effective the closer you put it to the client. I
suppose the strongest argument that could be made against building
in some sort of pooling is that it doesn't encourage people to look
for client-side solutions. However, we seem to get a lot of posts
from people who don't do this, are not able to easily manage it, and
who would benefit from even a simple solution like this.
-Kevin
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
"Jorge Montero" <jorge_montero@homedecorators.com> wrote: > If anything was built in the database to handle such connections, > I'd recommend a big, bold warning, recommending the use of client- > side pooling if available. +1 -Kevin
On Fri, Jul 9, 2010 at 12:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Samuel Gendler <sgendler@ideasculptor.com> writes: >> On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer >> <craig@postnewspapers.com.au> wrote: >>> If you're not using a connection pool, start using one. > >> I see this issue and subsequent advice cross this list awfully >> frequently. Is there an architectural reason why postgres itself >> cannot pool incoming connections in order to eliminate the requirement >> for an external pool? > > Perhaps not, but there's no obvious benefit either. Since there's > More Than One Way To Do It, it seems more practical to keep that as a > separate problem that can be solved by a choice of add-on packages. I'm not buying it. A separate connection pooler increases overhead and management complexity, and, I believe, limits our ability to implement optimizations like parallel query execution. I'm glad there are good ones available, but the fact that they're absolutely necessary for good performance in some environments is not a feature. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Matthew Wakeling wrote: > If you have an external pool solution, you can put it somewhere else - > maybe on multiple somewhere elses. This is the key point to observe: if you're at the point where you have so many connections that you need a pool, the last place you want to put that is on the overloaded database server itself. Therefore, it must be an external piece of software to be effective, rather than being part of the server itself. Database servers are relatively expensive computing hardware due to size/quantity/quality of disks required. You can throw a pooler (or poolers) on any cheap 1U server. This is why a built-in pooler, while interesting, is not particularly functional for how people normally scale up real-world deployments. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Greg Smith <greg@2ndquadrant.com> wrote: > if you're at the point where you have so many connections that you > need a pool, the last place you want to put that is on the > overloaded database server itself. Therefore, it must be an > external piece of software to be effective, rather than being part > of the server itself. It *is* the last place you want to put it, but putting it there can be much better than not putting it *anywhere*, which is what we've often seen. -Kevin
On 10/07/10 00:56, Brad Nicholson wrote: > On Fri, 2010-07-09 at 00:42 -0400, Tom Lane wrote: > >> >> Perhaps not, but there's no obvious benefit either. Since there's >> More Than One Way To Do It, it seems more practical to keep that as a >> separate problem that can be solved by a choice of add-on packages. >> > This sounds similar to the approach taken with replication for years > before it was moved into core. > > Just like replication, pooling has different approaches. I do think > that in both cases, having a solution that works, easily, out of the > "box" will meet the needs of most users. > > There is also the issue of perception/adoption here as well. One of my > colleagues mentioned that at PG East he repeatedly heard people > talking (negatively) about the over-reliance on add-on packages to deal > with core DB functionality. > > It would be interesting to know more about what they thought an 'over-reliance' was and which packages they meant. While clearly in the case of replication something needed to be done to make it better and easier, it is not obvious that the situation with connection pools is analogous. For instance we make extensive use of PgBouncer, and it seems to do the job fine and is ridiculously easy to install and set up. So would having (something like) this in core be an improvement? Clearly if the 'in core' product is better then it is desirable... similarly if the packaged product is better... well, let's have that then! I've certainly observed a 'fear of package installation' on the part of some folk, which is often a hangover from the 'Big IT shop' mentality where it requires blood signatures and child sacrifice to get anything new installed. regards Mark P.S. Also note that database vendors like pooling integrated in the core of *their* product because it is another thing to charge a license for. Unfortunately this can also become an entrenched mentality of 'must be in core' on the part of consultants etc!
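To make "ridiculously easy" concrete, a minimal pgbouncer.ini looks roughly like this (the host, paths, and sizes are illustrative only, not a recommendation):

    [databases]
    mydb = host=127.0.0.1 port=5432 dbname=mydb

    [pgbouncer]
    listen_addr = *
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction
    max_client_conn = 1000
    default_pool_size = 20

Clients then connect to port 6432 instead of 5432, and up to 1000 client connections get multiplexed over at most 20 server connections per database/user pair.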
> It *is* the last place you want to put it, but putting it there can > be much better than not putting it *anywhere*, which is what we've > often seen. Well, what you proposed is an admission control mechanism, which is *different* from a connection pool, although the two overlap. A connection pool solves 4 problems when it's working:
a) limiting the number of database server processes
b) limiting the number of active concurrent queries
c) reducing response times for allocating a new connection
d) allowing management of connection routes to the database (redirection, failover, etc.)
What you were proposing is only (b). While (b) alone is better than nothing, it only solves some kinds of problems. Database backend processes are *not* free, and in general when I see users with "too many connections" failures they are not because of too many concurrent queries, but rather because of too many idle connections (I've seen up to 1800 on a server). Simply adding (b) for crappy applications would make the problem worse, not better, because of the large number of pending queries which the developer would fail to deal with, or monitor. So while adding (b) to core alone would be very useful for some users, ironically it's generally for the more advanced users who are not the ones we're trying to help on this thread. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
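As a quick check for the "too many idle connections" pattern Josh describes, something like this works against pg_stat_activity as it looked in the 8.x/9.0 era of this thread (where idle backends report '<IDLE>' as their current query):

    SELECT current_query = '<IDLE>' AS idle, count(*)
    FROM pg_stat_activity
    GROUP BY 1;

A large count in the idle row alongside a small count in the non-idle row is the classic sign that a pooler would help.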
On 10/07/2010 9:25 AM, Josh Berkus wrote: > >> It *is* the last place you want to put it, but putting it there can >> be much better than not putting it *anywhere*, which is what we've >> often seen. > > Well, what you proposed is an admission control mechanism, which is > *different* from a connection pool, although the two overlap. A > connection pool solves 4 problems when it's working: > > a) limiting the number of database server processes > b) limiting the number of active concurrent queries > c) reducing response times for allocating a new connection > d) allowing management of connection routes to the database > (redirection, failover, etc.) I agree with you: for most Pg users (a) is really, really important. As you know, in PostgreSQL each connection maintains not only general connection state (GUC settings, etc) and if in a transaction, transaction state, but also a query executor (full backend). That gets nasty not only in memory use, but in impact on active query performance, as all those query executors have to participate in global signalling for lock management etc. So an in-server pool that solved (b) but not (a) would IMO not be particularly useful for the majority of users. That said, I don't think it follows that (a) cannot be solved in-core. How much architectural change would be required to do it efficiently enough, though... -- Craig Ringer
On Fri, Jul 9, 2010 at 11:33 PM, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 10/07/2010 9:25 AM, Josh Berkus wrote: >> >>> It *is* the last place you want to put it, but putting it there can >>> be much better than not putting it *anywhere*, which is what we've >>> often seen. >> >> Well, what you proposed is an admission control mechanism, which is >> *different* from a connection pool, although the two overlap. A >> connection pool solves 4 problems when it's working: >> >> a) limiting the number of database server processes >> b) limiting the number of active concurrent queries >> c) reducing response times for allocating a new connection >> d) allowing management of connection routes to the database >> (redirection, failover, etc.) > > I agree with you: for most Pg users (a) is really, really important. As you > know, in PostgreSQL each connection maintains not only general connection > state (GUC settings, etc) and if in a transaction, transaction state, but > also a query executor (full backend). That gets nasty not only in memory > use, but in impact on active query performance, as all those query executors > have to participate in global signalling for lock management etc. > > So an in-server pool that solved (b) but not (a) would IMO not be > particularly useful for the majority of users. > > That said, I don't think it follows that (a) cannot be solved in-core. How > much architectural change would be required to do it efficiently enough, > though... Right, let's not confuse Kevin's argument that we should have connection pooling in core with advocacy for any particular patch or feature suggestion that he may have offered on some other thread. A very simple in-core connection pooler might look something like this: when a session terminates, the backend doesn't exit. Instead, it waits for the postmaster to reassign it to a new connection, which the postmaster does in preference to starting new backends when possible. But if a backend doesn't get assigned a new connection within a certain period of time, then it goes ahead and exits anyway. You might argue that this is not really a connection pooler at all because there's no admission control, but the point is you're avoiding the overhead of creating and destroying backends unnecessarily. Of course, I'm also relying on the unsubstantiated assumption that it's possible to pass a socket connection between processes. Another approach to the performance problem is to try to find ways of reducing the overhead associated with having a large number of backends in the system. That's not a connection pooler either, but it might reduce the need for one. Still another approach is admission control based on transactions, backends, queries, memory usage, I/O, or what have you. None of these things are mutually exclusive. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Sent from my iPhone On Jul 9, 2010, at 18:25, Josh Berkus <josh@agliodbs.com> wrote: > > So while adding (b) to core alone would be very useful for some users, > ironically it's generally for the more advanced users which are not > the > ones we're trying to help on this thread. It would seem from evidence presented on this thread that the more appropriate conversation would maybe be with package maintainers, to perhaps get them to include a connection pool or provide a package that comes with a pool preconfigured and installed, along with improving existing documentation so that it encourages the use of a pool as a first-class installation choice, since it seems to be something of a first-class problem for a lot of novice users. Just to give some background on my perspective - my prior familiarity with a connection pool was entirely on the client side, where I've been using them for years to keep resource consumption down on the client. But it never occurred to me to consider one on the other end of those connections, despite the fact that I usually have a cluster of app hosts all talking to the same db. I assumed a low connection count was desirable, but not mandatory, since surely the db server limited its own resource consumption, much the way a well-written client app will. I basically assumed that the postgres devs used the same logic I did when I pooled my connections at the client side in order to minimize resource consumption there. I've got no truck with the reasons presented against doing so, since they make perfectly good sense to me. However, I suspect there are lots of engineers like myself - folks working without the benefit of a dedicated dba, or with a dba who is new to the postgres platform - who make naive assumptions that aren't immediately or obviously corrected by the docs (I may be sticking my foot in my mouth here. I haven't read the standard docs in a very long time). With this issue in particular, the fix is fairly trivial and brings other benefits as well. But it sucks to discover it only after you've started to get errors on a running app, no matter how easy the fix. So probably this is really only a bug in communication and can be fixed there. That's great. Easier-to-fix bugs are hard to find. I have yet to contribute to postgres development, so I guess, if no one objects, I'll see what I can do about improving the documentation of this issue, both in the official docs and by making sure it gets better billing in other sources of postgres documentation. But you'll have to bear with me, as I do have a more-than-full-time other job, and no experience with the pg developer community other than a couple of weeks on the mailing lists. But I do like to contribute to projects I use. It always winds up making me a more proficient user. (for the record, if I wasn't limited to my phone at the moment I would actually check the state of existing documentation before sending this, so if I'm talking out of my ass on the lack of documentation, please go easy on me. I mean no offense) --sam
On 2010-07-10 00:59, Greg Smith wrote:
> Matthew Wakeling wrote:
>> If you have an external pool solution, you can put it somewhere
>> else - maybe on multiple somewhere elses.
>
> This is the key point to observe: if you're at the point where you
> have so many connections that you need a pool, the last place you
> want to put that is on the overloaded database server itself.
> Therefore, it must be an external piece of software to be effective,
> rather than being part of the server itself. Database servers are
> relatively expensive computing hardware due to size/quantity/quality
> of disks required. You can throw a pooler (or poolers) on any cheap
> 1U server. This is why a built-in pooler, while interesting, is not
> particularly functional for how people normally scale up real-world
> deployments.
That may be totally correct for the 10% of the userbase
that are in a squeezed situation, but for the 90% that isn't (or isn't aware
of being there), the built-in would be a direct benefit. For the 20%
living near the "edge" it may be the difference between "just working" and
extra hassle.
I think it is a fair assumption that the majority of PG's users solve
their problems without a connection pooler, and the question
is whether it is beneficial to let them scale better without doing anything.
I have also provided a case where Kevin's proposal "might" be a
benefit but a connection pooler cannot solve it:
http://archives.postgresql.org/pgsql-hackers/2010-06/msg01438.php
(at least as I see it, but I'm fully aware that there is stuff I don't know of)
I don't think a built-in connection pooler (or similar) would in any
way limit the actions and abilities of an external one?
* Both numbers wildly guessed.
--
Jesper
Jesper Krogh wrote: > I don't think a built-in connection pooler (or similar) would in any > way limit the actions and abilities of an external one? Two problems to recognize. First is that building something in has the potential to significantly limit use and therefore advancement of work on external pools, because of the "let's use the built in one instead of installing something extra" mentality. I'd rather have a great external project (which is what we have with pgBouncer) than a mediocre built-in one that becomes the preferred way just by nature of being in the core. If work on a core pooler was started right now, it would be years before that reached feature/performance parity, and during that time its existence would be a net loss compared to the current status quo for many who used it. The second problem is the limited amount of resources to work on improvements to PostgreSQL. If you want to improve the reach of PostgreSQL, I consider projects like materialized views and easy built-in partitioning to be orders of magnitude more useful things to work on than the marginal benefit of merging the features of the external pool software inside the database. I consider the whole topic a bit of a distraction compared to instead working on *any* of the highly rated ideas at http://postgresql.uservoice.com/forums/21853-general As a random technical note, I would recommend that anyone who is thinking about a pooler in core take a look at how pgBouncer uses libevent to respond to requests, a design model inspired by that of memcached. I haven't looked at it deeply yet, but my gut guess is that this proven successful model would be hard to graft on top of the existing PostgreSQL process design, and doing anything but that is unlikely to perform as well. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
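For anyone who hasn't seen the model Greg is referring to: a libevent 1.x program is one process multiplexing many sockets through callbacks, roughly as below. This is a bare-bones sketch of the pattern only, not pgBouncer's actual code, and it omits all error handling and connection bookkeeping.

    #include <event.h>
    #include <netinet/in.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Fires whenever a client socket becomes readable. */
    static void on_read(int fd, short which, void *arg)
    {
        char buf[4096];
        if (read(fd, buf, sizeof(buf)) <= 0) {
            close(fd);   /* real code would also event_del() and free */
            return;
        }
        /* ... parse the request and hand it to a pooled server
           connection here ... */
    }

    /* Fires whenever the listening socket has a new client. */
    static void on_accept(int listener, short which, void *arg)
    {
        int fd = accept(listener, NULL, NULL);
        if (fd >= 0) {
            struct event *ev = malloc(sizeof(*ev));
            event_set(ev, fd, EV_READ | EV_PERSIST, on_read, ev);
            event_add(ev, NULL);
        }
    }

    int main(void)
    {
        struct sockaddr_in sin;
        struct event ev;
        int listener = socket(AF_INET, SOCK_STREAM, 0);

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(6432);
        bind(listener, (struct sockaddr *) &sin, sizeof(sin));
        listen(listener, 16);

        event_init();            /* one process, one event loop */
        event_set(&ev, listener, EV_READ | EV_PERSIST, on_accept, NULL);
        event_add(&ev, NULL);
        event_dispatch();        /* services every socket from here on */
        return 0;
    }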
> Right, let's not confuse Kevin's argument that we should have > connection pooling in core with advocacy for any particular patch or > feature suggestion that he may have offered on some other thread. A > very simple in-core connection pooler might look something like this: > when a session terminates, the backend doesn't exit. Instead, it > waits for the postmaster to reassign it to a new connection, which the > postmaster does in preference to starting new backends when possible. > But if a backend doesn't get assigned a new connection within a > certain period of time, then it goes ahead and exits anyway. This would, in my opinion, be an excellent option for PostgreSQL and would save a LOT of newbie pain. Going back to my list, it would help with both problems (a) and (c). It wouldn't be as good as pgbouncer, but it would be "good enough" for a lot of users. HOWEVER, there is the issue that such a mechanism isn't "free". There are issues with sharing backends around GUCs, user configuration, security, and prepared plans -- all issues which presently cause people difficulty with pgbouncer. I think the costs are worth it, but we'd have to make some attempt to tackle those issues as well. And, of course, we'd need to let DBAs turn the pooling off. I'd envision parameters:
pool_connections true/false
pool_connection_timeout 60s
> I'm also relying on the unsubstantiated assumption that it's > possible to pass a socket connection between processes. Doesn't pgpool do this? > Still another approach is admission control based on transactions, > backends, queries, memory usage, I/O, or what have you. That's a different problem, and on its own doesn't help newbies. It's complementary to pooling, though, so would be nice to have. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >> I'm also relying on the unsubstantiated assumption that it's >> possible to pass a socket connection between processes. > Doesn't pgpool do this? No, and in fact that's exactly why the proposed implementation isn't ever going to be in core: it's not possible to do it portably. (And no, I'm not interested in hearing about how you can do it on platforms X, Y, and/or Z.) I agree with the comments to the effect that this is really a packaging and documentation problem. There is no need for us to re-invent the existing solutions, but there is a need for making sure that they are readily available and people know when to use them. regards, tom lane
> Two problems to recognize. First is that building something in has the > potential to significantly limit use and therefore advancement of work > on external pools, because of the "let's use the built in one instead of > installing something extra" mentality. I'd rather have a great external > project (which is what we have with pgBouncer) than a mediocre built-in > one that becomes the preferred way just by nature of being in the core. I would prefer having supplier A build a great product that seamlessly interfaces with supplier B's great product, rather than having supplier M$ buy A, develop a half-working brain-dead version of B into A and market it as the new hot stuff, sinking B in the process. Anyway, orthogonal feature sets (like database and pooler) implemented in separate applications fit the open source development model quite well I think. Merge everything in, you get PHP.
On Sat, 10 Jul 2010, Tom Lane wrote: >> Doesn't pgpool do this? > > No, and in fact that's exactly why the proposed implementation isn't > ever going to be in core: it's not possible to do it portably. I'm surprised. Doesn't apache httpd do this? Does it have to do a whole load of non-portable stuff? It seems to work on a whole load of platforms. Matthew -- I would like to think that in this day and age people would know better than to open executables in an e-mail. I'd also like to be able to flap my arms and fly to the moon. -- Tim Mullen
On 12/07/10 17:45, Matthew Wakeling wrote: > > I'm surprised. Doesn't apache httpd do this? Does it have to do a whole > load of non-portable stuff? It seems to work on a whole load of platforms. A lot of what Apache HTTPd does is handled via the Apache Portable Runtime (APR). It contains a lot of per-platform handlers for various functionality. http://apr.apache.org/docs/apr/1.4/modules.html I don't know if the socket passing is provided as part of APR or is part of Apache HTTPd itself, but I wouldn't be at all surprised if it was in APR. Personally I'm now swayed by arguments presented here that trying to push pooling into core isn't really desirable, and that better packaging/bundling of existing solutions would be better. Perhaps documenting the pluses/minuses of the current pooling options and providing clear recommendations on which to use for different use cases would help, since half the trouble is users not knowing they need a pool or being confused as to which to select. This discussion reminds me a bit of Hibernate's built-in client-side connection pool. It has one, but it's an unloved stepchild that even the Hibernate devs suggest should be avoided in favour of a couple of external 3rd party options. A built-in pool seems like a great idea, but there are multiple existing ones because they solve different problems in different ways. Unless a built-in one could solve ALL those needs, or be so vastly simpler (due to code re-use, easier configuration, etc) that it's worth building one that won't fit everyone's needs, then it's best to stick to the existing external options. So rather than asking "should core have a connection pool" perhaps what's needed is to ask "what can an in-core pool do that an external pool cannot do?" Admission control / resource limit features would be great to have in core, and can't really be done fully in external modules ... but could be designed in ways that would allow external poolers to add functionality on top. Josh Berkus has made some good points on why this isn't as easy as it looks, though: http://it.toolbox.com/blogs/database-soup/admission-control-and-its-discontents-39895 -- Craig Ringer
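For the curious, the Unix mechanism involved (which APR wraps where available) is sendmsg() with SCM_RIGHTS ancillary data. A minimal sketch of the sending side is below; it works on POSIX systems but has no direct equivalent everywhere, which is exactly Tom's portability point.

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Pass an open file descriptor to another process over a Unix
     * domain socket ("channel").  The receiver gets a duplicate fd
     * via recvmsg() with a matching SCM_RIGHTS message. */
    int send_fd(int channel, int fd_to_pass)
    {
        struct msghdr msg;
        struct iovec iov;
        struct cmsghdr *cmsg;
        char payload = 'F';                    /* must send >= 1 byte */
        char cmsgbuf[CMSG_SPACE(sizeof(int))];

        memset(&msg, 0, sizeof(msg));
        iov.iov_base = &payload;
        iov.iov_len = 1;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cmsgbuf;
        msg.msg_controllen = sizeof(cmsgbuf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

        return sendmsg(channel, &msg, 0) < 0 ? -1 : 0;
    }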
Craig Ringer <craig@postnewspapers.com.au> wrote: > So rather than asking "should core have a connection pool" perhaps > what's needed is to ask "what can an in-core pool do that an > external pool cannot do?"
(1) It can prevent the most pessimal performance problems resulting from lack of an external connection pool (or a badly configured one) by setting a single GUC. Configuration tools could suggest a good value during initial setup.
(2) It can be used without installing and configuring a more sophisticated and complex product.
(3) It might reduce latency because it avoids having to receive, parse, and resend data in both directions -- eliminating one "hop". I know the performance benefit would usually accrue to the external connection pooler, but there might be some circumstances where a built-in pool could win.
(4) It's one more checkbox which can be ticked off on some RFPs.
That said, I fully agree that if we can include good documentation on the external poolers and we can get packagers to include poolers in their distribution, that gets us a much bigger benefit. A built-in solution would only be worthwhile if it was simple enough and lightweight enough not to be a burden on execution time or maintenance. Maybe that's too big an if. -Kevin
On Mon, 2010-07-12 at 18:58 +0800, Craig Ringer wrote: > On 12/07/10 17:45, Matthew Wakeling wrote: > > > > I'm surprised. Doesn't apache httpd do this? Does it have to do a whole > > load of non-portable stuff? It seems to work on a whole load of platforms. > > A lot of what Apache HTTPd does is handled via the Apache Portable > Runtime (APR). It contains a lot of per-platform handlers for various > functionality. > > http://apr.apache.org/docs/apr/1.4/modules.html > > I don't know if the socket passing is provided as part of APR or is part > of Apache HTTPd itself, but I wouldn't be at all surprised if it was > in APR. > > Personally I'm now swayed by arguments presented here that trying to > push pooling into core isn't really desirable, and that better > packaging/bundling of existing solutions would be better. "Better packaging/bundling of existing solutions" is good in its own right, whether there will eventually be some support for pooling in core or not. > Perhaps documenting the pluses/minuses of the current pooling options > and providing clear recommendations on which to use for different use > cases would help, since half the trouble is users not knowing they need > a pool or being confused as to which to select. > > This discussion reminds me a bit of Hibernate's built-in client-side > connection pool. It has one, but it's an unloved stepchild that even the > Hibernate devs suggest should be avoided in favour of a couple of > external 3rd party options. Yes, pooling _is_ often better handled as a (set of) separate options, just because here one size definitely does not fit all. An efficient in-core pooler would probably look very much like pgbouncer running in a separate thread spawned by the postmaster anyway. Let's hope there will be some support in core for having user-defined helper processes soon(ish), so tweaking pgbouncer to run as one will be reasonably easy :) > A built-in pool seems like a great idea, but there are multiple existing > ones because they solve different problems in different ways. Unless a > built-in one could solve ALL those needs, or be so vastly simpler (due > to code re-use, easier configuration, etc) that it's worth building one > that won't fit everyone's needs, then it's best to stick to the existing > external options. > > So rather than asking "should core have a connection pool" perhaps > what's needed is to ask "what can an in-core pool do that an external > pool cannot do?" Probably nothing. OTOH there are some things that an external pool can do that a built-in one can't, like running on a separate host and pooling more than 32000 client connections there. Cascaded pooling also seems impossible with built-in pooling. > Admission control / resource limit features would be great to have in > core, and can't really be done fully in external modules ... but could > be designed in ways that would allow external poolers to add > functionality on top. Josh Berkus has made some good points on why this > isn't as easy as it looks, though: > > > http://it.toolbox.com/blogs/database-soup/admission-control-and-its-discontents-39895 > > -- > Craig Ringer > -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Craig Ringer wrote: > It'll need to separate "running queries" from "running processes", or > start threading backends, so that one way or the other a single query > can benefit from the capabilities of multiple CPUs. The same separation, > or a move to async I/O, might be needed to get one query to concurrently > read multiple partitions of a table, or otherwise get maximum benefit > from high-capacity I/O subsystems when running just a few big, expensive > queries. > > Otherwise I'm wondering if PostgreSQL will begin really suffering in > performance on workloads where queries are big and expensive but there > are relatively few of them running at a time. Agreed. We certainly are going to have to go in that direction someday. We have TODO items for these. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
Tom Lane <tgl@sss.pgh.pa.us> writes: > I agree with the comments to the effect that this is really a packaging > and documentation problem. There is no need for us to re-invent the > existing solutions, but there is a need for making sure that they are > readily available and people know when to use them. On this topic, I think we're getting back to the idea of having non-core daemon helpers that should get "supervised" the way postmaster already does with backends wrt starting and stopping them at the right time. So a supervisor daemon with a supervisor API that would have to support autovacuum as a use case, then things like pgagent, PGQ and pgbouncer, would be very welcome. What about starting a new thread about that? Or do you already know you won't want to push the extensibility of PostgreSQL there? Regards, -- dim
On Tue, Jul 13, 2010 at 16:42, Dimitri Fontaine <dfontaine@hi-media.com> wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> I agree with the comments to the effect that this is really a packaging >> and documentation problem. There is no need for us to re-invent the >> existing solutions, but there is a need for making sure that they are >> readily available and people know when to use them. > > On this topic, I think we're getting back to the idea of having non-core > daemon helpers that should get "supervised" the way postmaster already > does with backends wrt starting and stopping them at the right time. > > So a supervisor daemon with a supervisor API that would have to support > autovacuum as a use case, then things like pgagent, PGQ and pgbouncer, > would be very welcome. > > What about starting a new thread about that? Or do you already know you > won't want to push the extensibility of PostgreSQL there? +1 on this idea in general, if we can think up a good API - this seems very useful to me, and you have some good examples there of cases where it'd definitely be a help. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Fri, 2010-07-09 at 00:42 -0400, Tom Lane wrote: > Samuel Gendler <sgendler@ideasculptor.com> writes: > > On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer > > <craig@postnewspapers.com.au> wrote: > >> If you're not using a connection pool, start using one. > > > I see this issue and subsequent advice cross this list awfully > > frequently. Is there in architectural reason why postgres itself > > cannot pool incoming connections in order to eliminate the requirement > > for an external pool? > > Perhaps not, but there's no obvious benefit either. Since there's > More Than One Way To Do It, it seems more practical to keep that as a > separate problem that can be solved by a choice of add-on packages. One example where you need a separate connection pool is pooling a really large number of connections, which you may want to do on another host than the one the database itself is running on. For example pgbouncer had to add an option to use incoming unix sockets, because they ran into the IP socket port number limit (a little above 31k, or more exactly 63k/2), and unix sockets can be used only on the local host. -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
On Thu, Jul 8, 2010 at 11:48 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > One example where you need a separate connection pool is pooling really > large number of connections, which you may want to do on another host > than the database itself is running. Definitely. Often it's best placed on the individual webservers that are making requests, each running its own pool.
Scott Marlowe <scott.marlowe@gmail.com> wrote: > Hannu Krosing <hannu@2ndquadrant.com> wrote: >> One example where you need a separate connection pool is pooling >> really large number of connections, which you may want to do on >> another host than the database itself is running. > > Definitely. Often it's best placed on the individual webservers > that are making requests, each running its own pool. Each running its own pool? You've just made a case for an admissions policy based on active database transactions or active queries (or both) on the server having a benefit when used with this pooling arrangement. This collection of pools can't know when the CPUs have enough to keep them busy and adding more will degrade performance. -Kevin
On Wed, 2010-07-14 at 08:58 -0500, Kevin Grittner wrote: > Scott Marlowe <scott.marlowe@gmail.com> wrote: > > Hannu Krosing <hannu@2ndquadrant.com> wrote: > >> One example where you need a separate connection pool is pooling > >> really large number of connections, which you may want to do on > >> another host than the database itself is running. > > > > Definitely. Often it's best placed on the individual webservers > > that are making requests, each running its own pool. > > Each running its own pool? You've just made a case for an > admissions policy based on active database transactions or active > queries (or both) on the server having a benefit when used with this > pooling arrangement. This collection of pools can't know when the > CPUs have enough to keep them busy and adding more will degrade > performance. I guess this setup is for OLTP load (read "lots of short transactions with low timeout limits"), where you can just open 2-5 connections per CPU for a mostly-in-memory database, maybe a little more when disk accesses are involved. If you have more, then they just wait a few milliseconds; if you have fewer, you don't have anything else to run anyway. -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
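To put a number on that rule of thumb, a starting pool size can be derived mechanically from the core count. A minimal sketch, assuming a Linux host where nproc is available, with 3 picked as an arbitrary point in the 2-5 range above:

    # sketch: derive a starting pool size from the CPU count
    CORES=$(nproc)
    POOL_SIZE=$((CORES * 3))   # 2-5 per core as suggested above
    echo "suggested pool size: $POOL_SIZE"

Workloads that wait on disk can sustain a few more active connections per core than the mostly-in-memory case.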
Hi,
Sorry if posting here was not proper instead of starting a new thread
(I am really not sure if it's a bad thing to do)
I would like to share my recent experience on implementation of
client side pooling using pgbouncer. By client side I mean that
the pgbouncer process is not on the same machine as the postgresql server.
In the first trial pgbouncer and postgresql were on the same machine & pgbouncer
was connecting to postgresql using unix domain sockets. But I shifted it
later owing to high CPU usage > 50% (using top).
Now I have shifted pgbouncer into a virtual machine (openvz container)
in the application server hardware and all my applications on other virtual machines
(web applications) connect to pgbouncer on this virtual machine.
I tested the setup with pgbench in two scenarios
1. connecting to DB server directly
2. connecting to DB via pgbouncer
the no of clients was 10 (-c 10) carrying out 10000 transactions each (-t 10000).
pgbench db was initialised with scaling factor -s 100.
since client count was low there was no queuing of requests in pgbouncer;
I would prefer to say it was in 'passthrough' mode.
the result was that
1. direct ~ 2000 tps
2. via pgbouncer ~ 1200 tps
----------------------------------------------------------------------------------------------------------------------------------------------
Experience on deploying to production environment with real world load/usage pattern
----------------------------------------------------------------------------------------------------------------------------------------------
Pgbouncer was put on the same machine as postgresql, connecting via unix domain
sockets to the server and tcp sockets to clients.
1. There was a drastic reduction in CPU load, from 30 to 10 ldavg
2. There were no clients waiting; pool size was 150 and the number of active
connections was 100-120.
3. Application performance was worse (in spite of 0 clients waiting)
I am still waiting to see the effect of shifting pgbouncer out from the dbserver
to the appserver, but with the pgbench results I am not very hopeful. I am curious why,
in spite of 0 clients waiting, pgbouncer introduces a drop in tps.
Warm Regds
Rajesh Kumar Mallah.
CTO - tradeindia.com.
Keywords: pgbouncer performance
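For reference, the two runs described above come down to something like this (a sketch; it assumes the pgbench database already exists, that pgbouncer listens on port 6432 on host 'pooler', and that 'dbserver' is the real server -- the host names are placeholders):

    # initialise the pgbench tables with scaling factor 100
    pgbench -i -s 100 pgbench

    # 1. direct: 10 clients, 10000 transactions each, straight at the server
    pgbench -c 10 -t 10000 -h dbserver -p 5432 pgbench

    # 2. pooled: identical load, but routed through pgbouncer
    pgbench -c 10 -t 10000 -h pooler -p 6432 pgbench

With only 10 clients and a pool larger than that, every client gets a dedicated server connection, so the second run measures pure proxy overhead.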
note: my postgresql server & pgbouncer were not in a virtualised environment
in the first setup. Only the application server has many openvz containers.
Nice suggestion to try,
I will put pgbouncer on raw hardware and run pgbench from same hardware.
regds
rajesh kumar mallah.
Rajesh Kumar Mallah wrote: > the no of clients was 10 (-c 10) carrying out 10000 transactions > each (-t 10000). > pgbench db was initialised with scaling factor -s 100. > > since client count was low there was no queuing of requests in pgbouncer; > I would prefer to say it was in 'passthrough' mode. Of course pgbouncer is going to decrease performance in this situation. You've added a whole layer to things that all traffic has to pass through, without a setup that gains any benefit from the connection pooling. Try making the client count 1000 instead if you want a useful test. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
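Concretely, the suggested re-test would look something like this (a sketch; note that older pgbench versions cap the client count by the process file-descriptor limit, and the default TPC-B script serialises on branch rows unless the scaling factor is at least as large as the client count):

    # re-initialise so that -s >= -c, avoiding artificial row contention
    pgbench -i -s 1000 pgbench

    # 1000 clients through pgbouncer, so most of them queue in the pool
    pgbench -c 1000 -t 100 -h pooler -p 6432 pgbench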
ok,
now the question is, is it possible to dig out from the postgresql database
server whether connection pooling is needed? In our case e.g. I have kept
max_connections = 300; if I reduce it below 250 I get the error "max connection reached....."
on connecting to the db directly. If I put pgbouncer in, I get less performance
(even with no clients waiting);
without pooling the dbserver CPU usage increases but the performance of the apps
also becomes good.
Regds
Rajesh Kumar Mallah.
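One server-side approximation is to sample how many backends are actually working versus sitting idle. A sketch against the pg_stat_activity layout of that era, where idle backends report '<IDLE>' as current_query ('yourdb' is a placeholder):

    psql -d yourdb -c "
      SELECT CASE WHEN current_query = '<IDLE>' THEN 'idle'
                  ELSE 'active' END AS state, count(*)
      FROM pg_stat_activity
      GROUP BY 1;"

If the active count hovers near the number of CPU cores (plus spindles) while the total is far higher, a pool is likely to help.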
On Sun, Jul 18, 2010 at 10:55 PM, Greg Smith <greg@2ndquadrant.com> wrote:
Dear Greg,
my max_client is 300; shall I test with client count 250?
if so, what should be the scaling factor while initializing
the pgbench db?
On Jul 9, 2010, at 8:33 PM, Craig Ringer wrote: > On 10/07/2010 9:25 AM, Josh Berkus wrote: >> >>> It *is* the last place you want to put it, but putting it there can >>> be much better than not putting it *anywhere*, which is what we've >>> often seen. >> >> Well, what you proposed is an admission control mechanism, which is >> *different* from a connection pool, although the two overlap. A >> connection pool solves 4 problems when it's working: >> >> a) limiting the number of database server processes >> b) limiting the number of active concurrent queries >> c) reducing response times for allocating a new connection >> d) allowing management of connection routes to the database >> (redirection, failover, etc.) > > I agree with you: for most Pg users (a) is really, really important. As > you know, in PostgreSQL each connection maintains not only general > connection state (GUC settings, etc) and if in a transaction, > transaction state, but also a query executor (full backend). That gets > nasty not only in memory use, but in impact on active query performance, > as all those query executors have to participate in global signalling > for lock management etc. > > So an in-server pool that solved (b) but not (a) would IMO not be > particularly useful for the majority of users. > > That said, I don't think it follows that (a) cannot be solved in-core. > How much architectural change would be required to do it efficiently > enough, though... > a, b, and c can all be handled in core. But that would be a radical re-architecture to do it right. Postgres assumes that the client connection, authentication, and query processing all happen in one place in one process on one thread. Most server software built and designed today avoids that model in order to decouple its critical resources from the # of client connections. Most server software designed today tries to control its resources and not let the behavior of clients dictate resource usage. Even Apache HTTPD is undergoing a radical re-design so that it can handle more connections and more easily decouple connections from concurrent processing to keep up with competitors. I'm not saying that Postgres core should change -- again that's a radical re-architecture. But it should be recognized that it is not like most other server applications -- it can't control its resources very well and needs help to do so. From using a connection pool to manually setting work_mem differently for different clients or workloads, resource management is not what it does well. It does a LOT of things very very well, just not that. > -- > Craig Ringer > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
On Sun, 2010-07-18 at 21:48 +0530, Rajesh Kumar Mallah wrote: > Hi, > > Sorry if posting here was not proper instead of starting a new thread > (I am really not sure if it's a bad thing to do) > > I would like to share my recent experience on implementation of > client side pooling using pgbouncer. By client side I mean that > the pgbouncer process is not on the same machine as the postgresql server. > In the first trial pgbouncer and postgresql were on the same machine & > pgbouncer > was connecting to postgresql using unix domain sockets. But I shifted > it > later owing to high CPU usage > 50% (using top). > Now I have shifted pgbouncer into a virtual machine (openvz container) > in the application server hardware Why in a VM (openvz container) ? Did you also try it in the same OS as your appserver ? Perhaps even connecting from the appserver via unix sockets ? > and all my applications on other virtual machines > (web applications) connect to pgbouncer on this virtual machine. > > I tested the setup with pgbench in two scenarios > > 1. connecting to DB server directly > 2. connecting to DB via pgbouncer > > the no of clients was 10 (-c 10) carrying out 10000 transactions > each (-t 10000). > pgbench db was initialised with scaling factor -s 100. > > since client count was low there was no queuing of requests in > pgbouncer; > I would prefer to say it was in 'passthrough' mode. > > the result was that > > 1. direct ~ 2000 tps > 2. via pgbouncer ~ 1200 tps Are you sure you are not measuring how much running pgbouncer slows down pgbench directly, by competing for CPU resources and not by adding latency to requests ? > ---------------------------------------------------------------------------------------------------------------------------------------------- > Experience on deploying to production environment with real world > load/usage pattern > ---------------------------------------------------------------------------------------------------------------------------------------------- > > Pgbouncer was put on the same machine as postgresql, connecting via unix > domain > sockets to the server and tcp sockets to clients. > > 1. There was a drastic reduction in CPU load, from 30 to 10 ldavg > 2. There were no clients waiting; pool size was 150 and the number of > active > connections was 100-120. > 3. Application performance was worse (in spite of 0 clients waiting) > > > I am still waiting to see the effect of shifting pgbouncer out from the > dbserver > to the appserver, but with the pgbench results I am not very hopeful. I am > curious why in spite of 0 clients waiting pgbouncer introduces a drop in > tps. If you have fewer clients than pgbouncer connections, you can't have any clients waiting in pgbouncer, as each of them is allocated its own connection right away. What you were measuring was 1. pgbench and pgbouncer competing for the same CPU 2. overhead from 2 hops to db (app-proxy-db) instead of 1 (app-db) > Warm Regds > Rajesh Kumar Mallah. > CTO - tradeindia.com. > > > > Keywords: pgbouncer performance > > > > > > > > > > > On Mon, Jul 12, 2010 at 6:11 PM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: > Craig Ringer <craig@postnewspapers.com.au> wrote: > > > > So rather than asking "should core have a connection pool" > perhaps > > what's needed is to ask "what can an in-core pool do that an > > external pool cannot do?" > > > (1) It can prevent the most pessimal performance problems > resulting > from lack of an external connection pool (or a badly > configured one) > by setting a single GUC. 
Configuration tools could suggest a > good > value during initial setup. > > (2) It can be used without installing and configuring a more > sophisticated and complex product. > > (3) It might reduce latency because it avoids having to > receive, > parse, and resend data in both directions -- eliminating one > "hop". > I know the performance benefit would usually accrue to the > external > connection pooler, but there might be some circumstances where > a > built-in pool could win. > > (4) It's one more checkbox which can be ticked off on some > RFPs. > > That said, I fully agree that if we can include good > documentation > on the external poolers and we can get packagers to include > poolers > in their distribution, that gets us a much bigger benefit. A > built-in solution would only be worthwhile if it was simple > enough > and lightweight enough not to be a burden on execution time or > maintenance. Maybe that's too big an if. > > -Kevin > > > -- > Sent via pgsql-performance mailing list > (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > >
Looks like,
pgbench cannot be used for testing with pgbouncer if number of
pgbench clients exceeds pool_size + reserve_pool_size of pgbouncer.
pgbench keeps waiting doing nothing. I am using pgbench of postgresql 8.1.
Are there changes to pgbench in this aspect ?
regds
Rajesh Kumar Mallah.
> pgbench cannot be used for testing with pgbouncer if number of > pgbench clients exceeds pool_size + reserve_pool_size of pgbouncer. > pgbench keeps waiting doing nothing. I am using pgbench of postgresql 8.1. > Are there changes to pgbench in this aspect ? Pgbench won't start actual transactions until all connections to PostgreSQL (in this case pgbouncer I guess) are successfully established. IMO you should try other benchmark tools. BTW, I think you should use the -C option with pgbench for this kind of testing. -C establishes a connection for each transaction, which is pretty much similar to real world applications which do not use connection pooling. You will be surprised how large the PostgreSQL connection overhead is. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
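A sketch of that comparison; the only difference between the two runs is the -C flag, so the gap between their tps figures is roughly the per-connection setup cost:

    # persistent connections: connection cost paid once per client
    pgbench -c 10 -t 1000 pgbench

    # -C: a fresh connection per transaction, exposing connection overhead
    pgbench -C -c 10 -t 1000 pgbench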
Thanks for the thought, but it (-C) does not work.
From: Rajesh Kumar Mallah <mallah.rajesh@gmail.com> Subject: Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning. Date: Mon, 19 Jul 2010 08:06:09 +0530 Message-ID: <AANLkTilggKbmC9H7wlhlCdQFm5RJth1-9DPF8GoLV9Hm@mail.gmail.com> > Thanks for the thought but it (-C) does not work . Still you need: pgbench's -c <= (pool_size + reserve_pool_size) -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
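In pgbouncer.ini terms that constraint looks like the following excerpt (the option names are real pgbouncer settings; the values are only illustrative):

    [pgbouncer]
    pool_mode = session
    default_pool_size = 90
    reserve_pool_size = 10
    ; pgbench's -c must stay <= 90 + 10 = 100 so that
    ; every pgbench client gets a server connection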
On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 12/07/10 17:45, Matthew Wakeling wrote: >> >> I'm surprised. Doesn't apache httpd do this? Does it have to do a whole >> load of non-portable stuff? It seems to work on a whole load of platforms. > > A lot of what Apache HTTPd does is handled via the Apache Portable > Runtime (APR). It contains a lot of per-platform handlers for various > functionality. Apache just has all of the worker processes call accept() on the socket, and whichever one the OS hands it off to gets the job. The problem is harder for us because a backend can't switch identities once it's been assigned to a database. I haven't heard an adequate explanation of why that couldn't be changed, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer <craig@postnewspapers.com.au> wrote: > So rather than asking "should core have a connection pool" perhaps > what's needed is to ask "what can an in-core pool do that an external > pool cannot do?" Avoid sending every connection through an extra hop. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote: > On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer > <craig@postnewspapers.com.au> wrote: > > So rather than asking "should core have a connection pool" perhaps > > what's needed is to ask "what can an in-core pool do that an external > > pool cannot do?" > > Avoid sending every connection through an extra hop. Let's extend this shall we: Avoid adding yet another network hop Removal of a point of failure Reduction of administrative overhead Integration into our core authentication mechanisms Greater flexibility in connection control And, having connection pooling in core does not eliminate the use of an external pool where it makes sense. JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
On Thu, Jul 22, 2010 at 02:33:43PM -0400, Robert Haas wrote: > On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer > <craig@postnewspapers.com.au> wrote: > > On 12/07/10 17:45, Matthew Wakeling wrote: > >> > >> I'm surprised. Doesn't apache httpd do this? Does it have to do a whole > >> load of non-portable stuff? It seems to work on a whole load of platforms. > > > > A lot of what Apache HTTPd does is handled via the Apache Portable > > Runtime (APR). It contains a lot of per-platform handlers for various > > functionality. > > Apache just has all of the worker processes call accept() on the > socket, and whichever one the OS hands it off to gets the job. As an inconsequential detail - afaik they keep the os from doing that by protecting it with a mutex for various reasons (speed - as some implementations wake up and see there's nothing to do, multiple sockets, fairness) > The problem is harder for us because a backend can't switch identities > once it's been assigned to a database. I haven't heard an adequate > explanation of why that couldn't be changed, though. Possibly it might decrease the performance significantly enough by reducing the cache locality (syscache, prepared plans)? Andres
On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote: > On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer > <craig@postnewspapers.com.au> wrote: > > So rather than asking "should core have a connection pool" perhaps > > what's needed is to ask "what can an in-core pool do that an external > > pool cannot do?" > > Avoid sending every connection through an extra hop. not really. in-core != magically-in-right-backend-process there will still be "an extra hop", only it will be local, between the pooler and the backend process. similar to what currently happens with pgbouncer when you deploy it on the same server and use unix sockets > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise Postgres Company >
On Thu, 2010-07-22 at 12:15 -0700, Joshua D. Drake wrote: > On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote: > > On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer > > <craig@postnewspapers.com.au> wrote: > > > So rather than asking "should core have a connection pool" perhaps > > > what's needed is to ask "what can an in-core pool do that an external > > > pool cannot do?" > > > > Avoid sending every connection through an extra hop. > > Let's extend this shall we: > > Avoid adding yet another network hop postgreSQL is multi-process, so you either have a separate "pooler process" or need to put pooler functionality in postmaster, both ways you still have a two-hop scenario for connect. you may be able to pass the socket to the child process and also keep it, but doing this for both client and db sides seems really convoluted. Or is there a portable way to pass sockets back and forth between parent and child processes ? If so, then pgbouncer could use it as well. > Removal of a point of failure rather move the point of failure from external pooler to internal pooler ;) > Reduction of administrative overhead Possibly. But once you start actually using it, you still need to configure and monitor it and do other administrator-y tasks. > Integration into our core authentication mechanisms True, although for example having SSL on the client side connection will be so slow that it hides any performance gains from pooling, at least for short-lived connections. > Greater flexibility in connection control Yes, poolers can be much more flexible than default postgresql. See for example pgbouncer's PAUSE, RECONFIGURE and RESUME commands > And, having connection pooling in core does not eliminate the use of an > external pool where it makes sense. Probably the easiest way to achieve "pooling in core" would be adding an option to start pgbouncer under postmaster control. You probably can't get much leaner than pgbouncer. > -- > PostgreSQL.org Major Contributor > Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 > Consulting, Training, Support, Custom Development, Engineering > http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt > > > -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
On Jul 22, 2010, at 11:36 AM, Robert Haas wrote: > On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer > <craig@postnewspapers.com.au> wrote: >> So rather than asking "should core have a connection pool" perhaps >> what's needed is to ask "what can an in-core pool do that an external >> pool cannot do?" > > Avoid sending every connection through an extra hop. > Dynamically adjust settings based on resource usage of the DB. > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise Postgres Company > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
On Thu, Jul 22, 2010 at 3:15 PM, Hannu Krosing <hannu@krosing.net> wrote: > On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote: >> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer >> <craig@postnewspapers.com.au> wrote: >> > So rather than asking "should core have a connection pool" perhaps >> > what's needed is to ask "what can an in-core pool do that an external >> > pool cannot do?" >> >> Avoid sending every connection through an extra hop. > > not really. in-core != magically-in-right-backend-process Well, how about if we arrange it so it IS in the right backend process? I don't believe magic is required. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund <andres@anarazel.de> wrote: >> The problem is harder for us because a backend can't switch identities >> once it's been assigned to a database. I haven't heard an adequate >> explanation of why that couldn't be changed, though. > Possibly it might decrease the performance significantly enough by > reducing the cache locality (syscache, prepared plans)? Those things are backend-local. The worst case scenario is you've got to flush them all when you reinitialize, in which case you still save the overhead of creating a new process. The best case scenario is that you can keep some of them around, in which case, great. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Thu, Jul 22, 2010 at 02:44:04PM -0700, Scott Carey wrote: > On Jul 22, 2010, at 11:36 AM, Robert Haas wrote: > > On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer > > <craig@postnewspapers.com.au> wrote: > >> So rather than asking "should core have a connection pool" perhaps > >> what's needed is to ask "what can an in-core pool do that an external > >> pool cannot do?" > > > > Avoid sending every connection through an extra hop. > > > > Dynamically adjust settings based on resource usage of the DB. > Relatively minor, but it would be convenient to avoid having to query $external_pooler to determine the client_addr of an incoming connection. -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com
On Thu, 2010-07-22 at 20:56 +0100, Hannu Krosing wrote: > > > Let's extend this shall we: > > > > Avoid adding yet another network hop > > postgreSQL is multi-process, so you either have a separate "pooler > process" or need to put pooler functionality in postmaster, both ways > you still have a two-hop scenario for connect. you may be able to pass > the socket to the child process and also keep it, but doing this for both > client and db sides seems really convoluted. Which means, right now there are three hops. Reducing one is good. > Or is there a portable way to pass sockets back and forth between > parent and child processes ? > > If so, then pgbouncer could use it as well. > > > Removal of a point of failure > > rather move the point of failure from external pooler to internal > pooler ;) Yes but at that point, it doesn't matter. > > > Reduction of administrative overhead > > Possibly. But once you start actually using it, you still need to > configure and monitor it and do other administrator-y tasks. > Yes, but it is inclusive. > > > Integration into our core authentication mechanisms > > True, although for example having SSL on the client side connection will be > so slow that it hides any performance gains from pooling, at least for > short-lived connections. Yes, but right now you can't use *any* pooler with LDAP for example. We could if pooling was in core. Your SSL argument doesn't really work because it's true with or without pooling. > > Greater flexibility in connection control > > Yes, poolers can be much more flexible than default postgresql. See for > example pgbouncer's PAUSE, RECONFIGURE and RESUME commands :D > > > And, having connection pooling in core does not eliminate the use of an > > external pool where it makes sense. > > Probably the easiest way to achieve "pooling in core" would be adding an > option to start pgbouncer under postmaster control. Yeah but that won't happen. Also I think we may have a libevent dependency that we have to work out. > > You probably can't get much leaner than pgbouncer. Oh don't get me wrong. I love pgbouncer. It is my recommended pooler but even it has limitations (such as auth). Sincerely, Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
On Thu, 2010-07-22 at 20:57 -0400, Robert Haas wrote: > On Thu, Jul 22, 2010 at 3:15 PM, Hannu Krosing <hannu@krosing.net> wrote: > > On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote: > >> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer > >> <craig@postnewspapers.com.au> wrote: > >> > So rather than asking "should core have a connection pool" perhaps > >> > what's needed is to ask "what can an in-core pool do that an external > >> > pool cannot do?" > >> > >> Avoid sending every connection through an extra hop. > > > > not really. in-core != magically-in-right-backend-process > > Well, how about if we arrange it so it IS in the right backend > process? I don't believe magic is required. Do you have any design in mind, how you can make it so ? --------------- Hannu
On Fri, Jul 23, 2010 at 11:58 AM, Hannu Krosing <hannu@krosing.net> wrote: > On Thu, 2010-07-22 at 20:57 -0400, Robert Haas wrote: >> On Thu, Jul 22, 2010 at 3:15 PM, Hannu Krosing <hannu@krosing.net> wrote: >> > On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote: >> >> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer >> >> <craig@postnewspapers.com.au> wrote: >> >> > So rather than asking "should core have a connection pool" perhaps >> >> > what's needed is to ask "what can an in-core pool do that an external >> >> > pool cannot do?" >> >> >> >> Avoid sending every connection through an extra hop. >> > >> > not really. in-core != magically-in-right-backend-process >> >> Well, how about if we arrange it so it IS in the right backend >> process? I don't believe magic is required. > > Do you have any design in mind, how you can make it so ? Well, if we could change the backends so that they could fully reinitialize themselves (disconnect from a database to which they are bound, etc.), I don't see why we couldn't use the Apache approach. There's a danger of memory leaks but that's why Apache has MaxRequestsPerChild, and it works pretty darn well. Of course, passing file descriptors would be even nicer (you could pass the connection off to a child that was already bound to the correct database, perhaps) but, as has been pointed out more than once, that's not portable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Fri, Jul 23, 2010 at 01:28:53PM -0400, Robert Haas wrote: > On Fri, Jul 23, 2010 at 11:58 AM, Hannu Krosing <hannu@krosing.net> wrote: > > On Thu, 2010-07-22 at 20:57 -0400, Robert Haas wrote: > >> On Thu, Jul 22, 2010 at 3:15 PM, Hannu Krosing <hannu@krosing.net> wrote: > >> > On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote: > >> >> On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer > >> >> <craig@postnewspapers.com.au> wrote: > >> >> > So rather than asking "should core have a connection pool" perhaps > >> >> > what's needed is to ask "what can an in-core pool do that an external > >> >> > pool cannot do?" > >> >> > >> >> Avoid sending every connection through an extra hop. > >> > > >> > not really. in-core != magically-in-right-backend-process > >> > >> Well, how about if we arrange it so it IS in the right backend > >> process? I don't believe magic is required. > > > > Do you have any design in mind, how you can make it so ? > > Well, if we could change the backends so that they could fully > reinitialize themselves (disconnect from a database to which they are > bound, etc.), I don't see why we couldn't use the Apache approach. > There's a danger of memory leaks but that's why Apache has > MaxRequestsPerChild, and it works pretty darn well. Of course, > passing file descriptors would be even nicer (you could pass the > connection off to a child that was already bound to the correct > database, perhaps) but, as has been pointed out more than once, that's not > portable. It's not *that bad* though. To my knowledge it's 2 or 3 implementations that one would need to implement to support most if not all platforms. - sendmsg/cmsg/SCM_RIGHTS based implementation (most if not all *nixes including solaris, linux, (free|open|net)bsd, OSX, AIX, HPUX, others) - WSADuplicateSocket (windows) - if needed: STREAMS based stuff (I_SENDFD) (at least solaris, hpux, aix, tru64, irix, unixware allow this) Note that I am still not convinced that it's a good idea... Andres
Joshua Tolley wrote: > Relatively minor, but it would be convenient to avoid having to query > $external_pooler to determine the client_addr of an incoming connection. > You suggest this as a minor concern, but I consider it to be one of the most compelling arguments in favor of in-core pooling. A constant pain with external poolers is the need to then combine two sources of data in order to track connections fully, which is something that everyone runs into eventually and finds annoying. It's one of the few things that doesn't go away no matter how much fiddling you do with pgBouncer, it's always getting in the way a bit. And it seems to seriously bother systems administrators and developers, not just the DBAs. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
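The two sources being combined are the pooler's admin console and the server's own view; something like the following (SHOW CLIENTS is a real pgbouncer admin command; the host names, port and admin user are assumptions):

    # the pooler's view: the real client_addr of each incoming connection
    psql -h pooler -p 6432 -U pgbouncer pgbouncer -c "SHOW CLIENTS;"

    # the server's view: every row shows the pooler's address, not the client's
    psql -h dbserver -d yourdb -c \
      "SELECT usename, client_addr, current_query FROM pg_stat_activity;"

Correlating rows between the two outputs is the manual step that an in-core pool would make unnecessary.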
On Sat, Jul 24, 2010 at 01:23:08AM -0400, Greg Smith wrote: > Joshua Tolley wrote: > >Relatively minor, but it would be convenient to avoid having to query > >$external_pooler to determine the client_addr of an incoming connection. > > You suggest this as a minor concern, but I consider it to be one of > the most compelling arguments in favor of in-core pooling. A > constant pain with external poolers is the need to then combine two > sources of data in order to track connections fully, which is > something that everyone runs into eventually and finds annoying. > It's one of the few things that doesn't go away no matter how much > fiddling you do with pgBouncer, it's always getting in the way a > bit. And it seems to seriously bother systems administrators and > developers, not just the DBAs. But you have to admit that this problem won't vanish as people will continue to use poolers on other machines for resource reasons. So providing a capability to do something sensible here seems to be useful independent of in-core pooling. Andres
On 24/07/10 01:28, Robert Haas wrote: > Well, if we could change the backends so that they could fully > reinitialize themselves (disconnect from a database to which they are > bound, etc.), I don't see why we couldn't use the Apache approach. This would offer the bonus on the side that it'd be more practical to implement database changes for a connection, akin to MySQL's "USE". Inefficient, sure, but possible. I don't care about that current limitation very much. I think anyone changing databases all the time probably has the wrong design and should be using schema. I'm sure there are times it'd be good to be able to switch databases on one connection, though. My question with all this remains: is it worth the effort when external poolers already solve the problem? Can PostgreSQL offer tools and interfaces to permit external poolers to solve the problems they have, rather than trying to bring them in-core? For example, with auth, can the Pg server offer features to help poolers implement passthrough authentication against the real Pg server? Perhaps Pg could expose auth features over SQL, permitting appropriately privileged users to verify credentials with SQL-level calls. Poolers could pass supplied user credentials through to the real Pg server for verification. For bonus points, an SQL interface could be provided that lets the super-privileged auth managing connection be used to change the login role of another running backend/connection, so the pooler could hand out connections with different login user ids without having to maintain a pool per user id. ( That'd probably also permit implementation of a "CHANGE USER" command, where the client changed login roles on the fly by passing the credentials of the new role. That'd be *awesome* for application server connection pools. ) -- Craig Ringer
On 24/07/10 13:23, Greg Smith wrote: > Joshua Tolley wrote: >> Relatively minor, but it would be convenient to avoid having to query >> $external_pooler to determine the client_addr of an incoming connection. >> > > You suggest this as a minor concern, but I consider it to be one of the > most compelling arguments in favor of in-core pooling. A constant pain > with external poolers is the need to then combine two sources of data in > order to track connections fully, which is something that everyone runs > into eventually and finds annoying. It's one of the few things that > doesn't go away no matter how much fiddling you do with pgBouncer, it's > always getting in the way a bit. And it seems to seriously bother > systems administrators and developers, not just the DBAs. Putting a pooler in core won't inherently fix this, and won't remove the need to solve it for cases where the pooler can't be on the same machine. 9.0 has application_name to let apps identify themselves. Perhaps a "pooled_client_ip", to be set by a pooler rather than the app, could be added to address this problem in a way that can be used by all poolers new and existing, not just any new in-core pooling system. If a privileged set of pooler functions was considered, as per my other recent mail, the pooler could use a management connection to set the client ip before handing the connection to the client, so the client couldn't change pooled_client_ip itself by accident or through malice. But even without that, it'd be awfully handy. -- Craig Ringer
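Until something along those lines exists, a pooler willing to spend one extra round trip per checkout can already smuggle the client address through 9.0's application_name (a workaround sketch, not the proposed pooled_client_ip; the address shown is a placeholder):

    -- run by the pooler on the pooled server connection at checkout time
    SET application_name = 'pool client=10.0.0.42';

    -- the DBA can then recover the original client on the server side
    SELECT application_name, client_addr FROM pg_stat_activity;

Unlike a privileged pooled_client_ip, though, nothing stops the client from overwriting application_name afterwards.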
Craig Ringer <craig@postnewspapers.com.au> writes: > 9.0 has application_name to let apps identify themselves. Perhaps a > "pooled_client_ip", to be set by a pooler rather than the app, could be > added to address this problem in a way that can be used by all poolers > new and existing, not just any new in-core pooling system. X-Forwarded-For ? -- dim
On Fri, 2010-07-23 at 09:52 -0700, Joshua D. Drake wrote: > On Thu, 2010-07-22 at 20:56 +0100, Hannu Krosing wrote: > > > > > Let's extend this shall we: > > > > > > Avoid adding yet another network hop > > > > postgreSQL is multi-process, so you either have a separate "pooler > > process" or need to put pooler functionality in postmaster, both ways > > you still have a two-hop scenario for connect. you may be able to pass > > the socket to the child process and also keep it, but doing this for both > > client and db sides seems really convoluted. > > Which means, right now there are three hops. Reducing one is good. No, it is still two, as postmaster passes the socket to the spawned child postgresql process after login. the process is as follows Client --connects--> postmaster --spawns--> postgreSQL server process then the socket is passed to be used directly, so the use is Client --talks-to---> postgreSQL server process when using a pooler it becomes Client --connects-to--> Pooler --passes-requests-to--> postgreSQL I see no way to have the pooler select the postgreSQL process, pass the client connection in a way that talks directly to the postgreSQL server process AND be able to get the server connection back once the client is finished with either the request, transaction or connection (depending on pooling mode). > > > Or is there a portable way to pass sockets back and forth between > > parent and child processes ? > > > > If so, then pgbouncer could use it as well. > > > > > Removal of a point of failure > > > > rather move the point of failure from external pooler to internal > > pooler ;) > > Yes but at that point, it doesn't matter. > > > > > > Reduction of administrative overhead > > > > Possibly. But once you start actually using it, you still need to > > configure and monitor it and do other administrator-y tasks. > > Yes, but it is inclusive. > > > > > > Integration into our core authentication mechanisms > > > > True, although for example having SSL on the client side connection will be > > so slow that it hides any performance gains from pooling, at least for > > short-lived connections. > > Yes, but right now you can't use *any* pooler with LDAP for example. We > could if pooling was in core. Your SSL argument doesn't really work > because it's true with or without pooling. As the main slowdown in SSL is connection setup, you can get the network security and pooling speedup if you run the pool on the client side and make the pooler-server connection over SSL. > > > Greater flexibility in connection control > > > > Yes, poolers can be much more flexible than default postgresql. See for > > example pgbouncer's PAUSE, RECONFIGURE and RESUME commands > > :D > > > > > > And, having connection pooling in core does not eliminate the use of an > > > external pool where it makes sense. > > > > Probably the easiest way to achieve "pooling in core" would be adding an > > option to start pgbouncer under postmaster control. > > Yeah but that won't happen. I guess it could happen as part of opening up the "postgresql controlled process" part to be configurable and able to run third party stuff. Another thing to run under postmaster control would be pgqd. > Also I think we may have a libevent > dependency that we have to work out. > > > > > You probably can't get much leaner than pgbouncer. > > Oh don't get me wrong. I love pgbouncer. It is my recommended pooler but > even it has limitations (such as auth). 
As pgbouncer is single-threaded and the main goal has been performance, there is not much enthusiasm about having _any_ auth method included which can't be completed in a few cpu cycles. It may be possible to add threads to wait for LDAP/Kerberos/... responses or do SSL handshakes, but I have not seen any interest from Marko to do it himself. Maybe there is a way to modularise the auth part of postmaster in a way that could be used from third party products through some nice API which postmaster-controlled pgbouncer can start using. -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
On Sat, 2010-07-24 at 14:36 +0800, Craig Ringer wrote: > On 24/07/10 13:23, Greg Smith wrote: > > Joshua Tolley wrote: > >> Relatively minor, but it would be convenient to avoid having to query > >> $external_pooler to determine the client_addr of an incoming connection. > >> > > > > You suggest this as a minor concern, but I consider it to be one of the > > most compelling arguments in favor of in-core pooling. A constant pain > > with external poolers is the need to then combine two sources of data in > > order to track connections fully, which is something that everyone runs > > into eventually and finds annoying. It's one of the few things that > > doesn't go away no matter how much fiddling you do with pgBouncer, it's > > always getting in the way a bit. And it seems to seriously bother > > systems administrators and developers, not just the DBAs. > > > Putting a pooler in core won't inherently fix this, and won't remove the > need to solve it for cases where the pooler can't be on the same machine. > > 9.0 has application_name to let apps identify themselves. Perhaps a > "pooled_client_ip", to be set by a pooler rather than the app, could be > added to address this problem in a way that can be used by all poolers > new and existing, not just any new in-core pooling system. > > If a privileged set of pooler functions was considered, as per my > other recent mail, the pooler could use a management connection to set > the client ip before handing the connection to the client, so the client > couldn't change pooled_client_ip itself by accident or through malice. > But even without that, it'd be awfully handy. Or maybe we can add some command extensions to the protocol for passing extra info, so that instead of sending just the "(run_query:query)" command over the socket we could send both the extra info and execute "(set_params:(proxy_client_ip:a.b.c.d)(proxy_client_port:n)(something else))(run_query:query)" in one packet (for performance) and have these things be available in logging and pg_stat_activity. I see no need to try to somehow restrict these if you can always be sure that they are set by the direct client. The proxy can decide to pass some of these from the real client, but it would be a decision made by the proxy, not mandated by some proxying rules. -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
On Sat, Jul 24, 2010 at 2:23 AM, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 24/07/10 01:28, Robert Haas wrote: > >> Well, if we could change the backends so that they could fully >> reinitialize themselves (disconnect from a database to which they are >> bound, etc.), I don't see why we couldn't use the Apache approach. > > This would offer the side bonus that it'd be more practical to > implement database changes for a connection, akin to MySQL's "USE". > Inefficient, sure, but possible. Yep. > I don't care about that current limitation very much. I think anyone > changing databases all the time probably has the wrong design and should > be using schemas. I'm sure there are times it'd be good to be able to > switch databases on one connection, though. I pretty much agree with this. I think this is merely slightly nice on its own, but it might be a building block toward other things that we might want to do down the road. Markus Wanner's Postgres-R replication uses worker processes; autovacuum does as well; and then there's parallel query. I can't help thinking that not needing to fork a new backend every time you want to connect to a new database has got to be useful. > My question with all this remains: is it worth the effort when external > poolers already solve the problem? Whether it's worth the effort is something anyone who is thinking about working on this will have to decide for themselves. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund <andres@anarazel.de> wrote: >>> The problem is harder for us because a backend can't switch identities >>> once it's been assigned to a database. I haven't heard an adequate >>> explanation of why that couldn't be changed, though. >> Possibly it might decrease the performance significantly enough by >> reducing the cache locality (syscache, prepared plans)? > Those things are backend-local. The worst case scenario is you've got > to flush them all when you reinitialize, in which case you still save > the overhead of creating a new process. "Flushing them all" is not zero-cost; it's not too hard to believe that it could actually be slower than forking a clean new backend. What's much worse, it's not zero-bug. We've got little bitty caches all over the backend, including (no doubt) some caching behavior in third-party code that wouldn't get the word about whatever API you invented to deal with this. regards, tom lane
On Tue, Jul 27, 2010 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund <andres@anarazel.de> wrote: >>>> The problem is harder for us because a backend can't switch identities >>>> once it's been assigned to a database. I haven't heard an adequate >>>> explanation of why that couldn't be changed, though. > >>> Possibly it might decrease the performance significantly enough by >>> reducing the cache locality (syscache, prepared plans)? > >> Those things are backend-local. The worst case scenario is you've got >> to flush them all when you reinitialize, in which case you still save >> the overhead of creating a new process. > > "Flushing them all" is not zero-cost; it's not too hard to believe that > it could actually be slower than forking a clean new backend. I'm not so sure I believe that. Is a sinval overrun slower than forking a clean new backend? Is DISCARD ALL slower than forking a clean new backend? How much white space is there between either of those and what would be needed here? I guess it could be slower, but I wouldn't want to assume that without evidence. > What's much worse, it's not zero-bug. We've got little bitty caches > all over the backend, including (no doubt) some caching behavior in > third-party code that wouldn't get the word about whatever API you > invented to deal with this. I think this is probably the biggest issue with the whole idea, and I agree there would be some pain involved. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
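For readers following along: DISCARD ALL is the SQL-level reset a pooler can already issue between clients, and it is roughly the "flush them all" cost being weighed against fork() here. A minimal sketch of a session pooler's release path using it (release_to_pool is an invented name):

    #include <libpq-fe.h>

    /* When a client releases a pooled connection, reset all session state
     * (temp tables, prepared statements, GUCs, advisory locks) so the next
     * client gets a clean backend without paying a fork()/authentication
     * cycle.  DISCARD ALL has existed since PostgreSQL 8.3. */
    static void release_to_pool(PGconn *conn)
    {
        PGresult *res = PQexec(conn, "DISCARD ALL");
        PQclear(res);
        /* ... then return conn to the idle list ... */
    }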
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jul 27, 2010 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Flushing them all" is not zero-cost; it's not too hard to believe that >> it could actually be slower than forking a clean new backend. > I'm not so sure I believe that. I'm not asserting it's true, just suggesting it's entirely possible. Other than the fork() cost itself and whatever authentication activity there might be, practically all the startup cost of a new backend can be seen as cache-populating activities. You'd have to redo all of that, *plus* pay the costs of getting rid of the previous cache entries. Maybe the latter costs less than a fork(), or maybe not. fork() is pretty cheap on modern Unixen. >> What's much worse, it's not zero-bug. > I think this is probably the biggest issue with the whole idea, and I > agree there would be some pain involved. Yeah, if it weren't for that I'd say "sure let's try it". But I'm afraid we'd be introducing significant headaches in return for a gain that's quite speculative. regards, tom lane
On Tue, Jul 27, 2010 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jul 27, 2010 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> "Flushing them all" is not zero-cost; it's not too hard to believe that >>> it could actually be slower than forking a clean new backend. > >> I'm not so sure I believe that. > > I'm not asserting it's true, just suggesting it's entirely possible. > Other than the fork() cost itself and whatever authentication activity > there might be, practically all the startup cost of a new backend can be > seen as cache-populating activities. You'd have to redo all of that, > *plus* pay the costs of getting rid of the previous cache entries. > Maybe the latter costs less than a fork(), or maybe not. fork() is > pretty cheap on modern Unixen. > >>> What's much worse, it's not zero-bug. > >> I think this is probably the biggest issue with the whole idea, and I >> agree there would be some pain involved. > > Yeah, if it weren't for that I'd say "sure let's try it". But I'm > afraid we'd be introducing significant headaches in return for a gain > that's quite speculative. I agree that the gain is minimal in itself; after all, how often do you need to switch databases, and what's the big deal if the postmaster has to fork a new backend? Where I see it as a potentially big win is when it comes to things like parallel query. I can't help thinking that's going to be a lot less efficient if you're forever forking new backends. Perhaps the point here is that you'd actually sort of like to NOT flush all those caches unless it turns out that you're switching databases - many installations probably operate with essentially one big database, so chances are good that even if you distributed connections / parallel queries to backends round-robin, you'd potentially save quite a bit of overhead. Of course, for the guy who has TWO big databases, you might hope to be a little smarter, but that's another problem altogether. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jul 27, 2010 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Other than the fork() cost itself and whatever authentication activity >> there might be, practically all the startup cost of a new backend can be >> seen as cache-populating activities. You'd have to redo all of that, >> *plus* pay the costs of getting rid of the previous cache entries. >> Maybe the latter costs less than a fork(), or maybe not. fork() is >> pretty cheap on modern Unixen. > I agree that the gain is minimal in itself; after all, how often do > you need to switch databases, and what's the big deal if the > postmaster has to fork a new backend? Where I see it as a potentially > big win is when it comes to things like parallel query. I can't help > thinking that's going to be a lot less efficient if you're forever > forking new backends. Color me unconvinced. To do parallel queries with pre-existing worker processes, you'd need to hook up with a worker that was (at least) in your own database, and then somehow feed it the query plan that it needs to execute. I'm thinking fork() could easily be cheaper. But obviously this is all speculation (... and Windows is going to be a whole 'nother story in any case ...) regards, tom lane
On 28/07/10 04:40, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund <andres@anarazel.de> wrote: >>>> The problem is harder for us because a backend can't switch identities >>>> once it's been assigned to a database. I haven't heard an adequate >>>> explanation of why that couldn't be changed, though. > >>> Possibly it might decrease the performance significantly enough by >>> reducing the cache locality (syscache, prepared plans)? > >> Those things are backend-local. The worst case scenario is you've got >> to flush them all when you reinitialize, in which case you still save >> the overhead of creating a new process. > > "Flushing them all" is not zero-cost; it's not too hard to believe that > it could actually be slower than forking a clean new backend. > > What's much worse, it's not zero-bug. We've got little bitty caches > all over the backend, including (no doubt) some caching behavior in > third-party code that wouldn't get the word about whatever API you > invented to deal with this. In addition to caches, there may be some places where memory is just expected to leak. Since it's a one-off allocation, nobody really cares; why bother cleaning it up when it's quicker to just let the OS do it when the backend terminates? Being able to reset a backend for re-use would require that per-session memory use be as neatly managed as per-query and per-transaction memory, with no leaked stragglers left lying around. Such cleanup (and management) has its own costs. Plus, you have a potentially increasingly fragmented memory map to deal with the longer the backend lives. Overall, there are plenty of advantages to letting the OS clean it up. ... however, if the requirement is introduced that a given backend may only be re-used for connections to the same database, lots of things get simpler. You have to be able to change the current user (which would be a bonus anyway), reset GUCs, etc, but how much else is there to do? That way you can maintain per-database pools of idle workers (apache prefork style) with ageing-out of backends that're unused. Wouldn't this do the vast majority of what most pools are needed for anyway? And wouldn't it potentially save quite a bit of load by avoiding having backends constantly switching databases, flushing caches, etc? -- Craig Ringer Tech-related writing: http://soapyfrogs.blogspot.com/
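As a rough sketch of the bookkeeping such per-database, prefork-style pooling implies (purely illustrative; all names are invented and none of this is actual postmaster code):

    #include <string.h>

    #define MAX_DBS   64
    #define POOL_SIZE 16

    /* Idle backends are tracked per database, so a backend is only ever
     * re-used for the database it is already bound to. */
    struct db_pool {
        char dbname[64];
        int  idle_backend_pids[POOL_SIZE];
        int  n_idle;
    };

    static struct db_pool pools[MAX_DBS];
    static int n_pools;

    /* Return an idle backend already bound to dbname, or -1 meaning
     * "none available, fork a fresh one".  Ageing-out of long-idle
     * backends would be handled elsewhere. */
    static int checkout_backend(const char *dbname)
    {
        for (int i = 0; i < n_pools; i++)
            if (strcmp(pools[i].dbname, dbname) == 0 && pools[i].n_idle > 0)
                return pools[i].idle_backend_pids[--pools[i].n_idle];
        return -1;
    }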
On 28.07.10 04:56, Tom Lane wrote: > > I'm not asserting it's true, just suggesting it's entirely possible. > Other than the fork() cost itself and whatever authentication activity > there might be, practically all the startup cost of a new backend can be > seen as cache-populating activities. You'd have to redo all of that, > *plus* pay the costs of getting rid of the previous cache entries. > Maybe the latter costs less than a fork(), or maybe not. fork() is > pretty cheap on modern Unixen. Actually, the problem as I see it is that one can't raise the number of database connections very high without overloading CPU/memory/disk, so external pooling is needed. If PostgreSQL had something like a max_active_queries setting that limited the number of connections not in an IDLE [in transaction] state, one could raise max_connections high (I don't think an idle process itself has much overhead), limit max_active_queries to get maximum performance, and do without external pooling. Of course this won't help if the numbers are really high, but it could cover the most common cases. Best regards, Vitalii Tymchyshyn
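No such max_active_queries setting exists, but the effect can be approximated from the client side today. A minimal sketch, assuming a multithreaded client using POSIX semaphores and libpq (MAX_ACTIVE and run_limited are invented names):

    #include <semaphore.h>
    #include <libpq-fe.h>

    #define MAX_ACTIVE 8    /* roughly: enough to keep CPUs and disks busy */

    /* Initialize once at startup: sem_init(&active_slots, 0, MAX_ACTIVE); */
    static sem_t active_slots;

    static PGresult *run_limited(PGconn *conn, const char *sql)
    {
        PGresult *res;

        /* Threads beyond MAX_ACTIVE queue here; their connections stay
         * open but idle, which is cheap, as noted above. */
        sem_wait(&active_slots);
        res = PQexec(conn, sql);
        sem_post(&active_slots);
        return res;
    }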
On 7/27/10 6:56 PM, Tom Lane wrote: > Yeah, if it weren't for that I'd say "sure let's try it". But I'm > afraid we'd be introducing significant headaches in return for a gain > that's quite speculative. Well, the *gain* isn't speculative. For example, I am once again dealing with the issue that PG backend processes on Solaris never give up their RAM, resulting in pathological swapping situations if you have many idle connections. This requires me to install pgpool, which is overkill (since it has load balancing, replication, and more) just to make sure that connections get recycled so that I don't have 300 idle connections eating up 8GB of RAM. Relative to switching databases, I'd tend to say that, like pgbouncer and pgpool, we don't need to support that. Each user/database combo can have their own "pool". While not ideal, this would be good enough for 90% of users. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Wed, Jul 28, 2010 at 3:44 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 7/27/10 6:56 PM, Tom Lane wrote: >> Yeah, if it weren't for that I'd say "sure let's try it". But I'm >> afraid we'd be introducing significant headaches in return for a gain >> that's quite speculative. > > Well, the *gain* isn't speculative. For example, I am once again > dealing with the issue that PG backend processes on Solaris never give > up their RAM, resulting in pathological swapping situations if you have > many idle connections. This requires me to install pgpool, which is > overkill (since it has load balancing, replication, and more) just to > make sure that connections get recycled so that I don't have 300 idle > connections eating up 8GB of RAM. > > Relative to switching databases, I'd tend to say that, like pgbouncer > and pgpool, we don't need to support that. Each user/database combo can > have their own "pool". While not ideal, this would be good enough for > 90% of users. However, if we don't support that, we can't do any sort of pooling-ish thing without the ability to pass file descriptors between processes; and Tom seems fairly convinced there's no portable way to do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > However, if we don't support that, we can't do any sort of pooling-ish > thing without the ability to pass file descriptors between processes; > and Tom seems fairly convinced there's no portable way to do that. Well, what it would come down to is: are we prepared to not support pooling on platforms without such a capability? It's certainly possible to do it on many modern platforms, but I don't believe we can make it happen everywhere. Generally we've tried to avoid having major features that don't work everywhere ... regards, tom lane
On Wed, Jul 28, 2010 at 04:10:08PM -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > However, if we don't support that, we can't do any sort of pooling-ish > > thing without the ability to pass file descriptors between processes; > > and Tom seems fairly convinced there's no portable way to do that. > > Well, what it would come down to is: are we prepared to not support > pooling on platforms without such a capability? It's certainly possible > to do it on many modern platforms, but I don't believe we can make it > happen everywhere. Generally we've tried to avoid having major features > that don't work everywhere ... Which platforms do you have in mind here? All of the platforms I found documented to be supported seem to support at least one of SCM_RIGHTS, WSADuplicateSocket or STREAMS/FD_INSERT. Most if not all besides Windows support SCM_RIGHTS. The ones I am dubious about supposedly support FD_INSERT... Andres
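For readers who have not used it: SCM_RIGHTS lets one process hand an open file descriptor to another over a Unix-domain socket via sendmsg(). A minimal sketch of the sending side (error handling omitted; Windows lacks this mechanism, hence WSADuplicateSocket as the alternative there):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Pass fd_to_pass to the process at the other end of unix_sock.
     * The receiver's recvmsg() yields a new descriptor referring to the
     * same open file; the kernel duplicates it across the boundary. */
    static int send_fd(int unix_sock, int fd_to_pass)
    {
        char dummy = 'F';                  /* must send at least one byte */
        struct iovec iov = { &dummy, 1 };
        union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } u;
        struct msghdr msg;
        struct cmsghdr *cmsg;

        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof(u.buf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

        return sendmsg(unix_sock, &msg, 0);
    }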
On Wed, Jul 28, 2010 at 4:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> However, if we don't support that, we can't do any sort of pooling-ish >> thing without the ability to pass file descriptors between processes; >> and Tom seems fairly convinced there's no portable way to do that. > > Well, what it would come down to is: are we prepared to not support > pooling on platforms without such a capability? It's certainly possible > to do it on many modern platforms, but I don't believe we can make it > happen everywhere. Generally we've tried to avoid having major features > that don't work everywhere ... I suppose it depends on the magnitude of the benefit. And how many platforms aren't covered. And how much code is required. In short, until someone writes a patch, who knows? I think the core question we should be thinking about is what would be the cleanest method of resetting a backend - either for the same database or for a different one, whichever seems easier. And by cleanest, I mean least likely to introduce bugs. If we can get to the point where we have something to play around with, even if it's kind of kludgey or doesn't quite work, it'll give us some idea of whether further effort is worthwhile and how it should be directed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
> introduce bugs. If we can get to the point where we have something to > play around with, even if it's kind of kludgey or doesn't quite work, > it'll give us some idea of whether further effort is worthwhile and > how it should be directed. Should I put this on the TODO list, then, in hopes that someone steps forward? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com