Thread: Urgent: 10K or more connections
Hi all,

I have received a question via the Advocacy site and I am not knowledgeable enough to answer. Can you help?

The question is: can PostgreSQL handle between 10'000 and 40'000 simultaneous connections? The person asking the question has to choose between Oracle and PostgreSQL, and my guess is that they would be relieved if they could go with PostgreSQL.

Do you have any additional advice I could transmit to this person about handling that many connections? I'm sure any help we can provide will be an additional selling point.

Thanks.

--------
Francois

Home page: http://www.monpetitcoin.com/

"We waste our time spending money we don't have to buy things we don't need to impress people we don't like"
On Fri, 18 Jul 2003, Francois Suter wrote:

> Hi all,
>
> I have received a question via the Advocacy site and I am not knowledgeable enough to answer. Can you help?
>
> The question is: can PostgreSQL handle between 10'000 and 40'000 simultaneous connections? The person asking the question has to choose between Oracle and PostgreSQL, and my guess is that they would be relieved if they could go with PostgreSQL.
>
> Do you have any additional advice I could transmit to this person about handling that many connections? I'm sure any help we can provide will be an additional selling point.

Wow! That's quite a few connections. I would say that 10,000 connections is a lot for ANY database to hold open. Can this person use connection pooling? Or do they need an actual 10,000 parallel accesses to get things done?

If they can't use connection pooling, or connection pooling only gets them down to 10k to 40k connections, then that's a huge system. I wouldn't run something that big on Oracle or PostgreSQL; I'd use a farm of mainframes running something like TPF, like the airlines do.
Francois Suter <dba@paragraf.ch> writes:

> Hi all,
>
> I have received a question via the Advocacy site and I am not knowledgeable enough to answer. Can you help?
>
> The question is: can PostgreSQL handle between 10'000 and 40'000 simultaneous connections? The person asking the question has to choose between Oracle and PostgreSQL, and my guess is that they would be relieved if they could go with PostgreSQL.

On a big enough system, sure. Each PG connection backend is a separate process, so you'd need to make sure the process table was big enough, open file and shared memory limits set high, etc. You'd want a really big machine, hopefully 64-bit like a Sparc or IA64, with lots of memory. But you'd want that for Oracle, too.

You'd definitely want to spend a lot of time tuning and testing for that activity level, but again, you'd do that for Oracle too.

-Doug
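To give a rough feel for the OS limits being talked about here: on a Linux box of that era the knobs would be something along these lines. The sysctl names are standard, but the values are purely illustrative, not tuned recommendations, and other platforms spell these differently.

    # /etc/sysctl.conf -- illustrative values only, not a recommendation
    fs.file-max = 400000            # ~10k backends at a few dozen FDs each, plus headroom
    kernel.shmmax = 1073741824      # largest single shared memory segment, in bytes
    kernel.shmall = 2097152         # total shared memory allowed, in pages
    kernel.sem = 250 32000 100 128  # SEMMSL SEMMNS SEMOPM SEMMNI

Per-user process and open-file ulimits for the postgres account would need the same sort of headroom.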
On 18 Jul 2003, Doug McNaught wrote:

> Francois Suter <dba@paragraf.ch> writes:
>
> > Hi all,
> >
> > I have received a question via the Advocacy site and I am not knowledgeable enough to answer. Can you help?
> >
> > The question is: can PostgreSQL handle between 10'000 and 40'000 simultaneous connections? The person asking the question has to choose between Oracle and PostgreSQL, and my guess is that they would be relieved if they could go with PostgreSQL.
>
> On a big enough system, sure. Each PG connection backend is a separate process, so you'd need to make sure the process table was big enough, open file and shared memory limits set high, etc. You'd want a really big machine, hopefully 64-bit like a Sparc or IA64, with lots of memory. But you'd want that for Oracle, too.
>
> You'd definitely want to spend a lot of time tuning and testing for that activity level, but again, you'd do that for Oracle too.

I'm gonna go out on a limb and guess that if you want 10k concurrent connections, you're likely gonna be spending some time here on the list getting PostgreSQL to perform in that environment. I.e., little inefficiencies in shared memory access and IPC are gonna cause this to crawl even on a Sun E10k with 64 CPUs and 64 gigs of RAM. But I'm sure that with a few tweaks to the code here and there it's doable, just don't expect it to work "out of the box".
> I have received a question via the Advocacy site and I am not knowledgeable enough to answer. Can you help?
>
> The question is: can PostgreSQL handle between 10'000 and 40'000 simultaneous connections? The person asking the question has to choose between Oracle and PostgreSQL, and my guess is that they would be relieved if they could go with PostgreSQL.
>
> Do you have any additional advice I could transmit to this person about handling that many connections? I'm sure any help we can provide will be an additional selling point.

Actually, this raises the question: are there any "reverse DB" proxy servers around that people have used? Having a reverse libpq proxy server would _rock_. Some lightweight multi-threaded proxy that relays active connections to the backend and holds idle connections more efficiently than PostgreSQL... well... it'd be a life saver in sooooo many situations. Granted it'd have its shortcomings (connections would persist to a backend for the duration of a transaction; once committed, the front end would "detach" from the backend it was using), but this is architecturally similar to what MS and ORA do to handle gazillions of connections to a database that, in reality, can only handle a few hundred (maybe a thousand or two) active connections.

-sc

--
Sean Chittenden
On Fri, 18 Jul 2003, Sean Chittenden wrote:

> > I have received a question via the Advocacy site and I am not knowledgeable enough to answer. Can you help?
> >
> > The question is: can PostgreSQL handle between 10'000 and 40'000 simultaneous connections? The person asking the question has to choose between Oracle and PostgreSQL, and my guess is that they would be relieved if they could go with PostgreSQL.
> >
> > Do you have any additional advice I could transmit to this person about handling that many connections? I'm sure any help we can provide will be an additional selling point.
>
> Actually, this raises the question: are there any "reverse DB" proxy servers around that people have used? Having a reverse libpq proxy server would _rock_. Some lightweight multi-threaded proxy that relays active connections to the backend and holds idle connections more efficiently than PostgreSQL... well... it'd be a life saver in sooooo many situations. Granted it'd have its shortcomings (connections would persist to a backend for the duration of a transaction; once committed, the front end would "detach" from the backend it was using), but this is architecturally similar to what MS and ORA do to handle gazillions of connections to a database that, in reality, can only handle a few hundred (maybe a thousand or two) active connections.

I think usogres does this. Not sure though, I haven't played with it, just heard of it.
> > > I have received a question via the Advocacy site and I am not knowledgeable enough to answer. Can you help?
> > >
> > > The question is: can PostgreSQL handle between 10'000 and 40'000 simultaneous connections? The person asking the question has to choose between Oracle and PostgreSQL, and my guess is that they would be relieved if they could go with PostgreSQL.
> > >
> > > Do you have any additional advice I could transmit to this person about handling that many connections? I'm sure any help we can provide will be an additional selling point.
> >
> > Actually, this raises the question: are there any "reverse DB" proxy servers around that people have used? Having a reverse libpq proxy server would _rock_. Some lightweight multi-threaded proxy that relays active connections to the backend and holds idle connections more efficiently than PostgreSQL... well... it'd be a life saver in sooooo many situations. Granted it'd have its shortcomings (connections would persist to a backend for the duration of a transaction; once committed, the front end would "detach" from the backend it was using), but this is architecturally similar to what MS and ORA do to handle gazillions of connections to a database that, in reality, can only handle a few hundred (maybe a thousand or two) active connections.
>
> I think usogres does this. Not sure though, I haven't played with it, just heard of it.

Hrm... quoting the Usogres description:

    Usogres

    Usogres is "Synchronizing Option for postGRESql" -- that's the meaning of the name.

    What is Usogres? A real-time backup utility. Usogres is a system which performs database duplication for PostgreSQL (http://www.postgresql.org/), which does not yet offer replication. It is fundamentally different from replication, even though it performs database duplication; essentially, it maintains an identical copy of the database in real time.

Looks like it's closer to a replication solution to me, but you may be able to stick one of these in front of PostgreSQL and have it handle higher numbers of connections. ::shrug:: I may have to check it out and see...

-sc

--
Sean Chittenden
Sean Chittenden wrote:

>> I have received a question via the Advocacy site and I am not knowledgeable enough to answer. Can you help?
>>
>> The question is: can PostgreSQL handle between 10'000 and 40'000 simultaneous connections? The person asking the question has to choose between Oracle and PostgreSQL, and my guess is that they would be relieved if they could go with PostgreSQL.
>>
>> Do you have any additional advice I could transmit to this person about handling that many connections? I'm sure any help we can provide will be an additional selling point.
>
> Actually, this raises the question: are there any "reverse DB" proxy servers around that people have used? Having a reverse libpq proxy server would _rock_. Some lightweight multi-threaded proxy that relays active connections to the backend and holds idle connections more efficiently than PostgreSQL... well... it'd be a life saver in sooooo many situations. Granted it'd have its shortcomings (connections would persist to a backend for the duration of a transaction; once committed, the front end would "detach" from the backend it was using), but this is architecturally similar to what MS and ORA do to handle gazillions of connections to a database that, in reality, can only handle a few hundred (maybe a thousand or two) active connections.

There are thousands of references to PostgreSQL and connection pooling:

http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pooling+postgresql

Maybe something there will work.
> There are thousands of references to PostgreSQL and connection pooling:
>
> http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pooling+postgresql
>
> Maybe something there will work.

Those are all application-level connection pooling links. I'm thinking about something that's done on the database side, like the ORA Listener, and passes active connections back to the backend. That way it's completely transparent and applies to every libpq app regardless of the language, application, etc.

-sc

--
Sean Chittenden
"scott.marlowe" <scott.marlowe@ihs.com> writes: > But I'm sure that with a few tweaks to the code here and there it's > doable, just don't expect it to work "out of the box". I think you'd be sticking your neck out to assume that 10k concurrent connections would perform well, even after tweaking. I'd worry first about whether the OS can handle 10k processes (which among other things would probably require order-of-300k open file descriptors...). Maybe Solaris is built to do that but the Unixen I've dealt with would go belly up. After that you'd have to look at Postgres' internal issues --- contention on access to the PROC array would probably become a significant factor, for example, and we'd have to do some redesign to avoid linear scans of the PROC array where possible. I don't doubt that we could support 10k concurrent *users*, given connection pooling of some kind. I'm dubious about 10k concurrent database sessions though. regards, tom lane
On Friday 18 July 2003 01:28 pm, Sean Chittenden wrote:

> > There are thousands of references to PostgreSQL and connection pooling:
> >
> > http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pooling+postgresql
> >
> > Maybe something there will work.
>
> Those are all application-level connection pooling links. I'm thinking about something that's done on the database side, like the ORA Listener, and passes active connections back to the backend. That way it's completely transparent and applies to every libpq app regardless of the language, application, etc.
>
> -sc

Perhaps this is the answer? I've never used it, but I remember seeing it on freshmeat.net.

http://sqlrelay.sourceforge.net/

Regards,
Jeff Davis
> > > There are thousands of references to PostgreSQL and connection pooling:
> > >
> > > http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pooling+postgresql
> > >
> > > Maybe something there will work.
> >
> > Those are all application-level connection pooling links. I'm thinking about something that's done on the database side, like the ORA Listener, and passes active connections back to the backend. That way it's completely transparent and applies to every libpq app regardless of the language, application, etc.
> >
> > -sc
>
> Perhaps this is the answer? I've never used it, but I remember seeing it on freshmeat.net.
>
> http://sqlrelay.sourceforge.net/

Very cool... too bad rudiments (a prerequisite) doesn't compile with newer versions of GCC, and this requires you to develop your apps using sqlrelay's API and not libpq... though it's better than nothing. :)

-sc

--
Sean Chittenden
On Fri, 18 Jul 2003, Tom Lane wrote:

> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > But I'm sure that with a few tweaks to the code here and there it's doable, just don't expect it to work "out of the box".
>
> I think you'd be sticking your neck out to assume that 10k concurrent connections would perform well, even after tweaking. I'd worry first about whether the OS can handle 10k processes (which among other things would probably require order-of-300k open file descriptors...). Maybe Solaris is built to do that, but the Unixen I've dealt with would go belly up. After that you'd have to look at Postgres' internal issues --- contention on access to the PROC array would probably become a significant factor, for example, and we'd have to do some redesign to avoid linear scans of the PROC array where possible.

This page describes all the problems and strategies a web server would use to handle 10k concurrent connections. This is the kind of thing that can bring an otherwise performant OS to its knees. And this is just to grab some data off disk and shovel it out over HTTP; consider how much more work a database must do.

http://www.kegel.com/c10k.html

Kris Jurka
Sean Chittenden <sean@chittenden.org> writes:

> Some lightweight multi-threaded proxy that relays active connections to the backend and holds idle connections more efficiently than PostgreSQL...

What excuse is there for postgres connections being heavyweight to begin with? The only real resource they ought to represent is a single TCP connection. Servers that manage 10,000 TCP connections are a dime a dozen these days.

Any database context that has to be stored for the connection, the state of binary/text or autocommit mode or whatever, will have to be maintained by any pooling interface anyways. And I think both of those examples are now much cleaner, more or less stateless per-request flags anyways.

Basically what I'm asking is, hypothetically, if postgres were implemented using threads instead of processes, are there any per-connection resources that really couldn't be completely disposed of when the connection was completely idle between (i.e. at the start of) transactions?

Ideally, if every per-connection resource could be completely disposed of whenever the connection was completely idle, then you wouldn't need a whole extra layer for the communication to traverse and a whole extra layer of complexity for the protocol semantics to be maintained. A multithreaded server could easily handle 10k-40k mostly idle connections without any unusual resource needs.

--
greg
> > > But I'm sure that with a few tweaks to the code here and there it's doable, just don't expect it to work "out of the box".
> >
> > I think you'd be sticking your neck out to assume that 10k concurrent connections would perform well, even after tweaking. I'd worry first about whether the OS can handle 10k processes (which among other things would probably require order-of-300k open file descriptors...). Maybe Solaris is built to do that, but the Unixen I've dealt with would go belly up. After that you'd have to look at Postgres' internal issues --- contention on access to the PROC array would probably become a significant factor, for example, and we'd have to do some redesign to avoid linear scans of the PROC array where possible.
>
> This page describes all the problems and strategies a web server would use to handle 10k concurrent connections. This is the kind of thing that can bring an otherwise performant OS to its knees. And this is just to grab some data off disk and shovel it out over HTTP; consider how much more work a database must do.
>
> http://www.kegel.com/c10k.html

*lightning strikes/apple falls on head* Whoa! It's tough, sure, but _far_ from impossible. My bread and butter is big web __fill_in_the_blank__ and I routinely handle ~60K connections to some of my web server _instances_. Kegel's page and analysis are a bit dated, but most of it is still true and applicable. Using kqueue(2) on FreeBSD, it's pretty easy to have bazillions of concurrent connections and maintain low latency rates when identifying processes that are ready to be worked on (avoiding select(2) is _required_ to get above a few thousand). On Linux or Slowaris, poll(2) can be substituted for kqueue(2), and on other OSes that are less fortunate, select(2) will suffice and no one would be the wiser (except for in the scalability dept.).

With OSes that allow passing of FDs between existing processes (iirc PostgreSQL fork()s with the connection, it doesn't pass FDs around) and making use of a given platform's alternatives to select(2), it's very plausible to imagine a world where a backend hands an idle connection back to the parent process for safe keeping/process load balancing. Arguably, that function call should've been added to the most recent libpq(3) update so that the feature could be added to the backend and libs wouldn't have to be updated in order for the feature to be available. Now that I think about it, this is significantly easier to accomplish than adding mmap(2) to the backend (~500-1,000 lines of code)...

To preserve ABI compatibility, a persistence flag would likely be set on a PGconn (ex: PQpersistConnection(PGconn *conn, bool value)), or an environment variable/symlink file (ex: /etc/malloc.conf) could be used to globally apply this to all libpq apps. When an app is done with a connection, it calls PQfinish() as per normal, except instead of closing the connection, libpq would send a message to the backend informing it that the client is done with the connection for now. On the DB side, the postmaster would kill off the given backend and launch X number of idle backends for the given database that the now-idle connection thinks it's connected to. In doing this, when the connection gets used again, the most expensive parts of the DB connection process are already done (fork()ing, populating the backend, and, if using SSL, going through the asymmetric crypto routines - they're killers on machines without a hardware-backed /dev/crypto).
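Concretely, handing a descriptor from one process to another over a UNIX-domain socket is just the usual sendmsg(2)/SCM_RIGHTS dance. Something roughly like this - an untested sketch, not actual PostgreSQL code; send_fd() and channel are invented names and error handling is omitted:

    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /*
     * Pass an open descriptor across a UNIX-domain socket.  The receiver
     * (here, the postmaster) gets a fresh descriptor for the same client
     * connection via recvmsg() with a matching SCM_RIGHTS control message.
     * Sketch only: a real implementation needs error handling and retries.
     */
    static int
    send_fd(int channel, int fd_to_pass)
    {
        struct msghdr   msg;
        struct iovec    iov;
        struct cmsghdr *cmsg;
        char            cbuf[CMSG_SPACE(sizeof(int))];
        char            dummy = 'F';    /* must carry at least one data byte */

        memset(&msg, 0, sizeof(msg));
        memset(cbuf, 0, sizeof(cbuf));

        iov.iov_base = &dummy;
        iov.iov_len = 1;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cbuf;
        msg.msg_controllen = sizeof(cbuf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;   /* "this message carries descriptors" */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

        return sendmsg(channel, &msg, 0);
    }

The descriptor number the postmaster receives will generally differ from the one the backend sent, but both refer to the same underlying open socket.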
For web farms with many diverse applications that can't make use of a database connection pooling app, this'd be dynamite. When a process dies, the connection would finally close in reality and the postmaster would reap the connection.

Using kqueue(2), it's very plausible that a PostgreSQL instance could handle about ~50K idle connections. poll(2) could probably do about ~10K connections without any real degradation in performance. Those stuck with select(2) would be lucky to get beyond a few hundred idle. If you're unfortunate enough to be a part of the Win32 crowd... *smack* why are you running a high volume DB server on Win32? You should know better.

This is hugely valuable to me now... hrm, guess I have my new summer project.

-sc

--
Sean Chittenden
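For a feel of what the postmaster-side babysitting of idle descriptors might look like, here is a bare-bones kqueue(2) sketch (FreeBSD; poll(2) would play the same role elsewhere). The hand_to_backend() and watch_idle_connections() names are made up for illustration, and error handling is omitted:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    /* Placeholder for "wake a backend and pass it this client socket",
     * e.g. via the send_fd() sketch above.  Hypothetical, not real code. */
    static void
    hand_to_backend(int client_fd)
    {
        (void) client_fd;
    }

    /* Watch a set of idle client sockets; whenever one becomes readable,
     * hand it off to a backend.  Sketch only. */
    static void
    watch_idle_connections(int *fds, int nfds)
    {
        int           kq = kqueue();
        struct kevent change;
        struct kevent events[64];
        int           i, nev;

        for (i = 0; i < nfds; i++)
        {
            EV_SET(&change, fds[i], EVFILT_READ, EV_ADD, 0, 0, NULL);
            kevent(kq, &change, 1, NULL, 0, NULL);      /* register interest */
        }

        for (;;)
        {
            /* block until one or more idle connections have data waiting */
            nev = kevent(kq, NULL, 0, events, 64, NULL);
            for (i = 0; i < nev; i++)
                hand_to_backend((int) events[i].ident);
        }
    }

The scalability win comes from the kernel handing back only the descriptors that actually fired, instead of the application rescanning the whole set on every wakeup as select(2) forces you to do.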
> > Some lightweight multi-threaded proxy that relays active connections to the backend and holds idle connections more efficiently than PostgreSQL...
>
> What excuse is there for postgres connections being heavyweight to begin with? The only real resource they ought to represent is a single TCP connection. Servers that manage 10,000 TCP connections are a dime a dozen these days.
>
> Any database context that has to be stored for the connection, the state of binary/text or autocommit mode or whatever, will have to be maintained by any pooling interface anyways. And I think both of those examples are now much cleaner, more or less stateless per-request flags anyways.
>
> Basically what I'm asking is, hypothetically, if postgres were implemented using threads instead of processes, are there any per-connection resources that really couldn't be completely disposed of when the connection was completely idle between (i.e. at the start of) transactions?
>
> Ideally, if every per-connection resource could be completely disposed of whenever the connection was completely idle, then you wouldn't need a whole extra layer for the communication to traverse and a whole extra layer of complexity for the protocol semantics to be maintained. A multithreaded server could easily handle 10k-40k mostly idle connections without any unusual resource needs.

PostgreSQL will never be single-proc, multi-threaded, and I don't think it should be, for reliability's sake. See my above post, however, as I think I may have a better way to handle "lots of connections" without using threads.

-sc

--
Sean Chittenden
Sean Chittenden wrote:

> PostgreSQL will never be single-proc, multi-threaded, and I don't think it should be, for reliability's sake. See my above post, however, as I think I may have a better way to handle "lots of connections" without using threads. -sc

Never is a VERY long time... Also, the single proc/multiple proc thing does not have to be exclusive, meaning you could "tune" the system so that it could do either.

I have developed a single-process server that handled thousands of connections. I've also developed a single-process database (a while back) that handled multiple connections, but I'm not sure I would do it the "hard" way again, as the cost of writing the code for keeping context was not insignificant, although there are much better ways of doing it than how I did it 15 years ago.

What you talk about is very fundamental and I would love to have another go at it... however, you're right that this won't happen any time soon. Connection pooling is a fundamentally flawed way of overcoming this problem. A different design could render a significantly higher feasible connection count.

G
> > PostgreSQL will never be single-proc, multi-threaded, and I don't think it should be, for reliability's sake. See my above post, however, as I think I may have a better way to handle "lots of connections" without using threads. -sc
>
> Never is a VERY long time... Also, the single proc/multiple proc thing does not have to be exclusive, meaning you could "tune" the system so that it could do either.

True. This topic has come up a zillion times in the past though. The memory segmentation and reliability that independent processes give you is huge, and the biggest reason why, _if_ PostgreSQL does spontaneously wedge itself (like MySQL does all too often), you're only having to cope with a single DB connection being corrupt, invalid, etc. Imagine a threaded model where the process was horked and you lose 1000 connections' worth of data in a SEGV. *shudder* Unix is reliable at the cost of memory segmentation... something that I dearly believe in. If that weren't worth anything, then I'd run everything in kernel and avoid the context switching, which is pretty expensive.

> I have developed a single-process server that handled thousands of connections. I've also developed a single-process database (a while back) that handled multiple connections, but I'm not sure I would do it the "hard" way again, as the cost of writing the code for keeping context was not insignificant, although there are much better ways of doing it than how I did it 15 years ago.

Not saying it's not possible, just that at this point, reliability is more paramount than handling additional connections. With copy-on-write VMs being abundant these days, a lot of the size that you see with PostgreSQL is shared. Memory profiling and increasing the number of read-only pages would be an extremely interesting exercise that could yield some slick results in terms of reducing the memory footprint of PG's children.

> What you talk about is very fundamental and I would love to have another go at it... however, you're right that this won't happen any time soon. Connection pooling is a fundamentally flawed way of overcoming this problem. A different design could render a significantly higher feasible connection count.

Surprisingly, it's not that complex, at least handling a large number of FDs and figuring out which ones have data on them and need to be passed to a backend. I'm actually using the model for monitoring FDs from thttpd and reapplying bits where appropriate. Its abstraction of kqueue()/poll()/select() is nice enough to not want to reinvent the wheel (same with its license). Hopefully ripping through the incoming data and figuring out which backend pool to send a connection to won't be that bad, but I have next to no experience with writing that kind of code and my Stevens is hidden away in one of 23 boxes from a move earlier this month. I only know that Apache 1.3 does this with obviously huge success on basically every *nix, so it can't be too hard.

-sc

--
Sean Chittenden
Sean Chittenden <sean@chittenden.org> writes:

> it's very plausible to imagine a world where a backend hands an idle connection back to the parent process for safe keeping/process load balancing.

And your current database, user authorization, prepared statements, SET values, cached plpgsql plans, etc etc go where exactly?

The notion that a Postgres session can be replaced by a lightweight object is just not workable IMHO; we've developed far too many features that require persistent state on the backend side.

For applications that don't need those features (or, more realistically, want the same persistent state for all transactions they engage in), client-side connection pooling solves the problem. It seems very unlikely that apps that are too diverse to share a client-side pool would be able to share a backend session if only the connection mechanism were a bit different.

			regards, tom lane
Sean Chittenden wrote:

>>> PostgreSQL will never be single-proc, multi-threaded, and I don't think it should be, for reliability's sake. See my above post, however, as I think I may have a better way to handle "lots of connections" without using threads. -sc
>>
>> Never is a VERY long time... Also, the single proc/multiple proc thing does not have to be exclusive, meaning you could "tune" the system so that it could do either.
>
> True. This topic has come up a zillion times in the past though. The memory segmentation and reliability that independent processes give you is huge, and the biggest reason why, _if_ PostgreSQL does spontaneously wedge itself (like MySQL does all too often), you're only having to cope with a single DB connection being corrupt, invalid, etc. Imagine a threaded model where the process was horked and you lose 1000 connections' worth of data in a SEGV. *shudder* Unix is reliable at the cost of memory segmentation... something that I dearly believe in. If that weren't worth anything, then I'd run everything in kernel and avoid the context switching, which is pretty expensive.

Yep, but if you design it right, you can have both. A rare occasion where you can have the cake and eat it too.

>> I have developed a single-process server that handled thousands of connections. I've also developed a single-process database (a while back) that handled multiple connections, but I'm not sure I would do it the "hard" way again, as the cost of writing the code for keeping context was not insignificant, although there are much better ways of doing it than how I did it 15 years ago.
>
> Not saying it's not possible, just that at this point, reliability is more paramount than handling additional connections. With copy-on-write VMs being abundant these days, a lot of the size that you see with PostgreSQL is shared. Memory profiling and increasing the number of read-only pages would be an extremely interesting exercise that could yield some slick results in terms of reducing the memory footprint of PG's children.

Context switching and cache thrashing are the killers in a multiple-process model. There is a 6-10x performance penalty for running in separate processes vs running in a single process (and single thread), which I observed when doing benchmarking on a streaming server. Perhaps a better scheduler (like the O(1) scheduler in Linux 2.6.*) would improve that, but I just don't know.

>> What you talk about is very fundamental and I would love to have another go at it... however, you're right that this won't happen any time soon. Connection pooling is a fundamentally flawed way of overcoming this problem. A different design could render a significantly higher feasible connection count.
>
> Surprisingly, it's not that complex, at least handling a large number of FDs and figuring out which ones have data on them and need to be passed to a backend. I'm actually using the model for monitoring FDs from thttpd and reapplying bits where appropriate. Its abstraction of kqueue()/poll()/select() is nice enough to not want to reinvent the wheel (same with its license). Hopefully ripping through the incoming data and figuring out which backend pool to send a connection to won't be that bad, but I have next to no experience with writing that kind of code and my Stevens is hidden away in one of 23 boxes from a move earlier this month.
> I only know that Apache 1.3 does this with obviously huge success on basically every *nix, so it can't be too hard.

No epoll?
> > it's very plausible to imagine a world where a backend hands an idle connection back to the parent process for safe keeping/process load balancing.
>
> And your current database, user authorization, prepared statements, SET values, cached plpgsql plans, etc etc go where exactly?

Nowhere, everything remains as is. I actually think you'll appreciate the simplicity of this once I'm done explaining how I'm going about this.

I'm tweaking the way that ServerLoop(), pq_close()/proc_exit(), and PQfinish() work so that the backend will pass the FD of the connection back to the postmaster before dying. Once the backend is dead/while dying, the postmaster will fire up a new backend (or three, GUC configurable) of the same database, but doesn't pass the FD to the new backend until an FD is ready to do work. fork(), in theory, is done before a connection is initiated. I'm hoping to move as much of the backend initialization as possible to happen before the FD is passed to the backend, so that the time between a client making a connection and a backend being ready to serve the request is as small as possible. I've broken this down into a few parts to make things more palatable though; see the end of the email for details.

> The notion that a Postgres session can be replaced by a lightweight object is just not workable IMHO; we've developed far too many features that require persistent state on the backend side.

:) Sure it is, hear me out. I never thought I'd blend the concepts from Apache and thttpd in a database, of all places. I do in my own webservers, but... well, it never even occurred to me to apply this to PostgreSQL.

> For applications that don't need those features (or, more realistically, want the same persistent state for all transactions they engage in), client-side connection pooling solves the problem. It seems very unlikely that apps that are too diverse to share a client-side pool would be able to share a backend session if only the connection mechanism were a bit different.

On my network, I have C progs, Ruby, Perl, PHP, a few JDBC connections (*puke*) all competing for database resources, many inside of Apache, many outside of Apache in the form of agents. Believe me, nipping this problem at the libpq end of things is the way to go. Java's a lost cause in terms of wanting any performance, so I don't care if my JDBC users have to wait as long as they do now for a backend to fire up.

Here's how I've broken things down into phases, and what I'd like to do in each phase:

Phase I: Connection pooling

a) Tweak ServerLoop() and postmaster startup so that it has a realistic connection limit: on select(2) it's 32, on poll(2) it's the max number of FDs allowed per proc, and kqueue(2)... well, the sky is the limit. This is all correctly bounded by a process's resource limits and the kernel's limits. I'm about 40% done with this. I've finished the connection pool and have provided generic wrappers around select(), poll(), and kqueue() (a rough sketch of what such a wrapper interface might look like is appended at the end of this message). The next thing I need to do is tweak ServerLoop() so that any connections in the idle connection pool are handed off to a backend. Handling of new connections isn't going to change right now.

b) Change invocations of pq_close() over to a new function called pq_handoff() if a connection is marked as persistent. pq_handoff() passes the FD back to the postmaster, then proceeds to die. pq_handoff() is only called when PQfinish() is called by the client.
I need to make sure that the client sends something when it calls PQfinish(), but I'm 90% sure it must, having looked at just the backend code (Tom, could you give a yea or nay on whether my assertion is right?).

c) In this step, tweak libpq so that it's possible to mark a connection as persistent. A global mechanism will be available in the form of either an environment variable (LIBPQPERSIST) or a symlink file that gets readlink()'ed (ex: ln -s 'p=dbname' /etc/libpq.conf).

d) Ensure that a local UNIX socket is in use/alive in a protected area for the sake of passing FDs around. Sticking this in the $PGDATA/data directory would be wise to prevent other users on a system from stealing FDs (which is pretty rare and requires massive hacker foo). See the send(2), sendto(2), and sendmsg(2) API for details.

e) #ifdef everything so that it won't ever work on Win32 and can be turned off/on at configure time. At this point, unless I've missed a feature that OpenSSL provides to aid with this, I'm pretty sure that connection passing will not work with SSL connections (for now), as you'd have to pass the connection's state back to the postmaster.

At this point, everything is well contained and at the _very_ least persistent clients get to avoid a TCP connection setup/teardown. New connections get handled identically to how they are now: only an extra bit of code checking whether there are any connections with data on them is added to the current flow of things. I may end up tweaking the way the backend handles listening for new connections, however, and may replace it with the above abstracted bits. kqueue()/poll() is just sooo much more efficient than select(), and when listening in a non-blocking way and bouncing back and forth between the two, it could amount to a bit of savings in terms of # of system calls and reduce connection startup latency for people on reasonably modern OSes.

Phase II: Preemptive backend pools

a) The postmaster gets schooled on pools of backend processes. I'm undecided about how to handle process pools, however. Part of me thinks that the backend should pre-init itself for a given database, and wait for its FD to be passed to it for auth. By having it already init'ed for a given DB, startup times will drop further. Problem is, how do you do this on DB servers with lots of different DBs? Some of the DBs (template1 comes to mind) should never have pools of procs waiting, but some should. I'd like to have this kind of config stuffed into the backend in a system catalog, actually, but I'm leery of doing so without guidance from someone with ueber knowledge of Pg's internals, which leads me to the alternative: have a bunch of procs waiting around, but not init'ed to any given DB. That's certainly the simpler approach and may be what I settle on for now. Opening the can of worms of sticking configuration bits in a system catalog isn't something I'm interested in playing with for the time being (though the idea is really appealing to me).

b) BackendFork() code gets split up into a few pieces to handle not having a connection up front. Splitting it into two functions, BackendFork() and BackendInit(), will probably be sufficient.

Phase III: Beautification

a) Clean things up so that SSL connections work with persistent connections. By far and away the most expensive part of SSL connections is the asymmetric key handling, and it'd be really great if persistent connections only had to worry about symmetric crypto, which is vastly cheaper.
b) Other cleanup that I'm sure Tom will point out along the way.

And that's about it. Phase I and Phase II could be done independently; Phase III I'm leaving as a misc catch-all. That's my analysis of what needs to be done. The connection pooling bit isn't that bad, but it's also the part that's the most straightforward and the bits that I'm quite familiar with. Phase II is a bit murkier and I'll probably have a few questions when I get there. Comments?

The whole point of this is to be able to handle large numbers of connections and reduce the startup time for each connection if it's persistent, by having an already established TCP connection as well as an already fork()'ed backend (and hopefully one initialized for the given DB) waiting to serve an active connection.

-sc

--
Sean Chittenden
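As a rough illustration of the multiplexer wrapper described in Phase I(a) - the FdWatcher type and fdwatch_* names are invented for this sketch, not taken from any actual patch - the interface plus a poll(2)-based fallback might look something like this:

    #include <poll.h>
    #include <stdlib.h>

    /* One watching interface, several possible implementations (kqueue, poll,
     * select).  Only a poll(2)-based fallback is sketched here. */
    typedef struct FdWatcher
    {
        struct pollfd *pfds;
        int            nfds;
        int            max;
    } FdWatcher;

    static FdWatcher *
    fdwatch_create(int max_fds)
    {
        FdWatcher *w = malloc(sizeof(*w));

        w->pfds = calloc(max_fds, sizeof(struct pollfd));
        w->nfds = 0;
        w->max = max_fds;
        return w;
    }

    /* Watch fd for readability; returns -1 if the pool is full. */
    static int
    fdwatch_add(FdWatcher *w, int fd)
    {
        if (w->nfds >= w->max)
            return -1;
        w->pfds[w->nfds].fd = fd;
        w->pfds[w->nfds].events = POLLIN;
        w->nfds++;
        return 0;
    }

    /* Block until something is readable; fill ready[] with up to max_ready
     * descriptors that have data waiting and return how many were found. */
    static int
    fdwatch_wait(FdWatcher *w, int *ready, int max_ready)
    {
        int i, n = 0;

        if (poll(w->pfds, w->nfds, -1) < 0)
            return -1;
        for (i = 0; i < w->nfds && n < max_ready; i++)
            if (w->pfds[i].revents & POLLIN)
                ready[n++] = w->pfds[i].fd;
        return n;
    }

A kqueue(2) implementation would keep the same three entry points but register interest with the kernel once and only be told about descriptors that actually fired, which is where the scalability win over select(2)/poll(2) comes from.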
Sean Chittenden <sean@chittenden.org> writes:

> Nowhere, everything remains as is. I actually think you'll appreciate the simplicity of this once I'm done explaining how I'm going about this.

I don't think you heard a word I said :-(

You appear to be adding a whole lot of platform dependency (passing FDs around other than by fork() is not portable) in return for loss of functionality (applications can't rely on session state anymore) and highly dubious performance gain (just because a backend has started does not mean it's built up a reasonable working set of cache entries; you can't expect that firing off a new backend for every transaction is going to be anything but a huge performance loss, even if you assume its visible session state is exactly what the application needs). Also it sounds to me like the postmaster will now become a performance bottleneck, since it will need to be involved in every transaction start.

You're welcome to pursue this if you like, but I'll bet lunch that you are wasting your time.

			regards, tom lane