Re: Urgent: 10K or more connections - Mailing list pgsql-general
| From | Sean Chittenden |
|---|---|
| Subject | Re: Urgent: 10K or more connections |
| Date | |
| Msg-id | 20030719204713.GH24507@perrin.int.nxad.com |
| In response to | Re: Urgent: 10K or more connections (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: Urgent: 10K or more connections |
| List | pgsql-general |
> > it's very plausible to imagine a world where a backend hands an
> > idle connection back to the parent process for safe
> > keeping/process load balancing.
>
> And your current database, user authorization, prepared statements,
> SET values, cached plpgsql plans, etc etc go where exactly?

Nowhere; everything remains as it is. I actually think you'll appreciate the simplicity of this once I'm done explaining how I'm going about it.

I'm tweaking the way that ServerLoop(), pq_close()/proc_exit(), and PQfinish() work so that the backend will pass the FD of the connection back to the postmaster before dying. Once the backend is dead/while dying, the postmaster will fire up a new backend (or three, GUC configurable) for the same database, but it doesn't pass the FD to the new backend until an FD is ready to do work. fork(), in theory, is done before a connection is initiated. I'm hoping to move as much of the backend initialization as possible to before the FD is passed to the backend, so that the time between a client making a connection and a backend being ready to serve the request is as small as possible. I've broken this down into a few parts to make things more palatable, though; see the end of the email for details.
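The handoff mechanism itself is just descriptor passing over a local UNIX socket with SCM_RIGHTS ancillary data. Here's a minimal sketch of that mechanism, assuming a dedicated control socket between backend and postmaster; send_fd()/recv_fd() are names I've made up, not existing PostgreSQL functions:

```c
/*
 * Minimal sketch of FD passing with SCM_RIGHTS over a UNIX domain
 * socket.  Hypothetical helpers, not actual PostgreSQL code; the real
 * pq_handoff() would wrap something like the sender side.
 */
#include <string.h>
#include <sys/socket.h>

/* Backend side: pass the client's FD back over the control socket. */
static int
send_fd(int channel, int fd)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	char cbuf[CMSG_SPACE(sizeof(int))];
	char byte = 'F';			/* must send at least one data byte */

	memset(&msg, 0, sizeof(msg));
	memset(cbuf, 0, sizeof(cbuf));
	iov.iov_base = &byte;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cbuf;
	msg.msg_controllen = sizeof(cbuf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;	/* "this payload is descriptors" */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(channel, &msg, 0) < 0 ? -1 : 0;
}

/* Postmaster side: receive the FD; returns it, or -1 on error. */
static int
recv_fd(int channel)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	char cbuf[CMSG_SPACE(sizeof(int))];
	char byte;
	int fd = -1;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &byte;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cbuf;
	msg.msg_controllen = sizeof(cbuf);

	if (recvmsg(channel, &msg, 0) <= 0)
		return -1;
	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg))
		if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
			memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```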
> The notion that a Postgres session can be replaced by a lightweight
> object is just not workable IMHO; we've developed far too many
> features that require persistent state on the backend side.

:) Sure it is, hear me out. I never thought I'd blend the concepts from Apache and thttpd in a database, of all places. I do in my own webservers, but... well, it never even occurred to me to apply this to PostgreSQL.

> For applications that don't need those features (or, more
> realistically, want the same persistent state for all transactions
> they engage in), client-side connection pooling solves the problem.
> It seems very unlikely that apps that are too diverse to share a
> client-side pool would be able to share a backend session if only
> the connection mechanism were a bit different.

On my network, I have C progs, Ruby, Perl, PHP, and a few JDBC connections (*puke*) all competing for database resources, many inside of Apache and many outside of Apache in the form of agents. Believe me, nipping this problem at the libpq end of things is the way to go. Java's a lost cause in terms of wanting any performance, so I don't care if my JDBC users have to wait as long as they do now for a backend to fire up.

Here's how I've broken things down into phases, and what I'd like to do in each:

Phase I: Connection pooling

a) Tweak ServerLoop() and postmaster startup so that it has a realistic connection limit. On select(2) it's 32; on poll(2) it's the max number of FDs allowed per proc; and with kqueue(2)... well, the sky is the limit. This is all correctly bounded by a process's resource limits and the kernel's limits. I'm about 40% done with this: I've finished the connection pool and have provided generic wrappers around select(), poll(), and kqueue() (see the sketch at the end of this phase). The next thing I need to do is tweak ServerLoop() so that any connections in the idle connection pool are handed off to a backend. Handling of new connections isn't going to change right now.

b) Change invocations of pq_close() over to a new function, pq_handoff(), if a connection is marked as persistent. pq_handoff() passes the FD back to the postmaster (as sketched above) and then proceeds to die. pq_handoff() is only called when PQfinish() is called by the client. I need to make sure that the client sends something when it calls PQfinish(), but I'm 90% sure it must, having looked at just the backend code (Tom, could you give a 'yea' or 'nay' on this if my assertion is right?).

c) In this step, tweak libpq so that it's possible to mark a connection as persistent. A global mechanism will be available in the form of either an environment variable (LIBPQPERSIST) or a symlink file that gets readlink()'ed (ex: ln -s 'p=dbname' /etc/libpq.conf).

d) Ensure that a local UNIX socket is in use/alive in a protected area for the sake of passing FDs around. Sticking this in the $PGDATA/data directory would be wise, to prevent other users on a system from stealing FDs (which is pretty rare and requires massive hacker foo). See the send(2), sendto(2), and sendmsg(2) API for details.

e) #ifdef everything so that it won't ever work on Win32 and can be turned off/on at configure time. At this point, unless I've missed a feature that OpenSSL provides to aid with this, I'm pretty sure that connection passing will not work with SSL connections (for now), as you'd have to pass the connection's state back to the postmaster.

At this point, everything is well contained, and at the _very_ least persistent clients get to avoid a TCP connection setup/teardown. New connections get handled identically to the way they are now: only an extra bit of code checking whether there are any connections with data on them is added to the current flow of things. I may end up tweaking the way the backend handles listening for new connections, however, and may replace it with the abstracted bits above. kqueue()/poll() is just so much more efficient than select(), and when listening in a non-blocking way and bouncing back and forth between the two, it could amount to a real savings in the number of system calls and reduce connection startup latency for people on reasonably modern OSes.
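Since I mention the generic wrappers in a), here's the sort of interface they might expose. This is a sketch under my own naming, showing only a poll(2) back end; the select(2) and kqueue(2) variants would sit behind the same calls:

```c
/*
 * Sketch of a generic interface over select(2)/poll(2)/kqueue(2).
 * Hypothetical names; only the poll(2) back end is shown here.
 */
#include <poll.h>

typedef struct
{
	struct pollfd *fds;		/* idle persistent connections */
	int nfds;
} ConnPool;

/*
 * Wait up to timeout_ms for activity; return the index of a ready
 * FD (to be handed off to a backend), or -1 on timeout/error.
 */
static int
pool_wait(ConnPool *pool, int timeout_ms)
{
	int i;

	if (poll(pool->fds, pool->nfds, timeout_ms) <= 0)
		return -1;
	for (i = 0; i < pool->nfds; i++)
		if (pool->fds[i].revents & (POLLIN | POLLHUP))
			return i;
	return -1;
}
```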
Phase II: Preemptive backend pools

a) The postmaster gets schooled on pools of backend processes. I'm undecided about how to handle process pools, however. Part of me thinks that the backend should pre-init itself for a given database and wait for its FD to be passed to it for auth; by having it already init'ed for a given DB, startup times will drop further (see the sketch after this list). The problem is, how do you do this on DB servers with lots of different DBs? Some DBs (template1 comes to mind) should never have pools of procs waiting, but some should. I'd like to have this kind of config stuffed into a system catalog, actually, but I'm leery of doing so without guidance from someone with ueber knowledge of Pg's internals, which leads me to the alternative: have a bunch of procs waiting around, but not init'ed for any given DB. That's certainly the simpler approach and may be what I settle on for now. Opening the can of worms of sticking configuration bits in a system catalog isn't something I'm interested in playing with for the time being (though the idea is really appealing to me).

b) The BackendFork() code gets split up into a few pieces to handle not having a connection up front. Splitting it into two functions, BackendFork() and BackendInit(), will probably be sufficient.
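To make the pool idea concrete, here's roughly the shape it might take. Everything here is hypothetical (the struct, the names, the GUC-driven target); it just illustrates keeping pre-forked, pre-init'ed backends waiting for an FD:

```c
/*
 * Hypothetical sketch of a per-database backend pool: the postmaster
 * keeps `target` idle, pre-init'ed backends around and refills the
 * pool as FDs are handed out.  None of these names exist in Pg.
 */
#include <sys/types.h>
#include <unistd.h>

#define POOL_MAX 8

extern void backend_idle_loop(const char *dbname);	/* hypothetical */

typedef struct
{
	const char *dbname;		/* DB the idle backends are init'ed for */
	pid_t pids[POOL_MAX];	/* idle, pre-forked backends */
	int nidle;
	int target;				/* desired pool size, from a GUC */
} BackendPool;

static void
pool_refill(BackendPool *pool)
{
	while (pool->nidle < pool->target && pool->nidle < POOL_MAX)
	{
		pid_t pid = fork();

		if (pid == 0)
		{
			/*
			 * Child: init for pool->dbname, then block waiting for a
			 * client FD from the postmaster (e.g. via recv_fd() in
			 * the earlier sketch).
			 */
			backend_idle_loop(pool->dbname);
			_exit(0);
		}
		else if (pid > 0)
			pool->pids[pool->nidle++] = pid;
		else
			break;				/* fork() failed; try again later */
	}
}
```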
Phase III: Beautification

a) Clean things up so that SSL connections work with persistent connections. By far and away the most expensive part of SSL is the asymmetric key handling, and it'd be really great if persistent connections only had to worry about symmetric crypto, which is vastly cheaper.

b) Other cleanup that I'm sure Tom will point out along the way.

And that's about it. Phase I and Phase II could be done independently; Phase III I'm leaving as a misc catch-all. That's my analysis of what needs to be done. The connection pooling bit isn't that bad, but it's also the part that's the most straightforward and the bits that I'm quite familiar with. Phase II is a bit more murky, and I'll probably have a few questions when I get there. Comments?

The whole point of this is to be able to handle large numbers of connections and to reduce the startup time for each connection, if it's persistent, by having an already-established TCP connection as well as an already-fork()'ed backend (and hopefully one initialized for a given DB) waiting to serve an active connection.

-sc

--
Sean Chittenden