Re: Urgent: 10K or more connections - Mailing list pgsql-general

From Sean Chittenden
Subject Re: Urgent: 10K or more connections
Msg-id 20030719204713.GH24507@perrin.int.nxad.com
In response to Re: Urgent: 10K or more connections  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
> > it's very plausible to imagine a world where a backend hands an
> > idle connection back to the parent process for safe
> > keeping/process load balancing.
>
> And your current database, user authorization, prepared statements,
> SET values, cached plpgsql plans, etc etc go where exactly?

Nowhere; everything remains as is.  I actually think you'll
appreciate the simplicity of this once I'm done explaining how I'm
going about it.

I'm tweaking the way that ServerLoop(), pq_close()/proc_exit(), and
PQfinish() work so that the backend will pass the FD of the connection
back to the postmaster before dying.  Once the backend is dead/while
dying, the postmaster will fire up a new backend (or three, GUC
configurable) for the same database, but doesn't pass an FD to the new
backend until that FD is ready to do work.  fork(), in theory, is done
before a connection is initiated.  I'm hoping to move as much of the
backend initialization as possible to before the FD is passed to the
backend, so that the time between a client making a connection and a
backend being ready to serve the request is as small as possible.
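
For the curious, the FD handoff itself is nothing exotic: it's the
standard SCM_RIGHTS dance over a UNIX-domain socket.  A minimal
sketch (the function names are mine, not actual Pg code):

#include <string.h>
#include <sys/socket.h>

/* Backend side: hand the client's fd back to the postmaster over
 * the UNIX-socket channel. */
static int
send_fd(int channel, int fd)
{
    struct msghdr   msg;
    struct cmsghdr *cmsg;
    struct iovec    iov;
    char            cbuf[CMSG_SPACE(sizeof(int))];
    char            byte = 'F';     /* must send >= 1 byte of data */

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(channel, &msg, 0) < 0 ? -1 : 0;
}

/* Postmaster side: receive the fd into the idle pool. */
static int
recv_fd(int channel)
{
    struct msghdr   msg;
    struct cmsghdr *cmsg;
    struct iovec    iov;
    char            cbuf[CMSG_SPACE(sizeof(int))];
    char            byte;
    int             fd = -1;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);

    if (recvmsg(channel, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg != NULL && cmsg->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}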

I've broken this down into a few parts to make things more palatable,
though; see the end of the email for details.

> The notion that a Postgres session can be replaced by a lightweight
> object is just not workable IMHO; we've developed far too many
> features that require persistent state on the backend side.

:) Sure it is, hear me out.  I never thought I'd blend the concepts
from Apache and thttpd in a database, of all places.  I do in my own
webservers, but... well, it never even occurred to me to apply this to
PostgreSQL.

> For applications that don't need those features (or, more
> realistically, want the same persistent state for all transactions
> they engage in), client-side connection pooling solves the problem.
> It seems very unlikely that apps that are too diverse to share a
> client-side pool would be able to share a backend session if only
> the connection mechanism were a bit different.

On my network, I have C progs, Ruby, Perl, PHP, and a few JDBC
connections (*puke*) all competing for database resources, many inside
of Apache, many outside of Apache in the form of agents.  Believe me,
nipping this problem at the libpq end of things is the way to go.
Java's a lost cause in terms of wanting any performance, so I don't
care if my JDBC users have to wait as long as they do now for a
backend to fire up.


Here's how I've broken things down into phases, and what I'd like to
do in each phase:

Phase I: Connection pooling

      a) Tweak ServerLoop() and postmaster startup so that it has a
         realistic connection limit.  On select(2) it's 32, on
         poll(2) it's the max number of FDs allowed per proc, and
         with kqueue(2)... well, the sky is the limit.  This is all
         correctly bounded by a process's resource limits and the
         kernel's limits.  I'm about 40% done with this.  I've
         finished the connection pool and have provided generic
         wrappers around select(), poll(), and kqueue().  The next
         thing I need to do is tweak ServerLoop() so that any
         connections in the idle connection pool are handed off to a
         backend.  Handling of new connections isn't going to change
         right now.

      b) Change invocations of pq_close() over to a new function
         called pq_handoff() if a connection is marked as persistent.
         pq_handoff() passes the FD back to the postmaster, then
         proceeds to die.  pq_handoff() is only called when PQfinish()
         is called by the client.  I need to make sure that the client
         sends something when it calls PQfinish(), but I'm 90% sure it
         does, having looked at just the backend code (Tom, could you
         give a 'yea' or 'nay' on this if my assertion is right?).  In
         this step, tweak libpq so that it's possible to mark a
         connection as persistent.  A global mechanism will be
         available in the form of either an environment variable
         (LIBPQPERSIST) or a symlink file that gets readlink()'ed
         (ex: ln -s 'p=dbname' /etc/libpq.conf); see the sketch just
         after this list.

      c) Ensure that a local UNIX socket is in use/alive in a
         protected area for the sake of passing FDs around.  Sticking
         this in the data directory ($PGDATA) would be wise, to
         prevent other users on a system from stealing FDs (which is
         pretty rare and requires massive hacker foo).  See the
         send(2), sendto(2), and sendmsg(2) APIs for details; the
         SCM_RIGHTS mechanism is sketched earlier in this email.

      d) #ifdef everything so that it won't ever work on Win32 and
         can be turned on/off at configure time.  At this point,
         unless I've missed a feature that OpenSSL provides to aid
         with this, I'm pretty sure that connection passing will not
         work with SSL connections (for now), as you'd have to pass
         the connection's SSL state back to the postmaster.
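
Here's the sketch promised in (b): the persistence check on the libpq
side.  LIBPQPERSIST and /etc/libpq.conf are from the example above;
the function name and the 'p=' parsing are just illustrative:

#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* The environment variable wins; otherwise the target of the
 * /etc/libpq.conf symlink is consulted. */
static int
connection_is_persistent(const char *dbname)
{
    char        target[1024];
    ssize_t     len;

    if (getenv("LIBPQPERSIST") != NULL)
        return 1;

    len = readlink("/etc/libpq.conf", target, sizeof(target) - 1);
    if (len < 0)
        return 0;                   /* no symlink, no persistence */
    target[len] = '\0';

    /* e.g. "p=dbname" marks dbname's connections as persistent */
    return strncmp(target, "p=", 2) == 0 &&
           strcmp(target + 2, dbname) == 0;
}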

At this point, everything is well contained and at the _very_ least
persistent clients get to avoid a TCP connection setup/teardown.  New
connections get handled exactly as they are now: the only addition to
the current flow of things is an extra bit of code that checks whether
any pooled connections have data on them.  I may end up tweaking the
way the postmaster handles listening for new connections, however, and
may replace it with the abstracted bits above.  kqueue()/poll() is
just sooo much more efficient than select(), and when listening in a
non-blocking way and bouncing back and forth between the two, it could
amount to a real savings in the number of system calls and reduce
connection startup latency for people on reasonably modern OSes.
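
To give an idea of what the abstracted bits look like, here's a
stripped-down version of the kind of wrapper I mean.  Only the poll(2)
branch is shown (select() and kqueue() would sit behind the same
interface), and the name and fixed-size array are just for
illustration; the real wrappers size themselves from the resource
limits mentioned above:

#include <poll.h>

/* Wait for any of nfds descriptors to become readable; returns the
 * index of the first ready descriptor, or -1 on error/timeout. */
static int
pool_wait_readable(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd pfds[256];
    int     i;

    if (nfds > 256)
        nfds = 256;
    for (i = 0; i < nfds; i++)
    {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, nfds, timeout_ms) <= 0)
        return -1;
    for (i = 0; i < nfds; i++)
        if (pfds[i].revents & POLLIN)
            return i;
    return -1;
}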

Phase II: Preemptive backend pools

      a) The postmaster gets schooled on pools of backend processes.
         I'm undecided about how to handle process pools, however.
         Part of me thinks that the backend should pre-init itself
         for a given database and wait for an FD to be passed to it
         for auth.  By having it already init'ed for a given DB,
         startup times will drop further.  Problem is, how do you do
         this on DB servers with lots of different DBs?  Some of the
         DBs (template1 comes to mind) should never have pools of
         procs waiting, but some should.  I'd like to have this kind
         of config stuffed into the backend in a system catalog,
         actually, but I'm leery of doing so without guidance from
         someone with ueber knowledge of Pg's internals, which leads
         me to the alternative: have a bunch of procs waiting around,
         but not init'ed to any given DB.  That's certainly the
         simpler approach and may be what I settle on for now (one
         possible shape for the bookkeeping is sketched after this
         list).  Opening the can of worms of sticking configuration
         bits in a system catalog isn't something I'm interested in
         playing with for the time being (though the idea is really
         appealing to me).

      b) BackendFork() code gets split up into a few pieces to handle
         not having a connection up front.  Splitting it into two
         functions, BackendFork() and BackendInit(), will probably be
         sufficient.
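
For what it's worth, here's one possible shape for the pool
bookkeeping from (a).  This is purely illustrative; I haven't settled
on the design (and certainly not on a system catalog), and none of
these names exist in Pg:

#include <sys/types.h>

typedef struct BackendPool
{
    char    dbname[64];      /* DB the spares are init'ed for, or ""
                              * for the not-init'ed-to-any-DB variant */
    int     min_spare;       /* GUC: backends to keep forked & waiting */
    int     nspare;          /* spares currently idle */
    pid_t   spare_pid[32];   /* their pids */
    int     channel[32];     /* UNIX-socket channel to each spare, for
                              * handing over a client FD (send_fd()) */
} BackendPool;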

Phase III: Beautification

      a) Clean things up so that SSL connections work with persistent
         connections.  By far the most expensive part of an SSL
         connection is the asymmetric key handling, and it'd be
         really great if persistent connections only had to worry
         about symmetric crypto, which is vastly cheaper.  See the
         sketch after this list.

      b) Other cleanup that I'm sure Tom will point out along the way.
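
The sketch mentioned in (a): stock OpenSSL session reuse on the
client side, where the session from the first handshake is cached and
offered on reconnect so the server can skip the asymmetric key
exchange.  Error handling and the libpq plumbing are omitted, and the
function name is mine:

#include <openssl/ssl.h>

static SSL_SESSION *saved_session = NULL;

static int
ssl_connect_with_reuse(SSL *ssl)
{
    if (saved_session != NULL)
        SSL_set_session(ssl, saved_session);    /* offer cached session */

    if (SSL_connect(ssl) != 1)
        return -1;

    if (saved_session == NULL)
        saved_session = SSL_get1_session(ssl);  /* cache for next time */
    return 0;
}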

And that's about it.  Phase I and Phase II could be done
independently.  Phase III I'm leaving as a misc catch-all.  That's my
analysis of what needs to be done.  The connection pooling bit isn't
that bad, but it's also the part that's the most straightforward and
the bits that I'm quite familiar with.  Phase II is a bit murkier, and
I'll probably have a few questions when I get there.

Comments?  The whole point of this is to be able to handle large
numbers of connections and to reduce the startup time for each
persistent connection by having an already established TCP connection,
as well as an already fork()'ed backend (and hopefully one already
initialized for a given DB), waiting to serve an active connection.

-sc

--
Sean Chittenden
