Thread: Urgent: 10K or more connections

Urgent: 10K or more connections

From
Francois Suter
Date:
Hi all,

I have received a question via the Advocacy site and I am not knowledgeable
enough to answer. Can you help?

The question is: can PostgreSQL handle between 10'000 and 40'000
simultaneous connections? The persone asking the question has to choose
between Oracle and PostgreSQL, and my guess is that they would be relieved
if they could go with PostgreSQL.

Do you have any additional advice I could transmit to this person about
handling that many connections. I'm sure any help we can provide will be an
additional selling point.

Thanks.

--------
Francois

Home page: http://www.monpetitcoin.com/
"We waste our time spending money we don't have to buy things we don't need
to impress people we don't like"


Re: Urgent: 10K or more connections

From
"scott.marlowe"
Date:
On Fri, 18 Jul 2003, Francois Suter wrote:

> Hi all,
>
> I have received a question via the Advocacy site and I am not knowledgeable
> enough to answer. Can you help?
>
> The question is: can PostgreSQL handle between 10'000 and 40'000
> simultaneous connections? The persone asking the question has to choose
> between Oracle and PostgreSQL, and my guess is that they would be relieved
> if they could go with PostgreSQL.
>
> Do you have any additional advice I could transmit to this person about
> handling that many connections. I'm sure any help we can provide will be an
> additional selling point.

Wow!  That's quite a few connections.  I would say that 10,000 connections
is a lot for ANY database to hold open.

Can this person use connection pooling?  Or do they need an actual 10,000
parallel accesses to get things done?  If they can't use connection
pooling, or connection pooling only gets them down to 10k to 40k
connections, then that's a huge system.  I wouldn't run something that big
on Oracle or Postgresql, I'd use a farm of mainframes running something
like TPF like the airlines do.


Re: Urgent: 10K or more connections

From
Doug McNaught
Date:
Francois Suter <dba@paragraf.ch> writes:

> Hi all,
>
> I have received a question via the Advocacy site and I am not knowledgeable
> enough to answer. Can you help?
>
> The question is: can PostgreSQL handle between 10'000 and 40'000
> simultaneous connections? The persone asking the question has to choose
> between Oracle and PostgreSQL, and my guess is that they would be relieved
> if they could go with PostgreSQL.

On a big enough system, sure.  Each PG connection backend is a
separate process, so you'd need to make sure the process table was big
enough, open file and shared memory limits set high, etc.  You'd want
a really big machine, hopefully 64-bit like a Sparc or IA64, with lots
of memory.  But you'd want that for Oracle, too.

You'd definitely want to spend a lot of time tuning and testing for
that activity level, but again, you'd do that for Oracle too.

-Doug

Re: Urgent: 10K or more connections

From
"scott.marlowe"
Date:
On 18 Jul 2003, Doug McNaught wrote:

> Francois Suter <dba@paragraf.ch> writes:
>
> > Hi all,
> >
> > I have received a question via the Advocacy site and I am not knowledgeable
> > enough to answer. Can you help?
> >
> > The question is: can PostgreSQL handle between 10'000 and 40'000
> > simultaneous connections? The persone asking the question has to choose
> > between Oracle and PostgreSQL, and my guess is that they would be relieved
> > if they could go with PostgreSQL.
>
> On a big enough system, sure.  Each PG connection backend is a
> separate process, so you'd need to make sure the process table was big
> enough, open file and shared memory limits set high, etc.  You'd want
> a really big machine, hopefully 64-bit like a Sparc or IA64, with lots
> of memory.  But you'd want that for Oracle, too.
>
> You'd definitely want to spend a lot of time tuning and testing for
> that activity level, but again, you'd do that for Oracle too.

I'm gonna go out on a limb and guess that if you want 10k concurrent
connections, you're likely gonna be spending some time here on the list
getting postgresql to perform in that environment.  I.e. little
inefficiencies in shared memory access and IPC are gonna cause this to
crawl even on a Sun E10k with 64 CPUs and 64 gigs of ram.

But I'm sure that with a few tweaks to the code here and there it's
doable, just don't expect it to work "out of the box".


Re: Urgent: 10K or more connections

From
Sean Chittenden
Date:
> I have received a question via the Advocacy site and I am not
> knowledgeable enough to answer. Can you help?
>
> The question is: can PostgreSQL handle between 10'000 and 40'000
> simultaneous connections? The persone asking the question has to
> choose between Oracle and PostgreSQL, and my guess is that they
> would be relieved if they could go with PostgreSQL.
>
> Do you have any additional advice I could transmit to this person
> about handling that many connections. I'm sure any help we can
> provide will be an additional selling point.

Actually, this begs the question: are there any "reverse DB" proxy
servers around that people have used?  Having a reverse libpq proxy
server would _rock_.  Some light weight multi-threaded proxy that
relays active connections to the backend and holds idle connections
more efficiently than PostgreSQL... well... it'd be a life saver in
sooooo many situations.  Granted it'd have its short comings
(connections would persist to the backend along transactions, once
committed, the front end would "detatch" from the backend that it was
using), but this is achitecturally similar to what MS and ORA do to
handle gazillions of connections to a database that in reality, can
only handle a few hundred (maybe a thousand or two) active
connections.

-sc

--
Sean Chittenden

Re: Urgent: 10K or more connections

From
"scott.marlowe"
Date:
On Fri, 18 Jul 2003, Sean Chittenden wrote:

> > I have received a question via the Advocacy site and I am not
> > knowledgeable enough to answer. Can you help?
> >
> > The question is: can PostgreSQL handle between 10'000 and 40'000
> > simultaneous connections? The persone asking the question has to
> > choose between Oracle and PostgreSQL, and my guess is that they
> > would be relieved if they could go with PostgreSQL.
> >
> > Do you have any additional advice I could transmit to this person
> > about handling that many connections. I'm sure any help we can
> > provide will be an additional selling point.
>
> Actually, this begs the question: are there any "reverse DB" proxy
> servers around that people have used?  Having a reverse libpq proxy
> server would _rock_.  Some light weight multi-threaded proxy that
> relays active connections to the backend and holds idle connections
> more efficiently than PostgreSQL... well... it'd be a life saver in
> sooooo many situations.  Granted it'd have its short comings
> (connections would persist to the backend along transactions, once
> committed, the front end would "detatch" from the backend that it was
> using), but this is achitecturally similar to what MS and ORA do to
> handle gazillions of connections to a database that in reality, can
> only handle a few hundred (maybe a thousand or two) active
> connections.

I thin usogres does this.  not sure though, I haven't played with it, just
heard of it.


Re: Urgent: 10K or more connections

From
Sean Chittenden
Date:
> > > I have received a question via the Advocacy site and I am not
> > > knowledgeable enough to answer. Can you help?
> > >
> > > The question is: can PostgreSQL handle between 10'000 and 40'000
> > > simultaneous connections? The persone asking the question has to
> > > choose between Oracle and PostgreSQL, and my guess is that they
> > > would be relieved if they could go with PostgreSQL.
> > >
> > > Do you have any additional advice I could transmit to this person
> > > about handling that many connections. I'm sure any help we can
> > > provide will be an additional selling point.
> >
> > Actually, this begs the question: are there any "reverse DB" proxy
> > servers around that people have used?  Having a reverse libpq proxy
> > server would _rock_.  Some light weight multi-threaded proxy that
> > relays active connections to the backend and holds idle connections
> > more efficiently than PostgreSQL... well... it'd be a life saver in
> > sooooo many situations.  Granted it'd have its short comings
> > (connections would persist to the backend along transactions, once
> > committed, the front end would "detatch" from the backend that it was
> > using), but this is achitecturally similar to what MS and ORA do to
> > handle gazillions of connections to a database that in reality, can
> > only handle a few hundred (maybe a thousand or two) active
> > connections.
>
> I thin usogres does this.  not sure though, I haven't played with it, just
> heard of it.

Hrm...

Usogres

 Usogres is Synchronizing Option for postGRESql.  That's the meaning
 of name for this system.

What is Usogres?

 Real-time Backup Utility Usogres is the system which executes
 database duplication for PostgreSQL(http://www.postgresql.org/) which
 replication wasn't realized now.  It is fundamentally different from
 replication even if it is said that it executes database duplication.
 Persistently, it is defined to be making the same database in
 real-time.


Looks like it's closer to a replication solution to me, but you may be
able to stick one of these infront of PostgreSQL and have it handle
higher numbers of connections.  ::shrug:: I may have to check it out
and see... -sc

--
Sean Chittenden

Re: Urgent: 10K or more connections

From
Gianni Mariani
Date:
Sean Chittenden wrote:

>>I have received a question via the Advocacy site and I am not
>>knowledgeable enough to answer. Can you help?
>>
>>The question is: can PostgreSQL handle between 10'000 and 40'000
>>simultaneous connections? The persone asking the question has to
>>choose between Oracle and PostgreSQL, and my guess is that they
>>would be relieved if they could go with PostgreSQL.
>>
>>Do you have any additional advice I could transmit to this person
>>about handling that many connections. I'm sure any help we can
>>provide will be an additional selling point.
>>
>>
>
>Actually, this begs the question: are there any "reverse DB" proxy
>servers around that people have used?  Having a reverse libpq proxy
>server would _rock_.  Some light weight multi-threaded proxy that
>relays active connections to the backend and holds idle connections
>more efficiently than PostgreSQL... well... it'd be a life saver in
>sooooo many situations.  Granted it'd have its short comings
>(connections would persist to the backend along transactions, once
>committed, the front end would "detatch" from the backend that it was
>using), but this is achitecturally similar to what MS and ORA do to
>handle gazillions of connections to a database that in reality, can
>only handle a few hundred (maybe a thousand or two) active
>connections.
>
>

There are 1000's of references to postgresql and connection pooling.

http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pooling+postgresql

Maybe somthing there will work.




Re: Urgent: 10K or more connections

From
Sean Chittenden
Date:
> There are 1000's of references to postgresql and connection pooling.
>
> http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pooling+postgresql
>
> Maybe somthing there will work.

Those are all application level connection pooling links.  I'm
thinking about something that's done on the database side like ORA
Listener and passes active connections back to the backend that way
it's completely transparent and applies to every libpq app regardless
of the language, application, etc.

-sc

--
Sean Chittenden

Re: Urgent: 10K or more connections

From
Tom Lane
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> But I'm sure that with a few tweaks to the code here and there it's
> doable, just don't expect it to work "out of the box".

I think you'd be sticking your neck out to assume that 10k concurrent
connections would perform well, even after tweaking.  I'd worry first
about whether the OS can handle 10k processes (which among other things
would probably require order-of-300k open file descriptors...).  Maybe
Solaris is built to do that but the Unixen I've dealt with would go
belly up.  After that you'd have to look at Postgres' internal issues
--- contention on access to the PROC array would probably become a
significant factor, for example, and we'd have to do some redesign to
avoid linear scans of the PROC array where possible.

I don't doubt that we could support 10k concurrent *users*, given
connection pooling of some kind.  I'm dubious about 10k concurrent
database sessions though.

            regards, tom lane

Re: Urgent: 10K or more connections

From
Jeff Davis
Date:
On Friday 18 July 2003 01:28 pm, Sean Chittenden wrote:
> > There are 1000's of references to postgresql and connection pooling.
> >
> > http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pooling+postgresql
> >
> > Maybe somthing there will work.
>
> Those are all application level connection pooling links.  I'm
> thinking about something that's done on the database side like ORA
> Listener and passes active connections back to the backend that way
> it's completely transparent and applies to every libpq app regardless
> of the language, application, etc.
>
> -sc

Perhaps this is the answer? I've never used it, but I remember seeing ti on
freshmeat.net.

http://sqlrelay.sourceforge.net/

Regards,
    Jeff Davis



Re: Urgent: 10K or more connections

From
Sean Chittenden
Date:
> > > There are 1000's of references to postgresql and connection pooling.
> > >
> > > http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pooling+postgresql
> > >
> > > Maybe somthing there will work.
> >
> > Those are all application level connection pooling links.  I'm
> > thinking about something that's done on the database side like ORA
> > Listener and passes active connections back to the backend that way
> > it's completely transparent and applies to every libpq app regardless
> > of the language, application, etc.
> >
> > -sc
>
> Perhaps this is the answer? I've never used it, but I remember
> seeing ti on freshmeat.net.
>
> http://sqlrelay.sourceforge.net/

Very cool... too bad rudiments (prerequisit) doesn't compile with
newer versions of GCC and this requires you to develop your apps using
sqlrelay's API and not libpq.... though it's better than nothing.  :)
-sc

--
Sean Chittenden

Re: Urgent: 10K or more connections

From
Kris Jurka
Date:

On Fri, 18 Jul 2003, Tom Lane wrote:

> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > But I'm sure that with a few tweaks to the code here and there it's
> > doable, just don't expect it to work "out of the box".
>
> I think you'd be sticking your neck out to assume that 10k concurrent
> connections would perform well, even after tweaking.  I'd worry first
> about whether the OS can handle 10k processes (which among other things
> would probably require order-of-300k open file descriptors...).  Maybe
> Solaris is built to do that but the Unixen I've dealt with would go
> belly up.  After that you'd have to look at Postgres' internal issues
> --- contention on access to the PROC array would probably become a
> significant factor, for example, and we'd have to do some redesign to
> avoid linear scans of the PROC array where possible.
>

This page describes all the problems and strategies a web server would use
to handle 10k concurrent connections.  This is the kind of thing that can
bring an otherwise performant OS to it's knees.  And this is just to grab
some data off disk and shovel it out over HTTP, consider how much more
work a database must do.

http://www.kegel.com/c10k.html

Kris Jurka


Re: Urgent: 10K or more connections

From
Greg Stark
Date:
Sean Chittenden <sean@chittenden.org> writes:

> Some light weight multi-threaded proxy that
> relays active connections to the backend and holds idle connections
> more efficiently than PostgreSQL...

What excuse is there for postgres connections being heavyweight to begin with?
The only real resource they ought to represent is a single TCP connection.
Servers that manage 10,000 TCP connections are a dime a dozen these days.

Any database context that has to be stored for the connection, the state of
binary/text or autocommit mode or whatever, will have to be maintained by any
pooling interface anyways. And I think both of those examples are now much
cleaner more or less stateless per-request flags anyways.

Basically what I'm asking is, hypothetically, if postgres were implemented
using threads instead of processes, are there any per-connection resources
that really couldn't be completely disposed of when the connection was
completely idle between (ie at the start of) transactions?

Ideally if every per-connection resource could be completely disposed of
whenever the connection was completely idle then you wouldn't need a whole
extra layer for the communication to traverse and a whole extra layer of
complexity for the protocol semantics to be maintained. A multithreaded server
could easily handle 10k-40k mostly idle connections without any unusual
resource needs.


--
greg

Re: Urgent: 10K or more connections

From
Sean Chittenden
Date:
> > > But I'm sure that with a few tweaks to the code here and there
> > > it's doable, just don't expect it to work "out of the box".
> >
> > I think you'd be sticking your neck out to assume that 10k
> > concurrent connections would perform well, even after tweaking.
> > I'd worry first about whether the OS can handle 10k processes
> > (which among other things would probably require order-of-300k
> > open file descriptors...).  Maybe Solaris is built to do that but
> > the Unixen I've dealt with would go belly up.  After that you'd
> > have to look at Postgres' internal issues --- contention on access
> > to the PROC array would probably become a significant factor, for
> > example, and we'd have to do some redesign to avoid linear scans
> > of the PROC array where possible.
>
> This page describes all the problems and strategies a web server
> would use to handle 10k concurrent connections.  This is the kind of
> thing that can bring an otherwise performant OS to it's knees.  And
> this is just to grab some data off disk and shovel it out over HTTP,
> consider how much more work a database must do.
>
> http://www.kegel.com/c10k.html

*lightning strikes/apple falls on head*

Whoa!  It's tough, sure, but _far_ from impossible.  My bread and
butter is big web __fill_in_the_blank__ and I routinely handle ~60K to
some of my web server _instances_.  Kegel's page and analysis are a
bit dated and most of it's still true and applicable.  Using kqueue(2)
on FreeBSD, it's pretty easy to have bazillions of concurrent
connections and maintain low latency rates when identifying processes
that are ready to be worked on (avoiding select(2) is _required_ to
get above a few thousand).  On Linux or Slowaris, poll(2) can be
substituted for kqueue(2) and on other OSes that are less fortunate,
select(2) will suffice and no one would be the wiser (except for in
the scalability dept.).

With OSes that allow passing of FD's between existing processes (iirc
PostgreSQL fork()'s with the connection, it doesn't pass FD's around)
and making use of a given platform's alternatives to select(2), it's
very plausible to imagine a world where a backend hands an idle
connection back to the parent process for safe keeping/process load
balancing.  Arguably, that function call should've been added to the
most recent libpq(3) update that way the feature could be added to the
backend and libs wouldn't have to be updated in order for the feature
to be available.

Now that I think about it, this is significantly easier to accomplish
than adding mmap(2) to the backend (~500-1,000 lines of code) ...  To
preserve ABI compatibility, a persistent flag would likely be set to a
PGconn (ex: PQpersistConnection(PGconn *conn, bool value)) or an
environment variable/symlink file (ex: /etc/malloc.conf) could be used
to globally apply this to all libpq apps.  When an app is done with a
backend, the backend calls PQfinish() as per normal, except instead of
closing the connection, it would send a message to the backend
informing it that the client is done with the connection for now.  On
the DB, the postmaster would kill off the given backend, and launch X
number of idle backends for the given database that the now idle
connection thinks its connected to.  In doing this, when the
connection gets used again and the most expensive parts of the DB
connection process is already done (fork()ing, populating the backend,
and if using SSL, going through the asymmetric crypto routines -
they're killer on machines without off a hardware backed /dev/crypto).
For web farms with many diverse applications that can't make use of a
database connection pooling app, this'd be dynamite.  When a process
dies, the connection would finally close in reality and the backend
postmaster would reap the connection.

Using kqueue(2), it's very plausible that a PostgreSQL instance could
handle about ~50K idle connections.  poll(2) could do probably about
~10K connections without any real degradation in performance.  Those
stuck with select(2), you'd be lucky to get beyond a few hundred idle.
If you're unfortunate enough to be apart of the Win32 crowd...
*smack* why are you running a high volume DB server on Win32?  You
should know better.


This is hugely valuable to me now...  hrm, guess I have my new summer
project.

-sc

--
Sean Chittenden

Re: Urgent: 10K or more connections

From
Sean Chittenden
Date:
> > Some light weight multi-threaded proxy that relays active
> > connections to the backend and holds idle connections more
> > efficiently than PostgreSQL...
>
> What excuse is there for postgres connections being heavyweight to
> begin with?  The only real resource they ought to represent is a
> single TCP connection.  Servers that manage 10,000 TCP connections
> are a dime a dozen these days.
>
> Any database context that has to be stored for the connection, the
> state of binary/text or autocommit mode or whatever, will have to be
> maintained by any pooling interface anyways. And I think both of
> those examples are now much cleaner more or less stateless
> per-request flags anyways.
>
> Basically what I'm asking is, hypothetically, if postgres were
> implemented using threads instead of processes, are there any
> per-connection resources that really couldn't be completely disposed
> of when the connection was completely idle between (ie at the start
> of) transactions?
>
> Ideally if every per-connection resource could be completely
> disposed of whenever the connection was completely idle then you
> wouldn't need a whole extra layer for the communication to traverse
> and a whole extra layer of complexity for the protocol semantics to
> be maintained. A multithreaded server could easily handle 10k-40k
> mostly idle connections without any unusual resource needs.

PostgreSQL will never be single proc, multi-threaded, and I don't
think it should be for reliability's sake.  See my above post,
however, as I think I may have a better way to handle "lots of
connections" without using threads.  -sc

--
Sean Chittenden

Re: Urgent: 10K or more connections

From
Gianni Mariani
Date:
Sean Chittenden wrote:

>PostgreSQL will never be single proc, multi-threaded, and I don't
>think it should be for reliability's sake.  See my above post,
>however, as I think I may have a better way to handle "lots of
>connections" without using threads.  -sc
>

never is a VERY long time ...  Also, the single proc/multiple proc thing
does not have to be exclusive.  Meaning you could "tune" the system so
that it could do either.

I have developed a single process server that handled thousands of
connections.  I've also developed a single process database (a while
back) that handled multiple connections but I'm not sure I would do it
the "hard" way again as the cost of writing the code for keeping context
was not insignificant, although there are much better ways of doing it
than how I did it 15 years ago.

What you talk about is very fundamental and I would love to have another
go at it ....  however you're right that this won't happen any time
soon.  Connection pooling is a fundamentally flawed way of overcoming
this problem.  A different design could render a significantly higher
feasable connection count.

G



Re: Urgent: 10K or more connections

From
Sean Chittenden
Date:
> >PostgreSQL will never be single proc, multi-threaded, and I don't
> >think it should be for reliability's sake.  See my above post,
> >however, as I think I may have a better way to handle "lots of
> >connections" without using threads.  -sc
>
> never is a VERY long time ...  Also, the single proc/multiple proc
> thing does not have to be exclusive.  Meaning you could "tune" the
> system so that it could do either.

True.  This topic has come up a zillion times in the past though.  The
memory segmentation and reliability that independent processes give
you is huge and the biggest reason why _if_ PostgreSQL does
spontaneously wedge itself (like MySQL does all too often), you're
only having to cope with a single DB connection being corrupt,
invalid, etc.  Imagine a threaded model where the process was horked
and you loose 1000 connections worth of data in a SEGV.  *shudder*
Unix is reliable at the cost of memory segmentation... something that
I dearly believe in.  If that weren't worth anything, then I'd run
everything in kernel and avoid the context switching, which is pretty
expensive.

> I have developed a single process server that handled thousands of
> connections.  I've also developed a single process database (a while
> back) that handled multiple connections but I'm not sure I would do
> it the "hard" way again as the cost of writing the code for keeping
> context was not insignificant, although there are much better ways
> of doing it than how I did it 15 years ago.

Not saying it's not possible, just that at this point, reliability is
more paramount than handling additional connections.  With copy on
write VM's being abundant these days, a lot of the size that you see
with PostgreSQL is shared.  Memory profiling and increasing the number
of read only pages would be an extremely interesting exercise that
could yield some slick results in terms of reducing the memory foot
print of PG's children.

> What you talk about is very fundamental and I would love to have
> another go at it ....  however you're right that this won't happen
> any time soon.  Connection pooling is a fundamentally flawed way of
> overcoming this problem.  A different design could render a
> significantly higher feasable connection count.

Surprisingly, it's not that complex at least handling a large number
of FDs and figuring out which ones have data on them and need to be
passed to a backend.  I'm actually using the model for monitoring FD's
from thttpd and reapplying bits where appropriate.  It's abstraction
of kqueue()/poll()/select() is nice enough to not want to reinvent the
wheel (same with its license).  Hopefully ripping through the incoming
data and figuring out which backend pool to send a connection to won't
be that bad, but I have next to no experience with writing that kind
of code and my Stevens is hidden away in one of 23 boxes from a move
earlier this month.  I only know that Apache 1.3 does this with
obviously huge success on basically every *nix so it can't be too
hard.

-sc

--
Sean Chittenden

Re: Urgent: 10K or more connections

From
Tom Lane
Date:
Sean Chittenden <sean@chittenden.org> writes:
> it's very plausible to imagine a world where a backend hands an idle
> connection back to the parent process for safe keeping/process load
> balancing.

And your current database, user authorization, prepared statements,
SET values, cached plpgsql plans, etc etc go where exactly?

The notion that a Postgres session can be replaced by a lightweight
object is just not workable IMHO; we've developed far too many features
that require persistent state on the backend side.

For applications that don't need those features (or, more realistically,
want the same persistent state for all transactions they engage in),
client-side connection pooling solves the problem.  It seems very
unlikely that apps that are too diverse to share a client-side pool
would be able to share a backend session if only the connection
mechanism were a bit different.

            regards, tom lane

Re: Urgent: 10K or more connections

From
Gianni Mariani
Date:
Sean Chittenden wrote:

>>>PostgreSQL will never be single proc, multi-threaded, and I don't
>>>think it should be for reliability's sake.  See my above post,
>>>however, as I think I may have a better way to handle "lots of
>>>connections" without using threads.  -sc
>>>
>>>
>>never is a VERY long time ...  Also, the single proc/multiple proc
>>thing does not have to be exclusive.  Meaning you could "tune" the
>>system so that it could do either.
>>
>>
>
>True.  This topic has come up a zillion times in the past though.  The
>memory segmentation and reliability that independent processes give
>you is huge and the biggest reason why _if_ PostgreSQL does
>spontaneously wedge itself (like MySQL does all too often), you're
>only having to cope with a single DB connection being corrupt,
>invalid, etc.  Imagine a threaded model where the process was horked
>and you loose 1000 connections worth of data in a SEGV.  *shudder*
>Unix is reliable at the cost of memory segmentation... something that
>I dearly believe in.  If that weren't worth anything, then I'd run
>everything in kernel and avoid the context switching, which is pretty
>expensive.
>
>
Yep, but if you design it right, you can have both.  A rare occasion
where you can have the cake and eat it too.

>>I have developed a single process server that handled thousands of
>>connections.  I've also developed a single process database (a while
>>back) that handled multiple connections but I'm not sure I would do
>>it the "hard" way again as the cost of writing the code for keeping
>>context was not insignificant, although there are much better ways
>>of doing it than how I did it 15 years ago.
>>
>>
>
>Not saying it's not possible, just that at this point, reliability is
>more paramount than handling additional connections.  With copy on
>write VM's being abundant these days, a lot of the size that you see
>with PostgreSQL is shared.  Memory profiling and increasing the number
>of read only pages would be an extremely interesting exercise that
>could yield some slick results in terms of reducing the memory foot
>print of PG's children.
>
>
Context switching and cache thrashing are the killers in a multiple
process model.  There is a 6-10x performance penalty for running in
separate processes vs running in a single process (and single thread)
which I observed when doing benchmarking on a streaming server.  Perhaps
a better scheduler (like the O(1) scheduler in Linux 2.6.* would improve
that but I just don't know.

>>What you talk about is very fundamental and I would love to have
>>another go at it ....  however you're right that this won't happen
>>any time soon.  Connection pooling is a fundamentally flawed way of
>>overcoming this problem.  A different design could render a
>>significantly higher feasable connection count.
>>
>>
>
>Surprisingly, it's not that complex at least handling a large number
>of FDs and figuring out which ones have data on them and need to be
>passed to a backend.  I'm actually using the model for monitoring FD's
>from thttpd and reapplying bits where appropriate.  It's abstraction
>of kqueue()/poll()/select() is nice enough to not want to reinvent the
>wheel (same with its license).  Hopefully ripping through the incoming
>data and figuring out which backend pool to send a connection to won't
>be that bad, but I have next to no experience with writing that kind
>of code and my Stevens is hidden away in one of 23 boxes from a move
>earlier this month.  I only know that Apache 1.3 does this with
>obviously huge success on basically every *nix so it can't be too
>hard.
>
>
No epoll ?




Re: Urgent: 10K or more connections

From
Sean Chittenden
Date:
> > it's very plausible to imagine a world where a backend hands an
> > idle connection back to the parent process for safe
> > keeping/process load balancing.
>
> And your current database, user authorization, prepared statements,
> SET values, cached plpgsql plans, etc etc go where exactly?

No where, everything remains as is.  I actually think you'll
appreciate the simplicity of this once I'm done explaining how I'm
going about this.

I'm tweaking the way that ServerLoop(), pq_close()/proc_exit(), and
PQfinish() work so that the backend will pass the FD of the connection
back to the postmaster before dying.  Once the backend is dead/while
dying, the postmaster will fire up a new backend (or three, GUC
configurable) of the same database, but doesn't pass the FD to the new
backend until an FD is ready to do work.  fork(), in theory, is done
before a connection is initiated.  I'm hoping to move as much of the
backend initialization to happen before the FD is passed to the
backend that way the time between a client making a connection and a
backend being ready to serve the request is as small as possible.

I've broken this down into a few parts to make things more palatable
though, see the end of the email for details.

> The notion that a Postgres session can be replaced by a lightweight
> object is just not workable IMHO; we've developed far too many
> features that require persistent state on the backend side.

:) Sure it is, hear me out.  I never thought I'd blend the concepts
from Apache and thttpd in a database, of all places.  I do in my own
webservers, but... well, it never even occurred to me to apply this to
PostgreSQL.

> For applications that don't need those features (or, more
> realistically, want the same persistent state for all transactions
> they engage in), client-side connection pooling solves the problem.
> It seems very unlikely that apps that are too diverse to share a
> client-side pool would be able to share a backend session if only
> the connection mechanism were a bit different.

On my network, I have C progs, Ruby, Perl, PHP, a few JDBC connections
(*puke*) all competing for database resources, many inside of Apache,
many outside of Apache in the form of agents.  Believe me, nipping
this problem at the libpq end of things is the way to go.  Java's a
lost cause in terms of wanting any performance, so I don't care if my
JDBC users have to wait as long as they are now for a backend to fire
up.


The way that I've broken things down into phases.  Here's what I'd
like to do in each phase:

Phase I: Connection pooling

      a) Tweak ServerLoop() and postmaster startup so that it has a
     realistic connection.  On select(2) it's 32, on poll(2) it's
     the max number of FD's allowed per proc, and
     kqueue(2)... well, the sky is the limit.  This is all
     correctly bounded by a process's resource limits and the
     kernel's limits.  I'm about 40% done with this.  I've
     finished the connection pool and have provided generic
     wrappers around select(), poll(), and kqueue().  The next
     thing I need to do is tweak ServerLoop() so that any
     connections in the idle connection pool are handed off to a
     backend.  Handling of new connections isn't going to change
     right now.

      b) Change invocations of pq_close() over to a new function
         called pq_handoff() if a connection is marked as persistent.
         pq_handoff() passes the FD back to the postmaster then
         proceeds to die.  pq_handoff() is only called when PQfinish()
         is called by the client.  I need to make sure that the client
         sends something when it calls PQfinish(), but I'm 90% sure it
         must having looked at just the backend code (Tom, could you
         give a 'yeah or 'neah on this if my assertion is right?).  In
         this step, tweak libpq() so that it's possible to mark a
         connection as persistent.  A global mechanism will be
         available in the form of both an environment variable
         (LIBPQPERSIST) or a symlink file that get's readlink()'ed
         (ex: ln -s 'p=dbname' /etc/libpq.conf).

      d) Ensure that a local UNIX socket is in use/alive in a
         protected area for the sake of passing FD's around.  Sticking
         this in the $PGDATA/data directory would be wise to prevent
         other users on a system from stealing FD's (which is pretty
         rare and requires massive hacker foo).  See the send(2),
         sendto(2), and sendmsg(2) API for details.

      e) #ifdef everything so that it won't ever work on Win32 and can
         be turned off/on at configure time.  At this point, unless
         I've missed a feature that OpenSSL provides to aid with this,
         I'm pretty sure that connection passing will not work with
         SSL connections (for now) as you'd have to pass the
         connection's state back to the postmaster.

At this point, everything is well contained and at the _very_ least
persistent clients get to avoid a TCP connection setup/tear down.  New
connections get handled identically as they are now: only an extra bit
of code seeing if there are any connections with data on them is added
to the current flow of things.  I may end up tweaking the way the
backend handles listening for new connections, however, and may
replace it with the above abstracted bits.  kqueue()/poll() is just
sooo much more efficient than select() and when listening in a
non-blocking way and bouncing back and forth between the two, it could
amount to a bit of savings in terms of # of system calls and reduce
connection startup latency for people on reasonably modern OSes.

Phase II: Preemptive backend pools

      a) The postmaster gets schooled on pools of backend processes.
     I'm undecided about how to handle process pools, however.
     Part of me thinks that the backend should pre-init itself for
     a given database, and wait for its FD to be passed to it for
     auth.  By having it already init'ed for a given db, startup
     times will drop further.  Problem is, how do you do this on
     DB servers with lots of different DBs?  Some of the DBs
     (template1 comes to mind) should never have pools of procs
     waiting, but some should.  I'd like to have this kind of a
     config stuffed into the backend in a system catalog,
     actually, but I'm leery of doing so without guidance from
     someone with ueber knowledge of Pg's internals, which leads
     me to the alternative: have a bunch of procs waiting around,
     but not init'ed to any given DB.  Certainly the simpler
     approach and may be what I settle on for now.  Opening the
     can of worms for sticking configuration bits in a system
     catalog isn't something I'm interested in playing with for
     the time being (though the idea is really appealing to me).

      b) BackendFork() code gets split up into a few pieces to handle
         not having a connection up front.  Splitting it into two
         functions, BackendFork() and BackendInit() will probably be
         sufficient.

Phase III: Beautification

      a) Clean things up so that SSL connections work with persistent
         connections.  By far and away the most expensive part of SSL
         connections is the asymmetric key handling and it'd be really
         great if persistent connections could only have to worry
         about symmetric crypto, which is vastly cheaper.

      b) Other cleanup that I'm sure Tom will point out along the way.

And that's about it.  Phase II and Phase I could be done
independently.  Phase III I'm leaving as a misc catch all.  That's my
analysis of what needs to be done.  The connection pooling bit isn't
that bad, but it's also the part that's the most strait forward and
the bits that I'm quite familiar with.  Phase II is a bit more murky
about and I'll probably have a few questions about when I get there.

Comments?  The whole point of this is to be able to handle large
numbers of connections and reduce the startup time for each connection
if its persistent by having an already established TCP connection as
well as an already fork()'ed backend (and hopefully initialized for a
given DB) waiting to serve an active connection.

-sc

--
Sean Chittenden

Re: Urgent: 10K or more connections

From
Tom Lane
Date:
Sean Chittenden <sean@chittenden.org> writes:
> No where, everything remains as is.  I actually think you'll
> appreciate the simplicity of this once I'm done explaining how I'm
> going about this.

I don't think you heard a word I said :-(

You appear to be adding a whole lot of platform dependency (passing
FDs around other than by fork() is not portable) in return for loss
of functionality (applications can't rely on session state anymore)
and highly dubious performance gain (just because a backend has started
does not mean it's built up a reasonable working set of cache entries;
you can't expect that firing off a new backend for every transaction
is going to be anything but a huge performance loss, even if you assume
its visible session state is exactly what the application needs).

Also it sounds to me like the postmaster will now become a performance
bottleneck, since it will need to be involved in every transaction start.

You're welcome to pursue this if you like, but I'll bet lunch that
you are wasting your time.

            regards, tom lane