Re: Urgent: 10K or more connections

From: Sean Chittenden
Subject: Re: Urgent: 10K or more connections
Date: 2003-07-19 09:25:56
Msg-id: 20030719092556.GG24507@perrin.int.nxad.com
In response to: Re: Urgent: 10K or more connections (Gianni Mariani <gianni@mariani.ws>)
List: pgsql-general
> >PostgreSQL will never be single proc, multi-threaded, and I don't
> >think it should be for reliability's sake.  See my above post,
> >however, as I think I may have a better way to handle "lots of
> >connections" without using threads.  -sc
>
> never is a VERY long time ...  Also, the single proc/multiple proc
> thing does not have to be exclusive.  Meaning you could "tune" the
> system so that it could do either.

True.  This topic has come up a zillion times in the past, though.  The
memory segmentation and reliability that independent processes give
you are huge, and they're the biggest reason why, _if_ PostgreSQL does
spontaneously wedge itself (like MySQL does all too often), you only
have to cope with a single DB connection being corrupt, invalid, etc.
Imagine a threaded model where the process was horked and you lose
1,000 connections' worth of data in a SEGV.  *shudder*  Unix buys its
reliability with memory segmentation... something that I dearly
believe in.  If that weren't worth anything, then I'd run everything
in the kernel and avoid the context switching, which is pretty
expensive.

> I have developed a single process server that handled thousands of
> connections.  I've also developed a single process database (a while
> back) that handled multiple connections but I'm not sure I would do
> it the "hard" way again as the cost of writing the code for keeping
> context was not insignificant, although there are much better ways
> of doing it than how I did it 15 years ago.

Not saying it's not possible, just that at this point reliability
matters more than handling additional connections.  With copy-on-write
VMs abundant these days, a lot of the size that you see in a
PostgreSQL backend is shared anyway.  Memory profiling and increasing
the number of read-only pages would be an extremely interesting
exercise that could yield some slick results in terms of reducing the
memory footprint of PG's children.
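
Something like the following would be a starting point for that
profiling: a quick Linux-specific sketch (nothing to do with
PostgreSQL's own code; it just assumes /proc/<pid>/smaps exists) that
splits one backend's resident pages into shared vs. private:

#include <stdio.h>

/*
 * Quick sketch: sum the shared vs. private resident memory of one
 * process (say, a PG backend) by parsing Linux's /proc/<pid>/smaps.
 * Purely illustrative -- not PostgreSQL code.
 */
int main(int argc, char **argv)
{
    char path[64], line[256];
    long shared_kb = 0, private_kb = 0, kb;
    FILE *fp;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }

    snprintf(path, sizeof(path), "/proc/%s/smaps", argv[1]);
    if ((fp = fopen(path, "r")) == NULL) {
        perror(path);
        return 1;
    }

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* smaps reports per-mapping counters in kB */
        if (sscanf(line, "Shared_Clean: %ld kB", &kb) == 1 ||
            sscanf(line, "Shared_Dirty: %ld kB", &kb) == 1)
            shared_kb += kb;
        else if (sscanf(line, "Private_Clean: %ld kB", &kb) == 1 ||
                 sscanf(line, "Private_Dirty: %ld kB", &kb) == 1)
            private_kb += kb;
    }
    fclose(fp);

    printf("shared:  %ld kB\nprivate: %ld kB\n", shared_kb, private_kb);
    return 0;
}

Run it against a freshly forked backend and then a busy one and the
copy-on-write story gets pretty visible.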

> What you talk about is very fundamental and I would love to have
> another go at it ....  however you're right that this won't happen
> any time soon.  Connection pooling is a fundamentally flawed way of
> overcoming this problem.  A different design could render a
> significantly higher feasible connection count.

Surprisingly, it's not that complex, at least for handling a large
number of FDs and figuring out which ones have data on them and need
to be passed to a backend.  I'm actually using the model for
monitoring FDs from thttpd and reapplying bits where appropriate.  Its
abstraction over kqueue()/poll()/select() is nice enough that I don't
want to reinvent the wheel (same with its license).  Hopefully ripping
through the incoming data and figuring out which backend pool to send
a connection to won't be that bad, but I have next to no experience
writing that kind of code, and my Stevens is hidden away in one of 23
boxes from a move earlier this month.  I only know that Apache 1.3
does this with obviously huge success on basically every *nix, so it
can't be too hard.
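
For the curious, here's roughly what I mean, as a sketch rather than
real code: it assumes a single pre-forked backend pool reachable over
a UNIX-domain socket (pool_fd is made up for illustration), poll()
stands in for thttpd's kqueue()/poll()/select() abstraction, and the
accepted client fd gets handed across with sendmsg()/SCM_RIGHTS, the
descriptor-passing trick straight out of Stevens:

#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Pass one fd to a backend pool process over a UNIX-domain socket
 * using SCM_RIGHTS.  The receiver gets its own reference to the
 * descriptor, so the sender can close its copy afterwards.
 */
static int
send_fd(int pool_fd, int client_fd)
{
    char             byte = 0;
    struct iovec     iov = { .iov_base = &byte, .iov_len = 1 };
    union {                              /* properly aligned cmsg buffer */
        struct cmsghdr hdr;
        char           buf[CMSG_SPACE(sizeof(int))];
    } cmsgbuf;
    struct msghdr    msg;
    struct cmsghdr  *cmsg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;                  /* must send at least one byte */
    msg.msg_iovlen = 1;
    msg.msg_control = cmsgbuf.buf;
    msg.msg_controllen = sizeof(cmsgbuf.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;        /* "this message carries an fd" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &client_fd, sizeof(int));

    return sendmsg(pool_fd, &msg, 0) < 0 ? -1 : 0;
}

/*
 * Watch the listening socket and dispatch new connections.  A real
 * version would watch many fds and pick among several pools; one
 * listener and one pool keep the sketch short.
 */
void
dispatch_loop(int listen_fd, int pool_fd)
{
    struct pollfd pfd;

    pfd.fd = listen_fd;
    pfd.events = POLLIN;

    for (;;) {
        if (poll(&pfd, 1, -1) <= 0)      /* EINTR, etc.: just retry */
            continue;

        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0)
            continue;

        send_fd(pool_fd, client_fd);     /* pool now holds its own copy */
        close(client_fd);                /* ours is no longer needed */
    }
}

The pool process does the mirror-image recvmsg() on its end of the
socket and from then on treats the fd like any other connection.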

-sc

--
Sean Chittenden
