Re: PostgreSQL pre-fork speedup - Mailing list pgsql-hackers

From sdv mailer
Subject Re: PostgreSQL pre-fork speedup
Msg-id 20040504065945.11335.qmail@web60205.mail.yahoo.com
In response to Re: PostgreSQL pre-fork speedup  (Greg Stark <gsstark@mit.edu>)
Responses Re: PostgreSQL pre-fork speedup  (Andrew Sullivan <ajs@crankycanuck.ca>)
List pgsql-hackers
We used to run persistent connections until the DB
servers got maxed out because of too many idle
connections sucking up all the memory. Web servers run
different loads than database servers, and persistent
connections are notorious for crashing your DB.

Connection pooling (e.g., SQLRelay) didn't work either,
because we needed to connect to hundreds of DB servers
from each web server. Imagine having 200+ open
connections on each web server, and how many of those
connections sit idle. The situation gets worse when you
multiply that by an even greater number of web servers
connected to all these database servers. Do the math!
We're talking about a large server farm here, not 2 or 3
machines.

Saving that X ms can be substantial for a large number
of simultaneous connections and shouldn't be neglected;
otherwise, why have persistent connections or
connection pooling in the first place? Imagine every
query spending that X ms just on connecting/forking.
From experience, it adds up to a lot.
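To make the "it adds up" point concrete, here is a rough back-of-the-envelope model. The notation is my own, not from any benchmark: X is the per-connection setup cost (fork plus authentication), T is the execution time of a simple query, and R is the rate of new connections per second under a connect/query/disconnect model.

```latex
\[
\text{setup fraction per request} = \frac{X}{X + T},
\qquad
\text{setup time per second} = R \cdot X
\]
```

With purely hypothetical numbers (not measurements), X = 2 ms and T = 0.5 ms give 2 / 2.5 = 0.8, so 80% of each request is connection setup. At R = 500 connections/s, R * X = 500 * 0.002 s = 1 s of cumulative setup time every second, spread across the backends.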

I think pre-forking can be beneficial and is a lot
simpler than rewriting the DB server to be
multi-threaded. Pre-forking would not consume as much
memory as persistent connections, because it scales
with the database load and NOT with the web server
load. I'm guessing pre-forking will help most on
systems where launching a new process is expensive
(Win32, certain UNIXes).

Here's a snippet from one of the Apache conferences:

"Traditionally TCP/IP servers fork a new child to
handle incoming requests from clients. However, in the
situation of a busy web site, the overhead of forking
a huge number of children will simply suffocate the
server. As a consequence, Apache uses a different
technique. It forks a fixed number of children right
from the beginning. The children service incoming
requests independently, using different address
spaces. Apache can dynamically control the number of
children it forks based on current load. This design
has worked well and proved to be both reliable and
efficient; one of its best features is that the server
can survive the death of children and is also
reliable. It is also more efficient than the canonical
UNIX model of forking a new child for every request."
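The Apache model described above can be illustrated with a minimal pre-fork sketch in Python, using only the standard library (`os.fork`, `socket`). This sketch makes several assumptions. It is Unix-only. It uses a fixed pool rather than Apache's load-based resizing. The echo-style handler stands in for real request processing. The names `prefork_server`, `handle`, `request` and `shutdown` are hypothetical and chosen for illustration. This is not how PostgreSQL works today: its postmaster forks one backend per incoming connection.

```python
import os
import signal
import socket

def handle(conn: socket.socket) -> None:
    """Hypothetical handler: read the request until EOF, reply tagged with our PID."""
    chunks = []
    while True:
        data = conn.recv(1024)
        if not data:
            break
        chunks.append(data)
    conn.sendall(b"pid=%d:" % os.getpid() + b"".join(chunks))

def prefork_server(host: str = "127.0.0.1", port: int = 0, workers: int = 4):
    """Bind once, then fork a fixed pool of children that all accept()
    on the shared listening socket. Returns (listener, child_pids)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(128)
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: serve connections forever and never return into parent code.
            try:
                while True:
                    conn, _addr = listener.accept()
                    with conn:
                        try:
                            handle(conn)
                        except OSError:
                            pass  # a misbehaving client must not kill the worker
            finally:
                os._exit(0)
        pids.append(pid)  # parent keeps track of its workers
    return listener, pids

def request(port: int, payload: bytes) -> bytes:
    """Client side: send payload, half-close, read the reply until EOF."""
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(payload)
        c.shutdown(socket.SHUT_WR)
        reply = b""
        while True:
            data = c.recv(1024)
            if not data:
                return reply
            reply += data

def shutdown(pids) -> None:
    """Terminate and reap every worker."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    for pid in pids:
        os.waitpid(pid, 0)

if __name__ == "__main__":
    listener, pids = prefork_server(workers=4)
    port = listener.getsockname()[1]
    try:
        for _ in range(8):
            print(request(port, b"hello"))  # served by whichever idle child accepted it
    finally:
        shutdown(pids)
        listener.close()
```

All children block in accept() on the same inherited listening socket, and the kernel hands each new connection to one of the waiting children. The fork cost is paid once at startup instead of once per connection. A database version would face an extra wrinkle that a web server does not: a backend normally binds to one database and user when it starts. This sketch does not address that.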

Besides solving my own problems, a pre-fork solution
would benefit PostgreSQL too. MySQL has a reputation
for fast connections, and people notice it because you
cannot avoid simple queries (e.g., counters, session
retrieval). The truth of the matter is that many people
still use a connect/query/disconnect model to run
simple queries, and if you can satisfy these people, it
can be a big marketing win for PostgreSQL.

Many web hosting companies out there don't allow
persistent connections, which is where MySQL shines.
Over and over again, we hear people say how MySQL is
fast for the Web because it can connect and execute
simple queries quickly. Take, for instance,
http://www-css.fnal.gov/dsg/external/freeware/pgsql-vs-mysql.html

"MySQL handles connections very fast, thus making it
suitable to use MySQL for Web - if you have hundreds
of CGIs connecting/disconnecting all the time you'd
like to avoid long startup procedures."

and
http://www-css.fnal.gov/dsg/external/freeware/Repl_mysql_vs_psql.html

"MySQL handles connections and simple SELECTs very
fast."

PostgreSQL may well be just as fast, but if people don't
see that the first time they run a simple query, MySQL
has already won the war when it comes to speed.

Another benchmark I came across:

http://www.randomnetworks.com/joseph/blog/?eid=101





    