Thread: PostgreSQL pre-fork speedup

PostgreSQL pre-fork speedup

From
sdv mailer
Date:
I had lots of trouble posting so you may receive this
more than once. My apologies..
------------------------------

Hi,

I know the issue of pre-fork PostgreSQL has been
discussed previously.
Someone mentionned pre-fork can be implemented when
schemas become available
in PostgreSQL because there will be less of the need
to run multiple
databases.

I think Oracle 7 uses pre-forking and it helps speed
up the startup time
considerably. Often, there are cases where connection
pooling or persistent
connection cannot be used efficiently (e.g. replicated
or splitted databases
over hundreds of machines or where persistent
connection opens up too many
idle connections). Instead, there's a big need to
create a new connection on
every query and with PostgreSQL needing to fork on
every incoming connection
can be quite slow.

This could be a big win since even a moderate
improvement at the connection
level will affect almost every user. Any chance of
that happening for 7.5?

Thanks.



    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
Paul Ramsey
Date:
sdv mailer wrote:

> Instead, there's a big need to
> create a new connection on
> every query and with PostgreSQL needing to fork on
> every incoming connection
> can be quite slow.

Really? My general experience has beent that forking/connection setup 
times are very good with PgSQL. Do not assume your Oracle experience 
transfers directly over -- Oracle has very large connection time 
overheads, PgSQL does not.

> This could be a big win since even a moderate
> improvement at the connection
> level will affect almost every user. Any chance of
> that happening for 7.5?

Only if you do it yourself, probably. The calculation of the developers 
appears to be that the amount of time spent by the database on 
fork/connect will generally be dwarfed by the amount of time spent by 
the database actually doing work (this being a database, the actual 
workloads required of the backend are much higher than, say, for a web 
server). So the operational benefit of adding the complexity of a 
pre-fork system is not very high. And if you have the rare workload 
where a pre-fork actually *would* speed things up a great deal, you can 
solve the problem yourself with a connection-pooling middleware.

--       __     /     | Paul Ramsey     | Refractions Research     \_


Re: PostgreSQL pre-fork speedup

From
Tom Lane
Date:
Paul Ramsey <pramsey@refractions.net> writes:
> ... So the operational benefit of adding the complexity of a 
> pre-fork system is not very high.

In particular, most of the connection startup overhead work cannot be
performed until we've identified which database to connect to (since
it largely consists of bootstrapping access to the system catalogs
in that database).  If you want that work to be done in advance of
receiving a client connection request, life is much more complicated
than it would be for something like Apache.

There is considerable discussion of this point in the pgsql-hackers
archives.  I'd suggest reading the past threads before trying to do
anything yourself.
        regards, tom lane


Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
Forking consumes a large amount of CPU when you have
many simultaneous connections and adds up to the
latency. Particularly MySQL users may think
PostgreSQL's connection time is much slower because
these users tend to perform relatively simple queries.

In my case, connection pooling and persistent
connection is useless for a large server farm
consisting of hundreds of partitioned and replicated
servers doing only simple queries.

Below is a benchmark of MySQL 3.2 and PostgreSQL 7.4
doing multiple connects/disconnects within the same
server (AMD 1.2GHz, 512MB, Linux 2.4). If forking is
the issue then pre-forking will give a big boost
especially for simple queries:

MySQL time
----------
0.012786865234375
0.011546850204468
0.01167106628418

<?php
$time_start = getmicrotime();
for ($i = 0; $i < 20; $i++) {       $DBH = mysql_connect('127.0.0.1');       mysql_select_db('test1');
mysql_close($DBH);
}
$Time = getmicrotime() - $time_start;
?>

MySQL time (with simple query)
------------------------------
0.015650987625122
0.01443886756897
0.014433860778809

<?php
$time_start = getmicrotime();
for ($i = 0; $i < 20; $i++) {       $DBH = mysql_connect('127.0.0.1');       mysql_select_db('test1');$Res =
mysql_query('SELECT* FROM table1 WHERE id =
 
1', $DBH);       mysql_close($DBH);
}
$Time = getmicrotime() - $time_start;
?>


PostgreSQL time
---------------
0.15319013595581
0.14930582046509
0.14920592308044

<?php
$time_start = getmicrotime();
for ($i = 0; $i < 20; $i++) {       $DBH = pg_connect('dbname=test1
host=127.0.0.1');       pg_close($DBH);
}
$Time = getmicrotime() - $time_start;
?>

PostgreSQL time (with simple query)
------------------------------------
0.19016313552856
0.18785095214844
0.18786096572876

<?php
$time_start = getmicrotime();
for ($i = 0; $i < 20; $i++) {       $DBH = pg_connect('dbname=test1
host=127.0.0.1');$Res = pg_query($DBH, 'SELECT * FROM table1 WHERE id
= 1');       pg_close($DBH);
}
$Time = getmicrotime() - $time_start;
?>






    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
Greg Stark
Date:
sdv mailer <sdvmailer@yahoo.com> writes:

> Forking consumes a large amount of CPU when you have
> many simultaneous connections and adds up to the
> latency. Particularly MySQL users may think
> PostgreSQL's connection time is much slower because
> these users tend to perform relatively simple queries.

Frankly I think part of the reason you'll get little traction on this front is
that some people consider an application that puts building a database
connection into a critical path, especially one that does only a single simple
query and disconnects, a stupid design.

If it's a connection time is critical element it's trivial to move it outside
the critical path and reuse connections. You'll get much better performance
that way as well since both Postgres and Linux will have more time to observe
how the process behaves and adjust cache and schedule behaviour.

> In my case, connection pooling and persistent
> connection is useless for a large server farm
> consisting of hundreds of partitioned and replicated
> servers doing only simple queries.

Well have you tried either? It would involve having a lot more idle
connections but then that's effectively the same as "preforking" anyways. Not
only would they be "preforked" but they wouldn't have to be shut down and
recreated repeatedly at all, even outside the critical path.

If the idle connections consume too much memory to be feasible (like, say, if
you have thousands of database servers but only a small unpredictable subset
of which are busy at any time) then "preforking" wouldn't really help much
either and suffer the same problem.

-- 
greg



Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
We used to run persistent connection until the DB
servers got maxed out because of too many idle
connections sucking up all the memory. Web servers run
different loads than database servers and persistent
connections are notorious for crashing your DB.

Connection pooling (eg. SQLRelay) didn't work either
because we needed to connect to hundreds of DB servers
from each web server. Imagine having 200+ open
connections on the web server and how many more of
these connections remain idle. The situation gets
worse when you multiply by an even greater number of
web servers connected to all these database servers.
Do the math! We're talking large server farm here, not
2 or 3 machines. 

Saving that X ms can be substantial for large number
of simultaneous connections and shouldn't be
neglected, otherwise why have persistent connection or
connection pooling in the first place. Imagine every
query uses up that X ms of time just for
connecting/forking. It adds up to a lot from
experience.

I think pre-forking can be beneficial and is a lot
simpler than to rewrite a multi-threaded DB server.
Pre-forking would not consume as much memory as
persistent connections because it scales with the
database load and NOT with the web server load. I'm
guessing pre-forking will benefit more on systems
where launching a new process is expensive (Win32,
certain UNIXes).

Here's a snippet from one of the Apache's conferences:

"Traditionally TCP/IP servers fork a new child to
handle incoming requests from clients. However, in the
situation of a busy web site, the overhead of forking
a huge number of children will simply suffocate the
server. As a consequence, Apache uses a different
technique. It forks a fixed number of children right
from the beginning. The children service incoming
requests independently, using different address
spaces. Apache can dynamically control the number of
children it forks based on current load. This design
has worked well and proved to be both reliable and
efficient; one of its best features is that the server
can survive the death of children and is also
reliable. It is also more efficient than the canonical
UNIX model of forking a new child for every request."

Beside solving my own problems, having a pre-fork
solution will benefit PostgreSQL too. MySQL is
reputated for having a fast connection and people know
it because you cannot avoid simple queries (e.g.
counters, session retrieval, etc). The truth of the
matter is many people still operate on
connect/query/disconnect model running simple queries
and if you can satisfy these people then it can be a
big marketing win for PostgreSQL. 

Many web hosting companies out there don't allow
persistent connection, which is where MySQL shines.
Over and over again, we hear people say how MySQL is
fast for the Web because it can connect and execute
simple queries quickly. Take for instance
http://www-css.fnal.gov/dsg/external/freeware/pgsql-vs-mysql.html

"MySQL handles connections very fast, thus making it
suitable to use MySQL for Web - if you have hundreds
of CGIs connecting/disconnecting all the time you'd
like to avoid long startup procedures."

and
http://www-css.fnal.gov/dsg/external/freeware/Repl_mysql_vs_psql.html

"MySQL handles connections and simple SELECTs very
fast."

Likely, PostgreSQL is just as fast but if people don't
see that on the first try running a simple query, then
MySQL already won the war when it comes to speed.

Other benchmark I came across:

http://www.randomnetworks.com/joseph/blog/?eid=101





    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
"Peter Galbavy"
Date:
sdv mailer wrote:
> We used to run persistent connection until the DB
> servers got maxed out because of too many idle
> connections sucking up all the memory. Web servers run
> different loads than database servers and persistent
> connections are notorious for crashing your DB.

And this translates from your experiences with mysql to postgresql ? You
haven't made it clear which platforms and what level of concurrent
connections gave you this behaviour. Tom Lane has already explained that
most of the connection time is probably used in configuring the connection
based on the database required etc.

> Connection pooling (eg. SQLRelay) didn't work either
> because we needed to connect to hundreds of DB servers
> from each web server. Imagine having 200+ open
> connections on the web server and how many more of
> these connections remain idle. The situation gets
> worse when you multiply by an even greater number of
> web servers connected to all these database servers.
> Do the math! We're talking large server farm here, not
> 2 or 3 machines.

And "preforking" makes this different, how ? Perhaps having a pool of
processes ready to be handed a query to a specific database, where you
configure N connections to db1, M to db2 etc. still means lots of resource
usage. In effect a preforked database server *is* an idle connection, just
without the TCP establishment and teardown sequence which is negligable on
modern platforms - and even if it were not negligable, it would be
effectively identical regardless of the chosen DB platform.

> I think pre-forking can be beneficial and is a lot
> simpler than to rewrite a multi-threaded DB server.

This is open source, feel free to do a proof on concept (or pay someone to
do a proof of concept), run the numbers and see if your assertions work for
real. Many others here with more experience than myself of running thousands
of connections at once don't appear to think so. My limited expereience with
many hundreds of "idle" connections is that it is not particularly taxing at
all on any even semi-modern hardware (PIII/512MB etc).

Peter



Re: PostgreSQL pre-fork speedup

From
Rod Taylor
Date:
> And "preforking" makes this different, how ? Perhaps having a pool of
> processes ready to be handed a query to a specific database, where you
> configure N connections to db1, M to db2 etc. still means lots of resource
> usage. In effect a preforked database server *is* an idle connection, just
> without the TCP establishment and teardown sequence which is negligable on
> modern platforms - and even if it were not negligable, it would be
> effectively identical regardless of the chosen DB platform.

In theory, it should drastically reduce the number of idle connections
for poor connection pooling on the other end.

The problem are pools for Apache that establish 1 connection per Apache
backend. 100 Apache backends means 100 backend connections (50 of which
may be idle as not all pages use the database). Multiply that by 40
webservers and you have a real mess of idle connections.

Cutting that count down to 10 idlers in total by having PostgreSQL
prefork a specific database would make a significant difference.

The other (preferable) alternative is to convince Apache to use a common
connection pool per server rather than per Apache backend.




Re: PostgreSQL pre-fork speedup

From
Andrew Sullivan
Date:
On Mon, May 03, 2004 at 11:59:45PM -0700, sdv mailer wrote:
> 
> Connection pooling (eg. SQLRelay) didn't work either
> because we needed to connect to hundreds of DB servers
> from each web server. Imagine having 200+ open
> connections on the web server and how many more of
> these connections remain idle. The situation gets

This sounds like a case where you probably ought to be using schema
support instead of many different databases, for the record.  I don't
see how pre forking is going to help you at all, because a connection
is to a database, so you're going to have to pick one, and it's
likely as not to be the wrong one.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca


Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
I'm talking about connecting to multiple database
servers on separate machines. Schemas don't apply
here.

How much work would it take to make a pre-fork smart
enough to open different databases on incoming
connection? How much of it can be modeled after
Apache?





    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
Pre-fork does not equal to idle connections! Pre-fork
scales with database load where as persistent
connections scales with webserver load. A web server
that is heavily loaded but not necessarily performing
a lot of database activity will spawn hundreds of idle
database connections using persistent connection. With
pre-fork, you can potentially lower this down to even
10 open connections. 

Forking is quite fast on Linux but creating a new
process is still 10x more expensive than creating a
thread and is even worse on Win32 platform. CPU load
goes up because the OS needs to allocate/deallocate
memory making it difficult to get a steady state
resource consumption.

More importantly, solving the forking delay will have
a big impact on people's mind who have been given the
impression that forking is very very slow. Here's what
one site has to say about PostgreSQL's forking:

http://www.geocities.com/mailsoftware42/db/

"Postgres forks on every incoming connection - and the
forking process and backend setup is a bit slow, but
one can speed up PostgreSQL by coding things as stored
procedures"

Pre-fork will give MySQL one less argument to throw at
PostgreSQL. 

I think optimizing is this area will speed up the
general case for everyone rather than optimizing a
feature that affects 10% of the users. On top of that,
it will make a strong marketing case because forking
will no longer become a speed issue when compared to
MySQL.






    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
Greg Stark
Date:
Rod Taylor <pg@rbt.ca> writes:

> Cutting that count down to 10 idlers in total by having PostgreSQL
> prefork a specific database would make a significant difference.

Well it would be 10 for each database. Since as has been pointed out before
loading the database is most of the delay.

If that's enough why not just run 10 apache processes instead of 100? 

I'm assuming the static non-database driven content is already separated onto
other servers. In which case running 100 apache processes, most of which are
idle is the source of the problem.

-- 
greg



Re: PostgreSQL pre-fork speedup

From
Richard Huxton
Date:
sdv mailer wrote:
> I'm talking about connecting to multiple database
> servers on separate machines. Schemas don't apply
> here.
> 
> How much work would it take to make a pre-fork smart
> enough to open different databases on incoming
> connection? How much of it can be modeled after
> Apache?
> 

I've not used it but Tatsuo Ishii has just released pgpool v1.0. Quoting 
from its README:

1. What is pgpool
   pgpool is a connection server program for PostgreSQL. pgpool runs   between PostgreSQL's client(frontend) and
server(backend).Any   PostgreSQL clients can connect to pgpool as if it's a real   PostgreSQL server.
 
   pgpool caches the connection to PostgreSQL server to reduce the   overhead to establish the connection to it.
   Also pgpool could use two PostgreSQL servers for fail over   purpose. If the first server goes down, pgpool will
automatically  switch to the secondary server.
 

If that's not what you're after, then it must be fairly close. Check the 
mailing list archives - the download URL is on the "announce" list April 
2004.

I'm sure any real figures from your testing will be of much interest to 
all of us.

HTH
--   Richard Huxton  Archonet Ltd


Re: PostgreSQL pre-fork speedup

From
Rod Taylor
Date:
On Wed, 2004-05-05 at 11:57, Greg Stark wrote:
> Rod Taylor <pg@rbt.ca> writes:
> 
> > Cutting that count down to 10 idlers in total by having PostgreSQL
> > prefork a specific database would make a significant difference.
> 
> Well it would be 10 for each database. Since as has been pointed out before
> loading the database is most of the delay.
> 
> If that's enough why not just run 10 apache processes instead of 100? 

Because then we would need 10 times as many servers ;)

> I'm assuming the static non-database driven content is already separated onto
> other servers. In which case running 100 apache processes, most of which are
> idle is the source of the problem.

Most of it has been. It's the duty cycle. As stated in another email,
only about 20% of the work a script does is database related -- which
occurs all at one time. Even when all Apache backends are active, a
large number of connections will be idle but were used or will be used
at some point during the generation of that page.

It really is an Apache fault -- but I don't think it can be fixed within Apache itself.




Re: PostgreSQL pre-fork speedup

From
Rod Taylor
Date:
> Or, you run several seperate Apache webservers. The ones that serve static 
> content or don't need database connections don't run with the ones that do. 
> And just like each idle Apache process uses memory and other resources, 
> each idle PostgreSQL connection does to. So managing the number of Apache 

Considered that, but it doesn't help much. The duty cycle of any given
page is about 20% database, 80% webserver work. So at any given time 80%
of the connections to the database will be idle in a best case scenario.

If Apache did decent connection pooling or PostgreSQL gave us a hand
then a given webserver would need 1/4 of the connections which could be
internally shared.

Page 1 start
Page 1 DB connect

Page 1 DB disconnect
.
. <IDLE persistent connection as work happens>
.
Page 1 transmit results

If we could really disconnect from the database and not suffer high
re-connection overhead OR have Apache recognize the connection is unused
and allow another Apache backend to use it there would not be a problem.

> It all comes down to management, which Apache does a reasonable job of.

> If you really believe that you are right and I am wrong, then prove it. I'll 
> be happy to be shown the error of my thinking (and see an improvement to 
> PostgreSQL in the process).

You wouldn't run into a problem like this on a system with good
connection pooling. JBoss comes to mind, once a connection is free it is
available to other threads to use. AOL Server is a webserver which
demonstrates proper connection pooling.

Apache is the problem we're trying to work around. It does everything
per backend, rather than having a common pool for the server. That can
be fixed by improving PostgreSQL or by doing something (I'm not sure
what) with apache.





Re: PostgreSQL pre-fork speedup

From
Andrew Dunstan
Date:
sdv mailer wrote:

[snip]

>Pre-fork will give MySQL one less argument to throw at
>PostgreSQL. 
>
>I think optimizing is this area will speed up the
>general case for everyone rather than optimizing a
>feature that affects 10% of the users. On top of that,
>it will make a strong marketing case because forking
>will no longer become a speed issue when compared to
>MySQL.
>
>  
>

So when can we expect to see your proof of concept code and benchmarks 
to show the speedup achieved?

cheers

andrew


Re: PostgreSQL pre-fork speedup

From
Jonathan Gardner
Date:
On Wednesday 05 May 2004 07:24 am, Rod Taylor wrote:
> > And "preforking" makes this different, how ? Perhaps having a pool of
> > processes ready to be handed a query to a specific database, where you
> > configure N connections to db1, M to db2 etc. still means lots of
> > resource usage. In effect a preforked database server *is* an idle
> > connection, just without the TCP establishment and teardown sequence
> > which is negligable on modern platforms - and even if it were not
> > negligable, it would be effectively identical regardless of the chosen
> > DB platform.
>
> In theory, it should drastically reduce the number of idle connections
> for poor connection pooling on the other end.
>

If the client is poorly written, nothing on the server side can really 
prevent them from being poorly written.

> The problem are pools for Apache that establish 1 connection per Apache
> backend. 100 Apache backends means 100 backend connections (50 of which
> may be idle as not all pages use the database). Multiply that by 40
> webservers and you have a real mess of idle connections.
>

Or, you run several seperate Apache webservers. The ones that serve static 
content or don't need database connections don't run with the ones that do. 
And just like each idle Apache process uses memory and other resources, 
each idle PostgreSQL connection does to. So managing the number of Apache 
connections so that there aren't too many or too few solves the problem of 
having too many or too few idle database connections. This is all stuff 
that I personally have managed and planned for, and it is quite easy to do 
without any connection pooling on the server side.

It all comes down to management, which Apache does a reasonable job of. 
Either we duplicate the efforts of Apache (they are non-trivial), or we 
piggy-back on their success. And who's to say that the right solution for 
Apache is the right solution for another application? Are we going to 
implement a different flavor of management for each kind of application?

I suggest you implement server-side connection pooling and see for yourself:

(a) How much overhead there is for configuration (which databases? How many 
idle?)

(b) How much easier it is to do on the client side after all.

If you really believe that you are right and I am wrong, then prove it. I'll 
be happy to be shown the error of my thinking (and see an improvement to 
PostgreSQL in the process).

That's the great thing about Open Source. We can all talk the talk, but it 
comes down to whoever actually walks the walk. In the proprietary world, no 
one gets a chance to walk the walk.

-- 
Jonathan Gardner
jgardner@jonathangardner.net


Re: PostgreSQL pre-fork speedup

From
Steve Atkins
Date:
On Wed, May 05, 2004 at 02:12:58PM -0400, Rod Taylor wrote:
> 
> Most of it has been. It's the duty cycle. As stated in another email,
> only about 20% of the work a script does is database related -- which
> occurs all at one time. Even when all Apache backends are active, a
> large number of connections will be idle but were used or will be used
> at some point during the generation of that page.
> 
> It really is an Apache fault -- but I don't think it can be fixed within Apache itself.

http://apache.webthing.com/
 mod_pg_pool or mod_valet_sql - Apache modules to handle postgresql connection pools

http://sqlrelay.sourceforge.net/
http://dbbalancer.sourceforge.net/
 Database connection pooling software

And, of course, most development environments (perl, php, java etc)
have their own language specific connection pooling solutions.

Cheers, Steve 


Re: PostgreSQL pre-fork speedup

From
"scott.marlowe"
Date:
On Wed, 5 May 2004, sdv mailer wrote:

> Forking is quite fast on Linux but creating a new
> process is still 10x more expensive than creating a
> thread and is even worse on Win32 platform. CPU load
> goes up because the OS needs to allocate/deallocate
> memory making it difficult to get a steady state
> resource consumption.

Just a nit to pick here.  In Linux, the difference between forking and 
spawning a new thread is almost nothing.  Definitely less than a factor of 
2, and most assuredly less than the quoted factor of 10 here.

The fact that windows has a heavy process / lightweight thread design 
means little to me, since I'll likely never deploy a production postgresql 
server on it that needs to handle any serious load.



Re: PostgreSQL pre-fork speedup

From
Stephen Frost
Date:
* Steve Atkins (steve@blighty.com) wrote:
> On Wed, May 05, 2004 at 02:12:58PM -0400, Rod Taylor wrote:
> >
> > Most of it has been. It's the duty cycle. As stated in another email,
> > only about 20% of the work a script does is database related -- which
> > occurs all at one time. Even when all Apache backends are active, a
> > large number of connections will be idle but were used or will be used
> > at some point during the generation of that page.
> >
> > It really is an Apache fault -- but I don't think it can be fixed within Apache itself.
>
> http://apache.webthing.com/
>
>   mod_pg_pool or mod_valet_sql - Apache modules to handle postgresql
>   connection pools

Looks like what we need are functions in PHP or something which use the
functions provided by these apache modules, if they don't exist already
(as far as I know they don't?).  Or whatever language it is that he's
using.
Stephen

Re: PostgreSQL pre-fork speedup

From
"Joshua D. Drake"
Date:
>
> The fact that windows has a heavy process / lightweight thread design
> means little to me, since I'll likely never deploy a production postgresql
> server on it that needs to handle any serious load.

Yes but Solaris also has a heavy process / lightweight thread design.

J



>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment

Re: PostgreSQL pre-fork speedup

From
Rod Taylor
Date:
> And, of course, most development environments (perl, php, java etc)
> have their own language specific connection pooling solutions.

Yes, the one for php is what I was thinking of when I made my statement.
They work on a per backend basis as Apache does not allow for the type
of communication between processes that would otherwise be required. A
connection created by Apache backend A cannot be used by Apache backend
B.

Java is an example where it is done well, but the language decision was
made long before I joined the firm.

I cannot tell if mod_pg_pool works across Apache forked backends or is
still bound to a single process. They state it is intended for sharing
connections across modules, so it is probably still backend specific.



Re: PostgreSQL pre-fork speedup

From
"scott.marlowe"
Date:
On Wed, 5 May 2004, Rod Taylor wrote:

> > And, of course, most development environments (perl, php, java etc)
> > have their own language specific connection pooling solutions.
> 
> Yes, the one for php is what I was thinking of when I made my statement.
> They work on a per backend basis as Apache does not allow for the type
> of communication between processes that would otherwise be required. A
> connection created by Apache backend A cannot be used by Apache backend
> B.
> 
> Java is an example where it is done well, but the language decision was
> made long before I joined the firm.
> 
> I cannot tell if mod_pg_pool works across Apache forked backends or is
> still bound to a single process. They state it is intended for sharing
> connections across modules, so it is probably still backend specific.

Have you looked at sqlrealy.sourceforge.net?  IT looks like it might do 
what you need.



Re: PostgreSQL pre-fork speedup

From
Rod Taylor
Date:
> > I cannot tell if mod_pg_pool works across Apache forked backends or is
> > still bound to a single process. They state it is intended for sharing
> > connections across modules, so it is probably still backend specific.
> 
> Have you looked at sqlrealy.sourceforge.net?  IT looks like it might do 
> what you need.

SQL Relay (and friends) do what I'm looking for in a round about way.

If you put it onto the webservers it would help -- but it would require
deployment of additional webservers to accommodate the increased load.
That can be accomplished if it helps drop the load on the DB machine.
But still uses resources unnecessarily.

I've not looked at sqlrelay but most of these things use a different
interface. That would work with the inhouse code but puts a damper on
the commercial software.

As a temporary step these types of things help. But it's still doesn't
really fix the problem of Apache not using real connection pooling.



Re: PostgreSQL pre-fork speedup

From
Andrew Dunstan
Date:

Rod Taylor wrote:

>
>As a temporary step these types of things help. But it's still doesn't
>really fix the problem of Apache not using real connection pooling.
>
>  
>

Rod,

In principle, this should not be enormously hard to do - at least for 
Unix where the methods of handing off file handles between processes are 
fairly well known ( I have no idea if this is even possible on Windows).

Maybe you'd like to start a pgFoundry project to do it? It would be a 
great feather in the postgresql cap, and I think it's well worth doing.

cheers

andrew





Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
Forking is expensive on many systems. Linux is a bit
better but still expensive compared to threads. On
Windows, creating process is much more expensive than
on Linux. Check this benchmark:

http://cs.nmu.edu/~randy/Research/Papers/Scheduler/understanding.html

Forking shouldn't be taken lightly as free thing.
There are pros and cons. The general trend is going
towards threads, but that's a different issue.


--- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> On Wed, 5 May 2004, sdv mailer wrote:
> 
> > Forking is quite fast on Linux but creating a new
> > process is still 10x more expensive than creating
> a
> > thread and is even worse on Win32 platform. CPU
> load
> > goes up because the OS needs to
> allocate/deallocate
> > memory making it difficult to get a steady state
> > resource consumption.
> 
> Just a nit to pick here.  In Linux, the difference
> between forking and 
> spawning a new thread is almost nothing.  Definitely
> less than a factor of 
> 2, and most assuredly less than the quoted factor of
> 10 here.
> 
> The fact that windows has a heavy process /
> lightweight thread design 
> means little to me, since I'll likely never deploy a
> production postgresql 
> server on it that needs to handle any serious load.
> 


    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
I've already tried pooling (SQLRelay) and persistent
connection (PHP). They may work for other people but
they do not work for me. I have already separated
static from database driven codes but you can never
balance web server load with database server load.
Pre-fork scales with database load and not with web
server load. This point is crucial.

Most people paying $5.99/mo for web hosting don't have
access to persistent connection or connection pooling
under PHP. Maybe this is why MySQL is favored among
them. I'm not saying this is my case, but there is a
general need for speedier connections. If you can
satisfy the needs of the mass, then you practically
won their vote. Currently MySQL connects 10x faster
than PostgreSQL. See my last benchmark.



    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
I'll pretend I didn't see that last comment on
Windows. I wouldn't want to disappoint the users who
are eagerly expecting the Win32 port to complete
including myself.  ;-)

Having said that, I think it's more the reason to get
a working pre-fork for Win32. Don't you think so?



--- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> On Wed, 5 May 2004, sdv mailer wrote:
> 
> > Forking is quite fast on Linux but creating a new
> > process is still 10x more expensive than creating
> a
> > thread and is even worse on Win32 platform. CPU
> load
> > goes up because the OS needs to
> allocate/deallocate
> > memory making it difficult to get a steady state
> > resource consumption.
> 
> Just a nit to pick here.  In Linux, the difference
> between forking and 
> spawning a new thread is almost nothing.  Definitely
> less than a factor of 
> 2, and most assuredly less than the quoted factor of
> 10 here.
> 
> The fact that windows has a heavy process /
> lightweight thread design 
> means little to me, since I'll likely never deploy a
> production postgresql 
> server on it that needs to handle any serious load.
> 


    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
I don't think I can volunteer on this end as I am
already actively volunteering for another open
project. I was hoping someone could take up on this
since one of the last threads mentionned we don't have
something substantial to present for 7.5 if June 1 is
dateline for code freeze. Pre-fork came to mind. :-)

As for proof of concept, I think pgpool from Tatsuo
Ishii is a good indication that pre-fork works. I'll
try to see if I can generate some benchmarks using
pgpool on my Linux. 

PgPool is a server-side connection pool/load
balancer/replicator that implements pre-fork but
because it acts as a proxy there is 7% to 15% overhead
according to his README file.

http://www.mail-archive.com/pgsql-general@postgresql.org/msg44082.html




--- Andrew Dunstan <andrew@dunslane.net> wrote:
> sdv mailer wrote:
> 
> [snip]
> 
> >Pre-fork will give MySQL one less argument to throw
> at
> >PostgreSQL. 
> >
> >I think optimizing is this area will speed up the
> >general case for everyone rather than optimizing a
> >feature that affects 10% of the users. On top of
> that,
> >it will make a strong marketing case because
> forking
> >will no longer become a speed issue when compared
> to
> >MySQL.
> >
> >  
> >
> 
> So when can we expect to see your proof of concept
> code and benchmarks 
> to show the speedup achieved?
> 
> cheers
> 
> andrew
> 
> ---------------------------(end of
> broadcast)---------------------------
> TIP 7: don't forget to increase your free space map
settings


    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
"Andrew Dunstan"
Date:
sdv mailer said:
> Forking is expensive on many systems. Linux is a bit
> better but still expensive compared to threads. On
> Windows, creating process is much more expensive than
> on Linux. Check this benchmark:
>
> http://cs.nmu.edu/~randy/Research/Papers/Scheduler/understanding.html
>
> Forking shouldn't be taken lightly as free thing.
> There are pros and cons. The general trend is going
> towards threads, but that's a different issue.
>
>

This article shows a 3x speedup for thread creation over fork(), not the
numbers you have quoted. Furthermore, it talks about Linux kernel 2.0.30.
Do you know how old that is? The paper itself comes from Linux Journal,
January 1999, according to the author's web site.

Argument will get you nowhere - if you want it done then do it and prove
everyone wrong.

cheers

andrew






Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
Yes, I realize it's a bit old but I just wanted to
make a small point that forking is slower. It's funny
you should ask because thread creation on Linux has in
fact improved over process creation much more in 2.4
kernel.

Benchmark at IBM shows Linux 2.4 thread creation is
30x faster than process creation. Process creation on
Windows 2000 is about twice longer than process
creation on Linux. This means forking on Win32 will be
2x slower! See 2002 benchmark below:

http://www-106.ibm.com/developerworks/linux/library/l-rt7/?Open&t=grl,l=252,p=mgth

Cheers,



--- Andrew Dunstan <andrew@dunslane.net> wrote:
> sdv mailer said:
> > Forking is expensive on many systems. Linux is a
> bit
> > better but still expensive compared to threads. On
> > Windows, creating process is much more expensive
> than
> > on Linux. Check this benchmark:
> >
> >
>
http://cs.nmu.edu/~randy/Research/Papers/Scheduler/understanding.html
> >
> > Forking shouldn't be taken lightly as free thing.
> > There are pros and cons. The general trend is
> going
> > towards threads, but that's a different issue.
> >
> >
> 
> This article shows a 3x speedup for thread creation
> over fork(), not the
> numbers you have quoted. Furthermore, it talks about
> Linux kernel 2.0.30.
> Do you know how old that is? The paper itself comes
> from Linux Journal,
> January 1999, according to the author's web site.
> 
> Argument will get you nowhere - if you want it done
> then do it and prove
> everyone wrong.
> 
> cheers
> 
> andrew
> 
> 
> 
> 
> 
> ---------------------------(end of
> broadcast)---------------------------
> TIP 7: don't forget to increase your free space map
settings


    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
Tatsuo Ishii
Date:
> > Have you looked at sqlrealy.sourceforge.net?  IT looks like it might do 
> > what you need.
> 
> SQL Relay (and friends) do what I'm looking for in a round about way.
> 
> If you put it onto the webservers it would help -- but it would require
> deployment of additional webservers to accommodate the increased load.
> That can be accomplished if it helps drop the load on the DB machine.
> But still uses resources unnecessarily.
> 
> I've not looked at sqlrelay but most of these things use a different
> interface. That would work with the inhouse code but puts a damper on
> the commercial software.
> 
> As a temporary step these types of things help. But it's still doesn't
> really fix the problem of Apache not using real connection pooling.

You can try pgpool
(http://www2b.biglobe.ne.jp/~caco/pgpool/pgpool-1.2.tar.gz) if you
like. pgpool is a connection pool server for PostgreSQL. Pgpool
pretends as if PostgreSQL and you do not need to change applications
to use a special interface like SQL Relay.

Moreover, pgpool uses pre-fork technique to reduce overhead. Here is a
benchmark result using pgbench -S -C -c 10 -t 100 performed on my note pc:

Normal PostgreSQL 7.3.6: 25.6 TPS
with pgpool 1.2:     36.1 TPS

As you can see even with some overhead with pgpool, I got 40%
improvement over normal PostgreSQL using pgpool. In this case
PostgreSQL, pgpool and pgbench are running on same machine BTW.

Another point with pgpool is it can be run in replication mode.
--
Tatsuo Ishii


Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
Tatsuo,

I did some benchmark on my Linux box (AMD 1.2Ghz,
256MB, Fedora Core 1 Linux 2.4.20-8) using Pgpool 1.2
and PostgreSQL 7.4. I ran the benchmark script
repeatedly (10+ times each).

I get 5x faster using Pgpool on UNIX socket, which is
encouraging. This shows pre-fork does speed things up.

However, when I tried TCP socket, Pgpool was actually
slower by 15x !! Perhaps you can clarify why the TCP
socket is so much slower?


PHP connecting on UNIX socket
-----------------------------

Without pgpool: 0.144 sec 
With pgpool   : 0.027 sec

PHP connecting on TCP Socket
----------------------------

Without pgpool: 0.152 sec
With pgpool   : 2.39 sec

<?php
$time_start = getmicrotime();
for ($i = 0; $i < 20; $i++) {
       // With pgpool on UNIX socket       //$DBH = pg_connect('dbname=test1 port=9999
user=postgres');
       // With pgpool on TCP socket       //$DBH = pg_connect('dbname=test1
host=127.0.0.1 port=9999 user=postgres');
       // Without pgpool on UNIX socket       //$DBH = pg_connect('dbname=test1
user=postgres');
       // Without pgpool on TCP socket       //$DBH = pg_connect('dbname=test1
host=127.0.0.1 user=postgres');
       $Res = pg_exec($DBH, 'SELECT 1');       pg_close($DBH);
}
$Time = getmicrotime() - $time_start;
?>


I only changed the pgpool configuration where it says:

allow_inet_domain_socket = 1




    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
Ok, I did some benchmark on my Linux box (AMD 1.2Ghz,
256MB, Fedora Core 1 Linux 2.4.20-8) using Pgpool 1.2
and PostgreSQL 7.4. I ran the benchmark script
repeatedly (10+ times each).

I get 5x faster using Pgpool on UNIX socket, which is
encouraging. This shows pre-fork does speed things up
even with the overhead incurred by the proxy.

However, when I tried TCP socket, Pgpool was actually
slower by 15x !! Tatsuo, perhaps you can clarify why
the TCP socket is so much slower?


PHP connecting on UNIX socket
-----------------------------

Without pgpool: 0.144 sec 
With pgpool   : 0.027 sec

PHP connecting on TCP Socket
----------------------------

Without pgpool: 0.152 sec
With pgpool   : 2.39 sec

<?php
$time_start = getmicrotime();
for ($i = 0; $i < 20; $i++) {
       // With pgpool on UNIX socket       //$DBH = pg_connect('dbname=test1 port=9999
user=postgres');
       // With pgpool on TCP socket       //$DBH = pg_connect('dbname=test1
host=127.0.0.1 port=9999 user=postgres');
       // Without pgpool on UNIX socket       //$DBH = pg_connect('dbname=test1
user=postgres');
       // Without pgpool on TCP socket       //$DBH = pg_connect('dbname=test1
host=127.0.0.1 user=postgres');
       $Res = pg_exec($DBH, 'SELECT 1');       pg_close($DBH);
}
$Time = getmicrotime() - $time_start;
?>


I only changed the pgpool configuration where it says:

allow_inet_domain_socket = 1




    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
Adrian Phillips
Date:
>>>>> "sdv" == sdv mailer <sdvmailer@yahoo.com> writes:
   sdv> Yes, I realize it's a bit old but I just wanted to make a   sdv> small point that forking is slower. It's funny
youshould ask   sdv> because thread creation on Linux has in fact improved over   sdv> process creation much more in
2.4kernel.
 
   sdv> Benchmark at IBM shows Linux 2.4 thread creation is 30x   sdv> faster than process creation. Process creation
onWindows   sdv> 2000 is about twice longer than process creation on   sdv> Linux. This means forking on Win32 will be
2xslower! See   sdv> 2002 benchmark below:
 
   sdv> http://www-106.ibm.com/developerworks/linux/library/l-rt7/?Open&t=grl,l=252,p=mgth

Excuse me for butting in here but this shows that fork AND exec is
slower than thread creation. I was under the impression that (for 2.2
or 2.4 at least) both fork and thread creation used clone (kernel not
libc). Only when a process does an exec does the diiference show
(well, actually it seems when either process modifies its stack).

Now, saying that, I have no idea how postgresql works so will shut up.

Sincerely,

Adrian Phillips

-- 
Who really wrote the works of William Shakespeare ?
http://www.pbs.org/wgbh/pages/frontline/shakespeare/


Re: PostgreSQL pre-fork speedup

From
Rod Taylor
Date:
> However, when I tried TCP socket, Pgpool was actually
> slower by 15x !! Perhaps you can clarify why the TCP
> socket is so much slower?

How did you have pgpool configured to connect to the database? Domain
socket or tcpip?



Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
Pgpool connects to PostgreSQL on UNIX socket. I also
ran on TCP socket but there is no significant
difference if I recall correctly due to the inherent
nature of connection pooling or pre-fork technology. 
;-)


--- Rod Taylor <pg@rbt.ca> wrote:
> > However, when I tried TCP socket, Pgpool was
> actually
> > slower by 15x !! Perhaps you can clarify why the
> TCP
> > socket is so much slower?
> 
> How did you have pgpool configured to connect to the
> database? Domain
> socket or tcpip?
> 


    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
Bruce Momjian
Date:
sdv mailer wrote:
> Pgpool connects to PostgreSQL on UNIX socket. I also
> ran on TCP socket but there is no significant
> difference if I recall correctly due to the inherent
> nature of connection pooling or pre-fork technology. 
> ;-)

I am confused.  First you said TCP was slower, but now you say it isn't.

---------------------------------------------------------------------------


> --- Rod Taylor <pg@rbt.ca> wrote:
> > > However, when I tried TCP socket, Pgpool was
> > actually
> > > slower by 15x !! Perhaps you can clarify why the
> > TCP
> > > socket is so much slower?
> > 
> > How did you have pgpool configured to connect to the
> > database? Domain
> > socket or tcpip?
> > 
> 
> 
> 
>     
>         
> __________________________________
> Do you Yahoo!?
> Win a $20,000 Career Makeover at Yahoo! HotJobs  
> http://hotjobs.sweepstakes.yahoo.com/careermakeover 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
Hi Bruce,

Sorry for the confusion because Rod asked a question
and I answered too quickly. This is what I mean.

15x Slower:
-----------
Client <--TCP--> PgPool <--UNIX--> PostgreSQL
Client <--TCP--> PgPool <--TCP--> PostgreSQL

5x Faster:
----------
Client <--UNIX--> PgPool <--UNIX--> PostgreSQL
Client <--UNIX--> PgPool <--TCP--> PostgreSQL


Hope this helps! Pgpool speeds up connection time by
5x with UNIX socket due to pre-fork and connection
pooling. However, pgpool slows down by 15x under TCP
socket for some unknown reason.



    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
Bruce Momjian
Date:
sdv mailer wrote:
> Hi Bruce,
> 
> Sorry for the confusion because Rod asked a question
> and I answered too quickly. This is what I mean.
> 
> 15x Slower:
> -----------
> Client <--TCP--> PgPool <--UNIX--> PostgreSQL
> Client <--TCP--> PgPool <--TCP--> PostgreSQL
> 
> 5x Faster:
> ----------
> Client <--UNIX--> PgPool <--UNIX--> PostgreSQL
> Client <--UNIX--> PgPool <--TCP--> PostgreSQL
> 
> 
> Hope this helps! Pgpool speeds up connection time by
> 5x with UNIX socket due to pre-fork and connection
> pooling. However, pgpool slows down by 15x under TCP
> socket for some unknown reason.

How does this compare to using tcpip without pgpool?  Is it the tcp
startup/shutdown time that is hurting performance?

pgpool is using persistent connections so I don't think a difference
would show up there, but the client/pgpool connections are being
created/destroyed frequently.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: PostgreSQL pre-fork speedup

From
James Robinson
Date:
On May 6, 2004, at 12:19 PM, sdv mailer wrote:

>
> 15x Slower:
> -----------
> Client <--TCP--> PgPool <--UNIX--> PostgreSQL
> Client <--TCP--> PgPool <--TCP--> PostgreSQL
>
> 5x Faster:
> ----------
> Client <--UNIX--> PgPool <--UNIX--> PostgreSQL
> Client <--UNIX--> PgPool <--TCP--> PostgreSQL
>

If the problem were in the TCP stack utilization itself, one would 
expect case #1 to be equivalent to case #4, since both use one UNIX 
domain connection and one TCP connection. Likewise, one would expect 
case #2 to be the worst.

Does PgPool markedly differ codewise when reading from TCP socket 
instead of UNIX domain? Pulling down code ...

----
James Robinson
Socialserve.com



Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
The speedup (UNIX) and slowdown (TCP) are both
compared against normal direct connections from Client
to PostgreSQL. This means with Pgpool (UNIX) it is 5x
faster than normal connections to PostgreSQL. It is
also 15x slower with Pgpool (TCP) compared to normal
connections to PostgreSQL.

My guess is Tatsuo will be best to answer why we see a
slowdown on the Client to PgPool using TCP. Perhaps a
bug or feature in his code. :-)

The point is pre-forking can *potentially* speed up
connections by 5x as shown in this simplistic
non-conclusive benchmark. It would be faster too
without the proxy overhead. Forking on Linux is still
a price to pay since we don't have threads but you can
make the best out of it by pre-forking a la Apache.

Theoretically, pre-forking may be faster than
threading (MySQL) because you have one less thing to
do.





    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
Thomas Swan
Date:
sdv mailer wrote:

>Hi Bruce,
>
>Sorry for the confusion because Rod asked a question
>and I answered too quickly. This is what I mean.
>
>15x Slower:
>-----------
>Client <--TCP--> PgPool <--UNIX--> PostgreSQL
>Client <--TCP--> PgPool <--TCP--> PostgreSQL
>
>5x Faster:
>----------
>Client <--UNIX--> PgPool <--UNIX--> PostgreSQL
>Client <--UNIX--> PgPool <--TCP--> PostgreSQL
>
>
>Hope this helps! Pgpool speeds up connection time by
>5x with UNIX socket due to pre-fork and connection
>pooling. However, pgpool slows down by 15x under TCP
>socket for some unknown reason.
>
>
>
>  
>
Do you have SSL enabled on the postgresql server?  If you do, this would
account for the slower connect time over TCP/IP.


Re: PostgreSQL pre-fork speedup

From
Bruce Momjian
Date:
sdv mailer wrote:
> The speedup (UNIX) and slowdown (TCP) are both
> compared against normal direct connections from Client
> to PostgreSQL. This means with Pgpool (UNIX) it is 5x
> faster than normal connections to PostgreSQL. It is
> also 15x slower with Pgpool (TCP) compared to normal
> connections to PostgreSQL.

When you say compared to normal direct connections, do you mean normal
Unix connections or normal TCP connections?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
No SSL. No authentication either. Just friendly
handshakes.




    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
James Robinson
Date:
Quick overview of the code for differences in TCP-on-the-frontend code 
is a call to setsockopt(..., TCP_NODELAY, ...) if the connection to the 
frontend is a TCP socket. Could this be producing pseudo-fragmentation, 
resulting in over-the-top context switches? Looks like 
pool_process_query() does a lot of little itty bitty writes to the 
frontend filedescriptor.

What do you get if you comment out that block in child.c, around line 
372? Either a faster system or a non-working one?


----
James Robinson
Socialserve.com



Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
I compared against both TCP and UNIX direct
connections. No SSL, no authentication. See benchmark
results posted below again:

Direct
------
0.144 sec. Client <--UNIX--> PG
0.152 sec. Client <--TCP--> PG

5x Faster
---------
0.027 sec. Client <--UNIX--> Pgpool <--UNIX--> PG
0.028 sec. Client <--UNIX--> Pgpool <--TCP--> PG

15x Slower
----------
2.39  sec. Client <--TCP--> Pgpool <--UNIX--> PG
2.40  sec. Client <--TCP--> Pgpool <--TCP--> PG



    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
Nope. I commented out that block of code at 372 and no
difference. 



    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
Tom Lane
Date:
sdv mailer <sdvmailer@yahoo.com> writes:
> The point is pre-forking can *potentially* speed up
> connections by 5x as shown in this simplistic
> non-conclusive benchmark.

I think this "benchmark" proves no such thing.

The thing that pgpool is doing is not preforking connections at all, but
re-using prior connections.  The important difference is that you are
using a "hot" backend that has already loaded a full working set of
relcache and syscache entries --- and not just any old entries, but
exactly those needed to process your query.  (The fact that the pgbench
test uses only a very limited set of queries probably causes this test
to overstate the effect compared to more realistic workloads.)

The profiling that I've done of backend startup shows that cache
initialization accounts for the bulk of the startup delay.  And IIRC,
I was just measuring the time needed to be ready to accept the first
query, not the additional effort to fetch query-specific cache entries.
So having a hot backend would make a significant difference, but merely
avoiding the fork wouldn't necessarily.
        regards, tom lane


Re: PostgreSQL pre-fork speedup

From
Tom Lane
Date:
James Robinson <jlrobins@socialserve.com> writes:
> Quick overview of the code for differences in TCP-on-the-frontend code 
> is a call to setsockopt(..., TCP_NODELAY, ...) if the connection to the 
> frontend is a TCP socket. Could this be producing pseudo-fragmentation, 
> resulting in over-the-top context switches?

Could be.  Although libpq and the backend both set that option, they are
both careful not to present data to the kernel at all until they have a
full buffer or need a response from the far end.  pgpool seems way too
enthusiatic about flushing after each logical message --- or even part
of a logical message in some places.  I'd expect this is presenting
nontrivial extra overhead in the Unix-socket case too (at the minimum,
more kernel calls than necessary).  But it'd really hurt in TCP if we're
sending packets with just a few bytes ...

Possibly pgpool could be taught to flush only after "significant"
messages that indicate query completion or a request for response.  At
the very least I'd get rid of the flushes associated with AsciiRow and
BinaryRow messages.  Those would be a lot of overhead during a large
select retrieval.
        regards, tom lane


Re: PostgreSQL pre-fork speedup

From
Jeff
Date:
On May 6, 2004, at 1:06 PM, sdv mailer wrote:

> I compared against both TCP and UNIX direct
> connections. No SSL, no authentication. See benchmark
> results posted below again:

I recall your script only connected 20 times - that is not enough to 
filter out "noise" in those numbers.  Please run it again, this time 
with say, 1000 connects.  That will give a more true number.

And also try running several of these in parellel.

I've been using pgpool in production with great success and it can 
drastically improve connect times.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/



Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
Tom,

You're correct about the test measuring a hot backend
and not forking. How much exactly is the "bulk of the
startup" done by cache initialization relative to the
forking? What would be the impact on Win32 knowing
that process creation is twice as slow than on Linux? 



    
__________________________________
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 


Re: PostgreSQL pre-fork speedup

From
James Robinson
Date:
Tom Lane writes:... too much flushing ...

I agree. I'll bet replacing the pool_write_and_flush() calls in 
BinaryRow()
and AsciiRow() with just pool_write(), followed by removing the
fflush() calls at the bottom of those two methods should go a long
way towards fixing things, since the CompleteCommandResponse
handler method ends with a call to pool_write_and_flush(), and
you've pretty much gotta get a CompleteCommand message
trailing all of those rows.

----
James Robinson
Socialserve.com



Re: PostgreSQL pre-fork speedup

From
"scott.marlowe"
Date:
On Thu, 6 May 2004, Tom Lane wrote:

> sdv mailer <sdvmailer@yahoo.com> writes:
> > The point is pre-forking can *potentially* speed up
> > connections by 5x as shown in this simplistic
> > non-conclusive benchmark.
> 
> I think this "benchmark" proves no such thing.
> 
> The thing that pgpool is doing is not preforking connections at all, but
> re-using prior connections.  The important difference is that you are
> using a "hot" backend that has already loaded a full working set of
> relcache and syscache entries --- and not just any old entries, but
> exactly those needed to process your query.  (The fact that the pgbench
> test uses only a very limited set of queries probably causes this test
> to overstate the effect compared to more realistic workloads.)
> 
> The profiling that I've done of backend startup shows that cache
> initialization accounts for the bulk of the startup delay.  And IIRC,
> I was just measuring the time needed to be ready to accept the first
> query, not the additional effort to fetch query-specific cache entries.
> So having a hot backend would make a significant difference, but merely
> avoiding the fork wouldn't necessarily.

Wouldn't the db selection / authentication be more / as expensive as 
buffer creation?  Even in trust mode the backend still has to 
"authenticate" it just doesn't have to do as much to do that as with 
passwords.  I'd expect that to be a big chunk of time too.

It appears the best place to fix this "problem" (not a problem with 
postgresql, but an engineering problem in an abstract sense) is with 
pooling, and once the flushing etc... in tatsuo's code is fixed up to be 
zippy, pgpool would be THE answer for such issues.



Re: PostgreSQL pre-fork speedup

From
Tatsuo Ishii
Date:
> Hi Bruce,
> 
> Sorry for the confusion because Rod asked a question
> and I answered too quickly. This is what I mean.
> 
> 15x Slower:
> -----------
> Client <--TCP--> PgPool <--UNIX--> PostgreSQL
> Client <--TCP--> PgPool <--TCP--> PostgreSQL
> 
> 5x Faster:
> ----------
> Client <--UNIX--> PgPool <--UNIX--> PostgreSQL
> Client <--UNIX--> PgPool <--TCP--> PostgreSQL
> 
> 
> Hope this helps! Pgpool speeds up connection time by
> 5x with UNIX socket due to pre-fork and connection
> pooling. However, pgpool slows down by 15x under TCP
> socket for some unknown reason.

It appeared that the cause of TCP socket slowness was in reading the
startup packet which is performed by read_startup_packet(). I did some
measurement for the function and it showed huge difference between
UNIX and TCP sockets. Times (in micro sec) for 100 call to
read_startup_packet() are:

UNIX socket: 623
TCP socket:  6086

As you can see TCP is nearly 10 times slower than UNIX socket. In the
function there are 2 read()s to process the startup packet. I think I
could enhance pool_read() so that it reduces the call to read() as
little as possible...
--
Tatsuo Ishii


Re: PostgreSQL pre-fork speedup

From
Tatsuo Ishii
Date:
> > Sorry for the confusion because Rod asked a question
> > and I answered too quickly. This is what I mean.
> > 
> > 15x Slower:
> > -----------
> > Client <--TCP--> PgPool <--UNIX--> PostgreSQL
> > Client <--TCP--> PgPool <--TCP--> PostgreSQL
> > 
> > 5x Faster:
> > ----------
> > Client <--UNIX--> PgPool <--UNIX--> PostgreSQL
> > Client <--UNIX--> PgPool <--TCP--> PostgreSQL
> > 
> > 
> > Hope this helps! Pgpool speeds up connection time by
> > 5x with UNIX socket due to pre-fork and connection
> > pooling. However, pgpool slows down by 15x under TCP
> > socket for some unknown reason.
> 
> It appeared that the cause of TCP socket slowness was in reading the
> startup packet which is performed by read_startup_packet(). I did some
> measurement for the function and it showed huge difference between
> UNIX and TCP sockets. Times (in micro sec) for 100 call to
> read_startup_packet() are:
> 
> UNIX socket: 623
> TCP socket:  6086
> 
> As you can see TCP is nearly 10 times slower than UNIX socket. In the
> function there are 2 read()s to process the startup packet. I think I
> could enhance pool_read() so that it reduces the call to read() as
> little as possible...

I think I have improved the TCP performance of pgpool. With my quick
testing, in the follwing case pgpool is at least as fast as
PostgreSQL(client directly connected to PostgreSQL).

> > 15x Slower:
> > -----------
> > Client <--TCP--> PgPool <--UNIX--> PostgreSQL
> > Client <--TCP--> PgPool <--TCP--> PostgreSQL

The latest version can be obtained from:

http://www2b.biglobe.ne.jp/~caco/pgpool/pgpool-1.2.2.tar.gz
--
Tatsuo Ishii


Re: PostgreSQL pre-fork speedup

From
sdv mailer
Date:
I ran the new Pgpool-1.2.2 and it was a bit faster on
the TCP but still slower than on UNIX socket. I used
the same script as before.

TCP Socket (Pgpool 1.2.0)
----------
2.39 sec

TCP Socket (Pgpool 1.2.2)
----------
0.80 sec
0.80 sec
0.79 sec

UNIX Socket (Pgpool 1.2.2)
-----------
0.026 sec
0.027 sec
0.027 sec

Direct TCP connection (no pgpool)
---------------------------------
0.16 sec
0.15 sec
0.16 sec


PgPool on TCP is still slower than direct connection
but much faster than v1.2. Any other areas that can be
improved?

Regards,


    
__________________________________
Do you Yahoo!?
Yahoo! Movies - Buy advance tickets for 'Shrek 2'
http://movies.yahoo.com/showtimes/movie?mid=1808405861 


Re: PostgreSQL pre-fork speedup

From
Tatsuo Ishii
Date:
> I ran the new Pgpool-1.2.2 and it was a bit faster on
> the TCP but still slower than on UNIX socket. I used
> the same script as before.
> 
> TCP Socket (Pgpool 1.2.0)
> ----------
> 2.39 sec
> 
> TCP Socket (Pgpool 1.2.2)
> ----------
> 0.80 sec
> 0.80 sec
> 0.79 sec
> 
> UNIX Socket (Pgpool 1.2.2)
> -----------
> 0.026 sec
> 0.027 sec
> 0.027 sec
> 
> Direct TCP connection (no pgpool)
> ---------------------------------
> 0.16 sec
> 0.15 sec
> 0.16 sec
> 
> 
> PgPool on TCP is still slower than direct connection
> but much faster than v1.2. Any other areas that can be
> improved?

This is strange. Using pgbench(pgbench -S -C -t 1000 -h localhost),
TCP socket with pgpool 1.2.2 runs about x2 faster than direct
connection.

Direct connection: 60TPS
With pgpool:       122TPS

Here is the set up:

Direct connection: pgbench <--TCP-->PG
With pgpool:       pgbench <--TCP-->pgpool<--UNIX-->PG

Note: I use PostgreSQL 7.4.2. This means that pgpool forces pgbench to
fallback to V2 protocol (remember that pgpool does not support V3
yet), and the start up packet flys on the wire twice at the each
connection statge. This actually makes the benchmark worse, still
pgpool is better than direct connection however.
--
Tatsuo Ishii