Thread: PostgreSQL pre-fork speedup
I had lots of trouble posting, so you may receive this more than once. My apologies.
------------------------------
Hi,

I know the issue of pre-forking PostgreSQL has been discussed previously. Someone mentioned pre-fork could be implemented once schemas become available in PostgreSQL, because there would be less need to run multiple databases. I think Oracle 7 uses pre-forking, and it speeds up connection startup considerably.

Often there are cases where connection pooling or persistent connections cannot be used efficiently (e.g. databases replicated or split over hundreds of machines, or where persistent connections open up too many idle connections). Instead, there is a real need to create a new connection on every query, and with PostgreSQL needing to fork on every incoming connection that can be quite slow. This could be a big win, since even a moderate improvement at the connection level will affect almost every user. Any chance of that happening for 7.5?

Thanks.
sdv mailer wrote:
> Instead, there's a big need to create a new connection on every query,
> and with PostgreSQL needing to fork on every incoming connection that
> can be quite slow.

Really? My general experience has been that forking/connection setup times are very good with PgSQL. Do not assume your Oracle experience transfers directly over -- Oracle has very large connection-time overheads; PgSQL does not.

> This could be a big win, since even a moderate improvement at the
> connection level will affect almost every user. Any chance of that
> happening for 7.5?

Only if you do it yourself, probably. The calculation of the developers appears to be that the amount of time spent by the database on fork/connect will generally be dwarfed by the amount of time spent by the database actually doing work (this being a database, the actual workloads required of the backend are much higher than, say, for a web server). So the operational benefit of adding the complexity of a pre-fork system is not very high. And if you have the rare workload where a pre-fork actually *would* speed things up a great deal, you can solve the problem yourself with connection-pooling middleware.

--
Paul Ramsey | Refractions Research
Paul Ramsey <pramsey@refractions.net> writes:
> ... So the operational benefit of adding the complexity of a
> pre-fork system is not very high.

In particular, most of the connection startup overhead work cannot be performed until we've identified which database to connect to (since it largely consists of bootstrapping access to the system catalogs in that database). If you want that work to be done in advance of receiving a client connection request, life is much more complicated than it would be for something like Apache.

There is considerable discussion of this point in the pgsql-hackers archives. I'd suggest reading the past threads before trying to do anything yourself.

regards, tom lane
Forking consumes a large amount of CPU when you have many simultaneous connections, and it adds to the latency. MySQL users in particular may think PostgreSQL's connection time is much slower, because these users tend to perform relatively simple queries.

In my case, connection pooling and persistent connections are useless for a large server farm consisting of hundreds of partitioned and replicated servers doing only simple queries.

Below is a benchmark of MySQL 3.2 and PostgreSQL 7.4 doing multiple connects/disconnects within the same server (AMD 1.2GHz, 512MB, Linux 2.4). If forking is the issue, then pre-forking will give a big boost, especially for simple queries:

MySQL time
----------
0.012786865234375
0.011546850204468
0.01167106628418

<?php
// getmicrotime() is not a PHP built-in; the snippets below assume the
// usual helper (microtime(true) requires PHP 5):
function getmicrotime() {
    list($usec, $sec) = explode(' ', microtime());
    return (float)$usec + (float)$sec;
}

$time_start = getmicrotime();
for ($i = 0; $i < 20; $i++) {
    $DBH = mysql_connect('127.0.0.1');
    mysql_select_db('test1');
    mysql_close($DBH);
}
$Time = getmicrotime() - $time_start;
?>

MySQL time (with simple query)
------------------------------
0.015650987625122
0.01443886756897
0.014433860778809

<?php
$time_start = getmicrotime();
for ($i = 0; $i < 20; $i++) {
    $DBH = mysql_connect('127.0.0.1');
    mysql_select_db('test1');
    $Res = mysql_query('SELECT * FROM table1 WHERE id = 1', $DBH);
    mysql_close($DBH);
}
$Time = getmicrotime() - $time_start;
?>

PostgreSQL time
---------------
0.15319013595581
0.14930582046509
0.14920592308044

<?php
$time_start = getmicrotime();
for ($i = 0; $i < 20; $i++) {
    $DBH = pg_connect('dbname=test1 host=127.0.0.1');
    pg_close($DBH);
}
$Time = getmicrotime() - $time_start;
?>

PostgreSQL time (with simple query)
-----------------------------------
0.19016313552856
0.18785095214844
0.18786096572876

<?php
$time_start = getmicrotime();
for ($i = 0; $i < 20; $i++) {
    $DBH = pg_connect('dbname=test1 host=127.0.0.1');
    $Res = pg_query($DBH, 'SELECT * FROM table1 WHERE id = 1');
    pg_close($DBH);
}
$Time = getmicrotime() - $time_start;
?>
sdv mailer <sdvmailer@yahoo.com> writes:
> Forking consumes a large amount of CPU when you have many simultaneous
> connections, and it adds to the latency. MySQL users in particular may
> think PostgreSQL's connection time is much slower, because these users
> tend to perform relatively simple queries.

Frankly, I think part of the reason you'll get little traction on this front is that some people consider an application that puts building a database connection into a critical path, especially one that does only a single simple query and disconnects, a stupid design. If connection time is a critical element, it's trivial to move it outside the critical path and reuse connections. You'll get much better performance that way as well, since both Postgres and Linux will have more time to observe how the process behaves and adjust cache and scheduling behaviour.

> In my case, connection pooling and persistent connections are useless
> for a large server farm consisting of hundreds of partitioned and
> replicated servers doing only simple queries.

Well, have you tried either? It would involve having a lot more idle connections, but then that's effectively the same as "preforking" anyway. Not only would they be "preforked", but they wouldn't have to be shut down and recreated repeatedly at all, even outside the critical path.

If the idle connections consume too much memory to be feasible (like, say, if you have thousands of database servers but only a small, unpredictable subset of them busy at any time), then "preforking" wouldn't really help much either and would suffer the same problem.

-- greg
We used to run persistent connections until the DB servers got maxed out because too many idle connections were sucking up all the memory. Web servers run different loads than database servers, and persistent connections are notorious for crashing your DB.

Connection pooling (e.g. SQLRelay) didn't work either, because we needed to connect to hundreds of DB servers from each web server. Imagine having 200+ open connections on the web server, and how many of those connections remain idle. The situation gets worse when you multiply by an even greater number of web servers connected to all these database servers. Do the math! We're talking a large server farm here, not 2 or 3 machines.

Saving that X ms can be substantial for a large number of simultaneous connections and shouldn't be neglected; otherwise why have persistent connections or connection pooling in the first place? Imagine every query using up that X ms of time just for connecting/forking. It adds up to a lot, from experience.

I think pre-forking can be beneficial and is a lot simpler than rewriting as a multi-threaded DB server. Pre-forking would not consume as much memory as persistent connections, because it scales with the database load and NOT with the web server load. I'm guessing pre-forking will benefit more on systems where launching a new process is expensive (Win32, certain UNIXes). Here's a snippet from one of Apache's conferences:

"Traditionally TCP/IP servers fork a new child to handle incoming requests from clients. However, in the situation of a busy web site, the overhead of forking a huge number of children will simply suffocate the server. As a consequence, Apache uses a different technique. It forks a fixed number of children right from the beginning. The children service incoming requests independently, using different address spaces. Apache can dynamically control the number of children it forks based on current load. This design has worked well and proved to be both reliable and efficient; one of its best features is that the server can survive the death of children and is also reliable. It is also more efficient than the canonical UNIX model of forking a new child for every request."

Besides solving my own problems, having a pre-fork solution will benefit PostgreSQL too. MySQL is reputed to have fast connections, and people know it, because you cannot avoid simple queries (e.g. counters, session retrieval, etc). The truth of the matter is that many people still operate on the connect/query/disconnect model running simple queries, and if you can satisfy these people then it can be a big marketing win for PostgreSQL. Many web hosting companies out there don't allow persistent connections, which is where MySQL shines.

Over and over again, we hear people say how MySQL is fast for the Web because it can connect and execute simple queries quickly. Take for instance http://www-css.fnal.gov/dsg/external/freeware/pgsql-vs-mysql.html

"MySQL handles connections very fast, thus making it suitable to use MySQL for Web - if you have hundreds of CGIs connecting/disconnecting all the time you'd like to avoid long startup procedures."

and http://www-css.fnal.gov/dsg/external/freeware/Repl_mysql_vs_psql.html

"MySQL handles connections and simple SELECTs very fast."

Likely PostgreSQL is just as fast, but if people don't see that on the first try running a simple query, then MySQL has already won the war when it comes to speed.
Another benchmark I came across:

http://www.randomnetworks.com/joseph/blog/?eid=101
sdv mailer wrote:
> We used to run persistent connections until the DB servers got maxed
> out because too many idle connections were sucking up all the memory.
> Web servers run different loads than database servers, and persistent
> connections are notorious for crashing your DB.

And this translates from your experiences with MySQL to PostgreSQL? You haven't made it clear which platforms and what level of concurrent connections gave you this behaviour. Tom Lane has already explained that most of the connection time is probably used in configuring the connection based on the database required, etc.

> Connection pooling (e.g. SQLRelay) didn't work either, because we
> needed to connect to hundreds of DB servers from each web server.
> Imagine having 200+ open connections on the web server, and how many of
> those connections remain idle. The situation gets worse when you
> multiply by an even greater number of web servers connected to all
> these database servers. Do the math! We're talking a large server farm
> here, not 2 or 3 machines.

And "preforking" makes this different, how? Perhaps having a pool of processes ready to be handed a query to a specific database, where you configure N connections to db1, M to db2 etc., still means lots of resource usage. In effect a preforked database server *is* an idle connection, just without the TCP establishment and teardown sequence, which is negligible on modern platforms -- and even if it were not negligible, it would be effectively identical regardless of the chosen DB platform.

> I think pre-forking can be beneficial and is a lot simpler than
> rewriting as a multi-threaded DB server.

This is open source; feel free to do a proof of concept (or pay someone to do a proof of concept), run the numbers, and see if your assertions work for real. Many others here, with more experience than myself of running thousands of connections at once, don't appear to think so. My limited experience with many hundreds of "idle" connections is that it is not particularly taxing at all on even semi-modern hardware (PIII/512MB etc).

Peter
> And "preforking" makes this different, how? Perhaps having a pool of
> processes ready to be handed a query to a specific database, where you
> configure N connections to db1, M to db2 etc., still means lots of
> resource usage. In effect a preforked database server *is* an idle
> connection, just without the TCP establishment and teardown sequence,
> which is negligible on modern platforms -- and even if it were not
> negligible, it would be effectively identical regardless of the chosen
> DB platform.

In theory, it should drastically reduce the number of idle connections caused by poor connection pooling on the other end.

The problem is pools for Apache that establish 1 connection per Apache backend. 100 Apache backends means 100 backend connections (50 of which may be idle, as not all pages use the database). Multiply that by 40 webservers and you have a real mess of idle connections.

Cutting that count down to 10 idlers in total by having PostgreSQL prefork a specific database would make a significant difference. The other (preferable) alternative is to convince Apache to use a common connection pool per server, rather than per Apache backend.
On Mon, May 03, 2004 at 11:59:45PM -0700, sdv mailer wrote:
> Connection pooling (e.g. SQLRelay) didn't work either, because we
> needed to connect to hundreds of DB servers from each web server.
> Imagine having 200+ open connections on the web server, and how many of
> those connections remain idle.

This sounds like a case where you probably ought to be using schema support instead of many different databases, for the record. I don't see how pre-forking is going to help you at all, because a connection is to a database, so you're going to have to pick one, and it's as likely as not to be the wrong one.

A
--
Andrew Sullivan | ajs@crankycanuck.ca
I'm talking about connecting to multiple database servers on separate machines. Schemas don't apply here.

How much work would it take to make a pre-fork smart enough to open different databases on incoming connections? How much of it can be modeled after Apache?
Pre-fork does not equal idle connections! Pre-fork scales with database load, whereas persistent connections scale with webserver load. A web server that is heavily loaded but not necessarily performing a lot of database activity will spawn hundreds of idle database connections using persistent connections. With pre-fork, you can potentially lower this down to even 10 open connections.

Forking is quite fast on Linux, but creating a new process is still 10x more expensive than creating a thread, and it is even worse on the Win32 platform. CPU load goes up because the OS needs to allocate/deallocate memory, making it difficult to get steady-state resource consumption.

More importantly, solving the forking delay will have a big impact on the minds of people who have been given the impression that forking is very, very slow. Here's what one site has to say about PostgreSQL's forking:

http://www.geocities.com/mailsoftware42/db/

"Postgres forks on every incoming connection - and the forking process and backend setup is a bit slow, but one can speed up PostgreSQL by coding things as stored procedures"

Pre-fork will give MySQL one less argument to throw at PostgreSQL.

I think optimizing in this area will speed up the general case for everyone, rather than optimizing a feature that affects 10% of the users. On top of that, it will make a strong marketing case, because forking will no longer be a speed issue when compared to MySQL.
Rod Taylor <pg@rbt.ca> writes:
> Cutting that count down to 10 idlers in total by having PostgreSQL
> prefork a specific database would make a significant difference.

Well, it would be 10 for each database, since, as has been pointed out before, loading the database is most of the delay.

If that's enough, why not just run 10 Apache processes instead of 100? I'm assuming the static non-database-driven content is already separated onto other servers. In which case running 100 Apache processes, most of which are idle, is the source of the problem.

-- greg
sdv mailer wrote:
> I'm talking about connecting to multiple database servers on separate
> machines. Schemas don't apply here.
>
> How much work would it take to make a pre-fork smart enough to open
> different databases on incoming connections? How much of it can be
> modeled after Apache?

I've not used it, but Tatsuo Ishii has just released pgpool v1.0. Quoting from its README:

1. What is pgpool

pgpool is a connection server program for PostgreSQL. pgpool runs between PostgreSQL's client (frontend) and server (backend). Any PostgreSQL client can connect to pgpool as if it were a real PostgreSQL server.

pgpool caches the connection to the PostgreSQL server to reduce the overhead of establishing the connection to it.

Also, pgpool can use two PostgreSQL servers for failover purposes. If the first server goes down, pgpool will automatically switch to the secondary server.

If that's not what you're after, then it must be fairly close. Check the mailing list archives - the download URL is on the "announce" list, April 2004. I'm sure any real figures from your testing will be of much interest to all of us.

HTH
--
Richard Huxton
Archonet Ltd
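Since pgpool speaks the ordinary frontend/backend protocol, a client needs nothing beyond a different port number. A minimal libpq sketch of that, assuming pgpool is listening on 9999 (the port used in the benchmarks later in this thread) and a database named test1:

/* Connect "through" pgpool exactly as if it were the real server.
 * The host/port/dbname values here are assumptions for illustration. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("host=127.0.0.1 port=9999 dbname=test1");
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }
    PGresult *res = PQexec(conn, "SELECT 1");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("got: %s\n", PQgetvalue(res, 0, 0));
    PQclear(res);
    PQfinish(conn);   /* pgpool keeps its backend connection cached */
    return 0;
}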
On Wed, 2004-05-05 at 11:57, Greg Stark wrote:
> Rod Taylor <pg@rbt.ca> writes:
> > Cutting that count down to 10 idlers in total by having PostgreSQL
> > prefork a specific database would make a significant difference.
>
> Well, it would be 10 for each database, since, as has been pointed out
> before, loading the database is most of the delay.
>
> If that's enough, why not just run 10 apache processes instead of 100?

Because then we would need 10 times as many servers ;)

> I'm assuming the static non-database-driven content is already
> separated onto other servers. In which case running 100 apache
> processes, most of which are idle, is the source of the problem.

Most of it has been. It's the duty cycle. As stated in another email, only about 20% of the work a script does is database related -- which occurs all at one time. Even when all Apache backends are active, a large number of connections will be idle but were used or will be used at some point during the generation of that page.

It really is an Apache fault -- but I don't think it can be fixed within Apache itself.
> Or, you run several separate Apache webservers. The ones that serve
> static content or don't need database connections don't run with the
> ones that do. And just like each idle Apache process uses memory and
> other resources, each idle PostgreSQL connection does too. So managing
> the number of Apache

Considered that, but it doesn't help much. The duty cycle of any given page is about 20% database, 80% webserver work. So at any given time 80% of the connections to the database will be idle in a best-case scenario. If Apache did decent connection pooling, or PostgreSQL gave us a hand, then a given webserver would need 1/4 of the connections, which could be internally shared.

Page 1 start
Page 1 DB connect
Page 1 DB disconnect
.
. <IDLE persistent connection as work happens>
.
Page 1 transmit results

If we could really disconnect from the database and not suffer high re-connection overhead, OR have Apache recognize the connection is unused and allow another Apache backend to use it, there would not be a problem.

> It all comes down to management, which Apache does a reasonable job of.
> If you really believe that you are right and I am wrong, then prove it.
> I'll be happy to be shown the error of my thinking (and see an
> improvement to PostgreSQL in the process).

You wouldn't run into a problem like this on a system with good connection pooling. JBoss comes to mind; once a connection is free, it is available for other threads to use. AOLserver is a webserver which demonstrates proper connection pooling.

Apache is the problem we're trying to work around. It does everything per backend, rather than having a common pool for the server. That can be fixed by improving PostgreSQL or by doing something (I'm not sure what) with Apache.
sdv mailer wrote:

[snip]

> Pre-fork will give MySQL one less argument to throw at PostgreSQL.
>
> I think optimizing in this area will speed up the general case for
> everyone, rather than optimizing a feature that affects 10% of the
> users. On top of that, it will make a strong marketing case, because
> forking will no longer be a speed issue when compared to MySQL.

So when can we expect to see your proof-of-concept code and benchmarks to show the speedup achieved?

cheers

andrew
On Wednesday 05 May 2004 07:24 am, Rod Taylor wrote:
> > And "preforking" makes this different, how? Perhaps having a pool of
> > processes ready to be handed a query to a specific database, where
> > you configure N connections to db1, M to db2 etc., still means lots
> > of resource usage. In effect a preforked database server *is* an
> > idle connection, just without the TCP establishment and teardown
> > sequence, which is negligible on modern platforms -- and even if it
> > were not negligible, it would be effectively identical regardless of
> > the chosen DB platform.
>
> In theory, it should drastically reduce the number of idle connections
> caused by poor connection pooling on the other end.

If the client is poorly written, nothing on the server side can really prevent it from being poorly written.

> The problem is pools for Apache that establish 1 connection per Apache
> backend. 100 Apache backends means 100 backend connections (50 of
> which may be idle, as not all pages use the database). Multiply that
> by 40 webservers and you have a real mess of idle connections.

Or, you run several separate Apache webservers. The ones that serve static content or don't need database connections don't run with the ones that do. And just like each idle Apache process uses memory and other resources, each idle PostgreSQL connection does too. So managing the number of Apache connections so that there aren't too many or too few solves the problem of having too many or too few idle database connections. This is all stuff that I personally have managed and planned for, and it is quite easy to do without any connection pooling on the server side.

It all comes down to management, which Apache does a reasonable job of. Either we duplicate the efforts of Apache (they are non-trivial), or we piggy-back on their success. And who's to say that the right solution for Apache is the right solution for another application? Are we going to implement a different flavor of management for each kind of application?

I suggest you implement server-side connection pooling and see for yourself: (a) how much overhead there is for configuration (which databases? how many idle?), and (b) how much easier it is to do on the client side after all.

If you really believe that you are right and I am wrong, then prove it. I'll be happy to be shown the error of my thinking (and see an improvement to PostgreSQL in the process). That's the great thing about Open Source. We can all talk the talk, but it comes down to whoever actually walks the walk. In the proprietary world, no one gets a chance to walk the walk.

--
Jonathan Gardner
jgardner@jonathangardner.net
On Wed, May 05, 2004 at 02:12:58PM -0400, Rod Taylor wrote:
> Most of it has been. It's the duty cycle. As stated in another email,
> only about 20% of the work a script does is database related -- which
> occurs all at one time. Even when all Apache backends are active, a
> large number of connections will be idle but were used or will be used
> at some point during the generation of that page.
>
> It really is an Apache fault -- but I don't think it can be fixed
> within Apache itself.

http://apache.webthing.com/

mod_pg_pool or mod_valet_sql - Apache modules to handle PostgreSQL connection pools

http://sqlrelay.sourceforge.net/
http://dbbalancer.sourceforge.net/

Database connection pooling software.

And, of course, most development environments (perl, php, java etc) have their own language-specific connection pooling solutions.

Cheers,
Steve
On Wed, 5 May 2004, sdv mailer wrote:
> Forking is quite fast on Linux, but creating a new process is still
> 10x more expensive than creating a thread, and it is even worse on the
> Win32 platform. CPU load goes up because the OS needs to
> allocate/deallocate memory, making it difficult to get steady-state
> resource consumption.

Just a nit to pick here. In Linux, the difference between forking and spawning a new thread is almost nothing. Definitely less than a factor of 2, and most assuredly less than the quoted factor of 10 here.

The fact that Windows has a heavy process / lightweight thread design means little to me, since I'll likely never deploy a production PostgreSQL server on it that needs to handle any serious load.
* Steve Atkins (steve@blighty.com) wrote:
> On Wed, May 05, 2004 at 02:12:58PM -0400, Rod Taylor wrote:
> > It really is an Apache fault -- but I don't think it can be fixed
> > within Apache itself.
>
> http://apache.webthing.com/
>
> mod_pg_pool or mod_valet_sql - Apache modules to handle PostgreSQL
> connection pools

Looks like what we need are functions in PHP or something which use the functions provided by these Apache modules, if they don't exist already (as far as I know they don't?). Or whatever language it is that he's using.

Stephen
> > The fact that Windows has a heavy process / lightweight thread
> > design means little to me, since I'll likely never deploy a
> > production postgresql server on it that needs to handle any serious
> > load.

Yes, but Solaris also has a heavy process / lightweight thread design.

J

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming, shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
> And, of course, most development environments (perl, php, java etc)
> have their own language-specific connection pooling solutions.

Yes, the one for PHP is what I was thinking of when I made my statement. They work on a per-backend basis, as Apache does not allow for the type of communication between processes that would otherwise be required. A connection created by Apache backend A cannot be used by Apache backend B.

Java is an example where it is done well, but the language decision was made long before I joined the firm.

I cannot tell if mod_pg_pool works across Apache forked backends or is still bound to a single process. They state it is intended for sharing connections across modules, so it is probably still backend-specific.
On Wed, 5 May 2004, Rod Taylor wrote:
> > And, of course, most development environments (perl, php, java etc)
> > have their own language-specific connection pooling solutions.
>
> Yes, the one for PHP is what I was thinking of when I made my
> statement. They work on a per-backend basis, as Apache does not allow
> for the type of communication between processes that would otherwise
> be required. A connection created by Apache backend A cannot be used
> by Apache backend B.
>
> Java is an example where it is done well, but the language decision
> was made long before I joined the firm.
>
> I cannot tell if mod_pg_pool works across Apache forked backends or is
> still bound to a single process. They state it is intended for sharing
> connections across modules, so it is probably still backend-specific.

Have you looked at sqlrelay.sourceforge.net? It looks like it might do what you need.
> > I cannot tell if mod_pg_pool works across Apache forked backends or
> > is still bound to a single process. They state it is intended for
> > sharing connections across modules, so it is probably still
> > backend-specific.
>
> Have you looked at sqlrelay.sourceforge.net? It looks like it might do
> what you need.

SQL Relay (and friends) do what I'm looking for, in a roundabout way.

If you put it onto the webservers it would help -- but it would require deployment of additional webservers to accommodate the increased load. That can be accomplished if it helps drop the load on the DB machine, but it still uses resources unnecessarily.

I've not looked at SQL Relay closely, but most of these things use a different interface. That would work with the in-house code, but it puts a damper on the commercial software.

As a temporary step these types of things help. But it still doesn't really fix the problem of Apache not using real connection pooling.
Rod Taylor wrote:
> As a temporary step these types of things help. But it still doesn't
> really fix the problem of Apache not using real connection pooling.

Rod,

In principle, this should not be enormously hard to do - at least for Unix, where the methods of handing off file handles between processes are fairly well known (I have no idea if this is even possible on Windows). Maybe you'd like to start a pgFoundry project to do it? It would be a great feather in the postgresql cap, and I think it's well worth doing.

cheers

andrew
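For reference, the Unix fd-handoff mechanism alluded to above is SCM_RIGHTS ancillary data on a Unix-domain socket: one process sends an open descriptor and the kernel duplicates it into the receiver. A sketch of the sending side only (the function name and one-byte payload are illustrative, not from any existing project):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

int send_fd(int channel, int fd_to_pass)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    char payload = 'F';                    /* must carry at least one data byte */
    char cbuf[CMSG_SPACE(sizeof(int))];    /* room for one descriptor */

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &payload;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;          /* "pass these descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

    return sendmsg(channel, &msg, 0);      /* receiver picks it up with recvmsg() */
}

The receiving side mirrors this with recvmsg() and reads the new descriptor out of CMSG_DATA. This is the piece a shared per-server pool would need so one Apache backend can hand a live database connection to another.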
Forking is expensive on many systems. Linux is a bit better, but still expensive compared to threads. On Windows, creating a process is much more expensive than on Linux. Check this benchmark:

http://cs.nmu.edu/~randy/Research/Papers/Scheduler/understanding.html

Forking shouldn't be taken lightly as a free thing. There are pros and cons. The general trend is going towards threads, but that's a different issue.

--- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> Just a nit to pick here. In Linux, the difference between forking and
> spawning a new thread is almost nothing. Definitely less than a factor
> of 2, and most assuredly less than the quoted factor of 10 here.
>
> The fact that Windows has a heavy process / lightweight thread design
> means little to me, since I'll likely never deploy a production
> PostgreSQL server on it that needs to handle any serious load.
I've already tried pooling (SQLRelay) and persistent connections (PHP). They may work for other people, but they do not work for me. I have already separated static content from database-driven code, but you can never balance web server load with database server load. Pre-fork scales with database load and not with web server load. This point is crucial.

Most people paying $5.99/mo for web hosting don't have access to persistent connections or connection pooling under PHP. Maybe this is why MySQL is favored among them. I'm not saying this is my case, but there is a general need for speedier connections. If you can satisfy the needs of the masses, then you have practically won their vote. Currently MySQL connects 10x faster than PostgreSQL. See my earlier benchmark.
I'll pretend I didn't see that last comment on Windows. I wouldn't want to disappoint the users who are eagerly expecting the Win32 port to be completed, including myself. ;-)

Having said that, I think it's all the more reason to get a working pre-fork for Win32. Don't you think so?

--- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> The fact that Windows has a heavy process / lightweight thread design
> means little to me, since I'll likely never deploy a production
> PostgreSQL server on it that needs to handle any serious load.
I don't think I can volunteer on this end, as I am already actively volunteering for another open project. I was hoping someone could take this up, since one of the last threads mentioned we don't have anything substantial to present for 7.5 if June 1 is the deadline for code freeze. Pre-fork came to mind. :-)

As for proof of concept, I think pgpool from Tatsuo Ishii is a good indication that pre-fork works. I'll try to see if I can generate some benchmarks using pgpool on my Linux box. pgpool is a server-side connection pool/load balancer/replicator that implements pre-fork, but because it acts as a proxy there is 7% to 15% overhead, according to its README file.

http://www.mail-archive.com/pgsql-general@postgresql.org/msg44082.html

--- Andrew Dunstan <andrew@dunslane.net> wrote:
> So when can we expect to see your proof-of-concept code and benchmarks
> to show the speedup achieved?
sdv mailer said:
> Forking is expensive on many systems. Linux is a bit better, but still
> expensive compared to threads. On Windows, creating a process is much
> more expensive than on Linux. Check this benchmark:
>
> http://cs.nmu.edu/~randy/Research/Papers/Scheduler/understanding.html
>
> Forking shouldn't be taken lightly as a free thing. There are pros and
> cons. The general trend is going towards threads, but that's a
> different issue.

This article shows a 3x speedup for thread creation over fork(), not the numbers you have quoted. Furthermore, it talks about Linux kernel 2.0.30. Do you know how old that is? The paper itself comes from Linux Journal, January 1999, according to the author's web site.

Argument will get you nowhere - if you want it done, then do it and prove everyone wrong.

cheers

andrew
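One way to settle the fork-vs-thread numbers for a given kernel is simply to measure both; a minimal micro-benchmark sketch (the iteration count and no-op payload are arbitrary choices; compile with cc -O2 bench.c -lpthread):

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/wait.h>
#include <sys/time.h>

#define N 1000

static void *noop(void *arg) { (void) arg; return NULL; }

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    int i;
    double t0;

    /* time N fork()/wait() cycles */
    t0 = now();
    for (i = 0; i < N; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                 /* child does nothing and exits */
        waitpid(pid, NULL, 0);
    }
    printf("fork:   %.1f us/cycle\n", (now() - t0) / N * 1e6);

    /* time N pthread_create()/join() cycles */
    t0 = now();
    for (i = 0; i < N; i++) {
        pthread_t th;
        pthread_create(&th, NULL, noop, NULL);
        pthread_join(th, NULL);
    }
    printf("thread: %.1f us/cycle\n", (now() - t0) / N * 1e6);
    return 0;
}

Note this measures process/thread creation only, not the backend startup work (catalog cache initialization etc.) that Tom describes elsewhere in the thread.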
Yes, I realize it's a bit old, but I just wanted to make the small point that forking is slower. It's funny you should ask, because thread creation on Linux has in fact improved over process creation much more in the 2.4 kernel.

A benchmark at IBM shows Linux 2.4 thread creation is 30x faster than process creation. Process creation on Windows 2000 takes about twice as long as process creation on Linux. This means forking on Win32 will be 2x slower! See the 2002 benchmark below:

http://www-106.ibm.com/developerworks/linux/library/l-rt7/?Open&t=grl,l=252,p=mgth

Cheers,

--- Andrew Dunstan <andrew@dunslane.net> wrote:
> This article shows a 3x speedup for thread creation over fork(), not
> the numbers you have quoted. Furthermore, it talks about Linux kernel
> 2.0.30. Do you know how old that is? The paper itself comes from Linux
> Journal, January 1999, according to the author's web site.
>
> Argument will get you nowhere - if you want it done, then do it and
> prove everyone wrong.
> > Have you looked at sqlrelay.sourceforge.net? It looks like it might
> > do what you need.
>
> SQL Relay (and friends) do what I'm looking for, in a roundabout way.
>
> If you put it onto the webservers it would help -- but it would
> require deployment of additional webservers to accommodate the
> increased load. That can be accomplished if it helps drop the load on
> the DB machine, but it still uses resources unnecessarily.
>
> I've not looked at SQL Relay closely, but most of these things use a
> different interface. That would work with the in-house code, but it
> puts a damper on the commercial software.
>
> As a temporary step these types of things help. But it still doesn't
> really fix the problem of Apache not using real connection pooling.

You can try pgpool (http://www2b.biglobe.ne.jp/~caco/pgpool/pgpool-1.2.tar.gz) if you like. pgpool is a connection pool server for PostgreSQL. pgpool pretends to be a real PostgreSQL server, so you do not need to change applications to use a special interface like SQL Relay. Moreover, pgpool uses the pre-fork technique to reduce overhead. Here is a benchmark result using pgbench -S -C -c 10 -t 100, performed on my note PC:

Normal PostgreSQL 7.3.6: 25.6 TPS
With pgpool 1.2:         36.1 TPS

As you can see, even with some overhead from pgpool, I got a 40% improvement over normal PostgreSQL using pgpool. In this case PostgreSQL, pgpool and pgbench were all running on the same machine, BTW.

Another point about pgpool is that it can be run in replication mode.
--
Tatsuo Ishii
Tatsuo,

I did some benchmarking on my Linux box (AMD 1.2GHz, 256MB, Fedora Core 1, Linux 2.4.20-8) using pgpool 1.2 and PostgreSQL 7.4. I ran the benchmark script repeatedly (10+ times each).

I get 5x faster connections using pgpool on a UNIX socket, which is encouraging. This shows pre-fork does speed things up, even with the overhead incurred by the proxy. However, when I tried a TCP socket, pgpool was actually slower by 15x!! Perhaps you can clarify why the TCP socket is so much slower?

PHP connecting on UNIX socket
-----------------------------
Without pgpool: 0.144 sec
With pgpool   : 0.027 sec

PHP connecting on TCP socket
----------------------------
Without pgpool: 0.152 sec
With pgpool   : 2.39 sec

<?php
$time_start = getmicrotime();
for ($i = 0; $i < 20; $i++) {
    // With pgpool on UNIX socket
    //$DBH = pg_connect('dbname=test1 port=9999 user=postgres');
    // With pgpool on TCP socket
    //$DBH = pg_connect('dbname=test1 host=127.0.0.1 port=9999 user=postgres');
    // Without pgpool on UNIX socket
    //$DBH = pg_connect('dbname=test1 user=postgres');
    // Without pgpool on TCP socket
    //$DBH = pg_connect('dbname=test1 host=127.0.0.1 user=postgres');
    $Res = pg_exec($DBH, 'SELECT 1');
    pg_close($DBH);
}
$Time = getmicrotime() - $time_start;
?>

I only changed the pgpool configuration where it says:

allow_inet_domain_socket = 1
>>>>> "sdv" == sdv mailer <sdvmailer@yahoo.com> writes:

sdv> Yes, I realize it's a bit old, but I just wanted to make the small
sdv> point that forking is slower. It's funny you should ask, because
sdv> thread creation on Linux has in fact improved over process creation
sdv> much more in the 2.4 kernel.

sdv> A benchmark at IBM shows Linux 2.4 thread creation is 30x faster
sdv> than process creation. Process creation on Windows 2000 takes about
sdv> twice as long as process creation on Linux. This means forking on
sdv> Win32 will be 2x slower! See the 2002 benchmark below:

sdv> http://www-106.ibm.com/developerworks/linux/library/l-rt7/?Open&t=grl,l=252,p=mgth

Excuse me for butting in here, but this shows that fork AND exec is slower than thread creation. I was under the impression that (for 2.2 or 2.4 at least) both fork and thread creation use clone (kernel, not libc). Only when a process does an exec does the difference show (well, actually, it seems, when either process modifies its stack). Now, having said that, I have no idea how PostgreSQL works, so I will shut up.

Sincerely,

Adrian Phillips

--
Who really wrote the works of William Shakespeare?
http://www.pbs.org/wgbh/pages/frontline/shakespeare/
> However, when I tried a TCP socket, pgpool was actually slower by
> 15x!! Perhaps you can clarify why the TCP socket is so much slower?

How did you have pgpool configured to connect to the database? Domain socket or TCP/IP?
pgpool connects to PostgreSQL on a UNIX socket. I also ran it on a TCP socket, but there was no significant difference, if I recall correctly, due to the inherent nature of connection pooling / pre-fork technology. ;-)

--- Rod Taylor <pg@rbt.ca> wrote:
> How did you have pgpool configured to connect to the database? Domain
> socket or TCP/IP?
sdv mailer wrote:
> pgpool connects to PostgreSQL on a UNIX socket. I also ran it on a TCP
> socket, but there was no significant difference, if I recall
> correctly, due to the inherent nature of connection pooling / pre-fork
> technology. ;-)

I am confused. First you said TCP was slower, but now you say it isn't.

--
Bruce Momjian   |  http://candle.pha.pa.us
pgman@candle.pha.pa.us  |  (610) 359-1001
+ If your life is a hard drive,  |  13 Roberts Road
+ Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Hi Bruce,

Sorry for the confusion; Rod asked a question and I answered too quickly. This is what I mean.

15x Slower:
-----------
Client <--TCP--> PgPool <--UNIX--> PostgreSQL
Client <--TCP--> PgPool <--TCP--> PostgreSQL

5x Faster:
----------
Client <--UNIX--> PgPool <--UNIX--> PostgreSQL
Client <--UNIX--> PgPool <--TCP--> PostgreSQL

Hope this helps! pgpool speeds up connection time by 5x with a UNIX socket, due to pre-fork and connection pooling. However, pgpool slows down by 15x over a TCP socket, for some unknown reason.
sdv mailer wrote:
> 15x Slower:
> -----------
> Client <--TCP--> PgPool <--UNIX--> PostgreSQL
> Client <--TCP--> PgPool <--TCP--> PostgreSQL
>
> 5x Faster:
> ----------
> Client <--UNIX--> PgPool <--UNIX--> PostgreSQL
> Client <--UNIX--> PgPool <--TCP--> PostgreSQL
>
> Hope this helps! pgpool speeds up connection time by 5x with a UNIX
> socket, due to pre-fork and connection pooling. However, pgpool slows
> down by 15x over a TCP socket, for some unknown reason.

How does this compare to using TCP/IP without pgpool? Is it the TCP startup/shutdown time that is hurting performance? pgpool is using persistent connections, so I don't think a difference would show up there, but the client/pgpool connections are being created/destroyed frequently.

--
Bruce Momjian   |  http://candle.pha.pa.us
pgman@candle.pha.pa.us  |  (610) 359-1001
+ If your life is a hard drive,  |  13 Roberts Road
+ Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
On May 6, 2004, at 12:19 PM, sdv mailer wrote:
> 15x Slower:
> -----------
> Client <--TCP--> PgPool <--UNIX--> PostgreSQL
> Client <--TCP--> PgPool <--TCP--> PostgreSQL
>
> 5x Faster:
> ----------
> Client <--UNIX--> PgPool <--UNIX--> PostgreSQL
> Client <--UNIX--> PgPool <--TCP--> PostgreSQL

If the problem were in TCP stack utilization itself, one would expect case #1 to be equivalent to case #4, since both use one UNIX-domain connection and one TCP connection. Likewise, one would expect case #2 to be the worst. Does pgpool markedly differ codewise when reading from a TCP socket instead of a UNIX-domain one? Pulling down code ...

----
James Robinson
Socialserve.com
The speedup (UNIX) and slowdown (TCP) are both compared against normal direct connections from client to PostgreSQL. This means that with pgpool (UNIX) it is 5x faster than normal connections to PostgreSQL. It is also 15x slower with pgpool (TCP) compared to normal connections to PostgreSQL.

My guess is Tatsuo will be best placed to answer why we see a slowdown from client to pgpool over TCP. Perhaps a bug or feature in his code. :-)

The point is that pre-forking can *potentially* speed up connections by 5x, as shown in this simplistic, non-conclusive benchmark. It would be faster still without the proxy overhead. Forking on Linux is still a price to pay, since we don't have threads, but you can make the best of it by pre-forking a la Apache. Theoretically, pre-forking may even be faster than threading (MySQL), because you have one less thing to do.
sdv mailer wrote:
> Hope this helps! pgpool speeds up connection time by 5x with a UNIX
> socket, due to pre-fork and connection pooling. However, pgpool slows
> down by 15x over a TCP socket, for some unknown reason.

Do you have SSL enabled on the PostgreSQL server? If you do, this would account for the slower connect time over TCP/IP.
sdv mailer wrote:
> The speedup (UNIX) and slowdown (TCP) are both compared against normal
> direct connections from client to PostgreSQL. This means that with
> pgpool (UNIX) it is 5x faster than normal connections to PostgreSQL.
> It is also 15x slower with pgpool (TCP) compared to normal connections
> to PostgreSQL.

When you say compared to normal direct connections, do you mean normal Unix connections or normal TCP connections?

--
Bruce Momjian   |  http://candle.pha.pa.us
pgman@candle.pha.pa.us  |  (610) 359-1001
+ If your life is a hard drive,  |  13 Roberts Road
+ Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
No SSL. No authentication either. Just friendly handshakes.
A quick overview of the code for differences in TCP-on-the-frontend handling turns up a call to setsockopt(..., TCP_NODELAY, ...) if the connection to the frontend is a TCP socket. Could this be producing pseudo-fragmentation, resulting in over-the-top context switches? It looks like pool_process_query() does a lot of little itty-bitty writes to the frontend file descriptor.

What do you get if you comment out that block in child.c, around line 372? Either a faster system or a non-working one?

----
James Robinson
Socialserve.com
I compared against both TCP and UNIX direct connections. No SSL, no authentication. See the benchmark results posted below again:

Direct
------
0.144 sec. Client <--UNIX--> PG
0.152 sec. Client <--TCP--> PG

5x Faster
---------
0.027 sec. Client <--UNIX--> Pgpool <--UNIX--> PG
0.028 sec. Client <--UNIX--> Pgpool <--TCP--> PG

15x Slower
----------
2.39 sec. Client <--TCP--> Pgpool <--UNIX--> PG
2.40 sec. Client <--TCP--> Pgpool <--TCP--> PG
Nope. I commented out that block of code at line 372, and there was no difference.
sdv mailer <sdvmailer@yahoo.com> writes:
> The point is that pre-forking can *potentially* speed up connections
> by 5x, as shown in this simplistic, non-conclusive benchmark.

I think this "benchmark" proves no such thing.

The thing that pgpool is doing is not preforking connections at all, but re-using prior connections. The important difference is that you are using a "hot" backend that has already loaded a full working set of relcache and syscache entries --- and not just any old entries, but exactly those needed to process your query. (The fact that the pgbench test uses only a very limited set of queries probably causes this test to overstate the effect compared to more realistic workloads.)

The profiling that I've done of backend startup shows that cache initialization accounts for the bulk of the startup delay. And IIRC, I was just measuring the time needed to be ready to accept the first query, not the additional effort to fetch query-specific cache entries. So having a hot backend would make a significant difference, but merely avoiding the fork wouldn't necessarily.

regards, tom lane
James Robinson <jlrobins@socialserve.com> writes:
> A quick overview of the code for differences in TCP-on-the-frontend
> handling turns up a call to setsockopt(..., TCP_NODELAY, ...) if the
> connection to the frontend is a TCP socket. Could this be producing
> pseudo-fragmentation, resulting in over-the-top context switches?

Could be. Although libpq and the backend both set that option, they are both careful not to present data to the kernel at all until they have a full buffer or need a response from the far end. pgpool seems way too enthusiastic about flushing after each logical message --- or even part of a logical message in some places. I'd expect this is presenting nontrivial extra overhead in the Unix-socket case too (at the minimum, more kernel calls than necessary). But it'd really hurt in TCP if we're sending packets with just a few bytes ...

Possibly pgpool could be taught to flush only after "significant" messages that indicate query completion or a request for response. At the very least, I'd get rid of the flushes associated with AsciiRow and BinaryRow messages. Those would be a lot of overhead during a large select retrieval.

regards, tom lane
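The generic shape of that advice is an output buffer that only hits the kernel when it fills or when a response boundary is reached. A sketch of the idea (not pgpool's actual pool_write()/pool_flush() code; the buffer size is arbitrary):

#include <string.h>
#include <unistd.h>

#define OUTBUF_SIZE 8192

static char outbuf[OUTBUF_SIZE];
static int  outlen = 0;

/* Push the whole buffer to the socket in one (or a few) kernel calls.
 * Call this after CommandComplete/ReadyForQuery, not after every row. */
static int buffer_flush(int fd)
{
    int off = 0;
    while (off < outlen) {
        ssize_t n = write(fd, outbuf + off, outlen - off);
        if (n < 0)
            return -1;
        off += n;
    }
    outlen = 0;
    return 0;
}

/* Queue bytes; spill to the socket only when the buffer is full. */
static int buffered_write(int fd, const char *data, int len)
{
    while (len > 0) {
        int chunk = OUTBUF_SIZE - outlen;
        if (chunk > len)
            chunk = len;
        memcpy(outbuf + outlen, data, chunk);
        outlen += chunk;
        data += chunk;
        len -= chunk;
        if (outlen == OUTBUF_SIZE && buffer_flush(fd) < 0)
            return -1;
    }
    return 0;
}

With TCP_NODELAY set, every small flush goes straight onto the wire as its own packet, which is why per-row flushing would hurt the TCP case so much more than the Unix-socket case.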
On May 6, 2004, at 1:06 PM, sdv mailer wrote:
> I compared against both TCP and UNIX direct connections. No SSL, no
> authentication. See the benchmark results posted below again:

I recall your script only connected 20 times - that is not enough to filter out "noise" in those numbers. Please run it again, this time with, say, 1000 connects. That will give a truer number. And also try running several of these in parallel.

I've been using pgpool in production with great success, and it can drastically improve connect times.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
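A sketch of that larger run in C with libpq, for anyone who wants to reproduce it without PHP (the conninfo string, including the pgpool port 9999, is just an example; start several copies of the program at once for the parallel case):

#include <stdio.h>
#include <sys/time.h>
#include <libpq-fe.h>

#define ITERS 1000

int main(void)
{
    const char *conninfo = "host=127.0.0.1 port=9999 dbname=test1";
    struct timeval t0, t1;
    int i;

    gettimeofday(&t0, NULL);
    for (i = 0; i < ITERS; i++) {
        PGconn *conn = PQconnectdb(conninfo);
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connect %d failed: %s", i, PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }
        PQclear(PQexec(conn, "SELECT 1"));  /* one trivial query per cycle */
        PQfinish(conn);
    }
    gettimeofday(&t1, NULL);

    printf("%.3f ms per connect/query/disconnect cycle\n",
           ((t1.tv_sec - t0.tv_sec) * 1e6 +
            (t1.tv_usec - t0.tv_usec)) / 1000.0 / ITERS);
    return 0;
}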
Tom,

You're correct that the test measures a hot backend, not forking. How much of the "bulk of the startup" is cache initialization, relative to the forking? What would be the impact on Win32, knowing that process creation there is twice as slow as on Linux?
Tom Lane writes:
... too much flushing ...

I agree. I'll bet replacing the pool_write_and_flush() calls in BinaryRow() and AsciiRow() with just pool_write(), followed by removing the fflush() calls at the bottom of those two methods, should go a long way towards fixing things, since the CompleteCommandResponse handler method ends with a call to pool_write_and_flush(), and you've pretty much got to get a CompleteCommand message trailing all of those rows.

----
James Robinson
Socialserve.com
On Thu, 6 May 2004, Tom Lane wrote:
> The profiling that I've done of backend startup shows that cache
> initialization accounts for the bulk of the startup delay. And IIRC,
> I was just measuring the time needed to be ready to accept the first
> query, not the additional effort to fetch query-specific cache
> entries. So having a hot backend would make a significant difference,
> but merely avoiding the fork wouldn't necessarily.

Wouldn't the db selection / authentication be as expensive as, or more expensive than, buffer creation? Even in trust mode the backend still has to "authenticate"; it just doesn't have to do as much as with passwords. I'd expect that to be a big chunk of time too.

It appears the best place to fix this "problem" (not a problem with PostgreSQL, but an engineering problem in an abstract sense) is with pooling, and once the flushing etc. in Tatsuo's code is fixed up to be zippy, pgpool would be THE answer for such issues.
> 15x Slower:
> -----------
> Client <--TCP--> PgPool <--UNIX--> PostgreSQL
> Client <--TCP--> PgPool <--TCP--> PostgreSQL
>
> 5x Faster:
> ----------
> Client <--UNIX--> PgPool <--UNIX--> PostgreSQL
> Client <--UNIX--> PgPool <--TCP--> PostgreSQL
>
> Hope this helps! pgpool speeds up connection time by 5x with a UNIX
> socket, due to pre-fork and connection pooling. However, pgpool slows
> down by 15x over a TCP socket, for some unknown reason.

It appeared that the cause of the TCP socket slowness was in reading the startup packet, which is performed by read_startup_packet(). I did some measurements of the function, and they showed a huge difference between UNIX and TCP sockets. Times (in microseconds) for 100 calls to read_startup_packet() are:

UNIX socket: 623
TCP socket:  6086

As you can see, TCP is nearly 10 times slower than the UNIX socket. In the function there are 2 read()s to process the startup packet. I think I could enhance pool_read() so that it reduces the calls to read() to as few as possible...
--
Tatsuo Ishii
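The usual way to cut the read() count is a drain-the-kernel read buffer: pull in whatever bytes are available in one call, then serve the parser from the buffer. A generic sketch of the idea (not pgpool's actual pool_read()):

#include <string.h>
#include <unistd.h>

#define READBUF_SIZE 8192

static char readbuf[READBUF_SIZE];
static int  rb_len = 0;   /* bytes currently buffered */
static int  rb_pos = 0;   /* bytes already consumed   */

/* Return exactly len bytes, issuing as few read() calls as possible. */
static int buffered_read(int fd, char *out, int len)
{
    while (len > 0) {
        if (rb_pos == rb_len) {                 /* buffer drained: refill */
            ssize_t n = read(fd, readbuf, READBUF_SIZE);
            if (n <= 0)
                return -1;                      /* error or EOF */
            rb_len = (int) n;
            rb_pos = 0;
        }
        int chunk = rb_len - rb_pos;
        if (chunk > len)
            chunk = len;
        memcpy(out, readbuf + rb_pos, chunk);
        rb_pos += chunk;
        out += chunk;
        len -= chunk;
    }
    return 0;
}

With this, the length word and body of a startup packet will typically arrive in the same refill, so the two-syscall pattern can collapse to a single read() in the common case.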
> It appeared that the cause of the TCP socket slowness was in reading
> the startup packet, which is performed by read_startup_packet(). I did
> some measurements of the function, and they showed a huge difference
> between UNIX and TCP sockets. Times (in microseconds) for 100 calls to
> read_startup_packet() are:
>
> UNIX socket: 623
> TCP socket:  6086
>
> As you can see, TCP is nearly 10 times slower than the UNIX socket. In
> the function there are 2 read()s to process the startup packet. I
> think I could enhance pool_read() so that it reduces the calls to
> read() to as few as possible...

I think I have improved the TCP performance of pgpool. In my quick testing, in the following cases pgpool is now at least as fast as PostgreSQL (client directly connected to PostgreSQL):

> > 15x Slower:
> > -----------
> > Client <--TCP--> PgPool <--UNIX--> PostgreSQL
> > Client <--TCP--> PgPool <--TCP--> PostgreSQL

The latest version can be obtained from:

http://www2b.biglobe.ne.jp/~caco/pgpool/pgpool-1.2.2.tar.gz
--
Tatsuo Ishii
I ran the new pgpool 1.2.2, and it was a bit faster over TCP but still slower than over the UNIX socket. I used the same script as before.

TCP socket (pgpool 1.2.0)
-------------------------
2.39 sec

TCP socket (pgpool 1.2.2)
-------------------------
0.80 sec
0.80 sec
0.79 sec

UNIX socket (pgpool 1.2.2)
--------------------------
0.026 sec
0.027 sec
0.027 sec

Direct TCP connection (no pgpool)
---------------------------------
0.16 sec
0.15 sec
0.16 sec

pgpool over TCP is still slower than a direct connection, but much faster than v1.2. Any other areas that can be improved?

Regards,
> I ran the new pgpool 1.2.2, and it was a bit faster over TCP but still
> slower than over the UNIX socket. I used the same script as before.
>
> TCP socket (pgpool 1.2.2):         0.80 sec
> UNIX socket (pgpool 1.2.2):        0.027 sec
> Direct TCP connection (no pgpool): 0.16 sec
>
> pgpool over TCP is still slower than a direct connection, but much
> faster than v1.2. Any other areas that can be improved?

This is strange. Using pgbench (pgbench -S -C -t 1000 -h localhost), a TCP socket with pgpool 1.2.2 runs about 2x faster than a direct connection:

Direct connection: 60 TPS
With pgpool:       122 TPS

Here is the setup:

Direct connection: pgbench <--TCP--> PG
With pgpool:       pgbench <--TCP--> pgpool <--UNIX--> PG

Note: I used PostgreSQL 7.4.2. This means that pgpool forces pgbench to fall back to the V2 protocol (remember that pgpool does not support V3 yet), and the startup packet flies across the wire twice at each connection stage. This actually makes the benchmark worse; still, pgpool is better than a direct connection.
--
Tatsuo Ishii