Thread: Persistent Connections in Webserver Environment
Hi, as the subject says I need some advice on setting up connection handling to PG in a webserver environment. It's a typical dual Xeon FreeBSD box running Apache2 with mod_php5 and PG 8. About 20 different applications (ecommerce systems) will be running on this box. Each app resides in its own schema inside a single database.

As far as I understand, persistent connections from Apache processes can only be reused if the authentication information of the already existing connection is the same. So in case an Apache process holds a persistent connection to database "test", auth'ed with username "user1", and another app wants to connect as "user2", the connection can't be reused and a new one will be spawned.

So what we are doing atm is telling all apps to use the user "apache", granting access for this user to all schemas, and firing "SET search_path TO <app_schema>;" at the startup of each app / script. It works, but I really would like to have a dedicated user for each app / schema for security reasons.

The next best idea I came up with was to fire "SET SESSION AUTHORIZATION <user>;" at each app / script startup, but for this to work I would need to initially connect as superuser -- and I really dislike the idea of having a webserver connecting as superuser :/

Any ideas? I can't be the first person on earth with that problem ;/

Thanks in advance
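The shared-user approach described above can be sketched like this. This is a minimal sketch, not code from the original setup: the helper name and the identifier whitelist are assumptions. Since search_path takes an identifier rather than a query parameter, validating the schema name guards against SQL injection at app startup.

```php
<?php
// Minimal sketch of the shared "apache" user approach. The helper name
// and the whitelist pattern are assumptions for illustration. search_path
// takes an identifier, not a bindable parameter, so the schema name is
// validated before being interpolated.
function search_path_stmt($schema) {
    if (!preg_match('/^[a-z_][a-z0-9_]*$/', $schema)) {
        throw new InvalidArgumentException("bad schema name: $schema");
    }
    return 'SET search_path TO "' . $schema . '";';
}

// At app startup: reuse a persistent connection as the shared user,
// then point it at this app's schema (connection left commented out):
// $conn = pg_pconnect('dbname=test user=apache');
// pg_query($conn, search_path_stmt('shop1'));
```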
I have only a few connections, but I just connect with the equivalent of your "apache" user. My database is pretty much query-only, with a few exceptions that are not "sensitive". But for you, could you just write a stored function to do the transaction and write the audit trail for data-altering queries? That way, the application can still provide a "username" to the function for the audit trail, and the audit trail can be made "safe" within the database framework (i.e., it will only be written if the transaction succeeds).

Alternatively, this could be done on the client side by doing all data changes and auditing within the same transaction block, but having all the code on the server side makes altering the schema later easier (?). This should be a balance between having cached connections (VERY important for any even slightly-loaded system, in my very limited experience) and having robust auditing.

Sean

----- Original Message -----
From: "Hannes Dorbath" <light@theendofthetunnel.de>
To: <pgsql-general@postgresql.org>
Sent: Monday, May 02, 2005 8:45 AM
Subject: [GENERAL] Persistent Connections in Webserver Environment
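Sean's idea of bundling the data change and its audit row into one transaction can be sketched client-side as follows. The table and column names (orders, audit_log, etc.) are invented for illustration; returning the statements as data keeps the sketch self-contained, and each pair would be run with pg_query_params() in order, so the audit row only persists if the whole transaction commits.

```php
<?php
// Sketch of the client-side variant of the audit idea. Table and column
// names are illustrative assumptions, not from the original thread.
// Each (sql, params) pair would be executed with pg_query_params() in
// order inside one transaction, so the audit entry is only written if
// the data change commits.
function audited_update_stmts($app_user, $order_id, $status) {
    return array(
        array('BEGIN', array()),
        array('UPDATE orders SET status = $1 WHERE id = $2',
              array($status, $order_id)),
        array('INSERT INTO audit_log (app_user, action) VALUES ($1, $2)',
              array($app_user, "set order $order_id to $status")),
        array('COMMIT', array()),
    );
}
```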
Hm. That would work, but there are so many data-altering queries, it's a lot of work :/

I'm dreaming of a simple proxy that securely holds a pool of superuser connections and issues:

  SET SESSION AUTHORIZATION $foo;
  $query;
  RESET SESSION AUTHORIZATION;

It would just have to filter queries that contain "SESSION AUTHORIZATION" to prevent SQL injection. I wonder why pgpool doesn't work that way.

On 02.05.2005 15:23, Sean Davis wrote:
> could you just write a stored function to do the transaction and write
> the audit trail for data-altering queries?
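The filter such a hypothetical proxy would need can be sketched as below. This is an assumption-laden sketch, not an existing tool: a naive case-insensitive match is enough for illustration, though a real proxy would need a proper SQL parser (this version also rejects harmless string literals containing the phrase).

```php
<?php
// Sketch of the filter the imagined proxy would apply before forwarding
// a client query over a superuser connection: reject anything that tries
// to issue SESSION AUTHORIZATION itself. A real implementation would
// need a real parser; this naive match is for illustration only.
function is_query_allowed($sql) {
    return preg_match('/session\s+authorization/i', $sql) === 0;
}
```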
On Mon, 2005-05-02 at 14:45 +0200, Hannes Dorbath wrote:
> Any ideas? I can't be the first person on earth with that problem ;/

Have you measured the real gain in using persistent connections at all? In my experience, it's just a CPU vs RAM tradeoff. Before you go through the pain of setting up a weird authentication mechanism, try and consider whether you really need persistent connections.

Search the lists, it has been discussed in the past. I remember this thread:

http://archives.postgresql.org/pgsql-php/2005-02/msg00009.php

There may be others, too.

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
Colombo@ESI.it
On 02.05.2005 16:41, Marco Colombo wrote:
> Have you measured the real gain in using persistent connections at all?

I measured it about one year ago on a Linux box. Switching from multi-user pg_connect to single-user pg_pconnect was a big improvement on that box -- 50% and more on pages with just 1-2 simple SELECT queries. I haven't done tests on BSD though.
On 02.05.2005 16:41, Marco Colombo wrote:
> Have you measured the real gain in using persistent connections at all?

As simple as possible:

<?php
require_once('Benchmark/Timer.php');
$timer =& new Benchmark_Timer();
$timer->start();

pg_pconnect('host=myhost dbname=database user=user');
pg_query("SET search_path TO myschema;");

$q = "SELECT u.login FROM users WHERE u.user_id = 1;";

$qr = pg_query($q);

print_r(pg_fetch_all($qr));

$timer->setMarker('Database');
$timer->stop();
$timer->display();
?>

Results:

pconnect: 0.001435995101928
connect:  0.016793966293335

It's a factor of 10 on such simple things on the BSD box.
On 02.05.2005 17:32, Hannes Dorbath wrote:
> $q = "SELECT u.login FROM users WHERE u.user_id = 1;";

Sorry, it should read:

$q = "SELECT u.login FROM users u WHERE u.user_id = 1;";

I accidentally removed the "u" after users while removing line breaks to make it shorter to post here.
On Mon, 2005-05-02 at 10:32, Hannes Dorbath wrote:
> Results:
>
> pconnect: 0.001435995101928
> connect: 0.016793966293335

But if the average PHP script takes 50 milliseconds to start up and 100 milliseconds to run, then either one is still pretty much noise. Plus, neither benchmark is really interesting until you have a pretty good parallel load running.

It may well be that pconnect makes a difference under heavier load. But most of the time, I've seen one or two queries that could be tuned make a much larger difference than using pconnect.

And, if you haven't got apache / postgresql configured properly and run out of connections, the connect time goes from 16 ms to hours as your users wait for you to fix the connectivity issues. :)
On 02.05.2005 17:41, Scott Marlowe wrote:
> But if the average PHP script takes 50 milliseconds to start up and 100
> milliseconds to run, then either one is still pretty much noise.

Yeah, _IF_ :) Our scripts reside precompiled in a bytecode cache, so there just isn't much start up time ;)

I just replaced the simple query with a real world one:

http://hannes.imos.net/real_world.txt
http://hannes.imos.net/pconnect.php
http://hannes.imos.net/connect.php

Refresh both a few times to get meaningful results. Still a factor of 3-4. And this is one of the most complex queries we have -- and a factor of 3-4 just makes a difference for us :)

> Plus, neither benchmark is really interesting until you have a pretty
> good parallel load running.

The scripts are on a production box with decent load.

> It may well be that pconnect makes a difference under heavier load. But
> most of the time, I've seen one or two queries that could be tuned make
> a much larger difference than using pconnect.

We always try to optimize and get the most performance from a query anyway :)

> And, if you haven't got apache / postgresql configured properly and run
> out of connections, the connect time goes from 16 ms to hours as your
> users wait for you to fix the connectivity issues. :)

True, but we will test high load and will make sure that this won't happen :)

--
imos Gesellschaft fuer Internet-Marketing und Online-Services mbH
Alfons-Feifel-Str. 9 // D-73037 Goeppingen // Stauferpark Ost
Tel: 07161 93339-14 // Fax: 07161 93339-99 // Internet: www.imos.net
On Mon, 2005-05-02 at 17:32 +0200, Hannes Dorbath wrote:
> Results:
>
> pconnect: 0.001435995101928
> connect: 0.016793966293335
>
> It's a factor of 10 on such simple things on the BSD box.

Ok, but the difference is going to go unnoticed; that's not the point at all. The question was: have you measured any difference in the server load? I did in the past and wasn't really able to measure it, with more than 300 http processes active. The web server load is _way_ lower than the db server's. Currently we're at about 100 processes (but with pconnect) and:

(web) load average: 0.31, 0.27, 0.21
(db)  load average: 0.24, 0.21, 0.18

and I know that switching to simple connect won't change much, as page load time is dominated by the time spent in the queries (and the overhead of 1/100 or 1/1000 of a second in the startup time goes completely unnoticed). With any modern operating system, the overhead is very low (15ms is very good actually).

In my experience, pconnect may cause RAM problems. The number of processes is uselessly high. You have to make provisions for a large number of backends, and that means little RAM per backend. My advice is: use pconnect only when you have CPU problems, unless your case is a very degenerate one (your db host being on the other side of the globe).
And, in my experience again, the first reasons for CPU problems on the database server are:

- wrong/missing vacuum/analyze (or similar);
- bad coding on the application side (placing silly load on the server);
- bad queries (misuse/lack of indexes);
- bad tuning of PostgreSQL (especially RAM);
...
...
- connect overhead.

I've never managed to reach the last item in the list in real world cases. I think it is by far the least important item.

#1 Golden Rule for optimizing:
- Don't.

(Especially when it causes _real_ troubles elsewhere.)

Have a nice day,
.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
Colombo@ESI.it
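Marco's RAM point is easy to make concrete: with pg_pconnect, each Apache child keeps one idle backend per distinct connection string it has ever used. A back-of-envelope calculation, with all numbers being assumptions for illustration:

```php
<?php
// Back-of-envelope for the pconnect RAM tradeoff. All numbers are
// assumed for illustration. With pg_pconnect, each Apache child can
// hold one idle backend per distinct (dbname, user) connection string
// it has connected with.
$max_clients  = 100;  // Apache MaxClients (assumed)
$conn_strings = 20;   // one dedicated user per app would mean 20
$worst_case   = $max_clients * $conn_strings;  // idle backends to plan for
echo $worst_case, "\n";
```

A shared "apache" user keeps the worst case at one backend per child, which is part of why the per-user scheme hurts under pconnect.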
On 02.05.2005 18:33, Marco Colombo wrote:
> #1 Golden Rule for optimizing:
> - Don't.
>
> (Especially when it causes _real_ troubles elsewhere.)

Hmm.. :/ I'll do some more meaningful testing on server load tonight.

Thanks so far!