Thread: performance problem - 10.000 databases
Hi all

We are building a hosting platform with apache + php (our own mod_virtual module) with about 10.000 virtual domains + PostgreSQL. PostgreSQL is on a different machine (2 x Intel Xeon 2.4GHz, 1GB RAM, SCSI RAID 1+0).

I've made some tests: 3000 databases and 400 clients connected at the same time. Each client ran a set of queries against one database 10 times, then connected to the next database. These are the queries:

select * from table1 where number='$i'
update table2 set some_text='int(rand(5))'
select * from table1 where position in (select position from table2 where number in (select number from table3))

Each database has four tables (int,text,int) with 1000 records. Postgres takes all memory and all processors (4 CPUs with Hyper-Threading). The first two queries take 0 to 10 sec; the third query takes 15-70 sec.

But my problem is that when I run the command:

psql -h 127.0.0.1 dbname dbuser

I wait about 3-5 sec to enter the psql monitor, so every new connection from apache will wait about 3-5 sec before it can send a query to the server. That's a very long time... 4 sec to connect to the server and 4 sec to process a query. Why is the time to connect to the server so long???

I could make persistent connections, but with 10.000 clients it would kill the server. Has anyone an idea how to tune postgres to accept connections faster? Maybe some other settings to speed up the server?

My settings:

PostgreSQL:
max_connections = 512
shared_buffers = 8192
max_fsm_relations = 10000
max_fsm_pages = 100000
max_locks_per_transaction = 512
wal_buffers = 32
sort_mem = 327681
vacuum_mem = 8192
fsync = true
effective_cache_size = 1000000
log_connections = true
log_pid = true
log_statement = true
log_duration = true
log_timestamp = true

Kernel:
kernel.shmmni = 8192
kernel.shmall = 134217728
kernel.shmmax = 536870912
RLIMIT_NPROC=1000

greetings
Marek
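For reference, the per-database test schema and the three query shapes look roughly like this. The column names are assumptions (the post only says each table is (int,text,int)), and the random values were apparently interpolated by the test driver:

-- one of the four identical test tables, ~1000 rows each
CREATE TABLE table1 (
    number    integer,     -- lookup key used by the fast query
    some_text text,
    position  integer
);

-- fast query: fetch one row by number
SELECT * FROM table1 WHERE number = 42;

-- medium query: update rows with a (randomly generated) text value
UPDATE table2 SET some_text = 'some text';

-- slow query: nested IN subselects across three tables
SELECT * FROM table1
WHERE position IN (SELECT position FROM table2
                   WHERE number IN (SELECT number FROM table3));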
> I could make persistent connections, but with 10.000 clients it will kill
> the server.

But if they're virtual domains, why would you need one connection per domain? You should only need one connection per apache process...
On Fri, 31-10-2003, at 11:52, Matt Clark wrote:
> > I could make persistent connections, but with 10.000 clients it will kill
> > the server.
>
> But if they're virtual domains, why would you need one connection per domain? You should only need one connection per apache
> process...

Because every virtual domain has its own database, username and password. So one client, domain1.com with db: domain1db, user: domain1user, cannot access the database of a second client, domain2.com with db: domain2db, user: domain2user.

pg_hba.conf looks like this:

host domain1db domain1user ip_addr netmask md5
host domain2db domain2user ip_addr netmask md5
.....
.... 10.000 records ...
...
host domain10000db domain10000user ip_addr netmask md5

You know, one client cannot access another client's database. So, if one process is connected to domain1db it cannot access domain2db, and the others.

greetings
Marek
Ooh, I see. That's a tricky one. Do you really need that level of separation?

> Because every virtual domain has its own database, username and
> password. So one client, domain1.com with db: domain1db, user: domain1user,
> cannot access the database of a second client, domain2.com with db: domain2db,
> user: domain2user.
[...]
> You know, one client cannot access another client's database.
> So, if one process is connected to domain1db it cannot access
> domain2db, and the others.
Marek Florianczyk wrote:

> But my problem is that when I hit command:
> psql -h 127.0.0.1 dbname dbuser
> I'm waiting about 3-5 sec to enter psql monitor, so every new connection
> from apache will wait about 3-5 sec to put query to server. Thats a very
> long time...

Why don't you use a connection manager?

Regards
Gaetano Mendola
On Fri, 31-10-2003, at 12:25, Matt Clark wrote:
> Ooh, I see. That's a tricky one. Do you really need that level of separation?

Well, if you talk with the clients and they promise that they will not access other databases, and especially won't do "drop database my_best_friends_db", I can put:

host any any 0.0.0.0 0.0.0.0 trust

at the very beginning of pg_hba.conf ;)

greetings ;)
Marek
> On Fri, 31-10-2003, at 12:25, Matt Clark wrote:
> > Ooh, I see. That's a tricky one. Do you really need that level of separation?
>
> Well, if you talk with the clients and they promise that they will not
> access other databases, and especially won't do "drop database
> my_best_friends_db", I can put:
> host any any 0.0.0.0 0.0.0.0 trust
> at the very beginning of pg_hba.conf ;)

I was more thinking that it might be possible to manage the security at a different level than the DB.
On Fri, 31-10-2003, at 13:06, Gaetano Mendola wrote:
> Marek Florianczyk wrote:
>
> > But my problem is that when I hit command:
> > psql -h 127.0.0.1 dbname dbuser
> > I'm waiting about 3-5 sec to enter psql monitor, so every new connection
> > from apache will wait about 3-5 sec to put query to server. Thats a very
> > long time...
>
> Why don't you use a connection manager?

What is a connection manager? Among PostgreSQL-related projects I only found SQLB (a load balancer), but it's designed to be used with one (large) database and maybe with more than one PostgreSQL server.

The best solution for me would be a small program installed on the machine with apache + php. php would have persistent connections to this small program through a unix socket, and client authorization would be done by this program. The program would hold a pool of connections to the PostgreSQL server. So apache could have, let's say, 3000 persistent connections to this small program, while the program keeps 100 connections to the PostgreSQL server, and that's enough to process all the queries. If traffic increases, clients are already connected to the local socket, and more new (tcp/ip) connections to the server are made as necessary.

The only trick is that pg_hba.conf of the PostgreSQL server would have:

host any connectmanager ip_addr netmask md5

and the authorization of the 10.000 clients would be done by the small program installed on the apache server. I've never heard of such software ;)

What is a connection manager?

greetings
Marek
On Fri, 31-10-2003, at 13:33, Matt Clark wrote:
> > Well, if you talk with the clients and they promise that they will not
> > access other databases, and especially won't do "drop database
> > my_best_friends_db", I can put:
> > host any any 0.0.0.0 0.0.0.0 trust
> > at the very beginning of pg_hba.conf ;)
>
> I was more thinking that it might be possible to manage the security at a different level than the DB.

Well, why not, but I can't see any solution. As far as I know, the only thing that restricts who can connect to which database is pg_hba.conf.

Marek
On Fri, 31 Oct 2003, Matt Clark wrote:

> I was more thinking that it might be possible to manage the security at a different level than the DB.

We do this with users and permissions.

Each virtual host has an apache config include specifying a db user, pass (and database, although most of them use the same one). Permissions on the database tables are set so that a given vhost can only access their own data.

Our setup is mod_perl. Don't know how one would go about doing this with PHP, but I imagine it has some mechanism for per-vhost variables or similar.

-j

--
Jamie Lawrence                                        jal@jal.org
"Remember, half-measures can be very effective if all you deal with are
half-wits."
   - Chris Klein
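As a rough sketch of the permissions scheme Jamie describes (the role and table names here are invented, the thread doesn't give the real ones), each vhost only gets rights on its own tables inside the shared database:

-- one database role per virtual host
CREATE USER vhost1_user WITH PASSWORD 'secret1';
CREATE USER vhost2_user WITH PASSWORD 'secret2';

-- tables owned by the hosting admin; each vhost is granted access only to its own
REVOKE ALL ON vhost1_orders FROM PUBLIC;
GRANT SELECT, INSERT, UPDATE, DELETE ON vhost1_orders TO vhost1_user;

REVOKE ALL ON vhost2_orders FROM PUBLIC;
GRANT SELECT, INSERT, UPDATE, DELETE ON vhost2_orders TO vhost2_user;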
On Fri, 31-10-2003, at 13:54, Jamie Lawrence wrote:
> We do this with users and permissions.
>
> Each virtual host has an apache config include specifying a db user,
> pass (and database, although most of them use the same one).
> Permissions on the database tables are set so that a given vhost can
> only access their own data.
>
> Our setup is mod_perl. Don't know how one would go about doing this with
> PHP, but I imagine it has some mechanism for per-vhost variables or
> similar.

So, as I understand it, each apache vhost can only connect to a specified database. Strange... no PHP, only mod_perl that fetches data from the database and writes the html document? So the clients don't write any scripts and don't use functions like pg_connect? Do they use CGI with mod_perl and write their scripts in perl? Interesting. I don't know if that's possible with PHP; I don't think so.

But... if I had 200, or even 900 clients, I would run apache with vhosts. With 10.000 clients apache cannot work with vhosts (some system limitation), so we use our own dynamic vhost module. When a request reaches the server, it checks the domain part of the request, looks up in LDAP what the DocumentRoot for that domain is, and then returns the proper file. The config looks as if there were only one vhost, but it works with 10.000 domains ;)

No, I don't think your solution would work for us. Everything gets complicated when a large number of anything is involved. ;)

greetings
sorry for my bad english
Marek Florianczyk <franki@tpi.pl> writes:
> We are building hosting with apache + php ( our own mod_virtual module )
> with about 10.000 virtual domains + PostgreSQL.
> PostgreSQL is on a different machine ( 2 x intel xeon 2.4GHz 1GB RAM
> scsi raid 1+0 )
> I've made some tests - 3000 databases and 400 clients connected at same
> time.

You are going to need much more serious iron than that if you want to support 10000 active databases. The required working set per database is a couple hundred K just for system catalogs (I don't have an exact figure in my head, but it's surely of that order of magnitude). So the system catalogs alone would require 2 gig of RAM to keep 'em swapped in; never mind caching any user data.

The recommended way to handle this is to use *one* database and create 10000 users each with his own schema. That should scale a lot better.

Also, with a large max_connections setting, you have to beware that your kernel settings are adequate --- particularly the open-files table. It's pretty easy for Postgres to eat all your open-files slots. PG itself will usually survive this condition just fine, but everything else you run on the machine will start falling over :-(. For safety you should make sure that max_connections * max_files_per_process is comfortably less than the size of the kernel's open-files table.

			regards, tom lane
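A minimal sketch of the one-database / schema-per-user layout Tom suggests (user and schema names here are invented for illustration):

-- shared database, one user + one schema per hosting customer
CREATE USER domain1user WITH PASSWORD 'secret1';
CREATE SCHEMA domain1user AUTHORIZATION domain1user;

CREATE USER domain2user WITH PASSWORD 'secret2';
CREATE SCHEMA domain2user AUTHORIZATION domain2user;

-- with the default search_path of '$user,public', each user's unqualified
-- table names resolve to tables inside their own schema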
Hmm, maybe you need to back off a bit here on your expectations. You said your test involved 400 clients simultaneously running queries that hit pretty much all the data in each client's DB. Why would you expect that to be anything *other* than slow? And does it reflect expected production use? Unless those 10,000 sites are all fantastically popular, surely it's more likely that only a small number of queries will be in progress at any given time? You're effectively simulating running 400 _very_ popular dynamic websites off one 2-cpu DB server.

You also said that CPU is pegged at 100%. Given that you've got 400 backends all competing for CPU time you must have an insane load average too, so improving the connect time might prove to be of no use, as you could well just get faster connects and then slower queries!

Sorry this email wasn't more constructive ;-)

M
On Fri, 31-10-2003, at 15:23, Tom Lane wrote:
> You are going to need much more serious iron than that if you want to
> support 10000 active databases. The required working set per database
> is a couple hundred K just for system catalogs (I don't have an exact
> figure in my head, but it's surely of that order of magnitude).

it's about 3.6M

> So the
> system catalogs alone would require 2 gig of RAM to keep 'em swapped in;
> never mind caching any user data.
>
> The recommended way to handle this is to use *one* database and create
> 10000 users each with his own schema. That should scale a lot better.
>
> Also, with a large max_connections setting, you have to beware that your
> kernel settings are adequate --- particularly the open-files table.
> It's pretty easy for Postgres to eat all your open-files slots. PG
> itself will usually survive this condition just fine, but everything
> else you run on the machine will start falling over :-(. For safety
> you should make sure that max_connections * max_files_per_process is
> comfortably less than the size of the kernel's open-files table.

Yes, I have made some updates - number of processes, semaphores, and file descriptors. I'm aware of these limitations. On this machine there will be only PostgreSQL, nothing else.

This idea with one database and 10.000 schemas is very interesting; I never thought about that. I will make some tests on Monday and send the results to the list.

greetings
Marek
On Fri, 31-10-2003, at 15:30, Matt Clark wrote:
> Hmm, maybe you need to back off a bit here on your expectations. You said your test involved 400 clients simultaneously running
> queries that hit pretty much all the data in each client's DB. Why would you expect that to be anything *other* than slow?
>
> And does it reflect expected production use? Unless those 10,000 sites are all fantastically popular, surely it's more likely that
> only a small number of queries will be in progress at any given time? You're effectively simulating running 400 _very_ popular
> dynamic websites off one 2-cpu DB server.

Well, maybe these queries won't happen in production, but if many clients create large tables with no indexes the effect can be the same. Besides, I wanted to identify the bottleneck on this machine before we put it into production. PostgreSQL was working quite well, and if it weren't for the long time to connect to a database I would be quite happy. But the solution from Tom is great, I think, so I must test it.

have a nice weekend!
Marek

> You also said that CPU is pegged at 100%. Given that you've got 400 backends all competing for CPU time you must have an insane
> load average too, so improving the connect time might prove to be of no use, as you could well just get faster connects and then
> slower queries!
>
> Sorry this email wasn't more constructive ;-)
>
> M
On Friday 31 October 2003 09:59 am, Marek Florianczyk wrote:
> On Fri, 31-10-2003, at 15:23, Tom Lane wrote:
> > The recommended way to handle this is to use *one* database and create
> > 10000 users each with his own schema. That should scale a lot better.
[...]
> Yes, I have made some updates - number of processes, semaphores, and file
> descriptors. I'm aware of these limitations. On this machine there will be
> only PostgreSQL, nothing else.
> This idea with one database and 10.000 schemas is very interesting; I
> never thought about that. I will make some tests on Monday and send the
> results to the list.

Following this logic, if you are willing to place the authentication in front of the database instead of inside it, you can use a connection pool and simply change the search_path each time a new user accesses the database.

--
Mike Rylander
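A sketch of what that pooled search_path switching could look like (the schema names and the assumption that the pooling layer authenticates the customer itself are mine, not from the thread): the pool reuses one already-open connection and just repoints it at the next customer's schema.

-- run by the pooling layer on an existing connection,
-- after it has authenticated customer 'test123' itself
SET search_path TO test123, public;

-- unqualified table names now resolve inside schema test123
SELECT * FROM table1 WHERE number = 42;

-- before handing the same connection to the next customer
SET search_path TO test124, public;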
On Fri, 31-10-2003, at 16:51, Mike Rylander wrote:
> Following this logic, if you are willing to place the authentication in front
> of the database instead of inside it you can use a connection pool and simply
> change the search_path each time a new user accesses the database.

Well, it's not so simple, I think. I've got apache+php+phpAccelerator with persistent connections turned on. The server holds a pool of connections, but when a new request comes from a php script, apache looks in its connection pool for a connection with the parameters dbname, dbuser, dbserver. So for each of the 10.000 virtual domains (10.000 databases) I would have to hold such a connection.

Second thing: how do I change search_path dynamically? I can set "search_path = '$user, public'" in postgresql.conf, but that only means that when a new client (username, password) connects to the server, it gets its own schema with the proper privileges (USAGE, CREATE) - and that's all. Right? Or maybe I don't understand something?

If I put only one record in pg_hba.conf:

host any any ip_addr netmask md5

and use only one database, I must create 10.000 schemas with the proper access lists (USAGE, CREATE only for one user, with the schema name the same as the db username). This is what I want to test ;)

Now a user connects from a php script with dbusername=unique_user, dbpass=unique_pass, dbname=shared_db. The server holds a persistent connection, but it's still one connection per user, so it would have to hold 10.000 simultaneous connections.

I can't see any benefit from a connection pool, or I did not understand what you wanted to tell me. How do I place the authentication in front of the database when the clients are using php scripts?

greetings
Marek
We have a similar issue regarding security. Some of the access to our database will be by ODBC connections for reporting purposes (i.e. Actuate Report/Crystal Reports). Without creating a zillion or so views (which I suspect carries a lot of overhead with it), I believe it would be tricky to maintain security.

Our application is medical related, and we are bound by HIPAA rules, so security is most important. How would you architect this scenario so our ASP customers cannot see each other's data?

Naomi

> > I was more thinking that it might be possible to manage the security at
> > a different level than the DB.
>
> We do this with users and permissions.
>
> Each virtual host has an apache config include specifying a db user,
> pass (and database, although most of them use the same one).
> Permissions on the database tables are set so that a given vhost can
> only access their own data.
>
> Our setup is mod_perl. Don't know how one would go about doing this with
> PHP, but I imagine it has some mechanism for per-vhost variables or
> similar.
>
> -j

---------------------------------------------------------------
Naomi Walker                     Chief Information Officer
Eldorado Computing, Inc.         nwalker@eldocomp.com
602-604-3100
---------------------------------------------------------------
Insanity is doing things in the same way and expecting different results.
---------------------------------------------------------------
On 31 Oct 2003, Marek Florianczyk wrote:

> We are building hosting with apache + php ( our own mod_virtual module )
> with about 10.000 virtual domains + PostgreSQL.
> PostgreSQL is on a different machine ( 2 x intel xeon 2.4GHz 1GB RAM
> scsi raid 1+0 )

Tom's right, you need more memory, period, and probably want a very large RAID 1+0 (with like 10 or more disks).

> Has any one idea how to tune postgres, to accept connection faster?

Postgresql will take the amount of time it needs. Connections, especially in a contentious environment, aren't cheap.

> Maybe some others settings to speed up server ?
> My settings:
> PostgreSQL:
> max_connections = 512
> shared_buffers = 8192
> max_fsm_relations = 10000
> max_fsm_pages = 100000
> max_locks_per_transaction = 512
> wal_buffers = 32
> sort_mem = 327681
-------------^^^^^^-- THIS IS WAY TOO HIGH. That's ~320Meg! PER SORT.

Drop this down to something reasonable like 8192 or so (i.e. 8 meg). If there were lots of big sorts going on by all 300 users, that's 300*320 Meg of memory that could get used up - i.e. a swap storm.

Have you adjusted random_page_cost to reflect your I/O setup? While the default of 4 is a good number for a single-drive server, it's kinda high for a machine with 4 or more drives in an array. Figures from 1.2 to 2.0 seem common. My database under 7.2.4 runs best with random_page_cost at about 1.4.
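For what it's worth, sort_mem and random_page_cost (unlike shared_buffers) can be changed per session without a restart, so values like the ones suggested above can be tried out interactively before editing postgresql.conf (the exact numbers here are just the figures mentioned in the post, not tested recommendations):

-- inspect the current values
SHOW sort_mem;
SHOW random_page_cost;

-- try ~8 MB per sort for this session only
SET sort_mem = 8192;

-- and a lower random_page_cost for a multi-disk array
SET random_page_cost = 1.5;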
On Friday 31 October 2003 11:19 am, Marek Florianczyk wrote:
> Well, it's not so simple, I think. I've got apache+php+phpAccelerator
> with persistent connections turned on. The server holds a pool of
> connections, but when a new request comes from a php script, apache looks
> in its connection pool for a connection with the parameters dbname, dbuser,
> dbserver. So for each of the 10.000 virtual domains (10.000 databases)
> I would have to hold such a connection.
> Second thing: how do I change search_path dynamically? I can set
> "search_path = '$user, public'" in postgresql.conf, but that only means
> that when a new client (username, password) connects to the server, it gets
> its own schema with the proper privileges (USAGE, CREATE) - and that's all.
> Right?

search_path documentation is here:
http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=ddl-schemas.html

> Or maybe I don't understand something? If I put only one record in pg_hba.conf:
> host any any ip_addr netmask md5
> and use only one database, I must create 10.000 schemas with the proper
> access lists (USAGE, CREATE only for one user, with the schema name the same
> as the db username). This is what I want to test ;)
>
> Now a user connects from a php script with dbusername=unique_user,
> dbpass=unique_pass, dbname=shared_db. The server holds a persistent
> connection, but it's still one connection per user, so it would have to
> hold 10.000 simultaneous connections.
>
> I can't see any benefit from a connection pool, or I did not understand
> what you wanted to tell me. How do I place the authentication in front of
> the database when the clients are using php scripts?

I suppose I didn't really explain what I was thinking. The scenario I was thinking of would go something like this:

User logins (ssh, etc...), if available, would be PAM based. The user/customer creation process would create a new schema in the single database with the username used for web/shell/ftp/etc logins. PostgreSQL can also use PAM for logins, and this would allow logins to the database from outside your web app.

The web app would always connect to the database as a user with access to all schemas, but would look at the session authentication information to change the active search_path to be [username],public. In the case of shell (psql) logins, the default search_path would be $user,public as the docs show. If the schemas are created with an AUTHORIZATION of the [username], then local (psql) logins would only allow them to see their own schema. But, because the web app is connecting as a user with privileges that allow it to see (the tables in) all schemas, it can act as any user by changing its search_path on a connection by connection basis.

--
Mike Rylander
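A small sketch of that arrangement (the role, schema, and password names are invented; the grants shown are just one plausible way to give the web tier the access Mike describes):

-- per-customer schema, owned by the customer, so psql logins see only their own objects
CREATE USER customer1 WITH PASSWORD 'secret';
CREATE SCHEMA customer1 AUTHORIZATION customer1;

-- one privileged role used by the web tier
CREATE USER webapp WITH PASSWORD 'app-secret';
GRANT USAGE ON SCHEMA customer1 TO webapp;
-- (table-level grants to webapp are still needed as tables are created)

-- the web app, after authenticating customer1 itself:
SET search_path TO customer1, public;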
On Fri, 31 Oct 2003, Naomi Walker wrote:

> We have a similar issue regarding security. Some of the access to our
> database will be by ODBC connections for reporting purposes (i.e. Actuate
> Report/Crystal Reports). Without creating a zillion or so views (which I
> suspect carries a lot of overhead with it), I believe it would be tricky to
> maintain security.
>
> Our application is medical related, and we are bound by HIPAA rules, so
> security is most important. How would you architect this scenario so our
> ASP customers cannot see each other's data?
>
> Naomi

Can you not assign each role/user a different username? If not, I suspect views are your best bet, although I'm sure others can come up with other approaches.

The overhead for a view is, as I understand it, no more than the overhead for the underlying query. So, for instance, if you're omitting columns or using a WHERE clause to pick out data for a particular user, you're probably not doing anything that the front-end application wouldn't do anyway. Setting them up is a different story, but depending on HIPAA rules, writing a generator for that might not be too bad.

ODBC doesn't have a layer of application sitting between the user and the database. I've seen proxies for ODBC which you may be able to customize - check freshmeat.net. I've never had to deal with HIPAA rules, so I'm not sure what requirements you have to merge with the requirements for using, say, Crystal Reports, so I'm not sure if this would work or not (and it would take some coding in either case).

HTH,

-j

--
Jamie Lawrence                                        jal@jal.org
It it ain't broke, let me have a shot at it.
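A sketch of the per-customer view approach Jamie mentions (the table, column, and role names are made up; the thread doesn't describe the real reporting schema):

-- shared table holding every customer's rows, tagged by customer id
-- CREATE TABLE claims (customer_id integer, patient text, amount numeric);

-- restricted view for one reporting customer
CREATE VIEW customer42_claims AS
    SELECT patient, amount
    FROM claims
    WHERE customer_id = 42;

-- the ODBC/reporting login sees only its own view, never the base table
REVOKE ALL ON claims FROM PUBLIC;
GRANT SELECT ON customer42_claims TO customer42_reporting;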
Hi all

I was asking for help a week ago. The performance tests took me some more time because of other things I had to do. The problem was: tune PostgreSQL to work with 10.000 databases. Tom Lane (thanks) suggested a solution: one database and 10.000 schemas. From now on I will write "database" - that means one database, but a different schema.

I don't know how much traffic there will be on the production site, so I tested 3.000 schemas and 400 clients connected simultaneously. Each "database" has 4 tables (int,text,int) with 1.000 records and no indexes.

Each client was doing:

10 x connect, "select * from table[rand(1-4)] where number=[rand(1-1000)]", disconnect  -- (fetch one row)
5 x connect, "select * from table[rand(1-4)] where position=[rand(1-5)]", "update table[rand(1-4)] set text='some text', position='integer[rand(1-5)]' where number=[rand(1-1000)]", disconnect  -- (fetch 250 rows / update one row)
1 x connect, "update table[rand(1-4)] set text='some text', position='[rand(1-5)]'", disconnect  -- (update 250 rows)
1 x connect, "select * from table1 where position in (select position from table2 where number in (select number from table3))", disconnect

After that the client switches to another "database" and starts testing from the beginning.

During this test I was changing some parameters in postgres and sending kill -HUP (pg_ctl reload). I still don't know what settings will be best for me, except "shared_buffers" and some kernel and shell settings. I noticed that queries like "\d table1", "\di", "\dp" are extremely slow when 400 clients are connected - not even the postgres user can run such a query. Why, and how can I improve that? Will it be a problem?

Below there are some results of this test and the postgresql.conf settings. I didn't change random_page_cost because on this test machine I've got only one scsi disk for the /data directory. Postgres uses 90% of the 4 cpus and takes 2GB RAM, but the load average doesn't jump over 10-20, so it works better with lots of schemas than with lots of db's.

Maybe some suggestions on my postgresql.conf settings? And why are the "\d" queries so extremely slow?

Thank You
Marek

[PostgreSQL]
max_connections = 512
shared_buffers = 65536
max_fsm_relations = 10000
max_fsm_pages = 100000
max_locks_per_transaction = 512
wal_buffers = 32
sort_mem = 16384
vacuum_mem = 8192
effective_cache_size = 1000000
random_page_cost = 4

[kernel]
kernel/shmmni = 8192
kernel/shmall = 134217728
kernel/shmmax = 536870912

[test] times in sec.
(dbname) (conn. time) (q = queries) (1 row)(250 rows)(triple join)(update 250 rows)(update 1000 rows)
test2374: connect:1 q_fast:4 q_med:0 q_slow:46 q_upd:0 q_upd_all:33
test2347: connect:1 q_fast:4 q_med:1 q_slow:48 q_upd:1 q_upd_all:32
test2351: connect:0 q_fast:4 q_med:2 q_slow:49 q_upd:0 q_upd_all:31
test2373: connect:0 q_fast:5 q_med:0 q_slow:46 q_upd:0 q_upd_all:25

[PostgreSQL]
max_connections = 512
shared_buffers = 4096
max_fsm_relations = 1000
max_fsm_pages = 10000
max_locks_per_transaction = 512
wal_buffers = 32
sort_mem = 16384
vacuum_mem = 8192
effective_cache_size = 1000000
random_page_cost = 4

[test]
test2430: connect:0 q_fast:2 q_med:1 q_slow:40 q_upd:0 q_upd_all:17
test2425: connect:0 q_fast:2 q_med:0 q_slow:45 q_upd:0 q_upd_all:20
test2434: connect:0 q_fast:2 q_med:0 q_slow:44 q_upd:0 q_upd_all:23
test2435: connect:1 q_fast:2 q_med:0 q_slow:50 q_upd:0 q_upd_all:18

[PostgreSQL]
max_fsm_relations = 2000
max_fsm_pages = 20000

[test]
test2171: connect:0 q_fast:3 q_med:1 q_slow:42 q_upd:1 q_upd_all:20
test2177: connect:1 q_fast:3 q_med:0 q_slow:43 q_upd:0 q_upd_all:21
test2166: connect:1 q_fast:3 q_med:1 q_slow:39 q_upd:0 q_upd_all:20
test2165: connect:1 q_fast:3 q_med:1 q_slow:42 q_upd:0 q_upd_all:24
test2162: connect:1 q_fast:3 q_med:1 q_slow:39 q_upd:1 q_upd_all:23
On 05 Nov 2003 14:33:33 +0100
Marek Florianczyk <franki@tpi.pl> wrote:

> During this test I was changing some parameters in postgres and sending
> kill -HUP (pg_ctl reload). I still don't know what settings will be
> best for me, except "shared_buffers" and some kernel and shell
> settings.

As far as I know, -HUP won't make things like shared buffer changes take effect - you need a full restart of PG... but your numbers are different... I guess it did take. Huh.

How much disk IO is going on during these tests? (vmstat 1) Any swapping (also shown in vmstat)?

Were any of these tables analyzed? I see you used no indexes, so on each of your tables it must do a seq scan. Try adding an index to your test tables and rerun..

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
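Adding the index Jeff suggests is a one-liner per table; the column name below is taken from the test queries quoted earlier in the thread (the index name is just an example):

-- index the lookup column used by the one-row selects
CREATE INDEX table1_number_idx ON table1 (number);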
On Wed, 05-11-2003, at 14:48, Jeff wrote:
> as far as I know, -HUP won't make things like shared buffer changes
> take. you need a full restart of PG.
> ..
> but your numbers are different... I guess it did take. huh.

Well, I'm not sure, but I only did pg_ctl reload.

> how much disk IO is going on during these tests? (vmstat 1)
> Any swapping (also shown in vmstat)

I was watching iostat 1, and it shows about 600 tps, so it's not much, and when we put RAID 1+0 on the production machine the disks will be fine.

> Were any of these tables analyzed?
> I see you used no indexes, so on each of your tables it must do a seq
> scan. Try adding an index to your test tables and rerun..

No, they weren't analyzed, and I deliberately did not create indexes. I'm testing postgres as the sql engine for a hosting environment; these databases will be used by users=lamers, so many of them will not create any indexes. I wanted to make a test really close to reality and see how many databases I can take on a single machine.

One database with 3.000 schemas works better than 3.000 databases, but there is a REAL, BIG problem, and I won't be able to use this solution: every query like "\d table" or "\di" takes a veeeeeeery long time. Users have to have phpPgAdmin, which I modified to suit our needs, but now it doesn't work - not even log-in. If I rewrite phpPgAdmin to log users in without checking all schemas and all tables within schemas, none of the users will be able to examine the structure of a table. A query like "\d table" from the psql monitor takes about 2-5 MINUTES :(

I see that the only option is to create one database for every user, and monitor traffic and machine load to see when we need another PC and another PostgreSQL...

Marek
On Wed, Nov 05, 2003 at 16:14:59 +0100,
  Marek Florianczyk <franki@tpi.pl> wrote:

> One database with 3.000 schemas works better than 3.000 databases, but
> there is a REAL, BIG problem, and I won't be able to use this solution:
> every query like "\d table" or "\di" takes a veeeeeeery long time.
> Users have to have phpPgAdmin, which I modified to suit our needs, but now
> it doesn't work - not even log-in. If I rewrite phpPgAdmin to log users in
> without checking all schemas and all tables within schemas, none of the
> users will be able to examine the structure of a table. A query like
> "\d table" from the psql monitor takes about 2-5 MINUTES :(

Analyzing the system tables will likely make these queries go faster.
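A minimal example of what "analyzing the system tables" can look like from psql, run as the postgres superuser (the specific catalogs listed are just the ones the \d-style queries lean on most; a plain database-wide VACUUM ANALYZE covers them too):

-- update planner statistics for the catalogs behind \d, \di, \dp
ANALYZE pg_catalog.pg_class;
ANALYZE pg_catalog.pg_attribute;
ANALYZE pg_catalog.pg_namespace;

-- or simply, for the whole database:
VACUUM ANALYZE;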
On Wed, Nov 05, 2003 at 08:48:52AM -0500, Jeff wrote:
>
> as far as I know, -HUP won't make things like shared buffer changes
> take. you need a full restart of PG.

It definitely will not. Anything that can only be set on "startup" actually means startup.

> but your numbers are different... I guess it did take. huh.

No, that probably just means that there is more variability to runs than people like to imagine.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
On Wed, 05-11-2003, at 17:24, Andrew Sullivan wrote:
> On Wed, Nov 05, 2003 at 08:48:52AM -0500, Jeff wrote:
> >
> > as far as I know, -HUP won't make things like shared buffer changes
> > take. you need a full restart of PG.
>
> It definitely will not. Anything that can only be set on "startup"
> actually means startup.
>
> > but your numbers are different... I guess it did take. huh.
>
> No, that probably just means that there is more variability to runs
> than people like to imagine.

It looks like I will have to run these tests again. That's no big deal - just as soon as I solve the slow "\d table" problem. VACUUM ANALYZE in progress...

thanks,
Marek
Marek Florianczyk <franki@tpi.pl> writes:
> Each client was doing:

> 10 x connect,"select * from table[rand(1-4)] where
> number=[rand(1-1000)]",disconnect--(fetch one row)

Seems like this is testing the cost of connect and disconnect to the exclusion of nearly all else. PG is not designed to process just one query per connection --- backend startup is too expensive for that. Consider using a connection-pooling module if your application wants short-lived connections.

> I noticed that queries like: "\d table1" "\di" "\dp" are extremely slow,

I thought maybe you'd uncovered a performance issue with lots of schemas, but I can't reproduce it here. I made 10000 schemas each containing a table "mytab", which is about the worst case for an unqualified "\d mytab", but it doesn't seem excessively slow --- maybe about a quarter second to return the one mytab that's actually in my search path. In realistic conditions where the users aren't all using the exact same table names, I don't think there's an issue.

			regards, tom lane
On Wed, 05-11-2003, at 17:18, Bruno Wolff III wrote:
> Analyzing the system tables will likely make these queries go faster.

I've done:

VACUUM FULL;
ANALYZE;

and it works better, but it's no revelation. When I do "\d schemaname.table" it's better, but I still have to wait about 10-30 sec, and now only 100 clients are connected. :(

Marek
On 05 Nov 2003 19:01:38 +0100
Marek Florianczyk <franki@tpi.pl> wrote:

> and it works better, but it's no revelation. When I do "\d
> schemaname.table" it's better, but I still have to wait about 10-30 sec,
> and now only 100 clients are connected. :(

So it only goes slow with hundred(s) of clients connecting and disconnecting? I'm guessing the CPU is pretty pegged on the box.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
On Wed, 05-11-2003, at 18:59, Tom Lane wrote:
> Seems like this is testing the cost of connect and disconnect to the
> exclusion of nearly all else. PG is not designed to process just one
> query per connection --- backend startup is too expensive for that.
> Consider using a connection-pooling module if your application wants
> short-lived connections.

You're right - a typical php page will probably run more queries "per view". But how good is a connection-pooling module when the connection from each virtual site is unique? Different user and password, and different schemas and permissions, so this connection-pooling module would have to switch between users without reconnecting to the database? Impossible?

> > I noticed that queries like: "\d table1" "\di" "\dp" are extremely slow,
>
> I thought maybe you'd uncovered a performance issue with lots of
> schemas, but I can't reproduce it here. I made 10000 schemas each
> containing a table "mytab", which is about the worst case for an
> unqualified "\d mytab", but it doesn't seem excessively slow --- maybe
> about a quarter second to return the one mytab that's actually in my
> search path. In realistic conditions where the users aren't all using
> the exact same table names, I don't think there's an issue.

But did you do that under some database load? E.g. 100 clients connected, like in my example? When I run these "\d" queries without any clients connected, and after ANALYZE, they are fast, but 100 connected clients are enough to stretch the query time to 30 sec. :(

I've got 3000 schemas named test[1-3000] and 3000 users named test[1-3000]; in each schema there are four tables (table1 table2 table3 table4), each table has 3 columns (int,text,int) and some of them also have indexes. If you want, I will send the perl script that forks into 100 processes and performs my queries.

greetings
Marek
On Wed, 05-11-2003, at 19:23, Jeff wrote:
> So it only goes slow with hundred(s) of clients connecting and
> disconnecting?
>
> I'm guessing the CPU is pretty pegged on the box.

That's not exactly it: my test script (a few posts earlier) is doing queries. Look at the results below - the queries are very fast, only the ones with "\d" are slow:

test2364: connect:0 q_fast:0 q_med:0 q_slow:15 q_upd:0 q_upd_all:0
test2346: connect:0 q_fast:0 q_med:0 q_slow:17 q_upd:0 q_upd_all:7
test2347: connect:0 q_fast:0 q_med:0 q_slow:15 q_upd:0 q_upd_all:9
test2350: connect:0 q_fast:0 q_med:0 q_slow:16 q_upd:0 q_upd_all:8
test2349: connect:0 q_fast:1 q_med:0 q_slow:15 q_upd:0 q_upd_all:8
test2343: connect:0 q_fast:1 q_med:0 q_slow:17 q_upd:0 q_upd_all:7
test2351: connect:0 q_fast:0 q_med:0 q_slow:17 q_upd:0 q_upd_all:9

Output from the command top:

 02:20:00 up 1 day,  6:19,  2 users,  load average: 6.62, 6.67, 6.60
130 processes: 123 sleeping, 7 running, 0 zombie, 0 stopped
CPU states:  82.4% user, 16.2% system, 0.0% nice, 1.4% idle
Mem:  2070084K total, 2052932K used,   17152K free,   78048K buffers
Swap:  995988K total,    6272K used,  989716K free, 1771140K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
18775 postgres  18   0  5532 5440  4688 S     8.7  0.2   0:01 postmaster
19001 postgres  15   0  5484 5392  4664 S     8.3  0.2   0:00 postmaster
19085 postgres  13   0  5492 5400  4680 S     8.3  0.2   0:00 postmaster
18831 postgres  16   0  5532 5440  4680 S     7.9  0.2   0:01 postmaster
19114 postgres  19   0  5436 5344  4624 S     7.9  0.2   0:00 postmaster
Marek Florianczyk <franki@tpi.pl> writes:
> But did you do that under some database load? E.g. 100 clients
> connected, like in my example? When I run these "\d" queries without any
> clients connected, and after ANALYZE, they are fast, but 100 connected
> clients are enough to stretch the query time to 30 sec. :(

Then it's not \d's fault --- you simply don't have enough horsepower to support 100 concurrent clients, regardless of what specific query you're testing.

You might find that not reconnecting so often would improve matters; I'm sure that a lot of your cycles are being taken by backend startup.

			regards, tom lane
On Wed, 05-11-2003, at 19:34, Tom Lane wrote:
> Then it's not \d's fault --- you simply don't have enough horsepower to
> support 100 concurrent clients, regardless of what specific query you're
> testing.
>
> You might find that not reconnecting so often would improve matters;
> I'm sure that a lot of your cycles are being taken by backend startup.

Maybe I reconnect too often, but how do you explain that regular queries like "select * from table1" are much faster than the \d's? (see my post to Jeff)

Marek
Marek Florianczyk <franki@tpi.pl> writes:
> Maybe I reconnect too often, but how do you explain that regular queries
> like "select * from table1" are much faster than the \d's? (see my post to Jeff)

[ further experimentation... ] Ah-hah, I see the problem in 7.3, though not in 7.4 which is what I was testing to begin with. 7.4 is smarter about optimizing the LEFT JOINs that are used in \d's queries.

			regards, tom lane
On Wed, 05-11-2003, at 19:52, Tom Lane wrote:
> [ further experimentation... ] Ah-hah, I see the problem in 7.3, though
> not in 7.4 which is what I was testing to begin with. 7.4 is smarter
> about optimizing the LEFT JOINs that are used in \d's queries.

So what do you think - should I try v7.4 on the production machine and wait for the stable version, or better use v7.3 with 3000 databases? I have to do it by December. Will this optimization solve the problem with the "\d" queries, or just speed them up by a few seconds?

greetings
Marek
I'm trying to do a query:

select ta.x from ta join tb using (y) where z = 'somevalue' FOR UPDATE

Why can't this be executed without error in 7.3.x? It worked just fine in 7.2.x. Thanks for the input
Kris Kiger <kris@musicrebellion.com> writes:
> I'm trying to do a query: select ta.x from ta join tb using (y) where z
> = 'somevalue' FOR UPDATE
> Why can't this be executed without error in 7.3.x? It worked just fine
> in 7.2.x. Thanks for the input

Try saying "FOR UPDATE OF ta, tb". I agree there's no real reason the system should be that pedantic, however. Will see about fixing this for 7.4.

			regards, tom lane
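Spelled out against Kris's query (ta, tb, x, y, z are just the placeholder names from his post), the form Tom suggests for 7.3 is:

SELECT ta.x
FROM ta JOIN tb USING (y)
WHERE z = 'somevalue'
FOR UPDATE OF ta, tb;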
On 5 Nov 2003, Marek Florianczyk wrote:

> On Wed, 05-11-2003, at 19:52, Tom Lane wrote:
> > [ further experimentation... ] Ah-hah, I see the problem in 7.3, though
> > not in 7.4 which is what I was testing to begin with. 7.4 is smarter
> > about optimizing the LEFT JOINs that are used in \d's queries.
>
> So what do you think - should I try v7.4 on the production machine and wait
> for the stable version, or better use v7.3 with 3000 databases?
> I have to do it by December.
> Will this optimization solve the problem with the "\d" queries, or just
> speed them up by a few seconds?

7.4 just went RC1 and is looking good for release in <2 weeks. I'd test it to see if it works for you. I've found the .0 releases in postgresql to be good enough for production, even if they do have one or two minor bugs that pop up. Take frequent backups (you should anyway) and you'll probably be alright.

Heck, you're already pushing the performance envelope with 3,000 users, might as well go for the faster of the two and you'll have one less scheduled upgrade ahead of you.

When do you need to go live? If it's >1 month, then I'd definitely recommend 7.4.
> Heck, you're already pushing the performance envelope with 3,000 users, so you
> might as well go for the faster of the two, and you'll have one less scheduled
> upgrade ahead of you.
>
> When do you need to go live? If it's >1 month away, then I'd definitely
> recommend 7.4.

heh... ;) PostgreSQL 7.4 is so fu*#$%^ fast!!! Unbelievable...

I've run the test as usual (see an earlier post): 3,000 schemas and 300 clients
connected simultaneously. I've tuned postgresql.conf with my friend, who is our
SQL guru (though he usually runs Oracle). 7.4 is so fast that sometimes the
client (a laptop with a full X11 workstation and a Celeron 700) could not keep
up forking the perl script that tests the next "database" ;)

my conf:
-----------------------
max_connections = 512
shared_buffers = 32768
sort_mem = 2048
vacuum_mem = 20480
max_fsm_pages = 589824
max_fsm_relations = 32768
fsync = false
wal_sync_method = fsync
wal_buffers = 1024
checkpoint_segments = 4
checkpoint_timeout = 1800
checkpoint_warning = 30
commit_delay = 10000
commit_siblings = 2
effective_cache_size = 131072
random_page_cost = 4
log_connections = true
log_duration = true
log_pid = true
log_statement = true
log_timestamp = true
search_path = '$user'
max_locks_per_transaction = 512
--------------------------------

Results from this test (4 tables of (int,text,int), 1000 rows each, no indexes).
Times are in seconds; the columns are: database name, connect time, single-row
select (q_fast), 250-row select (q_med), triple-join select (q_slow), 250-row
update (q_upd), 1000-row update (q_upd_all):
------------------------------------------------------------------
test2291: connect:1 q_fast:1 q_med:0 q_slow:4 q_upd:0 q_upd_all:9
test2260: connect:0 q_fast:1 q_med:0 q_slow:4 q_upd:0 q_upd_all:10
test2274: connect:0 q_fast:1 q_med:0 q_slow:4 q_upd:0 q_upd_all:8
test2296: connect:0 q_fast:1 q_med:0 q_slow:6 q_upd:0 q_upd_all:6
test2283: connect:0 q_fast:1 q_med:0 q_slow:4 q_upd:0 q_upd_all:8
test2302: connect:0 q_fast:1 q_med:0 q_slow:4 q_upd:0 q_upd_all:8
test2290: connect:0 q_fast:1 q_med:0 q_slow:3 q_upd:0 q_upd_all:8
test2287: connect:0 q_fast:1 q_med:0 q_slow:6 q_upd:0 q_upd_all:6
test2267: connect:0 q_fast:1 q_med:0 q_slow:1 q_upd:0 q_upd_all:11
-----------------------------------------------------------------

The "\d" queries work just fine under this load!

Now I just have to modify phpPgAdmin (it's what users will get to manage their
own "database"). I don't know why, when I select a database, it tries to fetch
all the table names from all the schemas. From the log:
---------------------------------------------------------------------
2003-11-06 22:53:06 [8880] LOG: statement: SET SEARCH_PATH TO "test998"
2003-11-06 22:53:06 [8880] LOG: duration: 1.207 ms
2003-11-06 22:53:06 [8880] LOG: statement: SELECT tablename, tableowner FROM pg_catalog.pg_tables WHERE schemaname='test998' ORDER BY tablename
2003-11-06 22:53:06 [8880] LOG: duration: 31.005 ms
2003-11-06 22:53:06 [8880] LOG: statement: SET SEARCH_PATH TO "test999"
2003-11-06 22:53:06 [8880] LOG: duration: 1.202 ms
2003-11-06 22:53:06 [8880] LOG: statement: SELECT tablename, tableowner FROM pg_catalog.pg_tables WHERE schemaname='test999' ORDER BY tablename
2003-11-06 22:53:06 [8880] LOG: duration: 30.604 ms
----------------------------------------------------------------

I should go live with this hosting at the end of the month, but at the
beginning we shouldn't have many customers, so we've decided to try v7.4 in
beta now and wait for the official release.

... And my management says that there is no good support for Open Source,
heh... ;)))

thanks all
Marek
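If the table list for every schema really is needed, a single round trip can
replace the per-schema loop visible in the log above. This is only a sketch of
the idea, not phpPgAdmin's actual code, and the 'test%' pattern is an
assumption matching the schema names used in this test:

SELECT schemaname, tablename, tableowner
FROM pg_catalog.pg_tables
WHERE schemaname LIKE 'test%'   -- assumption: schemas are named test1 ... test3000
ORDER BY schemaname, tablename;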
On 06 Nov 2003 15:21:03 +0100
Marek Florianczyk <franki@tpi.pl> wrote:

> fsync = false

HOLD THE BOAT THERE BATMAN!

I would *STRONGLY* advise not running with fsync=false in production, as
PG _CANNOT_ guarantee data consistency in the event of a hardware
failure. It would sure suck to have a power failure screw up your nice
db for the users!

> wal_buffers = 1024

This also seems high. Come to think of it, shared_buffers is also high.

> commit_delay = 10000

This could also lead to data loss, but you'll get a speed increase on
inserts.

One of the best things you can do to increase insert speed is a nice,
battery-backed RAID card with a pile of disks hanging off of it.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
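For comparison, a more conservative take on the settings Jeff is questioning
might look like the sketch below. The exact numbers are illustrative
assumptions (roughly the 7.x defaults), not values taken from the thread; the
point is simply that fsync stays on and the delay/buffer settings drop back:

fsync = true            # keep WAL fsync on so a crash cannot lose committed data
wal_buffers = 8         # assumption: near the default, instead of 1024
commit_delay = 0        # assumption: no delayed commit, trading insert speed for safety
commit_siblings = 5     # assumption: default; only matters when commit_delay > 0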
In the message of Thu, 06-11-2003, at 15:37, Jeff wrote:
> On 06 Nov 2003 15:21:03 +0100
> Marek Florianczyk <franki@tpi.pl> wrote:
>
> > fsync = false
>
> HOLD THE BOAT THERE BATMAN!
>
> I would *STRONGLY* advise not running with fsync=false in production, as
> PG _CANNOT_ guarantee data consistency in the event of a hardware
> failure. It would sure suck to have a power failure screw up your nice
> db for the users!

Sure, I know, but with WAL it will still fsync every few seconds or so, right?
Maybe the users' data isn't that critical ;) it's not for a bank, only for
www sites. I will also try with fsync=true.

> > wal_buffers = 1024
>
> This also seems high. Come to think of it, shared_buffers is also high.
>
> > commit_delay = 10000
>
> This could also lead to data loss, but you'll get a speed increase on
> inserts.
>
> One of the best things you can do to increase insert speed is a nice,
> battery-backed RAID card with a pile of disks hanging off of it.

We will put 4 disks under the /data directory (RAID 1+0), so it will have both
performance and fault tolerance; that should be OK.

greetings
Marek
Marek Florianczyk <franki@tpi.pl> writes:
> In the message of Thu, 06-11-2003, at 15:37, Jeff wrote:
>> I would *STRONGLY* advise not running with fsync=false in production, as
>> PG _CANNOT_ guarantee data consistency in the event of a hardware
>> failure. It would sure suck to have a power failure screw up your nice
>> db for the users!

> Sure, I know, but with WAL it will still fsync every few seconds or so, right?

No. fsync = false turns off fsync of WAL. It's okay for development
but not when you actually care about the integrity of your data.

			regards, tom lane
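If it helps to double-check what a running server is actually doing, the
current values can be read from any psql session with the standard SHOW
command; nothing here is specific to this thread:

SHOW fsync;             -- reports on/off for the running server
SHOW wal_sync_method;   -- which sync call WAL uses when fsync is on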
On Thu, 6 Nov 2003, Jeff wrote:

> On 06 Nov 2003 15:21:03 +0100
> Marek Florianczyk <franki@tpi.pl> wrote:
>
> > fsync = false
>
> HOLD THE BOAT THERE BATMAN!
>
> I would *STRONGLY* advise not running with fsync=false in production, as
> PG _CANNOT_ guarantee data consistency in the event of a hardware
> failure. It would sure suck to have a power failure screw up your nice
> db for the users!

Note that if you're on an IDE drive and you haven't disabled the write
cache, you may as well turn off fsync as well, as it's just getting in the
way and doing nothing, i.e. the IDE drives are already lying about fsync,
so why bother.

Step the first: get onto SCSI or a good IDE RAID controller. Step the
second: turn fsync back on. Without reliable storage, fsync is a dunsel.
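As a rough sketch of what "disabling the write cache" means in practice on a
Linux box of that era, hdparm can turn off an IDE drive's on-board write
cache. The device name /dev/hda is an assumption for the example; check the
actual device (and re-enable with -W1) before running anything like this:

# disable the on-drive write cache (assumed device /dev/hda) so that fsync
# actually pushes data to the platters instead of the drive's volatile cache
hdparm -W0 /dev/hda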
Marek Florianczyk wrote, on Thursday, 11/06/03:
>
> ... And my management says that there is no good support for Open Source,
> heh... ;)))

In my experience, there is better support for Open Source than for
Closed Source when it comes to development (and usually all around).

David Green
Sage Automation, Inc.
On 6 Nov 2003, Marek Florianczyk wrote:

> ... And my management says that there is no good support for Open Source,
> heh... ;)))

That's because your "support" needs are different. A developer wants
answers and solutions; a manager often wants someone to blame. :-)
David Green wrote:

> Marek Florianczyk wrote, on Thursday, 11/06/03:
>
> > ... And my management says that there is no good support for Open Source,
> > heh... ;)))
>
> In my experience, there is better support for Open Source than for
> Closed Source when it comes to development (and usually all around).

There are two different kinds of support. One is support for the code and for
fixes; this is probably better in OSS. But there is also support for a
product, and here there is still a big gap: many Oracle, DB2, MSSQL, etc.
consultants are available, but only a few OSS consultants/companies are able
to help in case of problems.

A survey was done here in CH among top managers, and this is the answer they
gave: they do not know where to get support. E.g. there is no company called
PostgreSQL (Switzerland) Ltd.

-------------------------------------------------------
Oli Sennhauser
Database-Engineer (Oracle & PostgreSQL)
Rebenweg 6
CH - 8610 Uster / Switzerland

Phone (+41) 1 940 24 82 or Mobile (+41) 79 450 49 14
e-Mail oli.sennhauser@bluewin.ch
Website http://mypage.bluewin.ch/shinguz/PostgreSQL/

Secure (signed/encrypted) e-Mail with a Free Personal SwissSign ID: http://www.swisssign.ch
Import the SwissSign Root Certificate: http://swisssign.net/cgi-bin/trust/import
threshar@torgo.978.org (Jeff) writes:
> On 06 Nov 2003 15:21:03 +0100
> Marek Florianczyk <franki@tpi.pl> wrote:
>
>> fsync = false
>
> HOLD THE BOAT THERE BATMAN!
>
> I would *STRONGLY* advise not running with fsync=false in production, as
> PG _CANNOT_ guarantee data consistency in the event of a hardware
> failure. It would sure suck to have a power failure screw up your nice
> db for the users!

On one of our test servers, I set "fsync=false", and a test load's
load time dropped from about 90 minutes to 3 minutes. (It was REALLY
update heavy, with huge numbers of tiny transactions.)

Which is, yes, quite spectacularly faster. But also quite
spectacularly unsafe. I'm willing to live with the risk on a test box
whose purpose is _testing_; it's certainly not a good thing to do in
production.

--
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)
scott.marlowe wrote:
> Note that if you're on an IDE drive and you haven't disabled the write
> cache, you may as well turn off fsync as well, as it's just getting in the
> way and doing nothing, i.e. the IDE drives are already lying about fsync,
> so why bother.

What about Serial ATA?
Hi,

Christopher Browne wrote, on 11/6/2003 4:40 PM:
> threshar@torgo.978.org (Jeff) writes:
>> On 06 Nov 2003 15:21:03 +0100
>> Marek Florianczyk <franki@tpi.pl> wrote:
>>
>>> fsync = false
>>
>> HOLD THE BOAT THERE BATMAN!
>>
>> I would *STRONGLY* advise not running with fsync=false in production, as
>> PG _CANNOT_ guarantee data consistency in the event of a hardware
>> failure. It would sure suck to have a power failure screw up your nice
>> db for the users!
>
> On one of our test servers, I set "fsync=false", and a test load's
> load time dropped from about 90 minutes to 3 minutes. (It was REALLY
> update heavy, with huge numbers of tiny transactions.)
>
> Which is, yes, quite spectacularly faster. But also quite
> spectacularly unsafe. I'm willing to live with the risk on a test box
> whose purpose is _testing_; it's certainly not a good thing to do in
> production.

There is something like:

set fsync to off;
or
set fsync to on;

But it says: 'fsync' cannot be changed now. It could be very useful to set
this option from SQL, not just from the config file. Tom Lane probably knows
why :)

C.
* Christopher Browne (cbbrowne@libertyrms.info) wrote:
> On one of our test servers, I set "fsync=false", and a test load's
> load time dropped from about 90 minutes to 3 minutes. (It was REALLY
> update heavy, with huge numbers of tiny transactions.)
>
> Which is, yes, quite spectacularly faster. But also quite
> spectacularly unsafe. I'm willing to live with the risk on a test box
> whose purpose is _testing_; it's certainly not a good thing to do in
> production.

Would it be possible to get the same effect for a specific transaction?
If this was done as a single large transaction, could there be an option to
say "don't fsync this until it's all done and then do it all", or something?
Just looking for a way to get the 'best of both worlds'...

	Stephen
Quoth sfrost@snowman.net (Stephen Frost):
> * Christopher Browne (cbbrowne@libertyrms.info) wrote:
>> On one of our test servers, I set "fsync=false", and a test load's
>> load time dropped from about 90 minutes to 3 minutes. (It was REALLY
>> update heavy, with huge numbers of tiny transactions.)
>>
>> Which is, yes, quite spectacularly faster. But also quite
>> spectacularly unsafe. I'm willing to live with the risk on a test box
>> whose purpose is _testing_; it's certainly not a good thing to do in
>> production.
>
> Would it be possible to get the same effect for a specific transaction?
> If this was done as a single large transaction, could there be an option to
> say "don't fsync this until it's all done and then do it all", or something?
> Just looking for a way to get the 'best of both worlds'...

Oh, for sure, the whole thing could be invoked as one giant
transaction, which would reduce the cost dramatically. But it
diminishes the value of the benchmark for my purposes. It's useful to
measure how costly those individual transactions are.

--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','cbbrowne.com').
http://www.ntlug.org/~cbbrowne/oses.html
"As far as Saddam Hussein being a great military strategist, he is
neither a strategist, nor is he schooled in the operational arts, nor
is he a tactician, nor is he a general, nor is he as a soldier. Other
than that, he's a great military man, I want you to know that."
-- General Norman Schwarzkopf, 2/27/91
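To make the trade-off concrete: the "one giant transaction" approach
Christopher describes is just explicit batching, so the WAL flush happens once
at COMMIT instead of once per tiny transaction. A minimal sketch, with
hypothetical table and column names since the actual test load isn't shown in
the thread:

-- unbatched, each statement commits on its own and forces its own WAL flush
-- batched, the flush happens once, at COMMIT:
BEGIN;
UPDATE accounts SET balance = balance + 1 WHERE id = 1;
UPDATE accounts SET balance = balance + 1 WHERE id = 2;
-- ... many more small updates ...
COMMIT;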
On Thu, 6 Nov 2003, William Yu wrote:

> scott.marlowe wrote:
> > Note that if you're on an IDE drive and you haven't disabled the write
> > cache, you may as well turn off fsync as well, as it's just getting in the
> > way and doing nothing, i.e. the IDE drives are already lying about fsync,
> > so why bother.
>
> What about Serial ATA?

I haven't gotten my hands on one yet to test. We might be getting some in
the next few months where I work, and I'll test them and report back here
then.