Thread: Trouble with performance
Hi,

I use PostgreSQL with a Web server which receives 200 simultaneous HTTP
queries. For each HTTP query, I have about 5 SELECT queries and 3 UPDATE
ones. The queries have been optimized, and I have indexes on my tables...
The biggest table has more than 500,000 records. And I have big trouble:
PostgreSQL doesn't seem to be able to handle so many queries at the same
time. The server is a NetFinity dual-processor 400 MHz with 1 GB RAM...

What can I do??

Thanks in advance,
Guillaume
>> I use PostgreSQL with a Web server which receives 200 simultaneous
>> HTTP queries.
>
> With what opts do you start postmaster?

-F to disable fsync(). I don't know if I have to use the -B and -S
options to change the memory use...

>> For each HTTP query, I have about 5 SELECT queries and 3 UPDATE ones.
>
> Do you use BEGIN/END to run queries in one transaction, when appropriate?

Yep... In fact, I have the 3 SELECTs and the 3 UPDATEs in a PL/pgSQL
function...
> I use PostgreSQL with a Web server which receives 200 simultaneous
> HTTP queries.

With what opts do you start postmaster?

> For each HTTP query, I have about 5 SELECT queries and 3 UPDATE ones.

Do you use BEGIN/END to run queries in one transaction, when appropriate?

Vadim
> >> I use PostgreSQL with a Web server which receives 200 simultaneous
> >> HTTP queries.
> >
> > With what opts do you start postmaster?
>
> -F to disable fsync()
> I don't know if I have to use the -B and -S options to change
> the memory use...

With 200 simultaneous connections I would recommend using

   -N 256 -B 16384

Vadim
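It may help to see the rough arithmetic behind Vadim's -B suggestion. A minimal sketch, assuming the default 8 KB PostgreSQL block size; the real shmget() request is larger because it also includes per-backend bookkeeping, so treat this as a lower bound:

```python
# Back-of-the-envelope sizing for "-B 16384", assuming 8 KB shared buffers
# (the default BLCKSZ). The actual shared memory segment the postmaster
# requests is somewhat bigger due to bookkeeping overhead.
BLOCK_SIZE = 8192  # bytes per shared buffer


def shared_buffer_bytes(n_buffers: int, block_size: int = BLOCK_SIZE) -> int:
    """Minimum shared memory consumed by -B n_buffers."""
    return n_buffers * block_size


if __name__ == "__main__":
    size = shared_buffer_bytes(16384)
    print(size, "bytes =", size // (1024 * 1024), "MB")  # 134217728 bytes = 128 MB
```

So -B 16384 alone asks for at least 128 MB of shared memory, which is relevant to the shmget failure discussed below in the thread.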
> I checked the Linux FAQ and the only thing related to this
> issue says that I don't have shared memory turned on,
> which I'm pretty sure I do, considering it's running.. ;)

The FAQ is wrong - you don't have *enough* shared memory -:)
I don't know how to increase the shmem segment size on Linux.

Vadim
> > >> I use PostgreSQL with a Web server which receives 200 simultaneous
> > >> HTTP queries.
> > >
> > > With what opts do you start postmaster?
> >
> > -F to disable fsync()
> > I don't know if I have to use the -B and -S options to change
> > the memory use...
>
> With 200 simultaneous connections I would recommend using
> -N 256 -B 16384

As we are under about the same load, and I am always interested in more
performance, I increased my -B to what you suggested, but I get the
following error:

   IpcMemoryCreate: shmget failed (Invalid argument) key=5432001,
   size=143193088, permission=600

   This type of error is usually caused by an improper shared memory or
   System V IPC semaphore configuration. For more information, see the
   FAQ and platform-specific FAQs in the source directory pgsql/doc or
   on our web site at http://www.postgresql.org.

   FATAL 1: ShmemCreate: cannot create region

I checked the Linux FAQ and the only thing related to this issue says
that I don't have shared memory turned on, which I'm pretty sure I do,
considering it's running.. ;)

This is PostgreSQL 7.0.3 installed on a Debian (stable) box running
Linux 2.2.18. The machine has 1 GB of memory and is compiled for 32,768
file descriptors. The startup options I am using for PostgreSQL are:

   -o -F -B 2048 -N 512 -S 4096

Opinions or suggestions? :)

Sincerely,
Warren
Warren Vanichuk wrote:
> IpcMemoryCreate: shmget failed (Invalid argument) key=5432001,
> size=143193088, permission=600

> -o -F -B 2048 -N 512 -S 4096

Ok, you have 2048 8K buffers, and up to 512 backends. If you're trying
to set SortSize to 4096, you need quotes around it and a -o, AFAIK, as
the -S postmaster option sets silent mode.

Further along, according to the shmget man page:

   The following are limits on shared memory segment resources
   affecting a shmget call:

   SHMALL  System-wide maximum of shared memory pages: policy
           dependent.
   SHMMAX  Maximum size in bytes for a shared memory segment:
           implementation dependent (currently 4M).
   SHMMIN  Minimum size in bytes for a shared memory segment:
           implementation dependent (currently 1 byte, though
           PAGE_SIZE is the effective minimum size).
   SHMMNI  System-wide maximum number of shared memory segments:
           implementation dependent (currently 4096).

   The implementation has no specific limits for the per-process
   maximum number of shared memory segments (SHMSEG).

And from the Linux kernel source, include/asm-i386/shmparam.h:

/*
 * Keep _SHM_ID_BITS as low as possible since SHMMNI depends on it and
 * there is a static array of size SHMMNI.
 */
#define _SHM_ID_BITS	9
#define SHM_ID_MASK	((1<<_SHM_ID_BITS)-1)

#define SHM_IDX_SHIFT	(_SHM_ID_BITS)
#define _SHM_IDX_BITS	15
#define SHM_IDX_MASK	((1<<_SHM_IDX_BITS)-1)

/*
 * _SHM_ID_BITS + _SHM_IDX_BITS must be <= 24 on the i386 and
 * SHMMAX <= (PAGE_SIZE << _SHM_IDX_BITS).
 */
#define SHMMAX 0x2000000	/* max shared seg size (bytes) */
/* Try not to change the default shipped SHMMAX - people rely on it */

-------

Hmmmm.... Looks pretty difficult to change.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
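The numbers above explain the failure directly. A sketch of the check shmget() is effectively performing, using the SHMMAX value from the kernel header quoted above and the size= field from Warren's error message:

```python
# Why Warren's shmget() fails: the postmaster asks for ONE segment of
# 143193088 bytes (-B 16384 buffers plus overhead), but the stock Linux
# SHMMAX from shmparam.h is 0x2000000 bytes. Requests above SHMMAX get
# EINVAL, which PostgreSQL reports as "shmget failed (Invalid argument)".
SHMMAX_DEFAULT = 0x2000000  # 33554432 bytes = 32 MB


def fits_in_shmmax(requested: int, shmmax: int = SHMMAX_DEFAULT) -> bool:
    """True if a single shared memory segment of `requested` bytes is allowed."""
    return requested <= shmmax


if __name__ == "__main__":
    requested = 143193088  # size= field from the error message
    print(fits_in_shmmax(requested))  # False: request is ~4x the default limit
    # With SHMMAX raised to 144 MB, the same request would succeed:
    print(fits_in_shmmax(requested, 144 * 1024 * 1024))  # True
```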
"Mikheev, Vadim" wrote:
> FAQ is wrong - you don't have *enough* of shared memory -:)
> I don't know how to increase shmem segment size in Linux.

If you have /proc/sys/kernel/shmmax, try (as root):

   echo new_bigger_value > /proc/sys/kernel/shmmax

If it won't allow you to do that, you may have to recompile the kernel
instead.

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
PGP: 1024R/32B8FAA1: 97 EA 1D 47 72 3F 28 47 6B 7E 39 CC 56 E4 C1 47
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
========================================
"For the eyes of the LORD run to and fro throughout the whole earth,
to show himself strong in the behalf of them whose heart is perfect
toward him..."   II Chronicles 16:9
> FAQ is wrong - you don't have *enough* of shared memory -:)
> I don't know how to increase shmem segment size in Linux.

asm/shmparam.h

Actually, not recommended.
Lamar Owen wrote:
> > /*
> >  * _SHM_ID_BITS + _SHM_IDX_BITS must be <= 24 on the i386 and
> >  * SHMMAX <= (PAGE_SIZE << _SHM_IDX_BITS).
> >  */
> >
> > #define SHMMAX 0x2000000 /* max shared seg size (bytes) */
> > /* Try not to change the default shipped SHMMAX - people rely on it */
> >
> > -------
> >
> > Hmmmm....
> >
> > Looks pretty difficult to change.

   echo 134217728 >/proc/sys/kernel/shmmax

sets the maximum allowed segment size to 128M. Shared memory under
Linux can be swapped, so if you have 256M physical and run 3
postmasters, you can start all of them with 64M (or more), and while
they're idle they most likely consume some swap space. But when it
pours, they'll grab the memory they need.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Jan Wieck wrote:
> Lamar Owen wrote:
> > Looks pretty difficult to change.
>
>    echo 134217728 >/proc/sys/kernel/shmmax
>
> sets the maximum allowed segment size to 128M. Shared
> memory under Linux can be swapped, so if you have 256M

One of those well-documented /proc deals. Might have guessed. :-)
Everything else seems to work that way.... :-)
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Lamar Owen wrote:
> Warren Vanichuk wrote:
> > IpcMemoryCreate: shmget failed (Invalid argument) key=5432001,
> > size=143193088, permission=600
>
> > -o -F -B 2048 -N 512 -S 4096
>
> Ok, you have 2048 8K buffers, and up to 512 backends. If you're trying
> to set SortSize to 4096, you need quotes around it and a -o, AFAIK, as
> the -S postmaster option sets silent mode.
>
> Further along, according to the shmget man page:
> [snip]

I think these are the ones you're looking for:

   kleptog/~>grep . `find /proc/sys -name '*shm*'`
   /proc/sys/kernel/shmall:4194304
   /proc/sys/kernel/shmmax:33554432

Just echo new values into there to change them...
--
Martijn van Oosterhout <kleptog@cupid.suninternet.com>
http://cupid.suninternet.com/~kleptog/
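Martijn's grep prints one "path:value" pair per line. If you want to script a sanity check of the current limits before starting the postmaster, a small parser like this hypothetical helper does the job (function name is made up for illustration):

```python
# Parse the "path:value" lines that `grep . /proc/sys/kernel/shm*`
# produces into a dict keyed by parameter name, so a startup script can
# compare shmmax against the segment size the postmaster will request.
def parse_proc_shm(lines):
    limits = {}
    for line in lines:
        path, _, value = line.rpartition(":")
        name = path.rsplit("/", 1)[-1]  # e.g. "shmmax"
        limits[name] = int(value)
    return limits


limits = parse_proc_shm([
    "/proc/sys/kernel/shmall:4194304",
    "/proc/sys/kernel/shmmax:33554432",
])
print(limits["shmmax"])  # 33554432
```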
> A few quick thoughts:
> 200 simultaneous queries sounds like a lot
> to me, and suggests you might be CPU-bound.

Yes, in 'top' CPU usage is 99%...

> Here are the things you should try:
>
> 1) Put PostgreSQL on a separate server from your
>    Web server/application logic. If you are CPU-bound,
>    then this can greatly improve your throughput.

I don't really think my Web server uses a lot of CPU, because my
application is written as a .so module and it only does queries to my
database.

> 2) After doing #1, watch the CPU load on both database
>    and web server machines. If your web server is the
>    bottleneck, try running multiple web servers.

I was wondering about having 2 web servers per database server, but if
you say that 200 simultaneous queries is a lot.. what about 400? :-(

> 3) After trying #1, if your database is the bottleneck
>    and your data will all fit into RAM, try redesigning
>    your web application to run entirely from memory.
>    With this design, you would only do UPDATEs on each
>    page; SELECTs would be done directly from data
>    you've stored in memory. This dramatically reduces
>    database traffic. However, it makes it hard to
>    run multiple web servers and also creates problems if
>    you have other applications updating the database.

I don't know how to put the data in memory to increase performance...
What should I do?

> This assumes, of course, that you've carefully studied
> and optimized the performance of your code. In my experience,
> application performance is usually the bottleneck, not
> the database. Each of the above strategies is appropriate
> in different circumstances.

Well, with all the performance problems, I tried to optimize my code
everywhere I could. All I can do now is to change the way I pass the
queries to the database...

Thank you for all the advice.

Guillaume
At 07:07 PM 1/18/01 +0100, Guillaume Lémery wrote:
> I use PostgreSQL with a Web server which receives 200 simultaneous
> HTTP queries.
> For each HTTP query, I have about 5 SELECT queries and 3 UPDATE ones.

Just a shot in the dark: are you opening and closing a database
connection for each query? If you are, I suggest you don't, and instead
use persistent database connections, or some form of connection pooling.

How many http connections per second are you getting? If it's not many
connections per second, but they are taking a long time to complete,
there might be ways of reducing the number of simultaneous queries.

For example, you could use buffering, aka an "http accelerator" - e.g.
put a webcache in front of your webserver. The idea is that your app
(and database) can just spit out the results to the webcache at 100Mbps
and not wait for the remote client, which is probably <<2Mbps (and
50-500msec away) and will take 50 times longer or more. The webcache
will buffer the results and trickle them to the client.

However, do note that some webcaches (e.g. squid) only buffer up to 8KB
before blocking (not sure if you can change that). You need a webcache
which can completely buffer your big and popular dynamic webpages
(possibly about 50-100KB). Apache mod_proxy can actually be configured
to buffer more, but I haven't really tested it in detail.

More info about your environment and configuration/architecture could
be helpful, e.g. what are you using for the stressed parts - mod_perl,
fast-cgi, php, cgi-bin, apache module?

Cheerio,
Link.
The really big sites scale by employing two strategies:

* Parallelize everything. That is, design the application so that each
  important part can run on multiple machines at the same time. For
  low-budget projects this is easy to achieve at the web and
  application logic tiers but hard for databases. Databases with
  built-in parallelism like Oracle cost a fortune.

* Cache everything. For example, if your data changes infrequently or
  changes need not be visible immediately, you can send back a cached
  web page instead of querying the database every time. Servlet
  engines like Resin (www.caucho.com) do that out of the box.

If it is as you said, that the database is really the bottleneck (which
is rare, by the way), I would go for caching. If caching is not
possible because you have very frequent updates and all queries must
include the most recent data, you can simulate database parallelism by
writing all changes to two database servers in a distributed
transaction. You will need an XA-capable transaction manager for this
(in Java this is cheap and easy because you can use an open source EJB
server like JBoss or a low-cost application server like Orion).

Alexander Jerusalem
ajeru@gmx.net
vknn

At 09:38 22.01.01, Guillaume Lémery wrote:
> > A few quick thoughts:
> > 200 simultaneous queries sounds like a lot
> > to me, and suggests you might be CPU-bound.
> Yes, in 'top' CPU usage is 99%...
>
> > 1) Put PostgreSQL on a separate server from your
> >    Web server/application logic. If you are CPU-bound,
> >    then this can greatly improve your throughput.
> I don't really think my Web Server uses a lot of CPU, because my
> application is written in a module .so and it does only queries to my
> database.
>
> > 2) After doing #1, watch the CPU load on both database
> >    and web server machines. If your web server is the
> >    bottleneck, try running multiple web servers.
> I was wondering to have 2 web servers per database server, but if you
> say that 200 simultaneous queries is a lot.. what about 400? :-(
> [snip]
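Alexander's "cache everything" strategy can be sketched in a few lines. A minimal time-based cache: serve a previously computed result for `ttl` seconds instead of querying the database on every hit. All names here are illustrative, not from any poster's actual code:

```python
# Minimal TTL cache sketch: redirect lookups are answered from memory
# while fresh, and only recomputed (i.e. only hit the database) once the
# entry expires. This trades up-to-the-second freshness for far fewer
# database queries, exactly the trade-off described in the thread.
import time


class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, compute):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: no database query
        value = compute(key)  # stale or missing: recompute and cache
        self._store[key] = (now + self.ttl, value)
        return value


# Usage: `load_redirect_from_db` stands in for the real database query.
cache = TTLCache(ttl_seconds=600)
url = cache.get("banner42", lambda k: "http://example.com/landing")
```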
> Just a shot in the dark:
> Are you opening and closing a database connection for each query? If
> you are, I suggest you don't, and instead use persistent database
> connections, or some form of connection pooling.

Woops, I forgot to say that of course I use connection pooling.
Just 5 connections are pooled.

> How many http connections per second are you getting?

200

> If it's not many connections per second, but they are taking a long
> time to complete, there might be ways of reducing the number of
> simultaneous queries.
>
> However do note that some webcaches (e.g. squid) only buffer up to 8KB
> before blocking (not sure if you can change that). You need a webcache
> which can completely buffer your big and popular dynamic webpages
> (possibly about 50-100KB). Apache mod_proxy can actually be configured
> to buffer more, but I haven't really tested it in detail.

I don't handle dynamic pages, but only HTTP redirects, so I think I
don't need a cache...

> More info about your environment and configuration/architecture could
> be helpful. e.g. what are you using for the stressed parts - mod_perl,
> fast-cgi, php, cgi-bin, apache module.

I do not use PHP or CGI because they are too slow. I built an Apache
module. I'd like to have responses in 200ms max, because it's an
application for banners.
Guillaume Lémery <glemery@comclick.com> writes:
> Woops, I forgot to say that of course I use connection pooling.
> Just 5 connections are pooled.

> > How many http connections per second are you getting?
> 200

Uh, so all of those http connections have to wait till one of the five
database connections is free? Maybe that's your bottleneck. Try
increasing the number of DB connections to something consistent with
the traffic level.

			regards, tom lane
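Tom's point can be made concrete with a toy pool. A sketch of a blocking connection pool built on the stdlib queue module; `make_conn` stands in for whatever your driver's connect call is (this is not the poster's actual pooling code):

```python
# Blocking connection pool sketch: requests that arrive when all
# connections are checked out must wait. With 200 concurrent requests
# and a pool of 5, that wait IS the bottleneck Tom describes.
import queue


class ConnectionPool:
    def __init__(self, size: int, make_conn):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(make_conn())

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._free.get(timeout=timeout)

    def release(self, conn):
        self._free.put(conn)


# Usage: a dummy object stands in for a real database connection.
pool = ConnectionPool(5, make_conn=lambda: object())
conn = pool.acquire()
# ... run the 5 SELECTs and 3 UPDATEs here ...
pool.release(conn)
```

Sizing the pool closer to the number of truly concurrent requests (as Tom suggests) shortens that queueing delay, up to the point where the database itself saturates.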
> > Woops, I forgot to say that of course I use connection pooling.
> > Just 5 connections are pooled.
>
> > > How many http connections per second are you getting?
> > 200
>
> Uh, so all of those http connections have to wait till one of the five
> database connections is free? Maybe that's your bottleneck. Try
> increasing the number of DB connections to something consistent with
> the traffic level.

Oddly, increasing the number of connections pooled doesn't increase
performance, and if I create too many connections (e.g. 15 or 20),
performance decreases. I don't know why... All my queries try to access
the same table, and sometimes the same record (a problem with
locking??).

Thanks.

Guillaume
> > How many http connections per second are you getting?
> 200

Huh? 200 connections per second is very different from 200 simultaneous
connections. Which is it?

> I don't handle dynamic pages, but only HTTP redirects, so I think I
> don't need cache...
> I do not use PHP or CGI because they are too slow.
> I built an Apache module. I'd like to have response in 200ms max,
> because it's an application for banners.

That should be easy if you simplify your code a LOT.

1) Configure Apache to limit to 10 child processes. (No, that's not a
mis-print.) By limiting the number of child processes, you limit CPU
load and help ensure fast response. You also limit total memory use.
(To really optimize, drop Apache and find a good
single-threaded/single-process web server; modify that to build a
custom web server just for your app.)

2) When your module is initialized, open the database and read
_everything_ into memory. If you're building a banner redirect system,
then you probably only have at most a few megabytes of data, so just
store it all in memory at startup. From there, just look things up in
memory.

3) Don't write logging information to the database; write it to a file.
(Designing a good logging system is tricky to do well. Log _files_ are
easier to understand and manage and faster to write than trying to send
log data to a database. If you need summary information such as hit
totals in the database, have a separate program periodically scan the
log files to generate such data.)

The net result of 2 and 3 is that you won't ever touch the database
during normal operation. Logging to files is extremely fast (one disk
write for each transaction) and keeping your banner data in memory
ensures that you can generate responses very quickly. You should be
able to consistently generate responses in under 10ms with this kind of
design. (Under 1ms if you do everything exactly right.)
The only drawback is that a change in your database data won't
immediately impact what's being served; you can deal with this within
Apache by setting a limit on the number of hits served per child. That
will help encourage Apache child processes to be restarted fairly
regularly.

> Oddly, increasing the number of connections pooled doesn't increase
> performance and if I create too much connections (e.g. 15 or 20,
> performance decrease).

Of course. Remember that Apache is a forking web server; you've got
five pooled connections for _every_ process. That's way too many; a
single process handles only one request at a time, so you only need to
pool one connection in each process. By pooling more connections in
each process, you're just asking PostgreSQL to keep a LOT of
connections open, which is a needless drain on the system.
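The "serve from memory, log to files" design in points 2 and 3 above can be sketched briefly. A minimal sketch under stated assumptions: the banner table fits in a dict loaded once at startup, and each hit costs one dict lookup plus one appended log line. The class, field names, and log format are made up for illustration:

```python
# Sketch of the in-memory redirect design: load the banner table once
# when the module initialises, answer each hit from the dict, and append
# one line per hit to a log file. No database round trip per request;
# a separate batch job can later roll the log up into the database.
import json
import time


class BannerServer:
    def __init__(self, banner_rows, log_path):
        # banner_rows: iterable of (banner_id, redirect_url) pairs,
        # e.g. fetched from PostgreSQL once at startup.
        self.redirects = dict(banner_rows)
        self.log = open(log_path, "a", buffering=1)  # line-buffered appends

    def handle(self, banner_id):
        """Return the redirect URL for banner_id, or None if unknown."""
        url = self.redirects.get(banner_id)
        if url is not None:
            self.log.write(
                json.dumps({"ts": time.time(), "banner": banner_id}) + "\n")
        return url
```

Reloading the dict periodically (or restarting children after N hits, as suggested above) bounds how stale the in-memory copy can get.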
At 02:39 PM 1/22/01 +0100, Guillaume Lémery wrote:
> Woops, I forgot to say that of course I use connection pooling.
> Just 5 connections are pooled.
>
> > How many http connections per second are you getting?
> 200

So you're having 200 simultaneous connections AND 200 connections per
second (aka hits per second)? Wow, that's quite a high load for your
system (2-CPU 400 MHz x86) - 5 SELECTs and 3 UPDATEs per hit, too!

I only managed up to about 100 hits per second on my test phonebook
search app using FastCGI and Perl with just a single LIKE select,
benchmarked using ab with "ab -n 10000 -c 200" on a Dell Poweredge 1300
single-CPU 500 MHz 128 MB machine. However, I didn't do any tweaking
yet.

Have you tried ApacheBench on your app? e.g.

   ab -n 10000 -c 200 "http://yourserver/UriToBannerApp"

What do you get? Is your banner app:
http://comtrack.comclick.com/cgi-bin/aff_bandeau.cgi

> I don't handle dynamic pages, but only HTTP redirects, so I think I
> don't need cache...
> I do not use PHP or CGI because they are too slow.
> I built an Apache module. I'd like to have response in 200ms max,
> because it's an application for banners.

Actually, in that case the cache might still be useful. Since you're
doing redirects, they should fit nicely into almost any decent cache's
buffer. If the same URL always gives the same redirect, that makes
things even better - you can then tell the cache to cache URLs and not
re-check them until they are 10 minutes old (or longer). Your webapp
then talks to the cache as fast as it can; the cache talks to the
client, as slow as the client usually is. With this, often a _single_
webapp instance can handle 10 or more clients.

BTW, PHP etc. can definitely respond in faster than 200ms, but
nevermind.

Cheerio,
Link.