Thread: How to debug performance problems
Hi,

I'm running a web application using Zope that obtains all data from a PostgreSQL 7.4 database (Debian Sarge system with package 7.4.7-6sarge4 on an "older" Sparc machine, an E250 server equipped with 2 GB of memory and two processors). Some time ago I did some performance tuning and found out that

max_connections = 256
shared_buffers = 131072
sort_mem = 65536

would help for a certain application (that is not running on this machine any more, but I left these parameters in /etc/postgresql/postgresql.conf untouched).

My web application was running fine for years without any problem and the performance was satisfying. Some months ago I added a table containing 4,500,000 data rows (all other tables are smaller by orders of magnitude), so nothing very large, and this table is not directly accessed by the web application (just some generated caching tables that are updated from it once a day). Some functions and small tables were added as well, but there has been a stable core over several years.

About two weeks ago the application became *drastically* slower and I urgently have to bring back the old performance. As I said, I'm talking about functions accessing tables that did not increase over several years and should behave more or less the same.

I wonder whether adding tables and functions could have an influence on other untouched parts, and how to find out what makes the things slow that worked reliably and satisfyingly for years. My first try was to switch back to the default settings of the current Debian package maintainer's /etc/postgresql/postgresql.conf, leaving the parameters above untouched, but this did not change anything.

I'm quite clueless even about how to explain the problem correctly, and I hope you will at least find enough information here to ask me "the right questions" to obtain the information you need to track down the performance problems.

Kind regards and thanks for any help

Andreas.
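To put those numbers in perspective, here is a back-of-the-envelope conversion into megabytes (a sketch only, assuming the default 8 kB block size for shared_buffers and 7.4's kilobyte-valued sort_mem, which applies per sort operation rather than per connection):

-- shared_buffers is counted in 8 kB blocks, sort_mem in kB (7.4 units)
SELECT 131072 * 8 / 1024   AS shared_buffers_mb,   -- 1024 MB, i.e. half of the 2 GB RAM
       65536 / 1024        AS sort_mem_mb,         -- 64 MB for each sort operation
       256 * 65536 / 1024  AS worst_case_sort_mb;  -- 16384 MB if all 256 backends sort at once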
Andreas Tille wrote:
> My web application was running fine for years without any problem
> and the performance was satisfying. Some months ago I added a
> table containing 4,500,000 data rows ...
>
> About two weeks ago the application became *drastically* slower
> and I urgently have to bring back the old performance. As I said,
> I'm talking about functions accessing tables that did not increase
> over several years and should behave more or less the same.

Don't assume that the big table you added is the source of the problem. It might be, but more likely it's something else entirely. You indicated that the problem didn't coincide with creating the large table.

There are a number of recurring themes on this discussion group:

* A long-running transaction keeps vacuum from working.

* A table grows just enough to pass a threshold in the planner and a drastically different plan is generated.

* An index has become bloated and/or corrupted, and you need to run the REINDEX command.

And several other common problems.

The first thing is to find out which query is taking a lot of time. I'm no expert, but there have been several explanations on this forum recently on how to find your top time-consuming queries. Once you find them, EXPLAIN ANALYZE should get you started.

Craig
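A minimal sketch of that last step (the table and column names here are made up for illustration, not taken from Andreas's schema):

EXPLAIN ANALYZE
SELECT c.name, count(*)
FROM   orders o
JOIN   customers c ON c.id = o.customer_id
WHERE  o.order_date > current_date - 7
GROUP  BY c.name;
-- EXPLAIN shows only the estimated plan; EXPLAIN ANALYZE actually runs the
-- query and reports real row counts and timings for every plan node, so you
-- can see where the planner's estimates and reality diverge.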
On Mon, 2007-02-19 at 11:50 +0100, Andreas Tille wrote:
> Hi,
>
> I'm running a web application using Zope that obtains all data
> from a PostgreSQL 7.4 database (Debian Sarge system with package
> 7.4.7-6sarge4 on an "older" Sparc machine, an E250 server equipped

Upgrade to 8.2.3 if possible, or at least to 7.4.16.

This is a basic question, but do you VACUUM ANALYZE regularly? 7.4 is before autovacuum was integrated into the core. If you don't do this you could have a lot of wasted space in your tables causing unneeded I/O, and the planner might be making bad plans.

> with 2 GB of memory and two processors). Some time ago I did some
> performance tuning and found out that
>
> max_connections = 256
> shared_buffers = 131072
> sort_mem = 65536
>

You're allocating 50% of the physical memory to shared buffers. That's not necessarily too much, but that's on the high side of the normal range.

Does the total size of all of your tables and indexes add up to enough to exhaust your physical memory? Check to see if you have any exceptionally large tables or indexes. You can do that easily with pg_relation_size('a_table_or_index') and pg_total_relation_size('a_table').

> About two weeks ago the application became *drastically* slower
> and I urgently have to bring back the old performance. As I said,
> I'm talking about functions accessing tables that did not increase
> over several years and should behave more or less the same.
>
> I wonder whether adding tables and functions could have an influence
> on other untouched parts, and how to find out what makes the things
> slow that worked reliably and satisfyingly for years. My first try

You need to provide queries, and also define "slower".

Set log_min_duration_statement to some positive value (I often start with 1000) to try to catch the slow statements in the logs. Once you have found the slow statements, do an EXPLAIN and an EXPLAIN ANALYZE on those statements. That will tell you exactly what you need to know.

Regards,
	Jeff Davis
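A sketch of both of those checks (note that pg_total_relation_size() and pg_size_pretty() exist only in 8.1 and later, so on the 7.4 box this part works only after an upgrade):

-- Ten largest tables, counting their indexes and TOAST data:
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM   pg_class
WHERE  relkind = 'r'
ORDER  BY pg_total_relation_size(oid) DESC
LIMIT  10;

-- In postgresql.conf, log every statement that runs longer than one second:
--   log_min_duration_statement = 1000   # milliseconds; -1 disables, 0 logs everything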
On Mon, 2007-02-19 at 12:18, Jeff Davis wrote:
> On Mon, 2007-02-19 at 11:50 +0100, Andreas Tille wrote:
> > Hi,
> >
> > I'm running a web application using Zope that obtains all data
> > from a PostgreSQL 7.4 database (Debian Sarge system with package
> > 7.4.7-6sarge4 on an "older" Sparc machine, an E250 server equipped
>
> Upgrade to 8.2.3 if possible, or at least to 7.4.16.

What Jeff said ++

> This is a basic question, but do you VACUUM ANALYZE regularly? 7.4 is
> before autovacuum was integrated into the core. If you don't do this you
> could have a lot of wasted space in your tables causing unneeded I/O,
> and the planner might be making bad plans.

Look into VACUUM FULL followed by REINDEX to fix the bloat. Then schedule regular vacuums (regular, not full).

> > with 2 GB of memory and two processors). Some time ago I did some
> > performance tuning and found out that
> >
> > max_connections = 256
> > shared_buffers = 131072
> > sort_mem = 65536
> >
>
> You're allocating 50% of the physical memory to shared buffers. That's
> not necessarily too much, but that's on the high side of the normal
> range.

For 7.4 that's far too much. Very few installations running 7.4 will be faster as you go past 10000 to 20000 buffers. 131072 is probably slowing the machine down instead of speeding it up, as the buffer cache algorithm in 7.4 was not that good with large amounts of memory.
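A sketch of that cleanup sequence ("some_big_table" is a placeholder name; run it during a maintenance window, since VACUUM FULL takes an exclusive lock):

-- One-time repair of an already bloated table and its indexes:
VACUUM FULL some_big_table;
REINDEX TABLE some_big_table;

-- Afterwards keep it maintained with plain (non-FULL) vacuums, e.g. nightly
-- from cron on a pre-autovacuum release such as 7.4:
--   0 3 * * *  vacuumdb --all --analyze

-- And, following the sizing advice above for 7.4, something in this range
-- instead of 131072:
--   shared_buffers = 20000        # about 160 MB of 8 kB buffers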
I'd like to have a toolbox prepared for when performance goes south. I'm clueless. Would someone mind providing some detail about how to measure these four items Craig listed:

1. The first thing is to find out which query is taking a lot of time.

2. A long-running transaction keeps vacuum from working.

3. A table grows just enough to pass a threshold in the planner and a drastically different plan is generated.

4. An index has become bloated and/or corrupted, and you need to run the REINDEX command.

Thx.

On Wed, Aug 30, 2006 at 11:45:06AM -0700, Jeff Frost wrote:
> On Wed, 30 Aug 2006, Joe McClintock wrote:
>
> >I ran a vacuum, analyze and reindex on the database with no change in
> >performance, query time was still 37+ sec, a little worse. On our test
> >system I found that a db_dump from production and then restore brought the
> >database back to full performance. So in desperation I shut down the
> >production application, backed up the production database, renamed the
> >production db, created a new empty production db and restored the
> >production backup to the empty db. After a successful db restore and
> >restart of the web application, everything was then up and running like a
> >top.
>
> Joe,
>
> I would guess that since the dump/restore yielded good performance once
> again, a VACUUM FULL would have also fixed the problem. How are your FSM
> settings in the conf file? Can you run VACUUM VERBOSE and send us the last
> 10 or so lines of output?
>
> A good article on FSM settings can be found here:
>
> http://www.pervasive-postgres.com/instantkb13/article.aspx?id=10087&cNode=5K1C3W
>
> You probably should consider setting up autovacuum and definitely should
> upgrade to at least 8.0.8 if not 8.1.4 when you get the chance.
>
> When you loaded the new data did you delete or update old data or was it
> just a straight insert?
>
> --
> Jeff Frost, Owner <jeff@frostconsultingllc.com>
> Frost Consulting, LLC http://www.frostconsultingllc.com/
> Phone: 650-780-7908 FAX: 650-649-1954

--

On Mon, Feb 19, 2007 at 10:02:46AM -0800, Craig A. James wrote:
> Andreas Tille wrote:
> >My web application was running fine for years without any problem
> >and the performance was satisfying. Some months ago I added a
> >table containing 4,500,000 data rows ...
> >
> >About two weeks ago the application became *drastically* slower
> >and I urgently have to bring back the old performance. As I said,
> >I'm talking about functions accessing tables that did not increase
> >over several years and should behave more or less the same.
>
> Don't assume that the big table you added is the source of the problem. It
> might be, but more likely it's something else entirely. You indicated that
> the problem didn't coincide with creating the large table.
>
> There are a number of recurring themes on this discussion group:
>
> * A long-running transaction keeps vacuum from working.
>
> * A table grows just enough to pass a threshold in the
>   planner and a drastically different plan is generated.
>
> * An index has become bloated and/or corrupted, and you
>   need to run the REINDEX command.
>
> And several other common problems.
>
> The first thing is to find out which query is taking a lot of time. I'm no
> expert, but there have been several explanations on this forum recently on
> how to find your top time-consuming queries. Once you find them, EXPLAIN
> ANALYZE should get you started.
>
> Craig

--
You have no chance to survive make your time.
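For item 2 on that list (a long-running transaction keeping vacuum from working), a sketch of how to spot the offender; the column names are those of 9.2 and later, while older releases use procpid and current_query instead of pid and query, and before 8.3 there is no xact_start column at all:

-- Sessions whose current transaction has been open for more than five minutes:
SELECT pid, usename, state,
       now() - xact_start AS xact_age,
       query
FROM   pg_stat_activity
WHERE  xact_start IS NOT NULL
  AND  now() - xact_start > interval '5 minutes'
ORDER  BY xact_age DESC;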
Ray,

> I'd like to have a toolbox prepared for when performance goes south.
> I'm clueless. Would someone mind providing some detail about how to
> measure these four items Craig listed:

I hope I didn't give the impression that these were the only things to look at ... those four items just popped into my head, because they've come up repeatedly in this forum. There are surely more things that could be suspect; perhaps others could add to your list.

You can find the answers to each of the four topics I mentioned by looking through the archives of this list. It's a lot of work. It would be really nice if there were some full-time employee somewhere whose job was to monitor this group and pull out common themes to be put into a nice, tidy manual. But this is open-source development, and there is no such person, so you have to dig in and find it yourself.

Craig
On Wed, Feb 21, 2007 at 08:09:49AM -0800, Craig A. James wrote:
> I hope I didn't give the impression that these were the only things to look
> at ... those four items just popped into my head, because they've come up
> repeatedly in this forum. There are surely more things that could be
> suspect; perhaps others could add to your list.

I'm only clueless about the details of pg, not db perf concepts. Really, a mechanism to determine where the system is spending the response time is key. As you pointed out, the added table may not be the issue. In fact, if you can't measure where the db time is being spent you will be lucky to fix a performance issue, since you don't really know what resources need to be addressed.

> so you have to dig in and find it yourself.

This afternoon, maybe.
Ray Stell wrote:
> I'd like to have a toolbox prepared for when performance goes south.
> I'm clueless. Would someone mind providing some detail about how to
> measure these four items Craig listed:
>
> 1. The first thing is to find out which query is taking a lot of time.
>
> 2. A long-running transaction keeps vacuum from working.
>
> 3. A table grows just enough to pass a threshold in the
>    planner and a drastically different plan is generated.

I just ran into a variation of this:

3.5 A table grows so large that VACUUMing it takes extremely long, interfering with the general performance of the system. In our case, we think the table had about 36 million rows when it hit that threshold.

I'm now working on a better solution for that table.

Mark
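A sketch of one way to watch for that situation before it bites; these pg_stat_user_tables columns exist from 8.3 on, while on older releases the tail of VACUUM VERBOSE output is the main source of this information:

-- Tables carrying the most dead rows, and when they were last vacuumed:
SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum
FROM   pg_stat_user_tables
ORDER  BY n_dead_tup DESC
LIMIT  10;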