Thread: VACUUM ANALYZE extremely slow
This is my first (and, for the love of God, last) project with PostgreSQL, and everything but the simplest selects is so slow I want to cry. This is especially bad with VACUUM ANALYZE - it takes several hours for a database of a mere 15 GB on a fast dual-core server with 2 GB of RAM and virtually no workload at the moment. maintenance_work_mem is set to 512 MB.

Is there any way to speed up ANALYZE? Without it, all the queries run so slowly that I want to cry after a couple of hours of operation, and with it the system has to go down for hours per day, which is unacceptable.

The same database running on MySQL on basically the same server used to run OPTIMIZE TABLE on every table every half an hour without any problem. I am actually pondering scrapping half the work on the conversion and going back to MySQL, but I wonder if there's some way to improve it.
On Jun 17, 2007, at 2:15 PM, Sergei Shelukhin wrote:

> This is my first (and, for the love of God, last) project with PostgreSQL, and everything but the simplest selects is so slow I want to cry. This is especially bad with VACUUM ANALYZE - it takes several hours for a database of a mere 15 GB on a fast dual-core server with 2 GB of RAM and virtually no workload at the moment. maintenance_work_mem is set to 512 MB.
>
> Is there any way to speed up ANALYZE? Without it, all the queries run so slowly that I want to cry after a couple of hours of operation, and with it the system has to go down for hours per day, which is unacceptable.

There's no need to stop the database to run vacuum or analyze (heck, with autovacuum in modern versions of PostgreSQL you can sometimes get away without running them at all). I suspect you're doing the wrong thing - perhaps running VACUUM FULL rather than plain VACUUM. Can you tell us what version of PostgreSQL you're running and how you're vacuuming it (what commands you're running)?

Cheers,
Steve
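For reference, only VACUUM FULL takes exclusive locks; plain vacuum can run alongside normal traffic. A minimal sketch (the table name is just a placeholder):

    VACUUM ANALYZE;             -- whole database, no exclusive locks, safe to run live
    VACUUM ANALYZE mytable;     -- or just the busiest table
    -- VACUUM FULL mytable;     -- this one locks the table exclusively; rarely needed
    SELECT version();           -- handy for reporting which PostgreSQL this is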
Sergei Shelukhin wrote:

> The same database running on MySQL on basically the same server used to run OPTIMIZE TABLE on every table every half an hour without any problem. I am actually pondering scrapping half the work on the conversion and going back to MySQL, but I wonder if there's some way to improve it.

Why waste time on it? Just move it to MySQL since it's so good, and be done with it.

-- 
Alvaro Herrera                           Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
"Someone said that it is at least an order of magnitude more work to do production software than a prototype. I think he is wrong by at least an order of magnitude." (Brian Kernighan)
On Jun 17, 2007, at 2:15 PM, Sergei Shelukhin wrote:

>> This is my first (and, for the love of God, last) project with PostgreSQL, and everything but the simplest selects is so slow I want to cry. This is especially bad with VACUUM ANALYZE - it takes several hours for a database of a mere 15 GB on a fast dual-core server with 2 GB of RAM and virtually no workload at the moment. maintenance_work_mem is set to 512 MB.

I have a 23 GB PG 8.1 database running on a 3-year-old dual-processor Dell running Red Hat Linux with 2 GB of RAM. My tests ran on our production database while it was live (albeit not under a heavy load), with no obvious slowdown for users of our application. PG was using 1.2% of the CPU, and I noted the server had been running for 322 days. I'm not sure whether there's a way to make vacuum use more processor time to speed it up.

"ANALYZE" took 1 minute 14 seconds. "VACUUM" took significantly longer, at 24 minutes 22 seconds. I noted that we hadn't set maintenance_work_mem, so it is presumably using the default of 16384 (16 MB).

I'm sure much depends on how many tables, how many deleted rows, etc. are in your db, but "several hours" certainly is nothing like what we see.

David
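psql's \timing switch is an easy way to reproduce this kind of measurement on your own database (the table name below is a placeholder):

    \timing                     -- toggles display of each statement's elapsed time
    ANALYZE;                    -- statistics only; usually quick
    VACUUM mytable;             -- plain vacuum of a single table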
In response to Sergei Shelukhin <realgeek@gmail.com>:

> This is my first (and, for the love of God, last) project with PostgreSQL

One has to ask: are you actually looking for help, or trolling? If you honestly want help, I would suggest you work on your communication skills first. If you're a troll, go away.

-- 
Bill Moran
http://www.potentialtech.com
On Sun, 17 Jun 2007, Sergei Shelukhin wrote:

> Is there any way to speed up ANALYZE? Without it, all the queries run so slowly that I want to cry after a couple of hours of operation, and with it the system has to go down for hours per day, which is unacceptable.

I've found I cry a lot less if I actually spend a minute educating myself about things instead of complaining to a mailing list in a fashion offensive to the people who might help me. You should start with http://www.postgresql.org/docs/current/static/sql-vacuum.html where you'll discover the amazing fact that VACUUM ANALYZE doesn't require taking down the system at all. The way you get it to take less time is to do it more often. In fact, if you're using a PostgreSQL version where you can set up autovacuum correctly, you might not ever need to run it manually at all.

Here are the things to consider if you actually want some help here:

0) You might as well keep this thread going, but next time, post to the performance list instead of the general one; it's more appropriate and you'll get a better mix of people familiar with this sort of topic.

1) Give some more details about the non-default values in your postgresql.conf file. http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm will get you started on the most important ones to worry about, with pointers to additional resources (a rough sketch of typical values follows below). You say you've increased maintenance_work_mem; that's a good start, but if everything else is at the default you're not taking advantage of all the memory in your system. MySQL manages memory very differently, and those differences may be the root cause of your issue.

2) State what version of PostgreSQL you're running. If it's older than a recent 8.1 release, there are many known and unresolvable performance issues and you may have to upgrade to get what you're looking for.

3) Give some information about your disk configuration, and some test results to confirm it's working normally if possible. http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm has an outline of how to do those tests.

-- 
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
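The sketch mentioned in 1): what non-default settings might look like for a box with 2 GB of RAM on 8.1. The numbers are illustrative starting points only, not recommendations, and on 8.1 these are plain integers (8 kB pages for the first two, kB for the work_mem settings):

    # postgresql.conf -- illustrative starting values for ~2 GB RAM on 8.1
    shared_buffers = 24576           # 8 kB pages, ~192 MB
    effective_cache_size = 89600     # 8 kB pages, ~700 MB of OS cache
    maintenance_work_mem = 262144    # kB, 256 MB for VACUUM / CREATE INDEX
    work_mem = 4096                  # kB, 4 MB per sort/hash per backend
    max_fsm_pages = 500000           # must cover dead rows between vacuums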
Sergei Shelukhin wrote:

> This is my first (and, for the love of God, last) project with PostgreSQL, and everything but the simplest selects is so slow I want to cry. This is especially bad with VACUUM ANALYZE - it takes several hours for a database of a mere 15 GB on a fast dual-core server with 2 GB of RAM and virtually no workload at the moment. maintenance_work_mem is set to 512 MB.

I noticed you didn't mention your disk subsystem. PostgreSQL tends to use a fair bit of disk I/O when running vacuum and / or analyze. If you check with top / iostat while VACUUM ANALYZE is running, I'm betting you'll see a lot of waiting on I/O going on.

You do know those two commands (VACUUM and ANALYZE) aren't married anymore, right? You can run ANALYZE all by itself if you want. And I hope you're not running VACUUM FULL ANALYZE all the time, because there's usually no need for that. Look up pg_autovacuum; it saves a lot of hair pulling.

> Is there any way to speed up ANALYZE?

Analyze is usually REALLY fast. Even on my rather pokey workstation, with a single SATA hard drive and other things to do, I can run analyze on a 31 GB database in very little time.

> Without it, all the queries run so slowly that I want to cry after a couple of hours of operation, and with it the system has to go down for hours per day, which is unacceptable.

You should only need to run ANALYZE every so often, and you should only need VACUUM after lots of updates / deletes. You should not need to take the system down to vacuum, as plain vacuum doesn't block. VACUUM FULL does block, but if you need that you either aren't vacuuming often enough or you don't have the autovacuum daemon configured.

> The same database running on MySQL on basically the same server used to run OPTIMIZE TABLE on every table every half an hour without any problem. I am actually pondering scrapping half the work on the conversion and going back to MySQL, but I wonder if there's some way to improve it.

And when you ran OPTIMIZE on those tables, were they not locked for regular users the whole time?

There may be a way to improve it. Tell us: what OS are you running, what are your non-default postgresql.conf settings, and what ACTUAL commands are you running here - VACUUM, VACUUM ANALYZE, VACUUM FULL ANALYZE? Are you inserting / deleting / updating tons of rows between vacuums and / or analyzes?
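If this is 8.1, the integrated autovacuum daemon only needs a few postgresql.conf settings; roughly (a sketch, double-check the documentation for your exact version):

    # postgresql.conf -- enable the integrated autovacuum daemon on 8.1
    stats_start_collector = on    # on by default, but required
    stats_row_level = on          # autovacuum needs row-level statistics
    autovacuum = on               # off by default in 8.1
    autovacuum_naptime = 60       # seconds between checks (the default, shown for clarity)

then restart the server so the stats settings take effect.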
Errr... work_mem is 256 MB of course, and 5m for the EXPLAIN ANALYZE costs.
Hi. Sorry for being a bit emotional; I was pretty constructive in my earlier posts (the earlier, the more constructive, if you care to search) but I am progressively getting pissed off :(

Thanks for the initial tip - running ANALYZE without VACUUM is faster. Are frequent vacuums even necessary if there are virtually no deletes in the system and updates are much less frequent compared to inserts?

Now for the answers.

* What other non-default configuration settings do you have?
I played with shared_buffers, setting them between 16k and 32k; ~24k seems to be the best, but the difference is minimal. The work_mem setting is 256kb, and I increased effective_cache_size to ~700 MB (~35% of RAM) based on a tip from some article. max_fsm_* is increased too, to accommodate VACUUM ANALYZE runs.

* What version of PostgreSQL are you using and on what OS?
8.1 and Debian.

* What kind of disks and controllers do you have?
Umm... unfortunately I do not know, and do not have access to check :) The development server probably has a fast consumer-grade HDD, not a RAID, I suppose.

* My "system has to go down" remark was probably a bit exaggerated; the system is noticeably slower when the vacuum is running, and that is basically without workload; it will probably become unusable under load.

* Specific queries that were slow: there are generally no specific queries, everything runs slowly, mostly due to estimates being grossly inaccurate (like an 800k cost for a complex query based on EXPLAIN, and then 5 actual cost based on EXPLAIN ANALYZE). I was not able to find any particular bottlenecks in the queries.

I come from an MSSQL (and a bit of MySQL) background, and the only thing I noticed is that there are a lot of seqscans in places where indexes should be used (imo); when I turn seqscans off, the index scans are ridiculously slow (I have already posted about it), presumably due to the random access used to build a big result set. However, the result set is not in fact big in the real queries, and PostgreSQL apparently doesn't do some magic that MSSQL does (join order or something) that makes the join that should use the index narrower and thus faster.

There are also some weird development-stage-specific bottlenecks, like adding a column to a 34m-row table and updating it to a default value (I wasn't able to wait for the end of that query; it probably runs for days); I think I understand why it is slow based on forum posts, but again, MSSQL and MySQL do it better.

Also, there's a general impression of everything being slow even compared to MySQL on the same server, from seqscans to aggregate operations; I won't be able to supply concrete evidence though (not yet anyway), it's just a gut feeling.
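For what it's worth, a quick way to see where the misestimates are is to compare the planner's estimated row counts against the actual ones in EXPLAIN ANALYZE output; a sketch (the table and column names are made up for illustration):

    EXPLAIN ANALYZE
    SELECT o.id, c.name
      FROM orders o
      JOIN customers c ON c.id = o.customer_id
     WHERE o.created_at > now() - interval '1 day';
    -- in each plan node, compare the estimated "rows=" with the "actual ... rows=";
    -- a big mismatch usually points at stale or too-coarse statistics on that table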
Sergei Shelukhin wrote:

> * What other non-default configuration settings do you have?
> I played with shared_buffers, setting them between 16k and 32k; ~24k seems to be the best, but the difference is minimal. The work_mem setting is 256kb, and I increased effective_cache_size to ~700 MB (~35% of RAM) based on a tip from some article. max_fsm_* is increased too, to accommodate VACUUM ANALYZE runs.

work_mem = 256 MB is probably too much; you might be filling your RAM with it, causing the system to swap. Try decreasing it. This would be particularly noticeable under load. Keep in mind that work_mem is per sort/hash and per backend, i.e. if you have a query which does 3 sorts and is being executed by 5 processes in parallel, you will have 3 * 5 * 256 MB = 3840 MB of RAM in active use.

> * My "system has to go down" remark was probably a bit exaggerated; the system is noticeably slower when the vacuum is running, and that is basically without workload; it will probably become unusable under load.

You can set the vacuum_cost_* settings in order to reduce the impact of vacuum on the available I/O.

> * Specific queries that were slow: there are generally no specific queries, everything runs slowly, mostly due to estimates being grossly inaccurate (like an 800k cost for a complex query based on EXPLAIN, and then 5 actual cost based on EXPLAIN ANALYZE).

You might want to try increasing the statistics targets for the problematic columns. It would be useful to choose one of these queries and send it along for dissection here on the lists, to investigate the actual problems. Misestimations are a problem at times, but most of the time there are workarounds.

-- 
Alvaro Herrera                  http://www.advogato.org/person/alvherre
"Oh, great altar of passive entertainment, bestow upon me thy discordant images at such speed as to render linear thought impossible" (Calvin a la TV)
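As a concrete sketch of both suggestions (the table and column names are placeholders, and the numbers are just reasonable starting points):

    -- raise the statistics target on a column whose row estimates are bad,
    -- then re-analyze so the new target takes effect
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
    ANALYZE orders;

    # postgresql.conf -- throttle vacuum's I/O impact with cost-based delay
    vacuum_cost_delay = 10        # ms to sleep whenever the cost limit is reached
    vacuum_cost_limit = 200       # the default; lower it to make vacuum pause more often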