Thread: Still confused about VACUUM vs. VACUUM FULL
[Apologies if you have seen this before. I just discovered that posting
to the group via NNTP, at Teranews.com, apparently isn't working, since
my posts aren't showing up in the mailing list archives.]

We are trying to run down some performance differences between our
production system (where we run a VACUUM FULL every night on nearly
every table) and our test system (where we have stopped doing the
VACUUM FULL, but still do VACUUM ANALYZE).

For large (>1 million rows) tables which have a pretty high turnover
(the average life span of a row is 3 days), should there be any query
performance difference depending on whether you VACUUM FULL or not?

-- 
Jeff Boes                              vox 269.226.9550 ext 24
Database Engineer                      fax 269.349.9076
Nexcerpt, Inc.                         http://www.nexcerpt.com
        ...Nexcerpt...  Extend your Expertise
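[For reference, the two maintenance commands being compared look like
this; "foo" is a stand-in table name, not one from the thread. Plain
VACUUM marks dead-row space as reusable and can run alongside normal
traffic, while VACUUM FULL compacts the table to its minimum size but
holds an exclusive lock while it works.]

    -- Reclaim dead-row space for reuse and refresh planner statistics;
    -- does not block concurrent reads and writes.
    VACUUM ANALYZE foo;

    -- Compact the table to its minimum physical size; takes an
    -- exclusive lock on foo for the duration.
    VACUUM FULL foo;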
Jeff Boes <jboes@nexcerpt.com> writes:
> For large (>1 million rows) tables which have a pretty high turnover
> (average life span of a row is 3 days), should there be any query
> performance difference depending on whether you VACUUM FULL or not?

How often do you VACUUM?  Do you have enough FSM space to support the
number of pages that get dirtied between vacuums?  If you don't, the
physical table size will bloat over time, leading to progressive
slowdown.

			regards, tom lane
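[The bloat Tom describes can be watched for directly: if a table's page
count keeps climbing while its row count stays roughly flat, the
free-space map is too small to keep up. A sketch against the pg_class
catalog, with "foo" again a hypothetical table name; note that relpages
and reltuples are only refreshed by VACUUM and ANALYZE.]

    -- Physical size (in 8 kB pages) versus estimated row count.
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname = 'foo';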
On Thu, 2003-06-12 at 12:42, Tom Lane wrote:
> How often do you VACUUM?  Do you have enough FSM space to support the
> number of pages that get dirtied between vacuums?  If you don't, the
> physical table size will bloat over time, leading to progressive
> slowdown.

We've gone from daily, to twice daily, to several times during the
"peak updates" period, and back to twice daily.

We're also ANALYZE-ing the largest 12-18 tables on a cycle: every
twenty minutes, a daemon wakes up and ANALYZEs until they're all done
or two minutes have elapsed, whichever comes first.

    max_fsm_relations = 200
    max_fsm_pages = 350000

We have around 220 tables total, only 40 of which have more than 1000
pg_class.reltuples.
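[The thread doesn't show the daemon's code, but its selection step
could be approximated with a catalog query like the one below, feeding
each name to an ANALYZE statement until the two-minute budget is spent.
The LIMIT matches the "largest 12-18 tables" figure above; everything
else here is an assumption.]

    -- Largest ordinary tables by estimated row count.
    SELECT relname
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY reltuples DESC
    LIMIT 18;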
Jeff Boes <jboes@nexcerpt.com> writes:
> On Thu, 2003-06-12 at 12:42, Tom Lane wrote:
>> How often do you VACUUM?
> We've gone from daily, to twice daily, to several times during the
> "peak updates" period, and back to twice daily.

I suspect that the real problem here is the old open transactions
discussed in your other message.  Those are preventing VACUUM from
reclaiming space.  Now that you've got that fixed, you need to revisit
your experiments about how often to VACUUM.

> We're also ANALYZE-ing the largest 12-18 tables on a cycle: every
> twenty minutes, a daemon wakes up and ANALYZEs until they're all done
> or two minutes have elapsed, whichever comes first.

That sounds a tad excessive; are the statistics really changing that
fast?

			regards, tom lane
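[Long-open transactions pin every dead row version they might still
need to see, so VACUUM cannot reclaim that space until they end. On
recent PostgreSQL releases (the column names below are from 9.2 and
later, not the 7.x-era server in this thread) such sessions can be
spotted with a query along these lines.]

    -- Oldest open transactions first; watch for sessions that sit
    -- "idle in transaction" for hours.
    SELECT pid, xact_start, state, query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start;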
On Thu, 2003-06-12 at 13:16, Tom Lane wrote:
> That sounds a tad excessive; are the statistics really changing that
> fast?

Well, I have some convincing evidence on this. One table at the center
of some of our biggest, hairiest queries uses an index on a timestamp.
Generally, the queries run looking back about 24 hours. We are
inserting 40,000 rows a day (and deleting the same number, but the
deletes happen all at once, while the inserts happen during nearly
every part of the clock).

I've done

    explain select * from foo where the_time < <some-timestamp>;

and found that I could slice it down to a one-minute interval or so:
with a cutoff before 11:42 AM, the optimizer uses a sequential scan;
after it, the index. And it stays at that point, even if another
10,000 rows get inserted with current timestamps, until the table is
ANALYZEd again.

So two or three ANALYZEs per hour is not excessive, if it will keep
the index usable under the "right" circumstances.
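[Jeff's effect can be reproduced on a scratch table: the planner's
choice between a sequential scan and an index scan depends on how many
rows it *thinks* a timestamp cutoff selects, and rows newer than the
last ANALYZE are invisible to the statistics. The table and index names
here are made up, and the exact flip point depends on your data and
cost settings.]

    CREATE TABLE foo (the_time timestamp);
    CREATE INDEX foo_the_time_idx ON foo (the_time);
    -- ... load several days of data, then:
    ANALYZE foo;

    -- A cutoff the statistics believe selects most of the table should
    -- produce a sequential scan; one believed to select only a thin
    -- slice should produce an index scan.
    EXPLAIN SELECT * FROM foo WHERE the_time < '2003-06-12 11:42:00';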
Jeff Boes <jboes@nexcerpt.com> writes:
> On Thu, 2003-06-12 at 13:16, Tom Lane wrote:
>> That sounds a tad excessive; are the statistics really changing that
>> fast?
> Well, I have some convincing evidence on this.

Ah.  Well, you might consider analyzing just that table (maybe even
just its timestamp column) every few minutes, but I still doubt that
it's buying you much for most of the database.

			regards, tom lane
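[Tom's suggestion maps onto ANALYZE's optional table and column
arguments; with the hypothetical names from above, re-sampling only the
hot column looks like this, and is far cheaper than a database-wide
ANALYZE every cycle.]

    -- Refresh statistics for just the timestamp column of one table.
    ANALYZE foo (the_time);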