Thread: Interpreting vacuum verbosity
If I see VACUUM ANALYZE VERBOSE output like this...

  INFO: --Relation public.foo--
  INFO: Index idx_foo_bar: Pages 219213; Tuples 28007: Deleted 9434.
        CPU 17.05s/4.58u sec elapsed 3227.62 sec.

...am I correct in reading this to say that it took more than 53 minutes
(3227 secs) to get 17 seconds of CPU time?  Is this an indicator of
possible I/O contention?  What else would account for this if my CPUs are
clearly not very busy?

TIA.
"Ed L." <pgsql@bluepolka.net> writes: > If I see VACUUM ANALYZE VERBOSE output like this... > INFO: --Relation public.foo-- > INFO: Index idx_foo_bar: Pages 219213; Tuples 28007: Deleted 9434. > CPU 17.05s/4.58u sec elapsed 3227.62 sec. > ...am I correct in reading this to say that it took more than 53 minutes > (3227 secs) to get 17 seconds of CPU time? Is this an indicator of > possible I/O contention? More like "your disk drives are being pounded into the ground" ? It's hard to evaluate this without knowing what else is going on in your system at the same time. In general a pure VACUUM process *ought* to be I/O bound. But without any additional data it's hard to say if 200:1 CPU vs I/O ratio is reasonable or not. Were other things happening at the same time, and if so did they seem bogged down? What sort of hardware is this on anyway? regards, tom lane
On Thursday May 6 2004 10:30, Tom Lane wrote:
> "Ed L." <pgsql@bluepolka.net> writes:
> > If I see VACUUM ANALYZE VERBOSE output like this...
> >
> > INFO: --Relation public.foo--
> > INFO: Index idx_foo_bar: Pages 219213; Tuples 28007: Deleted 9434.
> >       CPU 17.05s/4.58u sec elapsed 3227.62 sec.
> >
> > ...am I correct in reading this to say that it took more than 53
> > minutes (3227 secs) to get 17 seconds of CPU time?  Is this an
> > indicator of possible I/O contention?
>
> More like "your disk drives are being pounded into the ground" ?
>
> It's hard to evaluate this without knowing what else is going on in your
> system at the same time.  In general a pure VACUUM process *ought* to be
> I/O bound.  But without any additional data it's hard to say whether a
> 200:1 ratio of elapsed time to CPU time is reasonable or not.  Were other
> things happening at the same time, and if so did they seem bogged down?
> What sort of hardware is this on, anyway?

There was a ton of other activity; tens to hundreds of inserts and deletes
occurring per second.  Lots of bogged-down, ridiculously slow queries:
30-second selects on a 500-row table immediately after ANALYZE finished on
the table, absurdly long inserts, etc.  This is a SmartArray 5i/32 RAID5
device with some sort of Dell RAID controller, I believe, 160MB/s, dual
3.2GHz Xeons, plenty of RAM.

Some s/w redesign cut the I/O very significantly, but it was still
ridiculously slow.  After seeing the VACUUM ANALYZE VERBOSE output for the
most troublesomely slow table, and noticing 2.5M unused tuples there, we
decided to drop/recreate/reload that table to reclaim the space, and on
the hunch that it might be related.  We did that in a transaction without
any customer downtime, and upon reloading, the system was blazing fast
again.  Joy.  That was cool.

I guess the activity just totally outran the ability of autovac to keep
up.  I was under the impression that unused tuples were only a diskspace
issue and not such a performance issue, but maybe the live data just got
so fragmented that it took forever to perform small scans over so many
pages?
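A minimal sketch of that rebuild-in-a-transaction trick, with hypothetical
names (the column "bar" is assumed; a real rebuild also has to recreate
any other indexes, constraints, triggers, views, and grants that depended
on the old table, and the DROP will block behind, then block, concurrent
users of the table for the duration of the swap):

    BEGIN;
    -- copy only the live rows into a fresh, unbloated table
    CREATE TABLE foo_new AS SELECT * FROM foo;
    -- swap the tables; DDL is transactional, so the switch is atomic
    DROP TABLE foo;
    ALTER TABLE foo_new RENAME TO foo;
    -- recreate the index from the original VACUUM output
    CREATE INDEX idx_foo_bar ON foo (bar);
    COMMIT;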
"Ed L." <pgsql@bluepolka.net> writes: > I guess the activity just totally outran the ability of autovac to keep up. Could you have been bit by autovac's bug with misreading '3e6' as '3'? If you don't have a recent version it's likely to fail to vacuum large tables often enough. regards, tom lane
On Friday May 7 2004 9:09, Tom Lane wrote:
> "Ed L." <pgsql@bluepolka.net> writes:
> > I guess the activity just totally outran the ability of autovac to
> > keep up.
>
> Could you have been bitten by autovac's bug with misreading '3e6' as '3'?
> If you don't have a recent version, it's likely to fail to vacuum large
> tables often enough.

No, our autovac logs the number of changes (upd+del for vac, upd+ins+del
for analyze) on each round of checks, and we can see it was routinely
performing when expected.  The number of updates/deletes just far exceeded
the thresholds.  The vac threshold was 2000, and at times there might be
300,000 outstanding changes in the 10-30 minutes between vacuums.

Given the gradual performance degradation we saw over a period of days if
not weeks, and the extremely high numbers of unused tuples, I'm wondering
if there is something like a data-fragmentation problem occurring, in
which we're having to read many, many disk pages to get just a few tuples
off each page?

This cluster has 3 databases (2 nearly idle) with a total of 600 tables
(about 300 in the active database).  Gzipped dumps are 1.7GB.
max_fsm_relations = 1000 and max_fsm_pages = 10000.  The pattern of ops is
a continuous stream of inserts, sequential-scan selects, and deletes.
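For anyone wanting to watch the same numbers by hand: the row-level
counters that an autovacuum daemon compares against its thresholds are
visible in the statistics views (this assumes the stats collector is
running with row-level stats enabled; the counters are cumulative, so the
daemon actually works from the difference between two successive checks):

    -- busiest tables by update+delete activity since the stats
    -- counters were last reset
    SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
      FROM pg_stat_user_tables
     ORDER BY n_tup_upd + n_tup_del DESC
     LIMIT 10;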
"Ed L." <pgsql@bluepolka.net> writes: > No, our autovac logs the number of changes (upd+del for vac, upd+ins+del for > analyze) on each round of checks, and we can see it was routinely > performing when expected. The number of updates/deletes just far exceeded > the thresholds. Vac threshold was 2000, and at times there might be > 300,000 outstanding changes in the 10-30 minutes between vacuums. Well, in that case you probably want a lot less than "10-30 minutes" between vacuums, at least for this particular table. I don't know how one configures autovac for this, but I suppose it can be done ... > max_fsm_relations = 1000 and max_fsm_pages = 10000. Also you doubtless need max_fsm_pages a lot higher than that. A conservative setting would make it as big as your whole database, eg for a 10Gb disk footprint use 10Gb/8K (something upwards of a million) FSM page slots. regards, tom lane
On Friday May 7 2004 11:25, Tom Lane wrote:
> "Ed L." <pgsql@bluepolka.net> writes:
> > No, our autovac logs the number of changes (upd+del for vac,
> > upd+ins+del for analyze) on each round of checks, and we can see it was
> > routinely performing when expected.  The number of updates/deletes just
> > far exceeded the thresholds.  The vac threshold was 2000, and at times
> > there might be 300,000 outstanding changes in the 10-30 minutes between
> > vacuums.
>
> Well, in that case you probably want a lot less than "10-30 minutes"
> between vacuums, at least for this particular table.  I don't know how
> one configures autovac for this, but I suppose it can be done ...

This period is the minimum time it takes to vacuum or analyze every table
that "needs it" in round-robin fashion.  Sometimes it is much shorter
(seconds), sometimes longer, depending on how much upd/del/ins activity
there has been.  That seems too long/slow.

> > max_fsm_relations = 1000 and max_fsm_pages = 10000.
>
> Also, you doubtless need max_fsm_pages a lot higher than that.  A
> conservative setting would make it as big as your whole database,
> e.g. for a 10GB disk footprint use 10GB/8K (something upwards of
> a million) FSM page slots.

Ah, OK.  Two questions:

1) I'm inclined to set this to handle as large a DB footprint as we'll
have in the coming year or two, so maybe 3X what it is now.  What is the
impact/cost of setting max_fsm_pages at, say, 3M for an 8GB footprint?
(3 x 8GB/8K)

2) Would this low setting of 10000 explain the behavior we saw of seqscans
of a perfectly analyzed table with 1000 rows requiring ridiculous amounts
of time even after we cut off the I/O load?
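For reference, the arithmetic behind that 3M figure, assuming the default
8KB page size:

    8 GB footprint = 8,388,608 KB / 8 KB per page ≈ 1,048,576 pages
    3x headroom    ≈ 3,145,728 slots, i.e. roughly the 3M max_fsm_pages above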
On Friday May 7 2004 12:20, Ed L. wrote:
> 1) I'm inclined to set this to handle as large a DB footprint as we'll
> have in the coming year or two, so maybe 3X what it is now.  What is the
> impact/cost of setting max_fsm_pages at, say, 3M for an 8GB footprint?
> (3 x 8GB/8K)

Ok, so I see 40B per, so setting it to 3M ==> 3M * 40 = 120MB of
additional RAM usage for this?  Any other impacts with which to be
concerned?
On Friday May 7 2004 12:23, Ed L. wrote:
> On Friday May 7 2004 12:20, Ed L. wrote:
> > 1) I'm inclined to set this to handle as large a DB footprint as we'll
> > have in the coming year or two, so maybe 3X what it is now.  What is
> > the impact/cost of setting max_fsm_pages at, say, 3M for an 8GB
> > footprint?  (3 x 8GB/8K)
>
> Ok, so I see 40B per, so setting it to 3M ==> 3M * 40 = 120MB of
> additional RAM usage for this?  Any other impacts with which to be
> concerned?

Sorry, I see that's *6B* per slot, so setting it to 3M ==> 3M * 6 = 18MB,
which is trivial for the benefit.  Any other concerns in setting this too
high?
"Ed L." <pgsql@bluepolka.net> writes: > Sorry, I see that's *6B* per, so setting it to 3M ==> 18MB, which is trivial > for the benefit. Any other concerns in setting this too high? Not that I know of. regards, tom lane
At some point in time, tgl@sss.pgh.pa.us (Tom Lane) wrote:

>> max_fsm_relations = 1000 and max_fsm_pages = 10000.
>
> Also, you doubtless need max_fsm_pages a lot higher than that.  A
> conservative setting would make it as big as your whole database,
> e.g. for a 10GB disk footprint use 10GB/8K (something upwards of
> a million) FSM page slots.

At some point, someone was going to write a "white paper" detailing how
one might go about setting these parameters.  If that someone has done so,
I'd love to hear about it.  If that someone hasn't ... well, how much beer
would we have to provide to get you to talk?  8-)

--
Jeff Boes                               vox 269.226.9550 ext 24
Database Engineer                       fax 269.349.9076
Nexcerpt, Inc.                          http://www.nexcerpt.com
           ...Nexcerpt... Extend your Expertise
"Ed L." <pgsql@bluepolka.net> writes: > 2) Would this low setting of 10000 explain the behavior we saw of seqscans > of a perfectly analyzed table with 1000 rows requiring ridiculous amounts > of time even after we cutoff the I/O load? Possibly. The undersized setting would cause leakage of disk space (that is, new rows get appended to the end of the table even when space is available within the table, because the system has "forgotten" about that space due to lack of FSM slots to remember it in). If the physical size of the table file gets large enough, seqscans will take a long time no matter how few live rows there are. I don't recall now whether your VACUUM VERBOSE results showed that the physical table size (number of pages) was out of proportion to the actual number of live rows. But it sure sounds like that might have been the problem. regards, tom lane
Jeff Boes <jboes@nexcerpt.com> writes: > At some point, someone was going to write a "white paper" detailing how one > might go about setting these parameters. In 7.4, it's relatively easy to check on whether your settings are reasonable: just do a VACUUM VERBOSE (database-wide) and check the FSM requirements indicated at the end of the tediously chatty output. All I have handy to illustrate with is a test server that has only the regression test database loaded in it, so these numbers are very small, but what I see is: INFO: free space map: 280 relations, 520 pages stored; 4720 total pages needed DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory. This says that what I actually need to keep track of the present free space in the database is 280 FSM relation slots and 4720 FSM page slots. So the allocated space is plenty comfy here. If the "pages needed" number is significantly larger than your max_fsm_pages setting, then you have a problem. regards, tom lane
Jeff Boes <jboes@nexcerpt.com> writes: > Tom Lane wrote: >> INFO: free space map: 280 relations, 520 pages stored; 4720 total pages needed >> DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory. > And I would assume that if I have three databases defined on my server, > I should run this for all three and sum the results? No, the quoted results are cluster-wide. It is a good idea to vacuum all three databases and then look at the final results, just to make sure you have reasonably up-to-date info about every table. But you don't need to sum anything. regards, tom lane
On Friday May 7 2004 12:48, Tom Lane wrote:
> "Ed L." <pgsql@bluepolka.net> writes:
> > 2) Would this low setting of 10000 explain the behavior we saw of
> > seqscans of a perfectly analyzed table with 1000 rows requiring
> > ridiculous amounts of time even after we cut off the I/O load?
>
> Possibly.  The undersized setting would cause leakage of disk space
> (that is, new rows get appended to the end of the table even when space
> is available within the table, because the system has "forgotten" about
> that space due to lack of FSM slots to remember it in).  If the physical
> size of the table file gets large enough, seqscans will take a long time
> no matter how few live rows there are.  I don't recall now whether your
> VACUUM VERBOSE results showed that the physical table size (number of
> pages) was out of proportion to the actual number of live rows.  But it
> sure sounds like that might have been the problem.

If it were indeed the case that we'd leaked a lot of disk space, then
after bumping max_fsm_pages up to a much higher number (4M), will these
pages gradually be "remembered" as they are visited by autovac and/or
queries, etc.?  Or is a dump/reload or VACUUM FULL the only way?  Trying
to avoid downtime...
On Monday May 10 2004 11:37, Ed L. wrote:
> On Friday May 7 2004 12:48, Tom Lane wrote:
> > "Ed L." <pgsql@bluepolka.net> writes:
> > > 2) Would this low setting of 10000 explain the behavior we saw of
> > > seqscans of a perfectly analyzed table with 1000 rows requiring
> > > ridiculous amounts of time even after we cut off the I/O load?
> >
> > Possibly.  The undersized setting would cause leakage of disk space
> > (that is, new rows get appended to the end of the table even when
> > space is available within the table, because the system has
> > "forgotten" about that space due to lack of FSM slots to remember it
> > in).  If the physical size of the table file gets large enough,
> > seqscans will take a long time no matter how few live rows there are.
> > I don't recall now whether your VACUUM VERBOSE results showed that the
> > physical table size (number of pages) was out of proportion to the
> > actual number of live rows.  But it sure sounds like that might have
> > been the problem.
>
> If it were indeed the case that we'd leaked a lot of disk space, then
> after bumping max_fsm_pages up to a much higher number (4M), will these
> pages gradually be "remembered" as they are visited by autovac and/or
> queries, etc.?  Or is a dump/reload or VACUUM FULL the only way?  Trying
> to avoid downtime...

I mean, I see that our plain VACUUM (not FULL) does appear to be
truncating and reducing the number of pages in some cases.  Is that
possible?  If so, I'm just thinking a DB restart will be much less
complicated than dropping/reloading the individual table.  VACUUM FULL has
always been way too slow for our purposes, not sure why.  TIA.
"Ed L." <pgsql@bluepolka.net> writes: > If it were indeed the case that we'd leaked a lot of diskspace, then after > bumping max_fsm_pages up to a much higher number (4M), will these pages > gradually be "remembered" as they are accessed by autovac and or queried, > etc? Or is a dump/reload or 'vacuum full' the only way? Trying to avoid > downtime... The next vacuum will add the "leaked" space back into the FSM, once there's space there to remember it. You don't need to do anything drastic, unless you observe that the amount of wasted space is so large that a vacuum full is needed. BTW, these days, a CLUSTER is a good alternative to a VACUUM FULL; it's likely to be faster if the VACUUM would involve moving most of the live data anyway. regards, tom lane