Thread: max_fsm_pages Sanity Check
Background: We have a quite large production Postgres 7.2 DB which is out of control in terms of disk consumption. We made it through the holiday shopping season, but it isn't over yet. We have taken the DB down once for a vacuum analyze, but only vacuum'd 2 large tables, which took FIVE HOURS WITH NO RESULTS. Posts to the newsgroup advised that I crank up max_fsm_pages. Right now it is at roughly 65,000. So I went to the postgres.org site and searched the newsgroups...

Where I am now: I've searched the newsgroup archives for info on the max_fsm_pages setting, and other than what the max_fsm_pages value means, I found no definitive answer as to exactly "how" to best determine the optimal setting. Nor could I find adequate documentation on it. I saw a lot of people asking others to "report back with your findings", but I don't find the results or follow-up in the newsgroup (please point me to an item if I'm mistaken). Seeing as I'm not entirely into playing guinea pig with my production system, I welcome expert, been-there, non-theoretical advice.

Here are some numbers, followed by a few questions:

I. Looking at my production DB with the following query:

select relname, relpages from pg_class where relkind in ('r', 't', 'i')

             relname             | relpages
---------------------------------+----------
 users                           |   408711
 merchant_sessions               |   236333
 batch_load_awaiting             |   173785
 orders                          |    92241

II. But here is the same from my standby DB (restored from a pg_dump of production)... hmmm, not exactly what I would expect:

             relname             | relpages
---------------------------------+----------
 merchant_sessions               |   615588
 users                           |   202696
 batch_load_awaiting             |   143735
 orders                          |   130894

Question: Now, why wouldn't a pg_restore into my standby DB have smaller relpages counts than the live one, which is bloated and consuming tons of disk space?

III.
The results of a vacuum on the users table (production, a couple weekends ago) yielded the below (not including all the index output):

2002-12-15 03:22:18 [22450] NOTICE:  Removed 3254600 tuples in 295053 pages.
        CPU 111.50s/124.03u sec elapsed 2721.98 sec.
2002-12-15 03:22:18 [22450] NOTICE:  Pages 408711: Changed 152946, Empty 0; Tup 4126716: Vac 3254600, Keep 0, UnUsed 28559.
        Total CPU 338.16s/1091.28u sec elapsed 8502.90 sec.

Question: So should I hike my fsm up to 1,000,000 pages? Is that too high a value, or will it be OK? If it is too big, then how big is big?

I will be most happy to summarize my results back to the newsgroup when I make this change and do a full vacuum. We cannot do a full vacuum without taking the site down, which will have to wait till this weekend or next.

Thanks in advance
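For what it's worth, the figures already posted in this thread can be sanity-checked with a little arithmetic (a minimal sketch; it only restates numbers quoted above, and summing relpages is a rough upper bound for FSM sizing, not an exact rule):

```python
# Rough FSM sizing from the figures posted in this thread.

# relpages of the four big production tables (first query above):
relpages = {
    "users": 408711,
    "merchant_sessions": 236333,
    "batch_load_awaiting": 173785,
    "orders": 92241,
}
total_pages = sum(relpages.values())
print(total_pages)  # 911070 -- hence the advice "nearer to 1 million than 64k"

# The users VACUUM alone removed tuples from 295053 pages, so a
# 65,000-page FSM cannot even track that one table's free space:
pages_with_free_space_users = 295053
current_max_fsm_pages = 65000
print(pages_with_free_space_users > current_max_fsm_pages)  # True
```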
"HT" <htlevine@ebates.com> writes:
> We have quite large production Postgres 7.2 DB which is out of control in
> terms of disk consumption. We made it thru the holiday shopping season,
> but it isn't over yet. We have taken the DB down once for a vacuum analyze
> but only vacuum'd 2 large tables which took FIVE HOURS WITH NO
> RESULTS.

1. You don't need to take down the DB to do vacuuming.

2. What do you mean by "WITH NO RESULTS"?

> Posts to the newsgroup advised that I crank up the max_fsm_pages. Right
> now it is at roughly 65,000.
>
> select relname, relpages from pg_class where relkind in ('r', 't', 'i')
> users               | 408711
> merchant_sessions   | 236333
> batch_load_awaiting | 173785
> orders              |  92241

If you have not been vacuuming regularly then these relpages figures cannot be trusted too much, but it looks to me like you might need max_fsm_pages nearer to 1 million than 64k. If it's not large enough to cover all (or at least nearly all) pages with free space, then you'll have space-leakage problems. What is the tuple update/deletion rate in these tables, anyway?

Also, you should probably think about updating to 7.3.1 sometime soon. There's a performance problem in the 7.2.* FSM code that shows up when a single table has more than ~10000 pages with useful amounts of free space --- VACUUM takes an unreasonable amount of time to record the free space.

			regards, tom lane
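In postgresql.conf terms, the advice in this thread amounts to something like the following (illustrative values for this poster's database, not general defaults; max_fsm_relations is discussed later in the thread, and both settings take effect only at postmaster start):

```
# postgresql.conf (7.2.x) -- free space map sizing, illustrative values
max_fsm_pages = 1000000      # enough to cover (nearly) all pages with free space
max_fsm_relations = 1000     # comfortably above the count of tables + toast tables
```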
Thanks for the response. See my responses below. I'll crank it up to 1 million fsm pages and report back with the results when we finish... I know they aren't as interesting with 7.2.3 as they would be with 7.3, but it may help someone else.

"Tom Lane" <tgl@sss.pgh.pa.us> wrote in message news:10105.1041182337@sss.pgh.pa.us...
> "HT" <htlevine@ebates.com> writes:
> > We have quite large production Postgres 7.2 DB which is out of control in
> > terms of disk consumption. We made it thru the holiday shopping season,
> > but it isn't over yet. We have taken the DB down once for a vacuum analyze
> > but only vacuum'd 2 large tables which took FIVE HOURS WITH NO
> > RESULTS.
>
> 1. You don't need to take down the DB to do vacuuming.

When I tried the vacuum with the site still up, the whole DB came to a stand-still... i.e. the pg_stat_activity table grew and grew and grew... users couldn't log in, and the site was "broken". I tried this several times, and though this group says you don't need to take the db down, I found we might as well, because it was so unresponsive to our users that we appeared busted. I'd rather be "down for maintenance" on purpose than appear busted.

> 2. What do you mean by "WITH NO RESULTS"?

By "no results" I mean the space was NOT freed up; in fact the db consumed MORE space after the vacuum full than before.

> > Posts to the newsgroup advised that I crank up the max_fsm_pages. Right
> > now it is at roughly 65,000.
> >
> > select relname, relpages from pg_class where relkind in ('r', 't', 'i')
> > users               | 408711
> > merchant_sessions   | 236333
> > batch_load_awaiting | 173785
> > orders              |  92241
>
> If you have not been vacuuming regularly then these relpages figures
> cannot be trusted too much, but it looks to me like you might need
> max_fsm_pages nearer to 1 million than 64k. If it's not large enough
> to cover all (or at least nearly all) pages with free space, then you'll
> have space-leakage problems.
> What is the tuple update/deletion rate in these tables, anyway?

Users has a 0 deletion rate and a fairly low update rate, unless we do a "mass" update of the whole table... which happens a couple times a year (say sales/mktg want a new user email flag...).

Batch_Load_awaiting has hardly any deletions, but 100% of the rows are updated ONCE (maybe twice) after they are inserted, then never touched after that.

Orders has no deletions and a small update ratio.

Merchant_Sessions has NO deletions or updates.

There are tons more tables I didn't put in the list because they are either small by comparison or completely static.

> Also, you should probably think about updating to 7.3.1 sometime soon.

Yes, some bugs may be fixed in 7.3.1, but I fear the ones that may get me in bigger trouble than I am already :) I'm watching posts to this group. When I feel comfortable that users are not reporting bugs or problems against 7.3.1 (or whatever point release is stable), then I will definitely upgrade.

> There's a performance problem in the 7.2.* FSM code that shows up when
> a single table has more than ~10000 pages with useful amounts of free
> space --- VACUUM takes an unreasonable amount of time to record the free
> space.
>
> 			regards, tom lane
Haven't been following this list too closely over the holiday break; hopefully this can still be of some use to you.

On Mon, 2002-12-30 at 13:12, HT Levine wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
> >
> > 1. You don't need to take down the DB to do vacuuming.
>
> when I tried the vacuum with the site still up, the whole DB came to a
> stand-still... i.e. the pg_stat_activity table grew and grew and grew....
> users couldn't log in, and the site was "broken".... I tried this several
> times and tho this group says you don't need to take the db down, I found
> we might as well cause it was so unresponsive to our users that we appeared
> busted. I'd rather be "down for maintenance" on purpose than appear
> busted.

Are you doing vacuum full or vacuum analyze? I would think you'd not have that problem with a vacuum analyze, which, if you're having large amounts of unused space, you probably should be running more often.

<snip>
> > If you have not been vacuuming regularly then these relpages figures
> > cannot be trusted too much, but it looks to me like you might need
> > max_fsm_pages nearer to 1 million than 64k. If it's not large enough
> > to cover all (or at least nearly all) pages with free space, then you'll
> > have space-leakage problems. What is the tuple update/deletion rate in
> > these tables, anyway?
> Users has a 0 deletion rate, and a fairly low update rate, unless we do a
> "mass" update of the whole table.... which happens a couple times a year
> (say sales/mktg want a new user email flag...)
> Batch_Load_awaiting has hardly no deletions but 100% of the rows are updated
> ONCE (maybe twice) after they are inserted, then never touched after that.
> Orders has no deletions, and a small update ratio.
> Merchant_Sessions has NO deletions or updates.
> there are tons more tables I didn't put in the list cause they are either
> small by comparison or completely static.
Tables with no deletions or updates won't benefit from vacuuming, so there's no reason to vacuum them. On a table like Batch_Load_awaiting, you need to do a vacuum after the 100% are updated. If you plan to insert more into this table, just do an --analyze; if you don't plan to insert, then --full is more appropriate.

Keep an eye on those small tables if they have frequent turnover. Even a 1000-row table that gets updated every 5 minutes will generate more than 250,000 dead tuples a day if you're not regularly vacuuming.

I am also skeptical about your max_fsm_relations setting. This should be equal to at least the number of rows output by \d in psql, plus 100. Any less and I question if your vacuum is being as effective as it should be.

On a final note, you might want to try reindexing some of your tables to see if that helps. At least search the archives for "index bloat", "table out of control", "size grows and grows" or some such combination anyway; this might also be causing you some pain.

Hope this helps,

Robert Treat
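Robert's dead-tuple arithmetic is easy to verify (a trivial sketch; the 1000-row table updated every 5 minutes is his hypothetical example):

```python
# Dead tuples produced per day by a small table whose rows are all
# updated every few minutes (each UPDATE leaves one dead tuple behind).
rows = 1000
minutes_between_full_updates = 5

updates_per_day = (24 * 60) // minutes_between_full_updates  # 288 cycles/day
dead_tuples_per_day = rows * updates_per_day
print(dead_tuples_per_day)  # 288000 -- indeed "more than 250,000"
```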
See my answers below:

"Robert Treat" <xzilla@users.sourceforge.net> wrote in message news:1041547656.32015.38.camel@camel...
> Haven't been following this list too closely over the holiday break,
> hopefully this can still be of some use to you.
>
> On Mon, 2002-12-30 at 13:12, HT Levine wrote:
> > "Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
> > >
> > > 1. You don't need to take down the DB to do vacuuming.
> >
> > when I tried the vacuum with the site still up, the whole DB came to a
> > stand-still... i.e. the pg_stat_activity table grew and grew and grew....
> > users couldn't log in, and the site was "broken".... I tried this several
> > times and tho this group says you don't need to take the db down, I found
> > we might as well cause it was so unresponsive to our users that we appeared
> > busted. I'd rather be "down for maintenance" on purpose than appear
> > busted.
>
> Are you doing vacuum full or vacuum analyze? I would think you'd not
> have that problem with a vacuum analyze, which if your having large
> amounts of unused space you probably should be running more often.

I tried this with vacuum analyze verbose table-name, and the whole db went to sleep :( No worries; I took the site down and let it finish. We plan to take the site down this weekend and do a vacuum full (after changing the fsm properties, though).

> <snip>
> > > If you have not been vacuuming regularly then these relpages figures
> > > cannot be trusted too much, but it looks to me like you might need
> > > max_fsm_pages nearer to 1 million than 64k. If it's not large enough
> > > to cover all (or at least nearly all) pages with free space, then you'll
> > > have space-leakage problems. What is the tuple update/deletion rate in
> > > these tables, anyway?
> > Users has a 0 deletion rate, and a fairly low update rate, unless we do a
> > "mass" update of the whole table.... which happens a couple times a year
> > (say sales/mktg want a new user email flag...)
> > Batch_Load_awaiting has hardly no deletions but 100% of the rows are updated
> > ONCE (maybe twice) after they are inserted, then never touched after that.
> > Orders has no deletions, and a small update ratio.
> > Merchant_Sessions has NO deletions or updates.
> > there are tons more tables I didn't put in the list cause they are either
> > small by comparison or completely static.
>
> Tables with no deletions or updates won't benefit from vacuuming so
> there's no reason to vacuum them. On a table like Batch_Load_awaiting,
> you need to do a vacuum after the 100% are updated. If you plan to
> insert more into this table just do an --analyze, if you don't plan to
> insert, then --full is more appropriate. Keep an eye on those small
> tables if they have frequent turnover. Even a 1000 row table that gets
> updated every 5 minutes will generate more than 250,000 dead tuples a
> day if your not regularly vacuuming.

batch_load_awaiting (affectionately known as BLA) gets N inserts directly followed by N updates once a day. The previous day's updates/inserts are not touched (except once in a blue moon), so I assume this would mean a daily vacuum analyze on this guy?

users depends on how often users decide to update their account, and orders would be even smaller (customer service changing users' orders).

Got it on the small tables; I'll keep an eye out for those too -- thanks.

> I am also skeptical about your max_fsm_relations setting. This should be
> equal to at least the number of rows output by \d in psql plus 100. Any
> less and I question if your vacuum is being as effective as it should
> be.

Sorry to sound dumb here... but \d in psql gives me just 510 rows? I was going to crank it up to 1 million (originally it had been set at 64,000).

> On a final note, you might want to try reindexing some of your tables to
> see if that helps.
> At least search the archives for "index bloat",
> "table out of control", "size grows and grows" or some such combination
> anyway, this might also be causing you some pain.

I'll do this too, thanks.

> Hope this helps,
>
> Robert Treat
On Thu, 2003-01-02 at 19:15, HT Levine wrote:
> see my answers below:
> "Robert Treat" <xzilla@users.sourceforge.net> wrote in message
> news:1041547656.32015.38.camel@camel...
<snip>
> > Tables with no deletions or updates won't benefit from vacuuming so
> > there's no reason to vacuum them. On a table like Batch_Load_awaiting,
> > you need to do a vacuum after the 100% are updated. If you plan to
> > insert more into this table just do an --analyze, if you don't plan to
> > insert, then --full is more appropriate. Keep an eye on those small
> > tables if they have frequent turnover. Even a 1000 row table that gets
> > updated every 5 minutes will generate more than 250,000 dead tuples a
> > day if your not regularly vacuuming.
>
> batch_load_awaiting (affectionately known as BLA) gets N inserts directly
> followed by N updates once a day. The previous day's update/inserts are not
> touched (except in a blue moon) so I assume this would mean a daily vacuum
> analyze on this guy?

Yep, preferably right after you're done updating. If done regularly, you shouldn't have to vacuum full this table at all (any new rows can use the previous day's dead updated tuples).

> users depends on how often users decide to update their account, and orders
> would be even smaller (customer service changing user's orders)
>
> Got it on the small tables, I'll keep an eye for those too -- thanks.
>
> > I am also skeptical about your max_fsm_relations setting. This should be
> > equal to at least the number of rows output by \d in psql plus 100. Any
> > less and I question if your vacuum is being as effective as it should
> > be.
>
> sorry to sound dumb here ... but \d in psql gives me just 510 rows? I was
> going to crank it up to 1 million (originally it had been set at 64,000)

What you're cranking up to 1 million is max_fsm_pages; what I'm talking about is max_fsm_relations.
_pages determines the total number of pages of free space the database is willing to track; _relations determines the total number of "objects" the database is willing to pay attention to. The default is set to 100, which means if you have more than 100 tables/indices in your database, vacuum might ignore some tables that are being updated. There's no rule on how it picks which tables it pays attention to (my guess is that it's a first-come, first-served thing, but that's just a guess), but given that there are around 90 system "objects", it's not hard to imagine that some things get left behind. You probably need this set to at least 610, though fwiw the default on this was recently bumped up to 1000 for future versions.

Robert Treat
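The rule of thumb being applied here can be spelled out (a trivial sketch; 510 is the \d count reported earlier in the thread, and the +100 headroom is Robert's rule of thumb, not a hard requirement):

```python
# max_fsm_relations rule of thumb: user relations reported by \d in psql,
# plus headroom for system tables and toast tables.
relations_from_backslash_d = 510   # count reported in this thread
headroom = 100                     # rule-of-thumb allowance for system objects

needed_max_fsm_relations = relations_from_backslash_d + headroom
print(needed_max_fsm_relations)  # 610 -- "at least 610"
```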
On Fri, 2003-01-03 at 09:44, Robert Treat wrote:
> relations determines
> the total number of "objects" the database is willing to pay attention
> to. The default is set to 100, which means if you have more than 100
> tables/indicies in your database, vacuum might ignore some tables that
> are being updated. Theres no rule on how it picks which tables it pays
> attention to (my guess is that its a first come first served thing, but
> thats just a guess) but given that there are around 90 system "objects"
> it's not hard to imagine that some things get left behind. You probably
> need this set to at least 610, though fwiw the default on this was
> recently bumped up to 1000 for future versions.

Just to correct myself: the entries used in max_fsm_relations are tables and toast tables (not indexes), of which there are 37 system tables taking up FSM entries. You can get a more accurate count of your need by doing

select count(*) from pg_class where relkind in ('r','t');

in each database in your cluster.

Robert Treat