Thread: max_fsm_pages Sanity Check
Background: We have a quite large production Postgres 7.2 DB which is out of control in terms of disk consumption. We made it through the holiday shopping season, but it isn't over yet. We have taken the DB down once for a vacuum analyze, but only vacuum'd 2 large tables, which took FIVE HOURS WITH NO RESULTS. Posts to the newsgroup advised that I crank up max_fsm_pages. Right now it is at roughly 65,000. So I went to the postgres.org site and searched the newsgroups...

Where I am now: I've searched the newsgroup archives for info on the max_fsm_pages setting, and other than what the max_fsm_pages value means, I found no definitive answer as to exactly "how" to best determine the optimal setting. Nor could I find adequate documentation on it. I saw a lot of people asking others to "report back with your findings", but I don't find the results or follow-up in the newsgroup (please point me to an item if I'm mistaken). Seeing as I'm not entirely into playing guinea pig with my production system, I welcome expert, been-there, non-theoretical advice.

Here are some numbers, followed by a few questions:

I. Looking at my production DB with the following query:

select relname, relpages from pg_class where relkind in ('r', 't', 'i')

             relname             | relpages
---------------------------------+----------
 users                           |   408711
 merchant_sessions               |   236333
 batch_load_awaiting             |   173785
 orders                          |    92241

II. But here is the same from my standby DB (restored from a pg_dump of production)... hmmm, not exactly what I would expect:

             relname             | relpages
---------------------------------+----------
 merchant_sessions               |   615588
 users                           |   202696
 batch_load_awaiting             |   143735
 orders                          |   130894

Question: Now, why wouldn't a pg_restore into my standby DB have smaller relpages counts than the live one, which is bloated and consuming tons of disk space?

III.
The results of a vacuum on the users table (production, a couple weekends ago) yielded the below (not including all the index output):

2002-12-15 03:22:18 [22450] NOTICE:  Removed 3254600 tuples in 295053 pages.
        CPU 111.50s/124.03u sec elapsed 2721.98 sec.
2002-12-15 03:22:18 [22450] NOTICE:  Pages 408711: Changed 152946, Empty 0; Tup 4126716: Vac 3254600, Keep 0, UnUsed 28559.
        Total CPU 338.16s/1091.28u sec elapsed 8502.90 sec.

Question: So should I hike my fsm up to 1,000,000 pages? Is that too high a value, or will it be OK? If it is too big, then how big is big?

I will be most happy to summarize my results back to the newsgroup when I make this change and do a full vacuum. We cannot do a full vacuum without taking the site down, which will have to wait till this weekend or next.

Thanks in advance
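For what it's worth, the figures already posted in this thread can be sanity-checked with a little arithmetic (a minimal sketch; it only restates numbers quoted above, and summing relpages is a rough upper bound for FSM sizing, not an exact rule):

```python
# Rough FSM sizing from the figures posted in this thread.

# relpages of the four big production tables (first query above):
relpages = {
    "users": 408711,
    "merchant_sessions": 236333,
    "batch_load_awaiting": 173785,
    "orders": 92241,
}
total_pages = sum(relpages.values())
print(total_pages)  # 911070 -- hence the advice "nearer to 1 million than 64k"

# The users VACUUM alone removed tuples from 295053 pages, so a
# 65,000-page FSM cannot even track that one table's free space:
pages_with_free_space_users = 295053
current_max_fsm_pages = 65000
print(pages_with_free_space_users > current_max_fsm_pages)  # True
```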
"HT" <htlevine@ebates.com> writes:
> We have quite large production Postgres 7.2 DB which is out of control in
> terms of disk consumption. We made it thru the holiday shopping season,
> but it isn't over yet. We have taken the DB down once for a vacuum analyze
> but only vacuum'd 2 large tables which took FIVE HOURS WITH NO
> RESULTS.

1. You don't need to take down the DB to do vacuuming.

2. What do you mean by "WITH NO RESULTS"?

> Posts to the newsgroup advised that I crank up the max_fsm_pages. Right
> now it is at roughly 65,000.
>
> select relname, relpages from pg_class where relkind in ('r', 't', 'i')
> users               | 408711
> merchant_sessions   | 236333
> batch_load_awaiting | 173785
> orders              |  92241

If you have not been vacuuming regularly then these relpages figures cannot be trusted too much, but it looks to me like you might need max_fsm_pages nearer to 1 million than 64k. If it's not large enough to cover all (or at least nearly all) pages with free space, then you'll have space-leakage problems. What is the tuple update/deletion rate in these tables, anyway?

Also, you should probably think about updating to 7.3.1 sometime soon. There's a performance problem in the 7.2.* FSM code that shows up when a single table has more than ~10000 pages with useful amounts of free space --- VACUUM takes an unreasonable amount of time to record the free space.

			regards, tom lane
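In postgresql.conf terms, the advice in this thread amounts to something like the following (illustrative values for this poster's database, not general defaults; max_fsm_relations is discussed later in the thread, and both settings take effect only at postmaster start):

```
# postgresql.conf (7.2.x) -- free space map sizing, illustrative values
max_fsm_pages = 1000000      # enough to cover (nearly) all pages with free space
max_fsm_relations = 1000     # comfortably above the count of tables + toast tables
```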
Thanks for the response. See my responses below. I'll crank it up to 1 million fsm pages and report back with the results when we finish... I know they aren't as interesting with 7.2.3 as they would be with 7.3, but it may help someone else.

"Tom Lane" <tgl@sss.pgh.pa.us> wrote in message news:10105.1041182337@sss.pgh.pa.us...
> "HT" <htlevine@ebates.com> writes:
> > We have quite large production Postgres 7.2 DB which is out of control in
> > terms of disk consumption. We made it thru the holiday shopping season,
> > but it isn't over yet. We have taken the DB down once for a vacuum analyze
> > but only vacuum'd 2 large tables which took FIVE HOURS WITH NO
> > RESULTS.
>
> 1. You don't need to take down the DB to do vacuuming.

When I tried the vacuum with the site still up, the whole DB came to a stand-still... i.e. the pg_stat_activity table grew and grew and grew... users couldn't log in, and the site was "broken". I tried this several times, and though this group says you don't need to take the db down, I found we might as well, because it was so unresponsive to our users that we appeared busted. I'd rather be "down for maintenance" on purpose than appear busted.

> 2. What do you mean by "WITH NO RESULTS"?

By "no results" I mean the space was NOT freed up; in fact the db consumed MORE space after the vacuum full than before.

> > Posts to the newsgroup advised that I crank up the max_fsm_pages. Right
> > now it is at roughly 65,000.
> >
> > select relname, relpages from pg_class where relkind in ('r', 't', 'i')
> > users               | 408711
> > merchant_sessions   | 236333
> > batch_load_awaiting | 173785
> > orders              |  92241
>
> If you have not been vacuuming regularly then these relpages figures
> cannot be trusted too much, but it looks to me like you might need
> max_fsm_pages nearer to 1 million than 64k. If it's not large enough
> to cover all (or at least nearly all) pages with free space, then you'll
> have space-leakage problems.
> What is the tuple update/deletion rate in these tables, anyway?

Users has a 0 deletion rate and a fairly low update rate, unless we do a "mass" update of the whole table... which happens a couple times a year (say sales/mktg want a new user email flag...).

Batch_Load_awaiting has hardly any deletions, but 100% of the rows are updated ONCE (maybe twice) after they are inserted, then never touched after that.

Orders has no deletions and a small update ratio.

Merchant_Sessions has NO deletions or updates.

There are tons more tables I didn't put in the list because they are either small by comparison or completely static.

> Also, you should probably think about updating to 7.3.1 sometime soon.

Yes, some bugs may be fixed in 7.3.1, but I fear the ones that may get me in bigger trouble than I am already :) I'm watching posts to this group. When I feel comfortable that users are not reporting bugs or problems against 7.3.1 (or whatever point release is stable), then I will definitely upgrade.

> There's a performance problem in the 7.2.* FSM code that shows up when
> a single table has more than ~10000 pages with useful amounts of free
> space --- VACUUM takes an unreasonable amount of time to record the free
> space.
>
> 			regards, tom lane
Haven't been following this list too closely over the holiday break; hopefully this can still be of some use to you.

On Mon, 2002-12-30 at 13:12, HT Levine wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
> >
> > 1. You don't need to take down the DB to do vacuuming.
>
> when I tried the vacuum with the site still up, the whole DB came to a
> stand-still... i.e. the pg_stat_activity table grew and grew and grew....
> users couldn't log in, and the site was "broken".... I tried this several
> times and tho this group says you don't need to take the db down, I found
> we might as well cause it was so unresponsive to our users that we appeared
> busted. I'd rather be "down for maintenance" on purpose than appear
> busted.

Are you doing vacuum full or vacuum analyze? I would think you'd not have that problem with a vacuum analyze, which, if you're having large amounts of unused space, you probably should be running more often.

<snip>
> > If you have not been vacuuming regularly then these relpages figures
> > cannot be trusted too much, but it looks to me like you might need
> > max_fsm_pages nearer to 1 million than 64k. If it's not large enough
> > to cover all (or at least nearly all) pages with free space, then you'll
> > have space-leakage problems. What is the tuple update/deletion rate in
> > these tables, anyway?
> Users has a 0 deletion rate, and a fairly low update rate, unless we do a
> "mass" update of the whole table.... which happens a couple times a year
> (say sales/mktg want a new user email flag...)
> Batch_Load_awaiting has hardly no deletions but 100% of the rows are updated
> ONCE (maybe twice) after they are inserted, then never touched after that.
> Orders has no deletions, and a small update ratio.
> Merchant_Sessions has NO deletions or updates.
> there are tons more tables I didn't put in the list cause they are either
> small by comparison or completely static.
Tables with no deletions or updates won't benefit from vacuuming, so there's no reason to vacuum them. On a table like Batch_Load_awaiting, you need to do a vacuum after the 100% are updated. If you plan to insert more into this table, just do an --analyze; if you don't plan to insert, then --full is more appropriate.

Keep an eye on those small tables if they have frequent turnover. Even a 1000-row table that gets updated every 5 minutes will generate more than 250,000 dead tuples a day if you're not regularly vacuuming.

I am also skeptical about your max_fsm_relations setting. This should be equal to at least the number of rows output by \d in psql, plus 100. Any less and I question if your vacuum is being as effective as it should be.

On a final note, you might want to try reindexing some of your tables to see if that helps. At least search the archives for "index bloat", "table out of control", "size grows and grows" or some such combination anyway; this might also be causing you some pain.

Hope this helps,

Robert Treat
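Robert's dead-tuple arithmetic is easy to verify (a trivial sketch; the 1000-row table updated every 5 minutes is his hypothetical example):

```python
# Dead tuples produced per day by a small table whose rows are all
# updated every few minutes (each UPDATE leaves one dead tuple behind).
rows = 1000
minutes_between_full_updates = 5

updates_per_day = (24 * 60) // minutes_between_full_updates  # 288 cycles/day
dead_tuples_per_day = rows * updates_per_day
print(dead_tuples_per_day)  # 288000 -- indeed "more than 250,000"
```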
See my answers below:

"Robert Treat" <xzilla@users.sourceforge.net> wrote in message news:1041547656.32015.38.camel@camel...
> Haven't been following this list too closely over the holiday break,
> hopefully this can still be of some use to you.
>
> On Mon, 2002-12-30 at 13:12, HT Levine wrote:
> > "Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
> > >
> > > 1. You don't need to take down the DB to do vacuuming.
> >
> > when I tried the vacuum with the site still up, the whole DB came to a
> > stand-still... i.e. the pg_stat_activity table grew and grew and grew....
> > users couldn't log in, and the site was "broken".... I tried this several
> > times and tho this group says you don't need to take the db down, I found
> > we might as well cause it was so unresponsive to our users that we appeared
> > busted. I'd rather be "down for maintenance" on purpose than appear
> > busted.
>
> Are you doing vacuum full or vacuum analyze? I would think you'd not
> have that problem with a vacuum analyze, which if your having large
> amounts of unused space you probably should be running more often.

I tried this with vacuum analyze verbose table-name, and the whole db went to sleep :( No worries; I took the site down and let it finish. We plan to take the site down this weekend and do a vacuum full (after changing the fsm properties, though).

> <snip>
> > > If you have not been vacuuming regularly then these relpages figures
> > > cannot be trusted too much, but it looks to me like you might need
> > > max_fsm_pages nearer to 1 million than 64k. If it's not large enough
> > > to cover all (or at least nearly all) pages with free space, then you'll
> > > have space-leakage problems. What is the tuple update/deletion rate in
> > > these tables, anyway?
> > Users has a 0 deletion rate, and a fairly low update rate, unless we do a
> > "mass" update of the whole table.... which happens a couple times a year
> > (say sales/mktg want a new user email flag...)
> > Batch_Load_awaiting has hardly no deletions but 100% of the rows are updated
> > ONCE (maybe twice) after they are inserted, then never touched after that.
> > Orders has no deletions, and a small update ratio.
> > Merchant_Sessions has NO deletions or updates.
> > there are tons more tables I didn't put in the list cause they are either
> > small by comparison or completely static.
>
> Tables with no deletions or updates won't benefit from vacuuming so
> there's no reason to vacuum them. On a table like Batch_Load_awaiting,
> you need to do a vacuum after the 100% are updated. If you plan to
> insert more into this table just do an --analyze, if you don't plan to
> insert, then --full is more appropriate. Keep an eye on those small
> tables if they have frequent turnover. Even a 1000 row table that gets
> updated every 5 minutes will generate more than 250,000 dead tuples a
> day if your not regularly vacuuming.

batch_load_awaiting (affectionately known as BLA) gets N inserts directly followed by N updates once a day. The previous day's updates/inserts are not touched (except once in a blue moon), so I assume this would mean a daily vacuum analyze on this guy?

users depends on how often users decide to update their account, and orders would be even smaller (customer service changing users' orders).

Got it on the small tables; I'll keep an eye out for those too -- thanks.

> I am also skeptical about your max_fsm_relations setting. This should be
> equal to at least the number of rows output by \d in psql plus 100. Any
> less and I question if your vacuum is being as effective as it should
> be.

Sorry to sound dumb here... but \d in psql gives me just 510 rows? I was going to crank it up to 1 million (originally it had been set at 64,000).

> On a final note, you might want to try reindexing some of your tables to
> see if that helps.
> At least search the archives for "index bloat",
> "table out of control", "size grows and grows" or some such combination
> anyway, this might also be causing you some pain.

I'll do this too, thanks.

> Hope this helps,
>
> Robert Treat
On Thu, 2003-01-02 at 19:15, HT Levine wrote:
> see my answers below:
> "Robert Treat" <xzilla@users.sourceforge.net> wrote in message
> news:1041547656.32015.38.camel@camel...
<snip>
> > Tables with no deletions or updates won't benefit from vacuuming so
> > there's no reason to vacuum them. On a table like Batch_Load_awaiting,
> > you need to do a vacuum after the 100% are updated. If you plan to
> > insert more into this table just do an --analyze, if you don't plan to
> > insert, then --full is more appropriate. Keep an eye on those small
> > tables if they have frequent turnover. Even a 1000 row table that gets
> > updated every 5 minutes will generate more than 250,000 dead tuples a
> > day if your not regularly vacuuming.
>
> batch_load_awaiting (affectionately known as BLA) gets N inserts directly
> followed by N updates once a day. The previous day's update/inserts are not
> touched (except in a blue moon) so I assume this would mean a daily vacuum
> analyze on this guy?

Yep, preferably right after you're done updating. If done regularly, you shouldn't have to vacuum full this table at all (any new rows can use the previous day's dead updated tuples).

> users depends on how often users decide to update their account, and orders
> would be even smaller (customer service changing user's orders)
>
> Got it on the small tables, I'll keep an eye for those too -- thanks.
>
> > I am also skeptical about your max_fsm_relations setting. This should be
> > equal to at least the number of rows output by \d in psql plus 100. Any
> > less and I question if your vacuum is being as effective as it should
> > be.
>
> sorry to sound dumb here ... but \d in psql gives me just 510 rows? I was
> going to crank it up to 1 million (originally it had been set at 64,000)

What you're cranking up to 1 million is max_fsm_pages; what I'm talking about is max_fsm_relations.
_pages determines the total number of pages of free space the database is willing to track; _relations determines the total number of "objects" the database is willing to pay attention to. The default is set to 100, which means if you have more than 100 tables/indices in your database, vacuum might ignore some tables that are being updated. There's no rule on how it picks which tables it pays attention to (my guess is that it's a first-come, first-served thing, but that's just a guess), but given that there are around 90 system "objects", it's not hard to imagine that some things get left behind. You probably need this set to at least 610, though fwiw the default on this was recently bumped up to 1000 for future versions.

Robert Treat
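The rule of thumb being applied here can be spelled out (a trivial sketch; 510 is the \d count reported earlier in the thread, and the +100 headroom is Robert's rule of thumb, not a hard requirement):

```python
# max_fsm_relations rule of thumb: user relations reported by \d in psql,
# plus headroom for system tables and toast tables.
relations_from_backslash_d = 510   # count reported in this thread
headroom = 100                     # rule-of-thumb allowance for system objects

needed_max_fsm_relations = relations_from_backslash_d + headroom
print(needed_max_fsm_relations)  # 610 -- "at least 610"
```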
On Fri, 2003-01-03 at 09:44, Robert Treat wrote:
> relations determines
> the total number of "objects" the database is willing to pay attention
> to. The default is set to 100, which means if you have more than 100
> tables/indicies in your database, vacuum might ignore some tables that
> are being updated. Theres no rule on how it picks which tables it pays
> attention to (my guess is that its a first come first served thing, but
> thats just a guess) but given that there are around 90 system "objects"
> it's not hard to imagine that some things get left behind. You probably
> need this set to at least 610, though fwiw the default on this was
> recently bumped up to 1000 for future versions.

Just to correct myself: the entries used in max_fsm_relations are tables and toast tables (not indexes), of which there are 37 system tables taking up FSM entries. You can get a more accurate count of your need by doing

select count(*) from pg_class where relkind in ('r','t');

in each database in your cluster.

Robert Treat