Thread: Cause of moving-target FSM space-needed reports

Cause of moving-target FSM space-needed reports

From
Tom Lane
Date:
We've seen reports occasionally of how the system reports that
max_fsm_pages needs to be increased to at least X, and then when the DBA
does so, it complains that some larger amount is needed --- one recent
report is here:
http://archives.postgresql.org/pgsql-admin/2006-06/msg00176.php

I finally got around to looking at this, and I see what is going on.
The reported number is based on the aggregate of "request" page counts
passed by VACUUM to the FSM module.  However, vacuumlazy.c is
prefiltering its requests to at most MaxFSMPages, because it knows that
no more than that many pages will be accepted anyway.  So if you have a
single table containing more than max_fsm_pages pages with interesting
amounts of free space, the reported total is being artificially
constrained, and then when you relax the constraint, the reported total
jumps up.  In the case Jeff describes where it kept saying max_fsm_pages
plus 2608 were needed, I suppose he had boatloads of free space in one
table (probably pg_largeobject) and exactly 2608 interesting pages in
all other tables.

I think it's reasonable for vacuumlazy.c to track at most MaxFSMPages
pages as it's doing now --- but it should keep a separate count of the
total number of pages with at least threshold amount of free space, and
pass that as a separate argument to RecordRelationFreeSpace.  This will
not take any more space in shared memory than we already use, but it
will allow us to report a truthful value for "number of pages needed",
which we clearly are failing to do now.

It might also be a good idea if vacuum verbose reported this page count,
since when you've got a single table bloated like this, VACUUM FULL or
CLUSTER might be a more appropriate solution than increasing the FSM
size --- but there's no way to know which rel is the problem from the
FSM total.  In fact, maybe vacuum should just throw a WARNING when it
finds a single rel with more than MaxFSMPages pages with useful free space?

Comments?  I'd like to put in a fix for beta1, which means today ...
        regards, tom lane


Re: Cause of moving-target FSM space-needed reports

From
Markus Schaber
Date:
Hi, Tom,

Tom Lane wrote:

> I think it's reasonable for vacuumlazy.c to track at most MaxFSMPages
> pages as it's doing now --- but it should keep a separate count of the
> total number of pages with at least threshold amount of free space, and
> pass that as a separate argument to RecordRelationFreeSpace.  This will
> not take any more space in shared memory than we already use, but it
> will allow us to report a truthful value for "number of pages needed",
> which we clearly are failing to do now.
>
> It might also be a good idea if vacuum verbose reported this page count,
> since when you've got a single table bloated like this, VACUUM FULL or
> CLUSTER might be a more appropriate solution than increasing the FSM
> size --- but there's no way to know which rel is the problem from the
> FSM total.  In fact, maybe vacuum should just throw a WARNING when it
> finds a single rel with more than MaxFSMPages pages with useful free space?

+1 for both from my side, it has bitten me and our admins several times now.

Thanks,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


Re: Cause of moving-target FSM space-needed reports

From
Andrew Dunstan
Date:
Tom Lane wrote:
> I think it's reasonable for vacuumlazy.c to track at most MaxFSMPages
> pages as it's doing now --- but it should keep a separate count of the
> total number of pages with at least threshold amount of free space, and
> pass that as a separate argument to RecordRelationFreeSpace.  This will
> not take any more space in shared memory than we already use, but it
> will allow us to report a truthful value for "number of pages needed",
> which we clearly are failing to do now.
>
> It might also be a good idea if vacuum verbose reported this page count,
> since when you've got a single table bloated like this, VACUUM FULL or
> CLUSTER might be a more appropriate solution than increasing the FSM
> size --- but there's no way to know which rel is the problem from the
> FSM total.  In fact, maybe vacuum should just throw a WARNING when it
> finds a single rel with more than MaxFSMPages pages with useful free space?
>
> Comments?  I'd like to put in a fix for beta1, which means today ...
>   


Sounds reasonable - it's arguably a bug, albeit relatively benign. I 
guess it might be less likely in 8.2 anyway given that we will have more 
generous default max_fsm_pages settings in most cases.

cheers

andrew



Re: Cause of moving-target FSM space-needed reports

From
Kevin Brown
Date:
Tom Lane wrote:
> In fact, maybe vacuum should just throw a WARNING when it
> finds a single rel with more than MaxFSMPages pages with useful free
> space?

I fully agree with this.  This (in particular, how many useful free
pages a rel has) is information a good DBA can make very good use of,
and is needed in the case where it exceeds MaxFSMPages.

I would also suggest having vacuum verbose print an INFO message with
the rel's number of free pages for rels that don't exceed MaxFSMPages
(that number should be printed in the WARNING when the number exceeds
MaxFSMPages).

Are there other ways of getting this information from the system?  If
not, then I'd consider this proposed change important.


-- 
Kevin Brown                          kevin@sysexperts.com


Re: Cause of moving-target FSM space-needed reports

From
Jeff Frost
Date:
On Thu, 21 Sep 2006, Tom Lane wrote:

> We've seen reports occasionally of how the system reports that
> max_fsm_pages needs to be increased to at least X, and then when the DBA
> does so, it complains that some larger amount is needed --- one recent
> report is here:
> http://archives.postgresql.org/pgsql-admin/2006-06/msg00176.php
>
> I finally got around to looking at this, and I see what is going on.
> The reported number is based on the aggregate of "request" page counts
> passed by VACUUM to the FSM module.  However, vacuumlazy.c is
> prefiltering its requests to at most MaxFSMPages, because it knows that
> no more than that many pages will be accepted anyway.  So if you have a
> single table containing more than max_fsm_pages pages with interesting
> amounts of free space, the reported total is being artificially
> constrained, and then when you relax the constraint, the reported total
> jumps up.  In the case Jeff describes where it kept saying max_fsm_pages
> plus 2608 were needed, I suppose he had boatloads of free space in one
> table (probably pg_largeobject) and exactly 2608 interesting pages in
> all other tables.

It was indeed pg_largeobject that caused all the grief.  I have since put 
these settings in pg_autovacuum:

vsl_cs=# select * from pg_autovacuum ; vacrelid | enabled | vac_base_thresh | vac_scale_factor | anl_base_thresh |
anl_scale_factor| vac_cost_delay | vac_cost_limit
 

----------+---------+-----------------+------------------+-----------------+------------------+----------------+----------------
   2613 | t       |             150 |              0.1 |              75 |             0.05 |             -1 |
  -1
 
(1 row)

And it seems much happier these days:

INFO:  free space map contains 299025 pages in 144 relations
DETAIL:  A total of 296160 page slots are in use (including overhead).
296160 page slots are required to track all free space.
Current limits are:  5366499 page slots, 2000 relations, using 31572 KB.

Before the more aggressive autovacuum settings, we would have the problem crop 
up again when a researcher would delete several large objects at once. 
Apparently, each large object is around 80MB in size, so I suspect it's not 
difficult to overwhelm autovacuum if you remove quite a few of these at one 
time.

-- 
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954