Thread: Cause of moving-target FSM space-needed reports
We've seen reports occasionally of how the system reports that max_fsm_pages
needs to be increased to at least X, and then when the DBA does so, it
complains that some larger amount is needed --- one recent report is here:
http://archives.postgresql.org/pgsql-admin/2006-06/msg00176.php

I finally got around to looking at this, and I see what is going on. The
reported number is based on the aggregate of "request" page counts passed by
VACUUM to the FSM module. However, vacuumlazy.c is prefiltering its requests
to at most MaxFSMPages, because it knows that no more than that many pages
will be accepted anyway. So if you have a single table containing more than
max_fsm_pages pages with interesting amounts of free space, the reported
total is being artificially constrained, and then when you relax the
constraint, the reported total jumps up. In the case Jeff describes, where it
kept saying max_fsm_pages plus 2608 were needed, I suppose he had boatloads
of free space in one table (probably pg_largeobject) and exactly 2608
interesting pages in all other tables.

I think it's reasonable for vacuumlazy.c to track at most MaxFSMPages pages
as it's doing now --- but it should keep a separate count of the total number
of pages with at least threshold amount of free space, and pass that as a
separate argument to RecordRelationFreeSpace. This will not take any more
space in shared memory than we already use, but it will allow us to report a
truthful value for "number of pages needed", which we clearly are failing to
do now.

It might also be a good idea if vacuum verbose reported this page count,
since when you've got a single table bloated like this, VACUUM FULL or
CLUSTER might be a more appropriate solution than increasing the FSM size ---
but there's no way to know which rel is the problem from the FSM total. In
fact, maybe vacuum should just throw a WARNING when it finds a single rel
with more than MaxFSMPages pages with useful free space?

Comments? I'd like to put in a fix for beta1, which means today ...

			regards, tom lane
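A minimal sketch of the change being proposed for vacuumlazy.c, assuming a
new per-relation counter (here called tot_free_pages) and an extra argument
to RecordRelationFreeSpace; the names are illustrative assumptions rather
than the actual committed code, and the real free-space recording logic also
prefers to keep the pages with the most free space instead of simply the
first MaxFSMPages it sees:

    /* Sketch only: assumed field names inside vacuumlazy.c's LVRelStats. */
    static void
    lazy_record_free_space(LVRelStats *vacrelstats,
                           BlockNumber page, Size avail)
    {
        /* Ignore pages with less than the per-relation threshold of space. */
        if (avail < vacrelstats->threshold)
            return;

        /* Count every interesting page, even the ones we can't remember. */
        vacrelstats->tot_free_pages++;

        /* Remember details for at most MaxFSMPages pages, as before. */
        if (vacrelstats->num_free_pages < vacrelstats->max_free_pages)
        {
            int         n = vacrelstats->num_free_pages++;

            vacrelstats->free_pages[n].blkno = page;
            vacrelstats->free_pages[n].avail = avail;
        }
        /* else: dropped from the detail list, but still counted above */
    }

    static void
    lazy_update_fsm(Relation onerel, LVRelStats *vacrelstats)
    {
        /*
         * Pass both the capped detail list and the true page count, so the
         * FSM can report an honest "page slots needed" total.
         */
        RecordRelationFreeSpace(&onerel->rd_node,
                                vacrelstats->tot_free_pages,  /* new argument */
                                vacrelstats->num_free_pages,
                                vacrelstats->free_pages);
    }

Because only the count travels in the extra argument, shared memory still
holds no more than MaxFSMPages entries per relation; the truthful total is
used solely for the "page slots needed" accounting and reporting.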
Hi, Tom,

Tom Lane wrote:
> I think it's reasonable for vacuumlazy.c to track at most MaxFSMPages
> pages as it's doing now --- but it should keep a separate count of the
> total number of pages with at least threshold amount of free space, and
> pass that as a separate argument to RecordRelationFreeSpace. This will
> not take any more space in shared memory than we already use, but it
> will allow us to report a truthful value for "number of pages needed",
> which we clearly are failing to do now.
>
> It might also be a good idea if vacuum verbose reported this page count,
> since when you've got a single table bloated like this, VACUUM FULL or
> CLUSTER might be a more appropriate solution than increasing the FSM
> size --- but there's no way to know which rel is the problem from the
> FSM total. In fact, maybe vacuum should just throw a WARNING when it
> finds a single rel with more than MaxFSMPages pages with useful free space?

+1 for both from my side; this has bitten me and our admins several times now.

Thanks,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
Tom Lane wrote:
> I think it's reasonable for vacuumlazy.c to track at most MaxFSMPages
> pages as it's doing now --- but it should keep a separate count of the
> total number of pages with at least threshold amount of free space, and
> pass that as a separate argument to RecordRelationFreeSpace. This will
> not take any more space in shared memory than we already use, but it
> will allow us to report a truthful value for "number of pages needed",
> which we clearly are failing to do now.
>
> It might also be a good idea if vacuum verbose reported this page count,
> since when you've got a single table bloated like this, VACUUM FULL or
> CLUSTER might be a more appropriate solution than increasing the FSM
> size --- but there's no way to know which rel is the problem from the
> FSM total. In fact, maybe vacuum should just throw a WARNING when it
> finds a single rel with more than MaxFSMPages pages with useful free space?
>
> Comments? I'd like to put in a fix for beta1, which means today ...

Sounds reasonable - it's arguably a bug, albeit relatively benign. I guess it
might be less likely in 8.2 anyway, given that we will have more generous
default max_fsm_pages settings in most cases.

cheers

andrew
Tom Lane wrote:
> In fact, maybe vacuum should just throw a WARNING when it
> finds a single rel with more than MaxFSMPages pages with useful free
> space?

I fully agree with this. This (in particular, how many useful free pages a
rel has) is information a good DBA can make very good use of, and is needed
in the case where it exceeds MaxFSMPages.

I would also suggest having vacuum verbose print an INFO message with the
rel's number of free pages for rels that don't exceed MaxFSMPages (that
number should be printed in the WARNING when the number exceeds MaxFSMPages).

Are there other ways of getting this information from the system? If not,
then I'd consider this proposed change important.

--
Kevin Brown                          kevin@sysexperts.com
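Something along these lines inside vacuumlazy.c could produce both messages;
this is only a sketch under the assumption of the tot_free_pages counter from
the earlier snippet, and the exact wording and placement are not taken from
the actual source:

    /* Sketch: report the relation's count of pages with useful free space,
     * and warn when this one relation alone needs more FSM slots than the
     * server can track. */
    ereport(elevel,
            (errmsg("\"%s\": %u pages contain useful free space",
                    RelationGetRelationName(onerel),
                    vacrelstats->tot_free_pages)));

    if (vacrelstats->tot_free_pages > MaxFSMPages)
        ereport(WARNING,
                (errmsg("relation \"%s\" contains more than \"max_fsm_pages\" pages with useful free space",
                        RelationGetRelationName(onerel)),
                 errhint("Consider compacting this relation or increasing the configuration parameter \"max_fsm_pages\".")));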
On Thu, 21 Sep 2006, Tom Lane wrote:

> We've seen reports occasionally of how the system reports that
> max_fsm_pages needs to be increased to at least X, and then when the DBA
> does so, it complains that some larger amount is needed --- one recent
> report is here:
> http://archives.postgresql.org/pgsql-admin/2006-06/msg00176.php
>
> I finally got around to looking at this, and I see what is going on.
> The reported number is based on the aggregate of "request" page counts
> passed by VACUUM to the FSM module. However, vacuumlazy.c is
> prefiltering its requests to at most MaxFSMPages, because it knows that
> no more than that many pages will be accepted anyway. So if you have a
> single table containing more than max_fsm_pages pages with interesting
> amounts of free space, the reported total is being artificially
> constrained, and then when you relax the constraint, the reported total
> jumps up. In the case Jeff describes where it kept saying max_fsm_pages
> plus 2608 were needed, I suppose he had boatloads of free space in one
> table (probably pg_largeobject) and exactly 2608 interesting pages in
> all other tables.

It was indeed pg_largeobject that caused all the grief. I have since put
these settings in pg_autovacuum:

vsl_cs=# select * from pg_autovacuum ;
 vacrelid | enabled | vac_base_thresh | vac_scale_factor | anl_base_thresh | anl_scale_factor | vac_cost_delay | vac_cost_limit
----------+---------+-----------------+------------------+-----------------+------------------+----------------+----------------
     2613 | t       |             150 |              0.1 |              75 |             0.05 |             -1 |             -1
(1 row)

And it seems much happier these days:

INFO:  free space map contains 299025 pages in 144 relations
DETAIL:  A total of 296160 page slots are in use (including overhead).
296160 page slots are required to track all free space.
Current limits are:  5366499 page slots, 2000 relations, using 31572 KB.

Before the more aggressive autovacuum settings, the problem would crop up
again whenever a researcher deleted several large objects at once.
Apparently, each large object is around 80MB in size, so I suspect it's not
difficult to overwhelm autovacuum if you remove quite a few of these at one
time.

--
Jeff Frost, Owner       <jeff@frostconsultingllc.com>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908     FAX: 650-649-1954