On Fri, Mar 17, 2017 at 9:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> With some intensive crash-recovery testing, I've run into a situation where
> I get some bad table bloat. There will be large swaths of the table which
> are empty (all results from heap_page_items other than lp are either zero or
> NULL), but have zero available space in the fsm, and are marked as
> all-visible and all-frozen in the vm.
>
> I guess it is a result of a crash causing updates to the fsm to be lost.
> Then due to the (crash-recovered) visibility map showing them as all visible
> and all frozen, vacuum never touches the pages again, so the fsm never gets
> corrected.

I guess that this happens only if heap_xlog_clean applies an FPI, right? Updates to the FSM can be lost, but the FSM is updated when a HEAP2_CLEAN record is replayed during crash recovery.
Isn't HEAP2_CLEAN only issued before an intended HOT update? (Which then can't leave the block as all visible or all frozen.) I think the issue here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE. Am I reading this correctly, that neither of those ever updates the FSM, regardless of FPI?
I don't know how to test which record is most responsible. I could turn off FPW globally and see what happens, with some tweaking to my testing harness.
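
For counting affected pages after each crash-recovery cycle, the symptom described at the top of the thread (empty page, zero free space in the FSM, all-visible and all-frozen in the VM) can be written as a simple predicate. What follows is only a rough sketch in Python, not part of any existing harness. The `PageObservation` record and its field names are hypothetical; it assumes the per-page values have already been collected somehow, for example from heap_page_items plus the pg_freespacemap and pg_visibility extensions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PageObservation:
    """One heap page as seen after crash recovery (hypothetical record)."""
    blkno: int            # block number within the relation
    used_item_count: int  # line pointers that still point at tuple data
    fsm_avail: int        # free bytes the FSM reports for this block
    all_visible: bool     # all-visible bit in the visibility map
    all_frozen: bool      # all-frozen bit in the visibility map


def is_stranded(page: PageObservation) -> bool:
    """True if the page matches the bloat symptom from the report:
    empty, yet advertised as full by the FSM, and skipped by vacuum
    because the VM marks it all-visible and all-frozen."""
    return (
        page.used_item_count == 0
        and page.fsm_avail == 0
        and page.all_visible
        and page.all_frozen
    )


def stranded_blocks(pages: Iterable[PageObservation]) -> List[int]:
    """Return the block numbers of all pages matching the symptom."""
    return [p.blkno for p in pages if is_stranded(p)]
```

Comparing the count of such blocks between runs with FPW on and off would be one way to see whether full-page images make a difference, assuming the observations are gathered the same way in both runs.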