Tomas Szepe <szepe@pinerecords.com> writes:
>> Hmm, please see if you can get a stack trace from that (set the
>> breakpoint at errfinish()). You might want to use vacuum verbose
>> first so that you can figure out which individual table is causing it.
> Ok, I recompiled CVS head with --enable-debug and with --enable-cassert
> and hit the following assert on "vacuum full verbose analyze":
> [etc]

It seems fairly clear that both of these symptoms mean that
empty_end_pages got to be larger than fraged_pages->num_pages.
In the first case the Assert catches that directly, but with
asserts disabled the code just lets fraged_pages->num_pages go
negative, and then the space calculation in vac_update_fsm goes nuts.
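
To make the arithmetic concrete, here is a minimal standalone sketch in C.
It is not the vacuum.c code: PageListModel, invariant_holds, trim_unchecked
and trim_checked are hypothetical names invented for illustration, and the
only assumption carried over from the discussion is that the count of empty
end pages is subtracted from the list's num_pages.

```c
#include <assert.h>

/* Hypothetical stand-in for the page-list bookkeeping (not the real struct). */
typedef struct {
    int num_pages;        /* pages recorded in the list */
    int empty_end_pages;  /* trailing pages that hold no tuples */
} PageListModel;

/* The invariant the assertion is assumed to protect. */
static int invariant_holds(const PageListModel *list)
{
    return list->empty_end_pages <= list->num_pages;
}

/* Non-assert build: the subtraction happens regardless of the invariant. */
static int trim_unchecked(PageListModel *list)
{
    list->num_pages -= list->empty_end_pages;
    return list->num_pages;
}

/* Assert-enabled build: a violated invariant aborts before the subtraction. */
static int trim_checked(PageListModel *list)
{
    assert(invariant_holds(list));
    return trim_unchecked(list);
}
```

With num_pages = 1 and empty_end_pages = 3, trim_checked aborts on the
assertion, while trim_unchecked quietly leaves num_pages at -2, and anything
computed from that count afterwards is garbage.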

So the question is, how could that happen? There are only three places
where empty_end_pages is incremented, and the first two definitely add
the page to fraged_pages as well. What I'm thinking is you must have
had a few pages where notup was true but do_frag didn't get set, and
it's not quite clear how that could be. It seems most likely that the
page contained only LP_DEAD tuples but didn't have free space large
enough to get it put into the fraged_pages list. But the only place
that would mark tuples LP_DEAD is pruneheap.c, and it should have
done a PageRepairFragmentation() after doing so.

Do you perhaps have a ridiculously low fillfactor attached to
the system catalogs?

The fix should probably be to force pages to be put in fraged_pages
if notup is true, but first I want to understand exactly how it got
into this state --- there may be something else going on here.
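
The suspected failure and the proposed fix can be sketched with a toy model
of the scan. This is a simplification under stated assumptions, not the real
scan loop: PageModel, ScanResult, scan_pages and the force_notup switch are
all hypothetical, and the model assumes only that a page joins fraged_pages
when do_frag is set and that empty_end_pages counts the trailing run of
pages with notup set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page flags, named after the variables discussed above. */
typedef struct {
    bool notup;    /* page holds no tuples worth keeping */
    bool do_frag;  /* page has enough free space to be listed */
} PageModel;

typedef struct {
    int fraged_pages;     /* number of pages added to the list */
    int empty_end_pages;  /* length of the trailing run of notup pages */
} ScanResult;

/* force_notup models the proposed fix: always list a page when notup is true. */
static ScanResult scan_pages(const PageModel *pages, size_t n, bool force_notup)
{
    ScanResult result = {0, 0};

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pages[i].do_frag || (force_notup && pages[i].notup))
            result.fraged_pages++;

        if (pages[i].notup)
            result.empty_end_pages++;    /* extends the trailing empty run */
        else
            result.empty_end_pages = 0;  /* a non-empty page resets the run */
    }
    return result;
}
```

For three pages where the last two have notup set but only the last has
do_frag set, the model without the fix reports one listed page against two
empty end pages, which is exactly the inconsistent state. With force_notup
enabled both counts come out as two.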

regards, tom lane