I have been looking some more at the vacuum-process-size issue, and
I am having a hard time understanding why the VPageList data structure
is the critical one. As far as I can see, there should be at most one
pointer in it for each disk page of the relation. OK, you were
vacuuming a table with something like a quarter million pages, so
the end size of the VPageList would have been something like a megabyte,
and given the inefficient usage of repalloc() in the original code,
a lot more space than that would have been wasted as the list grew.
So doubling the array size at each step is a good change.

But there are a lot more tuples than pages in most relations.
I see two lists with per-tuple data in vacuum.c, "vtlinks" in
vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with
essentially the same technique of repalloc() after every N entries.
I'm not entirely clear on how many tuples get put into each of these
lists, but it sure seems like in ordinary circumstances they'd be much
bigger space hogs than any of the three VPageList lists.
I recommend going to a doubling approach for each of these lists as
well as for VPageList.

There is a fourth usage of repalloc with the same method, for "ioid"
in vc_getindices. This only gets one entry per index on the current
relation, so it's unlikely to be worth changing on its own merit.
But it might be worth building a single subroutine that expands a
growable list of entries (taking sizeof() each entry as a parameter)
and applying it in all four places.

regards, tom lane