Simon Riggs <simon@2ndquadrant.com> writes:
> Changing the idea slightly might be better: if a row update would cause
> a block split, then if there is more than one row version then we vacuum
> the whole block first, then re-attempt the update.
"Block split"? I think you are confusing tables with indexes.
Chasing down prior versions of the same row is not very practical
anyway, since there is no direct way to find them.
One possibility: if you try to insert a row on a given page and
there's not room, look through the other rows on the same page to see
whether any are deletable (xmax below the GlobalXmin event horizon).
This strikes me as a fairly expensive operation, though, especially
when you take into account the need to get rid of the dead rows'
index entries first. Moreover, the check would often be unproductive.
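
For concreteness, here is a toy standalone sketch of that check. It
is a simplification, not PostgreSQL source: DemoTuple and prune_page
are invented names, the flat "<" comparison ignores transaction ID
wraparound, and real reclamation would also have to remove the dead
tuples' index entries as noted above.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* Stand-in for a heap tuple header, reduced to the two fields the
 * check needs.  Real tuple headers carry far more state. */
typedef struct
{
    TransactionId xmax;   /* deleting xact, or 0 = never deleted */
    bool          dead;   /* space already reclaimed? */
} DemoTuple;

static int
prune_page(DemoTuple *tuples, int ntuples, TransactionId global_xmin)
{
    int reclaimed = 0;

    for (int i = 0; i < ntuples; i++)
    {
        DemoTuple *tup = &tuples[i];

        /* Deleted, and the deleter precedes every live snapshot's
         * xmin: no one can still see this version, so its space is
         * reclaimable.  (Flat "<" ignores XID wraparound.) */
        if (!tup->dead && tup->xmax != 0 && tup->xmax < global_xmin)
        {
            tup->dead = true;
            reclaimed++;
        }
    }
    return reclaimed;
}

int
main(void)
{
    DemoTuple page[] = {
        { .xmax = 0,   .dead = false },  /* live row */
        { .xmax = 90,  .dead = false },  /* deleted below the horizon */
        { .xmax = 150, .dead = false },  /* deleter may still be seen */
    };

    printf("reclaimed %d of 3 tuples (GlobalXmin = 100)\n",
           prune_page(page, 3, 100));
    return 0;
}

Even in this stripped-down form the cost is visible: every failed
insertion turns into a scan of the whole page, taken inside the
foreground path.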
The real issue with any such scheme is that you are putting maintenance
costs into the critical paths of foreground processes that are executing
user queries. I think that one of the primary advantages of the
Postgres storage design is that we keep that work outside the critical
path and delegate it to maintenance processes that can run in the
background. We shouldn't lightly toss away that advantage.
There was some discussion in Toronto this week about storing bitmaps
that would tell VACUUM whether there is any need to visit
individual pages of each table. Getting rid of useless scans through
not-recently-changed areas of large tables would make for a significant
reduction in the cost of VACUUM.
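
For illustration, a toy sketch of such a per-page bitmap (hypothetical
names; nothing like this exists in the tree today). Writers set a
page's bit when they dirty it, and VACUUM visits only pages whose bit
is set, clearing the bit as it goes:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct
{
    uint8_t *bits;    /* one bit per heap page of the table */
    size_t   npages;
} PageDirtyMap;

static void
mark_page_dirty(PageDirtyMap *map, size_t pageno)
{
    map->bits[pageno / 8] |= (uint8_t) (1u << (pageno % 8));
}

static void
mark_page_vacuumed(PageDirtyMap *map, size_t pageno)
{
    map->bits[pageno / 8] &= (uint8_t) ~(1u << (pageno % 8));
}

static int
page_needs_vacuum(const PageDirtyMap *map, size_t pageno)
{
    return (map->bits[pageno / 8] >> (pageno % 8)) & 1;
}

int
main(void)
{
    uint8_t      storage[128] = {0};  /* 1024 pages, one bit each */
    PageDirtyMap map = { storage, 1024 };

    mark_page_dirty(&map, 7);         /* updates dirtied two pages */
    mark_page_dirty(&map, 900);

    /* VACUUM skips every page whose bit is clear. */
    for (size_t p = 0; p < map.npages; p++)
    {
        if (page_needs_vacuum(&map, p))
        {
            printf("vacuuming page %zu\n", p);
            mark_page_vacuumed(&map, p);
        }
    }
    return 0;
}

The bookkeeping is tiny (one bit per page), so a mostly-static
multi-gigabyte table costs VACUUM almost nothing to skip over.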
regards, tom lane