Greg and I just had a little chat, and came up with this scheme:
1. on heap_update, if the page is full, you prune the page (but don't
defragment it, because you can't get the vacuum lock). That hopefully
leaves behind a large enough gap to put the new tuple in. Insert the new
tuple in the gap, and mark the page as Fragmented. Also make a note in
some backend-private data structure that we've left that page in a
fragmented state.
2. In UnpinBuffer, if the pin count falls to zero and it's a page we've
pruned (check the backend-private data structure), defragment it.
Under little contention, all the cost of pruning will be carried by
transactions that do updates. Whether we also need to prune in
heap_fetch to keep the chains short, I don't know.
One problem with this scheme is that when the page gets full, you have
to hope that pruning creates a wide enough gap. That will work well
with fixed-size tuples, but not so well otherwise.
Hmm. I wonder if we could prune/defragment in bgwriter?
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com