The attached patch allows the vacuum to continue by emitting WARNING
for the corrupted tuple instead of immediately error out as discussed
at [1].
Basically, it provides a new GUC called vacuum_tolerate_damage, to
control whether to continue the vacuum or to stop on the occurrence of
a corrupted tuple. So if the vacuum_tolerate_damage is set then in
all the cases in heap_prepare_freeze_tuple where the corrupted xid is
detected, it will emit a warning and return that nothing is changed in
the tuple and the 'tuple_totally_frozen' will also be set to false.
Since we are returning false the caller will not try to freeze such
tuple and the tuple_totally_frozen is also set to false so that the
page will not be marked to all frozen even if all other tuples in the
page are frozen.
Alternatively, we can try to freeze other XIDs in the tuple which is
not corrupted but I don't think we will gain anything from this,
because if one of the xmin or xmax is wrong then next time also if we
run the vacuum then we are going to get the same WARNING or the ERROR.
Is there any other opinion on this?
[1] http://postgr.es/m/CA+TgmoaZwZHtFFU6NUJgEAp6adDs-qWfNOXpZGQpZMmm0VTDfg@mail.gmail.com
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com