On 2013-11-27 13:56:58 +0200, Heikki Linnakangas wrote:
> Ok, committed and backpatched that.
Thanks.
> >I wonder if we need to integrate any mitigating logic? Currently the
> >corruption may only become apparent long after it occurred, that's
> >pretty bad. And instructing people to run a vacuum after the upgrade will
> >cause the corrupted data to be lost if it is already 2^31 xids old.
>
> Ugh :-(. Running vacuum after the upgrade is the right thing to do to
> prevent further damage, but you're right that it will cause any
> already-wrapped around data to be lost forever. Nasty.
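
To illustrate why data past the 2^31 mark is at risk, here is a minimal
standalone sketch of circular XID comparison. It assumes 32-bit XIDs
compared by signed difference and ignores the special permanent XIDs;
xid_t and xid_precedes() are hypothetical names for this example, not the
backend's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction ID. */
typedef uint32_t xid_t;

/*
 * Circular comparison: returns true if a is logically older than b.
 * The unsigned subtraction wraps modulo 2^32; reinterpreting the result
 * as signed splits the XID space into "past" and "future" halves
 * relative to b. Assumes two's complement conversion.
 */
static bool
xid_precedes(xid_t a, xid_t b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}
```

With this comparison, an xmin that is less than 2^31 transactions behind
the current XID is seen as being in the past, but once the distance
exceeds 2^31 the same xmin compares as being in the future, so the tuple
no longer looks like it was inserted by an old, committed transaction.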
> >But integrating logic to fix things into heap_page_prune() looks
> >somewhat ugly as well.
>
> I think any mitigating logic we might add should go into vacuum. It should
> be possible for a DBA to run a command, and after it's finished, be
> confident that you're safe. That means vacuum.
Well, heap_page_prune() is the first thing that's executed by
lazy_scan_heap(), that's why I was talking about it. So anything we do
needs to happen in there or before.
> >Afaics the likelihood of the issue occurring on non-all-visible pages is
> >pretty low, since they'd need to be skipped due to lock contention
> >repeatedly.
> Hmm. If a page has its visibility-map flag set, but contains a tuple that
> appears to be dead because you've wrapped around, vacuum will give a
> warning: "page containing dead tuples is marked as all-visible in relation
> \"%s\" page %u".
I don't think this warning is likely to be hit as the code stands -
heap_page_prune() et al. will have removed all dead tuples already,
right? So has_dead_tuples won't be set.
Independent of this, ISTM we should add an
    else if (PageIsAllVisible(page) && all_visible)
to those checks.
Greetings,
Andres Freund
-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services