Re: HOT vs freezing issue causing "cannot freeze committed xmax" - Mailing list pgsql-hackers

From Andres Freund
Subject Re: HOT vs freezing issue causing "cannot freeze committed xmax"
Date
Msg-id 20200724165514.dnu5hr4vvgkssf5p@alap3.anarazel.de
Whole thread Raw
In response to Re: HOT vs freezing issue causing "cannot freeze committed xmax"  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2020-07-24 11:06:58 -0400, Robert Haas wrote:
> On Thu, Jul 23, 2020 at 2:10 PM Andres Freund <andres@anarazel.de> wrote:
> > In the case the HOT logic triggers, we'll call
> > heap_prepare_freeze_tuple() even when the tuple is dead.
>
> I think this is very bad. I've always been confused about these
> comments, but I couldn't quite put my finger on the problem. Now I
> think I can: the comments here imagine that we have the option either
> to set tupgone, causing the line pointer to be marked unused by an
> eventual call to lazy_vacuum_page(), or that we can decline to set
> tupgone, which will leave the tuple around to be handled by the next
> vacuum.

Yea. I think the only saving grace is that it's not obvious when the
situation can arise without prior corruption. But even if that's actuall
impossible, it seems extremely fragile. I stared at heap_prune_chain()
for quite a while and couldn't convince myself either way.


> However, we don't really have a choice at all. A choice implies that
> either option is correct, and therefore we can elect the one we
> prefer. But here, it's not just that one option is incorrect, but that
> both options are incorrect. Setting tupgone controls whether or not
> the tuple is considered for freezing. If we DON'T consider freezing
> it, then we might manage to advance relfrozenxid while an older XID
> still exists in the table. If we DO consider freezing it, we will
> correctly conclude that it needs to be frozen, but then the freezing
> code is in an impossible situation, because it has no provision for
> getting rid of tuples, only for keeping them. Its logic assumes that
> whenever we are freezing xmin or xmax we do that in a way that causes
> the tuple to be visible to everyone, but this tuple should be visible
> to no one. So with your changes it now errors out instead of
> corrupting data, but that's just replacing one bad thing (data
> corruption) with another (VACUUM failures).

I suspect that the legitimate cases hitting this branch won't error out,
because then xmin/xmax aren't old enough to need to be frozen.


> I think the actual correct behavior here is to mark the line pointer
> as dead.

That's not trivial, because just doing that naively will break HOT.


> The easiest way to accomplish that is probably to have
> lazy_scan_heap() just emit an extra XLOG_HEAP2_CLEAN record beyond
> whatever HOT-pruning already did, if it's necessary. A better solution
> would probably be to merge HOT-pruning with setting things all-visible
> and have a single function that does both, but that seems a lot more
> invasive, and definitely unsuitable for back-patching.

I suspect that merging pruning and this logic in lazy_scan_heap() really
is the only proper way to solve this kind of issue.

I wonder if, given we don't know if this issue can be hit in a real
database, and given that it already triggers an error, the right way to
deal with this in the back-branches is to emit a more precise error
message. I.e. if we hit this branch, and either xmin/xmax are older than
the cutoff, then we issue a more specific ERROR.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Making CASE error handling less surprising
Next
From: Pavel Stehule
Date:
Subject: Re: Making CASE error handling less surprising