Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING
Date:
Msg-id: CA+TgmoZ4Ne_Lx42xgfYfY07yydLvKnFjzs7Pioi94eJw2E-SRA@mail.gmail.com
In response to: Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses: Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Mon, Sep 14, 2020 at 3:00 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> FWIW I agree with Andres' stance on this.  The current system is *very*
> complicated and bugs are obscure already.  If we hide them, what we'll
> be getting is a system where data can become corrupted for no apparent
> reason.

I think I might have to give up on this proposal given the level of
opposition to it, but the nature of the opposition doesn't make any
sense to me on a technical level. Suppose a tuple with tid A has been
updated, producing a new version at tid B. The argument that is now
being offered is that if A has been found to be corrupt then we'd
better stop vacuuming the table altogether lest we reach B and vacuum
it too, further corrupting the table and destroying forensic evidence.
But even ignoring the fact that many users want to get the database
running again more than they want to do forensics, it's entirely
possible that B < A. Since VACUUM scans the heap in physical block
order, B would in that case be processed before the corruption at A is
ever discovered, so the damage has already been done.
Therefore, I can't see any argument that this patch creates any
scenario that can't happen already. It seems entirely reasonable to me
to say, as a review comment, hey, you haven't sufficiently considered
this particular scenario, that still needs work. But the argument here
is much more about whether this is a reasonable thing to do in general
and under any circumstances, and it feels to me like you guys are
saying "no" without offering any really convincing evidence that there
are unfixable problems here. IOW, I agree that having a GUC
corrupt_my_tables_more=true is not a reasonable thing, but I disagree
that the proposal on the table is tantamount to that.
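
To make that concrete, here is a minimal sketch of the kind of change
I have in mind; this is not the actual patch. It reuses the existing
relfrozenxid sanity check from heap_prepare_freeze_tuple, and the GUC
name vacuum_tolerate_damage is just a placeholder:

    /*
     * Sketch only.  With the (hypothetical) GUC set, report the
     * corruption at WARNING instead of ERROR, leave the damaged
     * tuple unfrozen, and let VACUUM continue with the rest of
     * the relation.
     */
    if (TransactionIdPrecedes(xid, relfrozenxid))
    {
        ereport(vacuum_tolerate_damage ? WARNING : ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg_internal("found xmin %u from before relfrozenxid %u",
                                 xid, relfrozenxid)));
        return false;   /* tell the caller not to freeze this tuple */
    }

The point is that choosing between WARNING and ERROR is a one-line
policy decision at each corruption check. The genuinely hard part is
auditing the callers to make sure that skipping the tuple is safe, and
asking for that audit is a fair review request.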

The big picture here is that people have terabyte-scale tables, 1 or 2
tuples get corrupted, and right now the only real fix is to dump and
restore the whole table, which leads to prolonged downtime. The
pg_surgery stuff should help with that, and the work to make VACUUM
report the exact TID will also help, and if we can get the heapcheck
stuff Mark Dilger is working on committed, that will provide an
alternative and probably better way of finding this kind of
corruption, which is all to the good. However, I disagree with the
idea that a typical user who has a 2TB table with one corrupted tuple on
page 0 probably wants VACUUM to fail over and over again, letting the
table bloat like crazy, instead of bleating loudly but still vacuuming
the other 99.9999% of the table. I mean, somebody probably wants
that, and that's fine. But I have a hard time imagining it as a
typical view. Am I just lacking in imagination?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


