Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING
Date
Msg-id 20200914201335.hs57uuqs2thzchqg@alap3.anarazel.de
In response to Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2020-09-14 15:50:49 -0400, Robert Haas wrote:
> On Mon, Sep 14, 2020 at 3:00 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > FWIW I agree with Andres' stance on this.  The current system is *very*
> > complicated and bugs are obscure already.  If we hide them, what we'll
> > be getting is a system where data can become corrupted for no apparent
> > reason.
> 
> I think I might have to give up on this proposal given the level of
> opposition to it, but the nature of the opposition doesn't make any
> sense to me on a technical level. Suppose a tuple with tid A has been
> updated, producing a new version at tid B. The argument that is now
> being offered is that if A has been found to be corrupt then we'd
> better stop vacuuming the table altogether lest we reach B and vacuum
> it too, further corrupting the table and destroying forensic evidence.
> But even ignoring the fact that many users want to get the database
> running again more than they want to do forensics, it's entirely
> possible that B < A, in which case the damage has already been done.

My understanding of the case we're discussing is that it's corruption
(e.g. relfrozenxid disagreeing with the table contents) affecting a HOT
chain, i.e. by definition all within a single page.  We won't have
modified part of it regardless of whether B < A, because freezing is
all-or-nothing.  Breaking the HOT chain in two, or something like that,
will just make things worse, because indexes won't find tuples, and
because reindexing might then get confused, e.g. by HOT chains without
a valid start, or by having two visible tuples for the same PK.
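
For context, the check at issue is the corruption sanity test in
heap_prepare_freeze_tuple().  A simplified sketch of what trips -- not
the exact heapam.c code, and wrapped in a standalone helper purely for
illustration:

#include "postgres.h"

#include "access/transam.h"

/*
 * Sketch of the sanity check under discussion: an xmin older than the
 * relation's relfrozenxid can only mean corruption, so VACUUM currently
 * reports ERROR and the whole run aborts.
 */
static void
check_xmin_against_relfrozenxid(TransactionId xid, TransactionId relfrozenxid)
{
    if (TransactionIdIsNormal(xid) &&
        TransactionIdPrecedes(xid, relfrozenxid))
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg_internal("found xmin %u from before relfrozenxid %u",
                                 xid, relfrozenxid)));
}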


> But even ignoring the fact that many users want to get the database
> running again more than they want to do forensics

The user isn't always right. And I am not objecting to providing a
tool to get things running again. I'm objecting to VACUUM doing so,
especially when a system-wide config option triggers that behaviour.
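
To make concrete what I'm objecting to, the shape of the change under
debate is roughly a GUC-gated severity.  A hypothetical sketch -- the
GUC name and helper are invented for illustration, this is not the
actual patch:

#include "postgres.h"

#include "access/transam.h"

/* Hypothetical GUC; the name is made up for this example. */
bool        vacuum_tolerate_damage = false;

/*
 * Same check as in the sketch above, but a system-wide setting decides
 * whether VACUUM stops or merely warns.  If it only warns, the caller
 * then has to skip the tuple (or the whole page) somehow and keep
 * going -- which is exactly the kind of heuristic recovery in already
 * complicated code that worries me.
 */
static void
check_xmin_against_relfrozenxid_relaxed(TransactionId xid,
                                        TransactionId relfrozenxid)
{
    if (TransactionIdIsNormal(xid) &&
        TransactionIdPrecedes(xid, relfrozenxid))
        ereport(vacuum_tolerate_damage ? WARNING : ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg_internal("found xmin %u from before relfrozenxid %u",
                                 xid, relfrozenxid)));
}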


> Therefore, I can't see any argument that this patch creates any
> scenario that can't happen already. It seems entirely reasonable to me
> to say, as a review comment, hey, you haven't sufficiently considered
> this particular scenario, that still needs work. But the argument here
> is much more about whether this is a reasonable thing to do in general
> and under any circumstances, and it feels to me like you guys are
> saying "no" without offering any really convincing evidence that there
> are unfixable problems here.

I don't think that's quite the calculation. You're suggesting making
already really complicated and failure-prone code even more complicated
by adding heuristic error recovery to it. That has substantial cost,
even if we were to get it perfectly right (which I don't believe we
will).


> The big picture here is that people have terabyte-scale tables, 1 or 2
> tuples get corrupted, and right now the only real fix is to dump and
> restore the whole table, which leads to prolonged downtime. The
> pg_surgery stuff should help with that, and the work to make VACUUM
> report the exact TID will also help, and if we can get the heapcheck
> stuff Mark Dilger is working on committed, that will provide an
> alternative and probably better way of finding this kind of
> corruption, which is all to the good.

Agreed.


> However, I disagree with the idea that a typical user who has a 2TB
> table with one corrupted tuple on page 0 probably wants VACUUM to fail
> over and over again, letting the table bloat like crazy, instead of
> bleating loudly but still vacuuming the other 99.999999% of the
> table. I mean, somebody probably wants that, and that's fine. But I
> have a hard time imagining it as a typical view. Am I just lacking in
> imagination?

I know that kind of user exists, but yeah, I disagree extremely
strongly that that's a reasonable thing for the majority of users to
want, and I don't think it's something we should encourage. Those cases
indicate that either postgres has a bug, or that their storage / memory
/ procedures have an issue. Reacting by making the problem harder to
diagnose is just a bad idea.

Greetings,

Andres Freund


