
From: Robert Haas
Subject: Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING
Msg-id: CA+TgmoakiKNbQcz9Cmiqfk66uJOyu_Tj85gv=rz82k5ut98ppQ@mail.gmail.com
In response to: Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING (Andres Freund <andres@anarazel.de>)
Responses: Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Mon, Sep 14, 2020 at 4:13 PM Andres Freund <andres@anarazel.de> wrote:
> My understanding of the case we're discussing is that it's corruption
> (e.g. relfrozenxid being different than table contents) affecting a HOT
> chain. I.e. by definition all within a single page.  We won't have
> modified part of it independent of B < A, because freezing is
> all-or-nothing.  Breaking the HOT chain into two or something like that
> will just make things worse, because indexes won't find tuples, and
> because reindexing might then get confused, e.g. by HOT chains without a
> valid start, or by having two visible tuples for the same PK.

If we adopt the proposal made by Dilip, we will not do that. We must
have a.xmax = b.xmin, and that value is either less than relfrozenxid
or it is not. If we skip an entire tuple because one XID is bad, then
we could break the HOT chain when a.xmin is bad and the remaining
values are OK. But if we decide separately for xmin and xmax, then we
should be all right. Alternatively, if we're only concerned about HOT
chains, we could skip the entire page if any tuple on the page shows
evidence of damage.
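
To make that concrete, here's a minimal standalone sketch of the
per-XID approach. To be clear, this is not heap_prepare_freeze_tuple
itself: the types and helpers below are simplified stand-ins (no
wraparound-aware comparisons, no multixacts), just enough to show why
deciding per XID preserves the a.xmax = b.xmin linkage.

/*
 * Minimal standalone sketch of deciding xmin and xmax independently.
 * NOT the server's heap_prepare_freeze_tuple; everything here is a
 * simplified illustrative stand-in.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* An XID older than relfrozenxid should already have been frozen;
 * finding one is evidence of corruption.  Report it and keep going. */
static bool
xid_is_damaged(TransactionId xid, TransactionId relfrozenxid,
               const char *which)
{
    if (xid != 0 && xid < relfrozenxid)   /* simplified, no wraparound */
    {
        fprintf(stderr, "WARNING: found %s %u from before relfrozenxid %u\n",
                which, (unsigned) xid, (unsigned) relfrozenxid);
        return true;
    }
    return false;
}

static void
prepare_freeze_sketch(TransactionId xmin, TransactionId xmax,
                      TransactionId relfrozenxid)
{
    /* Decide per XID: skipping the whole tuple because xmin is bad
     * would also leave xmax (= the next tuple's xmin) untreated. */
    bool xmin_bad = xid_is_damaged(xmin, relfrozenxid, "xmin");
    bool xmax_bad = xid_is_damaged(xmax, relfrozenxid, "xmax");

    if (!xmin_bad)
    {
        /* ... freeze xmin normally ... */
    }
    if (!xmax_bad)
    {
        /* ... freeze or clear xmax normally ... */
    }
}

int
main(void)
{
    /* Damaged xmin, healthy xmax: the xmax decision still happens. */
    prepare_freeze_sketch(90, 150, 100);
    return 0;
}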

> I don't think that's quite the calculation. You're suggesting to make
> already really complicated and failure prone code even more complicated
> by adding heuristic error recovery to it. That has substantial cost,
> even if we were to get it perfectly right (which I don't believe we
> will).

That's a legitimate concern, but I think it would make more sense to
first make the design as good as we can and then decide whether the
result is adequate, rather than deciding ab initio that there's no way
to make it good enough.

> > However, I disagree with the idea that a typical user who has a 2TB
> > table with one corrupted tuple on page 0 probably wants VACUUM to
> > fail over and over again, letting the table bloat like crazy, instead
> > of bleating loudly but still vacuuming the other 99.9999% of the
> > table. I mean, somebody probably wants that, and that's fine. But I
> > have a hard time imagining it as a typical view. Am I just lacking in
> > imagination?
>
> I know that that kind of user exists, but yea, I disagree extremely
> strongly that that's a reasonable thing that the majority of users
> want. And I don't think that that's something we should encourage. Those
> cases indicate that either postgres has a bug, or their storage / memory
> / procedures have an issue. Reacting by making it harder to diagnose is
> just a bad idea.

Well, almost no matter what, the people I tend to deal with are not
going to let me conduct a lengthy investigation, and the more severe
the operational consequences of the problem, the less likely it is
that I'll have time to figure anything out. Being able to create some
kind of breathing room is pretty valuable.
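
To illustrate the kind of breathing room I mean, here's a standalone
toy of the ERROR-vs-WARNING choice this thread is about; the
vacuum_tolerate_damage flag and everything around it are illustrative
stand-ins, not the actual patch:

/*
 * Toy model of downgrading a corruption ERROR to a WARNING so the
 * rest of the table can still be processed while the damage is
 * diagnosed.  The flag is a hypothetical stand-in for a GUC.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static bool vacuum_tolerate_damage = true;  /* hypothetical setting */

static void
report_corruption(const char *msg)
{
    if (vacuum_tolerate_damage)
        fprintf(stderr, "WARNING: %s\n", msg);  /* bleat loudly, continue */
    else
    {
        fprintf(stderr, "ERROR: %s\n", msg);    /* today's behavior: abort */
        exit(1);
    }
}

int
main(void)
{
    /* A 4-"page" table with damage on page 0: with the flag set, the
     * other pages still get vacuumed instead of VACUUM failing over
     * and over while the table bloats. */
    for (int page = 0; page < 4; page++)
    {
        if (page == 0)
            report_corruption("found xmin 1234 from before relfrozenxid 5678");
        printf("vacuumed page %d\n", page);
    }
    return 0;
}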

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


