Re: new heapcheck contrib module - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: new heapcheck contrib module
Date
Msg-id CAH2-WzmBB4-Fr8_qz7faEe8s=5wzyUrNrbxdaeJesTvjp+5TEg@mail.gmail.com
In response to Re: new heapcheck contrib module  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: new heapcheck contrib module  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: new heapcheck contrib module  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers
On Wed, May 13, 2020 at 3:10 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Hmm.  I think we should (try to?) write code that avoids all crashes
> with production builds, but not extend that to assertion failures.

Assertions are only a problem at all because Mark would like to write
tests that involve a selection of truly corrupt data. That's a new
requirement, and one that I have my doubts about.

> > I'll stick with your example. You're calling
> > TransactionIdDidCommit() from check_tuphdr_xids(), which will
> > interrogate the commit log and pg_subtrans. It's just not under your
> > control.
>
> in a production build this would just fail with an error that the
> pg_xact file cannot be found, which is fine -- if this happens in a
> production system, you're not disturbing any other sessions.  Or maybe
> the file is there and the byte can be read, in which case you would get
> the correct response; but that's fine too.

I think that this is fine, too, since I don't consider assertion
failures with corrupt data all that important. I'd make some effort to
avoid them, but not too much, and not at the expense of a useful
general-purpose assertion that could catch bugs in many different contexts.
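
To make the distinction concrete, here is a minimal sketch of what "some
effort" could look like: the checker validates an XID against known bounds
and reports corruption itself, rather than handing a garbage value to a
lower-level routine that only protects itself with an assertion. Every name
below (xid_t, lookup_commit_status, check_tuple_xid, the even/odd placeholder
answer) is hypothetical and is not PostgreSQL's actual API; the wraparound
comparison is a simplified version of the usual modulo-2^32 technique.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real types. */
typedef uint32_t xid_t;

typedef enum
{
    XID_OK,
    XID_PRECEDES_OLDEST,    /* older than anything still tracked */
    XID_IN_FUTURE           /* at or beyond the next XID to be assigned */
} xid_check_result;

/* Wraparound-aware comparison: true if a logically precedes b (mod 2^32). */
static bool
xid_precedes(xid_t a, xid_t b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Stand-in for a lower-level routine the checker does not control. It
 * assumes sane input and guards itself only with an assertion, which is
 * compiled out when NDEBUG is defined (as in a production build).
 */
static bool
lookup_commit_status(xid_t xid, xid_t oldest, xid_t next)
{
    assert(!xid_precedes(xid, oldest) && xid_precedes(xid, next));
    return (xid % 2) == 0;  /* placeholder answer, purely illustrative */
}

/*
 * The checker's own guard: report out-of-range XIDs as corruption instead
 * of passing them down, so neither build flavor trips over them.
 */
xid_check_result
check_tuple_xid(xid_t xid, xid_t oldest, xid_t next, bool *committed)
{
    if (xid_precedes(xid, oldest))
        return XID_PRECEDES_OLDEST;
    if (!xid_precedes(xid, next))
        return XID_IN_FUTURE;

    *committed = lookup_commit_status(xid, oldest, next);
    return XID_OK;
}
```

A caller would treat anything other than XID_OK as a corruption report and
skip the lookup. This only covers the inputs the checker can cheaply
validate; it does not try to make the lower-level assertion unreachable for
every contrived input.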

I would be willing to make a larger effort to avoid crashing a
backend, since that affects production. I might go to some effort to
not crash with downright adversarial inputs, for example. But it seems
inappropriate to take extreme measures just to avoid a crash with
extremely contrived inputs that will probably never occur. My sense is
that this is subject to sharply diminishing returns. Completely
nailing down hard crashes from corrupt data seems like the wrong
priority, at the very least. Pursuing that objective over other
objectives sounds like zero-risk bias.

-- 
Peter Geoghegan


