Re: pg_amcheck contrib application - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: pg_amcheck contrib application
Date
Msg-id FCF74E43-318E-44C4-92E0-D03AB07588F3@enterprisedb.com
Whole thread Raw
In response to Re: pg_amcheck contrib application  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pg_amcheck contrib application  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

> On Apr 1, 2021, at 8:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> These changes got lost between v11 and v12.  I've put them back, as well as updating to use your language.
>
> Here's an updated patch that includes your 0001 and 0002 plus a bunch
> of changes by me. I intend to commit this version unless somebody
> spots a problem with it.
>
> Here are the things I changed:
>
> - I renamed tuple_can_be_pruned to tuple_could_be_pruned because I
> think it does a better job suggesting that we're uncertain about what
> will happen.

+1

> - I got rid of bool header_garbled and changed it to bool result,
> inverting the sense, because it didn't seem useful to have a function
> that ended with if (some_boolean) return false; return true when I
> could end the function with return some_other_boolean.

+1

> - I removed all the one-word comments that said /* corrupt */ or /*
> checkable */ because they seemed redundant.

Ok.

> - In the xmin_status section of check_tuple_visibility(), I got rid of
> the xmin_status == XID_IS_CURRENT_XID and xmin_status ==
> XID_IN_PROGRESS cases because they were redundant with the xmin_status
> != XID_COMMITTED case.

Ok.

> - If xmax is a multi but seems to be garbled, I changed it to return
> true rather than false. The inserter is known to have committed by
> that point, so I think it's OK to try to deform the tuple. We just
> shouldn't try to check TOAST.

It is hard to know what to do when at least one tuple header field is corrupt.  You don't necesarily know which one it
is. For example, if HEAP_XMAX_IS_MULTI is set, we try to interpret the xmax as a mxid, and if it is out of bounds, we
reportit as corrupt.  But was the xmax corrupt?  Or was the HEAP_XMAX_IS_MULTI bit corrupt?  It's not clear.  I took
theview that if either xmin or xmax appear to be corrupt when interpreted in light of the various tuple header bits,
allwe really know is that the set of fields/bits don't make sense as a whole, so we report corruption, don't trust any
ofthem, and abort further checking of the tuple.  You have be burden of proof the other way around.  If the xmin
appearsfine, and xmax appears corrupt, then we only know that xmax is corrupt, so the tuple is checkable because
accordingto the xmin it committed. 

I don't think how you have it causes undue problems, since deforming the tuple when you shouldn't merely risks a bunch
ofextra not-so-helpful corruption messages.  And hey, maybe they're helpful to somebody clever enough to diagnose why
thatparticular bit of noise was generated.  

> - I changed both switches over xmax_status to break in each case and
> then return true after instead of returning true for each case. I
> think this is clearer.

Ok.

> - I changed get_xid_status() to perform the TransactionIdIs... checks
> in the same order that HeapTupleSatisfies...() does them. I believe
> that it's incorrect to conclude that the transaction must be in
> progress because it neither IsCurrentTransaction nor DidCommit nor
> DidAbort, because all of those things will be false for a transaction
> that is running at the time of a system crash. The correct rule is
> that if it neither IsCurrentTransaction nor IsInProgress nor DidCommit
> then it's aborted.

Ok.

> - I moved a few comments and rewrote some others, including some of
> the ones that you took from my earlier draft patch. The thing is, the
> comment needs to be adjusted based on where you put it. Like, I had a
> comment that says"It should be impossible for xvac to still be
> running, since we've removed all that code, but even if it were, it
> ought to be safe to read the tuple, since the original inserter must
> have committed.  But, if the xvac transaction committed, this tuple
> (and its associated TOAST tuples) could be pruned at any time." which
> in my version was right before a TransactionIdDidCommit() test, and
> explains why that test is there and why the code does what it does as
> a result. But in your version you've moved it to a place where we've
> already tested that the transaction has committed, and more
> importantly, where we've already tested that it's not still running.
> Saying that it "should" be impossible for it not to be running when
> we've *actually checked that* doesn't make nearly as much sense as it
> does when we haven't checked that and aren't going to do so.


            * If xmin_status happens to be XID_IN_PROGRESS, then in theory

Did you mean to say XID_IS_CURRENT_XID here?

/* xmax is an MXID, not an MXID. Sanity check it. */

Is it an MXID or isn't it?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Next
From: vignesh C
Date:
Subject: Data type correction in pgstat_report_replslot function parameters