Re: pg_amcheck contrib application - Mailing list pgsql-hackers

From Robert Haas
Subject Re: pg_amcheck contrib application
Date
Msg-id CA+TgmoawwA4zJwUCJr9UcwPzivjjZS2nMB+YWm=0yACgsXTL+A@mail.gmail.com
Whole thread Raw
In response to Re: pg_amcheck contrib application  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: pg_amcheck contrib application
List pgsql-hackers
On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> I have added the verb "has" rather than "contains" because "has" is more consistent with the phrasing of other
similarcorruption reports.
 

That makes sense.

I think it's odd that a range of extraneous chunks is collapsed into a
single report if the size of each chunk happens to be
TOAST_MAX_CHUNK_SIZE and not otherwise. Why not just remember the
first and last extraneous chunk and the size of each? If the next
chunk you see is the next one in sequence and the same size as all the
others, extend your notion of the sequence end by 1. Otherwise, report
the range accumulated so far. It seems to me that this wouldn't be any
more code than you have now, and might actually be less.

I think that report_missing_chunks() could probably just report the
range of missing chunks and not bother reporting how big they were
expected to be. But, if it is going to report how big they were
expected to be, I think it should have only 2 cases rather than 4:
either a range of missing chunks of equal size, or a single missing
chunk of some size. If, as I propose, it doesn't report the expected
size, then you still have just 2 cases: a range of missing chunks, or
a single missing chunk.

Somehow I have a hard time feeling confident that check_toast_tuple()
is going to do the right thing. The logic is really complex and hard
to understand. 'chunkno' is a counter that advances every time we move
to the next chunk, and 'curchunk' is the value we actually find in the
TOAST tuple. This terminology is not easy to understand. Most messages
now report 'curchunk', but some still report 'chunkno'. Why does
'chunkno' need to exist at all? AFAICS the combination of 'curchunk'
and 'tctx->last_chunk_seen' ought to be sufficient. I can see no
particular reason why what you're calling 'chunkno' needs to exist
even as a local variable, let alone be printed out. Either we haven't
yet validated that the chunk_id extracted from the tuple is non-null
and greater than the last chunk number we saw, in which case we can
just complain about it if we find it to be otherwise, or we have
already done that validation, in which case we should complain about
that value and not 'chunkno' in any subsequent messages.

The conditionals between where you set max_valid_prior_chunk and where
you set last_chunk_seen seem hard to understand, particularly the
bifurcated way that missing chunks are reported. Initial missing
chunks are detected by (chunkno == 0 && max_valid_prior_chunk >= 0)
and later missing chunks are detected by (chunkno > 0 &&
max_valid_prior_chunk > tctx->last_chunk_seen). I'm not sure if this
is correct; I find it hard to get my head around what
max_valid_prior_chunk is supposed to represent. But in any case I
think it can be written more simply. Just keep track of what chunk_id
we expect to extract from the next TOAST tuple. Initially it's 0.
Then:

if (chunkno < tctx->expected_chunkno)
{
   // toast value %u index scan returned chunk number %d when chunk %d
was expected
   // don't modify tctx->expected_chunkno here, just hope the next
thing matches our previous expectation
}
else
{
    if (chunkno > tctx->expected_chunkno)
        // chunks are missing from tctx->expected_chunkno through
Min(chunkno - 1, tctx->final_expected_chunk), provided that the latter
value is greater than or equal to the former
    tctx->expected_chunkno = chunkno + 1;
}

If you do this, you only need to report extraneous chunks when chunkno
> tctx->final_expected_chunk, since chunkno < 0 is guaranteed to
trigger the first of the two complaints shown above.

In check_tuple_attribute I suggest "bool valid = false" rather than
"bool invalid = true". I think it's easier to understand.

I object to check_toasted_attribute() using 'chunkno' in a message for
the same reasons as above in regards to check_toast_tuple() i.e. I
think it's a concept which should not exist.

I think this patch could possibly be split up into multiple patches.
There's some question in my mind whether it's getting too late to
commit any of this, since some of it looks suspiciously like new
features after feature freeze. However, I kind of hate to ship this
release without at least doing something about the chunkno vs.
curchunk stuff, which is even worse in the committed code than in your
patch, and which I think will confuse the heck out of users if those
messages actually fire for anyone.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Bogus collation version recording in recordMultipleDependencies
Next
From: Tom Lane
Date:
Subject: Re: Bogus collation version recording in recordMultipleDependencies