Re: pg_amcheck contrib application - Mailing list pgsql-hackers

From Robert Haas
Subject Re: pg_amcheck contrib application
Date
Msg-id CA+TgmoYw5YjGRt3hjg3XdZQ4rNnSB22JVdRO9SGE+MfjZayGZQ@mail.gmail.com
Whole thread Raw
In response to Re: pg_amcheck contrib application  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: pg_amcheck contrib application  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers
On Sun, Apr 4, 2021 at 8:02 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked owing to how I had separated the changes in
v17-0002vs. v17-0003
 

Committed.

> v18-0002 - Postpones the toast checks for a page until after the main table page lock is released

Committed, but I changed list_free() to list_free_deep() to avoid a
memory leak, and I revised the commit message to mention the important
point that we need to avoid following TOAST pointers from
potentially-prunable tuples.

> v18-0003 - Improves the corruption messages in ways already discussed earlier in this thread.  Changes the tests to
expectthe new messages, but adds no new checks
 

Kibitizing your message wording:

"toast value %u chunk data is null" -> "toast value %u chunk %d has
null data". We can mention the chunk number this way.

"toast value %u corrupt extended chunk has invalid varlena header: %0x
(sequence number %d)" -> "toast value %u chunk %d has invalid varlena
header %0x". We can be more consistent about how we incorporate the
chunk number into the text, and we don't really need to include the
word corrupt, because all of these are corruption complaints, and I
think it looks better without the colon.

"toast value %u chunk sequence number %u does not match the expected
sequence number %u" -> "toast value %u contains chunk %d where chunk
%d was expected". Shorter. Uses %d for a sequence number instead of
%u, which I think is correct -- anyway we should have them all one way
or all the other. I think I'd rather ditch the "sequence number"
technology and just talk about "chunk %d" or whatever.

"toast value %u chunk sequence number %u exceeds the end chunk
sequence number %u" -> "toast value %u chunk %d follows last expected
chunk %d"

"toast value %u chunk size %u differs from the expected size %u" ->
"toast value %u chunk %d has size %u, but expected size %u"

Other complaints:

Your commit message fails to mention the addition of
VARATT_EXTERNAL_GET_POINTER, which is a significant change/bug fix
unrelated to message wording.

It feels like we have a non-minimal number of checks/messages for the
series of toast chunks. I think that if we find a chunk after the last
chunk we were expecting to find (curchunk > endchunk) and you also get
a message if we have the wrong number of chunks in total (chunkno !=
(endchunk + 1)). Now maybe I'm wrong, but if the first message
triggers, it seems like the second message must also trigger. Is that
wrong? If not, maybe we can get rid of the first one entirely? That's
such a small change I think we could include it in this same patch, if
it's a correct idea.

On a related note, as I think I said before, I still think we should
be rejiggering this so that we're not testing both the size of each
individual chunk and the total size, because that ought to be
redundant. That might be better done as a separate patch but I think
we should try to clean it up.

> v18-0004 - Adding corruption checks of toast pointers.  Extends the regression tests to cover the new checks.

I think we could check that the result of
VARATT_EXTERNAL_GET_COMPRESS_METHOD is one of the values we expect to
see.

Using AllocSizeIsValid() seems pretty vile. I know that MaxAllocSize
is 0x3FFFFFFF in no small part because that's the maximum length that
can be represented by a varlena, but I'm not sure it's a good idea to
couple the concepts so closely like this. Maybe we can just #define
VARLENA_SIZE_LIMIT in this file and use that, and a message that says
size %u exceeds limit %u.

I'm a little worried about whether the additional test cases are
Endian-dependent at all. I don't immediately know what might be wrong
with them, but I'm going to think about that some more later. Any
chance you have access to a Big-endian box where you can test this?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Next
From: Heikki Linnakangas
Date:
Subject: Re: Yet another fast GiST build