Re: pg_amcheck contrib application - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: pg_amcheck contrib application |
Date | |
Msg-id | CA+TgmoYw5YjGRt3hjg3XdZQ4rNnSB22JVdRO9SGE+MfjZayGZQ@mail.gmail.com Whole thread Raw |
In response to | Re: pg_amcheck contrib application (Mark Dilger <mark.dilger@enterprisedb.com>) |
Responses |
Re: pg_amcheck contrib application
|
List | pgsql-hackers |
On Sun, Apr 4, 2021 at 8:02 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked owing to how I had separated the changes in v17-0002vs. v17-0003 Committed. > v18-0002 - Postpones the toast checks for a page until after the main table page lock is released Committed, but I changed list_free() to list_free_deep() to avoid a memory leak, and I revised the commit message to mention the important point that we need to avoid following TOAST pointers from potentially-prunable tuples. > v18-0003 - Improves the corruption messages in ways already discussed earlier in this thread. Changes the tests to expectthe new messages, but adds no new checks Kibitizing your message wording: "toast value %u chunk data is null" -> "toast value %u chunk %d has null data". We can mention the chunk number this way. "toast value %u corrupt extended chunk has invalid varlena header: %0x (sequence number %d)" -> "toast value %u chunk %d has invalid varlena header %0x". We can be more consistent about how we incorporate the chunk number into the text, and we don't really need to include the word corrupt, because all of these are corruption complaints, and I think it looks better without the colon. "toast value %u chunk sequence number %u does not match the expected sequence number %u" -> "toast value %u contains chunk %d where chunk %d was expected". Shorter. Uses %d for a sequence number instead of %u, which I think is correct -- anyway we should have them all one way or all the other. I think I'd rather ditch the "sequence number" technology and just talk about "chunk %d" or whatever. "toast value %u chunk sequence number %u exceeds the end chunk sequence number %u" -> "toast value %u chunk %d follows last expected chunk %d" "toast value %u chunk size %u differs from the expected size %u" -> "toast value %u chunk %d has size %u, but expected size %u" Other complaints: Your commit message fails to mention the addition of VARATT_EXTERNAL_GET_POINTER, which is a significant change/bug fix unrelated to message wording. It feels like we have a non-minimal number of checks/messages for the series of toast chunks. I think that if we find a chunk after the last chunk we were expecting to find (curchunk > endchunk) and you also get a message if we have the wrong number of chunks in total (chunkno != (endchunk + 1)). Now maybe I'm wrong, but if the first message triggers, it seems like the second message must also trigger. Is that wrong? If not, maybe we can get rid of the first one entirely? That's such a small change I think we could include it in this same patch, if it's a correct idea. On a related note, as I think I said before, I still think we should be rejiggering this so that we're not testing both the size of each individual chunk and the total size, because that ought to be redundant. That might be better done as a separate patch but I think we should try to clean it up. > v18-0004 - Adding corruption checks of toast pointers. Extends the regression tests to cover the new checks. I think we could check that the result of VARATT_EXTERNAL_GET_COMPRESS_METHOD is one of the values we expect to see. Using AllocSizeIsValid() seems pretty vile. I know that MaxAllocSize is 0x3FFFFFFF in no small part because that's the maximum length that can be represented by a varlena, but I'm not sure it's a good idea to couple the concepts so closely like this. Maybe we can just #define VARLENA_SIZE_LIMIT in this file and use that, and a message that says size %u exceeds limit %u. I'm a little worried about whether the additional test cases are Endian-dependent at all. I don't immediately know what might be wrong with them, but I'm going to think about that some more later. Any chance you have access to a Big-endian box where you can test this? -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: