Re: pg_amcheck contrib application - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: pg_amcheck contrib application |
Date | |
Msg-id | CA+TgmoaHzOuFUzYKZ3rwJqJOOPH3P2c+M4AtLJeORpPdXBoTeA@mail.gmail.com Whole thread Raw |
In response to | Re: pg_amcheck contrib application (Mark Dilger <mark.dilger@enterprisedb.com>) |
Responses |
Re: pg_amcheck contrib application
|
List | pgsql-hackers |
On Mon, Mar 29, 2021 at 7:16 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > Sure, here are four patches which do the same as the single v12 patch did. Thanks. Here are some comments on 0003 and 0004: When you posted v11, you said that "Rather than print out all four toast pointer fields for each toast failure, va_rawsize, va_extsize, and va_toastrelid are only mentioned in the corruption message if they are related to the specific corruption. Otherwise, just the va_valueid is mentioned in the corruption message." I like that principal; in fact, as you know, I suggested it. But, with the v13 patches applied, exactly zero of the callers to report_toast_corruption() appear to be following it, because none of them include the value ID. I think you need to revise the messages, e.g. "toasted value for attribute %u missing from toast table" -> "toast value %u not found in toast table"; "final toast chunk number %u differs from expected value %u" -> "toast value %u was expected to end at chunk %u, but ended at chunk %u"; "toast chunk sequence number is null" -> "toast value %u has toast chunk with null sequence number". In the first of those example cases, I think you need not mention the attribute number because it's already there in its own column. On a related note, it doesn't look like you are actually checking va_toastrelid here. Doing so seems like it would be a good idea. It also seems like it would be good to check that the compressed size is less than or equal to the uncompressed size. I do not like the name tuple_is_volatile, because volatile has a couple of meanings already, and this isn't one of them. A SQL-callable function is volatile if it might return different outputs given the same inputs, even within the same SQL statement. A C variable is volatile if it might be magically modified in ways not known to the compiler. I had suggested tuple_cannot_die_now, which is closer to the mark. If you want to be even more precise, you could talk about whether the tuple is potentially prunable (e.g. tuple_can_be_pruned, which inverts the sense). That's really what we're worried about: whether MVCC rules would permit the tuple to be pruned after we release the buffer lock and before we check TOAST. I would ideally prefer not to rename report_corruption(). The old name is clearer, and changing it produces a bunch of churn that I'd rather avoid. Perhaps the common helper function could be called report_corruption_internal(), and the callers could be report_corruption() and report_toast_corruption(). Regarding 0001 and 0002, I think the logic in 0002 looks a lot closer to correct now, but I want to go through it in more detail. I think, though, that you've made some of my comments worse. For example, I wrote: "It should be impossible for xvac to still be running, since we've removed all that code, but even if it were, it ought to be safe to read the tuple, since the original inserter must have committed. But, if the xvac transaction committed, this tuple (and its associated TOAST tuples) could be pruned at any time." You changed that to read "We don't bother comparing against safe_xmin because the VACUUM FULL must have committed prior to an upgrade and can't still be running." Your comment is shorter, which is a point in its favor, but what I was trying to emphasize is that the logic would be correct EVEN IF we again started to use HEAP_MOVED_OFF and HEAP_MOVED_IN again. Your version makes it sound like the code would need to be revised in that case. If that's true, then my comment was wrong, but I didn't think it was true, or I wouldn't have written the comment in that way. Also, and maybe this is a job for a separate patch, but then again maybe not, I wonder if it's really a good idea for get_xid_status to return both a XidBoundsViolation and an XidCommitStatus. It seems to me (without checking all that carefully) that it might be better to just flatten all of that into a single enum, because right now it seems like you often end up with two consecutive switch statements where, perhaps, just one would suffice. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: