Re: pg_amcheck contrib application - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: pg_amcheck contrib application
Date
Msg-id 4F10540A-9022-49EE-A6FC-7FE90797793E@enterprisedb.com
Whole thread Raw
Responses Re: pg_amcheck contrib application  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: pg_amcheck contrib application  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers

> On Mar 16, 2021, at 12:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Mar 15, 2021 at 10:10 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> It is unfortunate that the failing test only runs pg_amcheck after creating numerous corruptions, as we can't know
ifpg_amcheck would have complained about pg_statistic before the corruptions were created in other tables, or if it
onlydoes so after.  The attached patch v7-0003 adds a call to pg_amcheck after all tables are created and populated,
butbefore any corruptions are caused.  This should help narrow down what is happening, and doesn't hurt to leave in
placelong-term. 
>>
>> I don't immediately see anything wrong with how pg_statistic uses a pseudo-type, but it leads me to want to poke a
bitmore at pg_statistic on hornet and tern, though I don't have any regression tests specifically for doing so. 
>>
>> Tests v7-0001 and v7-0002 are just repeats of the tests posted previously.
>
> Since we now know that shutting autovacuum off makes the problem go
> away, I don't see a reason to commit 0001. We should fix pg_amcheck
> instead, if, as presently seems to be the case, that's where the
> problem is.

If you get unlucky, autovacuum will process one of the tables that the test intentionally corrupted, with bad
consequences,ultimately causing build farm intermittent test failures.  We could wait to see if this ever happens
beforefixing it, if you prefer.  I'm not sure what that buys us, though. 

The right approach, I think, is to extend the contrib/amcheck tests to include regressing this particular case to see
ifit fails.  I've written that test and verified that it fails against the old code and passes against the new. 

> I just committed 0002.

Thanks!

> I think 0003 is probably a good idea, but I haven't committed it yet.

It won't do anything for us in this particular case, but it might make debugging failures quicker in the future.

> As for 0004, it seems to me that we might want to do a little more
> rewording of these messages and perhaps we should try to do it all at
> once. Like, for example, your other change to print out the toast
> value ID seems like a good idea, and could apply to any new messages
> as well as some existing ones. Maybe there are also more fields in the
> TOAST pointer for which we could add checks.

Of the toast pointer fields:

    int32       va_rawsize;     /* Original data size (includes header) */
    int32       va_extsize;     /* External saved size (doesn't) */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */

all seem worth getting as part of any toast error message, even if these fields themselves are not corrupt.  It just
makesit easier to understand the context of the error you're looking at.  At first I tried putting these into each
message,but it is very wordy to say things like "toast pointer with rawsize %u and extsize %u pointing at relation with
oid%u" and such.  It made more sense to just add these four fields to the verify_heapam tuple format.  That saves
puttingthem in the message text itself, and has the benefit that you could filter the rows coming from verify_heapam()
forones where valueid is or is not null, for example.  This changes the external interface of verify_heapam, but I
didn'tbother with a amcheck--1.3--1.4.sql because amcheck--1.2--1.3. sql was added as part of the v14 development work
andhas not yet been released.  My assumption is that I can just change it, rather than making a new upgrade file. 

These patches fix the visibility rules and add extra toast checking.  Some of the previous patch material is not
included,since it is not clear to me if you wanted to commit any of it. 




—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachment

pgsql-hackers by date:

Previous
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: libpq debug log
Next
From: Pavel Stehule
Date:
Subject: Re: pl/pgsql feature request: shorthand for argument and local variable references