Re: new heapcheck contrib module - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: new heapcheck contrib module
Date
Msg-id 23E60C3C-C20A-4508-84AC-E0CEA59E784A@enterprisedb.com
In response to Re: new heapcheck contrib module  (Andres Freund <andres@anarazel.de>)
Responses Re: new heapcheck contrib module  (Mark Dilger <mark.dilger@enterprisedb.com>)
Re: new heapcheck contrib module  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

> On Apr 20, 2020, at 12:42 PM, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2020-04-20 10:59:28 -0700, Mark Dilger wrote:
>> I have been talking with Robert about table corruption that occurs
>> from time to time. The page checksum feature seems sufficient to
>> detect most random corruption problems, but it can't detect "logical"
>> corruption, where the page is valid but inconsistent with the rest of
>> the database cluster. This can happen due to faulty or ill-conceived
>> backup and restore tools, or bad storage, or user error, or bugs in
>> the server itself. (Also, not everyone enables checksums.)
>
> This is something we really really really need. I'm very excited to see
> progress!

Thanks for the review!

>> From 2a1bc0bb9fa94bd929adc1a408900cb925ebcdd5 Mon Sep 17 00:00:00 2001
>> From: Mark Dilger <mark.dilger@enterprisedb.com>
>> Date: Mon, 20 Apr 2020 08:05:58 -0700
>> Subject: [PATCH v2] Adding heapcheck contrib module.
>>
>> The heapcheck module introduces a new function for checking a heap
>> relation and associated toast relation, if any, for corruption.
>
> Why not add it to amcheck?

That seems to be the general consensus.  The functionality has been moved there, renamed as "verify_heapam", as that
seems more in line with the "verify_nbtree" name already present in that module.  The docs have also been moved there,
although not very gracefully.  It seems premature to polish the documentation given that the interface will likely
change at least one more time, to incorporate more of Peter's suggestions.  There are still design differences between
the two implementations that need to be harmonized.  The verify_heapam function returns rows detailing the corruption
found, which is inconsistent with how verify_nbtree does things.

> I wonder if a mode where heapcheck optionally would only checks
> non-frozen (perhaps also non-all-visible) regions of a table would be a
> good idea? Would make it a lot more viable to run this regularly on
> bigger databases. Even if there's a window to not check some data
> (because it's frozen before the next heapcheck run).

Perhaps we should come back to that.  Version 3 of this patch addresses concerns about the v2 patch without adding too
many new features.

>> The attached module provides the means to scan a relation and sanity
>> check it. Currently, it checks xmin and xmax values against
>> relfrozenxid and relminmxid, and also validates TOAST pointers. If
>> people like this, it could be expanded to perform additional checks.
>
>
>> The postgres backend already defends against certain forms of
>> corruption, by checking the page header of each page before allowing
>> it into the page cache, and by checking the page checksum, if enabled.
>> Experience shows that broken or ill-conceived backup and restore
>> mechanisms can result in a page, or an entire file, being overwritten
>> with an earlier version of itself, restored from backup.  Pages thus
>> overwritten will appear to have valid page headers and checksums,
>> while potentially containing xmin, xmax, and toast pointers that are
>> invalid.
>
> We also had a *lot* of bugs that we'd have found a lot earlier, possibly
> even during development, if we had a way to easily perform these checks.

I certainly hope this is useful for testing.

>> contrib/heapcheck introduces a function, heapcheck_relation, that
>> takes a regclass argument, scans the given heap relation, and returns
>> rows containing information about corruption found within the table.
>> The main focus of the scan is to find invalid xmin, xmax, and toast
>> pointer values.  It also checks for structural corruption within the
>> page (such as invalid t_hoff values) that could lead to the backend
>> aborting should the function blindly trust the data as it finds it.
>
>
>> +typedef struct CorruptionInfo
>> +{
>> +    BlockNumber blkno;
>> +    OffsetNumber offnum;
>> +    int16        lp_off;
>> +    int16        lp_flags;
>> +    int16        lp_len;
>> +    int32        attnum;
>> +    int32        chunk;
>> +    char       *msg;
>> +}            CorruptionInfo;
>
> Adding a short comment explaining what this is for would be good.

This struct has been removed.

>> +/* Internal implementation */
>> +void        record_corruption(HeapCheckContext * ctx, char *msg);
>> +TupleDesc    heapcheck_relation_tupdesc(void);
>> +
>> +void        beginRelBlockIteration(HeapCheckContext * ctx);
>> +bool        relBlockIteration_next(HeapCheckContext * ctx);
>> +void        endRelBlockIteration(HeapCheckContext * ctx);
>> +
>> +void        beginPageTupleIteration(HeapCheckContext * ctx);
>> +bool        pageTupleIteration_next(HeapCheckContext * ctx);
>> +void        endPageTupleIteration(HeapCheckContext * ctx);
>> +
>> +void        beginTupleAttributeIteration(HeapCheckContext * ctx);
>> +bool        tupleAttributeIteration_next(HeapCheckContext * ctx);
>> +void        endTupleAttributeIteration(HeapCheckContext * ctx);
>> +
>> +void        beginToastTupleIteration(HeapCheckContext * ctx,
>> +                                     struct varatt_external *toast_pointer);
>> +void        endToastTupleIteration(HeapCheckContext * ctx);
>> +bool        toastTupleIteration_next(HeapCheckContext * ctx);
>> +
>> +bool        TransactionIdStillValid(TransactionId xid, FullTransactionId *fxid);
>> +bool        HeapTupleIsVisible(HeapTupleHeader tuphdr, HeapCheckContext * ctx);
>> +void        check_toast_tuple(HeapCheckContext * ctx);
>> +bool        check_tuple_attribute(HeapCheckContext * ctx);
>> +void        check_tuple(HeapCheckContext * ctx);
>> +
>> +List       *check_relation(Oid relid);
>> +void        check_relation_relkind(Relation rel);
>
> Why aren't these static?

They are now, except for the iterator style functions, which are gone.

>> +/*
>> + * record_corruption
>> + *
>> + *   Record a message about corruption, including information
>> + *   about where in the relation the corruption was found.
>> + */
>> +void
>> +record_corruption(HeapCheckContext * ctx, char *msg)
>> +{
>
> Given that you went through the trouble of adding prototypes for all of
> these, I'd start with the most important functions, not the unimportant
> details.

Yeah, good idea.  The most important functions are now at the top.

>> +/*
>> + * Helper function to construct the TupleDesc needed by heapcheck_relation.
>> + */
>> +TupleDesc
>> +heapcheck_relation_tupdesc()
>
> Missing (void) (it's our style, even though you could theoretically not
> have it as long as you have a prototype).

That was unintentional, and is now fixed.

>> +{
>> +    TupleDesc    tupdesc;
>> +    AttrNumber    maxattr = 8;
>
> This 8 is in multiple places, I'd add a define for it.

Done.

>> +    AttrNumber    a = 0;
>> +
>> +    tupdesc = CreateTemplateTupleDesc(maxattr);
>> +    TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
>> +    TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
>> +    TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
>> +    TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
>> +    TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
>> +    TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
>> +    TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
>> +    TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
>> +    Assert(a == maxattr);
>> +
>> +    return BlessTupleDesc(tupdesc);
>> +}
>
>
>> +/*
>> + * heapcheck_relation
>> + *
>> + *   Scan and report corruption in heap pages or in associated toast relation.
>> + */
>> +Datum
>> +heapcheck_relation(PG_FUNCTION_ARGS)
>> +{
>> +    FuncCallContext *funcctx;
>> +    CheckRelCtx *ctx;
>> +
>> +    if (SRF_IS_FIRSTCALL())
>> +    {
>
> I think it'd be good to have a version that just returned a boolean. For
> one, in many cases that's all we care about when scripting things. But
> also, on a large relation, there could be a lot of errors.

There is now a second parameter to the function, "stop_on_error".  The function performs exactly the same checks, but
returns after the first page that contains corruption.
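To illustrate the intended behavior (this is a stand-alone sketch, not the actual patch code; is_corrupt() and the integer "pages" are stand-ins for the real page checks):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the per-page corruption checks: nonzero means corrupt. */
static bool
is_corrupt(const int *page)
{
    return *page != 0;
}

/*
 * Scan every page, reporting corruption as it is found.  With
 * stop_on_error set, finish the first page that contains corruption
 * and return without scanning further pages.  Returns the number of
 * corrupt pages seen.
 */
static int
check_pages(const int *pages, size_t npages, bool stop_on_error)
{
    int         ncorrupt = 0;
    size_t      blkno;

    for (blkno = 0; blkno < npages; blkno++)
    {
        if (is_corrupt(&pages[blkno]))
        {
            ncorrupt++;             /* report corruption on this page */
            if (stop_on_error)
                break;              /* bail out after the first bad page */
        }
    }
    return ncorrupt;
}
```

So a scripted caller that only cares whether anything is wrong pays for at most one corrupt page's worth of reporting.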

>> +        Oid            relid = PG_GETARG_OID(0);
>> +        MemoryContext oldcontext;
>> +
>> +        /*
>> +         * Scan the entire relation, building up a list of corruption found in
>> +         * ctx->corruption, for returning later.  The scan must be performed
>> +         * in a memory context that will survive until after all rows are
>> +         * returned.
>> +         */
>> +        funcctx = SRF_FIRSTCALL_INIT();
>> +        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
>> +        funcctx->tuple_desc = heapcheck_relation_tupdesc();
>> +        ctx = (CheckRelCtx *) palloc0(sizeof(CheckRelCtx));
>> +        ctx->corruption = check_relation(relid);
>> +        ctx->idx = 0;            /* start the iterator at the beginning */
>> +        funcctx->user_fctx = (void *) ctx;
>> +        MemoryContextSwitchTo(oldcontext);
>
> Hm. This builds up all the errors in memory. Is that a good idea? I mean
> for a large relation having one returned value for each tuple could be a
> heck of a lot of data.
>
> I think it'd be better to use the spilling SRF protocol here.  It's not
> like you're benefitting from deferring the tuple construction to the
> return currently.

Done.

>> +/*
>> + * beginRelBlockIteration
>> + *
>> + *   For the given heap relation being checked, as recorded in ctx, sets up
>> + *   variables for iterating over the heap's pages.
>> + *
>> + *   The caller should have already opened the heap relation, ctx->rel
>> + */
>> +void
>> +beginRelBlockIteration(HeapCheckContext * ctx)
>> +{
>> +    ctx->nblocks = RelationGetNumberOfBlocks(ctx->rel);
>> +    ctx->blkno = InvalidBlockNumber;
>> +    ctx->bstrategy = GetAccessStrategy(BAS_BULKREAD);
>> +    ctx->buffer = InvalidBuffer;
>> +    ctx->page = NULL;
>> +}
>> +
>> +/*
>> + * endRelBlockIteration
>> + *
>> + *   Releases resources that were reserved by either beginRelBlockIteration or
>> + *   relBlockIteration_next.
>> + */
>> +void
>> +endRelBlockIteration(HeapCheckContext * ctx)
>> +{
>> +    /*
>> +     * Clean up.  If the caller iterated to the end, the final call to
>> +     * relBlockIteration_next will already have released the buffer, but if
>> +     * the caller is bailing out early, we have to release it ourselves.
>> +     */
>> +    if (InvalidBuffer != ctx->buffer)
>> +        UnlockReleaseBuffer(ctx->buffer);
>> +}
>
> These seem mighty granular and generically named to me.

Removed.

>> + * pageTupleIteration_next
>> + *
>> + *   Advances the state tracked in ctx to the next tuple on the page.
>> + *
>> + *   Caller should have already set up the iteration via
>> + *   beginPageTupleIteration, and should stop calling when this function
>> + *   returns false.
>> + */
>> +bool
>> +pageTupleIteration_next(HeapCheckContext * ctx)
>
> I don't think this is a naming scheme we use anywhere in postgres. I
> don't think it's a good idea to add yet more of those.

Removed.

>> +{
>> +    /*
>> +     * Iterate to the next interesting line pointer, if any. Unused, dead and
>> +     * redirect line pointers are of no interest.
>> +     */
>> +    do
>> +    {
>> +        ctx->offnum = OffsetNumberNext(ctx->offnum);
>> +        if (ctx->offnum > ctx->maxoff)
>> +            return false;
>> +        ctx->itemid = PageGetItemId(ctx->page, ctx->offnum);
>> +    } while (!ItemIdIsUsed(ctx->itemid) ||
>> +             ItemIdIsDead(ctx->itemid) ||
>> +             ItemIdIsRedirected(ctx->itemid));
>
> This is an odd loop. Part of the test is in the body, part of in the
> loop header.

Refactored.
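The refactored shape is roughly the following (a simplified sketch with an int array of flags standing in for the real ItemId array and its macros):

```c
#include <assert.h>

/* Simplified stand-ins for the line pointer states. */
enum
{
    LP_UNUSED,
    LP_NORMAL,
    LP_REDIRECT,
    LP_DEAD
};

/*
 * Walk all line pointers on a (simulated) page in a single for loop,
 * skipping uninteresting items with one continue rather than splitting
 * the test between a do/while header and its body.  Returns the number
 * of items that would be handed to check_tuple().
 */
static int
count_interesting(const int *lp, int maxoff)
{
    int         n = 0;
    int         offnum;

    for (offnum = 1; offnum <= maxoff; offnum++)
    {
        int         flags = lp[offnum - 1];

        /* Unused, dead and redirect line pointers are of no interest. */
        if (flags == LP_UNUSED || flags == LP_DEAD || flags == LP_REDIRECT)
            continue;
        n++;                        /* a normal item: check_tuple() here */
    }
    return n;
}
```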

>> +/*
>> + * Given a TransactionId, attempt to interpret it as a valid
>> + * FullTransactionId, neither in the future nor overlong in
>> + * the past.  Stores the inferred FullTransactionId in *fxid.
>> + *
>> + * Returns whether the xid is newer than the oldest clog xid.
>> + */
>> +bool
>> +TransactionIdStillValid(TransactionId xid, FullTransactionId *fxid)
>
> I don't at all like the naming of this function. This isn't a reliable
> check. As before, it obviously also shouldn't be static.

Renamed and refactored.

>> +{
>> +    FullTransactionId fnow;
>> +    uint32        epoch;
>> +
>> +    /* Initialize fxid; we'll overwrite this later if needed */
>> +    *fxid = FullTransactionIdFromEpochAndXid(0, xid);
>
>> +    /* Special xids can quickly be turned into invalid fxids */
>> +    if (!TransactionIdIsValid(xid))
>> +        return false;
>> +    if (!TransactionIdIsNormal(xid))
>> +        return true;
>> +
>> +    /*
>> +     * Charitably infer the full transaction id as being within one epoch ago
>> +     */
>> +    fnow = ReadNextFullTransactionId();
>> +    epoch = EpochFromFullTransactionId(fnow);
>> +    *fxid = FullTransactionIdFromEpochAndXid(epoch, xid);
>
> So now you're overwriting the fxid value from above unconditionally?
>
>
>> +    if (!FullTransactionIdPrecedes(*fxid, fnow))
>> +        *fxid = FullTransactionIdFromEpochAndXid(epoch - 1, xid);
>
>
> I think it'd be better to do the conversion the following way:
>
>    *fxid = FullTransactionIdFromU64(U64FromFullTransactionId(fnow)
>                                    + (int32) (XidFromFullTransactionId(fnow) - xid));

This has been refactored to the point that these review comments cannot be directly replied to.
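For the archives, the signed-difference idea works out to something like the sketch below. The sign conventions here are my own (an xid older than the low 32 bits of the next fxid must map *below* it), so this should be double-checked against the quoted formula rather than taken as the patch code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Infer a 64-bit "full" transaction id from a 32-bit xid, given the
 * next full transaction id.  Assumes the xid lies within 2^31 of the
 * low 32 bits of next_fxid, i.e. it is a plausible recent xid rather
 * than wrapped-around garbage.  The unsigned subtraction followed by
 * the cast to int32_t yields a signed distance, so an xid just below
 * an epoch boundary correctly maps into the previous epoch.
 */
static uint64_t
infer_fxid(uint64_t next_fxid, uint32_t xid)
{
    int32_t     diff = (int32_t) ((uint32_t) next_fxid - xid);

    return next_fxid - (uint64_t) diff;
}
```

For example, with next_fxid at epoch 1, xid 100, an xid of 4294967290 (six before wraparound) lands in epoch 0 rather than being misread as nearly a full epoch in the future.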

>> +    if (!FullTransactionIdPrecedes(*fxid, fnow))
>> +        return false;
>> +    /* The oldestClogXid is protected by CLogTruncationLock */
>> +    Assert(LWLockHeldByMe(CLogTruncationLock));
>> +    if (TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid))
>> +        return false;
>> +    return true;
>> +}
>
> Why is this testing oldestClogXid instead of oldestXid?

References to clog have been refactored out of this module.

>> +/*
>> + * HeapTupleIsVisible
>> + *
>> + *    Determine whether tuples are visible for heapcheck.  Similar to
>> + *  HeapTupleSatisfiesVacuum, but with critical differences.
>> + *
>> + *  1) Does not touch hint bits.  It seems imprudent to write hint bits
>> + *     to a table during a corruption check.
>> + *  2) Gracefully handles xids that are too old by calling
>> + *     TransactionIdStillValid before TransactionLogFetch, thus avoiding
>> + *     a backend abort.
>
> I think it'd be better to protect against this by avoiding checks for
> xids that are older than relfrozenxid. And ones that are newer than
> ReadNextTransactionId().  But all of those cases should be errors
> anyway, so it doesn't seem like that should be handled within the
> visibility routine.

The new implementation caches a range of expected xids.  With the relation locked against concurrent vacuum runs, it
can trust that the old end of the range won't move during the course of the scan.  The newest end may move, but it only
has to check for that when it encounters a newer than expected xid, and it updates the cache with the new maximum.
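In outline (a self-contained sketch; refresh_next_xid() and the plain 64-bit comparisons stand in for re-reading the shared next-xid counter and real TransactionId comparisons):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cached range of xids considered sane: [oldest, next). */
typedef struct XidRange
{
    uint64_t    oldest;         /* cannot move while vacuum is locked out */
    uint64_t    next;           /* may lag behind; refreshed on demand */
} XidRange;

/* Simulated shared next-xid counter. */
static uint64_t current_next_xid = 1000;

static uint64_t
refresh_next_xid(void)
{
    return current_next_xid;
}

/*
 * Return whether xid falls in the expected range.  Xids older than the
 * cached minimum are corrupt outright; xids at or above the cached
 * maximum force one refresh of the cache before being declared corrupt,
 * since the shared counter may simply have advanced during the scan.
 */
static bool
xid_in_bounds(XidRange *r, uint64_t xid)
{
    if (xid < r->oldest)
        return false;           /* predates relfrozenxid: corrupt */
    if (xid >= r->next)
    {
        r->next = refresh_next_xid();   /* cache may be stale */
        if (xid >= r->next)
            return false;       /* genuinely from the future: corrupt */
    }
    return true;
}
```

The common case thus touches no shared state at all; only a newer-than-cached xid pays for the refresh.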

>
>> + *  3) Only makes a boolean determination of whether heapcheck should
>> + *     see the tuple, rather than doing extra work for vacuum-related
>> + *     categorization.
>> + */
>> +bool
>> +HeapTupleIsVisible(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
>> +{
>
>> +    FullTransactionId fxmin,
>> +                fxmax;
>> +    uint16        infomask = tuphdr->t_infomask;
>> +    TransactionId xmin = HeapTupleHeaderGetXmin(tuphdr);
>> +
>> +    if (!HeapTupleHeaderXminCommitted(tuphdr))
>> +    {
>
> Hm. I wonder if it'd be good to crosscheck the xid committed hint bits
> with clog?

This is not done in v3, as it no longer checks clog.

>> +        else if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuphdr)))
>> +        {
>> +            LWLockRelease(CLogTruncationLock);
>> +            return false;        /* HEAPTUPLE_DEAD */
>> +        }
>
> Note that this actually can error out, if xmin is a subtransaction xid,
> because pg_subtrans is truncated a lot more aggressively than anything
> else. I think you'd need to filter against subtransactions older than
> RecentXmin before here, and treat that as an error.

Calls to TransactionIdDidCommit are now preceded by checks that the xid argument is not too old.

>> +    if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
>> +    {
>> +        if (infomask & HEAP_XMAX_IS_MULTI)
>> +        {
>> +            TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
>> +
>> +            /* not LOCKED_ONLY, so it has to have an xmax */
>> +            if (!TransactionIdIsValid(xmax))
>> +            {
>> +                record_corruption(ctx, _("heap tuple with XMAX_IS_MULTI is "
>> +                                         "neither LOCKED_ONLY nor has a "
>> +                                         "valid xmax"));
>> +                return false;
>> +            }
>
> I think it's bad to have code like this in a routine that's named like a
> generic visibility check routine.

Renamed.

>> +            if (TransactionIdIsInProgress(xmax))
>> +                return false;    /* HEAPTUPLE_DELETE_IN_PROGRESS */
>> +
>> +            LWLockAcquire(CLogTruncationLock, LW_SHARED);
>> +            if (!TransactionIdStillValid(xmax, &fxmax))
>> +            {
>> +                LWLockRelease(CLogTruncationLock);
>> +                record_corruption(ctx, psprintf("tuple xmax = %u (interpreted "
>> +                                                "as " UINT64_FORMAT
>> +                                                ") not or no longer valid",
>> +                                                xmax, fxmax.value));
>> +                return false;
>> +            }
>> +            else if (TransactionIdDidCommit(xmax))
>> +            {
>> +                LWLockRelease(CLogTruncationLock);
>> +                return false;    /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
>> +            }
>> +            LWLockRelease(CLogTruncationLock);
>> +            /* Ok, the tuple is live */
>
> I don't think random interspersed uses of CLogTruncationLock are a good
> idea. If you move to only checking visibility after tuple fits into
> [relfrozenxid, nextXid), then you don't need to take any locks here, as
> long as a lock against vacuum is taken (which I think this should do
> anyway).

Done.

>> +/*
>> + * check_tuple
>> + *
>> + *   Checks the current tuple as tracked in ctx for corruption.  Records any
>> + *   corruption found in ctx->corruption.
>> + *
>> + *   The caller should have iterated to a tuple via pageTupleIteration_next.
>> + */
>> +void
>> +check_tuple(HeapCheckContext * ctx)
>> +{
>> +    bool        fatal = false;
>
> Wait, aren't some checks here duplicate with ones in
> HeapTupleIsVisible()?

Yeah, there was some overlap.  That should be better now.

>> +    /* Check relminmxid against mxid, if any */
>> +    if (ctx->infomask & HEAP_XMAX_IS_MULTI &&
>> +        MultiXactIdPrecedes(ctx->xmax, ctx->relminmxid))
>> +    {
>> +        record_corruption(ctx, psprintf("tuple xmax = %u precedes relation "
>> +                                        "relminmxid = %u",
>> +                                        ctx->xmax, ctx->relminmxid));
>> +    }
>
> It's pretty weird that the routines here access xmin/xmax/... via
> HeapCheckContext, but HeapTupleIsVisible() doesn't.

Fair point.  HeapCheckContext no longer has fields for xmin/xmax after the refactoring.

>> +    /* Check xmin against relfrozenxid */
>> +    if (TransactionIdIsNormal(ctx->relfrozenxid) &&
>> +        TransactionIdIsNormal(ctx->xmin) &&
>> +        TransactionIdPrecedes(ctx->xmin, ctx->relfrozenxid))
>> +    {
>> +        record_corruption(ctx, psprintf("tuple xmin = %u precedes relation "
>> +                                        "relfrozenxid = %u",
>> +                                        ctx->xmin, ctx->relfrozenxid));
>> +    }
>> +
>> +    /* Check xmax against relfrozenxid */
>> +    if (TransactionIdIsNormal(ctx->relfrozenxid) &&
>> +        TransactionIdIsNormal(ctx->xmax) &&
>> +        TransactionIdPrecedes(ctx->xmax, ctx->relfrozenxid))
>> +    {
>> +        record_corruption(ctx, psprintf("tuple xmax = %u precedes relation "
>> +                                        "relfrozenxid = %u",
>> +                                        ctx->xmax, ctx->relfrozenxid));
>> +    }
>
> these all should be fatal. You definitely cannot just continue
> afterwards given the justification below:

They are now fatal.

>> +    /*
>> +     * Iterate over the attributes looking for broken toast values. This
>> +     * roughly follows the logic of heap_deform_tuple, except that it doesn't
>> +     * bother building up isnull[] and values[] arrays, since nobody wants
>> +     * them, and it unrolls anything that might trip over an Assert when
>> +     * processing corrupt data.
>> +     */
>> +    beginTupleAttributeIteration(ctx);
>> +    while (tupleAttributeIteration_next(ctx) &&
>> +           check_tuple_attribute(ctx))
>> +        ;
>> +    endTupleAttributeIteration(ctx);
>> +}
>
> I really don't find these helpers helpful.

Removed.

>> +/*
>> + * check_relation
>> + *
>> + *   Checks the relation given by relid for corruption, returning a list of all
>> + *   it finds.
>> + *
>> + *   The caller should set up the memory context as desired before calling.
>> + *   The returned list belongs to the caller.
>> + */
>> +List *
>> +check_relation(Oid relid)
>> +{
>> +    HeapCheckContext ctx;
>> +
>> +    memset(&ctx, 0, sizeof(HeapCheckContext));
>> +
>> +    /* Open the relation */
>> +    ctx.relid = relid;
>> +    ctx.corruption = NIL;
>> +    ctx.rel = relation_open(relid, AccessShareLock);
>
> I think you need to protect at least against concurrent schema changes
> given some of your checks. But I think it'd be better to also conflict
> with vacuum here.

The relation is now opened with ShareUpdateExclusiveLock.

>
>> +    check_relation_relkind(ctx.rel);
>
> I think you also need to ensure that the table is actually using heap
> AM, not another tableam. Oh - you're doing that inside the check. But
> that's confusing, because that's not 'relkind'.

It is checking both relkind and relam.  The function has been renamed to reflect that.

>> +    ctx.relDesc = RelationGetDescr(ctx.rel);
>> +    ctx.rel_natts = RelationGetDescr(ctx.rel)->natts;
>> +    ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
>> +    ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
>
> three naming schemes in three lines...

Fixed.

>> +    /* check all blocks of the relation */
>> +    beginRelBlockIteration(&ctx);
>> +    while (relBlockIteration_next(&ctx))
>> +    {
>> +        /* Perform tuple checks */
>> +        beginPageTupleIteration(&ctx);
>> +        while (pageTupleIteration_next(&ctx))
>> +            check_tuple(&ctx);
>> +        endPageTupleIteration(&ctx);
>> +    }
>> +    endRelBlockIteration(&ctx);
>
> I again do not find this helper stuff helpful.

Removed.

>> +    /* Close the associated toast table and indexes, if any. */
>> +    if (ctx.has_toastrel)
>> +    {
>> +        toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
>> +                            AccessShareLock);
>> +        table_close(ctx.toastrel, AccessShareLock);
>> +    }
>> +
>> +    /* Close the main relation */
>> +    relation_close(ctx.rel, AccessShareLock);
>
> Why the closing here?

As opposed to where...?  It seems fairly standard to close the relation in the function where it was opened.  Do you
prefer that the relation not be closed?  Or that it be closed but the lock retained?

>> +# This regression test demonstrates that the heapcheck_relation() function
>> +# supplied with this contrib module correctly identifies specific kinds of
>> +# corruption within pages.  To test this, we need a mechanism to create corrupt
>> +# pages with predictable, repeatable corruption.  The postgres backend cannot be
>> +# expected to help us with this, as its design is not consistent with the goal
>> +# of intentionally corrupting pages.
>> +#
>> +# Instead, we create a table to corrupt, and with careful consideration of how
>> +# postgresql lays out heap pages, we seek to offsets within the page and
>> +# overwrite deliberately chosen bytes with specific values calculated to
>> +# corrupt the page in expected ways.  We then verify that heapcheck_relation
>> +# reports the corruption, and that it runs without crashing.  Note that the
>> +# backend cannot simply be started to run queries against the corrupt table, as
>> +# the backend will crash, at least for some of the corruption types we
>> +# generate.
>> +#
>> +# Autovacuum potentially touching the table in the background makes the exact
>> +# behavior of this test harder to reason about.  We turn it off to keep things
>> +# simpler.  We use a "belt and suspenders" approach, turning it off for the
>> +# system generally in postgresql.conf, and turning it off specifically for the
>> +# test table.
>> +#
>> +# This test depends on the table being written to the heap file exactly as we
>> +# expect it to be, so we take care to arrange the columns of the table, and
>> +# insert rows of the table, that give predictable sizes and locations within
>> +# the table page.
>
> I have a hard time believing this is going to be really
> reliable. E.g. the alignment requirements will vary between platforms,
> leading to different layouts. In particular, MAXALIGN differs between
> platforms.
>
> Also, it's supported to compile postgres with a different pagesize.

It's simple enough to extend the TAP test a little to check for those things.  In v3, the TAP test skips tests if the
page size is not 8k, and also if the tuples do not fall on the page where expected (which would happen due to alignment
issues, gremlins, or whatever).  There are other approaches, though.  The HeapFile/HeapPage/HeapTuple perl modules
recently submitted on another thread *could* be used here, but only if those modules are likely to be committed.  This
test *could* be extended to autodetect the page size and alignment issues and calculate at runtime where tuples will be
on the page, but only if folks don't mind the test having that extra complexity in it.  (There is a school of thought
that regression tests should avoid excess complexity.)  Do you have a recommendation about which way to go with this?
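To show why the layout is platform-sensitive, here is the rough arithmetic the test relies on (an illustrative sketch with hardcoded constants, not values read from a real build, and ignoring the page header and line pointer array for simplicity):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative build constants; real builds may differ. */
#define BLCKSZ      8192
#define MAXALIGN_OF 8
#define MAXALIGN(x) (((uintptr_t) (x) + (MAXALIGN_OF - 1)) & \
                     ~((uintptr_t) (MAXALIGN_OF - 1)))

/*
 * Heap tuples are placed from the end of the page downward, each
 * rounded up to a MAXALIGN boundary.  Returns the byte offset of the
 * n'th tuple (0-based) when every tuple has the same raw size.  A
 * different MAXALIGN or BLCKSZ shifts every offset, which is why the
 * test must either pin those values or compute them at runtime.
 */
static unsigned
tuple_offset(unsigned raw_size, unsigned n)
{
    unsigned    per_tuple = (unsigned) MAXALIGN(raw_size);

    return BLCKSZ - (n + 1) * per_tuple;
}
```

On a MAXALIGN-of-4 platform a 58-byte tuple rounds to 60 bytes instead of 64, and every seek offset in the test would be wrong, hence the skip logic.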

Here is the work thus far:


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachment
