Re: Failures in constraints regression test, "read only 0 of 8192 bytes" - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
Date
Msg-id CA+hUKG+XOrCi3UwiK5dNL_B8Eav6hMk334L4Qpctfw4MPDUYaw@mail.gmail.com
In response to Re: Failures in constraints regression test, "read only 0 of 8192 bytes"  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
List pgsql-hackers
On Sun, Mar 10, 2024 at 5:02 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Thanks, reproduced here (painfully slowly).  Looking...

I changed that ERROR to a PANIC and now I can see that
_bt_metaversion() is failing to read a meta page (block 0), and the
file is indeed of size 0 in my filesystem.  Which is not cool, for a
btree.  Looking at btbuildempty(), we have this sequence:

    bulkstate = smgr_bulk_start_rel(index, INIT_FORKNUM);

    /* Construct metapage. */
    metabuf = smgr_bulk_get_buf(bulkstate);
    _bt_initmetapage((Page) metabuf, P_NONE, 0, allequalimage);
    smgr_bulk_write(bulkstate, BTREE_METAPAGE, metabuf, true);

    smgr_bulk_finish(bulkstate);

Ooh.  One idea: maybe the smgr lifetime stuff is b0rked, introducing
corruption.  Bulk write itself doesn't pin the smgr relation; it relies
purely on the object not being invalidated.  The theory in 21d9c3ee's
commit message allowed for that, but ... here the object is destroyed
(HASH_REMOVE'd) sooner under CACHE_CLOBBER_ALWAYS, which I think we
failed to grok.  If that's it, I'm surprised that things don't implode
more spectacularly.  Perhaps HASH_REMOVE should clobber objects in debug
builds, similar to pfree?
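To make the "clobber on HASH_REMOVE" idea concrete, here is a minimal, self-contained sketch (not PostgreSQL code). It assumes a fixed pool whose removed entries go back on a free list and are reused, roughly the way dynahash recycles removed elements instead of freeing them. All names (`Entry`, `entry_create`, `entry_remove`, `stale_key_after_invalidation`) and the 0x7F fill byte are hypothetical. Without clobbering, a holder that keeps a raw pointer across an invalidation quietly reads stale but plausible data, or another object's data once the slot is reused, which would explain why nothing implodes spectacularly. With clobbering, the stale read returns an obviously bogus pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a cached per-relation object.  Removed
 * entries are pushed onto a free list and recycled, never freed, so a
 * dangling pointer still points at valid (but reused) memory.
 */
typedef struct Entry
{
    int         key;        /* e.g. a relation file number */
    int         nblocks;    /* some cached state */
    struct Entry *next_free;
} Entry;

#define POOL_SIZE    4
#define CLOBBER_BYTE 0x7F   /* assumption: fill pattern, mimicking debug pfree */

static Entry pool[POOL_SIZE];
static Entry *free_list = NULL;
static bool clobber_on_remove = false;

static void
pool_init(bool clobber)
{
    free_list = NULL;
    for (int i = POOL_SIZE - 1; i >= 0; i--)
    {
        pool[i].next_free = free_list;
        free_list = &pool[i];
    }
    clobber_on_remove = clobber;
}

static Entry *
entry_create(int key)
{
    Entry      *e = free_list;

    assert(e != NULL);          /* pool exhausted */
    free_list = e->next_free;
    e->key = key;
    e->nblocks = 0;
    e->next_free = NULL;
    return e;
}

/* Remove an entry; in "debug builds" wipe it first, like pfree does. */
static void
entry_remove(Entry *e)
{
    if (clobber_on_remove)
        memset(e, CLOBBER_BYTE, sizeof(*e));
    e->next_free = free_list;   /* recycle the slot (LIFO) */
    free_list = e;
}

/* True if a value consists entirely of the clobber byte. */
static bool
looks_clobbered(int value)
{
    int         pattern;

    memset(&pattern, CLOBBER_BYTE, sizeof(pattern));
    return value == pattern;
}

/*
 * Simulate a holder (think: a bulk writer) that keeps a raw pointer while
 * an invalidation removes the entry, then uses the pointer again.
 * Returns the key the holder sees through its stale pointer.
 */
static int
stale_key_after_invalidation(bool clobber, bool reuse_before_use)
{
    Entry      *held;

    pool_init(clobber);
    held = entry_create(42);        /* holder's unpinned reference */
    entry_remove(held);             /* cache invalidation removes it */
    if (reuse_before_use)
        (void) entry_create(99);    /* someone opens a different object */
    return held->key;               /* dangling use */
}
```

For example, `stale_key_after_invalidation(false, false)` still returns 42, so the bug stays silent; with clobbering the same call yields the fill pattern; and `stale_key_after_invalidation(false, true)` returns 99, the key of the unrelated entry that reused the slot.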

Under that hypothesis, the corruption might not be happening in the
code quoted above, because that code doesn't seem to have an
invalidation acceptance point (unless I'm missing one).  Perhaps some
other bulk write got mixed up?  Not sure yet.

I won't be surprised if the answer is: if you're holding a reference,
you have to get a pin (referring to bulk_write.c).
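A minimal sketch of the "if you hold a reference, take a pin" rule, assuming a simple pin count on a hypothetical `PinnedEntry` (this is not the real SMgrRelation or bulk_write.c API; every name here is illustrative). An invalidation that arrives while the object is pinned only marks it, and destruction is deferred until the last unpin, so a holder can never observe a destroyed object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pinnable cache entry. */
typedef struct PinnedEntry
{
    int         key;
    int         pincount;       /* holders that need the object alive */
    bool        invalidated;    /* removal requested while pinned */
    bool        destroyed;      /* object has been (logically) released */
} PinnedEntry;

static void
entry_destroy(PinnedEntry *e)
{
    e->destroyed = true;
}

static void
entry_pin(PinnedEntry *e)
{
    assert(!e->destroyed);
    e->pincount++;
}

/* Invalidation: destroy now only if unpinned, otherwise defer. */
static void
entry_invalidate(PinnedEntry *e)
{
    if (e->pincount == 0)
        entry_destroy(e);
    else
        e->invalidated = true;
}

/* The last unpin performs any deferred destruction. */
static void
entry_unpin(PinnedEntry *e)
{
    assert(e->pincount > 0);
    if (--e->pincount == 0 && e->invalidated)
        entry_destroy(e);
}

/* Unpinned object: invalidation destroys it immediately. */
static bool
unpinned_entry_destroyed_immediately(void)
{
    PinnedEntry e = {.key = 7};

    entry_invalidate(&e);
    return e.destroyed;
}

/*
 * Pinned object: a hypothetical bulk writer pins for as long as it holds
 * the reference, so an invalidation mid-write cannot pull it away.
 */
static bool
bulk_write_survives_invalidation(void)
{
    PinnedEntry e = {.key = 42};
    bool        alive_during_write;

    entry_pin(&e);                  /* start of the bulk write */
    entry_invalidate(&e);           /* cache clobbering fires mid-write */
    alive_during_write = !e.destroyed && e.key == 42;
    entry_unpin(&e);                /* end of the bulk write */
    return alive_during_write && e.destroyed;
}
```

The design choice is the usual one for reference-counted caches: invalidation never frees an object out from under a holder, and the cost is that every long-lived reference must be paired with an unpin.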
