Hi,
On 3/31/23 6:33 AM, Andres Freund wrote:
> Hi,
>
> On 2023-03-30 18:23:41 +0200, Drouvot, Bertrand wrote:
>> On 3/30/23 9:04 AM, Andres Freund wrote:
>>> I think this commit is ready to go. Unless somebody thinks differently, I
>>> think I might push it tomorrow.
>>
>> Great! Once done, I'll submit a new patch so that GlobalVisTestFor() can make
>> use of the heap relation in vacuumRedirectAndPlaceholder() (which will be possible
>> once 0001 is committed).
>
> Unfortunately I did find an issue doing a pre-commit review of the patch.
>
> The patch adds VISIBILITYMAP_IS_CATALOG_REL to xl_heap_visible.flags - but it
> does not remove the bit before calling visibilitymap_set().
>
> This ends up corrupting the visibility map, because we'll set a bit for
> another page.
>
Oh, I see; I did not think about that (not enough experience in the VM area).
Nice catch, and thanks for pointing it out!
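
Just to make sure I understand the mechanics: visibilitymap_set() gives each
heap block two bits in the map and ORs the flags in, roughly (sketch based on
visibilitymap.c):

    map[mapByte] |= (flags << mapOffset);   /* mapOffset = 2 * (heapBlk % 4) */

so an unexpected 0x04 bit would shift straight into the bit pair belonging to
heapBlk + 1, which matches the "bit for another page" corruption you describe.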
> On a casual read, one very well might think that VISIBILITYMAP_IS_CATALOG_REL
> is a valid bit that could be set in the VM.
>
I see what you're saying now and do agree that's confusing.
> I am thinking of instead creating a separate namespace for the "xlog only"
> bits:
>
> /*
> * To detect recovery conflicts during logical decoding on a standby, we need
> * to know if a table is a user catalog table. For that we add an additional
> * bit into xl_heap_visible.flags, in addition to the above.
> *
> * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
> */
> #define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
> #define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
>
>
> That allows heap_xlog_visible() to do:
>
> Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
> vmbits = (xlrec->flags & VISIBILITYMAP_VALID_BITS);
>
> and pass vmbits instead of xlrec->flags to visibilitymap_set().
>
That sounds good to me. That way you'd ensure that VISIBILITYMAP_XLOG_CATALOG_REL is not
passed to visibilitymap_set().
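
On the emit side, the record construction would then set the xlog-only bit on
top of the real VM bits, something like (just a sketch, assuming the heap
relation is available in log_heap_visible() once the 0001 split is in):

    xlrec.flags = vmflags;
    if (RelationIsAccessibleInLogicalDecoding(rel))
        xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL; /* xlog-only bit */

while visibilitymap_set() itself keeps receiving only VISIBILITYMAP_VALID_BITS.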
>
> I'm also thinking of splitting the patch into two. One patch to pass down the
> heap relation into the new places, and another for the rest.
I think that makes sense. I don't know how far you've gotten with the split, but
please find attached V54, which does such a split and implements your
VISIBILITYMAP_XLOG_VALID_BITS suggestion.
>
> Note that gistXLogDelete() continues to register data with two different
> XLogRegisterData() calls. This will append data without any padding:
>
> XLogRegisterData((char *) &xlrec, SizeOfGistxlogDelete);
>
> /*
> * We need the target-offsets array whether or not we store the whole
> * buffer, to allow us to find the snapshotConflictHorizon on a standby
> * server.
> */
> XLogRegisterData((char *) todelete, ntodelete * sizeof(OffsetNumber));
>
>
> But replay now uses the new offset member:
>
>> @@ -177,6 +177,7 @@ gistRedoDeleteRecord(XLogReaderState *record)
>> gistxlogDelete *xldata = (gistxlogDelete *) XLogRecGetData(record);
>> Buffer buffer;
>> Page page;
>> + OffsetNumber *toDelete = xldata->offsets;
>>
>> /*
>> * If we have any conflict processing to do, it must happen before we
>
>
> That doesn't look right. If there's any padding before offsets, we'll afaict
> read completely bogus data?
>
> As it turns out, there is padding:
>
> struct gistxlogDelete {
> TransactionId snapshotConflictHorizon; /* 0 4 */
> uint16 ntodelete; /* 4 2 */
> _Bool isCatalogRel; /* 6 1 */
>
> /* XXX 1 byte hole, try to pack */
>
> OffsetNumber offsets[]; /* 8 0 */
>
> /* size: 8, cachelines: 1, members: 4 */
> /* sum members: 7, holes: 1, sum holes: 1 */
> /* last cacheline: 8 bytes */
> };
>
>
> I am frankly baffled how this works at all, this should just about immediately
> crash?
>
>
Oh, I see. Hmm, don't we already have the same issue for spgxlogVacuumRoot /
vacuumLeafRoot() / spgRedoVacuumRoot()?
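
For the padding issue, one way to make the emit and replay sides agree might be
to define the record "header" size so that it includes the alignment padding
before the flexible array member, e.g. (just a sketch):

    #define SizeOfGistxlogDelete offsetof(gistxlogDelete, offsets)

That way, the two XLogRegisterData() calls in gistXLogDelete() would append the
offsets array exactly where xldata->offsets expects to find it on replay.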
>
> I'm not going to commit a nontrivial change to these WAL records without some
> minimal tests.
>
That makes complete sense.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com