Re: Minimal logical decoding on standbys - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: Minimal logical decoding on standbys
Date
Msg-id 21b700c3-eecf-2e05-a699-f8c78dd31ec7@gmail.com
Whole thread Raw
In response to Re: Minimal logical decoding on standbys  (Andres Freund <andres@anarazel.de>)
Responses Re: Minimal logical decoding on standbys
Re: Minimal logical decoding on standbys
List pgsql-hackers
Hi,

On 3/31/23 6:33 AM, Andres Freund wrote:
> Hi,
> 
> On 2023-03-30 18:23:41 +0200, Drouvot, Bertrand wrote:
>> On 3/30/23 9:04 AM, Andres Freund wrote:
>>> I think this commit is ready to go. Unless somebody thinks differently, I
>>> think I might push it tomorrow.
>>
>> Great! Once done, I'll submit a new patch so that GlobalVisTestFor() can make
>> use of the heap relation in vacuumRedirectAndPlaceholder() (which will be possible
>> once 0001 is committed).
> 
> Unfortunately I did find an issue doing a pre-commit review of the patch.
> 
> The patch adds VISIBILITYMAP_IS_CATALOG_REL to xl_heap_visible.flags - but it
> does not remove the bit before calling visibilitymap_set().
> 
> This ends up corrupting the visibilitymap, because the we'll set a bit for
> another page.
> 

Oh I see, I did not think about that (not enough experience in the VM area).
Nice catch and thanks for pointing out!

> On a casual read, one very well might think that VISIBILITYMAP_IS_CATALOG_REL
> is a valid bit that could be set in the VM.
> 

I see what you're saying now and do agree that's confusing.

> I am thinking of instead creating a separate namespace for the "xlog only"
> bits:
> 
> /*
>   * To detect recovery conflicts during logical decoding on a standby, we need
>   * to know if a table is a user catalog table. For that we add an additional
>   * bit into xl_heap_visible.flags, in addition to the above.
>   *
>   * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
>   */
> #define VISIBILITYMAP_XLOG_CATALOG_REL    0x04
> #define VISIBILITYMAP_XLOG_VALID_BITS    (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
> 
> 
> That allows heap_xlog_visible() to do:
> 
>         Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
>         vmbits = (xlrec->flags & VISIBILITYMAP_VALID_BITS);
> 
> and pass vmbits istead of xlrec->flags to visibilitymap_set().
> 

That sounds good to me. That way you'd ensure that VISIBILITYMAP_XLOG_CATALOG_REL is not
passed to visibilitymap_set().

> 
> I'm also thinking of splitting the patch into two. One patch to pass down the
> heap relation into the new places, and another for the rest.

I think that makes sense. I don't know how far you've work on the split but please
find attached V54 doing such a split + implementing your VISIBILITYMAP_XLOG_VALID_BITS
suggestion.

> 
> Note that gistXLogDelete() continues to register data with two different
> XLogRegisterData() calls. This will append data without any padding:
> 
>     XLogRegisterData((char *) &xlrec, SizeOfGistxlogDelete);
> 
>     /*
>      * We need the target-offsets array whether or not we store the whole
>      * buffer, to allow us to find the snapshotConflictHorizon on a standby
>      * server.
>      */
>     XLogRegisterData((char *) todelete, ntodelete * sizeof(OffsetNumber));
> 
> 
> But replay now uses the new offset member:
> 
>> @@ -177,6 +177,7 @@ gistRedoDeleteRecord(XLogReaderState *record)
>>       gistxlogDelete *xldata = (gistxlogDelete *) XLogRecGetData(record);
>>       Buffer        buffer;
>>       Page        page;
>> +    OffsetNumber *toDelete = xldata->offsets;
>>
>>       /*
>>        * If we have any conflict processing to do, it must happen before we
> 
> 
> That doesn't look right. If there's any padding before offsets, we'll afaict
> read completely bogus data?
> 
> As it turns out, there is padding:
> 
> struct gistxlogDelete {
>          TransactionId              snapshotConflictHorizon; /*     0     4 */
>          uint16                     ntodelete;            /*     4     2 */
>          _Bool                      isCatalogRel;         /*     6     1 */
> 
>          /* XXX 1 byte hole, try to pack */
> 
>          OffsetNumber               offsets[];            /*     8     0 */
> 
>          /* size: 8, cachelines: 1, members: 4 */
>          /* sum members: 7, holes: 1, sum holes: 1 */
>          /* last cacheline: 8 bytes */
> };
> 
> 
> I am frankly baffled how this works at all, this should just about immediately
> crash?
> 
> 

Oh, I see. Hm, don't we have already the same issue for spgxlogVacuumRoot / vacuumLeafRoot() / spgRedoVacuumRoot()?

> 
> I'm not going to commit a nontrivial change to these WAL records without some
> minimal tests.
> 

That makes fully sense.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Infinite Interval
Next
From: Daniel Gustafsson
Date:
Subject: Re: TAP output format in pg_regress