Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: Adding REPACK [concurrently]
Date
Msg-id 11247.1767609087@localhost
Whole thread Raw
In response to Re: Adding REPACK [concurrently]  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: Adding REPACK [concurrently]
List pgsql-hackers
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> On 2025-Dec-13, Antonin Houska wrote:
>
> > From 6279394135f2b693b6fffd174822509e0a067cbf Mon Sep 17 00:00:00 2001
> > From: Antonin Houska <ah@cybertec.at>
> > Date: Sat, 13 Dec 2025 19:27:18 +0100
> > Subject: [PATCH 4/6] Add CONCURRENTLY option to REPACK command.
>
> > diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
> > index cc03f0706e9..a956892f42f 100644
> > --- a/src/backend/replication/logical/decode.c
> > +++ b/src/backend/replication/logical/decode.c
> > @@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
>
> > +    /*
> > +     * Second, skip records which do not contain sufficient information for
> > +     * the decoding.
> > +     *
> > +     * The problem we solve here is that REPACK CONCURRENTLY generates WAL
> > +     * when doing changes in the new table. Those changes should not be useful
> > +     * for any other user (such as logical replication subscription) because
> > +     * the new table will eventually be dropped (after REPACK CONCURRENTLY has
> > +     * assigned its file to the "old table").
> > +     */
> > +    switch (info)
> > +    {
> > +        case XLOG_HEAP_INSERT:
> > +            {
> > +                xl_heap_insert *rec;
> > +
> > +                rec = (xl_heap_insert *) XLogRecGetData(buf->record);
> > +
> > +                /*
> > +                 * This does happen when 1) raw_heap_insert marks the TOAST
> > +                 * record as HEAP_INSERT_NO_LOGICAL, 2) REPACK CONCURRENTLY
> > +                 * replays inserts performed by other backends.
> > +                 */
> > +                if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
> > +                    return;
> > +
> > +                break;
> > +            }
> > +
> > +        case XLOG_HEAP_HOT_UPDATE:
> > +        case XLOG_HEAP_UPDATE:
> > +            {
> > +                xl_heap_update *rec;
> > +
> > +                rec = (xl_heap_update *) XLogRecGetData(buf->record);
> > +                if ((rec->flags &
> > +                     (XLH_UPDATE_CONTAINS_NEW_TUPLE |
> > +                      XLH_UPDATE_CONTAINS_OLD_TUPLE |
> > +                      XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
> > +                    return;
> > +
> > +                break;
> > +            }
> > +
> > +        case XLOG_HEAP_DELETE:
> > +            {
> > +                xl_heap_delete *rec;
> > +
> > +                rec = (xl_heap_delete *) XLogRecGetData(buf->record);
> > +                if (rec->flags & XLH_DELETE_NO_LOGICAL)
> > +                    return;
> > +                break;
> > +            }
> > +    }
>
> I'm confused as to the purpose of this addition.  I took this whole
> block out, and no tests seem to fail.

This is just an optimization, to avoid unnecessary decoding of data changes
that the output plugin would ignore anyway. Note that REPACK (CONCURRENTLY)
can generate a huge amount of WAL itself.

> Moreover, some of the cases that
> are being skipped because of this, would already be skipped by code in
> DecodeInsert / DecodeUpdate anyway.

By checking earlier I tried to avoid calling ReorderBufferProcessXid()
unnecessarily.

> The case for XLOG_HEAP_DELETE seems
> to have no effect (that is, the "return" there never hits for any tests
> as far as I can tell.)

The current tests do not cover this, but it should be hit by backends
performing logical decoding unrelated to REPACK. The typical case is that WAL
sender involved in logical replication reads a DELETE record generated by
REPACK (CONCURRENTLY) due to replaying a DELETE statement on the new relation.

> The reason I ask is that the line immediately below does this:
>
> >      ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
>
> which means the Xid is tracked for snapshot building purposes.  Which is
> probably important, because of what the comment right below it says:
>
>     /*
>      * If we don't have snapshot or we are just fast-forwarding, there is no
>      * point in decoding data changes. However, it's crucial to build the base
>      * snapshot during fast-forward mode (as is done in
>      * SnapBuildProcessChange()) because we require the snapshot's xmin when
>      * determining the candidate catalog_xmin for the replication slot. See
>      * SnapBuildProcessRunningXacts().
>      */
>
> So what happens here is that we would skip processing the Xid of a xlog
> record during snapshot-building, on the grounds that it doesn't contain
> logical changes.  I'm not sure this is okay.

I think I missed the fact that SnapBuildProcessChange() relies on
ReorderBufferProcessXid() having been called.

> If we do indeed need this,
> then perhaps it should be done after ReorderBufferProcessXid().

... and after SnapBuildProcessChange(). Thus the changes being discussed here
should be removed from the patch. I'll do that in the next version. Thanks.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



pgsql-hackers by date:

Previous
From: Daniil Davydov
Date:
Subject: Re: Wrong comment for ReplicationSlotCreate
Next
From: Antonin Houska
Date:
Subject: Re: Adding REPACK [concurrently]