Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers
| From | Alvaro Herrera |
|---|---|
| Subject | Re: Adding REPACK [concurrently] |
| Date | |
| Msg-id | 202512151349.vlq3mpfniyk3@alvherre.pgsql Whole thread Raw |
| In response to | Re: Adding REPACK [concurrently] (Antonin Houska <ah@cybertec.at>) |
| List | pgsql-hackers |
On 2025-Dec-13, Antonin Houska wrote:
> From 6279394135f2b693b6fffd174822509e0a067cbf Mon Sep 17 00:00:00 2001
> From: Antonin Houska <ah@cybertec.at>
> Date: Sat, 13 Dec 2025 19:27:18 +0100
> Subject: [PATCH 4/6] Add CONCURRENTLY option to REPACK command.
> diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
> index cc03f0706e9..a956892f42f 100644
> --- a/src/backend/replication/logical/decode.c
> +++ b/src/backend/replication/logical/decode.c
> @@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
> + /*
> + * Second, skip records which do not contain sufficient information for
> + * the decoding.
> + *
> + * The problem we solve here is that REPACK CONCURRENTLY generates WAL
> + * when doing changes in the new table. Those changes should not be useful
> + * for any other user (such as logical replication subscription) because
> + * the new table will eventually be dropped (after REPACK CONCURRENTLY has
> + * assigned its file to the "old table").
> + */
> + switch (info)
> + {
> + case XLOG_HEAP_INSERT:
> + {
> + xl_heap_insert *rec;
> +
> + rec = (xl_heap_insert *) XLogRecGetData(buf->record);
> +
> + /*
> + * This does happen when 1) raw_heap_insert marks the TOAST
> + * record as HEAP_INSERT_NO_LOGICAL, 2) REPACK CONCURRENTLY
> + * replays inserts performed by other backends.
> + */
> + if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
> + return;
> +
> + break;
> + }
> +
> + case XLOG_HEAP_HOT_UPDATE:
> + case XLOG_HEAP_UPDATE:
> + {
> + xl_heap_update *rec;
> +
> + rec = (xl_heap_update *) XLogRecGetData(buf->record);
> + if ((rec->flags &
> + (XLH_UPDATE_CONTAINS_NEW_TUPLE |
> + XLH_UPDATE_CONTAINS_OLD_TUPLE |
> + XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
> + return;
> +
> + break;
> + }
> +
> + case XLOG_HEAP_DELETE:
> + {
> + xl_heap_delete *rec;
> +
> + rec = (xl_heap_delete *) XLogRecGetData(buf->record);
> + if (rec->flags & XLH_DELETE_NO_LOGICAL)
> + return;
> + break;
> + }
> + }
I'm confused as to the purpose of this addition. I took this whole
block out, and no tests seem to fail. Moreover, some of the cases that
are being skipped because of this, would already be skipped by code in
DecodeInsert / DecodeUpdate anyway. The case for XLOG_HEAP_DELETE seems
to have no effect (that is, the "return" there never hits for any tests
as far as I can tell.)
The reason I ask is that the line immediately below does this:
> ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
which means the Xid is tracked for snapshot building purposes. Which is
probably important, because of what the comment right below it says:
/*
* If we don't have snapshot or we are just fast-forwarding, there is no
* point in decoding data changes. However, it's crucial to build the base
* snapshot during fast-forward mode (as is done in
* SnapBuildProcessChange()) because we require the snapshot's xmin when
* determining the candidate catalog_xmin for the replication slot. See
* SnapBuildProcessRunningXacts().
*/
So what happens here is that we would skip processing the Xid of a xlog
record during snapshot-building, on the grounds that it doesn't contain
logical changes. I'm not sure this is okay. If we do indeed need this,
then perhaps it should be done after ReorderBufferProcessXid().
Or did you intend to make this conditional on the backend running
REPACK?
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
pgsql-hackers by date: