Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
From | Shlok Kyal |
---|---|
Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | CANhcyEVoMLojVk7pt6fwOepZzj5LJFV9VpnjhBwiOmcNcq+5oQ@mail.gmail.com Whole thread Raw |
In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
|
List | pgsql-hackers |
On Tue, 2 Sept 2025 at 17:05, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Aug 29, 2025 at 9:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I've attached the updated patch. > > > > Few comments: > ============= > 1. > + * When XLogLogicalInfoActive() is true, guarantee that a subtransaction's > + * xid can only be seen in the WAL stream if its toplevel xid has been > + * logged before. If necessary we log an xact_assignment record with fewer > + * than PGPROC_MAX_CACHED_SUBXIDS. Note that it is fine if didLogXid isn't > + * set for a transaction even though it appears in a WAL record, we just > + * might superfluously log something. That can happen when an xid is > + * included somewhere inside a wal record, but not in XLogRecord->xl_xid, > + * like in xl_standby_locks. > */ > if (isSubXact && XLogLogicalInfoActive() && > !TopTransactionStateData.didLogXid) > > Instead of writing XLogLogicalInfoActive() is true in comments, can we > say When effective wal_level is logical and then also point to some > place if required where the patch has explained about effective > wal_level? Otherwise, it sounds like we are writing what is apparent > from code and may not be very clear. > > 2. > - /* > - * Invalidate logical slots if we are in hot standby and the primary > - * does not have a WAL level sufficient for logical decoding. No need > - * to search for potentially conflicting logically slots if standby is > - * running with wal_level lower than logical, because in that case, we > - * would have either disallowed creation of logical slots or > - * invalidated existing ones. > - */ > - if (InRecovery && InHotStandby && > - xlrec.wal_level < WAL_LEVEL_LOGICAL && > - wal_level >= WAL_LEVEL_LOGICAL) > - InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL, > - 0, InvalidOid, > - InvalidTransactionId); > - > LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); > ControlFile->MaxConnections = xlrec.MaxConnections; > ControlFile->max_worker_processes = xlrec.max_worker_processes; > @@ -8605,6 +8643,50 @@ xlog_redo(XLogReaderState *record) > { > /* nothing to do here, just for informational purposes */ > } > + else if (info == XLOG_LOGICAL_DECODING_STATUS_CHANGE) > + { > + bool logical_decoding; > + > + /* Update the status on shared memory */ > + memcpy(&logical_decoding, XLogRecGetData(record), sizeof(bool)); > + UpdateLogicalDecodingStatus(logical_decoding, true); > + > + if (InRecovery && InHotStandby) > + { > + if (!logical_decoding) > > Like previously, shouldn't we have a check for standby's wal_level as > well due to the reasons mentioned in the removed comments? > > 3. > + errmsg("logical decoding needs to be enabled to publish logical changes"), > > This message doesn't sound intuitive. How about "logical decoding > should be allowed to publish logical changes"? > > 4. > + else if (info == XLOG_LOGICAL_DECODING_STATUS_CHANGE) > ... > + /* > + * Request to launch or shutdown the slotsync worker depending on > + * the new logical decoding status. > + */ > > If we see a similar part in existing code as a handling of > XLOG_PARAMETER_CHANGE, we don't shutdown or restart slotsync worker, > so why do it as part of this patch? This new behaviour may be better > but shouldn't we try to handle it as a separate HEAD patch? Also, a > few additional comments explaining the rationale behind this would be > good. > I tested the behaviour with HEAD and with Patch. And I confirmed the change in behaviour between HEAD and Patch Suppose we have a primary and a standby with wal_level = logical and guc parameters to enable slot sync worker are set accordingly. A slot sync worker will be running. Now we change the value of wal_level for primary to replica. And restart the primary server With HEAD, during restart the existing sync_slot_worker will exit with: 2025-09-02 11:49:08.846 IST [3877882] ERROR: synchronization worker "" could not connect to the primary server: connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused Is the server running on that host and accepting TCP/IP connections? 2025-09-02 11:49:11.380 IST [3877885] FATAL: streaming replication receiver "walreceiver" could not connect to the primary server: connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused Is the server running on that host and accepting TCP/IP connections? and after the restart of the primary server, slot sync worker will restart and it is able to connect to the primary. With Patch, during restart the existing sync_slot_worker will exit. But after the restart of the primary server, slot sync worker cannot start and we can see following log: 2025-09-02 12:44:51.497 IST [3947520] LOG: replication slot synchronization worker is shutting down on receiving SIGINT 2025-09-02 12:44:51.498 IST [3943504] LOG: replication slot synchronization requires logical decoding to be enabled 2025-09-02 12:44:51.498 IST [3943504] HINT: To enable logical decoding on primary, set "wal_level" >= "logical" or create at least one logical slot when "wal_level" = "replica". 2025-09-02 12:45:51.537 IST [3943504] LOG: replication slot synchronization requires logical decoding to be enabled 2025-09-02 12:45:51.537 IST [3943504] HINT: To enable logical decoding on primary, set "wal_level" >= "logical" or create at least one logical slot when "wal_level" = "replica". So, with HEAD, after we restart the primary server with 'wal_level = replica', the slot sync worker can restart and connect to the primary but with patch it cannot start after restart due to the check in ValidateSlotSyncParams. > 5. > Assert(RecoveryInProgress()); > ereport(ERROR, > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > - errmsg("logical decoding on standby requires \"wal_level\" >= > \"logical\" on the primary"))); > + errmsg("logical decoding must be enabled on the primary"))); > > Can't we keep the tone of the existing message as it is? How about > "logical decoding on standby requires \"effective_wal_level\" >= > \"logical\" on the primary"? Also, if we agree with this, we could > have a similar change for other messages in the patch. > > 6. Can we write some comments as to why we didn't support wal_level to > be changed from minimal to logical? It will be helpful for future > readers/authors to understand what it would require to further extend > this functionality. > Thanks, Shlok Kyal
pgsql-hackers by date: