Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | CAD21AoBObzgdOgZOXPb0zGWcNXdkXABj_1u+9TFZyQBt4aY=pw@mail.gmail.com Whole thread Raw |
In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (vignesh C <vignesh21@gmail.com>) |
List | pgsql-hackers |
On Mon, Jul 28, 2025 at 9:44 PM vignesh C <vignesh21@gmail.com> wrote: > > On Fri, 25 Jul 2025 at 11:45, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Jul 22, 2025 at 11:44 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Tue, Jul 22, 2025 at 5:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > Yes, I agree. The main patch focuses on the part where we > > > > automatically change the effective WAL level upon the logical slot > > > > creation and deletion (and potentially remove 'logical' from > > > > wal_level), and other things are implemented as additional features in > > > > a separate patch. > > > > > > I am keeping my focus on patch001 until we decide further on how to > > > protect the slot. > > > > Yeah, I also dropped the additional feature patch from the patch set for now. > > > > > Apart from few comments in [1], please find one > > > more concern: > > > > > > There is a race condition between creating and dropping a replication > > > slot when enabling or disabling logical decoding. We might end up with > > > logical decoding disabled even when a logical slot is present. > > > > > > Steps: > > > 1) Set wal_level=replica on primary. > > > 2) Create logical_slot1 which will enable logical decoding, causing > > > effective_wal_level to become logical. > > > 3) Drop logical_slot1 and pause execution inside > > > DisableLogicalDecodingIfNecessary() right after the > > > 'n_inuse_logical_slots' check using a debugger. > > > 4) In another session, create logical_slot2. It will attempt to enable > > > logical-decoding but since it is already enabled, > > > EnsureLogicalDecodingEnabled() will be a no-op. > > > 5) Release debugger of drop-slot, it will disable logical decoding. > > > > > > Ultimately, logical_slot2is present while logical decoding is disabled > > > and thus we see this: > > > > > > postgres=# select slot_name from pg_replication_slots; > > > slot_name > > > --------------- > > > logical_slot2 > > > > > > postgres=# show effective_wal_level; > > > effective_wal_level > > > --------------------- > > > replica > > > (1 row) > > > > > > postgres=# select pg_logical_slot_get_changes('logical_slot2', NULL, > > > NULL, 'proto_version', '4', 'publication_names', 'pub'); > > > ERROR: logical decoding is not enabled > > > HINT: Set "wal_level" >= "logical" or create at least one logical slot. > > > > > > Shall we acquire LogicalDecodingControlLock in exclusive mode at a > > > little earlier stage? Currently we acquire it after > > > IsLogicalDecodingEnabled() check. I think we shall acquire it before > > > this check in in both enable and disable flow? > > > > Thank you for testing the patch! > > > > I've reworked the locking part in the patch. The attached v4 patch > > should address all review comments including your previous > > comments[1]. > > Few comments: > 1) pg_waldump not handled for the new WAL record added > XLOG_LOGICAL_DECODING_STATUS_CHANGE: > + XLogRegisterData(&logical_decoding, sizeof(bool)); > + recptr = XLogInsert(RM_XLOG_ID, > XLOG_LOGICAL_DECODING_STATUS_CHANGE); > + XLogFlush(recptr); > > rmgr: XLOG len (rec/tot): 54/ 54, tx: 0, lsn: > 0/017633D8, prev 0/01763360, desc: PARAMETER_CHANGE > max_connections=100 max_worker_processes=8 max_wal_senders=10 > max_prepared_xacts=10 max_locks_per_xact=64 wal_level=replica > wal_log_hints=off track_commit_timestamp=off > rmgr: XLOG len (rec/tot): 27/ 27, tx: 0, lsn: > 0/01763410, prev 0/017633D8, desc: UNKNOWN (f0) > rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: > 0/01763430, prev 0/01763410, desc: RUNNING_XACTS nextXid 754 > latestCompletedXid 753 oldestRunningXid 754 > > 2) Similarly pg_walinspect also should be handled for the new WAL record added: > postgres=# SELECT * FROM pg_get_wal_records_info('0/017633D8', '0/01763468'); > start_lsn | end_lsn | prev_lsn | xid | resource_manager | > record_type | record_length | main_data_length | fpi_length | > description > | block_ref > ------------+------------+------------+-----+------------------+------------------+---------------+------------------+------------+----------------------------------------------------------- > ---------------------------------------------------------------------------------------------------------------+----------- > 0/017633D8 | 0/01763410 | 0/01763360 | 0 | XLOG | > PARAMETER_CHANGE | 54 | 28 | 0 | > max_connections=100 max_worker_processes=8 max_wal_senders > =10 max_prepared_xacts=10 max_locks_per_xact=64 wal_level=replica > wal_log_hints=off track_commit_timestamp=off | > 0/01763410 | 0/01763430 | 0/017633D8 | 0 | XLOG | > UNKNOWN (f0) | 27 | 1 | 0 | > > | > 0/01763430 | 0/01763468 | 0/01763410 | 0 | Standby | > RUNNING_XACTS | 50 | 24 | 0 | > nextXid 754 latestCompletedXid 753 oldestRunningXid 754 > > | > (3 rows) > > 3) Should this be the other way around? Would it be better to throw > the error earlier, instead of waiting for the running transactions to > finish? > @@ -136,6 +137,9 @@ create_logical_replication_slot(char *name, char *plugin, > temporary ? > RS_TEMPORARY : RS_EPHEMERAL, two_phase, > failover, false); > > + EnsureLogicalDecodingEnabled(); > + CheckLogicalDecodingRequirements(); > > 4) The includes xlog_internal, xlogutils, atomics, lwlock, procsignal, > shmem, standby and guc is not required, I was able to compile without > it: > + * src/backend/replication/logical/logicalctl.c > + * > + *------------------------------------------------------------------------- > + */ > +#include "postgres.h" > + > +#include "access/xlog_internal.h" > +#include "access/xlogutils.h" > +#include "access/xloginsert.h" > +#include "catalog/pg_control.h" > +#include "port/atomics.h" > +#include "miscadmin.h" > +#include "storage/lwlock.h" > +#include "storage/procarray.h" > +#include "storage/procsignal.h" > +#include "storage/ipc.h" > +#include "storage/lmgr.h" > +#include "storage/shmem.h" > +#include "storage/standby.h" > +#include "replication/logicalctl.h" > +#include "replication/slot.h" > +#include "utils/guc.h" > +#include "utils/wait_event_types.h" > > 5) I felt this change is not related to this patch: > @@ -1144,7 +1152,7 @@ slotsync_reread_config(void) > if (old_sync_replication_slots != sync_replication_slots) > { > ereport(LOG, > - /* translator: %s is a GUC variable name */ > + /* translator: %s is a GUC variable name */ > errmsg("replication slot > synchronization worker will shut down because \"%s\" is disabled", > "sync_replication_slots")); > proc_exit(0); > > 6) Can we include the high level design in the commit message and also > the other possible designs that were considered before finalizing on > this, it will help new reviewers to get a head start as the thread is > a long thread. > > 7) I did not see documentation added, can we add the required > documentation for this. > > 8) The new test file added should be included in meson.build file > Thank you for reviewing the patch! These comments have been addressed in the latest patch I've just submitted[1]. Regards, [1] https://www.postgresql.org/message-id/CAD21AoB%3DRf-SASOJR2WqvWcrA5Q3S2oUBACVLdJPaA8x6EchBA%40mail.gmail.com -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: