Thread: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
From
Bertrand Drouvot
Date:
Hi, On Mon, Dec 30, 2024 at 10:44:38PM -0600, Masahiko Sawada wrote: > Hi all, > > Logical decoding (and logical replication) are available only when > wal_level = logical. As the documentation says[1], Using the 'logical' > level increases the WAL volume which could negatively affect the > performance. For that reason, users might want to start with using > 'replica', but when they want to use logical decoding they need a > server restart to increase wal_level to 'logical'. My goal is to allow > users who are using 'replica' level to use logical decoding without a > server restart. Thanks for starting that thread and +1 for the idea! > The first idea I came up with is to make the wal_level a PGC_SIGHUP > parameter. However, it affects not only setting 'replica' to 'logical' > but also setting 'minimal' to 'replica' or higher. I'm not sure the > latter case is common and it might require a checkpoint. I don't want > to make the patch complex for uncommon cases. > > The second idea is to somehow allow both WAL-logging logical info and > logical decoding even when wal_level is 'replica'. I've attached a PoC > patch for that. The patch introduces new SQL functions such as > pg_activate_logical_decoding() and pg_deactivate_logical_decoding(). > These functions are available only when wal_level is 'repilca'(or > higher). In pg_activate_logical_decoding(), we set the status of > logical decoding stored on the shared memory from 'disabled' to > 'xlog-logical-info', allowing all processes to write logical > information to WAL records for logical decoding. But the logical > decoding is still not allowed. Once we confirm all in-progress > transactions completed, we switch the status to > 'logical-decoding-ready', meaning that users can create logical > replication slots and use logical decoding. > > Overall, with the patch, there are two ways to enable logical > decoding: setting wal_level to 'logical' and calling > pg_activate_logical_decoding() when wal_level is 'replica'. I left the > 'logical' level for backward compatibility and for users who want to > enable the logical decoding without calling that SQL function. If we > can automatically enable the logical decoding when creating the first > logical replication slot, probably we no longer need the 'logical' > level. There is room to discuss the user interface. Feedback is very > welcome. If we don't want to force wal_level = logical to enable logical decoding (your second idea) then I think that it would be better to "hide" everything behind logical replication slot creation / deletion. That would mean that having at least one logical replication slot created would be synonym to "activate logical decoding" and zero logical replication slot created would be synonym to "deactivate logical decoding". That way: 1. an end user don't need to manipulate new functions and would just rely on replication slots existence 2. we ensure that no extra WAL data is generated if not absolutely "needed" Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
From
"Euler Taveira"
Date:
On Fri, Jan 3, 2025, at 10:14 AM, Bertrand Drouvot wrote:
If we don't want to force wal_level = logical to enable logical decoding (yoursecond idea) then I think that it would be better to "hide" everything behindlogical replication slot creation / deletion. That would mean that having atleast one logical replication slot created would be synonym to "activate logicaldecoding" and zero logical replication slot created would be synonym to "deactivatelogical decoding".
I like this idea. The logical replication slot existence already has the
required protections and guarantees (no running transactions from the past while
creating) for logical decoding.
ERROR: logical decoding requires "wal_level" >= "logical"
STATEMENT: select pg_create_logical_replication_slot('test', 'pgoutput');
FATAL: logical replication slot "test" exists, but "wal_level" < "logical"
HINT: Change "wal_level" to be "logical" or higher.
Having said that, you are basically folding 'logical' machinery into 'replica'.
The 'logical' option can still exists but it will be less attractive because it
increases the WAL volume even if you are not using logical replication. I don't
know if the current 'logical' behavior (WAL records for logical decoding even
if there is no active logical replication) has any use case (maybe someone
inspects these extra records for analysis) but one suggestion (separate patch)
is to make 'logical' synonymous with the new 'replica' behavior (logical
decoding capability). This opens the door to remove 'logical' in future
releases (accepted as synonym but not documented).
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
From
Masahiko Sawada
Date:
On Fri, Jan 3, 2025 at 6:31 AM Euler Taveira <euler@eulerto.com> wrote: > > On Fri, Jan 3, 2025, at 10:14 AM, Bertrand Drouvot wrote: > > If we don't want to force wal_level = logical to enable logical decoding (your > second idea) then I think that it would be better to "hide" everything behind > logical replication slot creation / deletion. That would mean that having at > least one logical replication slot created would be synonym to "activate logical > decoding" and zero logical replication slot created would be synonym to "deactivate > logical decoding". > > > I like this idea. The logical replication slot existence already has the > required protections and guarantees (no running transactions from the past while > creating) for logical decoding. I agree that it's better behavior. > > Having said that, you are basically folding 'logical' machinery into 'replica'. > The 'logical' option can still exists but it will be less attractive because it > increases the WAL volume even if you are not using logical replication. I don't > know if the current 'logical' behavior (WAL records for logical decoding even > if there is no active logical replication) has any use case (maybe someone > inspects these extra records for analysis) but one suggestion (separate patch) > is to make 'logical' synonymous with the new 'replica' behavior (logical > decoding capability). This opens the door to remove 'logical' in future > releases (accepted as synonym but not documented). To enable the logical decoding automatically when the first slot is created, probably we need to do something like: 1. enable WAL-logging logical info. 2. wait for all running transactions to finish. 3. enable logical decoding, and write a XLOG_LOGICAL_DECODING_STATUS record, 4. create a logical slot 4-1. reserve the WAL and write an XLOG_RUNNING_XACTS record. 4-2. wait for all transactions in the XLOG_RUNNING_XACTS record to finish during creating the initial snapshot. That is, we could need to wait for different sets of transactions to finish (at 2 and 4-2). Which could surprise users since the first slot creation could have users wait for a longer time. Using the 'logical' level would avoid such waits. Also, if we make the 'logical' level synonymous with the new 'replica' behavior, we need to adjust regression tests as well. In some regression tests (e.g., ondisk_startup.spec), we create a logical slot while a transaction is concurrently running. We expect we wait for the concurrent transaction during the slot creation but with the above algorithm, we would wait for them to activate logical decoding. So I agree that making the 'logical' level synonymous with the new 'replica' behavior is done in a separate patch. It would be a good start to implement the new 'replica' level behavior. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
From
Ashutosh Bapat
Date:
On Sat, Jan 4, 2025 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Jan 3, 2025 at 6:31 AM Euler Taveira <euler@eulerto.com> wrote: > > > > On Fri, Jan 3, 2025, at 10:14 AM, Bertrand Drouvot wrote: > > > > If we don't want to force wal_level = logical to enable logical decoding (your > > second idea) then I think that it would be better to "hide" everything behind > > logical replication slot creation / deletion. That would mean that having at > > least one logical replication slot created would be synonym to "activate logical > > decoding" and zero logical replication slot created would be synonym to "deactivate > > logical decoding". > > > > > > I like this idea. The logical replication slot existence already has the > > required protections and guarantees (no running transactions from the past while > > creating) for logical decoding. > > I agree that it's better behavior. > > > > > Having said that, you are basically folding 'logical' machinery into 'replica'. > > The 'logical' option can still exists but it will be less attractive because it > > increases the WAL volume even if you are not using logical replication. I don't > > know if the current 'logical' behavior (WAL records for logical decoding even > > if there is no active logical replication) has any use case (maybe someone > > inspects these extra records for analysis) but one suggestion (separate patch) > > is to make 'logical' synonymous with the new 'replica' behavior (logical > > decoding capability). This opens the door to remove 'logical' in future > > releases (accepted as synonym but not documented). > > To enable the logical decoding automatically when the first slot is > created, probably we need to do something like: > > 1. enable WAL-logging logical info. > 2. wait for all running transactions to finish. > 3. enable logical decoding, and write a XLOG_LOGICAL_DECODING_STATUS record, > 4. create a logical slot > 4-1. reserve the WAL and write an XLOG_RUNNING_XACTS record. > 4-2. wait for all transactions in the XLOG_RUNNING_XACTS record to > finish during creating the initial snapshot. A naive question: Can we combine step 2 and 4-2 - may be we need to combine 4-1 and 3 too. -- Best Wishes, Ashutosh Bapat
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
From
Masahiko Sawada
Date:
On Mon, Jan 6, 2025 at 3:20 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Sat, Jan 4, 2025 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Fri, Jan 3, 2025 at 6:31 AM Euler Taveira <euler@eulerto.com> wrote: > > > > > > On Fri, Jan 3, 2025, at 10:14 AM, Bertrand Drouvot wrote: > > > > > > If we don't want to force wal_level = logical to enable logical decoding (your > > > second idea) then I think that it would be better to "hide" everything behind > > > logical replication slot creation / deletion. That would mean that having at > > > least one logical replication slot created would be synonym to "activate logical > > > decoding" and zero logical replication slot created would be synonym to "deactivate > > > logical decoding". > > > > > > > > > I like this idea. The logical replication slot existence already has the > > > required protections and guarantees (no running transactions from the past while > > > creating) for logical decoding. > > > > I agree that it's better behavior. > > > > > > > > Having said that, you are basically folding 'logical' machinery into 'replica'. > > > The 'logical' option can still exists but it will be less attractive because it > > > increases the WAL volume even if you are not using logical replication. I don't > > > know if the current 'logical' behavior (WAL records for logical decoding even > > > if there is no active logical replication) has any use case (maybe someone > > > inspects these extra records for analysis) but one suggestion (separate patch) > > > is to make 'logical' synonymous with the new 'replica' behavior (logical > > > decoding capability). This opens the door to remove 'logical' in future > > > releases (accepted as synonym but not documented). FYI one possible use case of the 'logical' level would be that if we support retrieving WAL records for logical decoding from somewhere like restore_command in the future, users would like to keep the wal_level 'logical' or keep the logical decoding active. > > > > To enable the logical decoding automatically when the first slot is > > created, probably we need to do something like: > > > > 1. enable WAL-logging logical info. > > 2. wait for all running transactions to finish. > > 3. enable logical decoding, and write a XLOG_LOGICAL_DECODING_STATUS record, > > 4. create a logical slot > > 4-1. reserve the WAL and write an XLOG_RUNNING_XACTS record. > > 4-2. wait for all transactions in the XLOG_RUNNING_XACTS record to > > finish during creating the initial snapshot. > > A naive question: Can we combine step 2 and 4-2 - may be we need to > combine 4-1 and 3 too. Yeah, maybe we can somewhat optimize this process. Probably we can do like: 1. create a logical slot. 2. enable WAL-logging logical info. 3. reserve the WAL and write an XLOG_RUNNING_XACTS record. 4. wait for all transactions in the XLOG_RUNNING_XACTS record to finish during creating the initial snapshot. 4-1. write a XLOG_LOGICAL_DECODING_STATUS record if a process detects all transactions in the XLOG_RUNNING_XACTS record to finish. However, we need to note that 4-1 should be done by only one process. Also, it's a bit weird and concerns to me that the snapbuild initialization contains the process of activating logical decoding. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
From
Peter Smith
Date:
Hi Sawada-San. FWIW, I also thought it was a good idea suggested by Bertrand [1] to "hide" everything behind the slot create/delete, and thereby eliminate the need for user intervention using those new pg_activate/deactivate_logical_decoding functions. But, one concern doing it this way is how to prevent a user (accidentally?) getting themselves into a different replication mode without realising it? Or say they change the mode and then "forget" that they did that. For example, If wal_level=replica and then somebody does CREATE PUBLICATION/SUBSCRIPTION won't that implicitly enable the logical decoding and then leave it enabled until all the logical replication slots are eventually dropped? Now, when the user examines wal_level it is still going to show 'replica', which could be misleading --- e.g. how will they even know that they can't really trust that anymore, and they must also check the pg_get_logical_decoding_status? At least, there should be some big NOTICE/WARNING logs written whenever the logical decoding status is getting implicitly changed by logical replication slot create/delete.. ~~~ Meanwhile, although it is premature to give detailed code review comments for what is still a PoC, during my reading of your patch I made copious notes for myself, so I thought I might as well post what I have regardless. See the details below in this post. Ignore, or do with them as you wish. But, hopefully, a lot of the following feedback will still be relevant regardless of any design changes. ////////// Commit message 1. Patch description is missing ====== GENERAL 2. git apply warning [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/PoC_online_activate_logical_decoding.patch ../patches_misc/PoC_online_activate_logical_decoding.patch:93: new blank line at EOF. + warning: 1 line adds whitespace errors. ~~~ 3. Copyright date All the new files of this patch have the wrong copyright date (2024 instead of 2025). ~~~ 4. Missing documentation There are missing docs changes, but that is to be expected while the design is in flux. ~~~ 5. Terminology There are multiple places where similar terms like "active" and "enabled" and "ready" are being used apparently interchangeably making it quite confusing. Perhaps it was your intention to convey the idea that the logical mode might be "enabled" but is not yet "active", but I didn't think that was the case. Mostly it just seemed arbitrary to me. IMO the state starting out as "disabled" should end up as "enabled" (not "ready"). Meanwhile, the API to change the status is currently called "activate/deactivate" instead of "enable/disable" (??) Meanwhile, many/most of the error messages talk about "enabled" (not "activated" or "ready"). e.g. "logical decoding needs to be enabled to publish logical changes" Meanwhile, macros are called 'IsLogicalDecodingActive'; why isn't that called 'IsLogicalDecodingEnabled'. ~ OTOH, maybe you chose the term "active" because master already had the macro XLogLogicalInfoActive(). But in that case maybe it should be saying inactive/active everywhere instead of disabled/enabled/ready... ~~~ 6. Typedefs There are some new typedefs introduced in the new files, So these should be updated in the typedefs.list file. ~~~ 7. Tri-state logical decoding status Currently there is a tri-state logical decoding status: "disabled" -> "xlog-logicalinfo" -> "ready" AFAICT that middle state is just transitory. It is easier for me to visualize it like: logical decoding status: "disabled" -> "pending" -> "enabled". IIUC the "pending" part is when it is just waiting for any running transaction to complete. IMO it becomes simpler if you do not make any distinction between WAL logging and logical decoding ability. E.g. Do not allow logical WAL logging for that 'middle' transitory state; only allow it after logical decoding becomes fully activated. IIUC, it would just a small change to do this: CURRENT (pseudocode): XLogLogicalInfoEnabled(void) { return XLogLogicalInfoCtl->status >= LOGICAL_DECODING_STATUS_XLOG_LOGICALINFO; } SUGGESTION (pseudocode) XLogLogicalInfoEnabled(void) { return XLogLogicalInfoCtl->status == LOGICAL_DECODING_STATUS_READY; } ====== contrib/test_decoding/t/002_online_activation.pl 8. +# Copyright (c) 2021-2024, PostgreSQL Global Development Group /2024/2025/ ~~~ 9. +# check if a logical replication slot cannot be craeted while logical /check/Check/ /cannot/can/ /craeted/created/ ~~~ 10. +ok( $stderr =~ /ERROR: logical decoding requires "wal_level" >= "logical"/, + "logical replication slot cannot be created while logical decoding is disableed"); /disableed/disabled/ ~~~ 11. +# Activate logical info logging. +$node->safe_psql('postgres', + 'SELECT pg_activate_logical_decoding()'); +$ret = $node->safe_psql('postgres', 'SELECT pg_get_logical_decoding_status()'); +is($ret, 'ready', 'logical decoding gets activated'); Perhaps there should be another "show wal_level" done here to show that even after logical decoding is enabled, the wal_level remains the same. ~~~ 12. +$node->safe_psql( + 'postgres', + q[ +CREATE TABLE test (a int primary key, b int); +INSERT INTO test values (1, 100), (2, 200); +UPDATE test SET b = b + 10 WHERE a = 1; + ]); + +$ret = $node->safe_psql( + 'postgres', + q[SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');] + ); +is($ret, 'BEGIN +table public.test: INSERT: a[integer]:1 b[integer]:100 +table public.test: INSERT: a[integer]:2 b[integer]:200 +COMMIT +BEGIN +table public.test: UPDATE: a[integer]:1 b[integer]:110 +COMMIT', 'logical decoding works fine'); + There seems to be some comment missing for all this part. I expected it to say something like "Check that logical decoding is working OK". ====== src/backend/access/heap/heapam.c 13. - * This is only used in wal_level >= WAL_LEVEL_LOGICAL, and only for catalog - * tuples. + * This is only used when XLogLogicalInfoActive() is true, and only for + * catalog tuples. I felt it would be better to Assert this in the code, instead of just mentioning it in the function comment. ====== src/backend/access/rmgrdesc/xlogdesc.c 14. + else if (info == XLOG_LOGICAL_DECODING_STATUS) + { + bool logical_decoding; + + memcpy(&logical_decoding, rec, sizeof(bool)); + appendStringInfoString(buf, logical_decoding ? "true" : "false"); + } Should there be some textual context for this bool value, like wal_level has? SUGGESTION appendStringInfoString(buf, "logical decoding %s", logical_decoding ? "true" : "false"); ~~~ 15. I noticed there are lots of other descriptions in this function that are showing the value of 'wal_level' via the static function call: get_wal_level_string(wal_level). But, since the wal_level now might be out-of-step with the logcal_decoding boolean value, shouldn't you be showing the logical decoding state everywhere too? e.g. you might have a combination like: wal_level replica logical decoding true ====== src/backend/access/transam/xlog.c BootStrapXLOG: 16. + checkPoint.logicalDecoding = false; checkPoint.wal_level = wal_level; Shouldn't the default for 'logicalDecoding' be consistent with the assigned wal_level? It seems strange if it is possible to have a situation where wal_level = logical but simultaneously logicalDecoding = false. ~~~ xlog_redo: 17. + else if (info == XLOG_LOGICAL_DECODING_STATUS) + { + bool enabled; A less generic variable name like "logical_decoding_enabled" might be better. ====== src/backend/commands/publicationcmds.c CreatePublication: 18. - if (wal_level != WAL_LEVEL_LOGICAL) + if (!XLogLogicalInfoActive()) ereport(WARNING, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), - errmsg("\"wal_level\" is insufficient to publish logical changes"), - errhint("Set \"wal_level\" to \"logical\" before creating subscriptions."))); + errmsg("logical decoding needs to be enabled to publish logical changes"), + errhint("Set \"wal_level\" to \"logical\" or activate logical decoding before creating subscriptions."))); 18a. /needs to be enabled/must be enabled/ ~ 18b. The hint seems less useful than it could be because it does not say *how* the user can activate logical decoding. SUGGESTION Set \"wal_level\" to \"logical\" or use pg_activate_logical_decoding before creating subscriptions. ====== src/backend/replication/logical/logical.c CheckLogicalDecodingRequirements: 19. ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), - errmsg("logical decoding on standby requires \"wal_level\" >= \"logical\" on the primary"))); + errmsg("logical decoding on standby requires to enable logical decoding on the primary"))); 19a /requires to enable logical decoding/requires logical decoding to be enabled/ ~ 19b. This message is like what was in CreatePublication, so might be better split into an errmsg and errhint. e.g. errmsg("logical decoding on standby requires logical decoding to be enabled on the primary"); errhint("Set \"wal_level\" to \"logical\" or use pg_activate_logical_decoding."); ====== src/backend/replication/logical/logicalxlog.c 20. General Would it be better if all the exposed functions from here used the common "LogicalXLog" prefix? These ones do: - LogicalXlogShmemSize - LogicalXlogShmemInit These ones don't: - StartupLogicalDecodingStatus - UpdateLogicalDecodingStatus - XLogLogicalInfoEnabled (this one is backwards) - IsLogicalDecodingActive ~~~ 21. + * logicalxlog.c + * This module contains the codes for enabling or disabling to include + * logical information into WAL records and logical decoding. /codes/code/ Also, this is worded as if you can set those states independently, but AFAICT the "logical information into WAL" is just a transition-state to the LOGICAL_DECODING_STATUS_READY. SUGGESTION This module contains code to enable/disable and fetch the logical decoding status. ~~~ 22. + * Copyright (c) 2012-2024, PostgreSQL Global Development Group /2024/2025/ ~~~ 23. +XLogLogicalInfoCtlData *XLogLogicalInfoCtl = NULL; Add blank line above this. ~~~ LogicalXlogShmemSize: 24. +/* + * Initialization of shared memory. + */ + +Size +LogicalXlogShmemSize(void) +{ + return sizeof(XLogLogicalInfoCtlData); +} Unnecessary blank line after the function comment. ~~~ LogicalXlogShmemInit: 25. +void +LogicalXlogShmemInit(void) Missing function comment. ~~~ StartupLogicalDecodingStatus: 26. +/* + * This must be called ONCE during postmaster or standalone-backend startup, + * before calling StartupReplicationSlots(). + */ +void +StartupLogicalDecodingStatus(bool enabled_at_last_checkpoint) The function comment does not actually say what this function does. Maybe add a first sentence like: "Assign the initial LogicalDecodingStatus value." ~~~ 27. + /* + * On standbys, we always start with the status in the last checkpoint + * record. If changes of wal_level or logical decoding status is sent from + * the primary, we will enable or disable the logical decoding while + * replaying the WAL record and invalidate slots if necessary. + */ /is sent from/are sent from/ /invalidate/invalidating/ ~~~ 28. + if (!StandbyMode) + { + /* + * Disable logical decoding if replication slots are not available. + */ + if (max_replication_slots == 0) + status = LOGICAL_DECODING_STATUS_DISABLED; + + /* + * If previously the logical decoding was active but the server + * restarted with wal_level < 'replica', we disable the logical + * decoding. + */ + if (wal_level < WAL_LEVEL_REPLICA) + status = LOGICAL_DECODING_STATUS_DISABLED; + + /* + * Setting wal_level to 'logical' immediately enables logical decoding + * and WAL-logging logical info. + */ + if (wal_level >= WAL_LEVEL_LOGICAL) + status = LOGICAL_DECODING_STATUS_READY; + } 28a. The comment is a bit confusing because it is only talking about Standby, whereas the code block is all for !StandbyMode. It would be more easy to read if refactored to put the comment in an empty code block like below: if (StandByMode) { /* * On standbys, we always start with the status in the last checkpoint * record... * ... */ } else { <code> } ~ 28b. What if the first condition (max_replication_slots is 0) and last condition (WAL_LEVEL_LOGICAL) are true at the same time? Currently this initializes the status as READY. Is it OK? ~~~ UpdateLogicalDecodingStatus: 29. +void +UpdateLogicalDecodingStatus(bool activate) +{ + XLogLogicalInfoCtl->status = activate + ? LOGICAL_DECODING_STATUS_READY + : LOGICAL_DECODING_STATUS_DISABLED; +} 29a. Missing function comment. ~ 29b. IMO the parameter should be in past tense, but I also think should be called "enabled" (same as what it was called in the extern in logicalxlog.h) ~~~ XLogLogicalInfoEnabled: 30. +/* + * Is WAL-Logging logical info enabled? + */ +bool inline +XLogLogicalInfoEnabled(void) +{ + /* + * Use volatile pointer to make sure we make a fresh read of the shared + * variable. + */ + volatile XLogLogicalInfoCtlData *ctl = XLogLogicalInfoCtl; + + return ctl->status >= LOGICAL_DECODING_STATUS_XLOG_LOGICALINFO; +} How necessary is this function? I thought that LOGICAL_DECODING_STATUS_XLOG_LOGICALINFO is more like just a transitory state between LOGICAL_DECODING_STATUS_DISABLED and LOGICAL_DECODING_STATUS_READY. So in that case, isn't it simpler just to wait for the READY? e.g. remove this function and just use IsLogicalDecodingActive() instead? ~~~ do_activate_logical_decoding: 31. Missing function comment. ~~~ 32. + /* + * Get the latest status of logical info logging. If it's already + * activated, quick return. + */ + SpinLockAcquire(&XLogLogicalInfoCtl->mutex); + if (XLogLogicalInfoCtl->status >= LOGICAL_DECODING_STATUS_XLOG_LOGICALINFO) + { + SpinLockRelease(&XLogLogicalInfoCtl->mutex); + return; + } 32a. I didn't think the comment should mention "logical info logging" here. SUGGESTION Quick return if logical decoding is already activated (or activation was already requested and is pending). ~ 32b. Related to this, it seems simpler to check condition "if (XLogLogicalInfoCtl->status > LOGICAL_DECODING_STATUS_DISABLED)" ~~~ 33. + PG_ENSURE_ERROR_CLEANUP(logical_decoding_activation_abort_callback, 0); + { ... + } + PG_END_ENSURE_ERROR_CLEANUP(logical_decoding_activation_abort_callback, 0); IIUC, this whole chunk of code lies between the status of "xlog-logicalinfo" and "ready". I think it needs a comment something like: /* Confirm that all in-progress transactions have completed before logical decoding can be enabled. */ ~~~ logical_decoding_activation_abort_callback: 34. Missing function comment. ~~~ 35. Since this is a callback for the function do_activate_logical_decoding(), I thought the function should be slightly renamed to match the caller. /logical_decoding_activation_abort_callback/activate_logical_decoding_abort_callback/ ~~~ 36. IIUC, the status could have reached LOGICAL_DECODING_STATUS_XLOG_LOGICALINFO before we encountered some problem trying to get to the READY status, and so reset everything back to DISABLED. So some (unusable?) logical WAL records could have been written while in that intermediate state? It is another reason why I was unsure about the need for the earlier XLogLogicalInfoEnabled function. Won't it be simpler to just have a DISABLED and ENABLED state and just keep the transition phase for internal logic? IOW, make XLogLogicalInfoEnabled() return true only when READY/ENABLED. ~~~ pg_activate_logical_decoding: 37. Missing function comment. ~~~ 38. + if (IsTransactionState() && + GetTopTransactionIdIfAny() != InvalidTransactionId) + ereport(ERROR, + (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION), + errmsg("cannot activate logical decoding in transaction that has performed writes"))); + + Double blank lines. ~~~ 39. + SpinLockAcquire(&(XLogLogicalInfoCtl->mutex)); + status = XLogLogicalInfoCtl->status; + SpinLockRelease(&(XLogLogicalInfoCtl->mutex)); + + if (status == LOGICAL_DECODING_STATUS_READY) + PG_RETURN_VOID(); Should have some comment like: "If logical decoding is already enabled then there is nothing to do." OTOH, can't all this code be removed, since the top-of-loop code will do the same thing anyway (I didn't think the extra sleep prepare/cancel that it is doing was something to worry about). ~~~ 40. + ConditionVariableSleep(&XLogLogicalInfoCtl->cv, + WAIT_EVENT_LOGICAL_DECODING_ACTIVATION); + + continue; Remove the blank line. ~~~ 41. + /* Activate logical decoding */ + do_activate_logical_decoding(); The comment is not helpful because it is practically the same as the function name. ~~~ pg_deactivate_logical_decoding: 42. Missing function comment. ~~~ 43. + /* + * Hold ReplicationSlotAllocationLock to prevent slots from newly + * being created while deactivating the logical decoding. + */ + LWLockAcquire(ReplicationSlotAllocationLock, LW_SHARED); /newly being created/being created/ ~~~ 44. + for (int i = 0; i < max_replication_slots; i++) + { + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; + + if (!s->in_use) + continue; + + if (SlotIsPhysical(s)) + continue; + + if (s->data.invalidated != RS_INVAL_NONE) + continue; + + valid_logical_slots++; + } IMO it is more natural here to check "if (!SlotIsLogical(s))" instead of "if (SlotIsPhysical(s))". ~~~ 45. + if (valid_logical_slots > 0) + ereport(ERROR, + (errmsg("cannot deactivate logical decoding while having valid logical replication slots"))); /while having valid logical replication slots/while valid logical replication slots are present/ ~~~ 46. + if (XLogStandbyInfoActive() && !recoveryInProgress) + { + bool enabled = false; + + XLogBeginInsert(); + XLogRegisterData((char *) (&enabled), sizeof(bool)); + + XLogInsert(RM_XLOG_ID, XLOG_LOGICAL_DECODING_STATUS); + } + + SpinLockAcquire(&(XLogLogicalInfoCtl->mutex)); + XLogLogicalInfoCtl->status = LOGICAL_DECODING_STATUS_DISABLED; + SpinLockRelease(&(XLogLogicalInfoCtl->mutex)); In the pg_activate_logical_decoding function, the assignment of the status and the writing of the XLog record was done within a critical section. Why isn't the code doing the same here? ~~~ pg_get_logical_decoding_status: 47. Missing function comment. ====== src/backend/replication/logical/slotsync.c ValidateSlotSyncParams: 48. - if (wal_level < WAL_LEVEL_LOGICAL) + if (!XLogLogicalInfoActive()) ereport(ERROR, errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("replication slot synchronization requires \"wal_level\" >= \"logical\"")); + errmsg("replication slot synchronization requires \"wal_level\" >= \"logical\" or to activate logical decoding")); This is another message like what was in CreatePublication, and that might be better split into an errmsg and errhint. e.g. errmsg("replication slot synchronization requires logical decoding to be enabled"); errhint("Set \"wal_level\" to \"logical\" or use pg_activate_logical_decoding."); ====== src/backend/replication/slot.c RestoreSlotFromDisk: 49. + if (cp.slotdata.database != InvalidOid && !XLogLogicalInfoActive()) ereport(FATAL, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("logical replication slot \"%s\" exists, but \"wal_level\" < \"logical\"", NameStr(cp.slotdata.name)), - errhint("Change \"wal_level\" to be \"logical\" or higher."))); + errhint("Change \"wal_level\" to be \"logical\" or higher, or activate logical decoding with \"replica\"."))); Another message that could be made more consistent with the similar ones from CreatePublication, ValidateSlotSyncParams, etc... e.g. errmsg("logical replication slot \"%s\" exists, but logical decoding is not enabled"); errhint("Set \"wal_level\" to \"logical\" or use pg_activate_logical_decoding."); ====== src/backend/utils/activity/wait_event_names.txt 50. +LOGICAL_DECODING_ACTIVATION "Waiting for logical decoding to be activated." Should this be more like "Waiting for a requested logical decoding activation to complete." ====== src/include/access/xlog.h 51. -#define XLogLogicalInfoActive() (wal_level >= WAL_LEVEL_LOGICAL) +#define XLogLogicalInfoActive() (XLogLogicalInfoEnabled()) Those extra parentheses seem redundant now. ====== src/include/replication/logicalxlog.h 52. +/*------------------------------------------------------------------------- + * logicalxlog.h + * Exports from replication/logical/logicalxlog.c. + * + * Copyright (c) 2012-2024, PostgreSQL Global Development Group + * + *------------------------------------------------------------------------- /2024/2025/ ~~~ 53. +typedef enum LogicalDecodingStatus +{ + LOGICAL_DECODING_STATUS_DISABLED = 0, + LOGICAL_DECODING_STATUS_XLOG_LOGICALINFO, + LOGICAL_DECODING_STATUS_READY, +} LogicalDecodingStatus; Maybe some more comments about these status phases would be helpful. Specifically, it's not immediately clear if the LOGICAL_DECODING_STATUS_XLOG_LOGICALINFO is a real state we could find ourselves permanently, or if it is more like just a temporary "pending" state while transitioning from DISABLED to READY. AFAICT it is just transitory, so in that case, maybe LOGICAL_DECODING_PENDING might be a more appropriate name. ~~~ 54. +typedef struct XLogLogicalInfoCtlData XLogLogicalInfoCtlData; +extern XLogLogicalInfoCtlData *XLogLogicalInfoCtl; +extern bool XLogLogicalInfo; Add a blank line between the typedef and the externs. ====== [1] Bertrand's hide everything idea - https://www.postgresql.org/message-id/Z3fimYj0fbkLmWJb%40ip-10-97-1-34.eu-west-3.compute.internal Kind Regards, Peter Smith. Fujitsu Australia