Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers

From Peter Smith
Subject Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date
Msg-id CAHut+PtB4zZrMKm_vUd-JO29Uq2479unx1KR=hGu=sQxfSVXJg@mail.gmail.com
Whole thread Raw
In response to Re: POC: enable logical decoding when wal_level = 'replica' without a server restart  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hi Sawada-San.

Here are some mostly cosmetic review comments for patch v19 (excluding
test code).

======
doc/src/sgml/logicaldecoding.sgml

1.
+    <para>
+     If either condition is met, the operational WAL level becomes equivalent
+     to <literal>logical</literal>, which can be monitored through
+     <xref linkend="guc-effective-wal-level"/> parametr.
+    </para>

Typo? /through/through the/

Typo: /parametr/parameter/

~~~

2.
+    <para>
+     When <varname>wal_level</varname> is set to <literal>replica</literal>,
+     logical decoding is automatically activated upon creation of the first
+     logical replication slot. This activation process involves several steps
+     and requires synchronization among processes, ensuring system-wide
+     onsistency. Conversely, when the last logical replication slot is dropped

Typo: /onsistency/consistency/

~~~

3.
+    <caution>
+     <para>
+      When <varname>wal_level</varname> is set to <literal>replica</literal>,
+      dropping the last logical slot may disable logical decoding on primary,
+      resulting in slots on standbys being invalidated.
+     </para>
+    </caution>

Typo? /on primary/on the primary/

======
doc/src/sgml/system-views.sgml

4.
          <para>
           <literal>wal_level_insufficient</literal> means that the
-          primary doesn't have a <xref linkend="guc-wal-level"/> sufficient to
-          perform logical decoding.  It is set only for logical slots.
+          logical decoding is disabled on primary (See
+          <xref linkend="logicaldecoding-explanation-log-dec"/>). It is set
+          only for logical slots.
          </para>

Typo? /on primary/on the primary/

======
src/backend/access/transam/xlog.c

5.
+ * XXX: We keep the slotsync worker running even after logical
+ * decoding is disabled for simplicity. A future improvement
+ * can consider starting and stopping the worker based on
+ * logical decoding status change.

First sentence might be better written as:
For simplicity, we keep the slotsync worker running even after logical
decoding is disabled.

======
src/backend/commands/publicationcmds.c

6.
+ errmsg("logical decoding should be allowed to publish logical changes"),
+ errhint("Before creating subscriptions, set \"wal_level\" >=
\"logical\" or create a logical replication slot when \"wal_level\" =
\"replica\".")));

6a.
I still felt this this errmsg wording is inconsistent -- e.g. You have
not used the term "allowed" anywhere else in the patch; mostly logical
decoding is said to be enabled/disabled.

Maybe:
"to publish logical changes the logical decoding must be enabled"

Or simply:
"local decoding is not enabled"

~~

6b.
I'm not sure that you really needed to say "or create a logical
replication slot when wal_level = replica". Because, the CREATE
SUBSCRIPTION it refers to would be creating that logical slot anyway,
won't it?

So maybe the errhint only need to say:
Set \"wal_level\" >= \"replica\".

======
src/backend/replication/logical/logicalctl.c

7.
+ * This module enables dynamic control of logical decoding availability.
+ * Logical decoding becomes active under two conditions: when the wal_level
+ * parameter is set to 'logical', or when at least one logical replication
+ * slot exists with wal_level set to 'replica'. The system disables logical
+ * decoding when neither condition is met. Therefore, the dynamic control
+ * of logical decoding availability is required only when wal_level is set
+ * to 'replica'. With 'logical' WAL level, logical decoding is always enabled
+ * whereas with 'minimal' WAL level, it's always disabled.

I thought it is more easy/consistent to refer always wal_level instead
of sometimes referring to WAL level.

SUGGESTION (for last sentence)
Logical decoding is always enabled when wal_level='logical' and always
disabled when wal_level='minimal'.

~~~

8.
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"

Missing blank line before the #includes.

~~~

StartupLogicalDecodingStatus:

9.
+ if (last_status)
+ {
+ LogicalDecodingCtl->xlog_logical_info = true;
+ LogicalDecodingCtl->logical_decoding_enabled = true;
+ }

Is the 'if' needed? It could have just said:
LogicalDecodingCtl->xlog_logical_info = last_status;
LogicalDecodingCtl->logical_decoding_enabled = last_status;

Or, why not just delegate to the other function:
UpdateLogicalDecodingStatus(last_status, false);

~~~

start_logical_decoding_status_change:

10.
+ /*
+ * Wait for the recovery to complete. Note that even the checkpointer
+ * can wait for the recovery to complete here without concerning
+ * deadlocks unless the startup process doesn't perform any action
+ * that waits for it after calling
+ * UpdateLogicalDecodingStatusEndOfRecovery().
+ */

Is the wording correct? The "unless ... doesn't" seems backwards.

~~~

LogicalDecodingStatusChangeAllowed:

11.
+ /*
+ * We check the current status to see if we need to change it. If a status
+ * change is in-progress, we need to wait for completion.
+ */
+ if (LogicalDecodingCtl->status_change_inprogress)

Only the 2nd sentence of the comment seems relevant to this code.
AFAICT the 1st sentence belongs to logic that comes later in this
function.

~~~

RequestDisableLogicalDecoding:

12.
+/*
+ * Initiate a request for disabling logical decoding.
+ *
+ * This function expects to be after dropping the possibly-last logical
+ * replication slot as it doesn't check the logical slot presence.
+ */

Typo? /expects to be after/expects to be called after/

~~~

13.
+ /*
+ * Check if the status change is allowed before initiating a deactivation
+ * request, to avoid unnecessary work.
+ */
+ if (!LogicalDecodingStatusChangeAllowed())
+ return;

IMO call this the "disable request" (same as the function comment;
same as the function name) instead of introducing new terminolgy like
"deactivation request".

~~~

DisableLogicalDecodingIfNecessary:

14.
+ /*
+ * When disabling logical decoding, we need to disable logical decoding
+ * first and disable logical information WAL logging in order to ensure
+ * that no logical decoding processes WAL records with insufficient
+ * information.
+ */

wording?

~~~

15.
+ /*
+ * Order all running processes to reflect the xlog_logical_info update.
+ * Unlike when enabling logical decoding, we don't need to wait for all
+ * processes to complete it in this case. We already disabled logical
+ * decoding and it's always safe to write logical information to WAL
+ * records, even when not strictly required. Therefore, we don't need to
+ * wait for all running transactions to finish either.
+ */

Order makes me think about sorting. Maybe "Tell" instead of "Order" is clearer.

~~~

UpdateLogicalDecodingStatusEndOfRecovery:

16.
+ /*
+ * We can use logical decoding if we're using 'logical' WAL level or there
+ * is at least one logical replication slot.
+ */
+ if (wal_level == WAL_LEVEL_LOGICAL || CheckLogicalSlotExists())
+ new_status = true;
+
+ if (LogicalDecodingCtl->logical_decoding_enabled != new_status)
+ need_wal = true;

16a.
The comment would be clearer to say similar to lots of other places:

SUGGESTION
We can use logical decoding if wal_level=logical, or wal_level=replica
and there is at least one logical replication slot.

~

16b.
The conditions are not really needed. Maybe it is easier to write as below:

new_status = (wal_level == WAL_LEVEL_LOGICAL || CheckLogicalSlotExists());
need_wal = (LogicalDecodingCtl->logical_decoding_enabled != new_status);

======
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master
Next
From: David Rowley
Date:
Subject: Re: BRIN: Prevent the heapblk overflow during index summarization on very large tables resulting in an infinite loop