Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date
Msg-id CAD21AoAQJ0LuRYuj0g8-uB9Qtns88HK_TVdoa5jmX3ZPBK9gvw@mail.gmail.com
Whole thread Raw
In response to Re: POC: enable logical decoding when wal_level = 'replica' without a server restart  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
List pgsql-hackers
On Wed, Oct 15, 2025 at 12:57 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Sawada-San,
>
> Here are some more review comments for v18.
>
> (WIP)
>
> ======
> src/backend/access/transam/xact.c
>
> MarkCurrentTransactionIdLoggedIfAny:
>
> 1.
> - * This returns true if wal_level >= logical and we are inside a valid
> - * subtransaction, for which the assignment was not yet written to any WAL
> - * record.
> + * This returns true if effective WAL level is logical and we are inside
> + * a valid subtransaction, for which the assignment was not yet written to
> + * any WAL record.
>
> Maybe it is better to always substitute 'wal_level' with
> 'effective_wal_level' for consistency?
>
> ~~~
>
> 2.
> - /* wal_level has to be logical */
> + /* effective WAL level has to be logical */
>
> ditto: maybe just say: "effective_wal_level has to be logical".
>
> ~~~
>
> 3.
> + * When effective WAL level is logical, guarantee that a subtransaction's
> + * xid can only be seen in the WAL stream if its toplevel xid has been
> + * logged before. If necessary we log an xact_assignment record with fewer
>
> ditto: maybe say "When effective_wal_level is logical..."

Fixed.

>
> ======
> src/backend/access/transam/xlog.c
>
> 4.
> +const char *
> +show_effective_wal_level(void)
> +{
> + char    *str;
> +
> + if (wal_level == WAL_LEVEL_MINIMAL)
> + return "minimal";
> +
> + /*
> + * During the recovery, we need to check the shared status instead of the
> + * local XLogLogicalInfo cache as we don't synchronously update it during
> + * the recovery.
> + */
> + if (RecoveryInProgress())
> + return IsXLogLogicalInfoEnabled() ? "logical" : "replica";
> +
> + if (wal_level == WAL_LEVEL_REPLICA)
> + {
> + /*
> + * With wal_level='replica', XLogLogicalInfo indicates the actual WAL
> + * level.
> + */
> + if (IsXLogLogicalInfoEnabled())
> + str = "logical";
> + else
> + str = "replica";
> + }
> + else
> + str = "logical";
> +
> + return str;
> +}
>
> It would be simpler and have more consistent ternary usage to have
> direct returns in all places instead of just half or them. Then remove
> the 'str' var:
>
> SUGGESTION
> ...
> if (wal_level == WAL_LEVEL_MINIMAL)
>   return "minimal";
> ...
>   if (RecoveryInProgress())
>     return IsXLogLogicalInfoEnabled() ? "logical" : "replica";
> ...
>   if (wal_level == WAL_LEVEL_REPLICA)
>     return IsXLogLogicalInfoEnabled() ? "logical" : "replica";
> ...
>   return "logical";

Thank you for the suggestion! I reconsidered and improved the code
flow of that function.

>
> ======
> src/backend/commands/publicationcmds.c
>
> 5.
> - if (wal_level != WAL_LEVEL_LOGICAL)
> + if (!IsLogicalDecodingEnabled())
>   ereport(WARNING,
>   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> - errmsg("\"wal_level\" is insufficient to publish logical changes"),
> - errhint("Set \"wal_level\" to \"logical\" before creating subscriptions.")));
> + errmsg("logical decoding should be allowed to publish logical changes"),
> + errhint("Before creating subscriptions, set \"wal_level\" >=
> \"logical\" or create a logical replication slot when \"wal_level\" =
> \"replica\".")));
>
> Why not just say errmsg "\"effective_wal_level\" is insufficient to
> publish logical changes"

I think that saying "logical decoding should be allowed" sounds more
understandable for users than saying "effective_wal_level is
insufficient".

>
> ======
> src/backend/commands/tablecmds.c
>
> 6.
> - /* should only get here if wal_level >= logical */
> + /* should only get here if effective WAL level is 'logical' */
>   Assert(XLogLogicalInfoActive());
> ditto from earlier: say "should only get here if effective_wal_level is logical"

Fixed.

>
> ======
> src/backend/postmaster/checkpointer.c
>
> 7.
> + /*
> + * Disable logical decoding if someone requested to disable logical
> + * decoding. See comments atop logicalctl.c.
> + */
> + DisableLogicalDecodingIfNecessary();
> +
>
> The comment seems a bit repetitive.
>
> SUGGESTION
> Disable logical decoding if someone requested it. See comments atop
> logicalctl.c.

Fixed.

>
> ======
> src/backend/replication/logical/decode.c
>
> 8.
> - errmsg("logical decoding on standby requires \"wal_level\" >=
> \"logical\" on the primary")));
> + errmsg("logical decoding on standby requires \"effective_wal_level\"
> >= \"logical\" on the primary")));
>
> Is that >= really necessary?
>
> Why not say: "... requires \"effective_wal_level\" to be \"logical\" on the .."
>
> ======
> src/backend/replication/logical/logical.c
>
> 9.
> + errmsg("logical decoding on standby requires \"effective_wal_level\"
> >= \"logical\" on the primary"),
> + errhint("Set \"wal_level\" >= \"logical\" or create at least one
> logical slot when \"wal_level\" = \"replica\".")));
>
> ditto above. Those >= seem to make it more complex than necessary.
>
> errmsg can be: "...requires \"effective_wal_level\" to be \"logical\"
> on the ..."
> errhint can be: "Set \"wal_level\" = \"logical\" or..."
>
> ======
> src/backend/replication/logical/slotsync.c
>
> 10.
>   ereport(elevel,
>   errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> - errmsg("replication slot synchronization requires \"wal_level\" >=
> \"logical\""));
> + errmsg("replication slot synchronization requires
> \"effective_wal_level\" >= \"logical\" on the primary"),
> + errhint("To enable logical decoding on primary, set \"wal_level\" >=
> \"logical\" or create at least one logical slot when \"wal_level\" =
> \"replica\"."));
>
> ditto above. Those >= seem to make it more complex than necessary.
>
> errmsg can be: "...requires \"effective_wal_level\" to be \"logical\"
> on the ..."
> errhint can be: "...set \"wal_level\" = \"logical\" or..."

It seems we're already using >=  in many places even before the patch:

src/backend/access/transam/xlogfuncs.c:
errmsg("pg_log_standby_snapshot() can only be used if \"wal_level\" >=
\"replica\"")));
src/backend/postmaster/postmaster.c:
(errmsg("replication slot synchronization (\"sync_replication_slots\"
= on) requires \"wal_level\" >= \"logical\"")));
src/backend/replication/logical/decode.c:
                          errmsg("logical decoding on standby requires
\"wal_level\" >= \"logical\" on the primary")));
src/backend/replication/logical/logical.c:
  errmsg("logical decoding requires \"wal_level\" >= \"logical\"")));
src/backend/replication/logical/logical.c:
          errmsg("logical decoding on standby requires \"wal_level\"
>= \"logical\" on the primary")));
src/backend/replication/logical/slotsync.c:
 errmsg("replication slot synchronization requires \"wal_level\" >=
\"logical\""));

I agree that it makes the message more complex than necessary. I
imagine that we used that style since we can avoid editing these
messages when we introduce a higher level than 'logical', but I think
it's likely to adjust the messages in that case anyway. So how about
discussing this topic in another thread to simply improve the
user-faced error messages? If we can agree on such changes, this patch
can follow the new style.

>
> ======
> src/backend/replication/slot.c
>
> ReplicationSlotCleanup:
>
> 11.
> + if (SlotIsLogical(s))
> + nlogicalslots++;
> +
>
> How about assign SlotIsLogical(s) to a var like other places, instead
> of calling it 2x?

SlotIsLogical() is a macro so we don't necessarily need to avoid
calling multiple times in terms of performance. Even for readability,
the current style looks reasonable to me.

>
> ~~
>
> ReportSlotInvalidation:
>
> 12.
>   case RS_INVAL_WAL_LEVEL:
> - appendStringInfoString(&err_detail, _("Logical decoding on standby
> requires \"wal_level\" >= \"logical\" on the primary server."));
> + appendStringInfoString(&err_detail, _("Logical decoding on standby
> requires the primary server to have either \"wal_level\" >=
> \"logical\" or at least one logical slot created."));
>
>
> 12a.
> ditto previous comments about the >=

Please refer to my comment to your similar comments above.

>
> ~
>
> 12b.
> This error message does not seem to be as good as previous ones. e.g.
> I think it should be mentioning about" replica", because creating
> slots with "minimal" is no good, right?

Fixed.

>
> ~~~
>
> InvalidateObsoleteReplicationSlots:
>
> 13.
> - * - RS_INVAL_WAL_LEVEL: is logical and wal_level is insufficient
> + * - RS_INVAL_WAL_LEVEL: is logical and logical decoding is disabled due to
> + *   insufficient WAL level or no logical slot presence.
>
> Maybe it is just easier to say: "...is logical and effective_wal_level
> is not logical"
>

Agreed, fixed.

> ======
> src/backend/storage/ipc/standby.c
>
> 14.
> - if (wal_level < WAL_LEVEL_LOGICAL)
> + if (!IsLogicalDecodingEnabled())
>   LWLockRelease(ProcArrayLock);
>
>   recptr = LogCurrentRunningXacts(running);
>
>   /* Release lock if we kept it longer ... */
> - if (wal_level >= WAL_LEVEL_LOGICAL)
> + if (IsLogicalDecodingEnabled())
>   LWLockRelease(ProcArrayLock);
>
> ~
>
> Would it be better to assign IsLogicalDecodingEnabled() to a var here
> instead of 2x calls?

Yeah, I think it should do that since it seems possible that logical
decoding status can change between two calls, resulting in not
releasing the ProcArrayLock.

> ======
> src/backend/utils/cache/inval.c
>
> 15.
> + * When effective WAL level is 'logical', write invalidations into WAL at
> + * each command end to support the decoding of the in-progress transactions.
> + * See CommandEndInvalidationMessages.
>
> ditto some previous review comment: Say "when effective_wal_level is
> logical, write..."

Fixed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Next
From: Michael Paquier
Date:
Subject: Instability of phycodorus in pg_upgrade tests with JIT