Thread: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Ashutosh Bapat

Date:

09 January, 14:29:22

On Tue, Dec 31, 2024 at 10:15 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi all,
>
> Logical decoding (and logical replication) are available only when
> wal_level = logical. As the documentation says[1], Using the 'logical'
> level increases the WAL volume which could negatively affect the
> performance. For that reason, users might want to start with using
> 'replica', but when they want to use logical decoding they need a
> server restart to increase wal_level to 'logical'. My goal is to allow
> users who are using 'replica' level to use logical decoding without a
> server restart. There are other GUC parameters related to logical
> decoding and logical replication such as max_wal_senders,
> max_logical_replication_workers, and max_replication_slots, but even
> if users set these parameters >0, there would not be a noticeable
> performance impact. And their default values are already >0. So I'd
> like to focus on making only the wal_level dynamic GUC parameter.
> There are several earlier discussions[2][3] but no one has submitted
> patches unless I'm missing something.
>
> The first idea I came up with is to make the wal_level a PGC_SIGHUP
> parameter. However, it affects not only setting 'replica' to 'logical'
> but also setting 'minimal' to 'replica' or higher. I'm not sure the
> latter case is common and it might require a checkpoint. I don't want
> to make the patch complex for uncommon cases.
>
> The second idea is to somehow allow both WAL-logging logical info and
> logical decoding even when wal_level is 'replica'. I've attached a PoC
> patch for that. The patch introduces new SQL functions such as
> pg_activate_logical_decoding() and pg_deactivate_logical_decoding().
> These functions are available only when wal_level is 'replica' (or
> higher). In pg_activate_logical_decoding(), we set the status of
> logical decoding stored on the shared memory from 'disabled' to
> 'xlog-logical-info', allowing all processes to write logical
> information to WAL records for logical decoding. But the logical
> decoding is still not allowed. Once we confirm all in-progress
> transactions completed, we switch the status to
> 'logical-decoding-ready', meaning that users can create logical
> replication slots and use logical decoding.
>
> Overall, with the patch, there are two ways to enable logical
> decoding: setting wal_level to 'logical' and calling
> pg_activate_logical_decoding() when wal_level is 'replica'. I left the
> 'logical' level for backward compatibility and for users who want to
> enable the logical decoding without calling that SQL function. If we
> can automatically enable the logical decoding when creating the first
> logical replication slot, probably we no longer need the 'logical'
> level. There is room to discuss the user interface. Feedback is very
> welcome.
>

If a server is running at minimal wal_level and they want to enable
logical replication, they would still need a server restart. That
would be rare but not completely absent.

Our documentation says "wal_level determines how much information is
written to the WAL.". Users may not expect that the WAL amount
changes while wal_level = replica depending upon whether logical
decoding is possible. It may be possible to set the expectations right
by changing the documentation. It's not in the patch, so I am not sure
whether this is considered.

Cloud providers do not like multiple ways of changing configuration
esp. when they can not control it. See [1]. Changing wal_level through
a SQL function may fit the same category.

I agree that it would be a lot of work to make all combinations of
wal_level changes work, but changing wal_level through SIGHUP looks
like a cleaner solution. Is there a way we can make the GUC SIGHUP but
disallow certain combinations of old and new values?
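
A minimal sketch of what such a restriction could look like. Which pairs get rejected is an assumption made only for illustration (here, any change to or from 'minimal'), and the enum and function names are hypothetical, not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum for illustration; not the server's own definition. */
typedef enum
{
    WL_MINIMAL,
    WL_REPLICA,
    WL_LOGICAL
} OnlineWalLevel;

/*
 * Returns true if changing wal_level from old_level to new_level would be
 * accepted on reload. Assumption: only changes between 'replica' and
 * 'logical' are allowed online; anything touching 'minimal' still needs
 * a restart.
 */
static bool
online_change_allowed(OnlineWalLevel old_level, OnlineWalLevel new_level)
{
    assert(old_level <= WL_LOGICAL && new_level <= WL_LOGICAL);

    if (old_level == new_level)
        return true;            /* nothing to change */

    return old_level != WL_MINIMAL && new_level != WL_MINIMAL;
}
```

With this rule, 'replica' to 'logical' and back would pass, while 'minimal' to 'replica' would be rejected at reload time.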

[1]
https://www.postgresql.org/message-id/flat/CA%2BVUV5rEKt2%2BCdC_KUaPoihMu%2Bi5ChT4WVNTr4CD5-xXZUfuQw%40mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

RE: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

"Hayato Kuroda (Fujitsu)"

Date:

28 January, 12:38:57

Dear Sawada-san,

I love the idea. I've roughly tested the patch and it worked in my environment.
Here are initial comments...

1. xloglevelworker.c
```
+#include "replication/logicalxlog.h"
```

xloglevelworker.c includes replication/logicalxlog.h, but it does not exist.
The line had to be removed to build and test it.

2.
```
+static void
+writeUpdateWalLevel(int new_wal_level)
+{
+ XLogBeginInsert();
+ XLogRegisterData((char *) (&new_wal_level), sizeof(bool));
+ XLogInsert(RM_XLOG_ID, XLOG_UPDATE_WAL_LEVEL);
+}
```

IIUC the data length should be sizeof(int) instead of sizeof(bool).
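
A small standalone sketch of why the length matters. The helper below is hypothetical and only mimics copying the first `len` bytes of an int into a record payload and reading it back; it does not use any WAL API:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for "register len bytes of value, then decode":
 * copies the first len bytes of value into a zeroed int and returns what
 * a reader would reconstruct from that payload.
 */
static int
roundtrip_level(int value, size_t len)
{
    int decoded = 0;

    assert(len <= sizeof(int));
    memcpy(&decoded, &value, len);
    return decoded;
}
```

Calling `roundtrip_level(0x01020304, sizeof(int))` returns the original value, while passing `sizeof(bool)` copies fewer bytes and loses part of it. On a little-endian machine small values such as 2 would happen to survive, which may be why the mismatch is easy to miss.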

3.
Is there a reason why the process does not wait till the archiver exits?

4.
When I dumped wal files, I found that XLOG_UPDATE_WAL_LEVEL cannot be recognized:

```
rmgr: XLOG len (rec/tot): 27/ 27, tx: 0, lsn: 0/03050838, prev 0/03050800, desc: UNKNOWN (f0)
wal_levellogical

```

xlog_identify() must be updated as well.

5.
When I changed "logical" to "replica", postgres output the following:

```
LOG: received SIGHUP, reloading configuration files
LOG: parameter "wal_level" changed to "replica"
LOG: wal_level control worker started
LOG: changing wal_level from "logical" to "replica"
LOG: wal_level has been decreased to "replica"
LOG: successfully changed wal_level from "logical" to "replica"
```

ISTM that both postmaster and the wal_level control worker said something like
"wal_level changed", which is a bit strange to me. Since the GUC can't be renamed,
can we use another name for the wal_level control state?

6.
With the patch present, wal_level can be changed to minimal even while
streaming replication is running. If we do that, the walsender exits immediately and
the FATAL below appears periodically until the standby stops. The same can be
said for logical replication:

```
FATAL: streaming replication receiver "walreceiver" could not connect to the primary server:
connection to server on socket "/tmp/.s.PGSQL.oooo" failed:
FATAL: WAL senders require "wal_level" to be "replica" or "logical"
```

I know this is not perfect, but can we avoid the issue by rejecting the GUC update
if a walsender exists? Another approach is not to update the value when replication
slots would need to be invalidated.

----------
Best regards,
Hayato Kuroda

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Masahiko Sawada

Date:

29 January, 03:09:32

On Tue, Jan 28, 2025 at 1:39 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Sawada-san,
>
> I love the idea. I've roughly tested the patch and worked on my env.
> Here are initial comments...

Thank you for looking at the patch!

>
> 1. xloglevelworker.c
> ```
> +#include "replication/logicalxlog.h"
> ```
>
> xloglevelworker.c includes replication/logicalxlog.h, but it does not exist.
> The line had to be removed to build and test it.
>
> 2.
> ```
> +static void
> +writeUpdateWalLevel(int new_wal_level)
> +{
> +       XLogBeginInsert();
> +       XLogRegisterData((char *) (&new_wal_level), sizeof(bool));
> +       XLogInsert(RM_XLOG_ID, XLOG_UPDATE_WAL_LEVEL);
> +}
> ```
>
> IIUC the data length should be sizeof(int) instead of sizeof(bool).

Agreed to fix them.

>
> 3.
> Is there a reason why the process does not wait till the archiver exits?

No. I didn't implement this part as the patch was just for
proof-of-concept. I think it would be better to wait for it to exit.

>
> 4.
> When I dumped wal files, I found that XLOG_UPDATE_WAL_LEVEL cannot be recognized:
>
> ```
> rmgr: XLOG        len (rec/tot):     27/    27, tx:          0, lsn: 0/03050838, prev 0/03050800, desc: UNKNOWN (f0)
> wal_levellogical
> ```
>
> xlog_identify() must be updated as well.

Will fix.

>
> 5.
> When I changed "logical" to "replica", postgres outputs like below:
>
> ```
> LOG:  received SIGHUP, reloading configuration files
> LOG:  parameter "wal_level" changed to "replica"
> LOG:  wal_level control worker started
> LOG:  changing wal_level from "logical" to "replica"
> LOG:  wal_level has been decreased to "replica"
> LOG:  successfully changed wal_level from "logical" to "replica"
> ```
>
> ISTM that both postmaster and the wal_level control worker said something like
> "wal_level changed", which is bit strange for me. Since GUC can't be renamed,
> can we use another name for the wal_level control state?

I'm concerned that users could be confused if two different names
refer to substantially the same thing.

Having said that, I guess that we need to drastically change the
messages. For example, I think that the wal_level worker should say
something like "successfully made 'logical' wal_level effective"
instead of saying something like "changed wal_level value". Also,
users might not need gradual messages when increasing 'minimal' to
'logical' or decreasing 'logical' to 'minimal'.

>
> 6.
> With the patch present, the wal_level can be changed to the minimal even when the
> streaming replication is going. If we do that, the walsender exits immediately and
> the below FATAL appears periodically until the standby stops. Same things can be
> said for the logical replication:
>
> ```
> FATAL:  streaming replication receiver "walreceiver" could not connect to the primary server:
> connection to server on socket "/tmp/.s.PGSQL.oooo" failed:
> FATAL:  WAL senders require "wal_level" to be "replica" or "logical"
> ```
>
> I know this is not a perfect, but can we avoid the issue by reject the GUC update
> if the walsender exists? Another approach is not to update the value when replication
> slots need to be invalidated.

Does it mean that we reject the config file from being reloaded in
that case? I have no idea how to reject it in a case where the
wal_level in postgresql.conf changed and the user did 'pg_ctl reload'.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Masahiko Sawada

Date:

04 February, 11:15:29

On Mon, Feb 3, 2025 at 3:40 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Sawada-san,
>
> > I'm concerned that users could be confused if two different names
> > refer to substantially the same thing.
> >
> > Having said that, I guess that we need to drastically change the
> > messages. For example, I think that the wal_level worker should say
> > something like "successfully made 'logical' wal_level effective"
> > instead of saying something like "changed wal_level value". Also,
> > users might not need gradual messages when increasing 'minimal' to
> > 'logical' or decreasing 'logical' to 'minimal'.
>
> +1 for something like "successfully made 'logical' wal_level effective", and
> removing gradual messages.
>
> > > 6.
> > > With the patch present, the wal_level can be changed to the minimal even when
> > the
> > > streaming replication is going. If we do that, the walsender exits immediately
> > and
> > > the below FATAL appears periodically until the standby stops. Same things can
> > be
> > > said for the logical replication:
> > >
> > > ```
> > > FATAL:  streaming replication receiver "walreceiver" could not connect to the
> > primary server:
> > > connection to server on socket "/tmp/.s.PGSQL.oooo" failed:
> > > FATAL:  WAL senders require "wal_level" to be "replica" or "logical"
> > > ```
> > >
> > > I know this is not a perfect, but can we avoid the issue by reject the GUC update
> > > if the walsender exists? Another approach is not to update the value when
> > replication
> > > slots need to be invalidated.
> >
> > Does it mean that we reject the config file from being reloaded in
> > that case? I have no idea how to reject it in a case where the
> > wal_level in postgresql.conf changed and the user did 'pg_ctl reload'.
>
> I imagined something like the attached. When I set wal_level to minimal and sent SIGHUP,
> postmaster reported the lines below and failed to update wal_level.
>
> ```
> LOG:  received SIGHUP, reloading configuration files
> LOG:  wal_level cannot be set to "minimal" while walsender exists
> LOG:  configuration file "...postgresql.conf" contains errors; unaffected changes were applied
> ```

Interesting, and thanks for sharing the patch. But I think that when
we change the wal_level to 'minimal', there is a window where a new
walsender can launch after passing the check_wal_level() check.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Bertrand Drouvot

Date:

12 February, 10:44:38

Hi,

On Tue, Feb 11, 2025 at 02:11:10PM -0800, Masahiko Sawada wrote:
> I've updated the patch that includes comment updates and bug fixes.

Thanks!

> The main idea of changing WAL level online is to decouple two aspects:
> (1) the information included in WAL records and (2) the
> functionalities available at each WAL level. With that, we can change
> the WAL level gradually. For example, when increasing the WAL level
> from 'replica' to 'logical', we first switch the WAL level on the
> shared memory to a new higher level where we allow processes to write
> WAL records with additional information required by the logical
> decoding, while keeping the logical decoding unavailable. The new
> level is something between 'replica' and 'logical'. Once we confirm
> all processes have synchronized to the new level, we increase the WAL
> level further to 'logical', allowing us to start logical decoding. The
> patch supports all combinations of WAL level transitions. It makes
> sense to me to use a background worker to proceed with this transition
> work since we need to wait at some points, rather than delegating it
> to the checkpointer process.

The background worker being added is "wal_level control worker". I wonder if
it would make sense to create a more "generic" one instead (to which we could
assign more "tasks" later on, as suggested in the past in [1]).

+   /*
+    * XXX: Perhaps it's not okay that we failed to launch a bgworker and give
+    * up wal_level change because we already reported that the change has
+    * been accepted. Do we need to use aux process instead for that purpose?
+    */
+   if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+       ereport(WARNING,
+               (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+                errmsg("out of background worker slots"),
+                errhint("You might need to increase \"%s\".", "max_worker_processes")));

Not sure it has to be an aux process instead, as it should be busy only on rare occasions.

Maybe we could add some mechanism for ensuring that a bgworker slot is available
when needed (as suggested in [2])?

Not saying it has to be done that way. I just thought that the "wal_level control worker"
could be a perfect use case/starting point for a more generic one but I don't want
to over complicate that thread though.

So maybe just rename "wal_level control worker" to say "custodian worker" and
we could also think about [2]? Feel free to consider all of this as nits if you
feel it deviates too much from the initial intent of this thread.

[1]: https://www.postgresql.org/message-id/flat/C1EE64B0-D4DB-40F3-98C8-0CED324D34CB%40amazon.com
[2]: https://www.postgresql.org/message-id/1058306.1680467858%40sss.pgh.pa.us

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Masahiko Sawada

Date:

14 February, 11:17:48

On Tue, Feb 11, 2025 at 11:44 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Tue, Feb 11, 2025 at 02:11:10PM -0800, Masahiko Sawada wrote:
> > I've updated the patch that includes comment updates and bug fixes.
>
> Thanks!
>
> > The main idea of changing WAL level online is to decouple two aspects:
> > (1) the information included in WAL records and (2) the
> > functionalities available at each WAL level. With that, we can change
> > the WAL level gradually. For example, when increasing the WAL level
> > from 'replica' to 'logical', we first switch the WAL level on the
> > shared memory to a new higher level where we allow processes to write
> > WAL records with additional information required by the logical
> > decoding, while keeping the logical decoding unavailable. The new
> > level is something between 'replica' and 'logical'. Once we confirm
> > all processes have synchronized to the new level, we increase the WAL
> > level further to 'logical', allowing us to start logical decoding. The
> > patch supports all combinations of WAL level transitions. It makes
> > sense to me to use a background worker to proceed with this transition
> > work since we need to wait at some points, rather than delegating it
> > to the checkpointer process.
>
> The background worker being added is "wal_level control worker". I wonder if
> it would make sense to create a more "generic" one instead (to whom we could
> assign more "tasks" later on, as suggested in the past in [1]).
>
> +   /*
> +    * XXX: Perhaps it's not okay that we failed to launch a bgworker and give
> +    * up wal_level change because we already reported that the change has
> +    * been accepted. Do we need to use aux process instead for that purpose?
> +    */
> +   if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
> +       ereport(WARNING,
> +               (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
> +                errmsg("out of background worker slots"),
> +                errhint("You might need to increase \"%s\".", "max_worker_processes")));
>
> Not sure it has to be an aux process instead as it should be busy in rare occasions.

Thank you for referring to the custodian worker thread. I'm not sure
that the online wal_level change work would fit the concept of the
custodian worker, which offloads work from time-critical tasks such as
checkpointing, but this idea made me think of other possible
directions for this work.

Looking at the latest custodian worker patch, the basic architecture
is to have a single custodian worker and processes can ask it for some
work such as removing logical decoding related files. The online
wal_level change would be one of the tasks that processes (esp. the
checkpointer) can ask it for. On the other hand, one point that I
think might not fit this wal_level work well is that while the
custodian worker is a long-lived worker process, it's sufficient for
the online wal_level change work to have a bgworker that does its work
and then exits. IOW, from the perspective of this work, I prefer the
idea of having one short-lived worker for one task over having one
long-lived worker for multiple tasks. Reading that thread, while we
need to resolve the XID wraparound issue for the work of removing
logical decoding related files, the work of removing temporary files
seems to fit a short-lived worker style. So I thought that, as one
direction, it might be worth considering an infrastructure
where we can launch a bgworker just for one task, and implementing the
online wal_level change and temporary files removal on top of it.

> Maybe we could add some mechanism for ensuring that a bgworker slot is available
> when needed (as suggested in [2])?

Yeah, we need this mechanism if we use a bgworker for this work.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Bertrand Drouvot

Date:

14 February, 13:35:52

Hi,

On Fri, Feb 14, 2025 at 12:17:48AM -0800, Masahiko Sawada wrote:
> On Tue, Feb 11, 2025 at 11:44 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:

> Looking at the latest custodian worker patch, the basic architecture
> is to have a single custodian worker and processes can ask it for some
> work such as removing logical decoding related files. The online
> wal_level change will be the one of the tasks that processes (esp.
> checkpointer) can ask for it. On the other hand, one point that I
> think might not fit this wal_level work well is that while the
> custodian worker is a long-lived worker process,

That was the case initially, but it looks like it would not have been the case
in the end. See Tom's comment in [1]:

"
I wonder if a single long-lived custodian task is the right model at all.
At least for RemovePgTempFiles, it'd make more sense to write it as a
background worker that spawns, does its work, and then exits,
independently of anything else
"

> it's sufficient for
> the online wal_level change work to have a bgworker that does its work
> and then exits.

Fully agree and I did not think about changing this behavior.

> IOW, from the perspective of this work, I prefer the
> idea of having one short-lived worker for one task over having one
> long-lived worker for multiple tasks.

Yeah, or one short-lived worker for multiple tasks could work too. It just
starts when it has something to do and then exits.

> Reading that thread, while we
> need to resolve the XID wraparound issue for the work of removing
> logical decoding related files, the work of removing temporary files
> seems to fit a short-lived worker style. So I thought as one of the
> directions, it might be worth considering to have an infrastructure
> where we can launch a bgworker just for one task, and we implement the
> online wal_level change and temporary files removal on top of it.

Yeap, that was exactly my point when I mentioned the custodian thread (taking
into account Tom's comment quoted above).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Bertrand Drouvot

Date:

19 February, 12:56:18

Hi,

On Mon, Feb 17, 2025 at 12:07:56PM -0800, Masahiko Sawada wrote:
> On Fri, Feb 14, 2025 at 2:35 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Yeap, that was exactly my point when I mentioned the custodian thread (taking
> > into account Tom's comment quoted above).
> >
> 
> I've written PoC patches to have the online wal_level change work use
> a more generic infrastructure. These patches are still in PoC state
> but seem like a good direction to me. Here is a brief explanation for
> each patch.

Thanks for the patches!

> * The 0001 patch introduces "reserved background worker slots". We
> allocate max_worker_processes + BGWORKER_CLASS_RESERVED at startup, and
> if the number of running bgworker exceeds max_worker_processes, only
> workers using the reserved slots can be launched. We can request to
> use the reserved slots by adding BGWORKER_CLASS_RESERVED flag at
> bgworker registration.

I had a quick look at 0001 and I think the way that's implemented is reasonable.
I thought this could be defined through a GUC so that extensions can benefit
from it. But OTOH the core code should ensure the value is > as the number of
reserved slots needed by the core so not using a GUC looks ok to me.

> * The 0002 patch introduces "bgtask worker". The bgtask infrastructure
> is designed to execute internal tasks in background in
> one-worker-per-one-task style. Internally, bgtask workers use the
> reserved bgworker so it's guaranteed that they can launch.

Yeah.

> The
> internal tasks that we can request are predefined and this patch has a
> dummy task as a placeholder. This patch implements only the minimal
> functionality for the online wal_level change work. I've not tested if
> this bgtask infrastructure can be used for tasks that we wanted to
> offload to the custodian worker.

Again, I had a quick look and it looks simple enough for our needs here. It "just"
executes "(void) InternalBgTasks[type].func()" and then exits. That's, I think,
a good starting point to add more tasks in the future (if we want to).
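
For readers who have not opened 0002, the dispatch could be pictured roughly as below. Only the InternalBgTasks[type].func() call shape comes from the patch; the enum, struct layout, task functions, and counters are hypothetical and exist only to make the sketch self-contained:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task identifiers. */
typedef enum
{
    BGTASK_DUMMY,
    BGTASK_WAL_LEVEL_CHANGE,
    BGTASK_COUNT
} BgTaskType;

/* One entry per predefined internal task. */
typedef struct
{
    const char *name;
    int         (*func) (void);
} BgTaskEntry;

/* Counters so the sketch has an observable effect. */
static int  dummy_runs = 0;
static int  wal_level_runs = 0;

static int
dummy_task(void)
{
    dummy_runs++;
    return 0;
}

static int
wal_level_change_task(void)
{
    wal_level_runs++;
    return 0;
}

static const BgTaskEntry InternalBgTasks[BGTASK_COUNT] = {
    [BGTASK_DUMMY] = {"dummy", dummy_task},
    [BGTASK_WAL_LEVEL_CHANGE] = {"wal_level change", wal_level_change_task},
};

/* One worker runs exactly one task; a real worker would exit afterwards. */
static void
bgtask_worker_main(BgTaskType type)
{
    assert((int) type >= 0 && type < BGTASK_COUNT);
    (void) InternalBgTasks[type].func();
}
```

Adding a task in this model means adding one enum value and one table entry, which is what makes the one-worker-per-task style cheap to extend.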

> * The 0003 patch makes wal_level a SIGHUP parameter. We do the online
> wal_level change work using the bgtask infrastructure. There are no
> major changes from the previous version other than that.

It replaces the dummy task introduced in 0002 by the one that suits our needs
here (through the new BgTaskWalLevelChange() function).

The design looks reasonable to me. Waiting to see if others disagree before
looking more closely at the code.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Masahiko Sawada

Date:

20 February, 21:05:20

On Wed, Feb 19, 2025 at 1:56 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,

Thank you for looking at the patches.

>
> On Mon, Feb 17, 2025 at 12:07:56PM -0800, Masahiko Sawada wrote:
> > On Fri, Feb 14, 2025 at 2:35 AM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > > Yeap, that was exactly my point when I mentioned the custodian thread (taking
> > > into account Tom's comment quoted above).
> > >
> >
> > I've written PoC patches to have the online wal_level change work use
> > a more generic infrastructure. These patches are still in PoC state
> > but seem like a good direction to me. Here is a brief explanation for
> > each patch.
>
> Thanks for the patches!
>
> > * The 0001 patch introduces "reserved background worker slots". We
> > allocate max_worker_processes + BGWORKER_CLASS_RESERVED at startup, and
> > if the number of running bgworker exceeds max_worker_processes, only
> > workers using the reserved slots can be launched. We can request to
> > use the reserved slots by adding BGWORKER_CLASS_RESERVED flag at
> > bgworker registration.
>
> I had a quick look at 0001 and I think the way that's implemented is reasonable.
> I thought this could be defined through a GUC so that extensions can benefit
> from it. But OTOH the core code should ensure the value is > as the number of
> reserved slots needed by the core so not using a GUC looks ok to me.

Interesting idea. I kept the reserved slots only for internal use but
it would be worth considering a GUC instead.

> > * The 0002 patch introduces "bgtask worker". The bgtask infrastructure
> > is designed to execute internal tasks in background in
> > one-worker-per-one-task style. Internally, bgtask workers use the
> > reserved bgworker so it's guaranteed that they can launch.
>
> Yeah.
>
> > The
> > internal tasks that we can request are predefined and this patch has a
> > dummy task as a placeholder. This patch implements only the minimal
> > functionality for the online wal_level change work. I've not tested if
> > this bgtask infrastructure can be used for tasks that we wanted to
> > offload to the custodian worker.
>
> Again, I had a quick look and it looks simple enough for our needs here. It "just"
> executes "(void) InternalBgTasks[type].func()" and then exits. That's, I think,
> a good starting point to add more tasks in the future (if we want to).

Yeah, we might want to extend it further, for example to pass an
argument to the background task or to ask a single bgtask worker to
run multiple tasks. As far as I can read the custodian patch set,
the work of removing temp files does not seem to require any argument
though.

>
> > * The 0003 patch makes wal_level a SIGHUP parameter. We do the online
> > wal_level change work using the bgtask infrastructure. There are no
> > major changes from the previous version other than that.
>
> It replaces the dummy task introduced in 0002 by the one that suits our needs
> here (through the new BgTaskWalLevelChange() function).
>
> The design looks reasonable to me. Waiting to see if others disagree before
> looking more closely at the code.

Thanks.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Masahiko Sawada

Date:

21 April, 20:31:03

On Thu, Feb 20, 2025 at 10:05 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Feb 19, 2025 at 1:56 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
>
> Thank you for looking at the patches.
>
> >
> > On Mon, Feb 17, 2025 at 12:07:56PM -0800, Masahiko Sawada wrote:
> > > On Fri, Feb 14, 2025 at 2:35 AM Bertrand Drouvot
> > > <bertranddrouvot.pg@gmail.com> wrote:
> > > > Yeap, that was exactly my point when I mentioned the custodian thread (taking
> > > > into account Tom's comment quoted above).
> > > >
> > >
> > > I've written PoC patches to have the online wal_level change work use
> > > a more generic infrastructure. These patches are still in PoC state
> > > but seem like a good direction to me. Here is a brief explanation for
> > > each patch.
> >
> > Thanks for the patches!
> >
> > > * The 0001 patch introduces "reserved background worker slots". We
> > > > allocate max_worker_processes + BGWORKER_CLASS_RESERVED at startup, and
> > > if the number of running bgworker exceeds max_worker_processes, only
> > > workers using the reserved slots can be launched. We can request to
> > > use the reserved slots by adding BGWORKER_CLASS_RESERVED flag at
> > > bgworker registration.
> >
> > I had a quick look at 0001 and I think the way that's implemented is reasonable.
> > I thought this could be defined through a GUC so that extensions can benefit
> > from it. But OTOH the core code should ensure the value is > as the number of
> > reserved slots needed by the core so not using a GUC looks ok to me.
>
> Interesting idea. I kept the reserved slots only for internal use but
> it would be worth considering to use GUC instead.
>
> > > * The 0002 patch introduces "bgtask worker". The bgtask infrastructure
> > > is designed to execute internal tasks in background in
> > > one-worker-per-one-task style. Internally, bgtask workers use the
> > > reserved bgworker so it's guaranteed that they can launch.
> >
> > Yeah.
> >
> > > The
> > > internal tasks that we can request are predefined and this patch has a
> > > dummy task as a placeholder. This patch implements only the minimal
> > > functionality for the online wal_level change work. I've not tested if
> > > this bgtask infrastructure can be used for tasks that we wanted to
> > > offload to the custodian worker.
> >
> > Again, I had a quick look and it looks simple enough for our needs here. It "just"
> > executes "(void) InternalBgTasks[type].func()" and then exits. That's, I think,
> > a good starting point to add more tasks in the future (if we want to).
>
> Yeah, we might want to extend it further, for example to pass an
> argument to the background task or to ask multiple tasks for the
> single bgtask worker. As far as I can read the custodian patch set,
> the work of removing temp files seems not to require any argument
> though.
>
> >
> > > * The 0003 patch makes wal_level a SIGHUP parameter. We do the online
> > > wal_level change work using the bgtask infrastructure. There are no
> > > major changes from the previous version other than that.
> >
> > It replaces the dummy task introduced in 0002 by the one that suits our needs
> > here (through the new BgTaskWalLevelChange() function).
> >
> > The design looks reasonable to me. Waiting to see if others disagree before
> > looking more closely at the code.
>
> Thanks.

I would like to discuss behavioral and user interface considerations.

Upon further analysis of this patch regarding the conversion of
wal_level to a SIGHUP parameter, I find that supporting all
combinations of wal_level value changes might make less sense.
Specifically, changing to or from 'minimal' would necessitate a
checkpoint, and reducing wal_level to 'minimal' would require
terminating physical replication, WAL archiving, and online backups.
While these operations demand careful consideration, there seems to be
no compelling use case for decreasing to 'minimal'. Furthermore,
increasing wal_level from 'minimal' is typically a one-time operation
during a database's lifetime. Therefore, we should weigh the benefits
against the implementation complexity.

One solution is to manage the effective WAL level using two distinct
GUC parameters: max_wal_level and wal_level. max_wal_level would be a
POSTMASTER parameter controlling the system's maximum allowable WAL
level, with values 'minimal', 'replica', and 'logical'. wal_level
would function as a SIGHUP parameter managing the runtime WAL level,
accepting values 'replica', 'logical', and 'auto'. The selected value
must be either 'auto' or not exceed max_wal_level. When set to 'auto',
wal_level automatically synchronizes with max_wal_level's value. This
approach would enable online WAL level transitions between 'replica'
and 'logical'.
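
The rule above can be sketched as a small model. This is only an illustration in Python, not PostgreSQL code: the function name effective_wal_level and the ordering table are assumptions made for the example, and the real check would presumably live in a GUC check hook.

```python
# Illustrative model only; names here are hypothetical, not from the patch.
WAL_LEVEL_ORDER = {"minimal": 0, "replica": 1, "logical": 2}


def effective_wal_level(max_wal_level: str, wal_level: str) -> str:
    """Resolve the runtime WAL level from the two proposed GUCs."""
    if max_wal_level not in WAL_LEVEL_ORDER:
        raise ValueError(f"invalid max_wal_level: {max_wal_level!r}")

    # 'auto' simply follows max_wal_level.
    if wal_level == "auto":
        return max_wal_level

    # wal_level itself only accepts 'replica', 'logical', or 'auto'.
    if wal_level not in ("replica", "logical"):
        raise ValueError(f"invalid wal_level: {wal_level!r}")

    # An explicit value must not exceed the postmaster-level ceiling.
    if WAL_LEVEL_ORDER[wal_level] > WAL_LEVEL_ORDER[max_wal_level]:
        raise ValueError(
            f"wal_level {wal_level!r} exceeds max_wal_level {max_wal_level!r}"
        )
    return wal_level
```

Under this reading, max_wal_level = 'logical' with wal_level = 'replica' runs at 'replica' and can be raised to 'logical' by a reload, while max_wal_level = 'minimal' only admits wal_level = 'auto'.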

Regarding logical decoding on standbys, currently both primary and
standby servers must have wal_level set to 'logical'. We need to
determine the appropriate behavior when users decrease the WAL level
from 'logical' to 'replica' through configuration file reload.

One approach would be to invalidate all logical replication slots on
the standby when transitioning to 'replica' WAL level. Although
incoming WAL records from the primary would still be written at
'logical' level, making logical decoding technically feasible, this
behavior seems logical as it reflects the user's intent to discontinue
logical decoding on the standby. For consistency, we might need to
invalidate logical slots during server startup if the WAL level is
insufficient.
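
To make the first approach concrete, here is a minimal sketch of the intended standby behavior, written in Python purely as a model. The Slot and Standby classes and their method names are hypothetical and do not correspond to any existing PostgreSQL structure.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    logical: bool
    invalidated: bool = False


@dataclass
class Standby:
    wal_level: str
    slots: list = field(default_factory=list)

    def reload(self, new_wal_level: str) -> None:
        """Model a configuration reload (SIGHUP) that changes wal_level."""
        if self.wal_level == "logical" and new_wal_level == "replica":
            # The user asked to stop logical decoding on this standby,
            # even though incoming WAL may still carry logical info.
            self._invalidate_logical_slots()
        self.wal_level = new_wal_level

    def startup(self) -> None:
        """Model the 'consistent' startup variant: invalidate, don't fail."""
        if self.wal_level != "logical":
            self._invalidate_logical_slots()

    def _invalidate_logical_slots(self) -> None:
        for slot in self.slots:
            if slot.logical:
                slot.invalidated = True
```

Physical slots are left alone in both paths; only logical slots are affected by the level change.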

Alternatively, we could permit logical decoding on the standby even
with wal_level set to 'replica'. However, this would necessitate
invalidating all logical replication slots during promotion,
potentially extending downtime during failover.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Amit Kapila

Date:

23 April, 15:46:13

On Mon, Apr 21, 2025 at 11:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I would like to discuss behavioral and user interface considerations.
>
> Upon further analysis of this patch regarding the conversion of
> wal_level to a SIGHUP parameter, I find that supporting all
> combinations of wal_level value changes might make less sense.
> Specifically, changing to or from 'minimal' would necessitate a
> checkpoint, and reducing wal_level to 'minimal' would require
> terminating physical replication, WAL archiving, and online backups.
> While these operations demand careful consideration, there seems to be
> no compelling use case for decreasing to 'minimal'. Furthermore,
> increasing wal_level from 'minimal' is typically a one-time operation
> during a database's lifetime. Therefore, we should weigh the benefits
> against the implementation complexity.
>
> One solution is to manage the effective WAL level using two distinct
> GUC parameters: max_wal_level and wal_level. max_wal_level would be a
> POSTMASTER parameter controlling the system's maximum allowable WAL
> level, with values 'minimal', 'replica', and 'logical'. wal_level
> would function as a SIGHUP parameter managing the runtime WAL level,
> accepting values 'replica', 'logical', and 'auto'. The selected value
> must be either 'auto' or not exceed max_wal_level. When set to 'auto',
> wal_level automatically synchronizes with max_wal_level's value. This
> approach would enable online WAL level transitions between 'replica'
> and 'logical'.
>
>
> Regarding logical decoding on standbys, currently both primary and
> standby servers must have wal_level set to 'logical'. We need to
> determine the appropriate behavior when users decrease the WAL level
> from 'logical' to 'replica' through configuration file reload.
>
> One approach would be to invalidate all logical replication slots on
> the standby when transitioning to 'replica' WAL level. Although
> incoming WAL records from the primary would still be written at
> 'logical' level, making logical decoding technically feasible, this
> behavior seems logical as it reflects the user's intent to discontinue
> logical decoding on the standby. For consistency, we might need to
> invalidate logical slots during server startup if the WAL level is
> insufficient.
>
> Alternatively, we could permit logical decoding on the standby even
> with wal_level set to 'replica'. However, this would necessitate
> invalidating all logical replication slots during promotion,
> potentially extending downtime during failover.
>

BTW, did we consider the idea to automatically transition to 'logical'
when the first logical slot is created and transition back to
'replica' when the last logical slot gets dropped? I see some ideas around
this from the last time we discussed this topic [1].

[1] - https://www.postgresql.org/message-id/CAA4eK1J0we5qsZ-ZOwXPbZyvwdWbnT43knO2Cxidia2aHxZSJw%40mail.gmail.com

--
With Regards,
Amit Kapila.

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Masahiko Sawada

Date:

23 April, 19:04:20

On Wed, Apr 23, 2025 at 5:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Apr 21, 2025 at 11:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I would like to discuss behavioral and user interface considerations.
> >
> > Upon further analysis of this patch regarding the conversion of
> > wal_level to a SIGHUP parameter, I find that supporting all
> > combinations of wal_level value changes might make less sense.
> > Specifically, changing to or from 'minimal' would necessitate a
> > checkpoint, and reducing wal_level to 'minimal' would require
> > terminating physical replication, WAL archiving, and online backups.
> > While these operations demand careful consideration, there seems to be
> > no compelling use case for decreasing to 'minimal'. Furthermore,
> > increasing wal_level from 'minimal' is typically a one-time operation
> > during a database's lifetime. Therefore, we should weigh the benefits
> > against the implementation complexity.
> >
> > One solution is to manage the effective WAL level using two distinct
> > GUC parameters: max_wal_level and wal_level. max_wal_level would be a
> > POSTMASTER parameter controlling the system's maximum allowable WAL
> > level, with values 'minimal', 'replica', and 'logical'. wal_level
> > would function as a SIGHUP parameter managing the runtime WAL level,
> > accepting values 'replica', 'logical', and 'auto'. The selected value
> > must be either 'auto' or not exceed max_wal_level. When set to 'auto',
> > wal_level automatically synchronizes with max_wal_level's value. This
> > approach would enable online WAL level transitions between 'replica'
> > and 'logical'.
> >
> >
> > Regarding logical decoding on standbys, currently both primary and
> > standby servers must have wal_level set to 'logical'. We need to
> > determine the appropriate behavior when users decrease the WAL level
> > from 'logical' to 'replica' through configuration file reload.
> >
> > One approach would be to invalidate all logical replication slots on
> > the standby when transitioning to 'replica' WAL level. Although
> > incoming WAL records from the primary would still be written at
> > 'logical' level, making logical decoding technically feasible, this
> > behavior seems logical as it reflects the user's intent to discontinue
> > logical decoding on the standby. For consistency, we might need to
> > invalidate logical slots during server startup if the WAL level is
> > insufficient.
> >
> > Alternatively, we could permit logical decoding on the standby even
> > with wal_level set to 'replica'. However, this would necessitate
> > invalidating all logical replication slots during promotion,
> > potentially extending downtime during failover.
> >
>
> BTW, did we consider the idea to automatically transition to 'logical'
> when the first logical slot is created and transition back to
> 'replica' when last logical slot gets dropped? I see some ideas around
> this last time we discussed this topic.

Yes. Bertrand pointed out that a drawback is that the primary server
needs to create a logical slot in order to execute logical decoding on
the standbys[1].

Regards,

[1] https://www.postgresql.org/message-id/Z5DCm6xiBfbUdvX7%40ip-10-97-1-34.eu-west-3.compute.internal

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Amit Kapila

Date:

24 April, 15:30:07

On Wed, Apr 23, 2025 at 9:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Apr 23, 2025 at 5:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > BTW, did we consider the idea to automatically transition to 'logical'
> > when the first logical slot is created and transition back to
> > 'replica' when last logical slot gets dropped? I see some ideas around
> > this last time we discussed this topic.
>
> Yes. Bertrand pointed out that a drawback is that the primary server
> needs to create a logical slot in order to execute logical decoding on
> the standbys[1].
>

True, but if we want to avoid that, we can still keep 'logical' as
wal_level for the ease of users. We can also have another API like the
one you originally proposed (pg_activate_logical_decoding) for the
ease of users. But the minimum requirement would be that one creates a
logical slot to enable logical decoding/replication.

Additionally, shall we do some benchmarking, if not done already, to
show the cases where the performance and WAL volume can hurt users if
we set wal_level to 'logical'?

--
With Regards,
Amit Kapila.

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Masahiko Sawada

Date:

24 April, 20:44:05

On Thu, Apr 24, 2025 at 5:30 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Apr 23, 2025 at 9:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Apr 23, 2025 at 5:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > BTW, did we consider the idea to automatically transition to 'logical'
> > > when the first logical slot is created and transition back to
> > > 'replica' when last logical slot gets dropped? I see some ideas around
> > > this last time we discussed this topic.
> >
> > Yes. Bertrand pointed out that a drawback is that the primary server
> > needs to create a logical slot in order to execute logical decoding on
> > the standbys[1].
> >
>
> True, but if we want to avoid that, we can still keep 'logical' as
> wal_level for the ease of users.

I think we'd like to cover the use case where users start with
'replica' on the primary and execute logical decoding on the standby
without either creating a logical slot on the primary or restarting
the primary.

> We can also have another API like the
> one you originally proposed (pg_activate_logical_decoding) for the
> ease of users. But the minimum requirement would be that one creates a
> logical slot to enable logical decoding/replication.

I think we want to avoid having the runtime WAL level automatically
decrease to 'replica' once all logical slots are removed, if users still
want to execute logical decoding on only the standby. One idea is that if
users enable logical decoding using pg_activate_logical_decoding(),
the runtime WAL level doesn't decrease to 'replica' even if all
logical slots are removed. But it would require us to remember
persistently how logical decoding was enabled. Also, I'm
concerned that having three ways to enable logical decoding could
confuse users: wal_level GUC parameter, creating at least one logical
slot, and pg_activate_logical_decoding().
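
To illustrate how the three sources would interact, here is a rough model in Python. It is a sketch under assumptions: the DecodingState class and its field names are made up for the example, and activated_by_api stands for the state that would have to be remembered persistently.

```python
from dataclasses import dataclass


@dataclass
class DecodingState:
    wal_level_guc: str = "replica"   # configured wal_level
    logical_slot_count: int = 0      # logical slots currently present
    activated_by_api: bool = False   # hypothetical persisted flag

    def runtime_wal_level(self) -> str:
        """Return the WAL level the server would actually run at."""
        # Way 1: the GUC itself asks for 'logical'.
        if self.wal_level_guc == "logical":
            return "logical"
        # Way 2 and 3: explicit activation, or at least one logical slot.
        if self.activated_by_api or self.logical_slot_count > 0:
            return "logical"
        return self.wal_level_guc
```

With only slots, dropping the last slot brings the level back to 'replica'; with the activation flag set, it stays at 'logical' regardless of the slot count.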

> Additionally, shall we do some benchmarking, if not done already, to
> show the cases where the performance and WAL volume can hurt users if
> we make wal_level as 'logical'?

I believe it would be significant especially for REPLICA IDENTITY FULL
tables. I agree it's worth benchmarking it but I guess the result
would not convince us to make 'logical' the default.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Bertrand Drouvot

Date:

06 May, 10:19:36

Hi,

On Mon, Apr 21, 2025 at 10:31:03AM -0700, Masahiko Sawada wrote:
> I would like to discuss behavioral and user interface considerations.
> 
> Upon further analysis of this patch regarding the conversion of
> wal_level to a SIGHUP parameter, I find that supporting all
> combinations of wal_level value changes might make less sense.
> Specifically, changing to or from 'minimal' would necessitate a
> checkpoint, and reducing wal_level to 'minimal' would require
> terminating physical replication, WAL archiving, and online backups.
> While these operations demand careful consideration, there seems to be
> no compelling use case for decreasing to 'minimal'. Furthermore,
> increasing wal_level from 'minimal' is typically a one-time operation
> during a database's lifetime. Therefore, we should weigh the benefits
> against the implementation complexity.

Agree.

> One solution is to manage the effective WAL level using two distinct
> GUC parameters: max_wal_level and wal_level. max_wal_level would be a
> POSTMASTER parameter controlling the system's maximum allowable WAL
> level, with values 'minimal', 'replica', and 'logical'. wal_level
> would function as a SIGHUP parameter managing the runtime WAL level,
> accepting values 'replica', 'logical', and 'auto'. The selected value
> must be either 'auto' or not exceed max_wal_level. When set to 'auto',
> wal_level automatically synchronizes with max_wal_level's value. This
> approach would enable online WAL level transitions between 'replica'
> and 'logical'.

That makes sense to me. I think that 'logical' could be the default value
for max_wal_level and 'replica' the default for wal_level.
I think that would provide almost the same user experience as currently and would
allow replica->logical change without restart. Thoughts?

> Regarding logical decoding on standbys, currently both primary and
> standby servers must have wal_level set to 'logical'. We need to
> determine the appropriate behavior when users decrease the WAL level
> from 'logical' to 'replica' through configuration file reload.
> 
> One approach would be to invalidate all logical replication slots on
> the standby when transitioning to 'replica' WAL level. Although
> incoming WAL records from the primary would still be written at
> 'logical' level, making logical decoding technically feasible, this
> behavior seems logical as it reflects the user's intent to discontinue
> logical decoding on the standby.

+1

> For consistency, we might need to
> invalidate logical slots during server startup if the WAL level is
> insufficient.

Not sure. Currently we'd not allow the standby to start:

"
LOG:  entering standby mode
FATAL:  logical replication slot "logical_slot" exists, but "wal_level" < "logical"
HINT:  Change "wal_level" to be "logical" or higher.
LOG:  startup process (PID 1790508) exited with exit code 1
"

I think that's a good guard for configuration change mistakes. If that's a mistake,
change back to logical and start. If that's not a mistake, then change back to
logical, start, and change with SIGHUP. OTOH I also see the benefits of being consistent
between SIGHUP and start.

> Alternatively, we could permit logical decoding on the standby even
> with wal_level set to 'replica'.

Yeah, technically speaking we could as the WALs are coming from the primary (that
has wal_level set to logical).

> However, this would necessitate
> invalidating all logical replication slots during promotion,
> potentially extending downtime during failover.

Yeah, I'm tempted to vote to not allow logical decoding on the standby if the
wal_level is not logical.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Amit Kapila

Date:

07 May, 09:59:10

On Thu, Apr 24, 2025 at 11:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Apr 24, 2025 at 5:30 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Apr 23, 2025 at 9:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Wed, Apr 23, 2025 at 5:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > BTW, did we consider the idea to automatically transition to 'logical'
> > > > when the first logical slot is created and transition back to
> > > > 'replica' when last logical slot gets dropped? I see some ideas around
> > > > this last time we discussed this topic.
> > >
> > > Yes. Bertrand pointed out that a drawback is that the primary server
> > > needs to create a logical slot in order to execute logical decoding on
> > > the standbys[1].
> > >
> >
> > True, but if we want to avoid that, we can still keep 'logical' as
> > wal_level for the ease of users.
>
> I think we'd like to cover the use case like where users start with
> 'replica' on the primary and execute logical decoding on the standby
> without neither creating a logical slot on the primary nor restarting
> the primary.
>

Okay, if we introduce a SIGHUP GUC like max_wal_level as you are
proposing, the above requirement will be fulfilled, right? The other
way is by API pg_activate_logical_decoding().

> > We can also have another API like the
> > one you originally proposed (pg_activate_logical_decoding) for the
> > ease of users. But the minimum requirement would be that one creates a
> > logical slot to enable logical decoding/replication.
>
> I think we want to avoid the runtime WAL level automatically decreased
> to 'replica' once all logical slots are removed, if users still want
> to execute logical decoding on only the standby. One idea is that if
> users enable logical decoding using pg_activate_logical_decoding(),
> the runtime WAL level doesn't decrease to 'replica' even if all
> logical slots are removed.
>

That makes sense. If we are using an API like
pg_activate_*/pg_deactivate_*, then why add an additional dependency
on the slots?

--
With Regards,
Amit Kapila.

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Masahiko Sawada

Date:

07 May, 22:35:33

On Tue, May 6, 2025 at 11:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 24, 2025 at 11:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Apr 24, 2025 at 5:30 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Apr 23, 2025 at 9:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Wed, Apr 23, 2025 at 5:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > BTW, did we consider the idea to automatically transition to 'logical'
> > > > > when the first logical slot is created and transition back to
> > > > > 'replica' when last logical slot gets dropped? I see some ideas around
> > > > > this last time we discussed this topic.
> > > >
> > > > Yes. Bertrand pointed out that a drawback is that the primary server
> > > > needs to create a logical slot in order to execute logical decoding on
> > > > the standbys[1].
> > > >
> > >
> > > True, but if we want to avoid that, we can still keep 'logical' as
> > > wal_level for the ease of users.
> >
> > I think we'd like to cover the use case like where users start with
> > 'replica' on the primary and execute logical decoding on the standby
> > without neither creating a logical slot on the primary nor restarting
> > the primary.
> >
>
> Okay, if we introduce a SIGHUP GUC like max_wal_level as you are
> proposing, the above requirement will be fulfilled, right?

Right. Both the primary and the standby can increase WAL level to
'logical' without a server restart or creating a logical slot.

> The other
> way is by API pg_activate_logical_decoding().

Yes. This approach would be simpler than the current proposal as we
don't need other new infrastructure such as executing a task in the
background. However, we might want to note that wal_level value would
no longer show the actual runtime WAL level if the logical decoding is
activated via this API. Probably it's better to introduce a read-only
GUC, say runtime_wal_level, showing the actual WAL level. Also,
Ashutosh pointed out[1] before that cloud providers do not like
multiple ways of changing configuration esp. when they can not control
it. But I'm not sure this applies to the API as it's a SQL function
whose access privilege can be controlled.

>
> > > We can also have another API like the
> > > one you originally proposed (pg_activate_logical_decoding) for the
> > > ease of users. But the minimum requirement would be that one creates a
> > > logical slot to enable logical decoding/replication.
> >
> > I think we want to avoid the runtime WAL level automatically decreased
> > to 'replica' once all logical slots are removed, if users still want
> > to execute logical decoding on only the standby. One idea is that if
> > users enable logical decoding using pg_activate_logical_decoding(),
> > the runtime WAL level doesn't decrease to 'replica' even if all
> > logical slots are removed.
> >
>
> That makes sense. If we are using an API like
> pg_activate_*/pg_deactivate_*, then why add an additional dependency
> on the slots?

I thought that we need to remember how logical decoding got enabled
because otherwise even if we enable logical decoding using the API,
the WAL level drops back to 'replica' if all logical slots get removed.
So the idea I mentioned above is that we somehow prevent logical decoding
from being disabled even if all logical slots are removed. If we're
using only these APIs to enable/disable logical decoding, we don't
need to add a dependency on the slots, although we probably want to
disallow disabling logical decoding if there is at least one active
logical slot.
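
The API-only variant could behave roughly as in the following Python model. This is a sketch, not a proposal for the actual implementation: the class and method names are hypothetical, and for simplicity it treats any existing logical slot as blocking deactivation, which is an assumption about what "active" would mean.

```python
class LogicalDecodingControl:
    """Model of enabling/disabling logical decoding only through an API."""

    def __init__(self) -> None:
        self.enabled = False
        self.logical_slots = set()

    def activate(self) -> None:
        self.enabled = True

    def create_logical_slot(self, name: str) -> None:
        if not self.enabled:
            raise RuntimeError("logical decoding is not activated")
        self.logical_slots.add(name)

    def drop_logical_slot(self, name: str) -> None:
        # Dropping the last slot does NOT disable logical decoding.
        self.logical_slots.discard(name)

    def deactivate(self) -> None:
        # Assumed guard: refuse while any logical slot still exists.
        if self.logical_slots:
            raise RuntimeError("logical slots exist; cannot deactivate")
        self.enabled = False
```

Here the slots depend on the activation state, but the activation state never depends on the slots.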

Regards,

[1] https://www.postgresql.org/message-id/CAExHW5tyJrdjqKFQ%2BqDs8Yq3E_P1Fj_T4pwVW9WACmMznRtDuw%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Amit Kapila

Date:

10 May, 10:00:39

On Thu, May 8, 2025 at 1:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, May 6, 2025 at 11:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Apr 24, 2025 at 11:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Apr 24, 2025 at 5:30 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Wed, Apr 23, 2025 at 9:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > On Wed, Apr 23, 2025 at 5:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > BTW, did we consider the idea to automatically transition to 'logical'
> > > > > > when the first logical slot is created and transition back to
> > > > > > 'replica' when last logical slot gets dropped? I see some ideas around
> > > > > > this last time we discussed this topic.
> > > > >
> > > > > Yes. Bertrand pointed out that a drawback is that the primary server
> > > > > needs to create a logical slot in order to execute logical decoding on
> > > > > the standbys[1].
> > > > >
> > > >
> > > > True, but if we want to avoid that, we can still keep 'logical' as
> > > > wal_level for the ease of users.
> > >
> > > I think we'd like to cover the use case like where users start with
> > > 'replica' on the primary and execute logical decoding on the standby
> > > without neither creating a logical slot on the primary nor restarting
> > > the primary.
> > >
> >
> > Okay, if we introduce a SIGHUP GUC like max_wal_level as you are
> > proposing, the above requirement will be fulfilled, right?
>
> Right. Both the primary and the standby can increase WAL level to
> 'logical' without server restart nor creating a logical slot.
>
> > The other
> > way is by API pg_activate_logical_decoding().
>
> Yes. This approach would be simpler than the current proposal as we
> don't need other new infrastructure such as executing a task in the
> background.
>

Right, but to an extent, this is also similar to having a requirement
of a logical slot on the primary. Now, it seems to me that the point
you are trying to make is that to allow logical decoding on standby,
it is okay to ask users to use pg_activate_logical_decoding() on
primary, but it would be inconvenient to ask them to have a logical
slot on primary instead. If my understanding is correct, then why do
you think so? We recommend that users have a physical slot on primary
and use it via primary_slot_name on standby to control resource
removal, so why can't we ask them to have a logical slot on primary to
allow logical decoding on standby?

> However, we might want to note that wal_level value would
> no longer show the actual runtime WAL level if the logical decoding is
> activated via this API. Probably it's better to introduce a read-only
> GUC, say runtime_wal_level, showing the actual WAL level.
>

Yeah, we need some way to show the correct value. In one of the
previous emails on this thread, you mentioned that we can use
show_hook to show the correct value. I see that show_in_hot_standby()
uses an in-memory value to show the correct value. Do you have something
like that in mind?

BTW, what is your idea for preserving the state that allows logical
decoding across a server restart when the user uses the API? Do we want
to persist the state in some way and, if so, how? OTOH, if we use the idea to
have a logical slot to allow decoding, then the presence of a logical
slot can tell us whether we need to enable the new state to allow
logical decoding after restart.

> Also,
> Ashutosh pointed out[1] before that cloud providers do not like
> multiple ways of changing configuration esp. when they can not control
> it. But I'm not sure this applies to the API as it's a SQL function
> whose access privilege can be controlled.
>

By multiple ways, do we mean to say that one way for users would be to
use the existing way (change wal_level to logical and restart server),
and the other way would be to use the new API (or have a logical
slot)? But won't users similarly have multiple ways to retain WAL for
standby servers (either by using wal_keep_size or by having a
primary_slot_name)? The other example is that one can either manually
change the postgresql.conf file or use ALTER SYSTEM to change it, and then
reload the config or restart the server for the change to take effect.
There could be other similar examples as well if one tries to list all
such possibilities. I feel one should be concerned if we are trying
both to make the wal_level GUC SIGHUP and to provide an API to
enable logical decoding.

> > >
> >
> > That makes sense. If we are using an API like
> > pg_activate_*/pg_deactivate_*, then why add an additional dependency
> > on the slots?
>
> I thought that we need to remember how logical decoding got enabled
> because otherwise even if we enable logical decoding using the API,
> it's disabled to 'replica' if all logical slots get removed. So the
> idea I mentioned above is that we somehow prevent logical decoding
> from being disabled even if all logical slots are removed. If we're
> using only these APIs to enable/disable logical decoding, we don't
> need to add a dependency on the slots, although we probably want to
> disallow disabling logical decoding if there is at least one active
> logical slot.
>

Yeah, this is a detail that should be discussed once we finalize the
API to enable logical decoding on both primary and standby without
restarting the primary server.

--
With Regards,
Amit Kapila.

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Amit Kapila

Date:

10 May, 14:08:03

On Sat, May 10, 2025 at 1:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, May 10, 2025 at 12:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > Right, but to an extent, this is also similar to having a requirement
> > of a logical slot on the primary. Now, it seems to me that the point
> > you are trying to make is that to allow logical decoding on standby,
> > it is okay to ask users to use pg_activate_logical_decoding() on
> > primary, but it would be inconvenient to ask them to have a logical
> > slot on primary instead. If my understanding is correct, then why do
> > you think so? We recommend that users have a physical slot on primary
> > and use it via primary_slot_name on standby to control resource
> > removal, so why can't we ask them to have a logical slot on primary to
> > allow logical decoding on standby?
>
> I was thinking of a simple use case where users do logical decoding
> from the physical standby. That is, the primary has a physical slot
> and the standby uses it via primary_slot_name, and the subscriber
> connects the standby server for logical replication with a logical
> slot on the standby. In this case, IIUC we need to require users to
> create a logical slot on the primary in order just to increase WAL
> level to 'logical', but it doesn't make sense to me. No one is going
> to use this logical slot and the primary ends up accumulating WALs.
>

Can we have a parameter like immediately_reserve in
create_logical_slot API, similar to what we have for physical slots?
We need to work out the details, but that should address the kind of
use case you are worried about, unless I am missing something.

--
With Regards,
Amit Kapila.

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Dilip Kumar

Date:

18 May, 13:36:31

On Sun, May 18, 2025 at 1:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, May 10, 2025 at 7:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, May 10, 2025 at 1:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Sat, May 10, 2025 at 12:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > Right, but to an extent, this is also similar to having a requirement
> > > > of a logical slot on the primary. Now, it seems to me that the point
> > > > you are trying to make is that to allow logical decoding on standby,
> > > > it is okay to ask users to use pg_activate_logical_decoding() on
> > > > primary, but it would be inconvenient to ask them to have a logical
> > > > slot on primary instead. If my understanding is correct, then why do
> > > > you think so? We recommend that users have a physical slot on primary
> > > > and use it via primary_slot_name on standby to control resource
> > > > removal, so why can't we ask them to have a logical slot on primary to
> > > > allow logical decoding on standby?
> > >
> > > I was thinking of a simple use case where users do logical decoding
> > > from the physical standby. That is, the primary has a physical slot
> > > and the standby uses it via primary_slot_name, and the subscriber
> > > connects the standby server for logical replication with a logical
> > > slot on the standby. In this case, IIUC we need to require users to
> > > create a logical slot on the primary in order just to increase WAL
> > > level to 'logical', but it doesn't make sense to me. No one is going
> > > to use this logical slot and the primary ends up accumulating WALs.
> > >
> >
> > Can we have a parameter like immediately_reserve in
> > create_logical_slot API, similar to what we have for physical slots?
> > We need to work out the details, but that should address the kind of
> > use case you are worried about, unless I am missing something.
>
> Interesting idea. One concern in my mind is that in the use case I
> mentioned above, users would need to carefully manage the extra
> logical slot to keep the logical decoding active. The logical decoding
> is deactivated on the standby as soon as users drop all logical slots
> on the primary.
>
> Also, with this idea of automatically increasing WAL level, do we want
> to keep the 'logical' WAL level? If so, it requires an extra step of
> creating a non-reserved logical slot on the primary in order for the
> standby to activate the logical decoding. On the other hand, we can
> also keep the 'logical' WAL level for the compatibility and for making
> the logical decoding enabled without the coordination of WAL level
> transition. But wal_level GUC parameter would no longer tell the
> actual WAL level to users when 'replica' + logical slots. Is it
> sufficient to provide a read-only GUC parameter, say
> effective_wal_level showing the actual WAL level being used?
>

Thanks for proposing the idea of making wal_level configurable at
runtime. But why isn't making the relevant GUCs SIGHUP-reloadable
sufficient?

For enabling logical replication, users are already familiar with the
wal_level and max_wal_senders settings. The main issue is that
changing them currently requires a server restart. If we can address
that by making the GUCs reloadable via SIGHUP, that might be enough.

On the other hand, if the goal is to make the behavior fully dynamic,
then we should go all the way and decouple it from wal_level. For
example, we could start logging the extra WAL needed for logical
decoding as soon as a logical slot is created, and stop once all
logical slots are dropped, even if wal_level is still set to logical.
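
As a rough illustration of the fully dynamic variant described above,
the following Python sketch models a server that logs the extra
logical-decoding information only while at least one logical slot
exists. It is a toy model, not PostgreSQL code; the class and method
names are hypothetical.

```python
class SlotDrivenServer:
    """Toy model: extra logical WAL info is logged only while a logical slot exists."""

    def __init__(self):
        # Names of the logical slots that currently exist (hypothetical bookkeeping).
        self.logical_slots = set()

    def create_logical_slot(self, name):
        self.logical_slots.add(name)

    def drop_logical_slot(self, name):
        self.logical_slots.discard(name)

    @property
    def logs_logical_info(self):
        # Decoupled from wal_level: only the presence of a slot matters.
        return len(self.logical_slots) > 0


server = SlotDrivenServer()
server.create_logical_slot("sub1")
started = server.logs_logical_info   # logging starts with the first slot
server.drop_logical_slot("sub1")
stopped = not server.logs_logical_info  # and stops once the last slot is gone
```

In this model the wal_level setting never appears, which is the point
of the "decouple it from wal_level" suggestion.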

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Amit Kapila

Date:

19 May, 12:05:08

On Sun, May 18, 2025 at 1:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, May 10, 2025 at 7:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > Can we have a parameter like immediately_reserve in
> > create_logical_slot API, similar to what we have for physical slots?
> > We need to work out the details, but that should address the kind of
> > use case you are worried about, unless I am missing something.
>
> Interesting idea. One concern in my mind is that in the use case I
> mentioned above, users would need to carefully manage the extra
> logical slot to keep the logical decoding active. The logical decoding
> is deactivated on the standby as soon as users drop all logical slots
> on the primary.
>

Yes, but the same is true for a physical slot in the case of physical
replication used via primary_slot_name parameter.

> Also, with this idea of automatically increasing WAL level, do we want
> to keep the 'logical' WAL level? If so, it requires an extra step of
> creating a non-reserved logical slot on the primary in order for the
> standby to activate the logical decoding. On the other hand, we can
> also keep the 'logical' WAL level for the compatibility and for making
> the logical decoding enabled without the coordination of WAL level
> transition.

Right, I also feel we should retain both ways to enable logical
replication at least initially. Once we get some feedback, we may
think of removing 'logical' as wal_level.

>  But wal_level GUC parameter would no longer tell the
> actual WAL level to users when 'replica' + logical slots.
>

Right.

> Is it
> sufficient to provide a read-only GUC parameter, say
> effective_wal_level showing the actual WAL level being used?
>

I am not so sure about how we want to communicate this to the user,
but I guess to start with, this is a good idea.
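
For illustration only, the behavior discussed above for the primary
('replica' plus logical slots reporting 'logical') could be modeled as
follows. This is a toy Python sketch, not the patch's implementation.
The function name and the slot-count argument are assumptions, and
what a standby should report is a separate question that this sketch
does not cover.

```python
WAL_LEVELS = ("minimal", "replica", "logical")


def effective_wal_level(wal_level, n_logical_slots):
    """Return the WAL level a primary would actually be using (toy model)."""
    if wal_level not in WAL_LEVELS:
        raise ValueError(f"unknown wal_level: {wal_level!r}")
    # 'replica' is raised to 'logical' while any logical slot exists.
    if wal_level == "replica" and n_logical_slots > 0:
        return "logical"
    # Otherwise the configured value is reported unchanged
    # (leaving 'minimal' untouched is an assumption of this sketch).
    return wal_level
```

With this rule the reported value on the primary is never lower than
the configured wal_level.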

--
With Regards,
Amit Kapila.

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Amit Kapila

Date:

21 May, 07:54:41

On Wed, May 21, 2025 at 12:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, May 19, 2025 at 2:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sun, May 18, 2025 at 1:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Sat, May 10, 2025 at 7:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > Can we have a parameter like immediately_reserve in
> > > > create_logical_slot API, similar to what we have for physical slots?
> > > > We need to work out the details, but that should address the kind of
> > > > use case you are worried about, unless I am missing something.
> > >
> > > Interesting idea. One concern in my mind is that in the use case I
> > > mentioned above, users would need to carefully manage the extra
> > > logical slot to keep the logical decoding active. The logical decoding
> > > is deactivated on the standby as soon as users drop all logical slots
> > > on the primary.
> > >
> >
> > Yes, but the same is true for a physical slot in the case of physical
> > replication used via primary_slot_name parameter.
>
> Could you elaborate on this?
>

I am trying to draw a parallel with the case where the standby no
longer needs the physical slot for some reason, such as a standby
machine failure, or someone using pg_createsubscriber on the standby
to turn it into a subscriber. In such a case, the user needs to
manually remove the physical slot on the primary. The two cases
differ, but the point is that one may need to manage the physical slot
as well.

>
> I recently had a discussion with Ashutosh at PGConf.dev regarding an
> alternative approach: introducing a new command syntax such as "ALTER
> SYSTEM UPDATE wal_level TO 'logical'". In his presentation[1], he
> outlined this proposed command as a means to modify specific GUC
> parameters synchronously. The backend executing this command would
> manage the transition, allowing users to interrupt the process via
> Ctrl-C if necessary. In the specific context of wal_level change, this
> command could be designed to reject operations like "ALTER SYSTEM
> UPDATE wal_level TO 'minimal'" with an error, effectively preventing
> undesirable wal_level transitions to or from 'minimal'. While this
> approach shares similarities with our previous proposal of
> implementing a dedicated SQL function for WAL level modifications, it
> offers a more standardized interface for users.
>
> Though I find merit in this proposal, I remain uncertain about its
> implementation details and whether it represents the optimal solution
> for online wal_level changes, particularly given that our current
> approach of automatic WAL level adjustment appears viable.
>

Yeah, I find the idea that the presence of a logical slot will allow
the user to enable logical decoding/replication more appealing than
this new alternative, leaving aside the challenges of realizing it.
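
For reference, the transition rule quoted above for the proposed
"ALTER SYSTEM UPDATE" command (reject changes to or from 'minimal')
can be sketched as a small check. This is a hypothetical Python model:
the function name is invented, and treating a no-op change as allowed
is an assumption.

```python
WAL_LEVELS = ("minimal", "replica", "logical")


def check_wal_level_transition(current, target):
    """Return target if the online transition is allowed, else raise ValueError."""
    for level in (current, target):
        if level not in WAL_LEVELS:
            raise ValueError(f"unknown wal_level: {level!r}")
    # Any real change that involves 'minimal' is rejected with an error.
    if current != target and "minimal" in (current, target):
        raise ValueError(
            f"cannot change wal_level from {current!r} to {target!r} online"
        )
    return target
```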

--
With Regards,
Amit Kapila.

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From

Masahiko Sawada

Date:

04 June, 04:10:44

On Tue, May 20, 2025 at 9:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, May 21, 2025 at 12:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, May 19, 2025 at 2:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Sun, May 18, 2025 at 1:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Sat, May 10, 2025 at 7:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > >
> > > > > Can we have a parameter like immediately_reserve in
> > > > > create_logical_slot API, similar to what we have for physical slots?
> > > > > We need to work out the details, but that should address the kind of
> > > > > use case you are worried about, unless I am missing something.
> > > >
> > > > Interesting idea. One concern in my mind is that in the use case I
> > > > mentioned above, users would need to carefully manage the extra
> > > > logical slot to keep the logical decoding active. The logical decoding
> > > > is deactivated on the standby as soon as users drop all logical slots
> > > > on the primary.
> > > >
> > >
> > > Yes, but the same is true for a physical slot in the case of physical
> > > replication used via primary_slot_name parameter.
> >
> > Could you elaborate on this?
> >
>
> I am trying to draw a parallel with the case where the standby no
> longer needs the physical slot for some reason, such as a standby
> machine failure, or someone using pg_createsubscriber on the standby
> to turn it into a subscriber. In such a case, the user needs to
> manually remove the physical slot on the primary. The two cases
> differ, but the point is that one may need to manage the physical slot
> as well.

Thank you for clarifying this. I see your point.

> >
> > I recently had a discussion with Ashutosh at PGConf.dev regarding an
> > alternative approach: introducing a new command syntax such as "ALTER
> > SYSTEM UPDATE wal_level TO 'logical'". In his presentation[1], he
> > outlined this proposed command as a means to modify specific GUC
> > parameters synchronously. The backend executing this command would
> > manage the transition, allowing users to interrupt the process via
> > Ctrl-C if necessary. In the specific context of wal_level change, this
> > command could be designed to reject operations like "ALTER SYSTEM
> > UPDATE wal_level TO 'minimal'" with an error, effectively preventing
> > undesirable wal_level transitions to or from 'minimal'. While this
> > approach shares similarities with our previous proposal of
> > implementing a dedicated SQL function for WAL level modifications, it
> > offers a more standardized interface for users.
> >
> > Though I find merit in this proposal, I remain uncertain about its
> > implementation details and whether it represents the optimal solution
> > for online wal_level changes, particularly given that our current
> > approach of automatic WAL level adjustment appears viable.
> >
>
> Yeah, I find the idea that the presence of a logical slot will allow
> the user to enable logical decoding/replication more appealing than
> this new alternative, leaving aside the challenges of realizing it.

I've drafted this idea. Here is a summary of the two attached patches:

The 0001 patch allows us to create a logical slot without WAL reservation.

The 0002 patch is the main patch for dynamically enabling/disabling
logical decoding when wal_level is 'replica'. It's in a PoC state and
has a lot of XXX comments. One thing I think we need to consider is
the following: disabling logical decoding needs to write a WAL record
for standbys, and it happens when the last logical slot is dropped.
It's therefore possible that we write a WAL record during process exit
(e.g., ReplicationSlotRelease() and ReplicationSlotCleanup() are called
by ReplicationSlotShmemExit()). It might be safe as long as we do that
in a before_shmem_exit callback, but I'm not sure whether it could
also happen while on_shmem_exit callbacks are running. It would be
better to somehow lazily disable the logical decoding.
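
One possible shape of such a lazy disable, sketched as a toy Python
model rather than PostgreSQL code: dropping the last slot only sets a
flag, and a later step in a long-lived process writes the record. The
class, the record names, and the idea of a periodic background step
are all assumptions made for illustration; the patch may do this
differently.

```python
class LazyDisableModel:
    """Toy model of deferring the 'disable logical decoding' WAL record."""

    def __init__(self):
        self.logical_slots = set()
        self.decoding_enabled = False
        self.disable_pending = False
        self.wal = []  # stand-in for WAL records written (hypothetical names)

    def create_logical_slot(self, name):
        self.logical_slots.add(name)
        self.disable_pending = False  # a new slot cancels a pending disable
        if not self.decoding_enabled:
            self.decoding_enabled = True
            self.wal.append("ENABLE_LOGICAL_DECODING")

    def drop_logical_slot(self, name):
        # May run during process exit: only set a flag, write no WAL here.
        self.logical_slots.discard(name)
        if not self.logical_slots and self.decoding_enabled:
            self.disable_pending = True

    def background_tick(self):
        # Runs later in a long-lived process (assumption of this sketch).
        if self.disable_pending and not self.logical_slots:
            self.decoding_enabled = False
            self.wal.append("DISABLE_LOGICAL_DECODING")
        self.disable_pending = False
```

The drop path never writes WAL in this model, so it would not matter
which exit callback ends up dropping the last slot.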

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

On Wed, Jul 2, 2025 at 9:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jun 18, 2025 at 1:07 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Jun 18, 2025 at 6:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > Thank you for the comments!
> > >
> > > >
> > > > 2)
> > > > I see that when primary switches back its effective wal_level to
> > > > replica while standby has wal_level=logical in conf file, then standby
> > > > has this status:
> > > >
> > > > postgres=# show wal_level;
> > > >  wal_level
> > > > -----------
> > > >  logical
> > > >
> > > > postgres=# show effective_wal_level;
> > > >  effective_wal_level
> > > > ---------------------
> > > >  replica
> > > >
> > > > Is this correct? Can effective_wal_level be < wal_level anytime? I
> > > > feel it can be greater but never lesser.
> > >
> > > Hmm, I think we need to define what value we should show in
> > > effective_wal_level on standbys because the standbys actually are not
> > > writing any WALs and whether or not the logical decoding is enabled on
> > > the standbys depends on the primary.
> > >
> > > In the previous version patch, the standby's effective_wal_level value
> > > depended solely on the standby's wal_level value. However, it was
> > > confusing in a sense because it's possible that the logical decoding
> > > could be available even though effective_wal_level is 'replica' if the
> > > primary already enables it. One idea is that given that the logical
> > > decoding availability and effective_wal_level value are independent in
> > > principle, it's better to provide a SQL function to get the logical
> > > decoding status so that users can check the logical decoding
> > > availability without checking effective_wal_level. With that function,
> > > it might make sense to revert back the behavior to the previous one.
> > > That is, on the primary the effective_wal_level value is always
> > > greater than or equal to wal_level whereas on the standbys it's always
> > > the same as wal_level, and users would be able to check the logical
> > > decoding availability using the SQL function. Or it might also be
> > > worth considering to show effective_wal_level as NULL on standbys.
> >
> > Yes, that is one idea. It will resolve the confusion.
> > But I was thinking, instead of having one new GUC + a SQL function,
> > can we have a GUC alone, which shows logical_decoding status plus the
> > cause of that. The new GUC will be applicable on both primary and
> > standby. As an example, let's say we name it as
> > logical_decoding_status, then it can have these values (
> > <status>_<cause>):
> >
> > enabled_wal_level_logical:            valid for both primary and standby
> > enabled_effective_wal_level_logical:  valid only for primary
> > enabled_cascaded_logical_decoding:    valid only for standby
> > disabled:                             valid for both primary and standby
> >
> > 'enabled_cascaded_logical_decoding'  will indicate that logical
> > decoding is enabled on standby (even when its own wal_level=replica)
> > as a cascaded effect from primary. It can be possible either due to
> > primary's wal_level=logical or logical slot being present on primary.
>
> I'm not sure it's a good idea to combine two values into one GUC
> because tools would have to parse the string whenever they want
> either piece of information.

Okay. Agreed.
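
To make the parsing concern concrete, this is roughly what a
monitoring tool would have to do with a combined '<status>_<cause>'
value. It is a hypothetical Python sketch; the value names are the
ones listed above for the proposed logical_decoding_status, and the
function name is invented.

```python
def parse_logical_decoding_status(value):
    """Split a combined '<status>_<cause>' value into (status, cause)."""
    # Everything before the first underscore is the status; the rest is the cause.
    status, _, cause = value.partition("_")
    if status not in ("enabled", "disabled"):
        raise ValueError(f"unexpected status in {value!r}")
    # 'disabled' carries no cause, so report it as None.
    return status, (cause or None)
```

With two separate settings, a tool could read either piece of
information directly instead of splitting a string.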

> As for the effective_wal_level shown on the standby, if it shows the
> effective WAL level it might make sense to show as 'replica' even if
> the standby's wal_level is 'logical'

Alright. It depends on the definition we choose to assign to
effective_wal_level.

> because the standby cannot write
> any WAL and need to follow the primary.

When the standby’s wal_level is set to 'logical', the requirement for
logical decoding is already fulfilled. Or do you mean that the
effective_wal_level on standby should not be shown as logical until
both the primary and standby have wal_level set to logical and we also
have a logical slot present on standby?

> While it might be worth
> considering to accept the case of effective_wal_level (replica) <
> wal_level (logical) only on the standbys, we need to keep the
> principle that the logical decoding is available only when
> effective_wal_level = 'logical'.
>

Back to the previous question, when will the effective_wal_level be
displayed as 'logical' on standby? Which criteria need to be met?

thanks
Shveta