Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date
Msg-id CAD21AoADdcgjRgH9kyJnAAe_dkoxPx8DKwGA0PfVx6_qjADz=g@mail.gmail.com
Whole thread Raw
In response to RE: POC: enable logical decoding when wal_level = 'replica' without a server restart  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
On Tue, Jan 28, 2025 at 1:39 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Sawada-san,
>
> I love the idea. I've roughly tested the patch and worked on my env.
> Here are initial comments...

Thank you for looking at the patch!

>
> 1. xloglevelworker.c
> ```
> +#include "replication/logicalxlog.h"
> ```
>
> xloglevelworker.c includes replication/logicalxlog.h, but it does not exist.
> The line had to be removed to build and test it.
>
> 2.
> ```
> +static void
> +writeUpdateWalLevel(int new_wal_level)
> +{
> +       XLogBeginInsert();
> +       XLogRegisterData((char *) (&new_wal_level), sizeof(bool));
> +       XLogInsert(RM_XLOG_ID, XLOG_UPDATE_WAL_LEVEL);
> +}
> ```
>
> IIUC the data length should be sizeof(int) instead of sizeof(bool).

Agreed to fix them.

>
> 3.
> Is there a reason why the process does not wait till the archiver exits?

No. I didn't implement this part as the patch was just for
proof-of-concept. I think it would be better to wait for it to exit.

>
> 4.
> When I dumped wal files, I found that XLOG_UPDATE_WAL_LEVEL cannot be recognized:
>
> ```
> rmgr: XLOG        len (rec/tot):     27/    27, tx:          0, lsn: 0/03050838, prev 0/03050800, desc: UNKNOWN (f0)
wal_levellogical 
> ```
>
> xlog_identify() must be updated as well.

Will fix.

>
> 5.
> When I changed "logical" to "replica", postgres outputs like below:
>
> ```
> LOG:  received SIGHUP, reloading configuration files
> LOG:  parameter "wal_level" changed to "replica"
> LOG:  wal_level control worker started
> LOG:  changing wal_level from "logical" to "replica"
> LOG:  wal_level has been decreased to "replica"
> LOG:  successfully changed wal_level from "logical" to "replica"
> ```
>
> ISTM that both postmaster and the wal_level control worker said something like
> "wal_level changed", which is bit strange for me. Since GUC can't be renamed,
> can we use another name for the wal_level control state?

I'm concerned that users could be confused if two different names
refer to substantially the same thing.

Having said that, I guess that we need to drastically change the
messages. For example, I think that the wal_level worker should say
something like "successfully made 'logical' wal_level effective"
instead of saying something like "changed wal_level value". Also,
users might not need gradual messages when increasing 'minimal' to
'logical' or decreasing 'logical' to 'minimal'.

>
> 6.
> With the patch present, the wal_level can be changed to the minimal even when the
> streaming replication is going. If we do that, the walsender exits immediately and
> the below FATAL appears periodically until the standby stops. Same things can be
> said for the logical replication:
>
> ```
> FATAL:  streaming replication receiver "walreceiver" could not connect to the primary server:
> connection to server on socket "/tmp/.s.PGSQL.oooo" failed:
> FATAL:  WAL senders require "wal_level" to be "replica" or "logical
> ```
>
> I know this is not a perfect, but can we avoid the issue by reject the GUC update
> if the walsender exists? Another approach is not to update the value when replication
> slots need to be invalidated.

Does it mean that we reject the config file from being reloaded in
that case? I have no idea how to reject it in a case where the
wal_level in postgresql.conf changed and the user did 'pg_ctl reload'.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: Re: Logging parallel worker draught
Next
From: Peter Geoghegan
Date:
Subject: Re: Adding skip scan (including MDAM style range skip scan) to nbtree