Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | CAD21AoADdcgjRgH9kyJnAAe_dkoxPx8DKwGA0PfVx6_qjADz=g@mail.gmail.com Whole thread Raw |
In response to | RE: POC: enable logical decoding when wal_level = 'replica' without a server restart ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
List | pgsql-hackers |
On Tue, Jan 28, 2025 at 1:39 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Sawada-san, > > I love the idea. I've roughly tested the patch and worked on my env. > Here are initial comments... Thank you for looking at the patch! > > 1. xloglevelworker.c > ``` > +#include "replication/logicalxlog.h" > ``` > > xloglevelworker.c includes replication/logicalxlog.h, but it does not exist. > The line had to be removed to build and test it. > > 2. > ``` > +static void > +writeUpdateWalLevel(int new_wal_level) > +{ > + XLogBeginInsert(); > + XLogRegisterData((char *) (&new_wal_level), sizeof(bool)); > + XLogInsert(RM_XLOG_ID, XLOG_UPDATE_WAL_LEVEL); > +} > ``` > > IIUC the data length should be sizeof(int) instead of sizeof(bool). Agreed to fix them. > > 3. > Is there a reason why the process does not wait till the archiver exits? No. I didn't implement this part as the patch was just for proof-of-concept. I think it would be better to wait for it to exit. > > 4. > When I dumped wal files, I found that XLOG_UPDATE_WAL_LEVEL cannot be recognized: > > ``` > rmgr: XLOG len (rec/tot): 27/ 27, tx: 0, lsn: 0/03050838, prev 0/03050800, desc: UNKNOWN (f0) wal_levellogical > ``` > > xlog_identify() must be updated as well. Will fix. > > 5. > When I changed "logical" to "replica", postgres outputs like below: > > ``` > LOG: received SIGHUP, reloading configuration files > LOG: parameter "wal_level" changed to "replica" > LOG: wal_level control worker started > LOG: changing wal_level from "logical" to "replica" > LOG: wal_level has been decreased to "replica" > LOG: successfully changed wal_level from "logical" to "replica" > ``` > > ISTM that both postmaster and the wal_level control worker said something like > "wal_level changed", which is bit strange for me. Since GUC can't be renamed, > can we use another name for the wal_level control state? I'm concerned that users could be confused if two different names refer to substantially the same thing. Having said that, I guess that we need to drastically change the messages. For example, I think that the wal_level worker should say something like "successfully made 'logical' wal_level effective" instead of saying something like "changed wal_level value". Also, users might not need gradual messages when increasing 'minimal' to 'logical' or decreasing 'logical' to 'minimal'. > > 6. > With the patch present, the wal_level can be changed to the minimal even when the > streaming replication is going. If we do that, the walsender exits immediately and > the below FATAL appears periodically until the standby stops. Same things can be > said for the logical replication: > > ``` > FATAL: streaming replication receiver "walreceiver" could not connect to the primary server: > connection to server on socket "/tmp/.s.PGSQL.oooo" failed: > FATAL: WAL senders require "wal_level" to be "replica" or "logical > ``` > > I know this is not a perfect, but can we avoid the issue by reject the GUC update > if the walsender exists? Another approach is not to update the value when replication > slots need to be invalidated. Does it mean that we reject the config file from being reloaded in that case? I have no idea how to reject it in a case where the wal_level in postgresql.conf changed and the user did 'pg_ctl reload'. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: