RE: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | OSCPR01MB14966E989331F1FA7AF06BD9BF53BA@OSCPR01MB14966.jpnprd01.prod.outlook.com |
In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
Dear Sawada-san,

> > Assuming that logical_decoding written in the WAL is false here, and a logical
> > replication slot is created just after that. In my experiments below happened:
> >
>
> Let me clarify each step:
>
> > 1. startup process updated logical_decoding_enabled to false, at line 8652.
>
> I assume that logical_decoding_enabled was enabled before step 1.

Right. Initially, a logical replication slot existed on both the primary and the
standby. In more detail, the standby slot was created by the slotsync worker.

> > 2. slotsync worker started to sync. Surprisingly, it created a (second) logical
> > slot and started logical decoding with fast_forward mode.
>
> I guess that the postmaster launched the slotsync worker before the
> startup changes the status since logical decoding was enabled as I
> mentioned above, which seems fine to me.

As you said, the slotsync worker had already been launched when the status was
changed. My feeling is that a logical slot should not be created after the status
in shared memory has been changed.

> > 3. startup invalidated logical slots due to the wal_level. the slot created at
> > step2 was automatically dropped, because it was not sync-ready yet.
> > 4. startup process shut down the slotsync worker.
> > 5. startup process read the STATUS_CHANGE record again, which has the value "true".
> > it requested to restart the sync worker.
> > 6. restarted sync worker synchronize the slot again...
> >
> > For me it works well but it is a bit strange because 1) logical decoding is
> > started even when effective_wal_level is false,
>
> I think it's a race condition between the postmaster and the startup,
> it could happen even between the backend and the startup; the startup
> disables logical decoding right after the backend passes
> CheckLogicalDecodingRequirements() check. I think it's technically
> okay since all WAL records before the STATUS_CHANGE should have the
> logical information. Even if it starts to do logical decoding, it
> would end up decoding the STATUS_CHANGE record and with an error (see
> xlog_decode()).

To clarify: you think this does not need to be fixed because the system
eventually reaches the appropriate state, right?

> > and 2) the synced slot is
> > dropped once with below message:
> >
> > ```
> > LOG: terminating process 1474448 to release replication slot "test2"
> > DETAIL: Logical decoding on standby requires "wal_level" >= "logical" or at least one logical slot on the primary server.
> > CONTEXT: WAL redo at 0/030000B8 for XLOG/LOGICAL_DECODING_STATUS_CHANGE: false
> > ERROR: canceling statement due to conflict with recovery
> > DETAIL: User was using a logical replication slot that must be invalidated.
> > ```
> >
> > Can we stop the sync worker before updating the status? IIUC this is one of the
> > solutions.
>
> I think it would lead to another race condition; the slotsync worker
> can start again before updating the status.

Hmm, okay.

Another small comment: this data structure is not used in other files, so it
does not need to be declared extern.

```
extern LogicalDecodingCtlData *LogicalDecodingCtl;
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED
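
For reference, the race discussed above (a check that passes, the status being disabled right afterwards, and decoding that then stops at the STATUS_CHANGE record) can be modelled with a small toy program. This is only a sketch: ToyRecord, DecodeResult and toy_decode() are hypothetical names made up for illustration, and nothing here is taken from the patch or from PostgreSQL's real WAL format. The enabled flag is passed in as a value sampled once, which stands in for a CheckLogicalDecodingRequirements()-style check that may be stale by the time records are read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified record kinds; not PostgreSQL's real WAL format. */
typedef enum { REC_DATA, REC_STATUS_CHANGE } RecKind;

typedef struct
{
    RecKind kind;
    bool    new_status;     /* only meaningful for REC_STATUS_CHANGE */
} ToyRecord;

typedef enum
{
    DECODE_REFUSED,                 /* the up-front check failed */
    DECODE_OK,                      /* reached the end of the stream */
    DECODE_ERROR_AT_STATUS_CHANGE   /* hit STATUS_CHANGE: false */
} DecodeResult;

/*
 * Toy decoder (hypothetical). enabled_at_check is the status as it was seen
 * when the caller performed its one-time check; it is not re-read later,
 * which is exactly the window in which the status can be flipped.
 * *ndecoded receives the number of data records consumed.
 */
DecodeResult
toy_decode(bool enabled_at_check, const ToyRecord *wal, size_t nrecords,
           size_t *ndecoded)
{
    assert(wal != NULL && ndecoded != NULL);

    *ndecoded = 0;
    if (!enabled_at_check)
        return DECODE_REFUSED;

    for (size_t i = 0; i < nrecords; i++)
    {
        /* Records before this point were written while decoding was enabled. */
        if (wal[i].kind == REC_STATUS_CHANGE && !wal[i].new_status)
            return DECODE_ERROR_AT_STATUS_CHANGE;

        if (wal[i].kind == REC_DATA)
            (*ndecoded)++;
    }
    return DECODE_OK;
}
```

With a stream of two data records, a STATUS_CHANGE carrying false, and one more data record, a caller whose check passed gets DECODE_ERROR_AT_STATUS_CHANGE after consuming two records, while a caller whose check failed is refused before reading anything. In this model the late check is harmless because decoding never proceeds past the record that disables it.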