Thread: Exit walsender before confirming remote flush in logical replication
Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear hackers,
(I added Amit as CC because we discussed this in another thread)

This is a fork of the time-delayed logical replication thread [1]. While discussing that, we thought that we could relax the condition for walsender shutdown [2][3].

Currently, walsenders delay the shutdown request until confirming that all sent data has been flushed on the remote side. This condition was added in 985bd7 [4] to support clean switchover. Suppose there is a primary-secondary physical replication system, and we do the following steps:

1. Stop the primary server.
2. Promote the secondary to the new primary.
3. Restart the (old) primary as the new secondary.

If any changes arrive during step 2 but the walsender does not confirm the remote flush, the restart in step 3 may fail.

In the case of logical replication, however, we cannot support the use case of switching the roles publisher <-> subscriber. Suppose the same case as above, where additional transactions are committed while doing step 2. To catch up with such changes, the subscriber must receive the WAL related to those transactions, but it cannot do so because a subscriber cannot request WAL from a specific position. In that case, we must truncate all data on the new subscriber once, and then create a new subscription with copy_data = true.

Therefore, I think that we can ignore this condition when shutting down a walsender used for logical replication.

This change may be useful for time-delayed logical replication. The walsender delays shutdown until all changes are applied on the subscriber, even if the apply is delayed, which means the publisher cannot be stopped if a large delay time is specified.

PSA a minimal patch for that. I'm not sure whether WalSndCaughtUp should also be omitted; such a change may affect other parts like WalSndWaitForWal(), but we can investigate that further.
[1]: https://commitfest.postgresql.org/41/3581/
[2]: https://www.postgresql.org/message-id/TYAPR01MB58661BA3BF38E9798E59AE14F5E19%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/CAA4eK1LyetktcphdRrufHac4t5DGyhsS2xG2DSOGb7OaOVcDVg%40mail.gmail.com
[4]: https://github.com/postgres/postgres/commit/985bd7d49726c9f178558491d31a570d47340459

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment
On Thu, Dec 22, 2022 at 11:16 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> Currently, walsenders delay the shutdown request until confirming all sent data
> are flushed on remote side. This condition was added in 985bd7[4], which is for
> supporting clean switchover.
>
> 1. Stops primary server.
> 2. Promotes secondary to new primary.
> 3. Reboot (old)primary as new secondary.
>
> In case of logical replication, however, we cannot support the use-case that
> switches the role publisher <-> subscriber. Suppose same case as above, additional
> transactions are committed while doing step2. To catch up such changes subscriber
> must receive WALs related with trans, but it cannot be done because subscriber
> cannot request WALs from the specific position. In the case, we must truncate all
> data in new subscriber once, and then create new subscription with copy_data
> = true.
>
> Therefore, I think that we can ignore the condition for shutting down the
> walsender in logical replication.
>
> This change may be useful for time-delayed logical replication. The walsender
> waits the shutdown until all changes are applied on subscriber, even if it is
> delayed. This causes that publisher cannot be stopped if large delay-time is
> specified.

I think the current behaviour is an artifact of using the same WAL sender code for both logical and physical replication. I agree with you that the logical WAL sender need not wait for all the WAL to be replayed downstream.

I have not reviewed the patch though.

--
Best Wishes,
Ashutosh Bapat
Re: Exit walsender before confirming remote flush in logical replication
From
Kyotaro Horiguchi
Date:
At Thu, 22 Dec 2022 17:29:34 +0530, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote in
> On Thu, Dec 22, 2022 at 11:16 AM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> > In case of logical replication, however, we cannot support the use-case that
> > switches the role publisher <-> subscriber.
..
> > Therefore, I think that we can ignore the condition for shutting down the
> > walsender in logical replication.
...
> > This change may be useful for time-delayed logical replication. The walsender
> > waits the shutdown until all changes are applied on subscriber, even if it is
> > delayed. This causes that publisher cannot be stopped if large delay-time is
> > specified.
>
> I think the current behaviour is an artifact of using the same WAL
> sender code for both logical and physical replication.

Yeah, I don't think we do that for the reason of switchover. On the other hand, I think the behavior was intentionally carried over since it is thought of as sensible on its own. And I'm afraid that many people already rely on that behavior.

> I agree with you that the logical WAL sender need not wait for all the
> WAL to be replayed downstream.

Thus I feel that it might be a bit outrageous to get rid of that behavior altogether because a new feature stumbles on it. I'm fine doing that only in the apply_delay case, though. However, I have another concern: we would be introducing a second exception for XLogSendLogical in the common path.

I doubt that anyone wants to use synchronous logical replication with apply_delay, since the sender transaction is inevitably affected back by that delay.

Thus, how about the logrep worker sending a kind of crafted feedback before entering an apply_delay, which reports commit_data.end_lsn as flushpos? A little tweak is needed in send_feedback(), but it seems to work..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Fri, Dec 23, 2022 at 7:51 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> At Thu, 22 Dec 2022 17:29:34 +0530, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote in
> > On Thu, Dec 22, 2022 at 11:16 AM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > > In case of logical replication, however, we cannot support the use-case that
> > > switches the role publisher <-> subscriber.
> ..
> > > Therefore, I think that we can ignore the condition for shutting down the
> > > walsender in logical replication.
> ...
> > > This change may be useful for time-delayed logical replication. The walsender
> > > waits the shutdown until all changes are applied on subscriber, even if it is
> > > delayed. This causes that publisher cannot be stopped if large delay-time is
> > > specified.
> >
> > I think the current behaviour is an artifact of using the same WAL
> > sender code for both logical and physical replication.
>
> Yeah, I don't think we do that for the reason of switchover. On the
> other hand, I think the behavior was intentionally carried over since it
> is thought of as sensible on its own.
>

Do you see it was discussed somewhere? If so, can you please point to that discussion?

> And I'm afraid that many people already
> rely on that behavior.
>

But OTOH, it can also be annoying for users to see some wait during the shutdown which is actually not required.

> > I agree with you that the logical WAL sender need not wait for all the
> > WAL to be replayed downstream.
>
> Thus I feel that it might be a bit outrageous to get rid of that
> behavior altogether because a new feature stumbles on it. I'm
> fine doing that only in the apply_delay case, though. However, I have
> another concern: we would be introducing a second exception for
> XLogSendLogical in the common path.
>
> I doubt that anyone wants to use synchronous logical replication with
> apply_delay, since the sender transaction is inevitably affected back
> by that delay.
>
> Thus, how about the logrep worker sending a kind of crafted feedback
> before entering an apply_delay, which reports commit_data.end_lsn as
> flushpos? A little tweak is needed in send_feedback(), but it seems to
> work..
>

How can we send commit_data.end_lsn before actually committing the xact? I think this can lead to a problem because next time (say after a restart of the walsender) the server can skip sending the xact even if it has not been committed by the client.

--
With Regards,
Amit Kapila.
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Horiguchi-san,

> Thus, how about the logrep worker sending a kind of crafted feedback
> before entering an apply_delay, which reports commit_data.end_lsn as
> flushpos? A little tweak is needed in send_feedback(), but it seems to
> work..

Thanks for replying! I tested your suggestion, but it did not work well...

I made a PoC based on the latest time-delayed patches [1] for the non-streaming case. Apply workers that are delaying the apply send begin_data.final_lsn as recvpos and flushpos in send_feedback(). The following is the feedback message I got, and we can see that recv and flush were overwritten.

```
DEBUG: sending feedback (force 1) to recv 0/1553638, write 0/1553550, flush 0/1553638
CONTEXT: processing remote data for replication origin "pg_16390" during message type "BEGIN" in transaction 730, finished at 0/1553638
```

In terms of the walsender, however, sentPtr seemed to be slightly larger than the flushed position on the subscriber.

```
(gdb) p MyWalSnd->sentPtr
$2 = 22361760
(gdb) p MyWalSnd->flush
$3 = 22361656
(gdb) p *MyWalSnd
$4 = {pid = 28807, state = WALSNDSTATE_STREAMING, sentPtr = 22361760,
  needreload = false, write = 22361656, flush = 22361656, apply = 22361424,
  writeLag = 20020343, flushLag = 20020343, applyLag = 20020343,
  sync_standby_priority = 0, mutex = 0 '\000', latch = 0x7ff0350cbb94,
  replyTime = 725113263592095}
```

Therefore I could not shut down the publisher node while the apply was delayed:

```
$ pg_ctl stop -D data_pub/
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down
```

Do you have any opinions about this?

[1]: https://www.postgresql.org/message-id/TYCPR01MB83730A3E21E921335F6EFA38EDE89@TYCPR01MB8373.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Horiguchi-san,

> > Thus, how about the logrep worker sending a kind of crafted feedback
> > before entering an apply_delay, which reports commit_data.end_lsn as
> > flushpos? A little tweak is needed in send_feedback(), but it seems to
> > work..
>
> Thanks for replying! I tested your suggestion, but it did not work well...
>
> I made a PoC based on the latest time-delayed patches [1] for the non-streaming case.
> Apply workers that are delaying the apply send begin_data.final_lsn as recvpos
> and flushpos in send_feedback().

Maybe I misunderstood what you said... I have also found that sentPtr is not the actual sent position but the next WAL location to send. You can see the comment below.

```
/*
 * How far have we sent WAL already? This is also advertised in
 * MyWalSnd->sentPtr. (Actually, this is the next WAL location to send.)
 */
static XLogRecPtr sentPtr = InvalidXLogRecPtr;
```

We must use end_lsn for crafting messages to fool the walsender, but that value is included in COMMIT, not in BEGIN, for the non-streaming case. And if workers are delayed in apply_handle_commit(), they will hold locks on database objects for a long time, which causes another issue.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Thu, Dec 22, 2022 at 11:16 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
> In case of logical replication, however, we cannot support the use-case that
> switches the role publisher <-> subscriber. Suppose same case as above, additional
> transactions are committed while doing step2. To catch up such changes subscriber
> must receive WALs related with trans, but it cannot be done because subscriber
> cannot request WALs from the specific position. In the case, we must truncate all
> data in new subscriber once, and then create new subscription with copy_data
> = true.
>
> Therefore, I think that we can ignore the condition for shutting down the
> walsender in logical replication.
>

+1 for the idea.

- * Note that if we determine that there's still more data to send, this
- * function will return control to the caller.
+ * Note that if we determine that there's still more data to send or we are in
+ * the physical replication more, this function will return control to the
+ * caller.

I think in this comment you meant to say

1. "or we are in physical replication mode and all WALs are not yet replicated"
2. Typo /replication more/replication mode

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Dilip,

Thanks for checking my proposal!

> - * Note that if we determine that there's still more data to send, this
> - * function will return control to the caller.
> + * Note that if we determine that there's still more data to send or we are in
> + * the physical replication more, this function will return control to the
> + * caller.
>
> I think in this comment you meant to say
>
> 1. "or we are in physical replication mode and all WALs are not yet replicated"
> 2. Typo /replication more/replication mode

At first I considered 2, but I thought 1 seemed better. PSA the updated patch.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment
On Tue, Dec 27, 2022 at 1:44 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> Thanks for checking my proposal!
>
> > - * Note that if we determine that there's still more data to send, this
> > - * function will return control to the caller.
> > + * Note that if we determine that there's still more data to send or we are in
> > + * the physical replication more, this function will return control to the
> > + * caller.
> >
> > I think in this comment you meant to say
> >
> > 1. "or we are in physical replication mode and all WALs are not yet replicated"
> > 2. Typo /replication more/replication mode
>
> At first I considered 2, but I thought 1 seemed better.
> PSA the updated patch.
>

I think even for logical replication we should check whether there is any pending WAL (via pq_is_send_pending()) to be sent. Otherwise, what is the point of sending the done message? Also, the caller of WalSndDone() already has that check, which is another reason why I can't see why you didn't have the same check in WalSndDone().

BTW, even after fixing this, I think logical replication will behave differently when, due to some reason (like time-delayed replication), the send buffer gets full and the walsender is not able to send data. I think this will be less of an issue with physical replication because there is a separate walreceiver process to flush the WAL, which doesn't wait, but the same is not true for logical replication. Do you have any thoughts on this matter?

--
With Regards,
Amit Kapila.
On Tue, Dec 27, 2022 at 2:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 27, 2022 at 1:44 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > At first I considered 2, but I thought 1 seemed better.
> > PSA the updated patch.
>
> I think even for logical replication we should check whether there is
> any pending WAL (via pq_is_send_pending()) to be sent. Otherwise, what
> is the point of sending the done message? Also, the caller of
> WalSndDone() already has that check, which is another reason why I
> can't see why you didn't have the same check in WalSndDone().
>
> BTW, even after fixing this, I think logical replication will behave
> differently when, due to some reason (like time-delayed replication),
> the send buffer gets full and the walsender is not able to send data. I think
> this will be less of an issue with physical replication because there
> is a separate walreceiver process to flush the WAL, which doesn't wait,
> but the same is not true for logical replication. Do you have any
> thoughts on this matter?
>

In logical replication, it can happen today as well, without time-delayed replication. Basically, say the apply worker is waiting to acquire some lock that is already acquired by some backend; then it will show the same behavior. I have not verified this, so you may want to check it once.

--
With Regards,
Amit Kapila.
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

> > At first I considered 2, but I thought 1 seemed better.
> > PSA the updated patch.
>
> I think even for logical replication we should check whether there is
> any pending WAL (via pq_is_send_pending()) to be sent. Otherwise, what
> is the point of sending the done message? Also, the caller of
> WalSndDone() already has that check, which is another reason why I
> can't see why you didn't have the same check in WalSndDone().

I do not have a strong opinion here. Fixed.

> BTW, even after fixing this, I think logical replication will behave
> differently when, due to some reason (like time-delayed replication),
> the send buffer gets full and the walsender is not able to send data. I think
> this will be less of an issue with physical replication because there
> is a separate walreceiver process to flush the WAL, which doesn't wait,
> but the same is not true for logical replication. Do you have any
> thoughts on this matter?

Yes, it may happen even after this work is done. And your analysis is correct: the receive buffer is rarely full in physical replication because the walreceiver immediately flushes the WAL.

I think this is an architectural problem. Maybe we have assumed that the decoded WAL is consumed in a short time. I do not have a good idea, but one approach is introducing a new process, a logical walreceiver. It would record the decoded WAL to persistent storage, and workers would consume and then remove it. That may have a huge impact on other features, though, and might not be acceptable...

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

> In logical replication, it can happen today as well, without
> time-delayed replication. Basically, say the apply worker is waiting to
> acquire some lock that is already acquired by some backend; then it
> will show the same behavior. I have not verified this, so you may want
> to check it once.

Right, I could reproduce the scenario with the following steps.

1. Construct a pub -> sub logical replication system with streaming = off.
2. Define a table on both nodes.

```
CREATE TABLE tbl (id int PRIMARY KEY);
```

3. Execute concurrent transactions.

Tx-1 (on subscriber)
BEGIN;
INSERT INTO tbl SELECT i FROM generate_series(1, 5000) s(i);

Tx-2 (on publisher)
INSERT INTO tbl SELECT i FROM generate_series(1, 5000) s(i);

4. Try to shut down the publisher, but it will fail.

```
$ pg_ctl stop -D publisher
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Wed, Dec 28, 2022 at 8:19 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> Right, I could reproduce the scenario with the following steps.
>
> 1. Construct a pub -> sub logical replication system with streaming = off.
> 2. Define a table on both nodes.
>
> ```
> CREATE TABLE tbl (id int PRIMARY KEY);
> ```
>
> 3. Execute concurrent transactions.
>
> Tx-1 (on subscriber)
> BEGIN;
> INSERT INTO tbl SELECT i FROM generate_series(1, 5000) s(i);
>
> Tx-2 (on publisher)
> INSERT INTO tbl SELECT i FROM generate_series(1, 5000) s(i);
>
> 4. Try to shut down the publisher, but it will fail.
>
> ```
> $ pg_ctl stop -D publisher
> waiting for server to shut down............................................................... failed
> pg_ctl: server does not shut down
> ```

Thanks for the verification. BTW, do you think we should document this either with time-delayed replication or otherwise, unless it is already documented?

Another thing we can investigate here is why we need to ensure that there is no pending send before shutdown.

--
With Regards,
Amit Kapila.
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

> Thanks for the verification. BTW, do you think we should document this
> either with time-delayed replication or otherwise, unless it is
> already documented?

I think this should be documented in the "Shutting Down the Server" section in runtime.sgml or in logical-replication.sgml, but I cannot find such a description. It will be included in the next version.

> Another thing we can investigate here is why we need to ensure that
> there is no pending send before shutdown.

I have not looked into it yet; I will continue next year. It seems that walsenders have been sending all data before shutting down since ea5516, e0b581 and 754baa. There were many threads related to streaming replication, so I could not pin down the specific message referenced in the commit message of ea5516.

I have also checked some wiki pages [1][2], but I could not find any design notes about it.

[1]: https://wiki.postgresql.org/wiki/Streaming_Replication
[2]: https://wiki.postgresql.org/wiki/Synchronous_Replication_9/2010_Proposal

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Wed, Dec 28, 2022 at 8:18 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> Dear Amit,
>
> > I think even for logical replication we should check whether there is
> > any pending WAL (via pq_is_send_pending()) to be sent. Otherwise, what
> > is the point of sending the done message? Also, the caller of
> > WalSndDone() already has that check, which is another reason why I
> > can't see why you didn't have the same check in WalSndDone().
>
> I do not have a strong opinion here. Fixed.
>
> > BTW, even after fixing this, I think logical replication will behave
> > differently when, due to some reason (like time-delayed replication),
> > the send buffer gets full and the walsender is not able to send data. I think
> > this will be less of an issue with physical replication because there
> > is a separate walreceiver process to flush the WAL, which doesn't wait,
> > but the same is not true for logical replication. Do you have any
> > thoughts on this matter?
>
> Yes, it may happen even after this work is done. And your analysis is correct:
> the receive buffer is rarely full in physical replication because the walreceiver
> immediately flushes the WAL.
>

Okay, but what happens in the case of physical replication when synchronous_commit = remote_apply? In that case, won't it ensure that apply has also happened? If so, then shouldn't the time delay feature also cause a similar problem for physical replication as well?

--
With Regards,
Amit Kapila.
Re: Exit walsender before confirming remote flush in logical replication
From
Kyotaro Horiguchi
Date:
At Wed, 28 Dec 2022 09:15:41 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in
> > Another thing we can investigate here is why we need to ensure that
> > there is no pending send before shutdown.
>
> I have not looked into it yet; I will continue next year.
> It seems that walsenders have been sending all data before shutting down since ea5516,
> e0b581 and 754baa.
> There were many threads related to streaming replication, so I could not pin down
> the specific message referenced in the commit message of ea5516.
>
> I have also checked some wiki pages [1][2], but I could not find any design notes about it.
>
> [1]: https://wiki.postgresql.org/wiki/Streaming_Replication
> [2]: https://wiki.postgresql.org/wiki/Synchronous_Replication_9/2010_Proposal

If I'm grasping the discussion here correctly, in my memory it is because physical replication needs all records that have been written on the primary to also be written on the standby for switchover to succeed. It is annoying that a normal shutdown occasionally leads to switchover failure. Thus WalSndDone explicitly waits for the remote flush/write regardless of the setting of synchronous_commit. Hence apply delay doesn't affect shutdown (AFAICS), and that is sufficient since all the records will be applied at the next startup.

In logical replication, apply precedes write and flush, so we have no indication whether a record is "replicated" to the standby other than the apply LSN. On the other hand, logical replication has no business with switchover, so that assurance is useless. Thus I think we can (practically) ignore apply_lsn at shutdown. It seems subtly irregular, though.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Exit walsender before confirming remote flush in logical replication
From
Kyotaro Horiguchi
Date:
At Fri, 13 Jan 2023 16:41:08 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> Okay, but what happens in the case of physical replication when
> synchronous_commit = remote_apply? In that case, won't it ensure that
> apply has also happened? If so, then shouldn't the time delay feature
> also cause a similar problem for physical replication as well?

As written in another mail, WalSndDone doesn't honor synchronous_commit. In other words, AFAICS the walsender finishes without waiting for remote_apply. The unapplied records will be applied at the next startup.

I haven't confirmed that behavior for myself, though..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Horiguchi-san, Amit,

> At Fri, 13 Jan 2023 16:41:08 +0530, Amit Kapila <amit.kapila16@gmail.com>
> wrote in
> > Okay, but what happens in the case of physical replication when
> > synchronous_commit = remote_apply? In that case, won't it ensure that
> > apply has also happened? If so, then shouldn't the time delay feature
> > also cause a similar problem for physical replication as well?
>
> As written in another mail, WalSndDone doesn't honor
> synchronous_commit. In other words, AFAICS the walsender finishes without
> waiting for remote_apply. The unapplied records will be applied at the
> next startup.
>
> I haven't confirmed that behavior for myself, though..

If Amit meant the case where sent data is still pending in physical replication, then indeed the walsender cannot stop. But this is not related to synchronous_commit: it happens because the walsender must sweep all pending data before shutting down. We can reproduce the situation with:

1. Build a streaming replication system.
2. kill -STOP $walreceiver
3. Insert data into the primary server.
4. Try to stop the primary server.

If what you said was not related to pending data, the walsender can be stopped even if synchronous_commit = remote_apply. As Horiguchi-san said, no such condition is written in WalSndDone() [1]. I think the parameter synchronous_commit does not really affect the walsender process; it just defines when the backend returns the result to the client. I could check this with the following steps:

1. Built a streaming replication system. PSA the script to follow that.

Primary config:

```
synchronous_commit = 'remote_apply'
synchronous_standby_names = 'secondary'
```

Secondary config:

```
recovery_min_apply_delay = 1d
primary_conninfo = 'user=postgres port=$port_N1 application_name=secondary'
hot_standby = on
```

2. Inserted data into the primary. This waited for the remote apply.

psql -U postgres -p $port_primary -c "INSERT INTO tbl SELECT generate_series(1, 5000)"

3. Stopped the primary server from another terminal. It could be done. The terminal from step 2 said:

```
WARNING: canceling the wait for synchronous replication and terminating connection due to administrator command
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
connection to server was lost
```

[1]: https://github.com/postgres/postgres/blob/master/src/backend/replication/walsender.c#L3121

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Horiguchi-san,

> If I'm grasping the discussion here correctly, in my memory it is
> because physical replication needs all records that have been written on
> the primary to also be written on the standby for switchover to succeed.
> It is annoying that a normal shutdown occasionally leads to switchover
> failure. Thus WalSndDone explicitly waits for the remote flush/write
> regardless of the setting of synchronous_commit.

AFAIK the condition (sentPtr == replicatedPtr) seems to have been introduced for that purpose [1]. You meant to say that the condition (!pq_is_send_pending()) has the same motivation, right?

> Hence apply delay
> doesn't affect shutdown (AFAICS), and that is sufficient since all the
> records will be applied at the next startup.

I was not clear on the phrase "next startup", but I agree that we can shut down the walsender in the case of recovery_min_apply_delay > 0 and synchronous_commit = remote_apply. The startup process is not terminated even if the primary crashes, so I think the process will apply the transaction sooner or later.

> In logical replication, apply precedes write and flush, so we have no
> indication whether a record is "replicated" to the standby other than
> the apply LSN. On the other hand, logical replication has no business
> with switchover, so that assurance is useless. Thus I think
> we can (practically) ignore apply_lsn at shutdown. It seems subtly
> irregular, though.

Another consideration is that the condition (!pq_is_send_pending()) ensures that there are no pending messages, including other packets. Currently we force walsenders to clean up all messages before shutting down, even keepalives. I cannot think of any problems caused by this, so I kept the condition in the case of logical replication. I updated the patch accordingly.

Also, I found that the previous version did not work well in the case of streamed transactions. When a streamed transaction is committed on the publisher but the apply is delayed on the subscriber, the process sometimes waits until there is no pending write. This is done in ProcessPendingWrites(). I added another termination path in that function.

[1]: https://github.com/postgres/postgres/commit/985bd7d49726c9f178558491d31a570d47340459

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment
On Mon, Jan 16, 2023 at 4:39 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> > In logical replication apply precedes write and flush, so we have no indication whether a record has been "replicated" to the standby other than the apply LSN. On the other hand, logical replication has no business with switchover, so that assurance is useless. Thus I think we can (practically) ignore apply_lsn at shutdown. It seems subtly irregular, though.
>
> Another consideration is that the condition (!pq_is_send_pending()) ensures that there are no pending messages, including other packets. Currently we force walsenders to clean up all messages before shutting down, even keepalives. I have not found any problems caused by this, but I will keep the condition in the logical replication case.
>

Let me try to summarize the discussion till now. The problem we are trying to solve here is to allow a shutdown to complete when the walsender is not able to send the entire WAL. Currently, in such cases, the shutdown fails. As per our current understanding, this can happen when (a) the walreceiver/apply process is stuck (not able to receive more WAL) due to locks or some other reason; or (b) a long time delay has been configured for applying the WAL (we don't yet have such a feature for logical replication, but a discussion of it is in progress).

Both reasons mostly apply to logical replication because there is no separate walreceiver process whose only job is to flush the WAL. In logical replication, the process that receives the WAL also applies it. So, while applying, it can get stuck for a long time waiting for some heavy-weight lock to be released by another long-running transaction in a backend. Similarly, if the user has configured a large apply delay, the network buffer between the walsender and the receive/apply process can fill up.
The condition that makes the shutdown wait for all WAL to be sent has two parts: (a) it confirms that there is no pending WAL to be sent; and (b) it confirms that all the WAL sent has been flushed by the client. As per our understanding, both of these conditions exist to allow a clean switchover/failover, which is useful only for physical replication; logical replication doesn't provide such functionality. The proposed patch eliminates condition (b) for logical replication in the hope that this will allow the shutdown to complete in most cases. No specific reason has been discussed for keeping (a) for logical replication either.

Now, to proceed here we have the following options: (1) Fix (b) as proposed by the patch and document the risks related to (a); (2) Fix both (a) and (b); (3) Do nothing and document that users need to unblock the subscribers to complete the shutdown.

Thoughts?

--
With Regards,
Amit Kapila.
RE: Exit walsender before confirming remote flush in logical replication
From: "Hayato Kuroda (Fujitsu)"
Dear Amit, hackers,

> Let me try to summarize the discussion till now. The problem we are trying to solve here is to allow a shutdown to complete when walsender is not able to send the entire WAL. Currently, in such cases, the shutdown fails. As per our current understanding, this can happen when (a) walreceiver/walapply process is stuck (not able to receive more WAL) due to locks or some other reason; (b) a long time delay has been configured to apply the WAL (we don't yet have such a feature for logical replication but the discussion for same is in progress).

Thanks for summarizing. While analyzing the hangs, I noticed that there are two types of shutdown failure, which can be distinguished by their back traces (shown at the bottom).

Type i) The walsender reaches WalSndDone() but cannot satisfy its exit condition. All WAL has been sent to the subscriber but has not been flushed there; sentPtr is not the same as replicatedPtr. This hang can happen when the delayed transaction is small or streamed.

Type ii) The walsender never reaches WalSndDone() and is stuck in ProcessPendingWrites(). The send buffer became full while replicating a transaction, so pq_is_send_pending() keeps returning true and the walsender cannot break out of the loop. This hang can happen when the delayed transaction is large but not streamed.

If we choose modification (1), we can only fix type (i), because pending writes still cause the type (ii) failure. IIUC, if we want to shut down walsender processes even in case (ii), we must choose (2), and additional fixes are needed. Based on the above, I prefer modification (2) because it rescues more cases. Thoughts?

PSA the patch for it. It is almost the same as the previous version, but the comments are updated.
Appendix: The backtrace for type i)
```
#0 WalSndDone (send_data=0x87f825 <XLogSendLogical>) at ../../PostgreSQL-Source-Dev/src/backend/replication/walsender.c:3111
#1 0x000000000087ed1d in WalSndLoop (send_data=0x87f825 <XLogSendLogical>) at ../../PostgreSQL-Source-Dev/src/backend/replication/walsender.c:2525
#2 0x000000000087d40a in StartLogicalReplication (cmd=0x1f49030) at ../../PostgreSQL-Source-Dev/src/backend/replication/walsender.c:1320
#3 0x000000000087df29 in exec_replication_command (cmd_string=0x1f15498 "START_REPLICATION SLOT \"sub\" LOGICAL 0/0 (proto_version '4', streaming 'on', origin 'none', publication_names'\"pub\"')") at ../../PostgreSQL-Source-Dev/src/backend/replication/walsender.c:1830
#4 0x000000000091b032 in PostgresMain (dbname=0x1f4c938 "postgres", username=0x1f4c918 "postgres") at ../../PostgreSQL-Source-Dev/src/backend/tcop/postgres.c:4561
#5 0x000000000085390b in BackendRun (port=0x1f3d0b0) at ../../PostgreSQL-Source-Dev/src/backend/postmaster/postmaster.c:4437
#6 0x000000000085322c in BackendStartup (port=0x1f3d0b0) at ../../PostgreSQL-Source-Dev/src/backend/postmaster/postmaster.c:4165
#7 0x000000000084f7a2 in ServerLoop () at ../../PostgreSQL-Source-Dev/src/backend/postmaster/postmaster.c:1762
#8 0x000000000084f0a2 in PostmasterMain (argc=3, argv=0x1f0ff30) at ../../PostgreSQL-Source-Dev/src/backend/postmaster/postmaster.c:1452
#9 0x000000000074a4d6 in main (argc=3, argv=0x1f0ff30) at ../../PostgreSQL-Source-Dev/src/backend/main/main.c:200
```

The backtrace for type ii)
```
#0 ProcessPendingWrites () at ../../PostgreSQL-Source-Dev/src/backend/replication/walsender.c:1438
#1 0x000000000087d635 in WalSndWriteData (ctx=0x1429ce8, lsn=22406440, xid=731, last_write=true) at ../../PostgreSQL-Source-Dev/src/backend/replication/walsender.c:1405
#2 0x0000000000888420 in OutputPluginWrite (ctx=0x1429ce8, last_write=true) at ../../PostgreSQL-Source-Dev/src/backend/replication/logical/logical.c:669
#3 0x00007f022dfe43a7 in pgoutput_change (ctx=0x1429ce8, txn=0x1457d40, relation=0x7f0245075268, change=0x1460ef8) at ../../PostgreSQL-Source-Dev/src/backend/replication/pgoutput/pgoutput.c:1491
#4 0x0000000000889125 in change_cb_wrapper (cache=0x142bcf8, txn=0x1457d40, relation=0x7f0245075268, change=0x1460ef8) at ../../PostgreSQL-Source-Dev/src/backend/replication/logical/logical.c:1077
#5 0x000000000089507c in ReorderBufferApplyChange (rb=0x142bcf8, txn=0x1457d40, relation=0x7f0245075268, change=0x1460ef8, streaming=false) at ../../PostgreSQL-Source-Dev/src/backend/replication/logical/reorderbuffer.c:1969
#6 0x0000000000895866 in ReorderBufferProcessTXN (rb=0x142bcf8, txn=0x1457d40, commit_lsn=23060624, snapshot_now=0x1440150, command_id=0, streaming=false) at ../../PostgreSQL-Source-Dev/src/backend/replication/logical/reorderbuffer.c:2245
#7 0x0000000000896348 in ReorderBufferReplay (txn=0x1457d40, rb=0x142bcf8, xid=731, commit_lsn=23060624, end_lsn=23060672, commit_time=727353664342177, origin_id=0, origin_lsn=0) at ../../PostgreSQL-Source-Dev/src/backend/replication/logical/reorderbuffer.c:2675
#8 0x00000000008963d0 in ReorderBufferCommit (rb=0x142bcf8, xid=731, commit_lsn=23060624, end_lsn=23060672, commit_time=727353664342177, origin_id=0, origin_lsn=0) at ../../PostgreSQL-Source-Dev/src/backend/replication/logical/reorderbuffer.c:2699
#9 0x00000000008842c7 in DecodeCommit (ctx=0x1429ce8, buf=0x7ffcf03731a0, parsed=0x7ffcf0372fa0, xid=731, two_phase=false) at ../../PostgreSQL-Source-Dev/src/backend/replication/logical/decode.c:682
#10 0x0000000000883667 in xact_decode (ctx=0x1429ce8, buf=0x7ffcf03731a0) at ../../PostgreSQL-Source-Dev/src/backend/replication/logical/decode.c:216
#11 0x000000000088338b in LogicalDecodingProcessRecord (ctx=0x1429ce8, record=0x142a080) at ../../PostgreSQL-Source-Dev/src/backend/replication/logical/decode.c:119
#12 0x000000000087f8c7 in XLogSendLogical () at ../../PostgreSQL-Source-Dev/src/backend/replication/walsender.c:3060
#13 0x000000000087ec5a in WalSndLoop (send_data=0x87f825 <XLogSendLogical>) at ../../PostgreSQL-Source-Dev/src/backend/replication/walsender.c:2490
...
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment
On Tue, Jan 17, 2023 at 2:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Let me try to summarize the discussion till now. The problem we are trying to solve here is to allow a shutdown to complete when walsender is not able to send the entire WAL. Currently, in such cases, the shutdown fails. As per our current understanding, this can happen when (a) walreceiver/walapply process is stuck (not able to receive more WAL) due to locks or some other reason; (b) a long time delay has been configured to apply the WAL (we don't yet have such a feature for logical replication but the discussion for same is in progress).
>
> Both reasons mostly apply to logical replication because there is no separate walreceiver process whose job is to just flush the WAL. In logical replication, the process that receives the WAL also applies it. So, while applying it can get stuck for a long time waiting for some heavy-weight lock to be released by some other long-running transaction by the backend.
>

While checking the commits and email discussions in this area, I came across an email [1] from Michael where something similar seems to have been discussed: whether the early shutdown of the walsender can prevent a switchover between publisher and subscriber. That part was never clearly answered in that email chain. It might be worth reading the entire discussion [2]. That discussion finally led to the following commit:

commit c6c333436491a292d56044ed6e167e2bdee015a2
Author: Andres Freund <andres@anarazel.de>
Date: Mon Jun 5 18:53:41 2017 -0700

    Prevent possibility of panics during shutdown checkpoint.

    When the checkpointer writes the shutdown checkpoint, it checks afterwards whether any WAL has been written since it started and throws a PANIC if so. At that point, only walsenders are still active, so one might think this could not happen, but walsenders can also generate WAL, for instance in BASE_BACKUP and logical decoding related commands (e.g. via hint bits). So they can trigger this panic if such a command is run while the shutdown checkpoint is being written.

    To fix this, divide the walsender shutdown into two phases. First, checkpointer, itself triggered by postmaster, sends a PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend is idle or runs an SQL query this causes the backend to shutdown, if logical replication is in progress all existing WAL records are processed followed by a shutdown.
    ...
    ...

Here, as mentioned in the commit, we are trying to ensure that before the checkpointer writes its shutdown WAL record, "if logical replication is in progress all existing WAL records are processed followed by a shutdown". I think even before this commit we tried to send the entire WAL before shutdown, but I am not completely sure. There was no discussion of what happens if the logical walreceiver/apply process is waiting on some heavy-weight lock and the network socket buffer is full, so that the walsender is not able to process its WAL. Is it okay for shutdown to fail in such a case, as it does now, or shall we somehow detect that and shut down the walsender, or should we just allow the logical walsender to always exit immediately as soon as the shutdown signal comes?

Note: I have added some of the people involved in the previous thread's [2] discussion in the hope that they can share their thoughts.

[1] - https://www.postgresql.org/message-id/CAB7nPqR3icaA%3DqMv_FuU8YVYH3KUrNMnq_OmCfkzxCHC4fox8w%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAHGQGwEsttg9P9LOOavoc9d6VB1zVmYgfBk%3DLjsk-UL9cEf-eA%40mail.gmail.com

--
With Regards,
Amit Kapila.
On Fri, Jan 20, 2023 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 17, 2023 at 2:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Let me try to summarize the discussion till now. The problem we are trying to solve here is to allow a shutdown to complete when walsender is not able to send the entire WAL. Currently, in such cases, the shutdown fails. As per our current understanding, this can happen when (a) walreceiver/walapply process is stuck (not able to receive more WAL) due to locks or some other reason; (b) a long time delay has been configured to apply the WAL (we don't yet have such a feature for logical replication but the discussion for same is in progress).
> >
> > Both reasons mostly apply to logical replication because there is no separate walreceiver process whose job is to just flush the WAL. In logical replication, the process that receives the WAL also applies it. So, while applying it can get stuck for a long time waiting for some heavy-weight lock to be released by some other long-running transaction by the backend.
> >
>
> While checking the commits and email discussions in this area, I came across the email [1] from Michael where something similar seems to have been discussed. Basically, whether the early shutdown of walsender can prevent a switchover between publisher and subscriber, but that part was never clearly answered in that email chain. It might be worth reading the entire discussion [2]. That discussion finally led to the following commit:

Right, in that thread the question was raised whether it makes sense for logical replication to send all WAL, but there was no conclusion on that.
But I think this patch is mainly about resolving the PANIC due to extra WAL being generated by the walsender during checkpoint processing. That's the reason the behavior of sending all the WAL is maintained; only the extra WAL generation is stopped (before the shutdown checkpoint can proceed) using this new state.

>
> commit c6c333436491a292d56044ed6e167e2bdee015a2
> Author: Andres Freund <andres@anarazel.de>
> Date: Mon Jun 5 18:53:41 2017 -0700
>
> Prevent possibility of panics during shutdown checkpoint.
>
> When the checkpointer writes the shutdown checkpoint, it checks afterwards whether any WAL has been written since it started and throws a PANIC if so. At that point, only walsenders are still active, so one might think this could not happen, but walsenders can also generate WAL, for instance in BASE_BACKUP and logical decoding related commands (e.g. via hint bits). So they can trigger this panic if such a command is run while the shutdown checkpoint is being written.
>
> To fix this, divide the walsender shutdown into two phases. First, checkpointer, itself triggered by postmaster, sends a PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend is idle or runs an SQL query this causes the backend to shutdown, if logical replication is in progress all existing WAL records are processed followed by a shutdown.
> ...
> ...
>
> Here, as mentioned in the commit, we are trying to ensure that before checkpoint writes its shutdown WAL record, we ensure that "if logical replication is in progress all existing WAL records are processed followed by a shutdown.". I think even before this commit, we try to send the entire WAL before shutdown but not completely sure.

Yes, I think there is no change in that behavior.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
RE: Exit walsender before confirming remote flush in logical replication
From: "Hayato Kuroda (Fujitsu)"
Dear Dilip, hackers,

Thanks for giving your opinion. I analyzed the relation with the given commit, and I think I can keep my patch. What do you think?

# Abstract

* Some modifications are needed.
* We cannot roll back the shutdown if walsenders are stuck.
* We don't have a good way to detect such a stuck process.

# Discussion

Compared to physical replication, logical replication is more likely to get stuck. I think the risk should be avoided as much as possible by fixing the code. Then, if that leads to another failure mode, we can document the caution for users.

While shutting down the server, the checkpointer sends the SIGUSR1 signal to walsenders. This is done after the other processes have exited, so we cannot raise an ERROR and roll back the operation if the checkpointer recognizes at that point that a process is stuck.

We don't have any feature by which the postmaster can check whether this node is a publisher or not. So if we want to add a mechanism that checks the health of walsenders before shutting down, we must do that at the top of process_pm_shutdown_request() even when we are not using logical replication. I think that affects the basis of postgres largely, and in the first place PostgreSQL does not have a mechanism to check the health of a process. Therefore, I want to adopt the approach that the walsender itself exits immediately when it gets the signal.

## About patch

- Are the fixes correct?

In ProcessPendingWrites(), with my patch, the walsender calls WalSndDone() when it gets the SIGUSR1 signal. I think this is needed. From the patch [1]:

```
@@ -1450,6 +1450,10 @@ ProcessPendingWrites(void)

 		/* Try to flush pending output to the client */
 		if (pq_flush_if_writable() != 0)
 			WalSndShutdown();
+
+		/* If a shutdown is requested, try to exit the process */
+		if (got_STOPPING)
+			WalSndDone(XLogSendLogical);
 	}

 	/* reactivate latch so WalSndLoop knows to continue */
```

Per my analysis, in the case of logical replication, walsenders exit via the following steps.
Note that a logical walsender does not receive the SIGUSR2 signal; it sets the flag by itself instead:

1. postmaster sends shutdown requests to the checkpointer
2. checkpointer sends SIGUSR1 to walsenders and waits
3. when walsenders receive SIGUSR1, they turn got_SIGUSR1 on
4. walsenders consume all WAL @XLogSendLogical
5. walsenders turn got_SIGUSR2 on by themselves @XLogSendLogical
6. walsenders recognize the flag is on, so they call WalSndDone() @WalSndLoop
7. proc_exit(0)
8. checkpointer writes the shutdown record
...

The type (i) hang I reported on -hackers[1] means that processes stop at step 6, and the type (ii) hang means that they stop at step 4. In step 4, got_SIGUSR2 is never set, so we must use the got_STOPPING flag.

[1]: https://www.postgresql.org/message-id/TYCPR01MB58701A47F35FED0A2B399662F5C49@TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Fri, Jan 20, 2023 at 7:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 17, 2023 at 2:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Let me try to summarize the discussion till now. The problem we are trying to solve here is to allow a shutdown to complete when walsender is not able to send the entire WAL. Currently, in such cases, the shutdown fails. As per our current understanding, this can happen when (a) walreceiver/walapply process is stuck (not able to receive more WAL) due to locks or some other reason; (b) a long time delay has been configured to apply the WAL (we don't yet have such a feature for logical replication but the discussion for same is in progress).
> >
>
> While checking the commits and email discussions in this area, I came across the email [1] from Michael where something similar seems to have been discussed. Basically, whether the early shutdown of walsender can prevent a switchover between publisher and subscriber, but that part was never clearly answered in that email chain. It might be worth reading the entire discussion [2]. That discussion finally led to the following commit:
>
> commit c6c333436491a292d56044ed6e167e2bdee015a2
> Author: Andres Freund <andres@anarazel.de>
> Date: Mon Jun 5 18:53:41 2017 -0700
>
> Prevent possibility of panics during shutdown checkpoint.
>
> When the checkpointer writes the shutdown checkpoint, it checks afterwards whether any WAL has been written since it started and throws a PANIC if so. At that point, only walsenders are still active, so one might think this could not happen, but walsenders can also generate WAL, for instance in BASE_BACKUP and logical decoding related commands (e.g. via hint bits). So they can trigger this panic if such a command is run while the shutdown checkpoint is being written.
>
> To fix this, divide the walsender shutdown into two phases. First, checkpointer, itself triggered by postmaster, sends a PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend is idle or runs an SQL query this causes the backend to shutdown, if logical replication is in progress all existing WAL records are processed followed by a shutdown.
> ...
> ...
>
> Here, as mentioned in the commit, we are trying to ensure that before checkpoint writes its shutdown WAL record, we ensure that "if logical replication is in progress all existing WAL records are processed followed by a shutdown.". I think even before this commit, we try to send the entire WAL before shutdown but not completely sure. There was no discussion on what happens if the logical walreceiver/walapply process is waiting on some heavy-weight lock and the network socket buffer is full due to which walsender is not able to process its WAL. Is it okay for shutdown to fail in such a case as it is happening now, or shall we somehow detect that and shut down the walsender, or we just allow logical walsender to always exit immediately as soon as the shutdown signal came?

+1 to eliminate condition (b) for logical replication.

Regarding (a), as Amit mentioned before[1], I think we should check that pq_is_send_pending() is false. Otherwise, we will end up terminating the WAL stream without the done message, which will lead to the error message "ERROR: could not receive data from WAL stream: server closed the connection unexpectedly" on the subscriber even at a clean shutdown.
In a case where pq_is_send_pending() doesn't become false for a long time (e.g., the network socket buffer got full because the apply worker is waiting on a lock), I think users should unblock it by themselves. Or it might be practically better to shut down the subscriber first in the logical replication case, unlike the physical replication case. I've not studied the time-delayed logical replication patch yet, though.

Regards,

[1] https://www.postgresql.org/message-id/CAA4eK1%2BpD654%2BXnrPugYueh7Oh22EBGTr6dA_fS0%2BgPiHayG9A%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Wed, Feb 1, 2023 at 2:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Jan 20, 2023 at 7:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jan 17, 2023 at 2:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Let me try to summarize the discussion till now. The problem we are trying to solve here is to allow a shutdown to complete when walsender is not able to send the entire WAL. Currently, in such cases, the shutdown fails. As per our current understanding, this can happen when (a) walreceiver/walapply process is stuck (not able to receive more WAL) due to locks or some other reason; (b) a long time delay has been configured to apply the WAL (we don't yet have such a feature for logical replication but the discussion for same is in progress).
> > >
> > > Both reasons mostly apply to logical replication because there is no separate walreceiver process whose job is to just flush the WAL. In logical replication, the process that receives the WAL also applies it. So, while applying it can get stuck for a long time waiting for some heavy-weight lock to be released by some other long-running transaction by the backend.
> > >
> ...
> ...
>
> +1 to eliminate condition (b) for logical replication.
>
> Regarding (a), as Amit mentioned before[1], I think we should check if pq_is_send_pending() is false.
>

Sorry, but your suggestion is not completely clear to me. Do you mean to say that for logical replication we shouldn't wait for all the WAL to be successfully replicated, but we should ensure that we inform the subscriber that XLOG streaming is done (by ensuring pq_is_send_pending() is false and by calling EndCommand and pq_flush())?

> Otherwise, we will end up terminating the WAL stream without the done message. Which will lead to an error message "ERROR: could not receive data from WAL stream: server closed the connection unexpectedly" on the subscriber even at a clean shutdown.
>

But will that be a problem? As per the docs of shutdown [1] (“Smart” mode disallows new connections, then waits for all existing clients to disconnect. If the server is in hot standby, recovery and streaming replication will be terminated once all clients have disconnected.), there is no such guarantee. I see that it is required for the switchover in physical replication to ensure that all the WAL is sent and replicated, but we don't need that for logical replication.

> In a case where pq_is_send_pending() doesn't become false for a long time (e.g., the network socket buffer got full due to the apply worker waiting on a lock), I think users should unblock it by themselves. Or it might be practically better to shut down the subscriber first in the logical replication case, unlike the physical replication case.
>

Yeah, will users like such a dependency? And what will they gain by doing so?

[1] - https://www.postgresql.org/docs/devel/app-pg-ctl.html

--
With Regards,
Amit Kapila.
Re: Exit walsender before confirming remote flush in logical replication
From: Kyotaro Horiguchi
At Wed, 1 Feb 2023 14:58:14 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> On Wed, Feb 1, 2023 at 2:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Otherwise, we will end up terminating the WAL stream without the done message. Which will lead to an error message "ERROR: could not receive data from WAL stream: server closed the connection unexpectedly" on the subscriber even at a clean shutdown.
> >
>
> But will that be a problem? As per docs of shutdown [1] (“Smart” mode disallows new connections, then waits for all existing clients to disconnect. If the server is in hot standby, recovery and streaming replication will be terminated once all clients have disconnected.), there is no such guarantee. I see that it is required for the switchover in physical replication to ensure that all the WAL is sent and replicated but we don't need that for logical replication.

+1

Since the publisher is not aware of the apply delay (with this patch), the publisher is in fact gone before sending the EOS in that case. The error message correctly describes that situation.

> > In a case where pq_is_send_pending() doesn't become false for a long time, (e.g., the network socket buffer got full due to the apply worker waiting on a lock), I think users should unblock it by themselves. Or it might be practically better to shutdown the subscriber first in the logical replication case, unlike the physical replication case.
> >
>
> Yeah, will users like such a dependency? And what will they gain by doing so?

If PostgreSQL required that kind of special care at shutdown when facing trouble in order to keep replication consistency, that wouldn't be acceptable. The current time-delayed logical replication can be seen as a kind of intentional, continuous, large network lag in this respect, and I think consistency is guaranteed even in such cases.

On the other hand, I don't think most people care about the exact progress when facing such troubles, as long as replication consistency is maintained. IMHO that is also true for the logical-delayed-replication case.

> [1] - https://www.postgresql.org/docs/devel/app-pg-ctl.html

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Feb 1, 2023 at 6:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Feb 1, 2023 at 2:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Jan 20, 2023 at 7:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Jan 17, 2023 at 2:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > Let me try to summarize the discussion till now. The problem we are trying to solve here is to allow a shutdown to complete when walsender is not able to send the entire WAL. Currently, in such cases, the shutdown fails. As per our current understanding, this can happen when (a) walreceiver/walapply process is stuck (not able to receive more WAL) due to locks or some other reason; (b) a long time delay has been configured to apply the WAL (we don't yet have such a feature for logical replication but the discussion for same is in progress).
> > > >
> > > > Both reasons mostly apply to logical replication because there is no separate walreceiver process whose job is to just flush the WAL. In logical replication, the process that receives the WAL also applies it. So, while applying it can get stuck for a long time waiting for some heavy-weight lock to be released by some other long-running transaction by the backend.
> > > >
> ...
> ...
>
> > +1 to eliminate condition (b) for logical replication.
> >
> > Regarding (a), as Amit mentioned before[1], I think we should check if pq_is_send_pending() is false.
> >
>
> Sorry, but your suggestion is not completely clear to me. Do you mean to say that for logical replication, we shouldn't wait for all the WAL to be successfully replicated but we should ensure to inform the subscriber that XLOG streaming is done (by ensuring pq_is_send_pending() is false and by calling EndCommand, pq_flush())?

Yes.
> > > Otherwise, we will end up terminating > > the WAL stream without the done message. Which will lead to an error > > message "ERROR: could not receive data from WAL stream: server closed > > the connection unexpectedly" on the subscriber even at a clean > > shutdown. > > > > But will that be a problem? As per docs of shutdown [1] ( “Smart” mode > disallows new connections, then waits for all existing clients to > disconnect. If the server is in hot standby, recovery and streaming > replication will be terminated once all clients have disconnected.), > there is no such guarantee. In smart shutdown case, the walsender doesn't exit until it can flush the done message, no? > I see that it is required for the > switchover in physical replication to ensure that all the WAL is sent > and replicated but we don't need that for logical replication. It won't be a problem in practice in terms of logical replication. But I'm concerned that this error could confuse users. Is there any case where the client gets such an error at the smart shutdown? > > > In a case where pq_is_send_pending() doesn't become false > > for a long time, (e.g., the network socket buffer got full due to the > > apply worker waiting on a lock), I think users should unblock it by > > themselves. Or it might be practically better to shutdown the > > subscriber first in the logical replication case, unlike the physical > > replication case. > > > > Yeah, will users like such a dependency? And what will they gain by doing so? IIUC there is no difference between smart shutdown and fast shutdown in logical replication walsender, but reading the doc[1], it seems to me that in the smart shutdown mode, the server stops existing sessions normally. For example, If the client is psql that gets stuck for some reason and the network buffer gets full, the smart shutdown waits for a backend process to send all results to the client. 
I think the logical replication walsender should follow this behavior for consistency. One idea is to distinguish smart shutdown and fast shutdown also in the logical replication walsender so that we disconnect even without the done message in fast shutdown mode, but I'm not sure it's worthwhile. [1] https://www.postgresql.org/docs/devel/server-shutdown.html Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Thu, Feb 2, 2023 at 10:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Feb 1, 2023 at 6:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > In a case where pq_is_send_pending() doesn't become false > > > for a long time, (e.g., the network socket buffer got full due to the > > > apply worker waiting on a lock), I think users should unblock it by > > > themselves. Or it might be practically better to shutdown the > > > subscriber first in the logical replication case, unlike the physical > > > replication case. > > > > > > > Yeah, will users like such a dependency? And what will they gain by doing so? > > IIUC there is no difference between smart shutdown and fast shutdown > in logical replication walsender, but reading the doc[1], it seems to > me that in the smart shutdown mode, the server stops existing sessions > normally. For example, If the client is psql that gets stuck for some > reason and the network buffer gets full, the smart shutdown waits for > a backend process to send all results to the client. I think the > logical replication walsender should follow this behavior for > consistency. One idea is to distinguish smart shutdown and fast > shutdown also in logical replication walsender so that we disconnect > even without the done message in fast shutdown mode, but I'm not sure > it's worthwhile. > The main problem we want to solve here is to avoid shutdown failing in case walreceiver/applyworker is busy waiting for some lock or for some other reason as shown in the email [1]. I haven't tested it but if such a problem doesn't exist in smart shutdown mode then probably we can allow walsender to wait till all the data is sent. We can once investigate what it takes to introduce shutdown mode knowledge for logical walsender. OTOH, the docs for smart shutdown says "If the server is in hot standby, recovery and streaming replication will be terminated once all clients have disconnected." 
which to me indicates that it is okay to terminate logical replication connections even in smart mode. [1] - https://www.postgresql.org/message-id/TYAPR01MB58669CB06F6657ABCEFE6555F5F29%40TYAPR01MB5866.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
On Thu, Feb 2, 2023 at 10:04 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Wed, 1 Feb 2023 14:58:14 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > > On Wed, Feb 1, 2023 at 2:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > Otherwise, we will end up terminating > > > the WAL stream without the done message. Which will lead to an error > > > message "ERROR: could not receive data from WAL stream: server closed > > > the connection unexpectedly" on the subscriber even at a clean > > > shutdown. > > > > > > > But will that be a problem? As per docs of shutdown [1] ( “Smart” mode > > disallows new connections, then waits for all existing clients to > > disconnect. If the server is in hot standby, recovery and streaming > > replication will be terminated once all clients have disconnected.), > > there is no such guarantee. I see that it is required for the > > switchover in physical replication to ensure that all the WAL is sent > > and replicated but we don't need that for logical replication. > > +1 > > Since publisher is not aware of apply-delay (by this patch), as a > matter of fact publisher seems gone before sending EOS in that > case. The error message is correctly describing that situation. > This can happen even without the apply-delay patch - for example, when the apply process is waiting on some lock. -- With Regards, Amit Kapila.
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit, Sawada-san, > > IIUC there is no difference between smart shutdown and fast shutdown > > in logical replication walsender, but reading the doc[1], it seems to > > me that in the smart shutdown mode, the server stops existing sessions > > normally. For example, If the client is psql that gets stuck for some > > reason and the network buffer gets full, the smart shutdown waits for > > a backend process to send all results to the client. I think the > > logical replication walsender should follow this behavior for > > consistency. One idea is to distinguish smart shutdown and fast > > shutdown also in logical replication walsender so that we disconnect > > even without the done message in fast shutdown mode, but I'm not sure > > it's worthwhile. > > > > The main problem we want to solve here is to avoid shutdown failing in > case walreceiver/applyworker is busy waiting for some lock or for some > other reason as shown in the email [1]. I haven't tested it but if > such a problem doesn't exist in smart shutdown mode then probably we > can allow walsender to wait till all the data is sent. Based on the idea, I made a PoC patch that introduces shutdown-mode awareness to walsenders. PSA the 0002 patch; 0001 is unchanged from v5. When a logical walsender gets a shutdown request but its send buffer is full due to the delay, it will: * wait until all pending data has been sent to the subscriber if we are in smart shutdown mode * exit immediately if we are in fast shutdown mode Note that in both cases the walsender does not wait for the remote flush of WAL. To implement that, I added a new attribute to WalSndCtlData that indicates the shutdown status. It is normally zero, but the postmaster changes it when it receives a shutdown request. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On Fri, Feb 3, 2023 at 5:38 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Amit, Sawada-san, > > > > IIUC there is no difference between smart shutdown and fast shutdown > > > in logical replication walsender, but reading the doc[1], it seems to > > > me that in the smart shutdown mode, the server stops existing sessions > > > normally. For example, If the client is psql that gets stuck for some > > > reason and the network buffer gets full, the smart shutdown waits for > > > a backend process to send all results to the client. I think the > > > logical replication walsender should follow this behavior for > > > consistency. One idea is to distinguish smart shutdown and fast > > > shutdown also in logical replication walsender so that we disconnect > > > even without the done message in fast shutdown mode, but I'm not sure > > > it's worthwhile. > > > > > > > The main problem we want to solve here is to avoid shutdown failing in > > case walreceiver/applyworker is busy waiting for some lock or for some > > other reason as shown in the email [1]. > > For this problem isn't using -t (timeout) avoid it? So, if there is pending WAL, users can always use the -t option to allow the shutdown to complete. Now, I agree that it is not very clear how much time to specify, but a user has some option to allow the shutdown to complete. I am not saying that teaching walsenders about shutdown modes is a completely bad idea, but it doesn't seem necessary for allowing shutdowns to complete. -- With Regards, Amit Kapila.
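As a concrete illustration of the -t workaround mentioned above (the timeout value here is only an example; pg_ctl's default wait timeout is 60 seconds):

```
# Give walsenders up to 120 seconds to finish sending pending WAL
# before pg_ctl reports the shutdown as failed.
pg_ctl stop -D "$PGDATA" -m fast -t 120
```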
Hi, On 2023-02-02 11:21:54 +0530, Amit Kapila wrote: > The main problem we want to solve here is to avoid shutdown failing in > case walreceiver/applyworker is busy waiting for some lock or for some > other reason as shown in the email [1]. Isn't handling this part of the job of wal_sender_timeout? I don't at all agree that it's ok to just stop replicating changes because we're blocked on network IO. The patch justifies this with: > Currently, at shutdown, walsender processes wait to send all pending data and > ensure the all data is flushed in remote node. This mechanism was added by > 985bd7 for supporting clean switch over, but such use-case cannot be supported > for logical replication. This commit remove the blocking in the case. and at the start of the thread with: > In case of logical replication, however, we cannot support the use-case that > switches the role publisher <-> subscriber. Suppose same case as above, additional > transactions are committed while doing step2. To catch up such changes subscriber > must receive WALs related with trans, but it cannot be done because subscriber > cannot request WALs from the specific position. In the case, we must truncate all > data in new subscriber once, and then create new subscription with copy_data > = true. But that seems too narrow a view to me. Imagine you want to decommission the current primary, and instead start to use the logical standby as the primary. For that you'd obviously want to replicate the last few changes. But with the proposed change, that'd be hard to ever achieve. Note that even disallowing any writes on the logical primary would make it hard to be sure that everything is replicated, because autovacuum, bgwriter, checkpointer all can continue to write WAL. Without being able to check that the last LSN has indeed been sent out, how do you know that you didn't miss something? Greetings, Andres Freund
On Sat, Feb 4, 2023 at 6:31 PM Andres Freund <andres@anarazel.de> wrote: > > On 2023-02-02 11:21:54 +0530, Amit Kapila wrote: > > The main problem we want to solve here is to avoid shutdown failing in > > case walreceiver/applyworker is busy waiting for some lock or for some > > other reason as shown in the email [1]. > > Isn't handling this part of the job of wal_sender_timeout? > In some cases, it is not clear whether we can handle it by wal_sender_timeout. Consider a case of a time-delayed replica where the applyworker will keep sending some response/alive message so that walsender doesn't timeout in that (during delay) period. In that case, because walsender won't timeout, the shutdown will fail (with the failed message) even though it will be complete after the walsender is able to send all the WAL and shutdown. The time-delayed replica patch is still under discussion [1]. Also, for large values of wal_sender_timeout, it will wait till the walsender times out and can return with a failed message. > > I don't at all agree that it's ok to just stop replicating changes > because we're blocked on network IO. The patch justifies this with: > > > Currently, at shutdown, walsender processes wait to send all pending data and > > ensure the all data is flushed in remote node. This mechanism was added by > > 985bd7 for supporting clean switch over, but such use-case cannot be supported > > for logical replication. This commit remove the blocking in the case. > > and at the start of the thread with: > > > In case of logical replication, however, we cannot support the use-case that > > switches the role publisher <-> subscriber. Suppose same case as above, additional > > transactions are committed while doing step2. To catch up such changes subscriber > > must receive WALs related with trans, but it cannot be done because subscriber > > cannot request WALs from the specific position. 
In the case, we must truncate all > > data in new subscriber once, and then create new subscription with copy_data > > = true. > > But that seems a too narrow view to me. Imagine you want to decomission > the current primary, and instead start to use the logical standby as the > primary. For that you'd obviously want to replicate the last few > changes. But with the proposed change, that'd be hard to ever achieve. > I think that can still be achieved with the idea being discussed which is to keep allowing sending the WAL for smart shutdown mode but not for other modes(fast, immediate). I don't know whether it is a good idea or not but Kuroda-San has produced a POC patch for it. We can instead choose to improve our docs related to shutdown to explain a bit more about the shutdown's interaction with (logical and physical) replication. As of now, it says: (“Smart” mode disallows new connections, then waits for all existing clients to disconnect. If the server is in hot standby, recovery and streaming replication will be terminated once all clients have disconnected.)[2]. Here, it is not clear that shutdown will wait for sending and flushing all the WALs. The information for fast and immediate modes is even lesser which makes it more difficult to understand what kind of behavior is expected in those modes. [1] - https://commitfest.postgresql.org/42/3581/ [2] - https://www.postgresql.org/docs/devel/app-pg-ctl.html -- With Regards, Amit Kapila.
Hi, On February 5, 2023 8:29:19 PM PST, Amit Kapila <amit.kapila16@gmail.com> wrote: >On Sat, Feb 4, 2023 at 6:31 PM Andres Freund <andres@anarazel.de> wrote: >> >> On 2023-02-02 11:21:54 +0530, Amit Kapila wrote: >> > The main problem we want to solve here is to avoid shutdown failing in >> > case walreceiver/applyworker is busy waiting for some lock or for some >> > other reason as shown in the email [1]. >> >> Isn't handling this part of the job of wal_sender_timeout? >> > >In some cases, it is not clear whether we can handle it by >wal_sender_timeout. Consider a case of a time-delayed replica where >the applyworker will keep sending some response/alive message so that >walsender doesn't timeout in that (during delay) period. In that case, >because walsender won't timeout, the shutdown will fail (with the >failed message) even though it will be complete after the walsender is >able to send all the WAL and shutdown. The time-delayed replica patch >is still under discussion [1]. Also, for large values of >wal_sender_timeout, it will wait till the walsender times out and can >return with a failed message. > >> >> I don't at all agree that it's ok to just stop replicating changes >> because we're blocked on network IO. The patch justifies this with: >> >> > Currently, at shutdown, walsender processes wait to send all pending data and >> > ensure the all data is flushed in remote node. This mechanism was added by >> > 985bd7 for supporting clean switch over, but such use-case cannot be supported >> > for logical replication. This commit remove the blocking in the case. >> >> and at the start of the thread with: >> >> > In case of logical replication, however, we cannot support the use-case that >> > switches the role publisher <-> subscriber. Suppose same case as above, additional >> > transactions are committed while doing step2. 
To catch up such changes subscriber >> > must receive WALs related with trans, but it cannot be done because subscriber >> > cannot request WALs from the specific position. In the case, we must truncate all >> > data in new subscriber once, and then create new subscription with copy_data >> > = true. >> >> But that seems a too narrow view to me. Imagine you want to decomission >> the current primary, and instead start to use the logical standby as the >> primary. For that you'd obviously want to replicate the last few >> changes. But with the proposed change, that'd be hard to ever achieve. >> > >I think that can still be achieved with the idea being discussed which >is to keep allowing sending the WAL for smart shutdown mode but not >for other modes(fast, immediate). I don't know whether it is a good >idea or not but Kuroda-San has produced a POC patch for it. We can >instead choose to improve our docs related to shutdown to explain a >bit more about the shutdown's interaction with (logical and physical) >replication. As of now, it says: (“Smart” mode disallows new >connections, then waits for all existing clients to disconnect. If the >server is in hot standby, recovery and streaming replication will be >terminated once all clients have disconnected.)[2]. Here, it is not >clear that shutdown will wait for sending and flushing all the WALs. >The information for fast and immediate modes is even lesser which >makes it more difficult to understand what kind of behavior is >expected in those modes. > >[1] - https://commitfest.postgresql.org/42/3581/ >[2] - https://www.postgresql.org/docs/devel/app-pg-ctl.html > Smart shutdown is practically unusable. I don't think it makes sense to tie behavior of walsender to it in any way. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Mon, Feb 6, 2023 at 10:33 AM Andres Freund <andres@anarazel.de> wrote: > > On February 5, 2023 8:29:19 PM PST, Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> But that seems a too narrow view to me. Imagine you want to decomission > >> the current primary, and instead start to use the logical standby as the > >> primary. For that you'd obviously want to replicate the last few > >> changes. But with the proposed change, that'd be hard to ever achieve. > >> > > > >I think that can still be achieved with the idea being discussed which > >is to keep allowing sending the WAL for smart shutdown mode but not > >for other modes(fast, immediate). I don't know whether it is a good > >idea or not but Kuroda-San has produced a POC patch for it. We can > >instead choose to improve our docs related to shutdown to explain a > >bit more about the shutdown's interaction with (logical and physical) > >replication. As of now, it says: (“Smart” mode disallows new > >connections, then waits for all existing clients to disconnect. If the > >server is in hot standby, recovery and streaming replication will be > >terminated once all clients have disconnected.)[2]. Here, it is not > >clear that shutdown will wait for sending and flushing all the WALs. > >The information for fast and immediate modes is even lesser which > >makes it more difficult to understand what kind of behavior is > >expected in those modes. > > > >[1] - https://commitfest.postgresql.org/42/3581/ > >[2] - https://www.postgresql.org/docs/devel/app-pg-ctl.html > > > > Smart shutdown is practically unusable. I don't think it makes sense to tie behavior of walsender to it in any way. > So, we have the following options: (a) do nothing for this; (b) clarify the current behavior in docs. Any suggestions? -- With Regards, Amit Kapila.
Hi, On 2023-02-06 12:23:54 +0530, Amit Kapila wrote: > On Mon, Feb 6, 2023 at 10:33 AM Andres Freund <andres@anarazel.de> wrote: > > Smart shutdown is practically unusable. I don't think it makes sense to tie behavior of walsender to it in any way. > > > > So, we have the following options: (a) do nothing for this; (b) > clarify the current behavior in docs. Any suggestions? b) seems good. I also think it'd make sense to improve this on a code-level. Just not in the wholesale way discussed so far. How about we make it an option in START_REPLICATION? Delayed logical rep can toggle that on by default. Greetings, Andres Freund
On Tue, Feb 7, 2023 at 2:04 AM Andres Freund <andres@anarazel.de> wrote: > > On 2023-02-06 12:23:54 +0530, Amit Kapila wrote: > > On Mon, Feb 6, 2023 at 10:33 AM Andres Freund <andres@anarazel.de> wrote: > > > Smart shutdown is practically unusable. I don't think it makes sense to tie behavior of walsender to it in any way. > > > > > > > So, we have the following options: (a) do nothing for this; (b) > > clarify the current behavior in docs. Any suggestions? > > b) seems good. > > I also think it'd make sense to improve this on a code-level. Just not in the > wholesale way discussed so far. > > How about we make it an option in START_REPLICATION? Delayed logical rep can > toggle that on by default. > Works for me. So, when this option is set in START_REPLICATION message, walsender will set some flag and allow itself to exit at shutdown without waiting for WAL to be sent? -- With Regards, Amit Kapila.
Hi, On 2023-02-07 09:00:13 +0530, Amit Kapila wrote: > On Tue, Feb 7, 2023 at 2:04 AM Andres Freund <andres@anarazel.de> wrote: > > How about we make it an option in START_REPLICATION? Delayed logical rep can > > toggle that on by default. > Works for me. So, when this option is set in START_REPLICATION > message, walsender will set some flag and allow itself to exit at > shutdown without waiting for WAL to be sent? Yes. I think that might be useful in other situations as well, but we don't need to make those configurable initially. But I imagine it'd be useful to set things up so that non-HA physical replicas don't delay shutdown, particularly if they're geographically far away. Greetings, Andres Freund
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Andres, Amit, > On 2023-02-07 09:00:13 +0530, Amit Kapila wrote: > > On Tue, Feb 7, 2023 at 2:04 AM Andres Freund <andres@anarazel.de> wrote: > > > How about we make it an option in START_REPLICATION? Delayed logical > rep can > > > toggle that on by default. > > > Works for me. So, when this option is set in START_REPLICATION > > message, walsender will set some flag and allow itself to exit at > > shutdown without waiting for WAL to be sent? > > Yes. I think that might be useful in other situations as well, but we don't > need to make those configurable initially. But I imagine it'd be useful to set > things up so that non-HA physical replicas don't delay shutdown, particularly > if they're geographically far away. Based on the discussion, I made a patch that adds a walsender option, exit_before_confirming, to the START_REPLICATION replication command. It can be used for both physical and logical replication. I made the patch extensible - it allows adding further options. Better names are very welcome. For physical replication, the grammar was slightly changed to resemble the logical one: it can now accept options, but currently only one option is allowed, and it is not used in normal streaming replication. For logical replication, the option is combined with the options for the output plugin. Of course, we can restyle the API. The 0001 patch was ported from the time-delayed logical replication thread [1], which uses the added option. When the min_apply_delay option is specified and the publisher is PG16 or later, the apply worker sends a START_REPLICATION query with exit_before_confirming = true. The worker also restarts and sends START_REPLICATION again when min_apply_delay is changed from zero to a non-zero value or vice versa. Note that I reset the version number because the approach has completely changed. 
[1]: https://www.postgresql.org/message-id/TYCPR01MB8373BA483A6D2C924C600968EDDB9@TYCPR01MB8373.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
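Assuming the option is combined with the output plugin options as described in the mail, the apply worker's replication command might look roughly like this. This is a sketch only: proto_version and publication_names are the usual pgoutput options, the slot and publication names are made up, and the exact placement of the new option is defined by the patch, not shown here authoritatively.

```
START_REPLICATION SLOT "mysub" LOGICAL 0/0
    (exit_before_confirming 'true',
     proto_version '4', publication_names '"mypub"')
```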
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
> Dear Andres, Amit, > > > On 2023-02-07 09:00:13 +0530, Amit Kapila wrote: > > > On Tue, Feb 7, 2023 at 2:04 AM Andres Freund <andres@anarazel.de> wrote: > > > > How about we make it an option in START_REPLICATION? Delayed logical > > rep can > > > > toggle that on by default. > > > > > Works for me. So, when this option is set in START_REPLICATION > > > message, walsender will set some flag and allow itself to exit at > > > shutdown without waiting for WAL to be sent? > > > > Yes. I think that might be useful in other situations as well, but we don't > > need to make those configurable initially. But I imagine it'd be useful to set > > things up so that non-HA physical replicas don't delay shutdown, particularly > > if they're geographically far away. > > Based on the discussion, I made a patch for adding a walsender option > exit_before_confirming to the START_STREAMING replication command. It can > be > used for both physical and logical replication. I made the patch with > extendibility - it allows adding further options. > And better naming are very welcome. > > For physical replication, the grammar was slightly changed like a logical one. > It can now accept options but currently, only one option is allowed. And it is > not used in normal streaming replication. For logical replication, the option is > combined with options for the output plugin. Of course, we can modify the API to > better style. > > 0001 patch was ported from time-delayed logical replication thread[1], which > uses > the added option. When the min_apply_delay option is specified and publisher > seems > to be PG16 or later, the apply worker sends a START_REPLICATION query with > exit_before_confirming = true. And the worker will reboot and send > START_REPLICATION > again when min_apply_delay is changed from zero to a non-zero value or non-zero > to zero. > > Note that I removed version number because the approach is completely changed. 
> > [1]: > https://www.postgresql.org/message-id/TYCPR01MB8373BA483A6D2C924C60 > 0968EDDB9@TYCPR01MB8373.jpnprd01.prod.outlook.com I noticed that the previous ones were rejected by cfbot, even though they passed in my environment... PSA a fixed version. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
> I noticed that previous ones are rejected by cfbot, even if they passed on my > environment... > PSA fixed version. While analyzing further, I found another bug: a missing initialization. PSA a new version that passes the automated tests on my GitHub repository. Sorry for the noise. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
Re: Exit walsender before confirming remote flush in logical replication
From
Kyotaro Horiguchi
Date:
I agree with the direction, and thanks for the patch. At Tue, 7 Feb 2023 17:08:54 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in > > I noticed that previous ones are rejected by cfbot, even if they passed on my > > environment... > > PSA fixed version. > > While analyzing more, I found the further bug that forgets initialization. > PSA new version that could be passed automated tests on my github repository. > Sorry for noise. 0002: This patch doesn't seem to offer a means to change the default walsender behavior. We need a subscription option named something like "walsender_exit_mode" to do that. +ConsumeWalsenderOptions(List *options, WalSndData *data) I wonder if it is the right design to put options for different things into a single list. I would rather embed the walsender option in the syntax than need this function. K_START_REPLICATION opt_slot opt_physical RECPTR opt_timeline opt_shutdown_mode K_START_REPLICATION K_SLOTIDENT K_LOGICAL RECPTR opt_shutdown_mode plugin_options where opt_shutdown_mode would be like "SHUTDOWN_MODE immediate". ====== If we go with the current design, I think it is better to share the option list rule between the logical and physical START_REPLICATION commands. I'm not sure I like the option syntax "exit_before_confirming=<Boolean>". I imagine that other options may come in the future. Thus, how about "walsender_shutdown_mode=<mode>", where the mode is one of "wait_flush"(default) and "immediate"? +typedef struct +{ + bool exit_before_confirming; +} WalSndData; "Data" doesn't seem to describe what the variable holds. Why not WalSndOptions? 
- !equal(newsub->publications, MySubscription->publications)) + !equal(newsub->publications, MySubscription->publications) || + (newsub->minapplydelay > 0 && MySubscription->minapplydelay == 0) || + (newsub->minapplydelay == 0 && MySubscription->minapplydelay > 0)) I slightly prefer the following expression (Others may disagree:p): ((newsub->minapplydelay == 0) != (MySubscription->minapplydelay == 0)) And I think we need a comment for the term. For example, /* minapplydelay affects START_REPLICATION option exit_before_confirming */ + * Reads all entrly of the list and consume if needed. s/entrly/entries/ ? ... regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, Feb 8, 2023 at 7:57 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > I agree to the direction and thanks for the patch. > > At Tue, 7 Feb 2023 17:08:54 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in > > > I noticed that previous ones are rejected by cfbot, even if they passed on my > > > environment... > > > PSA fixed version. > > > > While analyzing more, I found the further bug that forgets initialization. > > PSA new version that could be passed automated tests on my github repository. > > Sorry for noise. > > 0002: > > This patch doesn't seem to offer a means to change the default > walsender behavior. We need a subscription option named like > "walsender_exit_mode" to do that. > I don't think at this stage we need a subscription-level option, we can extend it later if this is really useful for users. For now, we can set this new option when min_apply_delay > 0. > > +ConsumeWalsenderOptions(List *options, WalSndData *data) > > I wonder if it is the right design to put options for different things > into a single list. I rather choose to embed the walsender option in > the syntax than needing this function. > > K_START_REPLICATION opt_slot opt_physical RECPTR opt_timeline opt_shutdown_mode > > K_START_REPLICATION K_SLOTIDENT K_LOGICAL RECPTR opt_shutdown_mode plugin_options > > where opt_shutdown_mode would be like "SHUTDOWN_MODE immediate". > The other option could have been that we just add it as a plugin_option for logical replication but it doesn't seem to match with the other plugin options. I think it would be better to have it as a separate option something like opt_shutdown_immediate and extend the logical replication syntax for now. We can later extend physical replication syntax when we want to expose such an option via physical replication. -- With Regards, Amit Kapila.
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit, Thanks for the comments! > > > > 0002: > > > > This patch doesn't seem to offer a means to change the default > > walsender behavior. We need a subscription option named like > > "walsender_exit_mode" to do that. > > > > I don't think at this stage we need a subscription-level option, we > can extend it later if this is really useful for users. For now, we > can set this new option when min_apply_delay > 0. Agreed. I want to keep the feature minimal for PG16 and then extend it if needed. > > > > +ConsumeWalsenderOptions(List *options, WalSndData *data) > > > > I wonder if it is the right design to put options for different things > > into a single list. I rather choose to embed the walsender option in > > the syntax than needing this function. > > > > K_START_REPLICATION opt_slot opt_physical RECPTR opt_timeline > opt_shutdown_mode > > > > K_START_REPLICATION K_SLOTIDENT K_LOGICAL RECPTR > opt_shutdown_mode plugin_options > > > > where opt_shutdown_mode would be like "SHUTDOWN_MODE immediate". > > > > The other option could have been that we just add it as a > plugin_option for logical replication but it doesn't seem to match > with the other plugin options. I think it would be better to have it > as a separate option something like opt_shutdown_immediate and extend > the logical replication syntax for now. We can later extend physical > replication syntax when we want to expose such an option via physical > replication. Our main intention is to shut down logical walsenders. Therefore, as above, I want to develop the feature for logical replication first and then try to extend it if we want. TBH, adding physical replication support does not seem too hard, but I want to keep the patch small. The new patch will be attached soon in another mail. Best Regards, Hayato Kuroda FUJITSU LIMITED
RE: Exit walsender before confirming remote flush in logical replication
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Horiguchi-san, Thank you for checking the patch! PSA a new version. > 0002: > > This patch doesn't seem to offer a means to change the default > walsender behavior. We need a subscription option named like > "walsender_exit_mode" to do that. As I said in another mail [1], I think the feature does not have to be usable on its own for now. > +ConsumeWalsenderOptions(List *options, WalSndData *data) > > I wonder if it is the right design to put options for different things > into a single list. I rather choose to embed the walsender option in > the syntax than needing this function. > > K_START_REPLICATION opt_slot opt_physical RECPTR opt_timeline > opt_shutdown_mode > > K_START_REPLICATION K_SLOTIDENT K_LOGICAL RECPTR > opt_shutdown_mode plugin_options > > where opt_shutdown_mode would be like "SHUTDOWN_MODE immediate". Right, the option handling was quite bad. I added the new opt_shutdown_mode syntax to logical replication and modified much of the code accordingly. Note that, based on the other discussion, I removed the code for supporting physical replication but tried to keep it extensible. > ====== > If we go with the current design, I think it is better to share the > option list rule between the logical and physical START_REPLICATION > commands. > > I'm not sure I like the option syntax > "exit_before_confirming=<Boolean>". I imagine that other options may > come in the future. Thus, how about "walsender_shutdown_mode=<mode>", > where the mode is one of "wait_flush"(default) and "immediate"? Seems better; I changed it from a boolean to an enumeration. > +typedef struct > +{ > + bool exit_before_confirming; > +} WalSndData; > > Data doesn't seem to represent the variable. Why not WalSndOptions? This is inspired by PGOutputData, but I prefer your idea. Fixed. 
> -        !equal(newsub->publications, MySubscription->publications))
> +        !equal(newsub->publications, MySubscription->publications) ||
> +        (newsub->minapplydelay > 0 && MySubscription->minapplydelay == 0) ||
> +        (newsub->minapplydelay == 0 && MySubscription->minapplydelay > 0))
>
> I slightly prefer the following expression (Others may disagree:p):
>
> ((newsub->minapplydelay == 0) != (MySubscription->minapplydelay == 0))

I think conditions for the same parameter should be aligned on one line, so your suggested expression seems better. Fixed.

> And I think we need a comment for the term. For example,
>
> /* minapplydelay affects START_REPLICATION option exit_before_confirming */

Added just above the condition.

> + * Reads all entrly of the list and consume if needed.
> s/entrly/entries/ ?
> ...

This part is no longer needed.

[1]: https://www.postgresql.org/message-id/TYAPR01MB5866D3EC780D251953BDE7FAF5D89%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
RE: Exit walsender before confirming remote flush in logical replication
From: "Hayato Kuroda (Fujitsu)"
> > Dear Horiguchi-san,
> >
> > Thank you for checking the patch! PSA new version.

PSA rebased patch that supports the updated time-delayed patch[1].

[1]: https://www.postgresql.org/message-id/TYAPR01MB5866C11DAF8AB04F3CC181D3F5D89@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
RE: Exit walsender before confirming remote flush in logical replication
From: "Takamichi Osumi (Fujitsu)"
On Wednesday, February 8, 2023 6:47 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
> PSA rebased patch that supports updated time-delayed patch[1].

Hi,

Thanks for creating the patch! Minor review comments on v5-0002.

(1)

+ Decides the condition for exiting the walsender process.
+ <literal>'wait_flush'</literal>, which is the default, the walsender
+ will wait for all the sent WALs to be flushed on the subscriber side,
+ before exiting the process. <literal>'immediate'</literal> will exit
+ without confirming the remote flush. This may break the consistency
+ between publisher and subscriber, but it may be useful for a system
+ that has a high-latency network to reduce the amount of time for
+ shutdown.

(1-1)

The first part "exiting the walsender process" can be improved. Probably, you can say "the exiting walsender process" or "Decides the behavior of the walsender process at shutdown" instead.

(1-2)

Also, the next sentence can be improved to something like "If the shutdown mode is wait_flush, which is the default, the walsender waits for all the sent WALs to be flushed on the subscriber side. If it is immediate, the walsender exits without confirming the remote flush".

(1-3)

We don't need to wrap wait_flush and immediate in single quotes within the literal tag.

(2)

+ /* minapplydelay affects SHUTDOWN_MODE option */

I think we can move this comment to just above the 'if' condition and combine it with the existing 'if' condition's comments.

(3) 001_rep_changes.pl

(3-1) Question

In general, do we add this kind of check when we extend the protocol (STREAM_REPLICATION command) or add a new condition for apply worker exit? In case we would like to detect the restart of the walsender process in TAP tests, could you tell me why the new test code matches the purpose of this patch?
(3-2)

+ "Timed out while waiting for apply to restart after changing min_apply_delay to non-zero value";

Probably, we can partly change this sentence as below, because we check the walsender's pid.

FROM: "... while waiting for apply to restart..."
TO: "... while waiting for the walsender to restart..."

Best Regards,
Takamichi Osumi
Hi,

On Wed, Feb 8, 2023 at 6:47 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
> >
> > > Dear Horiguchi-san,
> > >
> > > Thank you for checking the patch! PSA new version.
> >
> > PSA rebased patch that supports updated time-delayed patch[1].
>

Thank you for the patch! Here are some comments on the v5 patch:

+/*
+ * Options for controlling the behavior of the walsender. Options can be
+ * specified in the START_STREAMING replication command. Currently only one
+ * option is allowed.
+ */
+typedef struct
+{
+    WalSndShutdownMode shutdown_mode;
+} WalSndOptions;
+
+static WalSndOptions *my_options = NULL;

I'm not sure we need to have it as a struct at this stage since we support only one option. I wonder if we can have one value, say shutdown_mode, and we can make it a struct when we really need it. Even if we use the WalSndOptions struct, I don't think we need to dynamically allocate it. Since a walsender can start logical replication multiple times in principle, my_options is not freed.

---
+/*
+ * Parse given shutdown mode.
+ *
+ * Currently two values are accepted - "wait_flush" and "immediate"
+ */
+static void
+ParseShutdownMode(char *shutdownmode)
+{
+    if (pg_strcasecmp(shutdownmode, "wait_flush") == 0)
+        my_options->shutdown_mode = WALSND_SHUTDOWN_MODE_WAIT_FLUSH;
+    else if (pg_strcasecmp(shutdownmode, "immediate") == 0)
+        my_options->shutdown_mode = WALSND_SHUTDOWN_MODE_IMMIDEATE;
+    else
+        ereport(ERROR,
+                errcode(ERRCODE_SYNTAX_ERROR),
+                errmsg("SHUTDOWN_MODE requires \"wait_flush\" or \"immediate\""));
+}

I think we should make the error message consistent with other enum parameters. How about a message like:

ERROR: invalid value shutdown mode: "%s"

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
RE: Exit walsender before confirming remote flush in logical replication
From: "Hayato Kuroda (Fujitsu)"
Dear Osumi-san,

Thank you for reviewing! PSA new version.

> (1)
>
> + Decides the condition for exiting the walsender process.
> + <literal>'wait_flush'</literal>, which is the default, the walsender
> + will wait for all the sent WALs to be flushed on the subscriber side,
> + before exiting the process. <literal>'immediate'</literal> will exit
> + without confirming the remote flush. This may break the consistency
> + between publisher and subscriber, but it may be useful for a system
> + that has a high-latency network to reduce the amount of time for
> + shutdown.
>
> (1-1)
>
> The first part "exiting the walsender process" can be improved.
> Probably, you can say "the exiting walsender process" or
> "Decides the behavior of the walsender process at shutdown" instread.

Fixed. The second idea was chosen.

> (1-2)
>
> Also, the next sentence can be improved something like
> "If the shutdown mode is wait_flush, which is the default, the
> walsender waits for all the sent WALs to be flushed on the subscriber side.
> If it is immediate, the walsender exits without confirming the remote flush".

Fixed.

> (1-3)
>
> We don't need to wrap wait_flush and immediate by single quotes
> within the literal tag.

This style was ported from the SNAPSHOT options part, so I decided to keep it.

> (2)
>
> + /* minapplydelay affects SHUTDOWN_MODE option */
>
> I think we can move this comment to just above the 'if' condition
> and combine it with the existing 'if' conditions comments.

Moved and added some comments.

> (3) 001_rep_changes.pl
>
> (3-1) Question
>
> In general, do we add this kind of check when we extend the protocol
> (STREAM_REPLICATION command)
> or add a new condition for apply worker exit ?
> In case when we would like to know the restart of the walsender process in TAP
> tests,
> then could you tell me why the new test code matches the purpose of this patch ?

The replication command is not for normal users, so I think we don't have to test it directly.
The check that waits for the apply worker to restart was added to improve robustness. The test could fail if the apply worker receives a transaction before it checks the new subscription option. Now the failure can be avoided by confirming that pg_subscription is reloaded and the worker restarts.

> (3-2)
>
> + "Timed out while waiting for apply to restart after changing min_apply_delay
> to non-zero value";
>
> Probably, we can partly change this sentence like below, because we check
> walsender's pid.
> FROM: "... while waiting for apply to restart..."
> TO: "... while waiting for the walsender to restart..."

Right, fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
RE: Exit walsender before confirming remote flush in logical replication
From: "Hayato Kuroda (Fujitsu)"
Dear Sawada-san,

Thank you for reviewing!

> +/*
> + * Options for controlling the behavior of the walsender. Options can be
> + * specified in the START_STREAMING replication command. Currently only one
> + * option is allowed.
> + */
> +typedef struct
> +{
> +    WalSndShutdownMode shutdown_mode;
> +} WalSndOptions;
> +
> +static WalSndOptions *my_options = NULL;
>
> I'm not sure we need to have it as a struct at this stage since we
> support only one option. I wonder if we can have one value, say
> shutdown_mode, and we can make it a struct when we really need it.
> Even if we use WalSndOptions struct, I don't think we need to
> dynamically allocate it. Since a walsender can start logical
> replication multiple times in principle, my_options is not freed.

+1, removed the structure.

> ---
> +/*
> + * Parse given shutdown mode.
> + *
> + * Currently two values are accepted - "wait_flush" and "immediate"
> + */
> +static void
> +ParseShutdownMode(char *shutdownmode)
> +{
> +    if (pg_strcasecmp(shutdownmode, "wait_flush") == 0)
> +        shutdown_mode = WALSND_SHUTDOWN_MODE_WAIT_FLUSH;
> +    else if (pg_strcasecmp(shutdownmode, "immediate") == 0)
> +        shutdown_mode = WALSND_SHUTDOWN_MODE_IMMIDEATE;
> +    else
> +        ereport(ERROR,
> +                errcode(ERRCODE_SYNTAX_ERROR),
> +                errmsg("SHUTDOWN_MODE requires \"wait_flush\" or \"immediate\""));
> +}
>
> I think we should make the error message consistent with other enum
> parameters. How about the message like:
>
> ERROR: invalid value shutdown mode: "%s"

Modified to match other enum parameters, and a hint message was also provided. The new patch is attached in [1].

[1]: https://www.postgresql.org/message-id/TYAPR01MB586683FC450662990E356A0EF5D99%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Here are my review comments for the v6-0002 patch. ====== Commit Message 1. This commit extends START_REPLICATION to accept SHUTDOWN_MODE term. Currently, it works well only for logical replication. ~ 1a. "to accept SHUTDOWN term" --> "to include a SHUTDOWN_MODE clause." ~ 1b. "it works well only for..." --> do you mean "it is currently implemented only for..." ~~~ 2. When 'wait_flush', which is the default, is specified, the walsender will wait for all the sent WALs to be flushed on the subscriber side, before exiting the process. 'immediate' will exit without confirming the remote flush. This may break the consistency between publisher and subscriber, but it may be useful for a system that has a high-latency network to reduce the amount of time for shutdown. This may be useful to shut down the publisher even when the worker is stuck. ~ SUGGESTION The shutdown modes are: 1) 'wait_flush' (the default). In this mode, the walsender will wait for all the sent WALs to be flushed on the subscriber side, before exiting the process. 2) 'immediate'. In this mode, the walsender will exit without confirming the remote flush. This may break the consistency between publisher and subscriber. This mode might be useful for a system that has a high-latency network (to reduce the amount of time for shutdown), or to allow the shutdown of the publisher even when the worker is stuck. ====== doc/src/sgml/protocol.sgml 3. + <varlistentry> + <term><literal>SHUTDOWN_MODE { 'wait_flush' | 'immediate' }</literal></term> + <listitem> + <para> + Decides the behavior of the walsender process at shutdown. If the + shutdown mode is <literal>'wait_flush'</literal>, which is the + default, the walsender waits for all the sent WALs to be flushed + on the subscriber side. If it is <literal>'immediate'</literal>, + the walsender exits without confirming the remote flush. 
+ </para>
+ </listitem>
+ </varlistentry>

The synopsis said:
[ SHUTDOWN_MODE shutdown_mode ]

But then the 'shutdown_mode' term was never mentioned again (??). Instead it says:
SHUTDOWN_MODE { 'wait_flush' | 'immediate' }

IMO the detailed explanation should not say SHUTDOWN_MODE again. It should be written more like this:

SUGGESTION
shutdown_mode

Determines the behavior of the walsender process at shutdown. If shutdown_mode is 'wait_flush', the walsender waits for all the sent WALs to be flushed on the subscriber side. This is the default when SHUTDOWN_MODE is not specified.

If shutdown_mode is 'immediate', the walsender exits without confirming the remote flush.

======
.../libpqwalreceiver/libpqwalreceiver.c

4.
+ /* Add SHUTDOWN_MODE option if needed */
+ if (options->shutdown_mode &&
+     PQserverVersion(conn->streamConn) >= 160000)
+     appendStringInfo(&cmd, " SHUTDOWN_MODE '%s'",
+                      options->shutdown_mode);

Maybe you can expand on the meaning of "if needed".

SUGGESTION
Add SHUTDOWN_MODE clause if needed (i.e. if not using the default shutdown_mode)

======
src/backend/replication/logical/worker.c

5. maybe_reread_subscription

+ *
+ * minapplydelay affects SHUTDOWN_MODE option. 'immediate' shutdown mode
+ * will be specified if it is set to non-zero, otherwise default mode will
+ * be set.

Reworded this comment slightly and added a reference to ApplyWorkerMain.

SUGGESTION
Time-delayed logical replication affects the SHUTDOWN_MODE clause. The 'immediate' shutdown mode will be specified if min_apply_delay is non-zero, otherwise the default shutdown mode will be used. See ApplyWorkerMain.

~~~

6. ApplyWorkerMain

+ /*
+  * time-delayed logical replication does not support tablesync
+  * workers, so only the leader apply worker can request walsenders to
+  * exit before confirming remote flush.
+  */

"time-delayed" --> "Time-delayed"

======
src/backend/replication/repl_gram.y

7.
@@ -91,6 +92,7 @@ Node *replication_parse_result; %type <boolval> opt_temporary %type <list> create_slot_options create_slot_legacy_opt_list %type <defelt> create_slot_legacy_opt +%type <str> opt_shutdown_mode The tab alignment seemed not quite right. Not 100% sure. ~~~ 8. @@ -270,20 +272,22 @@ start_replication: cmd->slotname = $2; cmd->startpoint = $4; cmd->timeline = $5; + cmd->shutdownmode = NULL; $$ = (Node *) cmd; } It seemed a bit inconsistent. E.g. the cmd->options member was not set for physical replication, so why then set this member? Alternatively, maybe should set cmd->options = NULL here as well? ====== src/backend/replication/walsender.c 9. +/* Indicator for specifying the shutdown mode */ +typedef enum +{ + WALSND_SHUTDOWN_MODE_WAIT_FLUSH = 0, + WALSND_SHUTDOWN_MODE_IMMIDEATE +} WalSndShutdownMode; ~ 9a. "Indicator for specifying" (??). How about just saying: "Shutdown modes" ~ 9b. Typo: WALSND_SHUTDOWN_MODE_IMMIDEATE ==> WALSND_SHUTDOWN_MODE_IMMEDIATE ~ 9c. AFAICT the fact that the first enum value is assigned 0 is not really of importance. If that is correct, then IMO maybe it's better to remove the "= 0" because the explicit assignment made me expect that it had special meaning, and then it was confusing when I could not find a reason. ~~~ 10. ProcessPendingWrites + /* + * In this function, there is a possibility that the walsender is + * stuck. It is caused when the opposite worker is stuck and then the + * send-buffer of the walsender becomes full. Therefore, we must add + * an additional path for shutdown for immediate shutdown mode. + */ + if (shutdown_mode == WALSND_SHUTDOWN_MODE_IMMIDEATE && + got_STOPPING) + WalSndDone(XLogSendLogical); 10a. Can this comment say something like "receiving worker" instead of "opposite worker"? SUGGESTION This can happen when the receiving worker is stuck, and then the send-buffer of the walsender... ~ 10b. IMO it makes more sense to check this around the other way. E.g. 
we don't care what is the shutdown_mode value unless got_STOPPING is true. SUGGESTION if (got_STOPPING && (shutdown_mode == WALSND_SHUTDOWN_MODE_IMMEDIATE)) ~~~ 11. WalSndDone + * If we are in the immediate shutdown mode, flush location and output + * buffer is not checked. This may break the consistency between nodes, + * but it may be useful for the system that has high-latency network to + * reduce the amount of time for shutdown. Add some quotes for the mode. SUGGESTION 'immediate' shutdown mode ~~~ 12. +/* + * Check options for walsender itself and set flags accordingly. + * + * Currently only one option is accepted. + */ +static void +CheckWalSndOptions(const StartReplicationCmd *cmd) +{ + if (cmd->shutdownmode) + ParseShutdownMode(cmd->shutdownmode); +} + +/* + * Parse given shutdown mode. + * + * Currently two values are accepted - "wait_flush" and "immediate" + */ +static void +ParseShutdownMode(char *shutdownmode) +{ + if (pg_strcasecmp(shutdownmode, "wait_flush") == 0) + shutdown_mode = WALSND_SHUTDOWN_MODE_WAIT_FLUSH; + else if (pg_strcasecmp(shutdownmode, "immediate") == 0) + shutdown_mode = WALSND_SHUTDOWN_MODE_IMMIDEATE; + else + ereport(ERROR, + errcode(ERRCODE_SYNTAX_ERROR), + errmsg("invalid value for shutdown mode: \"%s\"", shutdownmode), + errhint("Available values: wait_flush, immediate.")); +} IMO the ParseShutdownMode function seems unnecessary because it's not really "parsing" anything and it is only called in one place. I suggest wrapping everything into the CheckWalSndOptions function. 
The end result is still only a simple function: SUGGESTION static void CheckWalSndOptions(const StartReplicationCmd *cmd) { if (cmd->shutdownmode) { char *mode = cmd->shutdownmode; if (pg_strcasecmp(mode, "wait_flush") == 0) shutdown_mode = WALSND_SHUTDOWN_MODE_WAIT_FLUSH; else if (pg_strcasecmp(mode, "immediate") == 0) shutdown_mode = WALSND_SHUTDOWN_MODE_IMMEDIATE; else ereport(ERROR, errcode(ERRCODE_SYNTAX_ERROR), errmsg("invalid value for shutdown mode: \"%s\"", mode), errhint("Available values: wait_flush, immediate.")); } } ====== src/include/replication/walreceiver.h 13. @@ -170,6 +170,7 @@ typedef struct * false if physical stream. */ char *slotname; /* Name of the replication slot or NULL. */ XLogRecPtr startpoint; /* LSN of starting point. */ + char *shutdown_mode; /* Name of specified shutdown name */ union { ~ 13a. Typo (shutdown name?) SUGGESTION /* The specified shutdown mode string, or NULL. */ ~ 13b. Because they have the same member names I kept confusing this option shutdown_mode with the other enum also called shutdown_mode. I wonder if is it possible to call this one something like 'shutdown_mode_str' to make reading the code easier? ~ 13c. Is this member in the right place? AFAIK this is not even implemented for physical replication. e.g. Why isn't this new member part of the 'logical' sub-structure in the union? ====== src/test/subscription/t/001_rep_changes.pl 14. 
-# Set min_apply_delay parameter to 3 seconds
+# Check restart on changing min_apply_delay to 3 seconds
 my $delay = 3;
 $node_subscriber->safe_psql('postgres',
  "ALTER SUBSCRIPTION tap_sub_renamed SET (min_apply_delay = '${delay}s')");
+$node_publisher->poll_query_until('postgres',
+ "SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name = 'tap_sub_renamed' AND state = 'streaming';"
+ )
+ or die
+ "Timed out while waiting for the walsender to restart after changing min_apply_delay to non-zero value";

IIUC this test is for verifying that a new walsender worker was started if the delay was changed from 0 to non-zero. E.g. I think it is testing your new logic in maybe_reread_subscription.

Probably more complete testing also needs to check the other scenarios:
* min_apply_delay from one non-zero value to another non-zero value --> verify a new worker is NOT started.
* change min_apply_delay from non-zero to zero --> verify a new worker IS started

------
Kind Regards,
Peter Smith.
Fujitsu Australia
RE: Exit walsender before confirming remote flush in logical replication
From: "Hayato Kuroda (Fujitsu)"
Dear Peter, Thank you for reviewing! PSA new version. > ====== > Commit Message > > 1. > This commit extends START_REPLICATION to accept SHUTDOWN_MODE term. > Currently, > it works well only for logical replication. > > ~ > > 1a. > "to accept SHUTDOWN term" --> "to include a SHUTDOWN_MODE clause." Fixed. > 1b. > "it works well only for..." --> do you mean "it is currently > implemented only for..." Fixed. > 2. > When 'wait_flush', which is the default, is specified, the walsender will wait > for all the sent WALs to be flushed on the subscriber side, before exiting the > process. 'immediate' will exit without confirming the remote flush. This may > break the consistency between publisher and subscriber, but it may be useful > for a system that has a high-latency network to reduce the amount of time for > shutdown. This may be useful to shut down the publisher even when the > worker is stuck. > > ~ > > SUGGESTION > The shutdown modes are: > > 1) 'wait_flush' (the default). In this mode, the walsender will wait > for all the sent WALs to be flushed on the subscriber side, before > exiting the process. > > 2) 'immediate'. In this mode, the walsender will exit without > confirming the remote flush. This may break the consistency between > publisher and subscriber. This mode might be useful for a system that > has a high-latency network (to reduce the amount of time for > shutdown), or to allow the shutdown of the publisher even when the > worker is stuck. > > ====== > doc/src/sgml/protocol.sgml > > 3. > + <varlistentry> > + <term><literal>SHUTDOWN_MODE { 'wait_flush' | 'immediate' > }</literal></term> > + <listitem> > + <para> > + Decides the behavior of the walsender process at shutdown. If the > + shutdown mode is <literal>'wait_flush'</literal>, which is the > + default, the walsender waits for all the sent WALs to be flushed > + on the subscriber side. If it is <literal>'immediate'</literal>, > + the walsender exits without confirming the remote flush. 
> + </para> > + </listitem> > + </varlistentry> > > The synopsis said: > [ SHUTDOWN_MODE shutdown_mode ] > > But then the 'shutdown_mode' term was never mentioned again (??). > Instead it says: > SHUTDOWN_MODE { 'wait_flush' | 'immediate' } > > IMO the detailed explanation should not say SHUTDOWN_MODE again. It > should be writtenmore like this: > > SUGGESTION > shutdown_mode > > Determines the behavior of the walsender process at shutdown. If > shutdown_mode is 'wait_flush', the walsender waits for all the sent > WALs to be flushed on the subscriber side. This is the default when > SHUTDOWN_MODE is not specified. > > If shutdown_mode is 'immediate', the walsender exits without > confirming the remote flush. Fixed. > .../libpqwalreceiver/libpqwalreceiver.c > > 4. > + /* Add SHUTDOWN_MODE option if needed */ > + if (options->shutdown_mode && > + PQserverVersion(conn->streamConn) >= 160000) > + appendStringInfo(&cmd, " SHUTDOWN_MODE '%s'", > + options->shutdown_mode); > > Maybe you can expand on the meaning of "if needed". > > SUGGESTION > Add SHUTDOWN_MODE clause if needed (i.e. if not using the default > shutdown_mode) Fixed, but not completely same as your suggestion. > src/backend/replication/logical/worker.c > > 5. maybe_reread_subscription > > + * > + * minapplydelay affects SHUTDOWN_MODE option. 'immediate' shutdown > mode > + * will be specified if it is set to non-zero, otherwise default mode will > + * be set. > > Reworded this comment slightly and give a reference to ApplyWorkerMain. > > SUGGESTION > Time-delayed logical replication affects the SHUTDOWN_MODE clause. The > 'immediate' shutdown mode will be specified if min_apply_delay is > non-zero, otherwise the default shutdown mode will be used. See > ApplyWorkerMain. Fixed. > 6. ApplyWorkerMain > + /* > + * time-delayed logical replication does not support tablesync > + * workers, so only the leader apply worker can request walsenders to > + * exit before confirming remote flush. 
> + */ > > "time-delayed" --> "Time-delayed" Fixed. > src/backend/replication/repl_gram.y > > 7. > @@ -91,6 +92,7 @@ Node *replication_parse_result; > %type <boolval> opt_temporary > %type <list> create_slot_options create_slot_legacy_opt_list > %type <defelt> create_slot_legacy_opt > +%type <str> opt_shutdown_mode > > The tab alignment seemed not quite right. Not 100% sure. Fixed accordingly. > 8. > @@ -270,20 +272,22 @@ start_replication: > cmd->slotname = $2; > cmd->startpoint = $4; > cmd->timeline = $5; > + cmd->shutdownmode = NULL; > $$ = (Node *) cmd; > } > > It seemed a bit inconsistent. E.g. the cmd->options member was not set > for physical replication, so why then set this member? > > Alternatively, maybe should set cmd->options = NULL here as well? Removed. I checked makeNode() macro, found that palloc0fast() is called there. This means that we do not have to initialize unused attributes. > src/backend/replication/walsender.c > > 9. > +/* Indicator for specifying the shutdown mode */ > +typedef enum > +{ > + WALSND_SHUTDOWN_MODE_WAIT_FLUSH = 0, > + WALSND_SHUTDOWN_MODE_IMMIDEATE > +} WalSndShutdownMode; > > ~ > > 9a. > "Indicator for specifying" (??). How about just saying: "Shutdown modes" Fixed. > 9b. > Typo: WALSND_SHUTDOWN_MODE_IMMIDEATE ==> > WALSND_SHUTDOWN_MODE_IMMEDIATE Replaced. > 9c. > AFAICT the fact that the first enum value is assigned 0 is not really > of importance. If that is correct, then IMO maybe it's better to > remove the "= 0" because the explicit assignment made me expect that > it had special meaning, and then it was confusing when I could not > find a reason. Removed. This was added for skipping the initialization for previous version, but no longer needed. > 10. ProcessPendingWrites > > + /* > + * In this function, there is a possibility that the walsender is > + * stuck. It is caused when the opposite worker is stuck and then the > + * send-buffer of the walsender becomes full. 
Therefore, we must add > + * an additional path for shutdown for immediate shutdown mode. > + */ > + if (shutdown_mode == WALSND_SHUTDOWN_MODE_IMMIDEATE && > + got_STOPPING) > + WalSndDone(XLogSendLogical); > > 10a. > Can this comment say something like "receiving worker" instead of > "opposite worker"? > > SUGGESTION > This can happen when the receiving worker is stuck, and then the > send-buffer of the walsender... Changed. > 10b. > IMO it makes more sense to check this around the other way. E.g. we > don't care what is the shutdown_mode value unless got_STOPPING is > true. > > SUGGESTION > if (got_STOPPING && (shutdown_mode == > WALSND_SHUTDOWN_MODE_IMMEDIATE)) Changed. > 11. WalSndDone > > + * If we are in the immediate shutdown mode, flush location and output > + * buffer is not checked. This may break the consistency between nodes, > + * but it may be useful for the system that has high-latency network to > + * reduce the amount of time for shutdown. > > Add some quotes for the mode. > > SUGGESTION > 'immediate' shutdown mode Changed. > 12. > +/* > + * Check options for walsender itself and set flags accordingly. > + * > + * Currently only one option is accepted. > + */ > +static void > +CheckWalSndOptions(const StartReplicationCmd *cmd) > +{ > + if (cmd->shutdownmode) > + ParseShutdownMode(cmd->shutdownmode); > +} > + > +/* > + * Parse given shutdown mode. 
> + * > + * Currently two values are accepted - "wait_flush" and "immediate" > + */ > +static void > +ParseShutdownMode(char *shutdownmode) > +{ > + if (pg_strcasecmp(shutdownmode, "wait_flush") == 0) > + shutdown_mode = WALSND_SHUTDOWN_MODE_WAIT_FLUSH; > + else if (pg_strcasecmp(shutdownmode, "immediate") == 0) > + shutdown_mode = WALSND_SHUTDOWN_MODE_IMMIDEATE; > + else > + ereport(ERROR, > + errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("invalid value for shutdown mode: \"%s\"", shutdownmode), > + errhint("Available values: wait_flush, immediate.")); > +} > > IMO the ParseShutdownMode function seems unnecessary because it's not > really "parsing" anything and it is only called in one place. I > suggest wrapping everything into the CheckWalSndOptions function. The > end result is still only a simple function: > > SUGGESTION > > static void > CheckWalSndOptions(const StartReplicationCmd *cmd) > { > if (cmd->shutdownmode) > { > char *mode = cmd->shutdownmode; > > if (pg_strcasecmp(mode, "wait_flush") == 0) > shutdown_mode = WALSND_SHUTDOWN_MODE_WAIT_FLUSH; > else if (pg_strcasecmp(mode, "immediate") == 0) > shutdown_mode = WALSND_SHUTDOWN_MODE_IMMEDIATE; > > else > ereport(ERROR, > errcode(ERRCODE_SYNTAX_ERROR), > errmsg("invalid value for shutdown mode: \"%s\"", mode), > errhint("Available values: wait_flush, immediate.")); > } > } Removed. > ====== > src/include/replication/walreceiver.h > > 13. > @@ -170,6 +170,7 @@ typedef struct > * false if physical stream. */ > char *slotname; /* Name of the replication slot or NULL. */ > XLogRecPtr startpoint; /* LSN of starting point. */ > + char *shutdown_mode; /* Name of specified shutdown name */ > > union > { > ~ > > 13a. > Typo (shutdown name?) > > SUGGESTION > /* The specified shutdown mode string, or NULL. */ Fixed. > 13b. > Because they have the same member names I kept confusing this option > shutdown_mode with the other enum also called shutdown_mode. 
>
> I wonder if is it possible to call this one something like
> 'shutdown_mode_str' to make reading the code easier?

Changed.

> 13c.
> Is this member in the right place? AFAIK this is not even implemented
> for physical replication. e.g. Why isn't this new member part of the
> 'logical' sub-structure in the union?

I retained it for future extensibility, but it seemed not to be needed. Moved.

> ======
> src/test/subscription/t/001_rep_changes.pl
>
> 14.
> -# Set min_apply_delay parameter to 3 seconds
> +# Check restart on changing min_apply_delay to 3 seconds
> my $delay = 3;
> $node_subscriber->safe_psql('postgres',
> "ALTER SUBSCRIPTION tap_sub_renamed SET (min_apply_delay =
> '${delay}s')");
> +$node_publisher->poll_query_until('postgres',
> + "SELECT pid != $oldpid FROM pg_stat_replication WHERE
> application_name = 'tap_sub_renamed' AND state = 'streaming';"
> + )
> + or die
> + "Timed out while waiting for the walsender to restart after
> changing min_apply_delay to non-zero value";
>
> IIUC this test is for verifying that a new walsender worker was
> started if the delay was changed from 0 to non-zero. E.g. I think it
> is for it is testing your new logic of the maybe_reread_subscription.
>
> Probably more complete testing also needs to check the other scenarios:
> * min_apply_delay from one non-zero value to another non-zero value
> --> verify a new worker is NOT started.
> * change min_apply_delay from non-zero to zero --> verify a new worker
> IS started

Hmm. These tests do not improve the coverage, so I'm not sure whether we should add them. Moreover, IIUC we do not have a good way to verify that the worker does not restart. Even if the old pid remains in pg_stat_replication, there is a possibility that the walsender exits after that. So currently I added only the case that changes min_apply_delay from non-zero to zero.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Fri, Feb 10, 2023 at 5:24 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>

Can't we have this option just as a bool (like shutdown_immediate)? Why do we want to keep multiple modes?

--
With Regards,
Amit Kapila.
RE: Exit walsender before confirming remote flush in logical replication
From: "Hayato Kuroda (Fujitsu)"
Dear Amit,

> Can't we have this option just as a bool (like shutdown_immediate)?
> Why do we want to keep multiple modes?

Of course we can use a boolean instead, but the current style is motivated by the post[1]. This allows adding another option in the future, though I do not have an idea for one now. I want to ask other reviewers which one is better...

[1]: https://www.postgresql.org/message-id/20230208.112717.1140830361804418505.horikyota.ntt%40gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Re: Exit walsender before confirming remote flush in logical replication
From: Kyotaro Horiguchi
At Fri, 10 Feb 2023 12:40:43 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in > Dear Amit, > > > Can't we have this option just as a bool (like shutdown_immediate)? > > Why do we want to keep multiple modes? > > Of course we can use boolean instead, but current style is motivated by the post[1]. > This allows to add another option in future, whereas I do not have idea now. > > I want to ask other reviewers which one is better... > > [1]: https://www.postgresql.org/message-id/20230208.112717.1140830361804418505.horikyota.ntt%40gmail.com IMHO I vaguely don't like that we lose a means to specify the default behavior here. And I'm not sure we definitely don't need anything other than flush and immediate for both physical and logical replication. If that's not the case, I don't object to making it a Boolean. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Feb 13, 2023 at 7:26 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Fri, 10 Feb 2023 12:40:43 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in > > Dear Amit, > > > > > Can't we have this option just as a bool (like shutdown_immediate)? > > > Why do we want to keep multiple modes? > > > > Of course we can use boolean instead, but current style is motivated by the post[1]. > > This allows to add another option in future, whereas I do not have idea now. > > > > I want to ask other reviewers which one is better... > > > > [1]: https://www.postgresql.org/message-id/20230208.112717.1140830361804418505.horikyota.ntt%40gmail.com > > IMHO I vaguely don't like that we lose a means to specify the default > behavior here. And I'm not sure we definitely don't need other than > flush and immediate for both physical and logical replication. > If we can think of any use case that requires its extension, then it makes sense to make it a non-boolean option, but otherwise let's keep things simple by having a boolean option. > If it's > not the case, I don't object to make it a Boolean. > Thanks. -- With Regards, Amit Kapila.
Here are some comments for patch v7-0002. ====== Commit Message 1. This commit extends START_REPLICATION to accept SHUTDOWN_MODE clause. It is currently implemented only for logical replication. ~ "to accept SHUTDOWN_MODE clause." --> "to accept a SHUTDOWN_MODE clause." ====== doc/src/sgml/protocol.sgml 2. START_REPLICATION SLOT slot_name LOGICAL XXX/XXX [ SHUTDOWN_MODE { 'wait_flush' | 'immediate' } ] [ ( option_name [ option_value ] [, ...] ) ] ~ IMO this should say shutdown_mode as it did before: START_REPLICATION SLOT slot_name LOGICAL XXX/XXX [ SHUTDOWN_MODE shutdown_mode ] [ ( option_name [ option_value ] [, ...] ) ] ~~~ 3. + <varlistentry> + <term><literal>shutdown_mode</literal></term> + <listitem> + <para> + Determines the behavior of the walsender process at shutdown. If + shutdown_mode is <literal>'wait_flush'</literal>, the walsender waits + for all the sent WALs to be flushed on the subscriber side. This is + the default when SHUTDOWN_MODE is not specified. If shutdown_mode is + <literal>'immediate'</literal>, the walsender exits without + confirming the remote flush. + </para> + </listitem> + </varlistentry> Is the font of the "shutdown_mode" correct? I expected it to be like the others (e.g. slot_name) ====== src/backend/replication/walsender.c 4. +static void +CheckWalSndOptions(const StartReplicationCmd *cmd) +{ + if (cmd->shutdownmode) + { + char *mode = cmd->shutdownmode; + + if (pg_strcasecmp(mode, "wait_flush") == 0) + shutdown_mode = WALSND_SHUTDOWN_MODE_WAIT_FLUSH; + else if (pg_strcasecmp(mode, "immediate") == 0) + shutdown_mode = WALSND_SHUTDOWN_MODE_IMMEDIATE; + else + ereport(ERROR, + errcode(ERRCODE_SYNTAX_ERROR), + errmsg("invalid value for shutdown mode: \"%s\"", mode), + errhint("Available values: wait_flush, immediate.")); + } + +} Unnecessary extra whitespace at the end of the function. ====== src/include/nodes/replnodes.h 5. 
@@ -83,6 +83,7 @@ typedef struct StartReplicationCmd char *slotname; TimeLineID timeline; XLogRecPtr startpoint; + char *shutdownmode; List *options; } StartReplicationCmd; IMO those last 2 members should have a comment something like: /* Only for logical replication */ because that will make it clearer why they are sometimes assigned and sometimes not. ====== src/include/replication/walreceiver.h 6. Should the protocol version be bumped (and documented) now that START_REPLICATION supports a new extended syntax? Or is that done only for messages sent by pgoutput? ------ Kind Regards, Peter Smith. Fujitsu Australia.
Re: Exit walsender before confirming remote flush in logical replication
From
Kyotaro Horiguchi
Date:
At Mon, 13 Feb 2023 08:27:01 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > On Mon, Feb 13, 2023 at 7:26 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > IMHO I vaguely don't like that we lose a means to specify the default > > behavior here. And I'm not sure we definitely don't need other than > > flush and immedaite for both physical and logical replication. > > > > If we can think of any use case that requires its extension then it > makes sense to make it a non-boolean option but otherwise, let's keep > things simple by having a boolean option. What do you think about the need for explicitly specifying the default? I'm fine with specifying the default using a single word, such as WAIT_FOR_REMOTE_FLUSH. > > If it's > > not the case, I don't object to make it a Boolean. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2023-02-14 10:05:40 +0900, Kyotaro Horiguchi wrote: > What do you think about the need for explicitly specifying the > default? I'm fine with specifying the default using a single word, > such as WAIT_FOR_REMOTE_FLUSH. We obviously shouldn't force the option to be present. Why would we want to break existing clients unnecessarily? Without it the behaviour should be unchanged from today's.
Re: Exit walsender before confirming remote flush in logical replication
From
Kyotaro Horiguchi
Date:
At Mon, 13 Feb 2023 17:13:43 -0800, Andres Freund <andres@anarazel.de> wrote in > On 2023-02-14 10:05:40 +0900, Kyotaro Horiguchi wrote: > > What do you think about the need for explicitly specifying the > > default? I'm fine with specifying the default using a single word, > > such as WAIT_FOR_REMOTE_FLUSH. > > We obviously shouldn't force the option to be present. Why would we want to > break existing clients unnecessarily? Without it the behaviour should be > unchanged from today's. I didn't suggest making the option mandatory. I just suggested providing a way to specify the default value explicitly, like in the recent commit 746915c686. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Exit walsender before confirming remote flush in logical replication
From
Greg Sabino Mullane
Date:
Thanks for everyone's work on this; I am very interested in it getting into a release. What is the status of this patch?
My use case is Patroni - when it needs to do a failover, it shuts down the primary. However, large transactions can cause it to stay in the "shutting down" state for a long time, which means your entire HA system is now non-functional. I like the idea of a new flag. I'll test this out soon if the original authors want to make a rebased patch. This thread is old, so if I don't hear back in a bit, I'll create and test a new one myself. :)
Cheers,
Greg