Re: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Time delayed LR (WAS Re: logical replication restrictions) |
Date | |
Msg-id | CAA4eK1LyetktcphdRrufHac4t5DGyhsS2xG2DSOGb7OaOVcDVg@mail.gmail.com |
In response to | RE: Time delayed LR (WAS Re: logical replication restrictions) ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Responses | RE: Time delayed LR (WAS Re: logical replication restrictions) |
List | pgsql-hackers |
On Thu, Dec 15, 2022 at 1:42 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> Dear Horiguchi-san, Amit,
>
> > > Yes, that would be ideal. But do you know why that is a must?
> >
> > I believe a graceful shutdown (fast and smart) of a replication set is expected to
> > be in sync. Of course we can change the policy to allow walsender to stop before
> > confirming all WAL have been applied. However walsender doesn't have an idea
> > of whether the peer is intentionally delaying or not.
>
> This mechanism was introduced by 985bd7[1], which was needed to support a
> "clean" switchover. I think it is needed for physical replication, but it is not
> clear for the logical case.
>
> When the postmaster is stopped in fast or smart mode, we expected that all
> modifications were received by secondary. This requirement seems to be not changed
> from the initial commit.
>
> Before 985bd7, the walsender exited just after sending the final WAL, which meant
> that sometimes the last packet could not reach to secondary. So there was a possibility
> of failing to reboot the primary as a new secondary because the new primary does
> not have the last WAL record. To avoid the above walsender started waiting for
> flush before exiting.
>
> But in the case of logical replication, I'm not sure whether this limitation is
> really needed or not. I think it may be OK that walsender exits without waiting,
> in case of delaying applies. Because we don't have to consider the above issue
> for logical replication.
>

I also don't see the need for this mechanism in logical replication. In
fact, why do we even need to wait for the existing WAL to be sent? I think
we don't need to wait in the logical case because, after a restart, we
always start sending WAL from the location requested by the subscriber, or
from the point the publisher knows to be the subscriber's confirmed flush
location.
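The restart rule above can be illustrated with a minimal sketch. This is a simplified model and not the server's code: the helper names (`parse_lsn`, `format_lsn`, `effective_start`) and the sample LSN values are hypothetical, and it assumes that a missing or too-old request falls back to the confirmed flush location.

```python
INVALID_LSN = 0  # stand-in for "subscriber did not request a position"


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' in hex into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Render a 64-bit LSN back into 'hi/lo' hex notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def effective_start(requested: int, confirmed_flush: int) -> int:
    """Choose where the publisher resumes streaming after a restart.

    Assumption for illustration: if the subscriber requests nothing, or
    requests a position older than what it already confirmed as flushed,
    streaming resumes from the confirmed flush location instead.
    """
    if requested == INVALID_LSN or requested < confirmed_flush:
        return confirmed_flush
    return requested


if __name__ == "__main__":
    confirmed = parse_lsn("0/2000")
    for req in ("0/0", "0/1000", "0/3000"):
        start = effective_start(parse_lsn(req), confirmed)
        print(f"requested {req} -> start at {format_lsn(start)}")
```

Under this model, WAL that the walsender had not yet sent at shutdown is not lost to the subscriber: the resume position depends only on what the subscriber requested or confirmed, not on how far the walsender got before exiting.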
Consider another case where, after the restart, the old publisher (node-1)
wants to act as a subscriber to the previous subscriber (node-2). The new
subscriber (node-1) has no way to tell the new publisher (node-2) to start
from the location where node-1 went down, because WAL locations on the
publisher and the subscriber need not be the same. This brings us to the
question of whether users can use logical replication for the scenario
where the old master follows the new master after the restart, as we
typically do in physical replication, and if so, how?

Another related point to consider is how synchronous replication behaves
when a shutdown is performed, for both physical and logical replication,
especially when the time-delayed replication feature is enabled.

--
With Regards,
Amit Kapila.