RE: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From: Zhijie Hou (Fujitsu)
Subject: RE: Synchronizing slots from primary to standby
Msg-id: OS0PR01MB5716D800F17302FD5289C7E1943B2@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to: Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>)
Responses: RE: Synchronizing slots from primary to standby
List: pgsql-hackers
On Thursday, March 28, 2024 7:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> >
> > When analyzing one BF error[1], we found an issue with slotsync: since we
> > don't perform logical decoding for the synced slots when syncing the
> > lsn/xmin of a slot, no logical snapshots will be serialized to disk. So,
> > when a user starts to use these synced slots after promotion, it needs to
> > re-build the consistent snapshot from the restart_lsn if the WAL
> > (xl_running_xacts) at the restart_lsn position indicates that there are
> > running transactions. This, however, could cause the data before the
> > consistent point to be missed[2].
> >
> > This issue doesn't exist on the primary because the snapshot at
> > restart_lsn should have been serialized to disk
> > (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if logical
> > decoding restarts, it can find a consistent snapshot immediately at
> > restart_lsn.
> >
> > To fix this, we could use fast-forward logical decoding to advance the
> > synced slot's lsn/xmin when syncing these values, instead of directly
> > updating the slot's info. This way, the snapshot will be serialized to
> > disk when decoding. If we cannot reach the consistent point at the remote
> > restart_lsn, the slot is marked as temporary and will be persisted once
> > it reaches the consistent point. I am still analyzing the fix and will
> > share it once ready.
>
> Yes, we can use this, but one thing to note is that CreateDecodingContext()
> will expect that the slot's database and the current database are the same.
> I think the reason for that is that we need to check the system tables of
> the current database while decoding and sending data to the output plugin,
> which won't be a requirement for the fast_forward case. So, we need to skip
> that check in fast_forward mode.

Agreed.
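For illustration, the kind of change being discussed could look roughly like the following in CreateDecodingContext() (a minimal sketch, not the actual patch; fast_forward is the existing boolean parameter of that function, and the error message mirrors the current one in logical.c):

	/*
	 * Decoding needs to look up system tables in the slot's database, so
	 * the slot normally must belong to the current database.  In
	 * fast_forward mode no changes are passed to an output plugin, so
	 * this check can be relaxed.  (Sketch only; the real patch may differ.)
	 */
	if (slot->data.database != MyDatabaseId && !fast_forward)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("replication slot \"%s\" was not created in this database",
						NameStr(slot->data.name))));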
> Next, I was thinking about the case of updating the restart and
> confirmed_flush LSN for the first time while syncing the slots. I think we
> can keep the current logic as it is, based on the following analysis.
>
> For each logical slot, the cases possible on the primary are:
> 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet
> reached the consistent point.
> 2. The restart_lsn doesn't have a serialized snapshot but has reached a
> consistent point.
> 3. The restart_lsn has a serialized snapshot, which means it has reached a
> consistent point as well.
>
> Considering we keep the logic to reserve initial WAL positions the same as
> the current one (reserve WAL for the currently active local slot using the
> specified WAL location (restart_lsn); if the given WAL location has been
> removed, reserve WAL using the oldest existing WAL segment), I can think
> of the scenarios below:
> A. For 1, we shouldn't sync the slot, as it still wouldn't have been
> marked persistent on the primary.
> B. For 2, we would sync the slot:
>   B1. If remote_restart_lsn >= local_restart_lsn, advance the slot by
>   calling pg_logical_replication_slot_advance().
>     B11. If we reach the consistent point, it should be okay, because
>     after promotion we should reach the consistent point as well.
>       B111. But again, is it possible that some xact comes before the
>       consistent point on the primary and after the consistent point on
>       the standby? This shouldn't matter, as we will start decoding
>       transactions after confirmed_flush_lsn, which would be the same on
>       the primary and the standby.
>     B12. If we haven't reached the consistent point, we won't mark the
>     slot as persistent, and at the next sync we will do the same until it
>     reaches the consistent point. At that time, the situation will be
>     similar to B11.
>   B2. If remote_restart_lsn < local_restart_lsn, we will wait for the next
>   sync cycle and keep the slot temporary. Once, in the next or some later
>   sync cycle, we reach the condition remote_restart_lsn >=
>   local_restart_lsn, we will proceed to advance the slot, and we should
>   have the same behavior as B1.
> C. For 3, we would sync the slot, but the behavior should be the same as B.
>
> Thoughts?

Looks reasonable to me.
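To make the flow above concrete, here is a rough pseudo-C sketch of how one sync cycle could act on a single slot under cases A-C (all names here, such as remote, local, and reached_consistent_point, are illustrative only, not the actual slotsync code):

	/* Case A: slot not yet persistent on the primary -- don't sync it. */
	if (!remote->persisted)
		return;

	/*
	 * Case B2: the local slot is ahead; keep it temporary and retry in a
	 * later sync cycle.
	 */
	if (remote->restart_lsn < local->data.restart_lsn)
		return;

	/*
	 * Cases B1 and C: advance with fast-forward logical decoding, which
	 * also serializes logical snapshots to disk along the way.
	 */
	advance_slot_with_fast_forward_decoding(remote->restart_lsn);	/* illustrative */

	if (reached_consistent_point)
		ReplicationSlotPersist();	/* cases B11/C: safe to persist now */
	/* case B12: otherwise stay temporary and retry at the next sync cycle */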
Here is the patch based on the above lines. I am also testing and verifying the patch locally.

Best Regards,
Hou zj

Attachment