Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Synchronizing slots from primary to standby
Msg-id: CAA4eK1+tZNzxee-+=3jues8NX5ACpwNHmdtHFdwtryxsg47QEA@mail.gmail.com
In response to: Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Synchronizing slots from primary to standby
List: pgsql-hackers
On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 4, 2024 at 2:59 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > There is an intermittent BF failure observed at [1] after this commit (2ec005b).
> >
>
> Thanks for analyzing and providing the patch. I'll look into it. There
> is another BF failure [1] which I have analyzed. The main reason for
> failure is the following test:
>
> # Failed test 'logical slots have synced as true on standby'
> # at /home/bf/bf-build/serinus/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl
> line 198.
> #          got: 'f'
> #     expected: 't'
>
> Here, we are expecting that the two logical slots (lsub1_slot and
> lsub2_slot), one created via subscription and the other via the API
> pg_create_logical_replication_slot(), are synced. The standby LOGs
> below show that the one created by the API, 'lsub2_slot', is synced,
> but the other one, 'lsub1_slot', is not:
>
> LOG for lsub1_slot:
> ================
> 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] DETAIL:
> Streaming transactions committing after 0/0, reading WAL from
> 0/3000060.
> 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0]
> STATEMENT: SELECT pg_sync_replication_slots();
> 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] DEBUG:
> xmin required by slots: data 0, catalog 740
> 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] LOG:
> could not sync slot "lsub1_slot"
>
> LOG for lsub2_slot:
> ================
> 2024-04-05 04:37:08.518 UTC [3867682][client backend][0/2:0] DEBUG:
> xmin required by slots: data 0, catalog 740
> 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] LOG:
> newly created slot "lsub2_slot" is sync-ready now
> 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0]
> STATEMENT: SELECT pg_sync_replication_slots();
>
> We can see from the log of lsub1_slot that its restart_lsn is
> 0/3000060, which means it will start reading WAL from that
> location. Now, if we check the publisher log, we have a running_xacts
> record at that location. See the following LOGs:
>
> 2024-04-05 04:36:57.830 UTC [3860839][client backend][8/2:0] LOG:
> statement: SELECT pg_create_logical_replication_slot('lsub2_slot',
> 'test_decoding', false, false, true);
> 2024-04-05 04:36:58.718 UTC [3860839][client backend][8/2:0] DEBUG:
> snapshot of 0+0 running transaction ids (lsn 0/3000060 oldest xid 740
> latest complete 739 next xid 740)
> ....
> ....
> 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG:
> snapshot of 0+0 running transaction ids (lsn 0/3000098 oldest xid 740
> latest complete 739 next xid 740)
>
> The first running_xacts record ends at 3000060 and the second one at
> 3000098. So, the start location of the second running_xacts record is
> 3000060; the same can be confirmed by the following LOG line of walsender:
>
> 2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG:
> serializing snapshot to pg_logical/snapshots/0-3000060.snap
>
> This shows that while processing the running_xacts record at location
> 3000060, we have serialized the snapshot.
> As there is no running transaction in WAL at 3000060, ideally we
> should have reached a consistent state after processing that record on
> the standby. But the reason the standby didn't process that record is
> that the confirmed_flush LSN is also at the same location, so the
> function LogicalSlotAdvanceAndCheckSnapState() exits without reading
> the WAL at that location. Now, this can be confirmed by the below
> walsender-specific LOG in the publisher:
>
> 2024-04-05 04:36:59.155 UTC [3857385][walsender][25/0:0] DEBUG: write
> 0/3000060 flush 0/3000060 apply 0/3000060 reply_time 2024-04-05
> 04:36:59.155181+00
>
> We update the confirmed_flush location with the flush location after
> receiving the above feedback. You can notice that we didn't receive
> feedback for the 3000098 location, and hence both confirmed_flush and
> restart_lsn are at the same location, 0/3000060. Now, the test waits
> for the subscriber to send feedback of the last WAL write location via
> $primary->wait_for_catchup('regress_mysub1'); As noticed from the
> publisher LOGs, the query we used for the wait is:
>
> SELECT '0/3000060' <= replay_lsn AND state = 'streaming'
> FROM pg_catalog.pg_stat_replication
> WHERE application_name IN ('regress_mysub1', 'walreceiver')
>
> Here, instead of '0/3000060' it should have used '0/3000098', which is
> the last write location. This position we get via
> pg_current_wal_lsn() -> GetXLogWriteRecPtr() -> LogwrtResult.Write, and
> this variable seems to be touched by commit
> c9920a9068eac2e6c8fb34988d18c0b42b9bf811. Though unlikely, could
> c9920a9068eac2e6c8fb34988d18c0b42b9bf811 be a reason for the failure?
> At this stage, I am not sure, so I am just sharing it with others to
> see if what I am saying sounds logical. I'll think more about this.
>

Thinking more on this, it doesn't seem related to
c9920a9068eac2e6c8fb34988d18c0b42b9bf811, as that commit doesn't change
any locking or anything like that which impacts write positions.
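[Editor's note: a short Python sketch of the LSN arithmetic above. The
helper name is illustrative, not from the server code; it models how
PostgreSQL's textual "hi/lo" LSNs compare numerically, and shows why a
wait pinned to '0/3000060' can succeed while replay has not yet reached
'0/3000098'.]

```python
# Hypothetical helper, not part of PostgreSQL: a textual LSN "hi/lo"
# is two hex halves of a 64-bit WAL position; joining them lets the
# positions be compared numerically, as the server does internally.
def lsn_to_int(lsn: str) -> int:
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

replay_lsn = lsn_to_int("0/3000060")   # what the standby has replayed
wait_target = lsn_to_int("0/3000060")  # what wait_for_catchup waited on
last_write = lsn_to_int("0/3000098")   # actual last write location

# The test's wait condition passes even though the bgwriter's
# running_xacts record at 0/3000098 has not been replayed yet.
assert wait_target <= replay_lsn
assert replay_lsn < last_write
print(last_write - replay_lsn)  # 56: the unreplayed gap in bytes
```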
I think what has happened here is that the running_xacts record written
by the background writer [1] was not yet written to the kernel or disk
(see LogStandbySnapshot()) before pg_current_wal_lsn() computed the
current LSN to be compared with the replayed LSN. Note that the reason
walsender picked up the running_xacts record written by the background
writer is that it checked after the pg_current_wal_lsn() query; see the
LOGs at [2]. I think we can probably try to reproduce this manually via
a debugger. If this theory is correct, then I think we will need to use
injection points to control the behavior of bgwriter, or use slots
created via the SQL API for syncing in tests. Thoughts?

[1] -
2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG:
snapshot of 0+0 running transaction ids (lsn 0/3000098 oldest xid 740
latest complete 739 next xid 740)

[2] -
2024-04-05 04:37:05.134 UTC [3866413][client backend][1/4:0] LOG:
statement: SELECT pg_current_wal_lsn()
2024-04-05 04:37:05.144 UTC [3866413][client backend][:0] LOG:
disconnection: session time: 0:00:00.021 user=bf database=postgres
host=[local]
2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG:
serializing snapshot to pg_logical/snapshots/0-3000060.snap

--
With Regards,
Amit Kapila.
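[Editor's note: the suspected interleaving can be modeled with a toy
Python sketch. All names here are illustrative, not PostgreSQL's actual
structures; the one fact taken from the message is that
pg_current_wal_lsn() reports the write position, which can lag behind a
record that has been inserted into WAL buffers but not yet written out.]

```python
# Toy model of the suspected race; names are illustrative only.
class ToyWal:
    def __init__(self, insert_lsn: int):
        self.insert_lsn = insert_lsn  # where the next record goes (WAL buffers)
        self.write_lsn = insert_lsn   # what has reached the kernel

    def insert_record(self, size: int) -> None:
        # Like the bgwriter logging a running_xacts snapshot: the record
        # lands in WAL buffers; the write pointer does not move yet.
        self.insert_lsn += size

    def flush(self) -> None:
        # Later, the buffers are pushed out to the kernel.
        self.write_lsn = self.insert_lsn

    def current_wal_lsn(self) -> int:
        # Models pg_current_wal_lsn(): reports the *write* position.
        return self.write_lsn

wal = ToyWal(0x3000060)
wal.insert_record(0x38)          # bgwriter inserts running_xacts at 0x3000098
target = wal.current_wal_lsn()   # test samples the wait target *before* flush
wal.flush()

# The sampled wait target misses the just-inserted record, so a
# wait_for_catchup pinned to it can return before the standby has
# replayed 0/3000098.
assert target == 0x3000060
assert wal.current_wal_lsn() == 0x3000098
```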