Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: Synchronizing slots from primary to standby
Date:
Msg-id: CAA4eK1+tZNzxee-+=3jues8NX5ACpwNHmdtHFdwtryxsg47QEA@mail.gmail.com
In response to: Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Synchronizing slots from primary to standby
List: pgsql-hackers
On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 4, 2024 at 2:59 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > There is an intermittent BF failure observed at [1] after this commit (2ec005b).
> >
>
> Thanks for analyzing and providing the patch. I'll look into it. There
> is another BF failure [1] which I have analyzed. The main reason for
> failure is the following test:
>
> #   Failed test 'logical slots have synced as true on standby'
> #   at /home/bf/bf-build/serinus/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl
> line 198.
> #          got: 'f'
> #     expected: 't'
>
> Here, we expect that the two logical slots (lsub1_slot and
> lsub2_slot), one created via a subscription and the other via the API
> pg_create_logical_replication_slot(), are synced. The standby LOGs
> below show that the one created by the API, 'lsub2_slot', is synced,
> but the other one, 'lsub1_slot', is not:
>
> LOG for lsub1_slot:
> ================
> 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] DETAIL:
> Streaming transactions committing after 0/0, reading WAL from
> 0/3000060.
> 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0]
> STATEMENT:  SELECT pg_sync_replication_slots();
> 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] DEBUG:
> xmin required by slots: data 0, catalog 740
> 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] LOG:
> could not sync slot "lsub1_slot"
>
> LOG for lsub2_slot:
> ================
> 2024-04-05 04:37:08.518 UTC [3867682][client backend][0/2:0] DEBUG:
> xmin required by slots: data 0, catalog 740
> 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] LOG:
> newly created slot "lsub2_slot" is sync-ready now
> 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0]
> STATEMENT:  SELECT pg_sync_replication_slots();
>
> We can see from the log for lsub1_slot that its restart_lsn is
> 0/3000060, which means it will start reading WAL from that location.
> Now, if we check the publisher log, there is a running_xacts record
> at that location. See the following LOGs:
>
> 2024-04-05 04:36:57.830 UTC [3860839][client backend][8/2:0] LOG:
> statement: SELECT pg_create_logical_replication_slot('lsub2_slot',
> 'test_decoding', false, false, true);
> 2024-04-05 04:36:58.718 UTC [3860839][client backend][8/2:0] DEBUG:
> snapshot of 0+0 running transaction ids (lsn 0/3000060 oldest xid 740
> latest complete 739 next xid 740)
> ....
> ....
> 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG:
> snapshot of 0+0 running transaction ids (lsn 0/3000098 oldest xid 740
> latest complete 739 next xid 740)
>
> The first running_xacts record ends at 3000060 and the second one at
> 3000098. So, the start location of the second running_xacts record is
> 3000060; the same can be confirmed by the following walsender LOG
> line:
>
> 2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG:
> serializing snapshot to pg_logical/snapshots/0-3000060.snap
>
> This shows that while processing the running_xacts record at location
> 3000060, we serialized the snapshot. As there is no running
> transaction in the WAL at 3000060, ideally we should have reached a
> consistent state after processing that record on the standby. But the
> reason the standby didn't process that record is that the
> confirmed_flush LSN is also at the same location, so the function
> LogicalSlotAdvanceAndCheckSnapState() exits without reading the WAL
> at that location. This can be confirmed by the below
> walsender-specific LOG on the publisher:
>
> 2024-04-05 04:36:59.155 UTC [3857385][walsender][25/0:0] DEBUG: write
> 0/3000060 flush 0/3000060 apply 0/3000060 reply_time 2024-04-05
> 04:36:59.155181+00
>
> We update the confirmed_flush location with the flush location after
> receiving the above feedback. Notice that we didn't receive the
> feedback for the 3000098 location, and hence both confirmed_flush and
> restart_lsn are at the same location, 0/3000060. Now, the test waits
> for the subscriber to send feedback for the last WAL write location
> via $primary->wait_for_catchup('regress_mysub1');. As seen in the
> publisher LOGs, the query used for the wait is:
>
> SELECT '0/3000060' <= replay_lsn AND state = 'streaming'
>         FROM pg_catalog.pg_stat_replication
>         WHERE application_name IN ('regress_mysub1', 'walreceiver')
>
> Here, instead of '0/3000060' it should have used '0/3000098', which
> is the last write location. We get this position via
> pg_current_wal_lsn()->GetXLogWriteRecPtr()->LogwrtResult.Write, and
> this variable seems to be touched by commit
> c9920a9068eac2e6c8fb34988d18c0b42b9bf811. Though it seems unlikely,
> could c9920a9068eac2e6c8fb34988d18c0b42b9bf811 be the reason for the
> failure? At this stage, I am not sure, so I am just sharing this with
> others to see if what I am saying sounds logical. I'll think more
> about this.
>

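For reference, the slot state described above can be observed directly on the primary; a quick check against pg_replication_slots (a sketch, with the value shown being what the analysis above predicts, not a captured result) would show both positions pinned at the older record:

```sql
-- On the primary: with feedback for 0/3000098 not yet received, both
-- positions of the subscription's slot sit at the older running_xacts
-- record, which is what prevents the standby from advancing past it.
SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'lsub1_slot';
-- per the analysis above: lsub1_slot | 0/3000060 | 0/3000060
```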
Thinking more on this, it doesn't seem related to
c9920a9068eac2e6c8fb34988d18c0b42b9bf811, as that commit doesn't
change any locking or anything else that impacts write positions. I
think what has happened here is that the running_xacts record written
by the background writer [1] was not yet written to the kernel or disk
(see LogStandbySnapshot()) before pg_current_wal_lsn() computed the
current write LSN to be compared with the replayed LSN. Note that the
reason the walsender picked up the running_xacts record written by the
background writer is that it checked only after the
pg_current_wal_lsn() query; see LOGs [2]. I think we can probably try
to reproduce this manually via a debugger.

If this theory is correct, then I think we will need to use injection
points to control the behavior of the bgwriter, or use slots created
via the SQL API for syncing in the tests.
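For the second option, a minimal sketch of what the test could do instead of relying on a subscription-created slot (the call mirrors how the existing test already creates lsub2_slot; the exact TAP-test wiring would of course differ):

```sql
-- On the primary: create the logical slot directly via the SQL API,
-- with failover enabled (the last argument), so that its syncing does
-- not depend on subscription feedback advancing confirmed_flush.
SELECT pg_create_logical_replication_slot('lsub1_slot',
       'test_decoding', false, false, true);

-- On the standby: sync the failover-enabled slots from the primary.
SELECT pg_sync_replication_slots();

-- On the standby: verify the slot has become sync-ready.
SELECT slot_name, synced FROM pg_replication_slots
WHERE slot_name = 'lsub1_slot';
```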

Thoughts?

[1] - 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0]
DEBUG: snapshot of 0+0 running transaction ids (lsn 0/3000098 oldest
xid 740 latest complete 739 next xid 740)

[2] -
2024-04-05 04:37:05.134 UTC [3866413][client backend][1/4:0] LOG:
statement: SELECT pg_current_wal_lsn()
2024-04-05 04:37:05.144 UTC [3866413][client backend][:0] LOG:
disconnection: session time: 0:00:00.021 user=bf database=postgres
host=[local]
2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG:
serializing snapshot to pg_logical/snapshots/0-3000060.snap

--
With Regards,
Amit Kapila.


