Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id ZhF8o8Ve9YzlCqSz@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi,

On Sat, Apr 06, 2024 at 10:13:00AM +0530, Amit Kapila wrote:
> On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> 
> I think the new LSN can be visible only when the corresponding WAL is
> written by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can
> make it visible. In your experiment below, isn't it possible that in
> the meantime WAL writer has written that WAL due to which you are
> seeing an updated location?

What I did is:

session 1:  select pg_current_wal_lsn();\watch 1
session 2:  select pg_backend_pid();

terminal 1: tail -f logfile | grep -i snap 
terminal 2 : gdb -p <backendpid session 2) -ex 'b LogCurrentRunningXacts' + "continue" once in gdb

session 2: SELECT pg_log_standby_snapshot();

That produces a break in the gdb session, then:

Breakpoint 1, LogCurrentRunningXacts (CurrRunningXacts=0x5774f92f8da0 <CurrentRunningXactsData.13>) at standby.c:1346
1346    {
(gdb) n
1350
 
Then next, next until the DEBUG message is emitted (confirmed in terminal 1).

At this stage the DEBUG message shows the new LSN while session 1 still displays
the previous LSN.

Then once XLogSetAsyncXactLSN() is done in the gdb session (terminal 2) then
session 1 displays the new LSN.

This is reproducible as desired.

With more debugging I can see that when the spinlock is released in XLogSetAsyncXactLSN()
then XLogWrite() is doing its job and then session 1 does see the new value (that
happens in this order, and as you said that's expected).

My point is that while the DEBUG message is emitted session 1 still 
see the old LSN (until the new LSN is vsible). I think that we should emit the
DEBUG message once session 1 can see the new value (If not, I think the timestamp
of the DEBUG message can be missleading during debugging purpose).

> I think I am missing how exactly moving DEBUG2 can confirm the above theory.

I meant to say that instead of seeing:

2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG:  snapshot of 0+0 running transaction ids (lsn
0/3000098oldest xid 740 latest complete 739 next xid 740)
 
2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG:  statement: SELECT '0/3000060' <= replay_lsn AND
state= 'streaming'
 

We would probably see something like:

2024-04-05 04:37:05.<something> UTC [3866475][client backend][2/4:0] LOG:  statement: SELECT '0/3000060' <= replay_lsn
ANDstate = 'streaming'
 
2024-04-05 04:37:05.<something>+xx UTC [3854278][background writer][:0] DEBUG:  snapshot of 0+0 running transaction ids
(lsn0/3000098 oldest xid 740 latest complete 739 next xid 740)
 

And then it would be clear that the query has ran before the new LSN is visible.

> > If the theory is proven then I'm not sure we need the extra complexity of
> > injection point here, maybe just relying on the slots created via SQL API could
> > be enough.
> >
> 
> Yeah, that could be the first step. We can probably add an injection
> point to control the bgwrite behavior and then add tests involving
> walsender performing the decoding. But I think it is important to have
> sufficient tests in this area as I see they are quite helpful in
> uncovering the issues.
>

Yeah agree. 

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: LogwrtResult contended spinlock
Next
From: Tom Lane
Date:
Subject: Re: Fixing backslash dot for COPY FROM...CSV