Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id ZhAMPqJrcWdTxEJj@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
Re: Synchronizing slots from primary to standby
List pgsql-hackers
Hi,

On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote:
> On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Thinking more on this, it doesn't seem related to
> c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change
> any locking or something like that which impacts write positions.

Agree.

> I think what has happened here is that running_xact record written by
> the background writer [1] is not written to the kernel or disk (see
> LogStandbySnapshot()), before pg_current_wal_lsn() checks the
> current_lsn to be compared with replayed LSN.

Agree, I think it's not visible through pg_current_wal_lsn() yet.

Also I think that the DEBUG message in LogCurrentRunningXacts() 

"
        elog(DEBUG2,
             "snapshot of %d+%d running transaction ids (lsn %X/%X oldest xid %u latest complete %u next xid %u)",
             CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt,
             LSN_FORMAT_ARGS(recptr),
             CurrRunningXacts->oldestRunningXid,
             CurrRunningXacts->latestCompletedXid,
             CurrRunningXacts->nextXid);
"

should be located after the XLogSetAsyncXactLSN() call. Indeed, the new LSN is
visible after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is
released, see:

\watch on Session 1 provides: 

 pg_current_wal_lsn
--------------------
 0/87D110
(1 row)

Until:

Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at xlog.c:2579 
2579            XLogRecPtr      WriteRqstPtr = asyncXactLSN;
(gdb) n
2581            bool            wakeup = false;
(gdb)
2584            SpinLockAcquire(&XLogCtl->info_lck);
(gdb)
2585            RefreshXLogWriteResult(LogwrtResult);
(gdb)
2586            sleeping = XLogCtl->WalWriterSleeping;
(gdb)
2587            prevAsyncXactLSN = XLogCtl->asyncXactLSN;
(gdb)
2588            if (XLogCtl->asyncXactLSN < asyncXactLSN)
(gdb)
2589                    XLogCtl->asyncXactLSN = asyncXactLSN;
(gdb)
2590            SpinLockRelease(&XLogCtl->info_lck);
(gdb) p p/x (uint32) XLogCtl->asyncXactLSN
$1 = 0x87d148

Then session 1 provides:

 pg_current_wal_lsn
--------------------
 0/87D148
(1 row)

So, when we see in the log:

2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG:  snapshot of 0+0 running transaction ids (lsn
0/3000098oldest xid 740 latest complete 739 next xid 740)
 
2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG:  statement: SELECT '0/3000060' <= replay_lsn AND
state= 'streaming'
 

It's indeed possible that the new LSN was not visible yet (spinlock not released?)
before the query began (because we can not rely on the time the DEBUG message has
been emitted).

> Note that the reason why
> walsender has picked the running_xact written by background writer is
> because it has checked after pg_current_wal_lsn() query, see LOGs [2].
> I think we can probably try to reproduce manually via debugger.
> 
> If this theory is correct

It think it is.

> then I think we will need to use injection
> points to control the behavior of bgwriter or use the slots created
> via SQL API for syncing in tests.
> 
> Thoughts?

I think that maybe as a first step we should move the "elog(DEBUG2," message as
proposed above to help debugging (that could help to confirm the above theory).

If the theory is proven then I'm not sure we need the extra complexity of
injection point here, maybe just relying on the slots created via SQL API could
be enough.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Add bump memory context type and use it for tuplesorts
Next
From: Peter Eisentraut
Date:
Subject: Re: Psql meta-command conninfo+