RE: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Synchronizing slots from primary to standby
Date
Msg-id OS0PR01MB57162C119597F999895E9DAA94002@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Saturday, April 6, 2024 12:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote:
> > > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > > Thinking more on this, it doesn't seem related to
> > > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't
> > > change any locking or something like that which impacts write positions.
> >
> > Agree.
> >
> > > I think what has happened here is that running_xact record written
> > > by the background writer [1] is not written to the kernel or disk
> > > (see LogStandbySnapshot()), before pg_current_wal_lsn() checks the
> > > current_lsn to be compared with replayed LSN.
> >
> > Agree, I think it's not visible through pg_current_wal_lsn() yet.
> >
> > Also I think that the DEBUG message in LogCurrentRunningXacts()
> >
> > "
> >         elog(DEBUG2,
> >              "snapshot of %d+%d running transaction ids (lsn %X/%X oldest
> xid %u latest complete %u next xid %u)",
> >              CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt,
> >              LSN_FORMAT_ARGS(recptr),
> >              CurrRunningXacts->oldestRunningXid,
> >              CurrRunningXacts->latestCompletedXid,
> >              CurrRunningXacts->nextXid); "
> >
> > should be located after the XLogSetAsyncXactLSN() call. Indeed, the
> > new LSN is visible after the spinlock (XLogCtl->info_lck) in
> > XLogSetAsyncXactLSN() is released,
> >
> 
> I think the new LSN can be visible only when the corresponding WAL is written
> by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can make it
> visible. In your experiment below, isn't it possible that in the meantime WAL
> writer has written that WAL due to which you are seeing an updated location?
> 
> >see:
> >
> > \watch on Session 1 provides:
> >
> >  pg_current_wal_lsn
> > --------------------
> >  0/87D110
> > (1 row)
> >
> > Until:
> >
> > Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at
> xlog.c:2579
> > 2579            XLogRecPtr      WriteRqstPtr = asyncXactLSN;
> > (gdb) n
> > 2581            bool            wakeup = false;
> > (gdb)
> > 2584            SpinLockAcquire(&XLogCtl->info_lck);
> > (gdb)
> > 2585            RefreshXLogWriteResult(LogwrtResult);
> > (gdb)
> > 2586            sleeping = XLogCtl->WalWriterSleeping;
> > (gdb)
> > 2587            prevAsyncXactLSN = XLogCtl->asyncXactLSN;
> > (gdb)
> > 2588            if (XLogCtl->asyncXactLSN < asyncXactLSN)
> > (gdb)
> > 2589                    XLogCtl->asyncXactLSN = asyncXactLSN;
> > (gdb)
> > 2590            SpinLockRelease(&XLogCtl->info_lck);
> > (gdb) p p/x (uint32) XLogCtl->asyncXactLSN
> > $1 = 0x87d148
> >
> > Then session 1 provides:
> >
> >  pg_current_wal_lsn
> > --------------------
> >  0/87D148
> > (1 row)
> >
> > So, when we see in the log:
> >
> > 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG:
> > snapshot of 0+0 running transaction ids (lsn 0/3000098 oldest xid 740
> > latest complete 739 next xid 740)
> > 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG:
> statement: SELECT '0/3000060' <= replay_lsn AND state = 'streaming'
> >
> > It's indeed possible that the new LSN was not visible yet (spinlock
> > not released?) before the query began (because we can not rely on the
> > time the DEBUG message has been emitted).
> >
> > > Note that the reason why
> > > walsender has picked the running_xact written by background writer
> > > is because it has checked after pg_current_wal_lsn() query, see LOGs [2].
> > > I think we can probably try to reproduce manually via debugger.
> > >
> > > If this theory is correct
> >
> > It think it is.
> >
> > > then I think we will need to use injection points to control the
> > > behavior of bgwriter or use the slots created via SQL API for
> > > syncing in tests.
> > >
> > > Thoughts?
> >
> > I think that maybe as a first step we should move the "elog(DEBUG2,"
> > message as proposed above to help debugging (that could help to confirm
> the above theory).
> >
> 
> I think I am missing how exactly moving DEBUG2 can confirm the above theory.
> 
> > If the theory is proven then I'm not sure we need the extra complexity
> > of injection point here, maybe just relying on the slots created via
> > SQL API could be enough.
> >
> 
> Yeah, that could be the first step. We can probably add an injection point to
> control the bgwrite behavior and then add tests involving walsender
> performing the decoding. But I think it is important to have sufficient tests in
> this area as I see they are quite helpful in uncovering the issues.

Here is the patch to drop the subscription in the beginning so that the
restart_lsn of the lsub1_slot won't be advanced due to concurrent
xl_running_xacts from bgwriter. The subscription will be re-created after all
the slots are sync-ready. I think maybe we can use this to stabilize the test
as a first step and then think about how to make use of injection point to add
more tests if it's worth it.

Best Regards,
Hou zj


Attachment

pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Improve tab completion for ALTER DEFAULT PRIVILEGE and ALTER TABLE
Next
From: Pavel Borisov
Date:
Subject: Re: Table AM Interface Enhancements