Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | ZhF8o8Ve9YzlCqSz@ip-10-97-1-34.eu-west-3.compute.internal Whole thread Raw |
In response to | Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
Hi, On Sat, Apr 06, 2024 at 10:13:00AM +0530, Amit Kapila wrote: > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > I think the new LSN can be visible only when the corresponding WAL is > written by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can > make it visible. In your experiment below, isn't it possible that in > the meantime WAL writer has written that WAL due to which you are > seeing an updated location? What I did is: session 1: select pg_current_wal_lsn();\watch 1 session 2: select pg_backend_pid(); terminal 1: tail -f logfile | grep -i snap terminal 2 : gdb -p <backendpid session 2) -ex 'b LogCurrentRunningXacts' + "continue" once in gdb session 2: SELECT pg_log_standby_snapshot(); That produces a break in the gdb session, then: Breakpoint 1, LogCurrentRunningXacts (CurrRunningXacts=0x5774f92f8da0 <CurrentRunningXactsData.13>) at standby.c:1346 1346 { (gdb) n 1350 Then next, next until the DEBUG message is emitted (confirmed in terminal 1). At this stage the DEBUG message shows the new LSN while session 1 still displays the previous LSN. Then once XLogSetAsyncXactLSN() is done in the gdb session (terminal 2) then session 1 displays the new LSN. This is reproducible as desired. With more debugging I can see that when the spinlock is released in XLogSetAsyncXactLSN() then XLogWrite() is doing its job and then session 1 does see the new value (that happens in this order, and as you said that's expected). My point is that while the DEBUG message is emitted session 1 still see the old LSN (until the new LSN is vsible). I think that we should emit the DEBUG message once session 1 can see the new value (If not, I think the timestamp of the DEBUG message can be missleading during debugging purpose). > I think I am missing how exactly moving DEBUG2 can confirm the above theory. I meant to say that instead of seeing: 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/3000098oldest xid 740 latest complete 739 next xid 740) 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/3000060' <= replay_lsn AND state= 'streaming' We would probably see something like: 2024-04-05 04:37:05.<something> UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/3000060' <= replay_lsn ANDstate = 'streaming' 2024-04-05 04:37:05.<something>+xx UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn0/3000098 oldest xid 740 latest complete 739 next xid 740) And then it would be clear that the query has ran before the new LSN is visible. > > If the theory is proven then I'm not sure we need the extra complexity of > > injection point here, maybe just relying on the slots created via SQL API could > > be enough. > > > > Yeah, that could be the first step. We can probably add an injection > point to control the bgwrite behavior and then add tests involving > walsender performing the decoding. But I think it is important to have > sufficient tests in this area as I see they are quite helpful in > uncovering the issues. > Yeah agree. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: