RE: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Zhijie Hou (Fujitsu) |
---|---|
Subject | RE: Synchronizing slots from primary to standby |
Date | |
Msg-id | OS0PR01MB5716CB015BAD807B29BC55BE944C2@OS0PR01MB5716.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | RE: Synchronizing slots from primary to standby ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Responses |
Re: Synchronizing slots from primary to standby
|
List | pgsql-hackers |
On Friday, February 16, 2024 8:33 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > On Thursday, February 15, 2024 8:29 PM Amit Kapila > <amit.kapila16@gmail.com> wrote: > > > > > On Thu, Feb 15, 2024 at 5:46 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > On Thu, Feb 15, 2024 at 05:00:18PM +0530, Amit Kapila wrote: > > > > On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu) > > > > <houzj.fnst@fujitsu.com> wrote: > > > > > Attach the v2 patch here. > > > > > > > > > > Apart from the new log message. I think we can add one more > > > > > debug message in reserve_wal_for_local_slot, this could be > > > > > useful to analyze the > > failure. > > > > > > > > Yeah, that can also be helpful, but the added message looks naive to me. > > > > + elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno, > > > > + segno); > > > > > > > > Instead of the above, how about something like: "segno: %ld of > > > > purposed restart_lsn for the synced slot, oldest_segno: %ld > > > > available"? > > > > > > Looks good to me. I'm not sure if it would make more sense to elog > > > only if segno < oldest_segno means just before the > > XLogSegNoOffsetToRecPtr() call? > > > > > > But I'm fine with the proposed location too. > > > > > > > I am also fine either way but the current location gives required > > information in more number of cases and could be helpful in debugging this > new facility. > > > > > > > > > > > And we > > > > > can also enable the DEBUG log in the 040 tap-test, I see we have > > > > > similar setting in 010_logical_decoding_timline and logging > > > > > debug1 message doesn't increase noticable time on my machine. > > > > > These are done > > in 0002. > > > > > > > > > > > > > I haven't tested it but I think this can help in debugging BF > > > > failures, if any. I am not sure if to keep it always like that but > > > > till the time these tests are stabilized, this sounds like a good > > > > idea. So, how, about just making test changes as a separate patch > > > > so that later if required we can revert/remove it easily? > > > > Bertrand, do you have any thoughts on this? > > > > > > +1 on having DEBUG log in the 040 tap-test until it's stabilized (I > > > +think we > > > took the same approach for 035_standby_logical_decoding.pl IIRC) and > > > then revert it back. > > > > > > > Good to know! > > > > > Also I was thinking: what about adding an output to > > pg_sync_replication_slots()? > > > The output could be the number of sync slots that have been created > > > and are not considered as sync-ready during the execution. > > > > > > > Yeah, we can consider outputting some information via this function > > like how many slots are synced and persisted but not sure what would > > be appropriate here. Because one can anyway find that or more > > information by querying pg_replication_slots. I think we can keep > > discussing what makes more sense as a return value but let's commit > > the debug/log patches as they will be helpful to analyze BF failures or any > other issues reported. > > Agreed. Here is new patch set as suggested. I used debug2 in the 040 as it could > provide more information about communication between primary and standby. > This also doesn't increase noticeable testing time on my machine for debug > build. Sorry, there was a miss in the DEBUG message, I should have used UINT64_FORMAT for XLogSegNo(uint64) instead of %ld. Here is a small patch to fix this. Best Regards, Hou zj
Attachment
pgsql-hackers by date: