Thread: A new message seems missing a punctuation
A recent commit (7a424ece48) added the following message: > could not sync slot information as remote slot precedes local slot: > remote slot "%s": LSN (%X/%X), catalog xmin (%u) local slot: LSN > (%X/%X), catalog xmin (%u) Since it is a bit overloaded but doesn't have a separator between "catalog xmin (%u)" and "local slot:", it is somewhat cluttered. Don't we need something, for example a semicolon there to improve readability and reduce clutter? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Feb 19, 2024 at 10:10 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > A recent commit (7a424ece48) added the following message: > > > could not sync slot information as remote slot precedes local slot: > > remote slot "%s": LSN (%X/%X), catalog xmin (%u) local slot: LSN > > (%X/%X), catalog xmin (%u) > > Since it is a bit overloaded but doesn't have a separator between > "catalog xmin (%u)" and "local slot:", it is somewhat cluttered. Don't > we need something, for example a semicolon there to improve > readability and reduce clutter? I think maybe we should even revise the message even more. In most places we do not just print out a whole bunch of values, but use a sentence to connect them. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Feb 19, 2024 at 10:31 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Feb 19, 2024 at 10:10 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > A recent commit (7a424ece48) added the following message: > > > > > could not sync slot information as remote slot precedes local slot: > > > remote slot "%s": LSN (%X/%X), catalog xmin (%u) local slot: LSN > > > (%X/%X), catalog xmin (%u) > > > > Since it is a bit overloaded but doesn't have a separator between > > "catalog xmin (%u)" and "local slot:", it is somewhat cluttered. Don't > > we need something, for example a semicolon there to improve > > readability and reduce clutter? > > I think maybe we should even revise the message even more. In most > places we do not just print out a whole bunch of values, but use a > sentence to connect them. I have tried to reword the msg, please have a look. thanks Shveta
Attachment
At Mon, 19 Feb 2024 10:31:33 +0530, Robert Haas <robertmhaas@gmail.com> wrote in > On Mon, Feb 19, 2024 at 10:10 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > A recent commit (7a424ece48) added the following message: > > > > > could not sync slot information as remote slot precedes local slot: > > > remote slot "%s": LSN (%X/%X), catalog xmin (%u) local slot: LSN > > > (%X/%X), catalog xmin (%u) > > > > Since it is a bit overloaded but doesn't have a separator between > > "catalog xmin (%u)" and "local slot:", it is somewhat cluttered. Don't > > we need something, for example a semicolon there to improve > > readability and reduce clutter? > > I think maybe we should even revise the message even more. In most > places we do not just print out a whole bunch of values, but use a > sentence to connect them. Mmm. Something like this?: "could not sync slot information: local slot LSN (%X/%X) or xmin(%u) behind remote (%X/%X, %u)" Or I thought the values could be moved to DETAILS: line. By the way, the code around the message is as follows. > /* > * The remote slot didn't catch up to locally reserved position. > * > * We do not drop the slot because the restart_lsn can be ahead of the > * current location when recreating the slot in the next cycle. It may > * take more time to create such a slot. Therefore, we keep this slot > * and attempt the synchronization in the next cycle. > * > * XXX should this be changed to elog(DEBUG1) perhaps? > */ > ereport(LOG, > errmsg("could not sync slot information as remote slot precedes local slot:" > " remote slot \"%s\": LSN (%X/%X), catalog xmin (%u) local slot: LSN (%X/%X), catalog xmin (%u)", If we change this message to DEBUG1, a timeout feature to show a WARNING message might be needed for DBAs to notice that something bad is ongoing. However, it doesn't seem appropriate as a LOG message to me. Perhaps, a LOG message should be more like: > "LOG: waiting for local slot to catch up to remote" > "DETAIL: remote slot LSN (%X/%X)....; " regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Feb 19, 2024 at 11:04 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > Or I thought the values could be moved to DETAILS: line. Yeah, I think that's likely to be the right approach here. The details aren't too clear to me. Does the primary error message really need to say "could not sync slot"? If it will be obvious from context that we were trying to sync a slot, then it would be fine to just say "ERROR: remote slot precedes local slot". But I also don't quite understand what problem this is trying to report. Is this slot-syncing code running on the primary or the standby? If it's running on the primary, then surely it's expected that the remote slot will precede the local one. And if it's running on the standby, then the comments in update_and_persist_local_synced_slot about waiting for the remote side to catch up seem quite confusing, because surely we're chasing the primary and not the other way around? But if we ignore all of that, then we could just do this: ERROR: could not sync slot information as remote slot precedes local slot DETAIL: Remote slot has LSN %X/%X and catalog xmin %u, but remote slot has LSN %X/%X and catalog xmin %u. which would fix the original complaint, and my point about using English rather than just printing a bunch of values. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Feb 19, 2024 at 11:42 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Feb 19, 2024 at 11:04 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > Or I thought the values could be moved to DETAILS: line. > > Yeah, I think that's likely to be the right approach here. The details > aren't too clear to me. > > Does the primary error message really need to say "could not sync > slot"? If it will be obvious from context that we were trying to sync > a slot, then it would be fine to just say "ERROR: remote slot precedes > local slot". > As this is a LOG message, I feel one may need some more information on the context but it is not mandatory. > But I also don't quite understand what problem this is trying to > report. Is this slot-syncing code running on the primary or the > standby? If it's running on the primary, then surely it's expected > that the remote slot will precede the local one. And if it's running > on the standby, then the comments in > update_and_persist_local_synced_slot about waiting for the remote side > to catch up seem quite confusing, because surely we're chasing the > primary and not the other way around? > The local's restart_lsn could be ahead of than primary's for the very first sync when the WAL corresponding to the remote's restart_lsn is not available on standby (say due to a different wal related settings the required WAL has been removed when we first time tried to sync the slot). For more details, you can refer to comments atop slotsync.c starting from "If the WAL corresponding to the remote's restart_lsn ..." > But if we ignore all of that, then we could just do this: > > ERROR: could not sync slot information as remote slot precedes local slot > DETAIL: Remote slot has LSN %X/%X and catalog xmin %u, but remote slot > has LSN %X/%X and catalog xmin %u. > This looks good to me but instead of ERROR here we want to use LOG. -- With Regards, Amit Kapila.
On Mon, Feb 19, 2024 at 12:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Feb 19, 2024 at 11:42 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Mon, Feb 19, 2024 at 11:04 AM Kyotaro Horiguchi > > <horikyota.ntt@gmail.com> wrote: > > > Or I thought the values could be moved to DETAILS: line. > > > > Yeah, I think that's likely to be the right approach here. The details > > aren't too clear to me. > > > > Does the primary error message really need to say "could not sync > > slot"? If it will be obvious from context that we were trying to sync > > a slot, then it would be fine to just say "ERROR: remote slot precedes > > local slot". > > > > As this is a LOG message, I feel one may need some more information on > the context but it is not mandatory. > > > But I also don't quite understand what problem this is trying to > > report. Is this slot-syncing code running on the primary or the > > standby? If it's running on the primary, then surely it's expected > > that the remote slot will precede the local one. And if it's running > > on the standby, then the comments in > > update_and_persist_local_synced_slot about waiting for the remote side > > to catch up seem quite confusing, because surely we're chasing the > > primary and not the other way around? > > > > The local's restart_lsn could be ahead of than primary's for the very > first sync when the WAL corresponding to the remote's restart_lsn is > not available on standby (say due to a different wal related settings > the required WAL has been removed when we first time tried to sync the > slot). For more details, you can refer to comments atop slotsync.c > starting from "If the WAL corresponding to the remote's restart_lsn > ..." > Sorry, I gave the wrong reference, the comments I was referring to start with: "If on physical standby, the WAL corresponding ...". -- With Regards, Amit Kapila.
On Mon, Feb 19, 2024 at 12:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Does the primary error message really need to say "could not sync > > slot"? If it will be obvious from context that we were trying to sync > > a slot, then it would be fine to just say "ERROR: remote slot precedes > > local slot". > > As this is a LOG message, I feel one may need some more information on > the context but it is not mandatory. I'll defer to you. > > But I also don't quite understand what problem this is trying to > > report. Is this slot-syncing code running on the primary or the > > standby? If it's running on the primary, then surely it's expected > > that the remote slot will precede the local one. And if it's running > > on the standby, then the comments in > > update_and_persist_local_synced_slot about waiting for the remote side > > to catch up seem quite confusing, because surely we're chasing the > > primary and not the other way around? > > The local's restart_lsn could be ahead of than primary's for the very > first sync when the WAL corresponding to the remote's restart_lsn is > not available on standby (say due to a different wal related settings > the required WAL has been removed when we first time tried to sync the > slot). For more details, you can refer to comments atop slotsync.c > starting from "If the WAL corresponding to the remote's restart_lsn > ..." So why do we log a message about this? > > But if we ignore all of that, then we could just do this: > > > > ERROR: could not sync slot information as remote slot precedes local slot > > DETAIL: Remote slot has LSN %X/%X and catalog xmin %u, but remote slot > > has LSN %X/%X and catalog xmin %u. > > > > This looks good to me but instead of ERROR here we want to use LOG. Fair enough! -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Feb 20, 2024 at 4:21 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Feb 19, 2024 at 12:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > But I also don't quite understand what problem this is trying to > > > report. Is this slot-syncing code running on the primary or the > > > standby? If it's running on the primary, then surely it's expected > > > that the remote slot will precede the local one. And if it's running > > > on the standby, then the comments in > > > update_and_persist_local_synced_slot about waiting for the remote side > > > to catch up seem quite confusing, because surely we're chasing the > > > primary and not the other way around? > > > > The local's restart_lsn could be ahead of than primary's for the very > > first sync when the WAL corresponding to the remote's restart_lsn is > > not available on standby (say due to a different wal related settings > > the required WAL has been removed when we first time tried to sync the > > slot). For more details, you can refer to comments atop slotsync.c > > starting from "If the WAL corresponding to the remote's restart_lsn > > ..." > > So why do we log a message about this? > This was added after the main commit of this functionality to find some BF failures (where we were expecting the slot to sync but due to one of these conditions not being met the slot was not synced) and we can probably change it to DEBUG1 as well. I think we would need this information w.r.t this functionality to gather more information in case expected slots are not being synced and it may be helpful for users to also know why the slots are not synced, if that happens. -- With Regards, Amit Kapila.
On Tue, Feb 20, 2024 at 4:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > So why do we log a message about this? > > This was added after the main commit of this functionality to find > some BF failures (where we were expecting the slot to sync but due to > one of these conditions not being met the slot was not synced) and we > can probably change it to DEBUG1 as well. I think we would need this > information w.r.t this functionality to gather more information in > case expected slots are not being synced and it may be helpful for > users to also know why the slots are not synced, if that happens. Ah, OK. Do you think we need any kind of system view to provide more insight here or is a log message sufficient? -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Feb 20, 2024 at 4:50 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Feb 20, 2024 at 4:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > So why do we log a message about this? > > > > This was added after the main commit of this functionality to find > > some BF failures (where we were expecting the slot to sync but due to > > one of these conditions not being met the slot was not synced) and we > > can probably change it to DEBUG1 as well. I think we would need this > > information w.r.t this functionality to gather more information in > > case expected slots are not being synced and it may be helpful for > > users to also know why the slots are not synced, if that happens. > > Ah, OK. Do you think we need any kind of system view to provide more > insight here or is a log message sufficient? > We do expose the required information (restart_lsn, catalog_xmin, synced, temporary, etc) via pg_replication_slots. So, I feel the LOG message here is sufficient to DEBUG (or know the details) when the slot sync doesn't succeed. -- With Regards, Amit Kapila.
On Tue, Feb 20, 2024 at 5:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > We do expose the required information (restart_lsn, catalog_xmin, > synced, temporary, etc) via pg_replication_slots. So, I feel the LOG > message here is sufficient to DEBUG (or know the details) when the > slot sync doesn't succeed. > Please find the patch having the suggested changes. thanks Shveta
Attachment
On Tue, Feb 20, 2024 at 7:25 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Feb 20, 2024 at 5:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > We do expose the required information (restart_lsn, catalog_xmin, > > synced, temporary, etc) via pg_replication_slots. So, I feel the LOG > > message here is sufficient to DEBUG (or know the details) when the > > slot sync doesn't succeed. > > > > Please find the patch having the suggested changes. > LGTM. I'll push this tomorrow unless someone has further comments/suggestions. -- With Regards, Amit Kapila.