Thread: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
From
Bharath Rupireddy
Date:
Hi, The active_pid of ReplicationSlot structure, which tells whether a replication slot is active or inactive, isn't persisted to the disk i.e has no entry in ReplicationSlotPersistentData structure. Isn't it better if we add that info to ReplicationSlotPersistentData structure and persist to the disk? This will help to know what were the inactive replication slots in case the server goes down or crashes for some reason. Currently, we don't have a way to interpret the replication slot info in the disk but there's a patch for pg_replslotdata tool at [1]. This way, one can figure out the reasons for the server down/crash and figure out which replication slots to remove to bring the server up and running without touching the other replication slots. Thoughts? [1] - https://www.postgresql.org/message-id/CALj2ACW0rV5gWK8A3m6_X62qH%2BVfaq5hznC%3Di0R5Wojt5%2Byhyw%40mail.gmail.com Regards, Bharath Rupireddy.
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
From
Alvaro Herrera
Date:
On 2021-Dec-01, Bharath Rupireddy wrote: > The active_pid of ReplicationSlot structure, which tells whether a > replication slot is active or inactive, isn't persisted to the disk > i.e has no entry in ReplicationSlotPersistentData structure. Isn't it > better if we add that info to ReplicationSlotPersistentData structure > and persist to the disk? This will help to know what were the inactive > replication slots in case the server goes down or crashes for some > reason. Currently, we don't have a way to interpret the replication > slot info in the disk but there's a patch for pg_replslotdata tool at > [1]. This way, one can figure out the reasons for the server > down/crash and figure out which replication slots to remove to bring > the server up and running without touching the other replication > slots. I think the PIDs are log-worthy for sure, but it's not clear to me that it is desirable to write them to the persistent state file. In case of crashes, the log should serve just fine to aid root cause investigation -- in fact even better than the persistent file, where the data would be lost as soon as the next client acquires that slot. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "The Postgresql hackers have what I call a "NASA space shot" mentality. Quite refreshing in a world of "weekend drag racer" developers." (Scott Marlowe)
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
From
Bharath Rupireddy
Date:
On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2021-Dec-01, Bharath Rupireddy wrote: > > > The active_pid of ReplicationSlot structure, which tells whether a > > replication slot is active or inactive, isn't persisted to the disk > > i.e has no entry in ReplicationSlotPersistentData structure. Isn't it > > better if we add that info to ReplicationSlotPersistentData structure > > and persist to the disk? This will help to know what were the inactive > > replication slots in case the server goes down or crashes for some > > reason. Currently, we don't have a way to interpret the replication > > slot info in the disk but there's a patch for pg_replslotdata tool at > > [1]. This way, one can figure out the reasons for the server > > down/crash and figure out which replication slots to remove to bring > > the server up and running without touching the other replication > > slots. > > I think the PIDs are log-worthy for sure, but it's not clear to me that > it is desirable to write them to the persistent state file. In case of > crashes, the log should serve just fine to aid root cause investigation > -- in fact even better than the persistent file, where the data would be > lost as soon as the next client acquires that slot. Thanks. +1 to log a message at LOG level whenever a replication slot becomes active (gets assigned a valid pid to active_pid) and becomes inactive(gets assigned 0 to active_pid). Having said that, isn't it also helpful if we write a bool (1 byte character) whenever the slot becomes active and inactive to the disk? Regards, Bharath Rupireddy.
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
From
Bharath Rupireddy
Date:
On Fri, Dec 3, 2021 at 7:39 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > On 2021-Dec-01, Bharath Rupireddy wrote: > > > > > The active_pid of ReplicationSlot structure, which tells whether a > > > replication slot is active or inactive, isn't persisted to the disk > > > i.e has no entry in ReplicationSlotPersistentData structure. Isn't it > > > better if we add that info to ReplicationSlotPersistentData structure > > > and persist to the disk? This will help to know what were the inactive > > > replication slots in case the server goes down or crashes for some > > > reason. Currently, we don't have a way to interpret the replication > > > slot info in the disk but there's a patch for pg_replslotdata tool at > > > [1]. This way, one can figure out the reasons for the server > > > down/crash and figure out which replication slots to remove to bring > > > the server up and running without touching the other replication > > > slots. > > > > I think the PIDs are log-worthy for sure, but it's not clear to me that > > it is desirable to write them to the persistent state file. In case of > > crashes, the log should serve just fine to aid root cause investigation > > -- in fact even better than the persistent file, where the data would be > > lost as soon as the next client acquires that slot. > > Thanks. +1 to log a message at LOG level whenever a replication slot > becomes active (gets assigned a valid pid to active_pid) and becomes > inactive(gets assigned 0 to active_pid). Having said that, isn't it > also helpful if we write a bool (1 byte character) whenever the slot > becomes active and inactive to the disk? Here's the patch that adds a LOG message whenever a replication slot becomes active and inactive. These logs will be extremely useful on production servers to debug and analyze inactive replication slot issues. Thoughts? Regards, Bharath Rupireddy.
Attachment
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
From
Kyotaro Horiguchi
Date:
At Tue, 14 Dec 2021 19:04:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Fri, Dec 3, 2021 at 7:39 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > I think the PIDs are log-worthy for sure, but it's not clear to me that > > > it is desirable to write them to the persistent state file. In case of > > > crashes, the log should serve just fine to aid root cause investigation > > > -- in fact even better than the persistent file, where the data would be > > > lost as soon as the next client acquires that slot. > > > > Thanks. +1 to log a message at LOG level whenever a replication slot > > becomes active (gets assigned a valid pid to active_pid) and becomes > > inactive(gets assigned 0 to active_pid). Having said that, isn't it > > also helpful if we write a bool (1 byte character) whenever the slot > > becomes active and inactive to the disk? > > Here's the patch that adds a LOG message whenever a replication slot > becomes active and inactive. These logs will be extremely useful on > production servers to debug and analyze inactive replication slot > issues. > > Thoughts? If I create a replication slot, I saw the following lines in server log. [22054:client backend] LOG: replication slot "s1" becomes active [22054:client backend] DETAIL: The process with PID 22054 acquired it. [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1'); [22054:client backend] LOG: replication slot "s1" becomes inactive [22054:client backend] DETAIL: The process with PID 22054 released it. [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1'); They are apparently too much as if they were DEBUG3 lines. The process PID shown is of the process the slot operations took place so I think it conveys no information. The STATEMENT lines are also noisy for non-ERROR emssages. Couldn't we hide that line? That is, how about making the log lines as simple as the follows? [17156:walsender] LOG: acquired replication slot "s1" [17156:walsender] LOG: released replication slot "s1" I think in the first place we don't need this log lines at slot creation since it is actually not acquirement nor releasing of a slot. It behaves in a strange way when executing pg_basebackup. [22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes active [22864:walsender] DETAIL: The process with PID 22864 acquired it. [22864:walsender] STATEMENT: CREATE_REPLICATION_SLOT "pg_basebackup_22864" TEMPORARY PHYSICAL ( RESERVE_WAL) [22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes active [22864:walsender] DETAIL: The process with PID 22864 acquired it. [22864:walsender] STATEMENT: START_REPLICATION SLOT "pg_basebackup_22864" 0/6000000 TIMELINE 1 [22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes inactive [22864:walsender] DETAIL: The process with PID 22864 released it. The slot is acquired twice then released once. It is becuase the patch doesn't emit "becomes inactive" line when releasing a temporary slot. However, I'd rather think we don't need the first 'become active' line like the previous example. @@ -658,6 +690,13 @@ ReplicationSlotDropPtr(ReplicationSlot *slot) slot->active_pid = 0; slot->in_use = false; LWLockRelease(ReplicationSlotControlLock); + + if (pid > 0) + ereport(LOG, + (errmsg("replication slot \"%s\" becomes inactive", + NameStr(slot->data.name)), + errdetail("The process with PID %d released it.", pid))); + This is wrong. I see a "become inactive" message if I droped an "inactive" replication slot. The reason the inactive slot looks as if it were acquired is it is temporarily aquired as a preparing step of dropping. Even assuming that the log lines are simplified to this extent, I still see it a bit strange that the "becomes acitve (or acruied)" message shows alone without having a message like "replication connection accepted". But that would be another issue even if it is true. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
From
Bharath Rupireddy
Date:
On Wed, Dec 15, 2021 at 8:32 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > Here's the patch that adds a LOG message whenever a replication slot > > becomes active and inactive. These logs will be extremely useful on > > production servers to debug and analyze inactive replication slot > > issues. > > > > Thoughts? > > If I create a replication slot, I saw the following lines in server log. > > [22054:client backend] LOG: replication slot "s1" becomes active > [22054:client backend] DETAIL: The process with PID 22054 acquired it. > [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1'); > [22054:client backend] LOG: replication slot "s1" becomes inactive > [22054:client backend] DETAIL: The process with PID 22054 released it. > [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1'); > > They are apparently too much as if they were DEBUG3 lines. The > process PID shown is of the process the slot operations took place so > I think it conveys no information. The STATEMENT lines are also noisy > for non-ERROR emssages. Couldn't we hide that line? > > That is, how about making the log lines as simple as the follows? > > [17156:walsender] LOG: acquired replication slot "s1" > [17156:walsender] LOG: released replication slot "s1" Thanks for taking a look at the patch. Here's what I've come up with: for drops: 2021-12-28 06:39:34.963 UTC [2541600] LOG: acquired persistent physical replication slot "myslot1" 2021-12-28 06:39:34.980 UTC [2541600] LOG: dropped persistent physical replication slot "myslot1" 2021-12-28 06:47:39.994 UTC [2544153] LOG: acquired persistent logical replication slot "myslot2" 2021-12-28 06:47:40.003 UTC [2544153] LOG: dropped persistent logical replication slot "myslot2" for creates: 2021-12-28 06:39:46.859 UTC [2541600] LOG: created persistent physical replication slot "myslot1" 2021-12-28 06:39:46.859 UTC [2541600] LOG: released persistent physical replication slot "myslot1" 2021-12-28 06:45:20.037 UTC [2544153] LOG: created persistent logical replication slot "myslot2" 2021-12-28 06:45:20.058 UTC [2544153] LOG: released persistent logical replication slot "myslot2" for pg_basebackup: 2021-12-28 06:41:04.601 UTC [2542686] LOG: created temporary physical replication slot "pg_basebackup_2542686" 2021-12-28 06:41:04.602 UTC [2542686] LOG: released temporary physical replication slot "pg_basebackup_2542686" 2021-12-28 06:41:04.602 UTC [2542686] LOG: acquired temporary physical replication slot "pg_basebackup_2542686" 2021-12-28 06:41:04.867 UTC [2542686] LOG: released temporary physical replication slot "pg_basebackup_2542686" 2021-12-28 06:41:04.954 UTC [2542686] LOG: dropped temporary physical replication slot "pg_basebackup_2542686" The amount of logs may seem noisy, but they do help a lot given the fact that the server generates much more noise, for instance, the server logs the syntax error statements. And, the replication slots don't get created/dropped every now and then (at max, the pg_basebackup if at all used and configured to take the backups, say, every 24hrs or so). With the above set of logs debugging for inactive replication slots becomes easier. One can find the root cause and answer questions like "why there was a huge WAL file growth at some point or when did a replication slot become inactive or how much time a replication slot was inactive or when did a standby disconnected and connected again or when did a pg_receivewal or pg_recvlogical connected and disconnected so on.". Here's the v2 patch. Please provide your thoughts. Regards, Bharath Rupireddy.
Attachment
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
From
Kyotaro Horiguchi
Date:
At Tue, 28 Dec 2021 12:28:07 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Wed, Dec 15, 2021 at 8:32 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > Here's the patch that adds a LOG message whenever a replication slot > > > becomes active and inactive. These logs will be extremely useful on > > > production servers to debug and analyze inactive replication slot > > > issues. > > > > > > Thoughts? > > > > If I create a replication slot, I saw the following lines in server log. > > > > [22054:client backend] LOG: replication slot "s1" becomes active > > [22054:client backend] DETAIL: The process with PID 22054 acquired it. > > [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1'); > > [22054:client backend] LOG: replication slot "s1" becomes inactive > > [22054:client backend] DETAIL: The process with PID 22054 released it. > > [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1'); > > > > They are apparently too much as if they were DEBUG3 lines. The > > process PID shown is of the process the slot operations took place so > > I think it conveys no information. The STATEMENT lines are also noisy > > for non-ERROR emssages. Couldn't we hide that line? > > > > That is, how about making the log lines as simple as the follows? > > > > [17156:walsender] LOG: acquired replication slot "s1" > > [17156:walsender] LOG: released replication slot "s1" > > Thanks for taking a look at the patch. Here's what I've come up with: > > for drops: (two log lines per slot: acquire->drop) > > for creates: (two log lines per slot: create->release) .. Theses are still needlessly verbose. Even for those who want slot activities to be logged are not interested in this detail. This is still debug logs in that sense. > The amount of logs may seem noisy, but they do help a lot given the > fact that the server generates much more noise, for instance, the > server logs the syntax error statements. And, the replication slots > don't get created/dropped every now and then (at max, the > pg_basebackup if at all used and configured to take the backups, say, > every 24hrs or so). With the above set of logs debugging for inactive In a nearby thread, there was a discussion that checkpoint logs are too noisy to defaultly turn on. Finally it is committed but I don't think this is committed as is as it is more verbose (IMV) than the checkpoint logs. Thus this logs need to be muteable. Meanwhile I don't think we willingly add a new knob for this feature. I think we can piggy-back on log_replication_commands for the purpose, changing its meaning slightly to "log replication commands and related activity". > replication slots becomes easier. One can find the root cause and > answer questions like "why there was a huge WAL file growth at some > point or when did a replication slot become inactive or how much time > a replication slot was inactive or when did a standby disconnected and > connected again or when did a pg_receivewal or pg_recvlogical > connected and disconnected so on.". I don't deny it is useful in that cases. If you are fine that the logs are debug-only, making them DEBUG1 would work. But I don't think you are fine with that since I think you are going to turn on them on production systems. > Here's the v2 patch. Please provide your thoughts. Thanks! I have three comments on this version. - I still think "acquire/release" logs on creation/dropping should be silenced. Adding the third parameter to ReplicationSlotAcquire() that can mute the acquiring (and as well as the corresponding releasing) log will work. - Need to mute the logs by log_replication_commands. (We could add another choice for the variable for this purpose but I think we don't need it.) - The messages are not translatable as the variable parts are adjectives. They need to consist of static sentences. The combinations of the two properties are 6 (note that persistence is tristate) but I don't think we accept that complexity for the information. So I recommend to just remove the variable parts from the messages. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
From
Bharath Rupireddy
Date:
On Wed, Jan 5, 2022 at 12:13 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> > Here's the v2 patch. Please provide your thoughts.
>
> Thanks! I have three comments on this version.
Thanks for your review.
> - I still think "acquire/release" logs on creation/dropping should be
> silenced. Adding the third parameter to ReplicationSlotAcquire()
> that can mute the acquiring (and as well as the corresponding
> releasing) log will work.
Done.
> can piggy-back on log_replication_commands for the purpose, changing
> its meaning slightly to "log replication commands and related
> activity".
> - Need to mute the logs by log_replication_commands. (We could add
> another choice for the variable for this purpose but I think we
> don't need it.)
Done.
> - The messages are not translatable as the variable parts are
> adjectives. They need to consist of static sentences. The
> combinations of the two properties are 6 (note that persistence is
> tristate) but I don't think we accept that complexity for the
> information. So I recommend to just remove the variable parts from
> the messages.
Done.
Here's v3, please review it further.
Regards,
Bharath Rupireddy.
> > Here's the v2 patch. Please provide your thoughts.
>
> Thanks! I have three comments on this version.
Thanks for your review.
> - I still think "acquire/release" logs on creation/dropping should be
> silenced. Adding the third parameter to ReplicationSlotAcquire()
> that can mute the acquiring (and as well as the corresponding
> releasing) log will work.
Done.
> can piggy-back on log_replication_commands for the purpose, changing
> its meaning slightly to "log replication commands and related
> activity".
> - Need to mute the logs by log_replication_commands. (We could add
> another choice for the variable for this purpose but I think we
> don't need it.)
Done.
> - The messages are not translatable as the variable parts are
> adjectives. They need to consist of static sentences. The
> combinations of the two properties are 6 (note that persistence is
> tristate) but I don't think we accept that complexity for the
> information. So I recommend to just remove the variable parts from
> the messages.
Done.
Here's v3, please review it further.
Regards,
Bharath Rupireddy.
Attachment
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
From
Bharath Rupireddy
Date:
On Mon, Jan 10, 2022 at 6:50 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Jan 5, 2022 at 12:13 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > Here's the v2 patch. Please provide your thoughts. > > > > Thanks! I have three comments on this version. > > Thanks for your review. > > > - I still think "acquire/release" logs on creation/dropping should be > > silenced. Adding the third parameter to ReplicationSlotAcquire() > > that can mute the acquiring (and as well as the corresponding > > releasing) log will work. > > Done. > > > can piggy-back on log_replication_commands for the purpose, changing > > its meaning slightly to "log replication commands and related > > activity". > > - Need to mute the logs by log_replication_commands. (We could add > > another choice for the variable for this purpose but I think we > > don't need it.) > > Done. > > > - The messages are not translatable as the variable parts are > > adjectives. They need to consist of static sentences. The > > combinations of the two properties are 6 (note that persistence is > > tristate) but I don't think we accept that complexity for the > > information. So I recommend to just remove the variable parts from > > the messages. > > Done. > > Here's v3, please review it further. Here's the rebased v4 patch. Regards, Bharath Rupireddy.
Attachment
add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Sun, Feb 27, 2022 at 3:16 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Jan 10, 2022 at 6:50 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Wed, Jan 5, 2022 at 12:13 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > Here's the v2 patch. Please provide your thoughts. > > > > > > Thanks! I have three comments on this version. > > > > Thanks for your review. > > > > > - I still think "acquire/release" logs on creation/dropping should be > > > silenced. Adding the third parameter to ReplicationSlotAcquire() > > > that can mute the acquiring (and as well as the corresponding > > > releasing) log will work. > > > > Done. > > > > > can piggy-back on log_replication_commands for the purpose, changing > > > its meaning slightly to "log replication commands and related > > > activity". > > > - Need to mute the logs by log_replication_commands. (We could add > > > another choice for the variable for this purpose but I think we > > > don't need it.) > > > > Done. > > > > > - The messages are not translatable as the variable parts are > > > adjectives. They need to consist of static sentences. The > > > combinations of the two properties are 6 (note that persistence is > > > tristate) but I don't think we accept that complexity for the > > > information. So I recommend to just remove the variable parts from > > > the messages. > > > > Done. > > > > Here's v3, please review it further. > > Here's the rebased v4 patch. Here's the rebased v5 patch. Regards, Bharath Rupireddy.
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Alvaro Herrera
Date:
If I read the code right, this patch emits logs when pg_logical_slot_get_changes and pg_replication_slot_advance SQL functions are called. Is this desirable/useful, for the use case you stated at start of thread? I think it is most likely pointless. If you get rid of those, then the only acquisitions that would log messages are those in StartReplication and StartLogicalReplication. So I wonder if it would be better to leave the API of ReplicationSlotAcquire() alone, and instead make StartReplication and StartLogicalReplication responsible for those messages. I didn't look at the release-side messages you're adding, but I suppose it should be symmetrical. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "El hombre nunca sabe de lo que es capaz hasta que lo intenta" (C. Dickens)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Fri, Apr 29, 2022 at 4:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > If I read the code right, this patch emits logs when > pg_logical_slot_get_changes and pg_replication_slot_advance SQL > functions are called. Is this desirable/useful, for the use case you > stated at start of thread? I think it is most likely pointless. If you > get rid of those, then the only acquisitions that would log messages are > those in StartReplication and StartLogicalReplication. So I wonder if > it would be better to leave the API of ReplicationSlotAcquire() alone, > and instead make StartReplication and StartLogicalReplication > responsible for those messages. > > I didn't look at the release-side messages you're adding, but I suppose > it should be symmetrical. Adding the messages right after ReplicationSlotAcquire in the StartReplication and StartLogicalReplication looks okay, but, if we just add "released replication slot \"%s\" for logical/physical replication". in StartReplication and StartLogicalReplication right after ReplicationSlotRelease, how about ReplicationSlotRelease in WalSndErrorCleanup and ReplicationSlotShmemExit? The whole idea is to get to know when and how much time a slot was inactive/unused when log_replication_commands is set to true. We can think of adding a timestamp column to on-disk replication slot data but that will be too much. Regards, Bharath Rupireddy.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Thu, May 5, 2022 at 2:57 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Fri, Apr 29, 2022 at 4:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > If I read the code right, this patch emits logs when > > pg_logical_slot_get_changes and pg_replication_slot_advance SQL > > functions are called. Is this desirable/useful, for the use case you > > stated at start of thread? I think it is most likely pointless. If you > > get rid of those, then the only acquisitions that would log messages are > > those in StartReplication and StartLogicalReplication. So I wonder if > > it would be better to leave the API of ReplicationSlotAcquire() alone, > > and instead make StartReplication and StartLogicalReplication > > responsible for those messages. > > > > I didn't look at the release-side messages you're adding, but I suppose > > it should be symmetrical. Here's the v6 patch, a much simpler one - no changes to any of the existing function APIs. Please see the sample logs at [1]. There's a bit of duplicate code in the v6 patch, if the overall approach looks okay, I can remove that too in the next version of the patch. Thoughts? [1] 2022-07-25 12:30:14.847 UTC [152873] LOG: acquired physical replication slot "foo" 2022-07-25 12:30:20.878 UTC [152873] LOG: released physical replication slot "foo" 2022-07-25 12:49:18.023 UTC [168738] LOG: acquired logical replication slot "bar" 2022-07-25 12:49:28.105 UTC [168738] LOG: released logical replication slot "bar" Regards, Bharath Rupireddy.
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Mon, Jul 25, 2022 at 6:31 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > Here's the v6 patch, a much simpler one - no changes to any of the > existing function APIs. Please see the sample logs at [1]. There's a > bit of duplicate code in the v6 patch, if the overall approach looks > okay, I can remove that too in the next version of the patch. I modified the log_replication_commands description in guc_tables.c. Please review the v7 patch further. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Amit Kapila
Date:
On Thu, Sep 15, 2022 at 10:39 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Jul 25, 2022 at 6:31 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > Here's the v6 patch, a much simpler one - no changes to any of the > > existing function APIs. Please see the sample logs at [1]. There's a > > bit of duplicate code in the v6 patch, if the overall approach looks > > okay, I can remove that too in the next version of the patch. > > I modified the log_replication_commands description in guc_tables.c. > Please review the v7 patch further. > I see that you have modified the patch to address the comments from Alvaro. Personally, I feel it would be better to add such a message at a centralized location instead of spreading these in different callers of slot acquire/release functionality to avoid getting these missed in the new callers in the future. However, if Alvaro and others think that the current style is better then we should go ahead and do it that way. I hope that we should be able to decide on this and get it into PG16. Anyone else would like to weigh in here? -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Peter Smith
Date:
Hi, I had a quick look at the v7 patch. You might consider to encapsulate some of this logic in new functions like: void LogReplicationSlotAquired(bool is_physical, char *slotname) { loglevel = log_replication_commands ? LOG : DEBUG3; if (is_physical) ereport(loglevel, (errmsg("acquired physical replication slot \"%s\"", slotname))); else ereport(loglevel, (errmsg("acquired logical replication slot \"%s\"", slotname))); } void LogReplicationSlotReleased(bool is_physical, char *slotname) { loglevel = log_replication_commands ? LOG : DEBUG3; if (is_physical) ereport(loglevel, (errmsg("released physical replication slot \"%s\"", slotname))); else ereport(loglevel, (errmsg("released logical replication slot \"%s\"", slotname))); } ~~ THEN ReplicationSlotShmemExit and WalSndErrorCleanup can call it like: if (MyReplicationSlot != NULL) { bool is_physical = SlotIsPhysical(MyReplicationSlot); char *slotname = pstrdup(NameStr(MyReplicationSlot->data.name)); ReplicationSlotRelease(); LogReplicationSlotReleased(is_physical, slotname); } ~ StartlReplication can call like: LogReplicationSlotAquired(true, cmd->slotname); ... LogReplicationSlotReleased(true, cmd->slotname); ~ StartLogicalReplication can call like: LogReplicationSlotAquired(false, cmd->slotname); ... LogReplicationSlotReleased(false, cmd->slotname); ~~~ TBH, I am not sure for the *current* code if the encapsulation is worth the trouble or not. But maybe at least it helps message consistency and will make it easier if future callers are needed. I guess those functions could also give you some central point to comment the intent of this logging? Feel free to take or leave this code suggestion. ------ Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Alvaro Herrera
Date:
On 2023-Mar-22, Amit Kapila wrote: > I see that you have modified the patch to address the comments from > Alvaro. Personally, I feel it would be better to add such a message at > a centralized location instead of spreading these in different callers > of slot acquire/release functionality to avoid getting these missed in > the new callers in the future. However, if Alvaro and others think > that the current style is better then we should go ahead and do it > that way. I hope that we should be able to decide on this and get it > into PG16. Anyone else would like to weigh in here? I like Peter Smith's suggestion downthread. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Thu, Mar 23, 2023 at 3:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2023-Mar-22, Amit Kapila wrote: > > > I see that you have modified the patch to address the comments from > > Alvaro. Personally, I feel it would be better to add such a message at > > a centralized location instead of spreading these in different callers > > of slot acquire/release functionality to avoid getting these missed in > > the new callers in the future. However, if Alvaro and others think > > that the current style is better then we should go ahead and do it > > that way. I hope that we should be able to decide on this and get it > > into PG16. Anyone else would like to weigh in here? > > I like Peter Smith's suggestion downthread. +1. Please review the attached v8 patch further. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Peter Smith
Date:
On Fri, Mar 24, 2023 at 4:33 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Mar 23, 2023 at 3:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > On 2023-Mar-22, Amit Kapila wrote: > > > > > I see that you have modified the patch to address the comments from > > > Alvaro. Personally, I feel it would be better to add such a message at > > > a centralized location instead of spreading these in different callers > > > of slot acquire/release functionality to avoid getting these missed in > > > the new callers in the future. However, if Alvaro and others think > > > that the current style is better then we should go ahead and do it > > > that way. I hope that we should be able to decide on this and get it > > > into PG16. Anyone else would like to weigh in here? > > > > I like Peter Smith's suggestion downthread. > > +1. Please review the attached v8 patch further. > Patch v8 applied OK, and builds/renders the HTML docs OK, and passes the regression and subscription TAP tests OK. (Note - I didn't do any additional manual testing, and I've assumed it to be covering all the necessary acquire/related logging that you wanted). ~~ Here are some minor comments: 1. + ereport(log_replication_commands ? LOG : DEBUG3, + (errmsg("acquired physical replication slot \"%s\"", + slotname))); AFAIK those extra parentheses wrapping the "errmsg" part are not necessary. ~~ 2. extern void LogReplicationSlotAquired(bool is_physical, char *slotname); extern void LogReplicationSlotReleased(bool is_physical, char *slotname); The "char *slotname" params of those helper functions should probably be declared and defined as "const char *slotname". ~~ Otherwise, from a code review perspective the patch v8 LGTM. ------ Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
"Euler Taveira"
Date:
On Thu, Mar 23, 2023, at 2:33 PM, Bharath Rupireddy wrote:
On Thu, Mar 23, 2023 at 3:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:>> On 2023-Mar-22, Amit Kapila wrote:>> > I see that you have modified the patch to address the comments from> > Alvaro. Personally, I feel it would be better to add such a message at> > a centralized location instead of spreading these in different callers> > of slot acquire/release functionality to avoid getting these missed in> > the new callers in the future. However, if Alvaro and others think> > that the current style is better then we should go ahead and do it> > that way. I hope that we should be able to decide on this and get it> > into PG16. Anyone else would like to weigh in here?>> I like Peter Smith's suggestion downthread.+1. Please review the attached v8 patch further.
If you are adding separate functions as suggested, you should add a comment at
the top of ReplicationSlotAcquire() and ReplicationSlotRelease() functions
saying that LogReplicationSlotAquired() and LogReplicationSlotReleased()
functions should be called respectively after it.
My suggestion is that the functions should have the same name with a "Log"
prefix. On of them has a typo "Aquired" in its name. Hence,
LogReplicationSlotAcquire() and LogReplicationSlotRelease() as names. It is
easier to find if someone is grepping by the origin function.
I prefer a sentence that includes a verb.
physical replication slot \"%s\" is acquired
logical replication slot \"%s\" is released
Isn't the PID important for this use case? If so, of course, you can rely on
log_line_prefix (%p) but if the PID is crucial for an investigation then it
should also be included in the message.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Fri, Mar 24, 2023 at 3:11 AM Euler Taveira <euler@eulerto.com> wrote: > > If you are adding separate functions as suggested, you should add a comment at > the top of ReplicationSlotAcquire() and ReplicationSlotRelease() functions > saying that LogReplicationSlotAquired() and LogReplicationSlotReleased() > functions should be called respectively after it. Done. > My suggestion is that the functions should have the same name with a "Log" > prefix. On of them has a typo "Aquired" in its name. Hence, > LogReplicationSlotAcquire() and LogReplicationSlotRelease() as names. It is > easier to find if someone is grepping by the origin function. Done. > I prefer a sentence that includes a verb. > > physical replication slot \"%s\" is acquired > logical replication slot \"%s\" is released Hm, changed for now. But I'll leave it to the committer's discretion. > Isn't the PID important for this use case? If so, of course, you can rely on > log_line_prefix (%p) but if the PID is crucial for an investigation then it > should also be included in the message. On Fri, Mar 24, 2023 at 3:10 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Patch v8 applied OK, and builds/renders the HTML docs OK, and passes > the regression and subscription TAP tests OK. > > Here are some minor comments: > > 1. > + ereport(log_replication_commands ? LOG : DEBUG3, > + (errmsg("acquired physical replication slot \"%s\"", > + slotname))); > > AFAIK those extra parentheses wrapping the "errmsg" part are not necessary. Done > 2. > extern void LogReplicationSlotAquired(bool is_physical, char *slotname); > extern void LogReplicationSlotReleased(bool is_physical, char *slotname); > > The "char *slotname" params of those helper functions should probably > be declared and defined as "const char *slotname". Done. > Otherwise, from a code review perspective the patch v8 LGTM. Thanks. Please have a look at the v9 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Greg Stark
Date:
On Thu, 23 Mar 2023 at 23:30, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > > + ereport(log_replication_commands ? LOG : DEBUG3, > > + (errmsg("acquired physical replication slot \"%s\"", > > + slotname))); So this is just a bit of bike-shedding but I don't feel like these log messages really meet the standard we set for our logging. Like what did the acquiring? What does "acquired" actually mean for a replication slot? Is there not any meta information about the acquisition that can give more context to the reader to make this message more meaningful? I would expect a log message like this to say, I dunno, something like "physical replication slot \"%s\" acquired by streaming TCP connection to 192.168.0.1:999 at LSN ... with xxxMB of logs to read" I even would be wondering if the other end shouldn't also be logging a corresponding log and we shouldn't be going out of our way to ensure there's enough information to match them up and presenting them in a way that makes that easy. -- greg
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Alvaro Herrera
Date:
On 2023-Mar-23, Greg Stark wrote: > On Thu, 23 Mar 2023 at 23:30, Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > + ereport(log_replication_commands ? LOG : DEBUG3, > > > + (errmsg("acquired physical replication slot \"%s\"", > > > + slotname))); > > So this is just a bit of bike-shedding but I don't feel like these log > messages really meet the standard we set for our logging. Like what > did the acquiring? What does "acquired" actually mean for a > replication slot? Is there not any meta information about the > acquisition that can give more context to the reader to make this > message more meaningful? > > I would expect a log message like this to say, I dunno, something like > "physical replication slot \"%s\" acquired by streaming TCP connection > to 192.168.0.1:999 at LSN ... with xxxMB of logs to read" Hmm, I don't disagree with your argument in principle, but I think this proposal is going too far. I think stating the PID is more than sufficient. And I don't think we need this patch to go great lengths to explain what acquisition is, either; I mean, maybe that's a good thing to have, but then that's a different patch. > I even would be wondering if the other end shouldn't also be logging a > corresponding log and we shouldn't be going out of our way to ensure > there's enough information to match them up and presenting them in a > way that makes that easy. Hmm, you should be able to match things using the connection information. I don't think the slot acquisition operation in itself is that important. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Use it up, wear it out, make it do, or do without"
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Fri, Mar 24, 2023 at 3:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2023-Mar-23, Greg Stark wrote: > > > On Thu, 23 Mar 2023 at 23:30, Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > > + ereport(log_replication_commands ? LOG : DEBUG3, > > > > + (errmsg("acquired physical replication slot \"%s\"", > > > > + slotname))); > > > > So this is just a bit of bike-shedding but I don't feel like these log > > messages really meet the standard we set for our logging. Like what > > did the acquiring? What does "acquired" actually mean for a > > replication slot? Is there not any meta information about the > > acquisition that can give more context to the reader to make this > > message more meaningful? > > > > I would expect a log message like this to say, I dunno, something like > > "physical replication slot \"%s\" acquired by streaming TCP connection > > to 192.168.0.1:999 at LSN ... with xxxMB of logs to read" > > Hmm, I don't disagree with your argument in principle, but I think this > proposal is going too far. I think stating the PID is more than > sufficient. Do you mean to have something like "physical/logical replication slot \"%s\" is released/acquired by PID %d", MyProcPid? If yes, the log_line_prefix already contains PID right? Or do we want to cover the cases when someone changes log_line_prefix to not contain PID? > And I don't think we need this patch to go great lengths to > explain what acquisition is, either; I mean, maybe that's a good thing > to have, but then that's a different patch. > > > I even would be wondering if the other end shouldn't also be logging a > > corresponding log and we shouldn't be going out of our way to ensure > > there's enough information to match them up and presenting them in a > > way that makes that easy. > > Hmm, you should be able to match things using the connection > information. I don't think the slot acquisition operation in itself is > that important. Yeah, the intention of the patch is to track the patterns of slot acquisitions and releases to aid analysis. Of course, this information alone may not help but when matched with others in the logs, it will. The v9 patch was failing because I was using MyReplicationSlot after it got reset by slot release, attached v10 patch fixes it. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Amit Kapila
Date:
On Mon, Mar 27, 2023 at 11:08 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > The v9 patch was failing because I was using MyReplicationSlot after > it got reset by slot release, attached v10 patch fixes it. > + * + * Note: use LogReplicationSlotAcquire() if needed, to log a message after + * acquiring the replication slot. */ void ReplicationSlotAcquire(const char *name, bool nowait) @@ -542,6 +554,9 @@ retry: When does it need to be logged? For example, recently, we added one more slot acquisition/release call in binary_upgrade_logical_slot_has_caught_up(); it is not clear from the comments whether we need to LOG it or not. I guess at some place like atop LogReplicationSlotAcquire() we should document in a bit more specific way as to when is this expected to be called. -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Peter Smith
Date:
On Wed, Nov 1, 2023 at 2:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Mar 27, 2023 at 11:08 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > The v9 patch was failing because I was using MyReplicationSlot after > > it got reset by slot release, attached v10 patch fixes it. > > > > + * > + * Note: use LogReplicationSlotAcquire() if needed, to log a message after > + * acquiring the replication slot. > */ > void > ReplicationSlotAcquire(const char *name, bool nowait) > @@ -542,6 +554,9 @@ retry: > > When does it need to be logged? For example, recently, we added one > more slot acquisition/release call in > binary_upgrade_logical_slot_has_caught_up(); it is not clear from the > comments whether we need to LOG it or not. I guess at some place like > atop LogReplicationSlotAcquire() we should document in a bit more > specific way as to when is this expected to be called. > I agree. Just saying "if needed" in those function comments doesn't help with knowing how to judge when logging is needed or not. ~ Looking back at the thread history it seems the function comment was added after Euler [1] suggested ("... you should add a comment at the top of ReplicationSlotAcquire() and ReplicationSlotRelease() functions saying that LogReplicationSlotAquired() and LogReplicationSlotReleased() functions should be called respectively after it.") But that's not quite compatible with what Alvaro [2] had written long back ("... the only acquisitions that would log messages are those in StartReplication and StartLogicalReplication."). In other words, ReplicationSlotAcquire/ReplicationSlotRelease is called by more places than you care to log from. ~ Adding a better explanatory comment than "if needed" will be good, and maybe that is all that is necessary. I'm not sure. OTOH, if you have to explain that logging is only wanted for a couple of scenarios, then it raises some doubts about the usefulness of having a common function in the first place. I had the same doubts back in March [3] ("I am not sure for the *current* code if the encapsulation is worth the trouble or not."). ====== [1] Euler - https://www.postgresql.org/message-id/c42d5634-ca9b-49a7-85cd-9eff9feb33b4%40app.fastmail.com [2] Alvaro - https://www.postgresql.org/message-id/202204291032.qfvyuqxkjnjw%40alvherre.pgsql [3] Peter - https://www.postgresql.org/message-id/CAHut%2BPu6Knwooc_NckMxszGrAJnytgpMadtoJ-OA-SFWT%2BGFYw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Austalia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Thu, Nov 2, 2023 at 7:19 AM Peter Smith <smithpb2250@gmail.com> wrote: > > But that's not quite compatible with what Alvaro [2] had written long > back ("... the only acquisitions that would log messages are those in > StartReplication and StartLogicalReplication."). > > In other words, ReplicationSlotAcquire/ReplicationSlotRelease is > called by more places than you care to log from. I refreshed my thoughts for this patch and I think it's enough if walsenders alone log messages when slots become active and inactive. How about something like the attached v11 patch? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Amit Kapila
Date:
On Sun, Nov 5, 2023 at 4:01 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Nov 2, 2023 at 7:19 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > But that's not quite compatible with what Alvaro [2] had written long > > back ("... the only acquisitions that would log messages are those in > > StartReplication and StartLogicalReplication."). > > > > In other words, ReplicationSlotAcquire/ReplicationSlotRelease is > > called by more places than you care to log from. > > I refreshed my thoughts for this patch and I think it's enough if > walsenders alone log messages when slots become active and inactive. > How about something like the attached v11 patch? > + * This function is currently used only in walsender. + */ +void +ReplicationSlotAcquireAndLog(const char *name, bool nowait) BTW, is the reason for using it only in walsender is that it is a background process and it is not very apparent whether the slot is created, and for foreground processes, it is a bit clear when the command is executed. If so, the other alternative is to either use a parameter to the existing function or directly use am_walsender flag to distinguish when to print the message in acquire/release calls. Can you please tell me the use case of this additional message? A few other minor comments: 1. + Causes each replication command and related activity to be logged in + the server log. Can we be bit more specific by changing to something like: "Causes each replication command and slot acquisition/release to be logged in the server log." 2. + ereport(log_replication_commands ? LOG : DEBUG1, + (errmsg("walsender process with PID %d acquired %s replication slot \"%s\"", It seems PID and process name is quite unlike what we print in other similar messages. For example, see below messages when we start replication via pub/sub : 2023-11-06 08:41:57.867 IST [24400] LOG: received replication command: CREATE_REPLICATION_SLOT "sub1" LOGICAL pgoutput (SNAPSHOT 'nothing') 2023-11-06 08:41:57.867 IST [24400] STATEMENT: CREATE_REPLICATION_SLOT "sub1" LOGICAL pgoutput (SNAPSHOT 'nothing') ... ... 2023-11-06 08:41:57.993 IST [22332] LOG: walsender process with PID 22332 acquired logical replication slot "sub1" 2023-11-06 08:41:57.993 IST [22332] STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names '"pub1"') ... ... 2023-11-06 08:41:58.015 IST [22332] LOG: starting logical decoding for slot "sub1" 2023-11-06 08:41:58.015 IST [22332] DETAIL: Streaming transactions committing after 0/1522730, reading WAL from 0/15226F8. 2023-11-06 08:41:58.015 IST [22332] STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names '"pub1"') We can get the PID from the log line as for other logs and I don't see the process name printed anywhere else. -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Peter Smith
Date:
Hi, FWIW, I used a small script to set up the following environment to help observe the log messages written by this patch. Node1 = primary; it creates a PUBLICATION Node2 = physical standby; the publication is replicated here Node3 = subscriber; it subscribes to the publication now on the standby Node2 ~ In the log files you can see: 1. The "physical" message from the walsender of the primary (Node1) e.g. 2023-11-06 18:59:58.094 AEDT [17091] LOG: walsender process with PID 17091 acquired physical replication slot "pg_basebackup_17091" 2. The "logical" message from the pub/sub walsender of the standby (Node2) e.g. 2023-11-06 19:16:29.053 AEDT [12774] LOG: walsender process with PID 12774 acquired logical replication slot "sub1" ~ PSA the test script and logs ====== Kind Regards, Peter Smith. Fujitsu Australia
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Mon, Nov 6, 2023 at 9:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Nov 5, 2023 at 4:01 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Thu, Nov 2, 2023 at 7:19 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > But that's not quite compatible with what Alvaro [2] had written long > > > back ("... the only acquisitions that would log messages are those in > > > StartReplication and StartLogicalReplication."). > > > > > > In other words, ReplicationSlotAcquire/ReplicationSlotRelease is > > > called by more places than you care to log from. > > > > I refreshed my thoughts for this patch and I think it's enough if > > walsenders alone log messages when slots become active and inactive. > > How about something like the attached v11 patch? > > > > + * This function is currently used only in walsender. > + */ > +void > +ReplicationSlotAcquireAndLog(const char *name, bool nowait) > > BTW, is the reason for using it only in walsender is that it is a > background process and it is not very apparent whether the slot is > created, and for foreground processes, it is a bit clear when the > command is executed. > > Can you please tell me the use case of this additional message? Replication slot acquisitions and releases by backends say when running pg_replication_slot_advance or pg_logical_slot_get_changes or pg_drop_replication_slot or pg_create_{physical, logical}_replication_slot are transient unlike walsenders which comparatively hold slots for longer durations. Therefore, I've added them only for walsenders. These messages help to know the lifetime of a replication slot - one can know how long a streaming standby or logical subscriber is down, IOW, how long a replication slot is inactive in production. For instance, the time between released and acquired slots in the below messages is the inactive replication slot duration. 2023-11-13 11:06:34.338 UTC [470262] LOG: acquired physical replication slot "sb_repl_slot" 2023-11-13 11:06:34.338 UTC [470262] STATEMENT: START_REPLICATION SLOT "sb_repl_slot" 0/3000000 TIMELINE 1 2023-11-13 11:09:24.918 UTC [470262] LOG: released physical replication slot "sb_repl_slot" 2023-11-13 12:01:40.530 UTC [470967] LOG: acquired physical replication slot "sb_repl_slot" 2023-11-13 12:01:40.530 UTC [470967] STATEMENT: START_REPLICATION SLOT "sb_repl_slot" 0/3000000 TIMELINE 1 > If so, the other alternative is to either use a > parameter to the existing function or directly use am_walsender flag > to distinguish when to print the message in acquire/release calls. Done that way. PSA v12. > A few other minor comments: > 1. > + Causes each replication command and related activity to be logged in > + the server log. > > Can we be bit more specific by changing to something like: "Causes > each replication command and slot acquisition/release to be logged in > the server log." Done. > 2. > + ereport(log_replication_commands ? LOG : DEBUG1, > + (errmsg("walsender process with PID %d acquired %s replication slot \"%s\"", > > It seems PID and process name is quite unlike what we print in other > similar messages. For example, see below messages when we start > replication via pub/sub : > > We can get the PID from the log line as for other logs and I don't see > the process name printed anywhere else. There was a comment upthread to have PID printed, but I agree to be consistent and changed the messages to be: acquired physical/logical replication slot "foo" and released physical/logical replication slot "foo". PSA v12 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Mon, Nov 13, 2023 at 5:43 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Nov 6, 2023 at 9:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sun, Nov 5, 2023 at 4:01 AM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > On Thu, Nov 2, 2023 at 7:19 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > But that's not quite compatible with what Alvaro [2] had written long > > > > back ("... the only acquisitions that would log messages are those in > > > > StartReplication and StartLogicalReplication."). > > > > > > > > In other words, ReplicationSlotAcquire/ReplicationSlotRelease is > > > > called by more places than you care to log from. > > > > > > I refreshed my thoughts for this patch and I think it's enough if > > > walsenders alone log messages when slots become active and inactive. > > > How about something like the attached v11 patch? > > > > > > > + * This function is currently used only in walsender. > > + */ > > +void > > +ReplicationSlotAcquireAndLog(const char *name, bool nowait) > > > > BTW, is the reason for using it only in walsender is that it is a > > background process and it is not very apparent whether the slot is > > created, and for foreground processes, it is a bit clear when the > > command is executed. > > > > Can you please tell me the use case of this additional message? > > Replication slot acquisitions and releases by backends say when > running pg_replication_slot_advance or pg_logical_slot_get_changes or > pg_drop_replication_slot or pg_create_{physical, > logical}_replication_slot are transient unlike walsenders which > comparatively hold slots for longer durations. Therefore, I've added > them only for walsenders. These messages help to know the lifetime of > a replication slot - one can know how long a streaming standby or > logical subscriber is down, IOW, how long a replication slot is > inactive in production. For instance, the time between released and > acquired slots in the below messages is the inactive replication slot > duration. > > 2023-11-13 11:06:34.338 UTC [470262] LOG: acquired physical > replication slot "sb_repl_slot" > 2023-11-13 11:06:34.338 UTC [470262] STATEMENT: START_REPLICATION > SLOT "sb_repl_slot" 0/3000000 TIMELINE 1 > 2023-11-13 11:09:24.918 UTC [470262] LOG: released physical > replication slot "sb_repl_slot" > 2023-11-13 12:01:40.530 UTC [470967] LOG: acquired physical > replication slot "sb_repl_slot" > 2023-11-13 12:01:40.530 UTC [470967] STATEMENT: START_REPLICATION > SLOT "sb_repl_slot" 0/3000000 TIMELINE 1 > > > If so, the other alternative is to either use a > > parameter to the existing function or directly use am_walsender flag > > to distinguish when to print the message in acquire/release calls. > > Done that way. PSA v12. > > > A few other minor comments: > > 1. > > + Causes each replication command and related activity to be logged in > > + the server log. > > > > Can we be bit more specific by changing to something like: "Causes > > each replication command and slot acquisition/release to be logged in > > the server log." > > Done. > > > 2. > > + ereport(log_replication_commands ? LOG : DEBUG1, > > + (errmsg("walsender process with PID %d acquired %s replication slot \"%s\"", > > > > It seems PID and process name is quite unlike what we print in other > > similar messages. For example, see below messages when we start > > replication via pub/sub : > > > > We can get the PID from the log line as for other logs and I don't see > > the process name printed anywhere else. > > There was a comment upthread to have PID printed, but I agree to be > consistent and changed the messages to be: acquired physical/logical > replication slot "foo" and released physical/logical replication slot > "foo". > > PSA v12 patch. Compiler isn't happy with v12 https://cirrus-ci.com/task/5543061376204800?logs=gcc_warning#L405. PSA v13 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Peter Smith
Date:
Here are some review comments for patch v13-0001. ====== doc/src/sgml/config.sgml 1. <para> - Causes each replication command to be logged in the server log. - See <xref linkend="protocol-replication"/> for more information about - replication command. The default value is <literal>off</literal>. - Only superusers and users with the appropriate <literal>SET</literal> - privilege can change this setting. + Causes each replication command and slot acquisition/release to be + logged in the server log. See <xref linkend="protocol-replication"/> + for more information about replication command. The default value is + <literal>off</literal>. Only superusers and users with the appropriate + <literal>SET</literal> privilege can change this setting. </para> Should that also mention about walsender? e.g. "and slot acquisition/release" ==> "and <literal>walsender</literal> slot acquisition/release" ====== src/backend/replication/slot.c 2. ReplicationSlotAcquire if (SlotIsLogical(s)) pgstat_acquire_replslot(s); + + if (am_walsender) + ereport(log_replication_commands ? LOG : DEBUG1, + errmsg_internal("acquired %s replication slot \"%s\"", + SlotIsPhysical(MyReplicationSlot) ? "physical" : "logical", + NameStr(MyReplicationSlot->data.name))); 2a. Instead of calling SlotIsLogical() and then again calling SlotIsPhysical(), it might be better to assign this one time to a local variable. ~ 2b. IMO it is better to continue using variable 's' here instead of 'MyReplicationSlot'. Code is not only shorter but is also consistent with the rest of the function which never uses MyReplicationSlot, even in the places where it could have. ~ SUGGESTION (for #2a and #2b) is_logical = SlotIsLogical(s); if (is_logical) pgstat_acquire_replslot(s); if (am_walsender) ereport(log_replication_commands ? LOG : DEBUG1, errmsg_internal("acquired %s replication slot \"%s\"", is_logical ? "logical" : "physical", NameStr(s->data.name))); ~~~ 3. ReplicationSlotRelease ReplicationSlotRelease(void) { ReplicationSlot *slot = MyReplicationSlot; + char *slotname = NULL; /* keep compiler quiet */ + bool is_physical = false; /* keep compiler quiet */ Assert(slot != NULL && slot->active_pid != 0); + if (am_walsender) + { + slotname = pstrdup(NameStr(MyReplicationSlot->data.name)); + is_physical = SlotIsPhysical(MyReplicationSlot); + } + 3a. Notice 'MyReplicationSlot' is already assigned to the local 'slot' variable, so IMO it is better if this new code also uses that 'slot' variable for consistency with the rest of the function. ~ 3b. Consider flipping the flag to be 'is_logical' instead of 'is_physical', so the ereport substitution will match the other ReplicationSlotAcquirecode suggested above (#2a). ~ SUGGESTION (For #3a and #3b) if (am_walsender) { slotname = pstrdup(NameStr(slot->data.name)); is_logical = SlotIsLogical(slot); } ====== Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Tue, Nov 14, 2023 at 4:50 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for patch v13-0001. Thanks. > ====== > doc/src/sgml/config.sgml > > 1. > > Should that also mention about walsender? > > e.g. > "and slot acquisition/release" ==> "and <literal>walsender</literal> > slot acquisition/release" Changed. > 2a. > Instead of calling SlotIsLogical() and then again calling > SlotIsPhysical(), it might be better to assign this one time to a > local variable. > > 2b. > IMO it is better to continue using variable 's' here instead of > 'MyReplicationSlot'. Code is not only shorter but is also consistent > with the rest of the function which never uses MyReplicationSlot, even > in the places where it could have. > > SUGGESTION (for #2a and #2b) > is_logical = SlotIsLogical(s); > if (is_logical) > pgstat_acquire_replslot(s); > > if (am_walsender) > ereport(log_replication_commands ? LOG : DEBUG1, > errmsg_internal("acquired %s replication slot \"%s\"", > is_logical ? "logical" : "physical", NameStr(s->data.name))); Use of a separate variable isn't good IMO, I used SlotIsLogical(s); directly. > 3a. > Notice 'MyReplicationSlot' is already assigned to the local 'slot' > variable, so IMO it is better if this new code also uses that 'slot' > variable for consistency with the rest of the function. > > 3b. > Consider flipping the flag to be 'is_logical' instead of > 'is_physical', so the ereport substitution will match the other > ReplicationSlotAcquirecode suggested above (#2a). > > SUGGESTION (For #3a and #3b) > if (am_walsender) > { > slotname = pstrdup(NameStr(slot->data.name)); > is_logical = SlotIsLogical(slot); > } Done. PSA v14 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Peter Smith
Date:
Hi, Thanks for addressing my previous comments. Patch v14-0001 looks good to me, except I have one question: The patch uses errmsg_internal() for the logging, but I noticed the only other code using GUC 'log_replication_commands' has errmsg() instead of errmsg_internal(). Isn't it better to be consistent with the existing code? ====== Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Amit Kapila
Date:
On Wed, Nov 15, 2023 at 4:40 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Thanks for addressing my previous comments. Patch v14-0001 looks good > to me, except I have one question: > > The patch uses errmsg_internal() for the logging, but I noticed the > only other code using GUC 'log_replication_commands' has errmsg() > instead of errmsg_internal(). Isn't it better to be consistent with > the existing code? > I agree that we should errmsg here. If we read the description of errmsg_internal() [1], it is recommended to be used for "cannot happen" cases where we don't want to spend translation effort which is not the case here. Also, similar to the below message, we should add a comment for a translator. ereport(LOG, /* translator: %s is SIGKILL or SIGABRT */ (errmsg("issuing %s to recalcitrant children", send_abort_for_kill ? "SIGABRT" : "SIGKILL"))); Another minor comment: + Causes each replication command and <literal>walsender</literal> + process replication slot acquisition/release to be logged in the server + log. Isn't it better to use process's instead of process in the above sentence? [1] -https://www.postgresql.org/docs/devel/error-message-reporting.html -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Wed, Nov 15, 2023 at 9:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 15, 2023 at 4:40 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Thanks for addressing my previous comments. Patch v14-0001 looks good > > to me, except I have one question: > > > > The patch uses errmsg_internal() for the logging, but I noticed the > > only other code using GUC 'log_replication_commands' has errmsg() > > instead of errmsg_internal(). Isn't it better to be consistent with > > the existing code? > > > > I agree that we should errmsg here. If we read the description of > errmsg_internal() [1], it is recommended to be used for "cannot > happen" cases where we don't want to spend translation effort which is > not the case here. I chose not to translate the newly added messages as they are only written to server logs not sent to the client. However, I've changed to errmsg, after looking at the errmsg_internal docs. > Also, similar to the below message, we should add a > comment for a translator. > > ereport(LOG, > /* translator: %s is SIGKILL or SIGABRT */ > (errmsg("issuing %s to recalcitrant children", > send_abort_for_kill ? "SIGABRT" : "SIGKILL"))); Added. > Another minor comment: > + Causes each replication command and <literal>walsender</literal> > + process replication slot acquisition/release to be logged in the server > + log. > > Isn't it better to use process's instead of process in the above sentence? Changed. PSA v15 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Peter Smith
Date:
Patch v15-0001 LGTM. ====== Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Amit Kapila
Date:
On Wed, Nov 15, 2023 at 11:00 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > PSA v15 patch. > The patch looks good to me. I have slightly modified the translator message and commit message in the attached. I'll push this tomorrow unless there are any comments. -- With Regards, Amit Kapila.
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Alvaro Herrera
Date:
Translation-wise, this doesn't work, because you're building a string. There's no reason to think that the words "logical" and "physical" should stay untranslated; the message would make no sense, or at least would be very ugly. You should do something like if (am_walsender) { ereport(log_replication_commands ? LOG : DEBUG1, SlotIsLogical(s) ? errmsg("acquired logical replication slot \"%s\"", NameStr(s->data.name)) : errmsg("acquired physical replication slot \"%s\"", NameStr(s->data.name))); } (Obviously, lose the "translator:" comments since they are unnecessary) If you really want to keep the "logical"/"physical" word untranslated, you need to split it out of the sentence somehow. But it would be really horrible IMO. Like errmsg("acquired replication slot \"%s\" of type \"%s\"", NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical") -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Use it up, wear it out, make it do, or do without"
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Translation-wise, this doesn't work, because you're building a string. > There's no reason to think that the words "logical" and "physical" > should stay untranslated; the message would make no sense, or at least > would be very ugly. > > You should do something like > > if (am_walsender) > { > ereport(log_replication_commands ? LOG : DEBUG1, > SlotIsLogical(s) ? errmsg("acquired logical replication slot \"%s\"", NameStr(s->data.name)) : > errmsg("acquired physical replication slot \"%s\"", NameStr(s->data.name))); > } This seems better, so done that way. > (Obviously, lose the "translator:" comments since they are unnecessary) The translator message now indicates that the remaining %s denotes the replication slot name. PSA v17 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Peter Smith
Date:
Some minor comments for v17-0001. ====== 1. + ereport(log_replication_commands ? LOG : DEBUG1, + SlotIsLogical(s) + /* translator: %s is name of the replication slot */ + ? errmsg("acquired logical replication slot \"%s\"", + NameStr(s->data.name)) + : errmsg("acquired physical replication slot \"%s\"", + NameStr(s->data.name))); 1a. FWIW, if the ternary was inside the errmsg, there would be less code duplication. ~ 1b. I searched HEAD code and did not find any "translator:" comments for just ordinary slot name substitutions like these; AFAICT that comment is not necessary anymore. ~ SUGGESTION (#1a and #1b) ereport(log_replication_commands ? LOG : DEBUG1, errmsg(SlotIsLogical(s) ? "acquired logical replication slot \"%s\"" : "acquired physical replication slot \"%s\"", NameStr(s->data.name))); ~~~ 2. Ditto for the other ereport. ====== Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Amit Kapila
Date:
On Thu, Nov 16, 2023 at 3:48 AM Peter Smith <smithpb2250@gmail.com> wrote: > > ~ > > SUGGESTION (#1a and #1b) > > ereport(log_replication_commands ? LOG : DEBUG1, > errmsg(SlotIsLogical(s) > ? "acquired logical replication slot \"%s\"" > : "acquired physical replication slot \"%s\"", > NameStr(s->data.name))); > > ~~~ > Personally, I prefer the way Bharath had in his patch. Did you see any preferred way in the existing code? -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Amit Kapila
Date:
On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Translation-wise, this doesn't work, because you're building a string. > There's no reason to think that the words "logical" and "physical" > should stay untranslated; the message would make no sense, or at least > would be very ugly. > > You should do something like > > if (am_walsender) > { > ereport(log_replication_commands ? LOG : DEBUG1, > SlotIsLogical(s) ? errmsg("acquired logical replication slot \"%s\"", NameStr(s->data.name)) : > errmsg("acquired physical replication slot \"%s\"", NameStr(s->data.name))); > } > > (Obviously, lose the "translator:" comments since they are unnecessary) > > > If you really want to keep the "logical"/"physical" word untranslated, > you need to split it out of the sentence somehow. But it would be > really horrible IMO. Like > > errmsg("acquired replication slot \"%s\" of type \"%s\"", > NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical") > Thanks for the suggestion. I would like to clarify on this a bit. What do exactly mean by splitting out of the sentence? For example, in one of the existing messages: ereport(LOG, /* translator: %s is SIGKILL or SIGABRT */ (errmsg("issuing %s to recalcitrant children", send_abort_for_kill ? "SIGABRT" : "SIGKILL"))); Do here words SIGABRT/SIGKILL remain untranslated due to the translator's comment? I thought this was similar to the message being proposed but seems like this message construction follows translation rules better. -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Peter Smith
Date:
On Thu, Nov 16, 2023 at 12:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > Translation-wise, this doesn't work, because you're building a string. > > There's no reason to think that the words "logical" and "physical" > > should stay untranslated; the message would make no sense, or at least > > would be very ugly. > > > > You should do something like > > > > if (am_walsender) > > { > > ereport(log_replication_commands ? LOG : DEBUG1, > > SlotIsLogical(s) ? errmsg("acquired logical replication slot \"%s\"", NameStr(s->data.name)) : > > errmsg("acquired physical replication slot \"%s\"", NameStr(s->data.name))); > > } > > > > (Obviously, lose the "translator:" comments since they are unnecessary) > > > > > > If you really want to keep the "logical"/"physical" word untranslated, > > you need to split it out of the sentence somehow. But it would be > > really horrible IMO. Like > > > > errmsg("acquired replication slot \"%s\" of type \"%s\"", > > NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical") > > > > Thanks for the suggestion. I would like to clarify on this a bit. What > do exactly mean by splitting out of the sentence? For example, in one > of the existing messages: > > ereport(LOG, > /* translator: %s is SIGKILL or SIGABRT */ > (errmsg("issuing %s to recalcitrant children", > send_abort_for_kill ? "SIGABRT" : "SIGKILL"))); > > Do here words SIGABRT/SIGKILL remain untranslated due to the > translator's comment? I thought this was similar to the message being > proposed but seems like this message construction follows translation > rules better. > IIUC, that example is different because "SIGABRT" / "SIGKILL" are not real words, so you don't want the translator to attempt to translate them.You want them to appear in the message as-is. OTOH in this patch "logical" and "physical" are just normal English words that should be translated as part of the original message. e.g. like in these similar messages: - msgid "database \"%s\" is used by an active logical replication slot" - msgstr "la base de données « %s » est utilisée par un slot de réplication logique actif" ====== Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Peter Smith
Date:
On Thu, Nov 16, 2023 at 12:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Nov 16, 2023 at 3:48 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > ~ > > > > SUGGESTION (#1a and #1b) > > > > ereport(log_replication_commands ? LOG : DEBUG1, > > errmsg(SlotIsLogical(s) > > ? "acquired logical replication slot \"%s\"" > > : "acquired physical replication slot \"%s\"", > > NameStr(s->data.name))); > > > > ~~~ > > > > Personally, I prefer the way Bharath had in his patch. Did you see any > preferred way in the existing code? Not really. I think the errmsg combined with ternary is not so common. I couldn't find many examples, so I wouldn't try to claim anything is a "preferred" way There are some existing examples, like Bharath had: ereport(NOTICE, (errcode(ERRCODE_DUPLICATE_OBJECT), collencoding == -1 ? errmsg("collation \"%s\" already exists, skipping", collname) : errmsg("collation \"%s\" for encoding \"%s\" already exists, skipping", collname, pg_encoding_to_char(collencoding)))); OTOH, when there are different numbers of substitution parameters in each of the errmsg like that, you don't have much choice but to do it that way. I am fine with whatever is chosen -- I was only offering an alternative. ====== Kind Regards, Peter Smith. Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Alvaro Herrera
Date:
On 2023-Nov-16, Peter Smith wrote: > I searched HEAD code and did not find any "translator:" comments for > just ordinary slot name substitutions like these; AFAICT that comment > is not necessary anymore. True. Lose that. The rationale I have is to consider whether a translator looking at the original message message in isolation is going to understand what the %s means. If it's possible to tell what it is without having to go read the source code that leads to the message, then you don't need a "translator:" comment. Otherwise you do. You also need to assume the translator is not stupid, but that seems an OK assumption. > SUGGESTION (#1a and #1b) > > ereport(log_replication_commands ? LOG : DEBUG1, > errmsg(SlotIsLogical(s) > ? "acquired logical replication slot \"%s\"" > : "acquired physical replication slot \"%s\"", > NameStr(s->data.name))); The bad thing about this is that gettext() is not going to pick up these strings into the translation catalog. You could fix that by adding gettext_noop() calls around them: ereport(log_replication_commands ? LOG : DEBUG1, errmsg(SlotIsLogical(s) ? gettext_noop("acquired logical replication slot \"%s\"") : gettext_noop("acquired physical replication slot \"%s\""), NameStr(s->data.name))); but at that point it's not clear that it's really better than putting the ternary in the outer scope. You can verify this by doing "make update-po" and then searching for the messages in postgres.pot. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Bharath Rupireddy
Date:
On Thu, Nov 16, 2023 at 4:01 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2023-Nov-16, Peter Smith wrote: > > > I searched HEAD code and did not find any "translator:" comments for > > just ordinary slot name substitutions like these; AFAICT that comment > > is not necessary anymore. > > True. Lose that. Removed. > The rationale I have is to consider whether a translator looking at the > original message message in isolation is going to understand what the %s > means. If it's possible to tell what it is without having to go read > the source code that leads to the message, then you don't need a > "translator:" comment. Otherwise you do. Agree. I think it's easy for one to guess the slot name follows ".... replication slot %s", so I removed the translator message. > > SUGGESTION (#1a and #1b) > > > > ereport(log_replication_commands ? LOG : DEBUG1, > > errmsg(SlotIsLogical(s) > > ? "acquired logical replication slot \"%s\"" > > : "acquired physical replication slot \"%s\"", > > NameStr(s->data.name))); > > The bad thing about this is that gettext() is not going to pick up these > strings into the translation catalog. You could fix that by adding > gettext_noop() calls around them: > > ereport(log_replication_commands ? LOG : DEBUG1, > errmsg(SlotIsLogical(s) > ? gettext_noop("acquired logical replication slot \"%s\"") > : gettext_noop("acquired physical replication slot \"%s\""), > NameStr(s->data.name))); > > but at that point it's not clear that it's really better than putting > the ternary in the outer scope. I retained wrapping messages in errmsg("..."). > You can verify this by doing "make update-po" and then searching for the > messages in postgres.pot. Translation gives me [1] with v18 patch PSA v18 patch. [1] #: replication/slot.c:545 #, c-format msgid "acquired logical replication slot \"%s\"" msgstr "" #: replication/slot.c:547 #, c-format msgid "acquired physical replication slot \"%s\"" msgstr "" #: replication/slot.c:622 #, c-format msgid "released logical replication slot \"%s\"" msgstr "" #: replication/slot.c:624 #, c-format msgid "released physical replication slot \"%s\"" msgstr "" -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Amit Kapila
Date:
On Thu, Nov 16, 2023 at 6:09 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Nov 16, 2023 at 4:01 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > On 2023-Nov-16, Peter Smith wrote: > > > > > I searched HEAD code and did not find any "translator:" comments for > > > just ordinary slot name substitutions like these; AFAICT that comment > > > is not necessary anymore. > > > > True. Lose that. > > Removed. > > > The rationale I have is to consider whether a translator looking at the > > original message message in isolation is going to understand what the %s > > means. If it's possible to tell what it is without having to go read > > the source code that leads to the message, then you don't need a > > "translator:" comment. Otherwise you do. > > Agree. I think it's easy for one to guess the slot name follows ".... > replication slot %s", so I removed the translator message. > > > > SUGGESTION (#1a and #1b) > > > > > > ereport(log_replication_commands ? LOG : DEBUG1, > > > errmsg(SlotIsLogical(s) > > > ? "acquired logical replication slot \"%s\"" > > > : "acquired physical replication slot \"%s\"", > > > NameStr(s->data.name))); > > > > The bad thing about this is that gettext() is not going to pick up these > > strings into the translation catalog. You could fix that by adding > > gettext_noop() calls around them: > > > > ereport(log_replication_commands ? LOG : DEBUG1, > > errmsg(SlotIsLogical(s) > > ? gettext_noop("acquired logical replication slot \"%s\"") > > : gettext_noop("acquired physical replication slot \"%s\""), > > NameStr(s->data.name))); > > > > but at that point it's not clear that it's really better than putting > > the ternary in the outer scope. > > I retained wrapping messages in errmsg("..."). > > > You can verify this by doing "make update-po" and then searching for the > > messages in postgres.pot. > > Translation gives me [1] with v18 patch > > PSA v18 patch. > LGTM. I'll push this early next week unless there are further suggestions or comments. -- With Regards, Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
From
Amit Kapila
Date:
On Sat, Nov 18, 2023 at 4:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Nov 16, 2023 at 6:09 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > PSA v18 patch. > > > > LGTM. I'll push this early next week unless there are further > suggestions or comments. > Pushed. -- With Regards, Amit Kapila.