Re: more descriptive message for process termination due to max_slot_wal_keep_size - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: more descriptive message for process termination due to max_slot_wal_keep_size
Date
Msg-id 20220302.175520.2160160005149599370.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: more descriptive message for process termination due to max_slot_wal_keep_size  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
At Wed, 02 Mar 2022 15:37:19 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> The CI was confused by the mixed patches for multiple PG versions. In
> this version the patchset for master are attached as .patch and that
> for PG13 as .txt.

Yeah.... It is of course the relevant check should be fixed.  The
attached v5 adjusts 019_replslot_limit.pl.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 4909db2893f0b33ab6bca1ffc3ad802cd159c577 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 24 Dec 2021 12:52:07 +0900
Subject: [PATCH v5 1/2] Make a message on process termination more dscriptive

The message at process termination due to slot limit doesn't provide
the reason. In the major scenario the message is followed by another
message about slot invalidatation, which shows the detail for the
termination.  However the second message is missing if the slot is
temporary one.

Augment the first message with the reason same as the second message.

Backpatch through to 13 where the message was introduced.

Reported-by: Alex Enachioaie <alex@altmetric.com>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
Backpatch-through: 13
---
 src/backend/replication/slot.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index caa6b29756..92f19bcb35 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1300,8 +1300,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
             if (last_signaled_pid != active_pid)
             {
                 ereport(LOG,
-                        (errmsg("terminating process %d to release replication slot \"%s\"",
-                                active_pid, NameStr(slotname))));
+                        (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn
%X/%Xexceeds max_slot_wal_keep_size",
 
+                                active_pid, NameStr(slotname),
+                                LSN_FORMAT_ARGS(restart_lsn))));
 
                 (void) kill(active_pid, SIGTERM);
                 last_signaled_pid = active_pid;
-- 
2.27.0

From 7a9e571e8cf4e5ffc13d101733c1b6fc455b1aec Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 24 Dec 2021 12:58:25 +0900
Subject: [PATCH v5 2/2] Add detailed information to slot-invalidation messages

The messages says just "exceeds max_slot_wal_keep_size" but doesn't
tell the concrete LSN of the limit. That information helps operators'
understanding on the issue.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
---
 src/backend/replication/slot.c            | 12 ++++++++----
 src/test/recovery/t/019_replslot_limit.pl |  2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 92f19bcb35..be21b62add 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1300,9 +1300,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
             if (last_signaled_pid != active_pid)
             {
                 ereport(LOG,
-                        (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn
%X/%Xexceeds max_slot_wal_keep_size",
 
+                        (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn
%X/%Xexceeds the limit %X/%X",
 
                                 active_pid, NameStr(slotname),
-                                LSN_FORMAT_ARGS(restart_lsn))));
+                                LSN_FORMAT_ARGS(restart_lsn),
+                                LSN_FORMAT_ARGS(oldestLSN)),
+                         errhint("You might need to increase max_slot_wal_keep_size.")));
 
                 (void) kill(active_pid, SIGTERM);
                 last_signaled_pid = active_pid;
@@ -1345,9 +1347,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
             ReplicationSlotRelease();
 
             ereport(LOG,
-                    (errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+                    (errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds the limit %X/%X",
                             NameStr(slotname),
-                            LSN_FORMAT_ARGS(restart_lsn))));
+                            LSN_FORMAT_ARGS(restart_lsn),
+                            LSN_FORMAT_ARGS(oldestLSN)),
+                     errhint("You might need to increase max_slot_wal_keep_size.")));
 
             /* done with this slot for now */
             break;
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 9bb71b62c0..bac496a9cf 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -188,7 +188,7 @@ for (my $i = 0; $i < 10000; $i++)
 {
     if (find_in_log(
             $node_primary,
-            "invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds max_slot_wal_keep_size",
+            "invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds the limit [0-9A-F/]+",
             $logstart))
     {
         $invalidated = 1;
-- 
2.27.0


pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Two noncritical bugs of pg_waldump
Next
From: Tatsuo Ishii
Date:
Subject: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors