Re: more descriptive message for process termination due to max_slot_wal_keep_size - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: more descriptive message for process termination due to max_slot_wal_keep_size |
Date | |
Msg-id | 20220302.175520.2160160005149599370.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: more descriptive message for process termination due to max_slot_wal_keep_size (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
List | pgsql-hackers |
At Wed, 02 Mar 2022 15:37:19 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > The CI was confused by the mixed patches for multiple PG versions. In > this version the patchset for master are attached as .patch and that > for PG13 as .txt. Yeah.... It is of course the relevant check should be fixed. The attached v5 adjusts 019_replslot_limit.pl. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 4909db2893f0b33ab6bca1ffc3ad802cd159c577 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Fri, 24 Dec 2021 12:52:07 +0900 Subject: [PATCH v5 1/2] Make a message on process termination more dscriptive The message at process termination due to slot limit doesn't provide the reason. In the major scenario the message is followed by another message about slot invalidatation, which shows the detail for the termination. However the second message is missing if the slot is temporary one. Augment the first message with the reason same as the second message. Backpatch through to 13 where the message was introduced. Reported-by: Alex Enachioaie <alex@altmetric.com> Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org Backpatch-through: 13 --- src/backend/replication/slot.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index caa6b29756..92f19bcb35 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1300,8 +1300,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, if (last_signaled_pid != active_pid) { ereport(LOG, - (errmsg("terminating process %d to release replication slot \"%s\"", - active_pid, NameStr(slotname)))); + (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%Xexceeds max_slot_wal_keep_size", + active_pid, NameStr(slotname), + LSN_FORMAT_ARGS(restart_lsn)))); (void) kill(active_pid, SIGTERM); last_signaled_pid = active_pid; -- 2.27.0 From 7a9e571e8cf4e5ffc13d101733c1b6fc455b1aec Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Fri, 24 Dec 2021 12:58:25 +0900 Subject: [PATCH v5 2/2] Add detailed information to slot-invalidation messages The messages says just "exceeds max_slot_wal_keep_size" but doesn't tell the concrete LSN of the limit. That information helps operators' understanding on the issue. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org --- src/backend/replication/slot.c | 12 ++++++++---- src/test/recovery/t/019_replslot_limit.pl | 2 +- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 92f19bcb35..be21b62add 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1300,9 +1300,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, if (last_signaled_pid != active_pid) { ereport(LOG, - (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%Xexceeds max_slot_wal_keep_size", + (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%Xexceeds the limit %X/%X", active_pid, NameStr(slotname), - LSN_FORMAT_ARGS(restart_lsn)))); + LSN_FORMAT_ARGS(restart_lsn), + LSN_FORMAT_ARGS(oldestLSN)), + errhint("You might need to increase max_slot_wal_keep_size."))); (void) kill(active_pid, SIGTERM); last_signaled_pid = active_pid; @@ -1345,9 +1347,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, ReplicationSlotRelease(); ereport(LOG, - (errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size", + (errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds the limit %X/%X", NameStr(slotname), - LSN_FORMAT_ARGS(restart_lsn)))); + LSN_FORMAT_ARGS(restart_lsn), + LSN_FORMAT_ARGS(oldestLSN)), + errhint("You might need to increase max_slot_wal_keep_size."))); /* done with this slot for now */ break; diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index 9bb71b62c0..bac496a9cf 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -188,7 +188,7 @@ for (my $i = 0; $i < 10000; $i++) { if (find_in_log( $node_primary, - "invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds max_slot_wal_keep_size", + "invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds the limit [0-9A-F/]+", $logstart)) { $invalidated = 1; -- 2.27.0
pgsql-hackers by date: