more descriptive message for process termination due to max_slot_wal_keep_size - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | more descriptive message for process termination due to max_slot_wal_keep_size |
Date | |
Msg-id | 20211214.130456.2233153190058148084.horikyota.ntt@gmail.com |
Responses |
Re: more descriptive message for process termination due to max_slot_wal_keep_size
Re: more descriptive message for process termination due to max_slot_wal_keep_size |
List | pgsql-hackers |
Hello.

As reported on pgsql-bugs [1], when a process is terminated due to max_slot_wal_keep_size, the related messages don't mention the root cause of *the termination*. Note that the third message is not emitted for temporary replication slots.

[pid=a] LOG: terminating process x to release replication slot "s"
[pid=x] FATAL: terminating connection due to administrator command
[pid=a] LOG: invalidating slot "s" because its restart_lsn X/X exceeds max_slot_wal_keep_size

The attached patch adds a DETAIL line to the first message.

> [17605] LOG: terminating process 17614 to release replication slot "s1"
+ [17605] DETAIL: The slot's restart_lsn 0/2C0000A0 exceeds max_slot_wal_keep_size.
> [17614] FATAL: terminating connection due to administrator command
> [17605] LOG: invalidating slot "s1" because its restart_lsn 0/2C0000A0 exceeds max_slot_wal_keep_size

The second and fourth lines look somewhat inconsistent with each other, but that shouldn't be much of a problem. I don't think we want to combine the two lines, since the result would be a bit too long:

> LOG: terminating process 17614 to release replication slot "s1" because its restart_lsn 0/2C0000A0 exceeds max_slot_wal_keep_size.

What do you think about this?

[1] https://www.postgresql.org/message-id/20211214.101137.379073733372253470.horikyota.ntt%40gmail.com

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

From b0c27dc80aff37ef984592b79f1dd20d052299fa Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 14 Dec 2021 10:50:00 +0900
Subject: [PATCH] Make an error message about process termination more descriptive

If checkpointer kills a process because a temporary replication slot exceeds max_slot_wal_keep_size, the messages fail to describe the cause of the termination. This is because the message describing the reason, which is emitted for persistent slots, is not shown for temporary slots.
Add a DETAIL line to the message common to all types of slots to describe the cause.

Reported-by: Alex Enachioaie <alex@altmetric.com>
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
---
 src/backend/replication/slot.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 90ba9b417d..cba9a29113 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1254,7 +1254,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			{
 				ereport(LOG,
 						(errmsg("terminating process %d to release replication slot \"%s\"",
-								active_pid, NameStr(slotname))));
+								active_pid, NameStr(slotname)),
+						 errdetail("The slot's restart_lsn %X/%X exceeds max_slot_wal_keep_size.", LSN_FORMAT_ARGS(restart_lsn))));
 
 				(void) kill(active_pid, SIGTERM);
 				last_signaled_pid = active_pid;
-- 
2.27.0