Re: more descriptive message for process termination due to max_slot_wal_keep_size - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: more descriptive message for process termination due to max_slot_wal_keep_size
Date
Msg-id 03ee95c3-6496-90b7-5c8b-13fe69ee73dd@amazon.com
Whole thread Raw
In response to Re: more descriptive message for process termination due to max_slot_wal_keep_size  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: more descriptive message for process termination due to max_slot_wal_keep_size
List pgsql-hackers

Hi,

On 9/6/22 7:53 AM, Kyotaro Horiguchi wrote:
At Mon, 5 Sep 2022 11:56:33 +0200, "Drouvot, Bertrand" <bdrouvot@amazon.com> wrote in
Hi,

On 3/2/22 7:37 AM, Kyotaro Horiguchi wrote:
At Tue, 04 Jan 2022 10:29:31 +0900 (JST), Kyotaro
Horiguchi<horikyota.ntt@gmail.com> wrote in
So what do you say if I propose the following?

LOG:  terminating process %d to release replication slot \"%s\"
because its restart_lsn %X/%X exceeds the limit %X/%X
HINT: You might need to increase max_slot_wal_keep_size.
This version emits the following message.

[35785:checkpointer] LOG: terminating process 36368 to release
replication slot "s1" because its restart_lsn 0/1F000148 exceeds the
limit 0/21000000
[35785:checkpointer] HINT: You might need to increase
max_slot_wal_keep_size.
As the hint is to increase max_slot_wal_keep_size, what about
reporting the difference in size (rather than the limit lsn)?
Something along those lines?

[35785:checkpointer] LOG: terminating process 36368 to release
replication slot "s1" because its restart_lsn 0/1F000148 exceeds the
limit by <NNN MB>.
Thanks! That might be more sensible exactly for the reason you
mentioned.  One issue doing that is size_pretty is dbsize.c local
function. Since the size is less than kB in many cases, we cannot use
fixed unit for that.

Thanks for the new patch version!. I did not realized (sorry about that) that we'd need to expose byte_size_pretty(). Now I wonder if we should not simply report the number of bytes (like I can see it is done in many places). So something like:

@@ -1298,9 +1298,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
                                byte_size_pretty(buf, sizeof(buf),
                                                                 oldestLSN - restart_lsn);
                                ereport(LOG,
-                                               (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds the limit by %s",
+                                               (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds the limit by %lu bytes",
                                                                active_pid, NameStr(slotname),
-                                                               LSN_FORMAT_ARGS(restart_lsn), buf),
+                                                               LSN_FORMAT_ARGS(restart_lsn), oldestLSN - restart_lsn),
                                                 errhint("You might need to increase max_slot_wal_keep_size.")));

and then forget about exposing/using byte_size_pretty() (that would be more consistent with the same kind of reporting in the existing code).

What do you think?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Column Filtering in Logical Replication
Next
From: "kuroda.hayato@fujitsu.com"
Date:
Subject: RE: test_decoding assertion failure for the loss of top-sub transaction relationship