Thread: Re: Improve error handling for invalid slots and ensure a same 'inactive_since' time for inactive slots

Hi Nisha,

Some review comments for the patch v1-0002.

======
src/backend/replication/slot.c

1.
+
+/*
+ * Raise an error based on the slot's invalidation cause.
+ */
+static void
+RaiseSlotInvalidationError(ReplicationSlot *slot)
+{
+ StringInfo err_detail = makeStringInfo();
+
+ Assert(slot->data.invalidated != RS_INVAL_NONE);
+
+ switch (slot->data.invalidated)
+ {
+ case RS_INVAL_WAL_REMOVED:
+ appendStringInfoString(err_detail, _("This slot has been invalidated
because the required WAL has been removed."));
+ break;
+
+ case RS_INVAL_HORIZON:
+ appendStringInfoString(err_detail, _("This slot has been invalidated
because the required rows have been removed."));
+ break;
+
+ case RS_INVAL_WAL_LEVEL:
+ /* translator: %s is a GUC variable name */
+ appendStringInfo(err_detail, _("This slot has been invalidated
because \"%s\" is insufficient for slot."),
+ "wal_level");
+ break;
+
+ case RS_INVAL_NONE:
+ pg_unreachable();
+ }
+
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("can no longer get changes from replication slot \"%s\"",
+    NameStr(slot->data.name)),
+ errdetail_internal("%s", err_detail->data));
+}

AFAIK the _() is a macro for gettext(). So those strings are intended
for translation, right? There's also a "/* translator: ..." comment
implying the same.

OTOH, those strings are only being used by errdetail_internal, whose
function comment says "This is exactly like errdetail() except that
strings passed to errdetail_internal are not translated...".

Isn't there some contradiction here?

Perhaps the _() macro is not needed, or perhaps the
errdetail_internal() should be errdetail().

======
Kind Regards,
Peter Smith.
Fujitsu Australia



On Wed, Jan 29, 2025 at 8:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> ======
> src/backend/replication/slot.c
>
> 1.
> +
> +/*
> + * Raise an error based on the slot's invalidation cause.
> + */
> +static void
> +RaiseSlotInvalidationError(ReplicationSlot *slot)
> +{
> + StringInfo err_detail = makeStringInfo();
> +
> + Assert(slot->data.invalidated != RS_INVAL_NONE);
> +
> + switch (slot->data.invalidated)
> + {
> + case RS_INVAL_WAL_REMOVED:
> + appendStringInfoString(err_detail, _("This slot has been invalidated
> because the required WAL has been removed."));
> + break;
> +
> + case RS_INVAL_HORIZON:
> + appendStringInfoString(err_detail, _("This slot has been invalidated
> because the required rows have been removed."));
> + break;
> +
> + case RS_INVAL_WAL_LEVEL:
> + /* translator: %s is a GUC variable name */
> + appendStringInfo(err_detail, _("This slot has been invalidated
> because \"%s\" is insufficient for slot."),
> + "wal_level");
> + break;
> +
> + case RS_INVAL_NONE:
> + pg_unreachable();
> + }
> +
> + ereport(ERROR,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("can no longer get changes from replication slot \"%s\"",
> +    NameStr(slot->data.name)),
> + errdetail_internal("%s", err_detail->data));
> +}
>
> AFAIK the _() is a macro for gettext(). So those strings are intended
> for translation, right? There's also a "/* translator: ..." comment
> implying the same.
>
> OTOH, those strings are only being used by errdetail_internal, whose
> function comment says "This is exactly like errdetail() except that
> strings passed to errdetail_internal are not translated...".
>
> Isn't there some contradiction here?
>

Yeah, I also think so but I see some other similar usages. For
example, parse_one_reloption() uses errdetail_internal("%s",
_(optenum->detailmsg)) : 0));

There are various other places in the code where _( is used for
messages in errmsg_internal.

> Perhaps the _() macro is not needed, or perhaps the
> errdetail_internal() should be errdetail().
>

These messages should be with errdetail. We already have these
messages being displayed as errdetail in CreateDecodingContext().
Also, after this patch, shouldn't we remove the same error cases in
CreateDecodingContext(). There is already an
Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE) which
should be sufficient after this patch to ensure we didn't miss any
error cases.

--
With Regards,
Amit Kapila.



On Wed, Jan 29, 2025 at 3:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>

+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("can no longer get changes from replication slot \"%s\"",
+    NameStr(slot->data.name)),
+ errdetail_internal("%s", err_detail->data));

ERROR message appears to be specific to logical slots, it will occur
for physical slots as well, so I suggest changing from "can no longer
get changes from replication slot" to "can no longer access from
replication slot"

The errdetail should be used as we have in ReplicationSlotAlter.
errdetail("This replication slot has been invalidated due to \"%s\".",
SlotInvalidationCauses[MyReplicationSlot->data.invalidated]));

You have given the following reason in another thread for not giving a
message during acquire in ReplicationSlotAlter: "Because
ReplicationSlotAlter() already handles errors immediately after
acquiring the slot. It raises errors for invalidated slots and also
raises a different error message if the slot is a physical one. So, In
case of ALTER, I feel it is okay to acquire the slot first without
raising errors and then handle errors in the pre-defined way. Similar
immediate error handling is not available at other places."

It is better to unify the error in one place rather than at different
places. One benefit is error message contents will be the same which
is currently not the case.

BTW, in the commit message, you only mention the place where we get
the message without a patch during logical replication, it is better
to mention the timing for physical replication as well.

--
With Regards,
Amit Kapila.