Thread: Question on error code selection in conflict detection

Question on error code selection in conflict detection

From

Dilip Kumar

Date:

09 June, 16:44:41

Hi,

I was reviewing the code for conflict reporting and became curious
about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
typically signifies a serialization failure within a transaction under
serializable isolation, so its use here for a different type of
conflict seems somewhat out of place. I did notice its use in other
contexts for recovery conflicts in physical replication, which also
struck me as a bit unusual.

Given these observations, I'm wondering if it would be more
appropriate to introduce a new, more specific error code for this
purpose?


static int
errcode_apply_conflict(ConflictType type)
{
switch (type)
{
case CT_INSERT_EXISTS:
case CT_UPDATE_EXISTS:
case CT_MULTIPLE_UNIQUE_CONFLICTS:
return errcode(ERRCODE_UNIQUE_VIOLATION);
case CT_UPDATE_ORIGIN_DIFFERS:
case CT_UPDATE_MISSING:
case CT_DELETE_ORIGIN_DIFFERS:
case CT_DELETE_MISSING:
return errcode(ERRCODE_T_R_SERIALIZATION_FAILURE);
}

Assert(false);
return 0; /* silence compiler warning */
}

-- 
Regards,
Dilip Kumar
Google

Re: Question on error code selection in conflict detection

From

Robert Haas

Date:

09 June, 22:55:43

On Mon, Jun 9, 2025 at 9:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I was reviewing the code for conflict reporting and became curious
> about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
> typically signifies a serialization failure within a transaction under
> serializable isolation, so its use here for a different type of
> conflict seems somewhat out of place. I did notice its use in other
> contexts for recovery conflicts in physical replication, which also
> struck me as a bit unusual.
>
> Given these observations, I'm wondering if it would be more
> appropriate to introduce a new, more specific error code for this
> purpose?

Makes sense to me. I'm not sure if we should use a new error code or
some other existing one, but conflating other things with serializable
failures seems like a bad plan.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Question on error code selection in conflict detection

From

Amit Kapila

Date:

10 June, 09:09:08

On Mon, Jun 9, 2025 at 7:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I was reviewing the code for conflict reporting and became curious
> about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
> typically signifies a serialization failure within a transaction under
> serializable isolation, so its use here for a different type of
> conflict seems somewhat out of place. I did notice its use in other
> contexts for recovery conflicts in physical replication, which also
> struck me as a bit unusual.
>
> Given these observations, I'm wondering if it would be more
> appropriate to introduce a new, more specific error code for this
> purpose?
>

Can we instead try to use other suitable existing error codes?

CT_UPDATE_ORIGIN_DIFFERS, CT_DELETE_ORIGIN_DIFFERS →
ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION (27000)
These represent cases where the row exists but differs from the
expected state, conceptually similar to a triggered data change
invalidating the operation.

I have also considered using ERRCODE_TRIGGERED_ACTION_EXCEPTION for
the above, but that sounds to be fit for a generic error that occurs
during the execution of a triggered action (e.g., a BEFORE or AFTER
trigger).

CT_UPDATE_MISSING, CT_DELETE_MISSING → ERRCODE_NO_DATA_FOUND (02000)
These are straightforward cases where the target row is missing,
aligning well with the standard meaning of 02000.

I don't have good ideas on the cases for physical replication, as
those seem quite different; we can consider those separately.

Thoughts?

--
With Regards,
Amit Kapila.

Re: Question on error code selection in conflict detection

From

Dilip Kumar

Date:

10 June, 09:44:40

On Tue, Jun 10, 2025 at 1:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jun 9, 2025 at 9:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > I was reviewing the code for conflict reporting and became curious
> > about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
> > typically signifies a serialization failure within a transaction under
> > serializable isolation, so its use here for a different type of
> > conflict seems somewhat out of place. I did notice its use in other
> > contexts for recovery conflicts in physical replication, which also
> > struck me as a bit unusual.
> >
> > Given these observations, I'm wondering if it would be more
> > appropriate to introduce a new, more specific error code for this
> > purpose?
>
> Makes sense to me. I'm not sure if we should use a new error code or
> some other existing one, but conflating other things with serializable
> failures seems like a bad plan.

Yeah we may use existing as well if we find some appropriate error codes.

--
Regards,
Dilip Kumar
Google

Re: Question on error code selection in conflict detection

From

Dilip Kumar

Date:

10 June, 09:44:52

On Tue, Jun 10, 2025 at 11:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jun 9, 2025 at 7:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I was reviewing the code for conflict reporting and became curious
> > about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
> > typically signifies a serialization failure within a transaction under
> > serializable isolation, so its use here for a different type of
> > conflict seems somewhat out of place. I did notice its use in other
> > contexts for recovery conflicts in physical replication, which also
> > struck me as a bit unusual.
> >
> > Given these observations, I'm wondering if it would be more
> > appropriate to introduce a new, more specific error code for this
> > purpose?
> >
>
> Can we instead try to use other suitable existing error codes?

Yeah we can try to do that as well.

> CT_UPDATE_ORIGIN_DIFFERS, CT_DELETE_ORIGIN_DIFFERS →
> ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION (27000)
> These represent cases where the row exists but differs from the
> expected state, conceptually similar to a triggered data change
> invalidating the operation.

Yeah this looks much better than what we already have.

> I have also considered using ERRCODE_TRIGGERED_ACTION_EXCEPTION for
> the above, but that sounds to be fit for a generic error that occurs
> during the execution of a triggered action (e.g., a BEFORE or AFTER
> trigger).

Right

> CT_UPDATE_MISSING, CT_DELETE_MISSING → ERRCODE_NO_DATA_FOUND (02000)
> These are straightforward cases where the target row is missing,
> aligning well with the standard meaning of 02000.

Yeah this looks good.

> I don't have good ideas on the cases for physical replication, as
> those seem quite different; we can consider those separately.

Yeah we can do that separately, maybe I put more thought on that and
send my proposal.

> Thoughts?

Okay I will put more thought about the proposed error code and also
see what others have to say and if we have a consensus I can provide
the patch.

--
Regards,
Dilip Kumar
Google

Re: Question on error code selection in conflict detection

From

shveta malik

Date:

11 June, 13:27:21

On Wed, Jun 11, 2025 at 11:05 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 10, 2025 at 12:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Jun 10, 2025 at 11:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Jun 9, 2025 at 7:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > I was reviewing the code for conflict reporting and became curious
> > > > about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
> > > > typically signifies a serialization failure within a transaction under
> > > > serializable isolation, so its use here for a different type of
> > > > conflict seems somewhat out of place. I did notice its use in other
> > > > contexts for recovery conflicts in physical replication, which also
> > > > struck me as a bit unusual.
> > > >
> > > > Given these observations, I'm wondering if it would be more
> > > > appropriate to introduce a new, more specific error code for this
> > > > purpose?
> > > >
> > >
> > > Can we instead try to use other suitable existing error codes?
> >
> > Yeah we can try to do that as well.
> >
> > > CT_UPDATE_ORIGIN_DIFFERS, CT_DELETE_ORIGIN_DIFFERS →
> > > ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION (27000)
> > > These represent cases where the row exists but differs from the
> > > expected state, conceptually similar to a triggered data change
> > > invalidating the operation.
> >
> > Yeah this looks much better than what we already have.
> >
> > > I have also considered using ERRCODE_TRIGGERED_ACTION_EXCEPTION for
> > > the above, but that sounds to be fit for a generic error that occurs
> > > during the execution of a triggered action (e.g., a BEFORE or AFTER
> > > trigger).
> >
> > Right
> >
> > > CT_UPDATE_MISSING, CT_DELETE_MISSING → ERRCODE_NO_DATA_FOUND (02000)
> > > These are straightforward cases where the target row is missing,
> > > aligning well with the standard meaning of 02000.
> >
> > Yeah this looks good.
> >
> > > I don't have good ideas on the cases for physical replication, as
> > > those seem quite different; we can consider those separately.
> >
> > Yeah we can do that separately, maybe I put more thought on that and
> > send my proposal.
> >
> > > Thoughts?
> >
> > Okay I will put more thought about the proposed error code and also
> > see what others have to say and if we have a consensus I can provide
> > the patch.
>
> After reviewing other error codes, these appear to be the most
> relevant in this context.

+1. On digging deep, among existing codes, these are the most relevant ones.

> PFA patch for the same, I will analyze the
> error code in physical replication recovery conflict and propose what
> is relevant there in a separate patch.
>

The patch looks good.

thanks
Shveta

Re: Question on error code selection in conflict detection

From

Robert Haas

Date:

11 June, 17:03:10

On Tue, Jun 10, 2025 at 2:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Can we instead try to use other suitable existing error codes?

Why?

I mean, I'm not 100% against using existing error codes, but I feel
like we might be distorting the meanings of the existing error codes.
If we used new error codes, then people could test for those and know
that they would get exactly these conditions and nothing else.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Question on error code selection in conflict detection

From

Dilip Kumar

Date:

11 June, 17:32:08

On Wed, Jun 11, 2025 at 7:33 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Jun 10, 2025 at 2:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Can we instead try to use other suitable existing error codes?
>
> Why?
>
> I mean, I'm not 100% against using existing error codes, but I feel
> like we might be distorting the meanings of the existing error codes.
> If we used new error codes, then people could test for those and know
> that they would get exactly these conditions and nothing else.
>

To enhance the clarity and specificity of our error reporting,
particularly for logical replication conflicts, I suggest we consider
defining a dedicated class of error codes, much like we have for FDWs.
IMHO this would be a more future-proof approach, given the potential
for many new conflict detection types in the future.

--
Regards,
Dilip Kumar
Google

Re: Question on error code selection in conflict detection

From

Robert Haas

Date:

12 June, 16:26:10

On Thu, Jun 12, 2025 at 3:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> cases like UPDATE_MISSING, DELETE_MISSING, the existing code
> ERRCODE_NO_DATA_FOUND seems to be an exact match. The LOG message
> appears when we don't find the row to be updated or deleted while
> applying changes. This can happen if someone deletes the required rows
> on the subscriber. This case is similar to unique_key_violation where
> we report ERRCODE_UNIQUE_VIOLATION when, during apply, we found the
> row with the same key exists (for example, cases like INSERT_EXISTS or
> UPDATE_EXISTS). So, I can't think of a reason to use a different
> error_code for these cases.

Well, ERRCODE_NO_DATA_FOUND is "Class P0 - PL/pgSQL Error," and it
normally occurs when STRICT was used to say that SELECT INTO should
return exactly one row. This is a completely different part of the
system and a completely different situation. I see that one use of
ERRCODE_NO_DATA_FOUND has also found its way into tablecmds.c, but
that is probably also a mistake that should be fixed.

> Now, the error_code proposed for the other two cases
> UPDATE_ORIGIN_DIFFERS, DELETE_ORIGIN_DIFFERS is debatable. The
> situation in these cases is that the row exists but differs from the
> expected state (already changed by a different origin). As of now, for
> these cases, we LOG the message and update the existing rows. I
> thought of using ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION because we
> are using it for the case where the tuple to be updated is already
> updated/deleted, which is also the case here, but the reason for the
> 'already update/delete' is different. So, this is not a perfect match,
> but it is worth considering.

This error is currently used for situations involving triggers, which
may be why it has TRIGGERED in the name. This situation is different.

> The other code worth considering is
> ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE for UPDATE_ORIGIN_DIFFERS and
> DELETE_ORIGIN_DIFFERS cases. We use this error code when we try to
> perform some operation, but the required object (say a tuple) is not
> in the expected state. Currently, we use it for tuples in the
> following cases:

The point is that we shouldn't just consider the name of the error
code. We should consider more broadly: what is the error code
category? How is the error code currently used and is this consistent?
People want to be able to filter logs by error code to find the events
that they care about, and if you just lump a bunch of quasi-related
things under one error code because technically the wording of the
error code would fit the situation, then they can't do that.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Question on error code selection in conflict detection

From

Amit Kapila

Date:

13 June, 12:49:07

On Mon, Jun 9, 2025 at 7:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I was reviewing the code for conflict reporting and became curious
> about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code
> typically signifies a serialization failure within a transaction under
> serializable isolation, so its use here for a different type of
> conflict seems somewhat out of place. I did notice its use in other
> contexts for recovery conflicts in physical replication, which also
> struck me as a bit unusual.
>

I examined the usage of ERRCODE_T_R_SERIALIZATION_FAILURE in the code,
and it appears that, apart from logical and recovery conflicts, it is
also used in places other than the cases under the serializable
isolation level. For example,

if (ItemPointerIndicatesMovedPartitions(tid))
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("tuple to be locked was already moved to another partition due
to concurrent update")));

if (ItemPointerIndicatesMovedPartitions(&context->tmfd.ctid))
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("tuple to be merged was already moved to another partition due
to concurrent update")));

Then it is also used in ExecCheckTupleVisible and execReplication.c
without serializable isolation. Seeing this, it seems SERIALIZATION in
the errod_code refers to the case where the transaction could not be
serialized due to concurrent access conflicts. Now, we typically use
it for serializable isolation, but its use at other places where
transactions attempt to modify the same data simultaneously doesn't
seem unreasonable.

> Given these observations, I'm wondering if it would be more
> appropriate to introduce a new, more specific error code for this
> purpose?
>

If we want to have separate errorcode(s) for logical and recovery
conflicts, one possibility is to add new codes in Class 55 (Object Not
In Prerequisite State (class borrowed from DB2)).

ERRCODE_CONFLICT_CANT_ACCESS: all recovery conflicts, delete_missing,
and update_missing
ERRCODE_CONFLICT_CANT_MODIFY : update_origin_differ and delete_origin_differ

The reason I thought class 55 is suitable because it has other codes
like ERRCODE_CANT_CHANGE_RUNTIME_PARAM,
ERRCODE_UNSAFE_NEW_ENUM_VALUE_USAGE,
ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE, which to me appear to have
some similarity. I have checked the same class in DB2 [1] and it had a
few related codes like:
55019 The object is in an invalid state for the operation.
55079 The operation cannot be performed because the XML column is not
in the versioning format.

[1] -
https://www.ibm.com/docs/en/db2-for-zos/12.0.0?topic=codes-sqlstate-values-common-error#db2z_sqlstatevalues__classcode55

--
With Regards,
Amit Kapila.

Re: Question on error code selection in conflict detection

From

Amit Kapila

Date:

13 June, 12:58:42

On Thu, Jun 12, 2025 at 6:56 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jun 12, 2025 at 3:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > cases like UPDATE_MISSING, DELETE_MISSING, the existing code
> > ERRCODE_NO_DATA_FOUND seems to be an exact match. The LOG message
> > appears when we don't find the row to be updated or deleted while
> > applying changes. This can happen if someone deletes the required rows
> > on the subscriber. This case is similar to unique_key_violation where
> > we report ERRCODE_UNIQUE_VIOLATION when, during apply, we found the
> > row with the same key exists (for example, cases like INSERT_EXISTS or
> > UPDATE_EXISTS). So, I can't think of a reason to use a different
> > error_code for these cases.
>
> Well, ERRCODE_NO_DATA_FOUND is "Class P0 - PL/pgSQL Error," and it
> normally occurs when STRICT was used to say that SELECT INTO should
> return exactly one row. This is a completely different part of the
> system and a completely different situation. I see that one use of
> ERRCODE_NO_DATA_FOUND has also found its way into tablecmds.c, but
> that is probably also a mistake that should be fixed.
>

Right, and I agree we should work on changing the use of error code
ERRCODE_NO_DATA_FOUND in tablecmds.c.

--
With Regards,
Amit Kapila.