Thread: Question on error code selection in conflict detection
Hi, I was reviewing the code for conflict reporting and became curious about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code typically signifies a serialization failure within a transaction under serializable isolation, so its use here for a different type of conflict seems somewhat out of place. I did notice its use in other contexts for recovery conflicts in physical replication, which also struck me as a bit unusual. Given these observations, I'm wondering if it would be more appropriate to introduce a new, more specific error code for this purpose? static int errcode_apply_conflict(ConflictType type) { switch (type) { case CT_INSERT_EXISTS: case CT_UPDATE_EXISTS: case CT_MULTIPLE_UNIQUE_CONFLICTS: return errcode(ERRCODE_UNIQUE_VIOLATION); case CT_UPDATE_ORIGIN_DIFFERS: case CT_UPDATE_MISSING: case CT_DELETE_ORIGIN_DIFFERS: case CT_DELETE_MISSING: return errcode(ERRCODE_T_R_SERIALIZATION_FAILURE); } Assert(false); return 0; /* silence compiler warning */ } -- Regards, Dilip Kumar Google
On Mon, Jun 9, 2025 at 9:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > I was reviewing the code for conflict reporting and became curious > about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code > typically signifies a serialization failure within a transaction under > serializable isolation, so its use here for a different type of > conflict seems somewhat out of place. I did notice its use in other > contexts for recovery conflicts in physical replication, which also > struck me as a bit unusual. > > Given these observations, I'm wondering if it would be more > appropriate to introduce a new, more specific error code for this > purpose? Makes sense to me. I'm not sure if we should use a new error code or some other existing one, but conflating other things with serializable failures seems like a bad plan. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Jun 9, 2025 at 7:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I was reviewing the code for conflict reporting and became curious > about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code > typically signifies a serialization failure within a transaction under > serializable isolation, so its use here for a different type of > conflict seems somewhat out of place. I did notice its use in other > contexts for recovery conflicts in physical replication, which also > struck me as a bit unusual. > > Given these observations, I'm wondering if it would be more > appropriate to introduce a new, more specific error code for this > purpose? > Can we instead try to use other suitable existing error codes? CT_UPDATE_ORIGIN_DIFFERS, CT_DELETE_ORIGIN_DIFFERS → ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION (27000) These represent cases where the row exists but differs from the expected state, conceptually similar to a triggered data change invalidating the operation. I have also considered using ERRCODE_TRIGGERED_ACTION_EXCEPTION for the above, but that sounds to be fit for a generic error that occurs during the execution of a triggered action (e.g., a BEFORE or AFTER trigger). CT_UPDATE_MISSING, CT_DELETE_MISSING → ERRCODE_NO_DATA_FOUND (02000) These are straightforward cases where the target row is missing, aligning well with the standard meaning of 02000. I don't have good ideas on the cases for physical replication, as those seem quite different; we can consider those separately. Thoughts? -- With Regards, Amit Kapila.
On Tue, Jun 10, 2025 at 1:25 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Jun 9, 2025 at 9:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I was reviewing the code for conflict reporting and became curious > > about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code > > typically signifies a serialization failure within a transaction under > > serializable isolation, so its use here for a different type of > > conflict seems somewhat out of place. I did notice its use in other > > contexts for recovery conflicts in physical replication, which also > > struck me as a bit unusual. > > > > Given these observations, I'm wondering if it would be more > > appropriate to introduce a new, more specific error code for this > > purpose? > > Makes sense to me. I'm not sure if we should use a new error code or > some other existing one, but conflating other things with serializable > failures seems like a bad plan. Yeah we may use existing as well if we find some appropriate error codes. -- Regards, Dilip Kumar Google
On Tue, Jun 10, 2025 at 11:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jun 9, 2025 at 7:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I was reviewing the code for conflict reporting and became curious > > about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code > > typically signifies a serialization failure within a transaction under > > serializable isolation, so its use here for a different type of > > conflict seems somewhat out of place. I did notice its use in other > > contexts for recovery conflicts in physical replication, which also > > struck me as a bit unusual. > > > > Given these observations, I'm wondering if it would be more > > appropriate to introduce a new, more specific error code for this > > purpose? > > > > Can we instead try to use other suitable existing error codes? Yeah we can try to do that as well. > CT_UPDATE_ORIGIN_DIFFERS, CT_DELETE_ORIGIN_DIFFERS → > ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION (27000) > These represent cases where the row exists but differs from the > expected state, conceptually similar to a triggered data change > invalidating the operation. Yeah this looks much better than what we already have. > I have also considered using ERRCODE_TRIGGERED_ACTION_EXCEPTION for > the above, but that sounds to be fit for a generic error that occurs > during the execution of a triggered action (e.g., a BEFORE or AFTER > trigger). Right > CT_UPDATE_MISSING, CT_DELETE_MISSING → ERRCODE_NO_DATA_FOUND (02000) > These are straightforward cases where the target row is missing, > aligning well with the standard meaning of 02000. Yeah this looks good. > I don't have good ideas on the cases for physical replication, as > those seem quite different; we can consider those separately. Yeah we can do that separately, maybe I put more thought on that and send my proposal. > Thoughts? Okay I will put more thought about the proposed error code and also see what others have to say and if we have a consensus I can provide the patch. -- Regards, Dilip Kumar Google
On Wed, Jun 11, 2025 at 11:05 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Jun 10, 2025 at 12:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Jun 10, 2025 at 11:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Jun 9, 2025 at 7:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > I was reviewing the code for conflict reporting and became curious > > > > about the choice of ERRCODE_T_R_SERIALIZATION_FAILURE. This error code > > > > typically signifies a serialization failure within a transaction under > > > > serializable isolation, so its use here for a different type of > > > > conflict seems somewhat out of place. I did notice its use in other > > > > contexts for recovery conflicts in physical replication, which also > > > > struck me as a bit unusual. > > > > > > > > Given these observations, I'm wondering if it would be more > > > > appropriate to introduce a new, more specific error code for this > > > > purpose? > > > > > > > > > > Can we instead try to use other suitable existing error codes? > > > > Yeah we can try to do that as well. > > > > > CT_UPDATE_ORIGIN_DIFFERS, CT_DELETE_ORIGIN_DIFFERS → > > > ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION (27000) > > > These represent cases where the row exists but differs from the > > > expected state, conceptually similar to a triggered data change > > > invalidating the operation. > > > > Yeah this looks much better than what we already have. > > > > > I have also considered using ERRCODE_TRIGGERED_ACTION_EXCEPTION for > > > the above, but that sounds to be fit for a generic error that occurs > > > during the execution of a triggered action (e.g., a BEFORE or AFTER > > > trigger). > > > > Right > > > > > CT_UPDATE_MISSING, CT_DELETE_MISSING → ERRCODE_NO_DATA_FOUND (02000) > > > These are straightforward cases where the target row is missing, > > > aligning well with the standard meaning of 02000. > > > > Yeah this looks good. > > > > > I don't have good ideas on the cases for physical replication, as > > > those seem quite different; we can consider those separately. > > > > Yeah we can do that separately, maybe I put more thought on that and > > send my proposal. > > > > > Thoughts? > > > > Okay I will put more thought about the proposed error code and also > > see what others have to say and if we have a consensus I can provide > > the patch. > > After reviewing other error codes, these appear to be the most > relevant in this context. +1. On digging deep, among existing codes, these are the most relevant ones. > PFA patch for the same, I will analyze the > error code in physical replication recovery conflict and propose what > is relevant there in a separate patch. > The patch looks good. thanks Shveta
On Tue, Jun 10, 2025 at 2:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > Can we instead try to use other suitable existing error codes? Why? I mean, I'm not 100% against using existing error codes, but I feel like we might be distorting the meanings of the existing error codes. If we used new error codes, then people could test for those and know that they would get exactly these conditions and nothing else. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Jun 11, 2025 at 7:33 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Jun 10, 2025 at 2:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Can we instead try to use other suitable existing error codes? > > Why? > > I mean, I'm not 100% against using existing error codes, but I feel > like we might be distorting the meanings of the existing error codes. > If we used new error codes, then people could test for those and know > that they would get exactly these conditions and nothing else. > To enhance the clarity and specificity of our error reporting, particularly for logical replication conflicts, I suggest we consider defining a dedicated class of error codes, much like we have for FDWs. IMHO this would be a more future-proof approach, given the potential for many new conflict detection types in the future. -- Regards, Dilip Kumar Google
On Thu, Jun 12, 2025 at 3:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > cases like UPDATE_MISSING, DELETE_MISSING, the existing code > ERRCODE_NO_DATA_FOUND seems to be an exact match. The LOG message > appears when we don't find the row to be updated or deleted while > applying changes. This can happen if someone deletes the required rows > on the subscriber. This case is similar to unique_key_violation where > we report ERRCODE_UNIQUE_VIOLATION when, during apply, we found the > row with the same key exists (for example, cases like INSERT_EXISTS or > UPDATE_EXISTS). So, I can't think of a reason to use a different > error_code for these cases. Well, ERRCODE_NO_DATA_FOUND is "Class P0 - PL/pgSQL Error," and it normally occurs when STRICT was used to say that SELECT INTO should return exactly one row. This is a completely different part of the system and a completely different situation. I see that one use of ERRCODE_NO_DATA_FOUND has also found its way into tablecmds.c, but that is probably also a mistake that should be fixed. > Now, the error_code proposed for the other two cases > UPDATE_ORIGIN_DIFFERS, DELETE_ORIGIN_DIFFERS is debatable. The > situation in these cases is that the row exists but differs from the > expected state (already changed by a different origin). As of now, for > these cases, we LOG the message and update the existing rows. I > thought of using ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION because we > are using it for the case where the tuple to be updated is already > updated/deleted, which is also the case here, but the reason for the > 'already update/delete' is different. So, this is not a perfect match, > but it is worth considering. This error is currently used for situations involving triggers, which may be why it has TRIGGERED in the name. This situation is different. > The other code worth considering is > ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE for UPDATE_ORIGIN_DIFFERS and > DELETE_ORIGIN_DIFFERS cases. We use this error code when we try to > perform some operation, but the required object (say a tuple) is not > in the expected state. Currently, we use it for tuples in the > following cases: The point is that we shouldn't just consider the name of the error code. We should consider more broadly: what is the error code category? How is the error code currently used and is this consistent? People want to be able to filter logs by error code to find the events that they care about, and if you just lump a bunch of quasi-related things under one error code because technically the wording of the error code would fit the situation, then they can't do that. -- Robert Haas EDB: http://www.enterprisedb.com