Re: Conflict Detection and Resolution - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Conflict Detection and Resolution
Msg-id CAD21AoAa6JzqhXY02uNUPb-aTozu2RY9nMdD1=TUh+FpskkYtw@mail.gmail.com
In response to RE: Conflict Detection and Resolution  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses Re: Conflict Detection and Resolution
RE: Conflict Detection and Resolution
List pgsql-hackers
On Wed, Jun 5, 2024 at 3:32 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Hi,
>
> This time at PGconf.dev[1], we had some discussions regarding this
> project. The proposed approach is to split the work into two main
> components. The first part focuses on conflict detection, which aims to
> identify and report conflicts in logical replication. This feature will
> enable users to monitor the unexpected conflicts that may occur. The
> second part involves the actual conflict resolution. Here, we will provide
> built-in resolutions for each conflict and allow user to choose which
> resolution will be used for which conflict(as described in the initial
> email of this thread).

I agree with this direction: focus on conflict detection (and
logging) first, then develop conflict resolution on top of that.

>
> Of course, we are open to alternative ideas and suggestions, and the
> strategy above can be changed based on ongoing discussions and feedback
> received.
>
> Here is the patch of the first part work, which adds a new parameter
> detect_conflict for CREATE and ALTER subscription commands. This new
> parameter will decide if subscription will go for conflict detection. By
> default, conflict detection will be off for a subscription.
>
> When conflict detection is enabled, additional logging is triggered in the
> following conflict scenarios:
>
> * updating a row that was previously modified by another origin.
> * The tuple to be updated is not found.
> * The tuple to be deleted is not found.
>
> While there exist other conflict types in logical replication, such as an
> incoming insert conflicting with an existing row due to a primary key or
> unique index, these cases already result in constraint violation errors.

What does detect_conflict being true actually mean to users? I
understand that setting detect_conflict to true could introduce some
overhead for detecting conflicts. But even if detect_conflict is
false, we still detect some conflicts, such as concurrent inserts
with the same key. Once we introduce the complete conflict detection
feature, I'm not sure there is a case where a user wants to detect
only some particular types of conflict.
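
To make the point concrete, here is a minimal Python sketch (an
illustration only, not code from the patch) of which conflicts a user
would see under the behavior described in the quoted text. Apart from
update_differ, the labels update_missing, delete_missing, and
insert_exists are hypothetical names chosen for this example, and
reported_conflicts is a made-up helper.

```python
# Hypothetical labels for the scenarios listed in the quoted patch description.
# Only "update_differ" appears in this thread; the others are assumptions.
LOGGED_ONLY_WHEN_ENABLED = frozenset(
    {"update_differ", "update_missing", "delete_missing"}
)

# A unique-key violation on insert already raises a constraint error,
# regardless of the detect_conflict setting.
ALWAYS_SURFACED = frozenset({"insert_exists"})


def reported_conflicts(detect_conflict):
    """Return the set of conflict labels a user would see reported."""
    if detect_conflict:
        return ALWAYS_SURFACED | LOGGED_ONLY_WHEN_ENABLED
    return ALWAYS_SURFACED
```

Even with detect_conflict set to false, the returned set is not empty,
which is why the meaning of the parameter is unclear to users.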

> Therefore, additional conflict detection for these cases is currently
> omitted to minimize potential overhead. However, the pre-detection for
> conflict in these error cases is still essential to support automatic
> conflict resolution in the future.

I feel that we should log all types of conflict in a uniform way. For
example, with detect_conflict being true, the update_differ conflict
is reported as "conflict %s detected on relation "%s"", whereas
concurrent inserts with the same key are reported as "duplicate key
value violates unique constraint "%s"", which could confuse users.
Ideally, I think we should log such conflict details (table name,
column name, conflict type, etc.) somewhere (e.g. a table or the
server logs) so that users can resolve them manually.
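
As a rough sketch of what uniform reporting could look like (an
assumption for illustration, not the patch's design), every conflict
type could share one record shape and one message format. The
ConflictReport class, its fields, and the insert_exists label are
hypothetical; only the "conflict %s detected on relation" wording is
taken from the message quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConflictReport:
    """Hypothetical uniform record for any logical replication conflict."""

    conflict_type: str                # e.g. "update_differ"
    relation: str                     # schema-qualified table name
    constraint: Optional[str] = None  # set only for unique-key conflicts

    def message(self):
        # Same leading wording for every conflict type.
        text = 'conflict {} detected on relation "{}"'.format(
            self.conflict_type, self.relation
        )
        if self.constraint is not None:
            text += ' (unique constraint "{}")'.format(self.constraint)
        return text
```

With this shape, a duplicate-key insert and an update_differ conflict
produce messages that start the same way, so a user can search the
server log (or a conflict table) with a single pattern.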

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


