Re: Conflict Detection and Resolution - Mailing list pgsql-hackers

From shveta malik
Subject Re: Conflict Detection and Resolution
Date
Msg-id CAJpy0uC4riK8e6hQt8jcU+nXYmRRjnbFEapYNbmxVYjENxTw2g@mail.gmail.com
Whole thread Raw
In response to Re: Conflict Detection and Resolution  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: Conflict Detection and Resolution
Re: Conflict Detection and Resolution
List pgsql-hackers
On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > >
> > > >
> > > > Conflict Resolution
> > > > ----------------
> > > > a) latest_timestamp_wins:    The change with later commit timestamp wins.
> > > > b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.
>
> Can you share the use case of "earliest_timestamp_wins" resolution
> method? It seems after the initial update on the local node, it will
> never allow remote update to succeed which sounds a bit odd. Jan has
> shared this and similar concerns about this resolution method, so I
> have added him to the email as well.

I do not have the exact scenario for this.  But I feel, if 2 nodes are
concurrently inserting different data against a primary key, then some
users may have preferences that retain the row which was inserted
earlier. It is no different from latest_timestamp_wins. It totally
depends upon what kind of application and requirement the user may
have, based on which, he may discard the later coming rows (specially
for INSERT case).

> > > > Conflict Types:
> > > > ----------------
> > > > a) update_differ: The origin of an incoming update's key row differs
> > > > from the local row i.e.; the row has already been updated locally or
> > > > by different nodes.
> > > > b) update_missing: The row with the same value as that incoming
> > > > update's key does not exist. Remote is trying to update a row which
> > > > does not exist locally.
> > > > c) update_deleted: The row with the same value as that incoming
> > > > update's key does not exist. The row is already deleted. This conflict
> > > > type is generated only if the deleted row is still detectable i.e., it
> > > > is not removed by VACUUM yet. If the row is removed by VACUUM already,
> > > > it cannot detect this conflict. It will detect it as update_missing
> > > > and will follow the default or configured resolver of update_missing
> > > > itself.
> > > >
> > >
> > > I don't understand the why should update_missing or update_deleted be
> > > different, especially considering it's not detected reliably. And also
> > > that even if we happen to find the row the associated TOAST data may
> > > have already been removed. So why would this matter?
> >
> > Here, we are trying to tackle the case where the row is 'recently'
> > deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
> > want to opt for a different resolution in such a case as against the
> > one where the corresponding row was not even present in the first
> > place. The case where the row was deleted long back may not fall into
> > this category as there are higher chances that they have been removed
> > by vacuum and can be considered equivalent to the update_ missing
> > case.
> >
>
> I think to make 'update_deleted' work, we need another scan with a
> different snapshot type to find the recently deleted row. I don't know
> if it is a good idea to scan the index twice with different snapshots,
> so for the sake of simplicity, can we consider 'updated_deleted' same
> as 'update_missing'? If we think it is an important case to consider
> then we can try to accomplish this once we finalize the
> design/implementation of other resolution methods.

I think it is important for scenarios when data is being updated and
deleted concurrently. But yes, I agree that implementation may have
some performance hit for this case. We can tackle this scenario at a
later stage.

> > > >
> > > > To implement the above, subscription commands will be changed to have
> > > > one more parameter 'conflict_resolution=on/off', default will be OFF.
> > > >
> > > > To configure global resolvers, new DDL command will be introduced:
> > > >
> > > > CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
> > > >
> > >
> > > I very much doubt we want a single global conflict resolver, or even one
> > > resolver per subscription. It seems like a very table-specific thing.
> >
>
> +1 to make it a table-level configuration but we probably need
> something at the global level as well such that by default if users
> don't define anything at table-level global-level configuration will
> be used.
>
> >
> > >
> > > Also, doesn't all this whole design ignore the concurrency between
> > > publishers? Isn't this problematic considering the commit timestamps may
> > > go backwards (for a given publisher), which means the conflict
> > > resolution is not deterministic (as it depends on how exactly it
> > > interleaves)?
> > >
>
> I am not able to imagine the cases you are worried about. Can you
> please be specific? Is it similar to the case I described in
> yesterday's email [1]?
>
> [1] -
https://www.postgresql.org/message-id/CAA4eK1JTMiBOoGqkt%3DaLPLU8Rs45ihbLhXaGHsz8XC76%2BOG3%2BQ%40mail.gmail.com
>

thanks
Shveta



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Logical Replication of sequences
Next
From: Amit Kapila
Date:
Subject: Re: Logical Replication of sequences