
From: shveta malik
Subject: Re: Conflict Detection and Resolution
Msg-id: CAJpy0uCov4JfZJeOvY0O21_gk9bcgNUDp4jf8+BbMp+EAv8cVQ@mail.gmail.com
In response to: Re: Conflict Detection and Resolution (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers
On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 6/11/24 10:35, shveta malik wrote:
> > On Mon, Jun 10, 2024 at 5:24 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >>
> >>
> >> On 6/10/24 12:56, shveta malik wrote:
> >>> On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
> >>> <tomas.vondra@enterprisedb.com> wrote:
> >>>>
> >>>>>>>
> >>>>>>>  UPDATE
> >>>>>>> ================
> >>>>>>>
> >>>>>>> Conflict Detection Method:
> >>>>>>> --------------------------------
> >>>>>>> Origin conflict detection: The ‘origin’ info, which can be obtained
> >>>>>>> from the commit timestamp generated for the incoming txn at the
> >>>>>>> source node, is used to detect conflicts. To compare the remote
> >>>>>>> origin with the local origin, we must have origin information for
> >>>>>>> local txns as well, which can be obtained from the commit timestamp
> >>>>>>> after enabling ‘track_commit_timestamp’ locally.
> >>>>>>> The one drawback here is that the ‘origin’ information cannot be
> >>>>>>> obtained once the row is frozen and the commit-timestamp info is
> >>>>>>> removed by vacuum. For a frozen row, conflicts cannot be raised, and
> >>>>>>> thus the incoming changes will be applied in all cases.
> >>>>>>>
> >>>>>>> Conflict Types:
> >>>>>>> ----------------
> >>>>>>> a) update_differ: The origin of an incoming update's key row differs
> >>>>>>> from the local row's, i.e., the row has already been updated locally
> >>>>>>> or by a different node.
> >>>>>>> b) update_missing: The row with the same value as that incoming
> >>>>>>> update's key does not exist. Remote is trying to update a row which
> >>>>>>> does not exist locally.
> >>>>>>> c) update_deleted: The row with the same value as that incoming
> >>>>>>> update's key does not exist; the row has already been deleted. This
> >>>>>>> conflict type is generated only if the deleted row is still
> >>>>>>> detectable, i.e., it has not been removed by VACUUM yet. If the row
> >>>>>>> has already been removed by VACUUM, this conflict cannot be detected;
> >>>>>>> it will be detected as update_missing and will follow the default or
> >>>>>>> configured resolver of update_missing itself.
> >>>>>>>
> >>>>>>
> >>>>>> I don't understand why update_missing and update_deleted should be
> >>>>>> different, especially considering it's not detected reliably. And also
> >>>>>> that even if we happen to find the row the associated TOAST data may
> >>>>>> have already been removed. So why would this matter?
> >>>>>
> >>>>> Here, we are trying to tackle the case where the row is 'recently'
> >>>>> deleted, i.e., a concurrent UPDATE and DELETE on pub and sub. The user
> >>>>> may want to opt for a different resolution in such a case as against
> >>>>> the one where the corresponding row was not even present in the first
> >>>>> place. The case where the row was deleted long ago may not fall into
> >>>>> this category, as there are higher chances that it has been removed by
> >>>>> vacuum and can be considered equivalent to the update_missing case.
> >>>>>
> >>>>
> >>>> My point is that if we can't detect the difference reliably, it's not
> >>>> very useful. Consider this example:
> >>>>
> >>>> Node A:
> >>>>
> >>>>   T1: INSERT INTO t (id, value) VALUES (1,1);
> >>>>
> >>>>   T2: DELETE FROM t WHERE id = 1;
> >>>>
> >>>> Node B:
> >>>>
> >>>>   T3: UPDATE t SET value = 2 WHERE id = 1;
> >>>>
> >>>> The "correct" order of received messages on a third node is T1-T3-T2.
> >>>> But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues
> >>>> and so on. For T1-T2-T3 the right decision is to discard the update,
> >>>> while for T3-T1-T2 it's to wait for the INSERT to arrive.
> >>>>
> >>>> But if we misdetect the situation, we either end up with a row that
> >>>> shouldn't be there, or losing an update.
> >>>
> >>> Doesn't the above example indicate that 'update_deleted' should also
> >>> be considered a necessary conflict type?  Please see the possibilities
> >>> of conflicts in all three cases:
> >>>
> >>>
> >>> The "correct" order of receiving messages on node C (as suggested
> >>> above) is T1-T3-T2   (case1)
> >>> ----------
> >>> T1 will insert the row.
> >>> T3 will have an update_differ conflict; 'latest_timestamp_wins' or
> >>> 'apply' will apply it, while 'earliest_timestamp_wins' or 'skip' will
> >>> skip it.
> >>> T2 will delete the row (irrespective of whether the update happened or not).
> >>> End Result: No Data.
> >>>
> >>> T1-T2-T3
> >>> ----------
> >>> T1 will insert the row.
> >>> T2 will delete the row.
> >>> T3 will have an update_deleted conflict, and the chances are that the
> >>> resolver set for it is 'skip' (the default is also 'skip' in this
> >>> case).
> >>>
> >>> If vacuum has deleted that row (or if we don't support the
> >>> 'update_deleted' conflict), it will be an 'update_missing' conflict. In
> >>> that case, the user may end up inserting the row if the chosen resolver
> >>> favors apply (which seems an obvious choice for the 'update_missing'
> >>> conflict; the default is also 'apply_or_skip').
> >>>
> >>> End result:
> >>> Row inserted with 'update_missing'.
> >>> Row correctly skipped with 'update_deleted' (assuming 'skip', the
> >>> obvious choice, is set for the update_deleted case).
> >>>
> >>> So it seems that with the 'update_deleted' conflict, there are higher
> >>> chances of opting for the right decision here (which is to discard the
> >>> update), as 'update_deleted' conveys the correct info to the user. The
> >>> 'update_missing' OTOH does not convey the correct info, and the user
> >>> may end up inserting the data by choosing apply-favoring resolvers for
> >>> 'update_missing'. Again, we get the benefit of 'update_deleted' for
> >>> *recently* deleted rows only.
> >>>
> >>> T3-T1-T2
> >>> ----------
> >>> T3 may end up inserting the record if the resolver is in favor of
> >>> 'apply' and all the columns are received from remote.
> >>> T1 will have an 'insert_exists' conflict and thus may either overwrite
> >>> the 'updated' values or leave the data as is (based on whether the
> >>> resolver is in favor of apply or not).
> >>> T2 will end up deleting it.
> >>> End Result: No Data.
> >>>
> >>> I feel that for the second case (and similar cases), 'update_deleted'
> >>> serves as a better conflict type.
> >>>
> >>
> >> True, but this is pretty much just a restatement of the example, right?
> >>
> >> The point I was trying to make is that this hinges on the ability to
> >> detect the correct conflict type. And if vacuum can swoop in and remove
> >> the recently deleted tuples (which I believe can happen at any time,
> >> right?), then that's not guaranteed, because we won't see the deleted
> >> tuple anymore.
> >
> > Yes, that's correct. However, many cases could benefit from the
> > update_deleted conflict type if it can be implemented reliably. That's
> > why we wanted to give it a try. But if we can't achieve predictable
> > results with it, I'm fine to drop this approach and conflict_type. We
> > can consider a better design in the future that doesn't depend on
> > non-vacuumed entries and provides a more robust method for identifying
> > deleted rows.
> >
>
> I agree having a separate update_deleted conflict would be beneficial;
> I'm not arguing against that - my point is actually that I think this
> conflict type is required, and that it needs to be detected reliably.
>
> I'm not sure dropping update_deleted entirely would be a good idea,
> though. It pretty much guarantees making the wrong decision at least
> sometimes. But at least it's predictable and users are more likely to
> notice that (compared to update_deleted working on well-behaving systems,
> and then failing when a node starts lagging or something).
>
> That's my opinion, though, and I don't intend to stay in the way. But I
> think the solution is not that difficult - something needs to prevent
> cleanup of recently dead tuples (until the "relevant" changes are
> received and applied from other nodes). I don't know if that could be
> done based on information we have for subscriptions, or if we need
> something new.

I agree that without update_deleted, there are higher chances of
making incorrect decisions in some cases. But I am not sure if relying
on delaying vacuum from removing such rows is a foolproof plan. We
cannot predict if or when the "relevant" changes will arrive, so how
long should we delay the vacuum?

To address this problem, we may need a completely different approach.
One solution could be to store deleted rows in a separate table
(dead-rows-table) so we can consult that table for any deleted entries
at any time. Additionally, we would need methods to purge older data
from the dead-rows-table to prevent it from growing too large. This
would be a substantial project on its own, so we can aim to implement
some initial and simple conflict resolution methods first before
tackling this more complex solution.
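
Just to illustrate the dead-rows-table idea (a very rough sketch; the
table, trigger, and retention period below are all hypothetical and
not a concrete design), for a single table it could look like:

  -- hypothetical: keep the keys of deleted rows around so the apply
  -- worker can distinguish update_deleted from update_missing
  CREATE TABLE t_dead_rows (
      id         int,
      deleted_at timestamptz DEFAULT now()
  );

  CREATE FUNCTION t_capture_delete() RETURNS trigger AS $$
  BEGIN
      INSERT INTO t_dead_rows (id) VALUES (OLD.id);
      RETURN OLD;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER t_dead_rows_trg
      AFTER DELETE ON t
      FOR EACH ROW EXECUTE FUNCTION t_capture_delete();

  -- hypothetical purge policy to keep the dead-rows-table bounded
  DELETE FROM t_dead_rows WHERE deleted_at < now() - interval '1 day';

An in-core implementation would of course not use triggers, but the
lookup and purge problems would be essentially the same.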

> >> Also, can the resolver even convert the UPDATE into INSERT and proceed?
> >> Maybe with REPLICA IDENTITY FULL?
> >
> > Yes, it can, as long as the row doesn't contain toasted data. Without
> > toasted data, the new tuple is fully logged. However, if the row does
> > contain toasted data, the new tuple won't log it completely. In such a
> > case, REPLICA IDENTITY FULL becomes a requirement to ensure we have
> > all the data necessary to create the row on the target side. In the
> > absence of REPLICA IDENTITY FULL and with the row containing toasted
> > data, the operation will be skipped or an error will be raised.
> >
> >> Otherwise the row might be incomplete,
> >> missing required columns etc. In which case it'd have to wait for the
> >> actual INSERT to arrive - which would work for actual update_missing,
> >> where the row may be delayed due to network issues. But if that's a
> >> mistake due to vacuum removing the deleted tuple, it'll wait forever.
> >
> > Even in the case of 'update_missing', we do not intend to wait for the
> > 'actual insert' to arrive, as there is no guarantee that the 'insert'
> > will arrive at all. And thus we plan to skip or error out (based on the
> > user's configuration) if a complete row cannot be created for insertion.
> >
>
> If the UPDATE contains all the columns and can be turned into an INSERT,
> then that seems reasonable. But I don't see how skipping it could work
> in general (except for some very simple / specific use cases). I'm not
> sure if you suggest to skip just the one UPDATE or transaction as a
> whole, but it seems to me either of those options could easily lead to
> all kinds of inconsistencies and user confusion.

Conflict resolution is row-based, meaning that whatever action we
choose (error or skip) applies to the specific change rather than the
entire transaction. I'm not sure if waiting indefinitely for an INSERT
to arrive is a good idea, as the node that triggered the INSERT might
be down for an extended period. At best, we could provide a
configuration parameter so that the apply worker waits for a specified
period for the INSERT to arrive before either skipping the change or
throwing an error.
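
Just to show the idea, it could be a subscription option (the option
name below is made up, not existing or agreed-upon syntax):

  -- hypothetical: wait up to 30s for the missing INSERT before
  -- falling back to the configured resolver (skip or error)
  ALTER SUBSCRIPTION sub1 SET (conflict_wait_time = '30s');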

That said, even if we error out or skip and log without waiting for
the INSERT, we won't introduce any new inconsistencies. This is the
current behavior on pg-HEAD. But with options like apply_or_skip and
apply_or_error, we have a better chance of resolving conflicts by
constructing the complete row internally, without user intervention.
There will still be some cases where we can't fully reconstruct the
row, but in those instances, the behavior won't be any worse than on
current pg-HEAD.
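
For illustration, the kind of per-conflict-type configuration being
discussed could look roughly like the below (the CONFLICT RESOLVER
clause is only a sketch of the proposal under discussion, not final
syntax):

  -- hypothetical syntax: pick a resolver per conflict type
  CREATE SUBSCRIPTION sub1
      CONNECTION 'host=nodeA dbname=postgres'
      PUBLICATION pub1
      CONFLICT RESOLVER (update_differ  = 'latest_timestamp_wins',
                         update_missing = 'apply_or_skip');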

thanks
Shveta


