Re: Conflict Detection and Resolution - Mailing list pgsql-hackers

From shveta malik
Subject Re: Conflict Detection and Resolution
Msg-id CAJpy0uDW-eh6sRWCuxfc89wwnLB_EdaJ9F6H9kN4ViOY4ag3JQ@mail.gmail.com
In response to Re: Conflict Detection and Resolution  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Wed, Jul 3, 2024 at 10:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jul 2, 2024 at 2:40 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Jun 19, 2024 at 1:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Jun 18, 2024 at 3:29 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > > On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > I tried to work out a few scenarios with this, where the apply worker
> > > > will wait until its local clock hits 'remote_commit_tts - max_skew
> > > > permitted'. Please have a look.
> > > >
> > > > Let's say, we have a GUC to configure max_clock_skew permitted.
> > > > Resolver is last_update_wins in both cases.
> > > > ----------------
> > > > 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew.
> > > >
> > > > Remote Update with commit_timestamp = 10.20AM.
> > > > Local clock (which is say 5 min behind) shows = 10.15AM.
> > > >
> > > > When remote update arrives at local node, we see that skew is greater
> > > > than max_clock_skew and thus apply worker waits till local clock hits
> > > > 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the
> > > > local clock hits 10.20 AM, the worker applies the remote change with
> > > > commit_tts of 10.20AM. In the meantime (during the wait period of the
> > > > apply worker), if some local update on the same row has happened at say
> > > > 10.18am, that will be applied first, and will later be overwritten by
> > > > the above remote change of 10.20AM because the remote change's timestamp
> > > > appears later, even though it happened earlier than the local change.
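To make the arithmetic of the wait concrete, here is a minimal sketch. It assumes the rule described above (the apply worker sleeps until the local clock reaches 'remote commit_tts - max_clock_skew'). The function names and the calendar date are made up for illustration and are not part of any patch.

```python
from datetime import datetime, timedelta


def apply_wait_until(remote_commit_ts, max_clock_skew):
    """Earliest local clock reading at which the remote change may be applied."""
    return remote_commit_ts - max_clock_skew


def wait_duration(local_now, remote_commit_ts, max_clock_skew):
    """How long the apply worker would sleep; zero if the skew is tolerated."""
    target = apply_wait_until(remote_commit_ts, max_clock_skew)
    return max(target - local_now, timedelta(0))


# Hypothetical date; only the clock times come from the scenario.
remote_commit_ts = datetime(2024, 7, 3, 10, 20)
local_now = datetime(2024, 7, 3, 10, 15)  # local clock is 5 min behind

# max_clock_skew = 0: wait until the local clock shows 10:20.
wait_no_tolerance = wait_duration(local_now, remote_commit_ts, timedelta(0))

# max_clock_skew = 2 min: wait until the local clock shows 10:18.
wait_two_minutes = wait_duration(local_now, remote_commit_ts, timedelta(minutes=2))

print(wait_no_tolerance, wait_two_minutes)
```

With no tolerance the worker sleeps for the full five minutes of skew; with a two-minute tolerance it sleeps for three.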
> > >
> > > For the sake of simplicity let's call the change that happened at
> > > 10:20 AM change-1 and the change that happened at 10:18 change-2,
> > > and assume we are talking about synchronous commit only.
> > >
> > > I think that, from an application perspective, change-1 cannot have
> > > caused change-2: because we delayed applying change-1 on the local
> > > node, the confirmation of change-1 to the application was delayed as
> > > well. That means change-2 was made on the local node without the
> > > confirmation of change-1, hence change-2 has no causal dependency on
> > > change-1.  So it's fine that we order change-2 before change-1, and
> > > the timestamps will show the same order on any other node that
> > > receives these 2 changes.
> > >
> > > The goal is to ensure that if we define the order where change-2
> > > happens before change-1, this same order should be visible on all
> > > other nodes. This will hold true because the commit timestamp of
> > > change-2 is earlier than that of change-1.
> > >
> > > > 2)  Case 2: max_clock_skew is set to 2min.
> > > >
> > > > Remote Update with commit_timestamp=10.20AM
> > > > Local clock (which is say 5 min behind) = 10.15AM.
> > > >
> > > > Now apply worker will notice skew greater than 2min and thus will wait
> > > > till local clock hits 'remote's commit_tts - max_clock_skew' i.e.
> > > > 10.18 and will apply the change with commit_tts of 10.20 (as we
> > > > always save the origin's commit timestamp into local commit_tts, see
> > > > RecordTransactionCommit->TransactionTreeSetCommitTsData). Now let's say
> > > > another local update is triggered at 10.19am. It will be applied
> > > > locally but it will be ignored on the remote node. On the remote node,
> > > > the existing change with a timestamp of 10.20 am will win, resulting in
> > > > data divergence.
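The divergence in Case 2 can be traced with a small simulation. This is only a sketch under the assumptions stated in the scenario: a local write overwrites the row without consulting the resolver, and an incoming replicated change is resolved with last_update_wins on commit timestamps. The `Row` type, `apply_remote` helper, values, and date are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Row:
    value: str
    commit_ts: datetime


def apply_remote(existing, incoming):
    """last_update_wins: keep the version with the later commit timestamp."""
    return incoming if incoming.commit_ts > existing.commit_ts else existing


def at(hour, minute):
    # Hypothetical date; only the clock times come from the scenario.
    return datetime(2024, 7, 3, hour, minute)


remote_change = Row("from-remote-node", at(10, 20))
local_change = Row("from-local-node", at(10, 19))

# Remote node: the 10:20 change was committed there directly.
remote_node_row = remote_change

# Local node: the apply worker waits until its clock shows 10:18, then
# applies the remote change but keeps the origin timestamp of 10:20.
local_node_row = remote_change

# Local node: a user update at 10:19 simply overwrites the row.
local_node_row = local_change

# Remote node: the 10:19 change arrives and loses against the 10:20 row.
remote_node_row = apply_remote(remote_node_row, local_change)

print(local_node_row.value, remote_node_row.value)
```

The two nodes end up holding different values for the same row, which is the divergence described above.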
> > >
> > > Let's call the 10:20 AM change change-1 and the change that
> > > happened at 10:19 change-2.
> > >
> > > IIUC, although we apply change-1 at 10:18 AM, the commit_ts of that
> > > change is 10:20, and the same will be visible to all other nodes.
> > > So for conflict resolution change-1 still happened after change-2,
> > > because change-2's commit_ts is 10:19 AM.   Now
> > > there could be a problem with the causal order because we applied the
> > > change-1 at 10:18 AM so the application might have gotten confirmation
> > > at 10:18 AM and the change-2 of the local node may be triggered as a
> > > result of confirmation of the change-1 that means now change-2 has a
> > > causal dependency on the change-1 but commit_ts shows change-2
> > > happened before the change-1 on all the nodes.
> > >
> > > So, is this acceptable? I think yes because the user has configured a
> > > maximum clock skew of 2 minutes, which means the detected order might
> > > not always align with the causal order for transactions occurring
> > > within that time frame. Generally, the ideal configuration for
> > > max_clock_skew should be a multiple of the network round trip time.
> > > Assuming this configuration, we wouldn’t encounter this problem
> > > because for change-2 to be caused by change-1, the client would need
> > > to get confirmation of change-1 and then trigger change-2, which would
> > > take at least 2-3 network round trips.
> >
> > As we agreed, the subscriber should wait before applying an operation
> > if the commit timestamp of the currently replayed transaction is in
> > the future and the difference exceeds the maximum clock skew. This
> > raises the question: should the subscriber wait only for insert,
> > update, and delete operations when timestamp-based resolution methods
> > are set, or should it wait regardless of the type of remote operation,
> > the presence or absence of conflicts, and the resolvers configured?
> > I believe the latter approach is the way to go i.e. this should be
> > independent of CDR, though needed by CDR for better timestamp based
> > resolutions. Thoughts?
>
> Yes, I also think it should be independent of CDR.  IMHO, it should be
> based on the user-configured maximum clock skew tolerance and can be
> independent of CDR.

+1

> IIUC we would make the remote apply wait just
> before committing if the remote commit timestamp is ahead of the local
> clock by more than the maximum clock skew tolerance, is that correct?

+1 on condition to wait.

But I think we should make the apply worker wait during begin
(apply_handle_begin) instead of at commit. It makes more sense to delay
the entire operation to manage clock skew rather than the commit
alone. Only then can CDR's timestamp-based resolution, which happens
well before the commit stage, benefit from the wait. Thoughts?
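To illustrate the difference, here is a toy model, not apply worker code. It assumes the commit timestamp is already known when the transaction's begin message is processed (as far as I understand, the logical replication Begin message carries it). `FakeClock`, `apply_transaction`, and the date are invented for this sketch.

```python
from datetime import datetime, timedelta


class FakeClock:
    """Simulated local clock so the sketch runs without real sleeping."""

    def __init__(self, now):
        self.now = now

    def sleep_until(self, target):
        if target > self.now:
            self.now = target


def apply_transaction(clock, commit_ts, max_clock_skew, wait_at):
    """Replay BEGIN, UPDATE, COMMIT and return the local time at which
    the UPDATE (and hence any conflict resolution) was processed."""
    resolved_at = None
    for message in ("begin", "update", "commit"):
        if message == wait_at:
            clock.sleep_until(commit_ts - max_clock_skew)
        if message == "update":
            resolved_at = clock.now
    return resolved_at


commit_ts = datetime(2024, 7, 3, 10, 20)  # hypothetical date
start = datetime(2024, 7, 3, 10, 15)      # local clock, 5 min behind
skew = timedelta(minutes=2)

resolved_wait_at_begin = apply_transaction(FakeClock(start), commit_ts, skew, "begin")
resolved_wait_at_commit = apply_transaction(FakeClock(start), commit_ts, skew, "commit")

print(resolved_wait_at_begin.time(), resolved_wait_at_commit.time())
```

In this model, waiting at begin means the update is resolved at 10:18 local time, after the wait. Waiting at commit means it is resolved at 10:15, before any wait has happened, so the resolution step gets no benefit from the delay.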

thanks
Shveta


