Re: Conflict Detection and Resolution - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Conflict Detection and Resolution
Date
Msg-id CAFiTN-ufDeiovR+9jB-4-TKjFDY+t+WDj=SvxZfo5kkrpyzYSQ@mail.gmail.com
In response to Re: Conflict Detection and Resolution  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Conflict Detection and Resolution
Re: Conflict Detection and Resolution
List pgsql-hackers
On Tue, Jun 18, 2024 at 10:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Jun 17, 2024 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jun 12, 2024 at 10:03 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra
> > > <tomas.vondra@enterprisedb.com> wrote:
> > >
> > > > > Yes, that's correct. However, many cases could benefit from the
> > > > > update_deleted conflict type if it can be implemented reliably. That's
> > > > > why we wanted to give it a try. But if we can't achieve predictable
> > > > > results with it, I'm fine to drop this approach and conflict_type. We
> > > > > can consider a better design in the future that doesn't depend on
> > > > > non-vacuumed entries and provides a more robust method for identifying
> > > > > deleted rows.
> > > > >
> > > >
> > > > I agree having a separate update_deleted conflict would be beneficial,
> > > > I'm not arguing against that - my point is actually that I think this
> > > > conflict type is required, and that it needs to be detected reliably.
> > > >
> > >
> > > When working with a distributed system, we must accept some form of
> > > eventual consistency model. However, it's essential to design a
> > > predictable and acceptable behavior. For example, if a change is a
> > > result of a previous operation (such as an update on node B triggered
> > > after observing an operation on node A), we can say that the operation
> > > on node A happened before the operation on node B. Conversely, if
> > > operations on nodes A and B are independent, we consider them
> > > concurrent.
> > >
> > > In distributed systems, clock skew is a known issue. To establish a
> > > consistency model, we need to ensure it guarantees the
> > > "happens-before" relationship. Consider a scenario with three nodes:
> > > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
> > > subsequently NodeB makes changes, and then both NodeA's and NodeB's
> > > changes are sent to NodeC, the clock skew might make NodeB's changes
> > > appear to have occurred before NodeA's changes. However, we should
> > > maintain data that indicates NodeB's changes were triggered after
> > > NodeA's changes arrived at NodeB. This implies that logically, NodeB's
> > > changes happened after NodeA's changes, despite what the timestamps
> > > suggest.
> > >
> > > A common method to handle such cases is using vector clocks for
> > > conflict resolution.
> > >
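
(Just to make the vector-clock idea concrete, here is a minimal
standalone sketch of the comparison rule. The node count, the type and
function names, and the example values are all made up for
illustration; they are not part of any proposal.)

#include <stdbool.h>
#include <stdio.h>

#define N_NODES 3

/*
 * One logical counter per node; each node increments its own slot on
 * every local change and takes the element-wise max when it receives
 * remote changes.
 */
typedef struct VectorClock
{
    unsigned long counters[N_NODES];
} VectorClock;

/*
 * Returns true iff every component of a is <= the corresponding
 * component of b and at least one is strictly smaller, i.e. the event
 * stamped with a happened before the event stamped with b.  If neither
 * direction holds, the events are concurrent.
 */
static bool
vc_happened_before(const VectorClock *a, const VectorClock *b)
{
    bool strictly_smaller = false;

    for (int i = 0; i < N_NODES; i++)
    {
        if (a->counters[i] > b->counters[i])
            return false;
        if (a->counters[i] < b->counters[i])
            strictly_smaller = true;
    }
    return strictly_smaller;
}

int
main(void)
{
    /* NodeA's update as seen when it arrives at NodeB ... */
    VectorClock a = {{1, 0, 0}};
    /* ... and NodeB's follow-up update after merging NodeA's clock. */
    VectorClock b = {{1, 1, 0}};

    if (vc_happened_before(&a, &b))
        printf("A happened before B\n");
    else if (vc_happened_before(&b, &a))
        printf("B happened before A\n");
    else
        printf("A and B are concurrent\n");
    return 0;
}
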
> >
> > I think the unbounded size of the vector could be a problem to store
> > for each event. However, while researching previous discussions, it
> > came to our notice that we have discussed this topic in the past as
> > well in the context of standbys. For recovery_min_apply_delay, we
> > decided that clock skew is not a problem because the settings of this
> > parameter are much larger than typical time deviations between servers,
> > as mentioned in the docs. Similarly, for causal reads [1], there was a
> > proposal to introduce a max_clock_skew parameter and a suggestion that
> > the user make sure NTP is set up correctly. We have tried to check
> > other databases (like Ora and BDR) where CDR is implemented but didn't
> > find anything specific to clock skew. So, I propose to go with a GUC
> > like max_clock_skew such that if the difference between the incoming
> > transaction's commit time and the local time is more than
> > max_clock_skew, then we raise an ERROR. It is not clear to me that
> > putting a bigger effort into clock skew is worthwhile, especially when
> > other systems that have provided CDR features (like Ora or BDR) for
> > decades have not done anything like vector clocks. It is possible
> > that this is less of a problem w.r.t. CDR and that just detecting the
> > anomaly in clock skew is good enough.
>
> I believe that if we've accepted this solution elsewhere, then we can
> also consider the same. Basically, we're allowing the application to
> set its tolerance for clock skew. And, if the skew exceeds that
> tolerance, it's the application's responsibility to synchronize;
> otherwise, an error will occur. This approach seems reasonable.
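
To illustrate what such a max_clock_skew check could look like in the
apply worker, here is a rough sketch. The max_clock_skew variable, the
function name, and the call site are assumptions for illustration, not
an actual patch; it only checks skew in the "remote commit_ts ahead of
the local clock" direction, since a remote commit_ts in the past is
indistinguishable from ordinary apply lag.

#include "postgres.h"

#include "utils/timestamp.h"

/* hypothetical GUC: tolerated clock skew in milliseconds */
int         max_clock_skew = 5000;

static void
check_remote_commit_ts(TimestampTz remote_commit_ts)
{
    TimestampTz local_now = GetCurrentTimestamp();

    /*
     * If the remote commit timestamp is ahead of the local clock by more
     * than the configured tolerance, refuse to apply and ask the user to
     * fix the clocks (e.g. via NTP).
     */
    if (TimestampDifferenceExceeds(local_now, remote_commit_ts,
                                   max_clock_skew))
        ereport(ERROR,
                (errmsg("clock skew between publisher and subscriber exceeds max_clock_skew")));
}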

This model can be further extended by making the apply worker wait if
the remote transaction's commit_ts is greater than the local
timestamp. This ensures that no local transaction occurring after the
remote transaction appears to have happened earlier due to clock skew;
instead, we make such transactions happen before the remote
transaction by delaying the apply of the remote transaction.
Essentially, by having the apply of the remote transaction wait until
the local clock catches up to the remote transaction's commit_ts, we
ensure that the remote transaction, which seems to occur after
concurrent local transactions due to clock skew, is actually applied
after those transactions.
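
In code, the waiting could look roughly like the following. Again just
a sketch under the same assumptions as above: the function name, the
polling interval, and where exactly it hooks into the apply worker are
made up, and a latch wait with a computed timeout would likely be
preferable to polling.

#include "postgres.h"

#include "miscadmin.h"
#include "utils/timestamp.h"

static void
wait_for_local_clock(TimestampTz remote_commit_ts)
{
    /*
     * Before applying a remote transaction whose commit_ts is ahead of
     * the local clock, sleep until the local clock has caught up, so
     * that every local transaction committed after the apply also gets
     * a later timestamp.
     */
    while (GetCurrentTimestamp() < remote_commit_ts)
    {
        CHECK_FOR_INTERRUPTS();
        pg_usleep(1000L);       /* 1ms */
    }
}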

With this model, there should be no ordering errors from the
application's perspective either, provided synchronous commit is
enabled. A transaction initiated on the publisher cannot complete
until it has been applied on the synchronous subscriber. This ensures
that if the subscriber's clock is lagging behind the publisher's
clock, the transaction will not be applied (and therefore will not
complete) until the subscriber's local clock has caught up, preventing
the transaction from completing out of order.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


