
From: Dilip Kumar
Subject: Re: Conflict Detection and Resolution
Msg-id: CAFiTN-te+idb-xM11g5O+dSWjUxEPop1YQOW_kkMTB0WrdENRg@mail.gmail.com
In response to: Re: Conflict Detection and Resolution (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers
On Wed, Jun 12, 2024 at 5:26 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> >> I agree having a separate update_deleted conflict would be beneficial,
> >> I'm not arguing against that - my point is actually that I think this
> >> conflict type is required, and that it needs to be detected reliably.
> >>
> >
> > When working with a distributed system, we must accept some form of
> > eventual consistency model.
>
> I'm not sure this is necessarily true. There are distributed databases
> implementing (or aiming to) regular consistency models, without eventual
> consistency. I'm not saying it's easy, but it shows eventual consistency
> is not the only option.

Right, that statement might not be completely accurate. Per the CAP
theorem, when network partitions are unavoidable and availability is
required, we usually settle for an eventual consistency model.
However, claiming that a fully consistent model is impossible in any
distributed system would be incorrect; it can be achieved with
mechanisms like Two-Phase Commit, at the cost of availability during
a partition.
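
Just to make the trade-off concrete, a minimal two-phase commit
coordinator is essentially the following (a Python sketch purely for
illustration; prepare()/commit()/abort() are hypothetical stand-ins
for whatever the participants would actually expose, not any existing
PostgreSQL API):

def two_phase_commit(participants):
    # Phase 1: ask every node to prepare; a single "no" vote aborts.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2: everyone voted yes, so the global decision is commit.
        for p in participants:
            p.commit()
        return True
    # Some node could not prepare, so roll back everywhere.
    for p in participants:
        p.abort()
    return False

The point is just that the commit decision itself depends on every
node, which is exactly what our replication does not do today.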

We must also accept that our PostgreSQL replication mechanism does not
provide a fully consistent model. Even with synchronous commit, the
primary only waits for the WAL to be replayed on the standby; it never
changes its commit decision based on other nodes. That means the most
we can guarantee is read-your-writes consistency.
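
(For concreteness, even the strictest combination of settings we have
today only controls what the commit waits for, e.g. something like

    synchronous_standby_names = 'standby1'
    synchronous_commit = remote_apply   # wait for replay on the standby

where remote_apply makes the commit wait until the standby has
replayed the WAL, but the standby never gets a say in whether the
transaction commits.)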

> > However, it's essential to design a
> > predictable and acceptable behavior. For example, if a change is a
> > result of a previous operation (such as an update on node B triggered
> > after observing an operation on node A), we can say that the operation
> > on node A happened before the operation on node B. Conversely, if
> > operations on nodes A and B are independent, we consider them
> > concurrent.
> >
>
> Right. And this is precisely the focus of my questions - understanding
> what behavior we aim for / expect in the end. Or said differently, what
> anomalies / weird behavior would be considered expected.

> Because that's important both for discussions about feasibility, etc.
> And also for evaluation / reviews of the patch.

+1

> > In distributed systems, clock skew is a known issue. To establish a
> > consistency model, we need to ensure it guarantees the
> > "happens-before" relationship. Consider a scenario with three nodes:
> > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
> > subsequently NodeB makes changes, and then both NodeA's and NodeB's
> > changes are sent to NodeC, the clock skew might make NodeB's changes
> > appear to have occurred before NodeA's changes. However, we should
> > maintain data that indicates NodeB's changes were triggered after
> > NodeA's changes arrived at NodeB. This implies that logically, NodeB's
> > changes happened after NodeA's changes, despite what the timestamps
> > suggest.
> >
> > A common method to handle such cases is using vector clocks for
> > conflict resolution. "Vector clocks" allow us to track the causal
> > relationships between changes across nodes, ensuring that we can
> > correctly order events and resolve conflicts in a manner that respects
> > the "happens-before" relationship. This method helps maintain
> > consistency and predictability in the system despite issues like clock
> > skew.
> >
>
> I'm familiar with the concept of vector clock (or logical clock in
> general), but it's not clear to me how you plan to use this in the
> context of the conflict handling. Can you elaborate/explain?
>
> The way I see it, conflict handling is pretty tightly coupled with
> regular commit timestamps and MVCC in general. How would you use vector
> clock to change that?

The issue with using commit timestamps is that, when multiple nodes
are involved, the commit timestamp won't accurately represent the
actual order of operations. There is no reliable way to determine the
exact order of operations happening on different nodes at roughly the
same time unless we use some globally synchronized counter. Usually
that imprecision does not matter, but when one operation is triggered
by an earlier operation (i.e. there is a causal dependency between
them), relying solely on physical timestamps can order them
incorrectly.
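
As a toy example (made-up numbers), suppose NodeB issues its update
only after it has applied NodeA's insert, but NodeB's clock runs a few
milliseconds behind NodeA's:

events = [
    ("NodeA insert", 100.000),  # NodeA's local commit timestamp (seconds)
    ("NodeB update",  99.980),  # causally later, but timestamped earlier
]
# Sorting by timestamp puts NodeB's update first, i.e. the causal
# order is reversed.
print(sorted(events, key=lambda e: e[1]))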

We need some sort of logical counter, such as a vector clock, which
may be maintained independently on each node but still tracks the
causal order. For example, if NodeA observes an operation from NodeB
carrying a counter value of X, NodeA advances its own counter to at
least X+1. This ensures that if NodeA has seen an operation from
NodeB, NodeA's next operation will be ordered after NodeB's operation.
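
To make that concrete, here is a rough sketch of the kind of
bookkeeping I have in mind (a self-contained Python illustration; the
node names and data structures are made up and say nothing about how
this would actually fit into logical replication):

def new_clock(nodes):
    return {n: 0 for n in nodes}

def local_event(clock, node):
    # A node's own operation bumps only its own entry.
    clock = dict(clock)
    clock[node] += 1
    return clock

def on_receive(clock, node, remote_clock):
    # Applying a remote change: take the elementwise max, then count
    # the apply itself as a local event.
    clock = {n: max(clock[n], remote_clock[n]) for n in clock}
    clock[node] += 1
    return clock

def happened_before(a, b):
    # a -> b iff a <= b componentwise and a != b.
    return all(a[n] <= b[n] for n in a) and a != b

nodes = ["A", "B", "C"]
a = local_event(new_clock(nodes), "A")    # NodeA's operation
b = on_receive(new_clock(nodes), "B", a)  # NodeB applies it ...
b = local_event(b, "B")                   # ... then does its own operation
c = local_event(new_clock(nodes), "C")    # an unrelated operation on NodeC

print(happened_before(a, b))                       # True: causally ordered
print(happened_before(a, c), happened_before(c, a))  # False False: concurrent

If neither vector dominates the other, the operations are genuinely
concurrent, and that is the case a conflict resolution policy has to
decide.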

I admit that I haven't fully thought through how we could design such
version tracking in our logical replication protocol or how it would
fit into our system. However, my point is that we need to consider
something beyond commit timestamps to achieve reliable ordering.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


