Re: Conflict Detection and Resolution - Mailing list pgsql-hackers
From: Dilip Kumar
Subject: Re: Conflict Detection and Resolution
Msg-id: CAFiTN-te+idb-xM11g5O+dSWjUxEPop1YQOW_kkMTB0WrdENRg@mail.gmail.com
In response to: Re: Conflict Detection and Resolution (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers
On Wed, Jun 12, 2024 at 5:26 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> >> I agree having a separate update_deleted conflict would be beneficial,
> >> I'm not arguing against that - my point is actually that I think this
> >> conflict type is required, and that it needs to be detected reliably.
> >>
> >
> > When working with a distributed system, we must accept some form of
> > eventual consistency model.
>
> I'm not sure this is necessarily true. There are distributed databases
> implementing (or aiming to) regular consistency models, without eventual
> consistency. I'm not saying it's easy, but it shows eventual consistency
> is not the only option.

Right, that statement might not be completely accurate. Based on the CAP
theorem, when a network partition is unavoidable and availability is
expected, we often choose an eventual consistency model. However, claiming
that a fully consistent model is impossible in any distributed system would
be incorrect, as it can be achieved using mechanisms like two-phase commit.

We must also accept that our PostgreSQL replication mechanism does not
guarantee a fully consistent model. Even with synchronous commit, it only
waits for the WAL to be replayed on the standby, but it does not change the
commit decision based on other nodes. This means that, at most, we can only
guarantee "read your writes" consistency.

> > However, it's essential to design a
> > predictable and acceptable behavior. For example, if a change is a
> > result of a previous operation (such as an update on node B triggered
> > after observing an operation on node A), we can say that the operation
> > on node A happened before the operation on node B. Conversely, if
> > operations on nodes A and B are independent, we consider them
> > concurrent.
> >
>
> Right. And this is precisely the focus of my questions - understanding
> what behavior we aim for / expect in the end. Or said differently, what
> anomalies / weird behavior would be considered expected.
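To illustrate the two-phase commit idea mentioned above, here is a rough sketch of the voting/completion protocol. The `Participant` class and its method names are hypothetical, purely for illustration; this is not PostgreSQL's actual implementation:

```python
# Minimal two-phase commit sketch: the coordinator commits globally
# only if every participant votes yes in the prepare phase.

class Participant:
    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "idle"

    def prepare(self):
        # Phase 1: the participant durably records its intent and votes.
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def finish(self, commit):
        # Phase 2: the participant applies the coordinator's decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): unanimity is required to commit.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2 (completion): broadcast the global decision to everyone.
    for p in participants:
        p.finish(decision)
    return decision
```

The key property is that no participant commits unilaterally: the commit decision depends on every node, which is exactly what plain synchronous replication does not provide.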
> Because that's important both for discussions about feasibility, etc.
> And also for evaluation / reviews of the patch.

+1

> > In distributed systems, clock skew is a known issue. To establish a
> > consistency model, we need to ensure it guarantees the
> > "happens-before" relationship. Consider a scenario with three nodes:
> > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
> > subsequently NodeB makes changes, and then both NodeA's and NodeB's
> > changes are sent to NodeC, clock skew might make NodeB's changes
> > appear to have occurred before NodeA's changes. However, we should
> > maintain data indicating that NodeB's changes were triggered after
> > NodeA's changes arrived at NodeB. This implies that logically, NodeB's
> > changes happened after NodeA's changes, despite what the timestamps
> > suggest.
> >
> > A common method to handle such cases is using vector clocks for
> > conflict resolution. Vector clocks allow us to track the causal
> > relationships between changes across nodes, ensuring that we can
> > correctly order events and resolve conflicts in a manner that respects
> > the "happens-before" relationship. This method helps maintain
> > consistency and predictability in the system despite issues like clock
> > skew.
> >
>
> I'm familiar with the concept of vector clock (or logical clock in
> general), but it's not clear to me how you plan to use this in the
> context of the conflict handling. Can you elaborate/explain?
>
> The way I see it, conflict handling is pretty tightly coupled with
> regular commit timestamps and MVCC in general. How would you use vector
> clock to change that?

The issue with using commit timestamps is that, when multiple nodes are
involved, the commit timestamp won't accurately represent the actual order
of operations. There is no reliable way to determine the exact order of
operations happening on different nodes at roughly the same time unless we
use some globally synchronized counter.
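As a rough sketch of the vector-clock idea quoted above (node names and the dict-based representation are purely illustrative, not a concrete protocol proposal), causal order can be tracked independently of wall-clock time:

```python
# Vector clock sketch for the NodeA/NodeB scenario: each node keeps one
# counter per node, merges on receive, and increments its own entry.

def new_clock(nodes):
    return {n: 0 for n in nodes}

def tick(clock, node):
    # Local event on `node`: increment that node's own component.
    c = dict(clock)
    c[node] += 1
    return c

def merge(local, received, node):
    # On receiving a change: take the element-wise max, then tick,
    # so the new event is ordered after everything already observed.
    c = {n: max(local[n], received[n]) for n in local}
    c[node] += 1
    return c

def happened_before(a, b):
    # a -> b iff every component of a is <= b and the clocks differ;
    # if neither a -> b nor b -> a holds, the events are concurrent.
    return all(a[n] <= b[n] for n in a) and a != b

nodes = ["A", "B", "C"]
va = tick(new_clock(nodes), "A")       # NodeA makes a change
vb = merge(new_clock(nodes), va, "B")  # NodeB applies it, then changes
# Regardless of wall-clock skew, the causal order is preserved:
assert happened_before(va, vb)
assert not happened_before(vb, va)
```

Here NodeC would compare the two vectors on arrival and order NodeB's change after NodeA's, even if NodeB's commit timestamp is earlier due to skew.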
Generally, that order might not cause real issues unless one operation is
triggered by a previous operation, and relying solely on physical
timestamps would not detect that correctly. We need some sort of logical
counter, such as a vector clock, which might be an independent counter on
each node but can perfectly track the causal order. For example, if NodeA
observes an operation from NodeB with a counter value of X, NodeA will
adjust its counter to X+1. This ensures that if NodeA has seen an
operation from NodeB, its next operation will appear to have occurred
after NodeB's operation.

I admit that I haven't fully thought through how we could design such
version tracking in our logical replication protocol or how it would fit
into our system. However, my point is that we need to consider something
beyond commit timestamps to achieve reliable ordering.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com