Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Conflict detection for update_deleted in logical replication |
Date | |
Msg-id | CAA4eK1JDFE3XXy54GN9WJky2U2_CwAmX53d9n6jQOECtjVkeqQ@mail.gmail.com Whole thread Raw |
In response to | Re: Conflict detection for update_deleted in logical replication (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Conflict detection for update_deleted in logical replication
|
List | pgsql-hackers |
On Tue, Sep 17, 2024 at 11:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Sep 16, 2024 at 11:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Sep 17, 2024 at 6:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I haven't thought about the implementation details yet but I think > > during pruning (for example in heap_prune_satisfies_vacuum()), apart > > from checking if the tuple satisfies > > HeapTupleSatisfiesVacuumHorizon(), we should also check if the tuple's > > committs is greater than configured vacuum_committs_age (for the > > table) to decide whether tuple can be removed. > > Sounds very costly. I think we need to do performance tests. Even if > the vacuum gets slower only on the particular table having the > vacuum_committs_age setting, it would affect overall autovacuum > performance. Also, it would affect HOT pruning performance. > Agreed that we should do some performance testing and additionally think of any better way to implement. I think the cost won't be much if the tuples to be removed are from a single transaction because the required commit_ts information would be cached but when the tuples are from different transactions, we could see a noticeable impact. We need to test to say anything concrete on this. > > > > > > but IIUC this value doesn’t need to be significant; it > > > > can be limited to just a few minutes. The one which is sufficient to > > > > handle replication delays caused by network lag or other factors, > > > > assuming clock skew has already been addressed. > > > > > > I think that in a non-bidirectional case the value could need to be a > > > large number. Is that right? > > > > > > > As per my understanding, even for non-bidirectional cases, the value > > should be small. For example, in the case, pointed out by Shveta [1], > > where the updates from 2 nodes are received by a third node, this > > setting is expected to be small. This setting primarily deals with > > concurrent transactions on multiple nodes, so it should be small but I > > could be missing something. > > > > I might be missing something but the scenario I was thinking of is > something below. > > Suppose that we setup uni-directional logical replication between Node > A and Node B (e.g., Node A -> Node B) and both nodes have the same row > with key = 1: > > Node A: > T1: UPDATE t SET val = 2 WHERE key = 1; (10:00 AM) > -> This change is applied on Node B at 10:01 AM. > > Node B: > T2: DELETE FROM t WHERE key = 1; (05:00 AM) > > If a vacuum runs on Node B at 06:00 AM, the change of T1 coming from > Node A would raise an "update_missing" conflict. On the other hand, if > a vacuum runs on Node B at 11:00 AM, the change would raise an > "update_deleted" conflict. It looks whether we detect an > "update_deleted" or an "updated_missing" depends on the timing of > vacuum, and to avoid such a situation, we would need to set > vacuum_committs_age to more than 5 hours. > Yeah, in this case, it would detect a different conflict (if we don't set vacuum_committs_age to greater than 5 hours) but as per my understanding, the primary purpose of conflict detection and resolution is to avoid data inconsistency in a bi-directional setup. Assume, in the above case it is a bi-directional setup, then we want to have the same data in both nodes. Now, if there are other cases like the one you mentioned that require to detect the conflict reliably than I agree this value could be large and probably not the best way to achieve it. I think we can mention in the docs that the primary purpose of this is to achieve data consistency among bi-directional kind of setups. Having said that even in the above case, the result should be the same whether the vacuum has removed the row or not. Say, if the vacuum has not yet removed the row (due to vacuum_committs_age or otherwise) then also because the incoming update has a later timestamp, we will convert the update to insert as per last_update_wins resolution method, so the conflict will be considered as update_missing. And, say, the vacuum has removed the row and the conflict detected is update_missing, then also we will convert the update to insert. In short, if UPDATE has lower commit-ts, DELETE should win and if UPDATE has higher commit-ts, UPDATE should win. So, we can expect data consistency in bidirectional cases and expect a deterministic behavior in other cases (e.g. the final data in a table does not depend on the order of applying the transactions from other nodes). -- With Regards, Amit Kapila.
pgsql-hackers by date: