Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: Conflict detection for update_deleted in logical replication
Msg-id: CAD21AoA9TTLqFbTAbHsDvt1G2+R4iNixweVRkCHRBnHAOvoFkQ@mail.gmail.com
In response to: Re: Conflict detection for update_deleted in logical replication (Amit Kapila <amit.kapila16@gmail.com>)
Responses: RE: Conflict detection for update_deleted in logical replication
List: pgsql-hackers
On Tue, Sep 17, 2024 at 9:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Sep 17, 2024 at 11:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Sep 16, 2024 at 11:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Sep 17, 2024 at 6:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > I haven't thought about the implementation details yet, but I think
> > > during pruning (for example in heap_prune_satisfies_vacuum()), apart
> > > from checking if the tuple satisfies
> > > HeapTupleSatisfiesVacuumHorizon(), we should also check if the tuple's
> > > committs is greater than the configured vacuum_committs_age (for the
> > > table) to decide whether the tuple can be removed.
> >
> > Sounds very costly. I think we need to do performance tests. Even if
> > the vacuum gets slower only on the particular table having the
> > vacuum_committs_age setting, it would affect overall autovacuum
> > performance. Also, it would affect HOT pruning performance.
> >
>
> Agreed that we should do some performance testing and additionally
> think of any better way to implement it. I think the cost won't be much
> if the tuples to be removed are from a single transaction because the
> required commit_ts information would be cached, but when the tuples are
> from different transactions, we could see a noticeable impact. We need
> to test to say anything concrete on this.

Agreed.

> > > > > but IIUC this value doesn't need to be significant; it
> > > > > can be limited to just a few minutes. The one which is sufficient to
> > > > > handle replication delays caused by network lag or other factors,
> > > > > assuming clock skew has already been addressed.
> > > >
> > > > I think that in a non-bidirectional case the value could need to be a
> > > > large number. Is that right?
> > > >
> > >
> > > As per my understanding, even for non-bidirectional cases, the value
> > > should be small. For example, in the case pointed out by Shveta [1],
> > > where the updates from 2 nodes are received by a third node, this
> > > setting is expected to be small. This setting primarily deals with
> > > concurrent transactions on multiple nodes, so it should be small, but I
> > > could be missing something.
> > >
> >
> > I might be missing something, but the scenario I was thinking of is
> > something like below.
> >
> > Suppose that we set up uni-directional logical replication between Node
> > A and Node B (e.g., Node A -> Node B) and both nodes have the same row
> > with key = 1:
> >
> > Node A:
> >   T1: UPDATE t SET val = 2 WHERE key = 1; (10:00 AM)
> >   -> This change is applied on Node B at 10:01 AM.
> >
> > Node B:
> >   T2: DELETE FROM t WHERE key = 1; (05:00 AM)
> >
> > If a vacuum runs on Node B at 06:00 AM, the change of T1 coming from
> > Node A would raise an "update_missing" conflict. On the other hand, if
> > a vacuum runs on Node B at 11:00 AM, the change would raise an
> > "update_deleted" conflict. It looks like whether we detect an
> > "update_deleted" or an "update_missing" conflict depends on the timing
> > of vacuum, and to avoid such a situation, we would need to set
> > vacuum_committs_age to more than 5 hours.
> >
>
> Yeah, in this case, it would detect a different conflict (if we don't
> set vacuum_committs_age to greater than 5 hours), but as per my
> understanding, the primary purpose of conflict detection and
> resolution is to avoid data inconsistency in a bi-directional setup.
> Assume, in the above case, it is a bi-directional setup; then we want
> to have the same data in both nodes. Now, if there are other cases,
> like the one you mentioned, that require detecting the conflict
> reliably, then I agree this value could be large and probably not the
> best way to achieve it. I think we can mention in the docs that the
> primary purpose of this is to achieve data consistency among
> bi-directional kinds of setups.
>
> Having said that, even in the above case, the result should be the same
> whether the vacuum has removed the row or not. Say, if the vacuum has
> not yet removed the row (due to vacuum_committs_age or otherwise), then
> also, because the incoming update has a later timestamp, we will
> convert the update to an insert as per the last_update_wins resolution
> method, so the conflict will be considered as update_missing. And,
> say, the vacuum has removed the row and the conflict detected is
> update_missing, then also we will convert the update to an insert. In
> short, if the UPDATE has the lower commit-ts, the DELETE should win,
> and if the UPDATE has the higher commit-ts, the UPDATE should win.
>
> So, we can expect data consistency in bidirectional cases and expect
> deterministic behavior in other cases (e.g., the final data in a table
> does not depend on the order of applying the transactions from other
> nodes).

Agreed.

I think that such a time-based configuration parameter would be a
reasonable solution. The current concerns are that it might affect
vacuum performance and lead to a bug similar to the one we had with
vacuum_defer_cleanup_age.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
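To make the pruning idea discussed above a bit more concrete, here is a minimal sketch (a hypothetical helper, not actual PostgreSQL code) of the extra check that could run from heap_prune_satisfies_vacuum() once HeapTupleSatisfiesVacuumHorizon() has already classified a tuple as removable. The helper name and the assumption that vacuum_committs_age is a per-table setting expressed in seconds are illustrative only; TransactionIdGetCommitTsData(), GetCurrentTimestamp(), and TimestampDifferenceExceeds() are existing backend functions.

    #include "postgres.h"

    #include "access/commit_ts.h"
    #include "utils/timestamp.h"

    /*
     * Hypothetical helper (illustrative only): given the xid that deleted or
     * updated a tuple which is otherwise DEAD, allow pruning only if that
     * transaction committed more than vacuum_committs_age seconds ago.
     */
    static bool
    committs_old_enough_to_prune(TransactionId deleter_xid, int vacuum_committs_age)
    {
        TimestampTz committs;

        /*
         * If no commit timestamp is recorded (e.g. track_commit_timestamp was
         * off, or the xid is too old), fall back to normal pruning behavior.
         */
        if (!TransactionIdGetCommitTsData(deleter_xid, &committs, NULL))
            return true;

        /* Prune only if the deleting transaction committed long enough ago. */
        return TimestampDifferenceExceeds(committs, GetCurrentTimestamp(),
                                          vacuum_committs_age * 1000);
    }

The per-tuple TransactionIdGetCommitTsData() lookup is where the cost concern raised above comes from: as noted in the discussion, it is cheap when the dead tuples come from a single transaction (the commit_ts information would be cached) but could be noticeable when they come from many different transactions.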
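Similarly, a rough sketch of the last_update_wins rule described above, with hypothetical names throughout: while the deleted row (and hence its commit timestamp) is still present, the two timestamps are compared directly (update_deleted); once the row has been pruned the timestamp is gone, the conflict can only be reported as update_missing, and the incoming UPDATE is applied as an INSERT.

    #include "postgres.h"

    #include "datatype/timestamp.h"

    /*
     * Hypothetical outcome of resolving an incoming UPDATE whose target row
     * was deleted (or already pruned) locally; names are illustrative only.
     */
    typedef enum
    {
        RESOLVE_APPLY_AS_INSERT,    /* UPDATE wins: apply it as an INSERT */
        RESOLVE_SKIP                /* DELETE wins: ignore the remote UPDATE */
    } UpdateDeleteResolution;

    /*
     * last_update_wins, sketched: if the local delete's commit timestamp is
     * unknown (the row was already pruned, so the conflict surfaces as
     * update_missing), or if the remote UPDATE committed later, the UPDATE
     * is converted to an INSERT; otherwise the DELETE wins.
     */
    static UpdateDeleteResolution
    resolve_update_vs_delete(TimestampTz remote_update_committs,
                             bool local_delete_committs_known,
                             TimestampTz local_delete_committs)
    {
        if (!local_delete_committs_known)
            return RESOLVE_APPLY_AS_INSERT;

        if (remote_update_committs > local_delete_committs)
            return RESOLVE_APPLY_AS_INSERT;

        return RESOLVE_SKIP;
    }

The branch for the already-pruned case is where the detected conflict type can differ depending on vacuum timing, which is exactly the Node A/Node B scenario discussed above.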