Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Conflict detection for update_deleted in logical replication
Date
Msg-id CAD21AoD=m-YHceYMpsdu0HnGCaezeyVhaCPFxDLHU7aN0wgzqg@mail.gmail.com
Whole thread Raw
In response to RE: Conflict detection for update_deleted in logical replication  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Tue, Sep 24, 2024 at 12:14 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Tuesday, September 24, 2024 2:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Sep 23, 2024 at 8:32 PM Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > On Tuesday, September 24, 2024 5:05 AM Masahiko Sawada
> > <sawada.mshk@gmail.com> wrote:
> > > > I'm still studying this idea but let me confirm the following scenario.
> > > >
> > > > Suppose both Node-A and Node-B have the same row (1,1) in table t,
> > > > and XIDs and commit LSNs of T2 and T3 are the following:
> > > >
> > > > Node A
> > > >   T2: DELETE FROM t WHERE id = 1 (10:02 AM) XID:100,
> > commit-LSN:1000
> > > >
> > > > Node B
> > > >   T3: UPDATE t SET value = 2 WHERE id 1 (10:01 AM) XID:500,
> > > > commit-LSN:5000
> > > >
> > > > Further suppose that it's now 10:05 AM, and the latest XID and the
> > > > latest flush WAL position of Node-A and Node-B are following:
> > > >
> > > > Node A
> > > >   current XID: 300
> > > >   latest flush LSN; 3000
> > > >
> > > > Node B
> > > >   current XID: 700
> > > >   latest flush LSN: 7000
> > > >
> > > > Both T2 and T3 are NOT sent to Node B and Node A yet, respectively
> > > > (i.e., the logical replication is delaying for 5 min).
> > > >
> > > > Consider the following scenario:
> > > >
> > > > 1. The apply worker on Node-A calls GetRunningTransactionData() and
> > > > gets 301 (set as candidate_xmin).
> > > > 2. The apply worker on Node-A requests the latest WAL flush position
> > > > from Node-B, and gets 7000 (set as candidate_remote_wal_lsn).
> > > > 3. T2 is applied on Node-B, and the latest flush position of Node-B is now
> > 8000.
> > > > 4. The apply worker on Node-A continues applying changes, and
> > > > applies the transactions up to remote (commit) LSN 7100.
> > > > 5. Now that the apply worker on Node-A applied all changes smaller
> > > > than candidate_remote_wal_lsn (7000), it increases the slot.xmin to
> > > > 301 (candidate_xmin).
> > > > 6. On Node-A, vacuum runs and physically removes the tuple that was
> > > > deleted by T2.
> > > >
> > > > Here, on Node-B, there might be a transition between LSN 7100 and
> > > > 8000 that might require the tuple that is deleted by T2.
> > > >
> > > > For example, "UPDATE t SET value = 3 WHERE id = 1" (say T4) is
> > > > executed on Node-B at LSN 7200, and it's sent to Node-A after step 6.
> > > > On Node-A, whether we detect "update_deleted" or "update_missing"
> > > > still depends on when vacuum removes the tuple deleted by T2.
> > >
> > > I think in this case, no matter we detect "update_delete" or
> > > "update_missing", the final data is the same. Because T4's commit
> > > timestamp should be later than
> > > T2 on node A, so in the case of "update_deleted", it will compare the
> > > commit timestamp of the deleted tuple's xmax with T4's timestamp, and
> > > T4 should win, which means we will convert the update into insert and
> > > apply. Even if the deleted tuple is deleted and "update_missing" is
> > > detected, the update will still be converted into insert and applied. So, the
> > result is the same.
> >
> > The "latest_timestamp_wins" is the default resolution method for
> > "update_deleted"? When I checked the wiki page[1], the "skip" was the default
> > solution method for that.
>
> Right, I think the wiki needs some update.
>
> I think using 'skip' as default for update_delete could easily cause data
> divergence when the dead tuple is deleted by an old transaction while the
> UPDATE has a newer timestamp like the case you mentioned. It's necessary to
> follow the last update win strategy when the incoming update has later
> timestamp, which is to convert update to insert.

Right. If "latest_timestamp_wins" is the default resolution for
"update_deleted", I think your idea works fine unless I'm missing
corner cases.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Add new COPY option REJECT_LIMIT