RE: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

From: Zhijie Hou (Fujitsu)
Subject: RE: Conflict detection for update_deleted in logical replication
Msg-id: OS0PR01MB5716662BEB9C0B4E92587FAC946C2@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to: RE: Conflict detection for update_deleted in logical replication ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List: pgsql-hackers
On Friday, September 20, 2024 10:55 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> On Friday, September 20, 2024 2:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > 
> >
> > I think that such a time-based configuration parameter would be a
> > reasonable solution. The current concerns are that it might affect
> > vacuum performance and lead to a bug similar to the one we had with
> > vacuum_defer_cleanup_age.
> 
> Thanks for the feedback!
> 
> I am working on a POC patch and running some initial performance tests on
> this idea. I will share the results once they are finished.
> 
> Apart from the vacuum_defer_cleanup_age idea, we've given more thought to
> our approach for retaining dead tuples and have come up with another idea
> that can reliably detect conflicts without requiring users to choose a wise
> value for vacuum_committs_age. This new idea could also reduce the
> performance impact. Thanks a lot to Amit for the off-list discussion.
> 
> The concept of the new idea is that dead tuples are only useful for
> detecting conflicts when applying *concurrent* transactions from remote
> nodes. Any UPDATE from a remote node that arrives after the dead tuples
> have been removed must have a later commit timestamp, meaning it is
> reasonable to detect an update_missing scenario and convert the UPDATE
> to an INSERT when applying it.
> 
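
To illustrate the reasoning above, here is a rough sketch of how the apply
worker could classify an incoming remote UPDATE once dead tuples are
retained. This is hypothetical code, not a patch: all of the helpers
(find_live_local_tuple, find_dead_local_tuple, report_conflict,
apply_update, apply_insert) are made up for illustration.

#include "postgres.h"

#include "executor/tuptable.h"
#include "utils/rel.h"
#include "utils/timestamp.h"

/*
 * Hypothetical sketch: classify an incoming remote UPDATE.  None of the
 * helpers below exist in PostgreSQL; they stand in for the apply worker's
 * tuple lookup and conflict reporting.
 */
static void
apply_remote_update(Relation rel, TupleTableSlot *remote_tuple,
                    TimestampTz remote_commit_ts)
{
    TupleTableSlot *local_tuple;

    if (find_live_local_tuple(rel, remote_tuple, &local_tuple))
    {
        /* The target row still exists: apply the UPDATE normally. */
        apply_update(rel, local_tuple, remote_tuple);
    }
    else if (find_dead_local_tuple(rel, remote_tuple, &local_tuple))
    {
        /*
         * The row was deleted locally, but the dead tuple is still
         * retained, so the remote UPDATE and the local DELETE can be
         * recognized as concurrent: report update_deleted.
         */
        report_conflict("update_deleted", remote_commit_ts);
    }
    else
    {
        /*
         * No trace of the row remains.  Any remote UPDATE arriving after
         * the dead tuple was removed must carry a later commit timestamp,
         * so treating this as update_missing and converting the UPDATE
         * into an INSERT is safe.
         */
        report_conflict("update_missing", remote_commit_ts);
        apply_insert(rel, remote_tuple);
    }
}
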
> To achieve the above, we can create an additional replication slot on the
> subscriber side, maintained by the apply worker. This slot is used to retain
> the dead tuples. The apply worker will advance the slot.xmin only after
> confirming that all the concurrent transactions on the publisher have been
> applied locally.
> 
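
As a rough illustration of the slot maintenance, something like the sketch
below could run when the apply worker starts. The slot-management calls
(SearchNamedReplicationSlot, ReplicationSlotCreate, ReplicationSlotAcquire,
etc.) are existing backend functions (signatures as of PG16), while
setup_retention_slot() itself and the exact flow are hypothetical.

#include "postgres.h"

#include "replication/slot.h"
#include "storage/spin.h"

/*
 * Hypothetical sketch: the apply worker creates (or re-acquires) the slot
 * that retains dead tuples.  The flow is illustrative only.
 */
static void
setup_retention_slot(const char *slot_name, TransactionId initial_xmin)
{
    if (SearchNamedReplicationSlot(slot_name, true) == NULL)
        ReplicationSlotCreate(slot_name, false /* physical-style slot */,
                              RS_PERSISTENT, false /* two_phase */);
    else
        ReplicationSlotAcquire(slot_name, true /* nowait */);

    /* Install the initial xmin horizon so VACUUM keeps dead tuples. */
    SpinLockAcquire(&MyReplicationSlot->mutex);
    MyReplicationSlot->effective_xmin = initial_xmin;
    MyReplicationSlot->data.xmin = initial_xmin;
    SpinLockRelease(&MyReplicationSlot->mutex);

    ReplicationSlotMarkDirty();
    ReplicationSlotSave();

    /* Recompute the global xmin so the new horizon takes effect. */
    ReplicationSlotsComputeRequiredXmin(false);
}
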
> The process of advancing the slot.xmin could be:
> 
> 1) The apply worker calls GetRunningTransactionData() to get the
> 'oldestRunningXid' and considers this the 'candidate_xmin'.
> 2) The apply worker sends a new message to the walsender to request the
> latest WAL flush position (GetFlushRecPtr) on the publisher, and saves it
> as 'candidate_remote_wal_lsn'. Here we could introduce a new feedback
> message or extend the existing keepalive message (e.g., extend the
> requestReply bit in the keepalive message to add a 'request_wal_position'
> value).
> 3) The apply worker can continue to apply changes. After applying all the
> WAL up to 'candidate_remote_wal_lsn', the apply worker can then advance
> the slot.xmin to 'candidate_xmin'.
> 
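Putting the three steps together, one advance cycle might look roughly like
the sketch below. GetRunningTransactionData() and the slot calls are
existing backend APIs (as of PG16), while request_publisher_flush_lsn()
and wait_until_applied() are placeholders for the feedback-message
extension described in step 2).

#include "postgres.h"

#include "access/xlogdefs.h"
#include "replication/slot.h"
#include "storage/lwlock.h"
#include "storage/procarray.h"
#include "storage/spin.h"
#include "storage/standby.h"

/*
 * Hypothetical sketch of one xmin-advance cycle in the apply worker.
 * request_publisher_flush_lsn() and wait_until_applied() are placeholders
 * for the proposed walsender feedback extension.
 */
static void
advance_retention_slot_xmin(void)
{
    RunningTransactions running;
    TransactionId candidate_xmin;
    XLogRecPtr  candidate_remote_wal_lsn;

    /* Step 1: the oldest running xid here becomes the candidate xmin. */
    running = GetRunningTransactionData();
    candidate_xmin = running->oldestRunningXid;

    /* GetRunningTransactionData() returns with these locks held. */
    LWLockRelease(ProcArrayLock);
    LWLockRelease(XidGenLock);

    /* Step 2: ask the walsender for the publisher's current flush LSN. */
    candidate_remote_wal_lsn = request_publisher_flush_lsn();

    /*
     * Step 3: keep applying changes.  Once everything up to the candidate
     * remote LSN has been applied, every transaction that was concurrent
     * with candidate_xmin has been replayed locally, so the slot's xmin
     * can be moved forward safely.
     */
    wait_until_applied(candidate_remote_wal_lsn);

    SpinLockAcquire(&MyReplicationSlot->mutex);
    MyReplicationSlot->effective_xmin = candidate_xmin;
    MyReplicationSlot->data.xmin = candidate_xmin;
    SpinLockRelease(&MyReplicationSlot->mutex);

    ReplicationSlotMarkDirty();
    ReplicationSlotSave();
    ReplicationSlotsComputeRequiredXmin(false);
}
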
> This approach ensures that dead tuples are not removed until all concurrent
> transactions have been applied. It can be effective for both bidirectional and
> non-bidirectional replication cases.
> 
> We could introduce a boolean subscription option (retain_dead_tuples) to
> control whether this feature is enabled. Each subscription intending to
> detect update_deleted conflicts should set retain_dead_tuples to true.
> 
> The following explains how it works in different cases to achieve data
> consistency:
...
> --
> 3 nodes, non-bidirectional, Node C subscribes to both Node A and Node B:
> --

Sorry for a typo here: the timestamps of T2 and T3 were reversed.
Please see the following corrections:

> 
> Node A:
>   T1: INSERT INTO t (id, value) VALUES (1,1);        ts=10.00 AM
>   T2: DELETE FROM t WHERE id = 1;            ts=10.01 AM

Here T2 should be at ts=10.02 AM

> 
> Node B:
>   T3: UPDATE t SET value = 2 WHERE id = 1;        ts=10.02 AM

T3 should be at ts=10.01 AM

> 
> Node C:
>     apply T1, T2, T3
> 
> After applying T2, the apply worker on Node C will check the latest WAL
> flush location on Node B. By that time, T3 should have finished, so the
> xmin will be advanced only after applying the WAL that is later than T3.
> Therefore, the dead tuple will not be removed before T3 is applied, which
> means the update_deleted conflict can be detected.
> 
> Your feedback on this idea would be greatly appreciated.
> 

Best Regards,
Hou zj 

