Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Conflict detection for update_deleted in logical replication
Msg-id CAA4eK1J=eKf_-AObSdXp7xurjTQqA62Ls7ZJjajcz0wkE4DkQQ@mail.gmail.com
In response to Re: Conflict detection for update_deleted in logical replication  (shveta malik <shveta.malik@gmail.com>)
List pgsql-hackers
On Mon, Sep 30, 2024 at 12:02 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Wednesday, September 25, 2024 2:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I think the remote wal flush location is asked using a replication protocol.
> > Therefore, if a new worker is responsible for asking wal flush location from
> > multiple publishers (like the idea (b)), the corresponding process would need
> > to be launched on publisher sides and logical replication would also need to
> > start on each connection. I think it would be better to get the remote wal flush
> > location using the existing logical replication connection (i.e., between the
> > logical wal sender and the apply worker), and advertise the locations on the
> > shared memory. Then, the central process who holds the slot to retain the
> > deleted row versions traverses them and increases slot.xmin if possible.
> >
> > The cost of requesting the remote wal flush location would not be huge if we
> > don't ask it very frequently. So probably we can start by having each apply
> > worker (in the retain_sub_list) ask the remote wal flush location and can leave
> > the optimization of avoiding sending the request for the same publisher.
>
> Agreed. Here is the POC patch set based on this idea.
>
> The implementation is as follows:
>
> A subscription option is added to allow users to specify whether dead
> tuples on the subscriber, which are useful for detecting update_deleted
> conflicts, should be retained. The default setting is false. If set to true,
> the detection of update_deleted will be enabled,
>

I find the option name retain_dead_tuples a bit misleading because one
can't make out its purpose from the name. It would be better to name
it detect_update_deleted or something along those lines.

> and an additional replication
> slot named pg_conflict_detection will be created on the subscriber to prevent
> dead tuples from being removed. Note that if multiple subscriptions on one node
> enable this option, only one replication slot will be created.
>

Ideally, we would do this by default, but since detecting
update_deleted conflicts has some overhead in terms of retaining
dead tuples for longer, having an option seems reasonable. But I
suggest keeping this as a separate, last patch. If we can make the core
idea work by default, then we can enable it via an option in the end.
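
For readers following the design quoted earlier (each apply worker
advertises a position in shared memory, and a central process that holds
the slot advances slot.xmin when possible), here is a minimal sketch of
the central step. It is not taken from the POC patch: the WorkerSlot
struct, compute_slot_xmin(), and the 64-bit Xid64 type are hypothetical
names used only for illustration, and xid wraparound is deliberately
ignored by assuming 64-bit transaction ids.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit transaction id; avoids wraparound handling. */
typedef uint64_t Xid64;
#define INVALID_XID ((Xid64) 0)

/* Hypothetical per-apply-worker entry kept in shared memory. */
typedef struct WorkerSlot
{
    bool  in_use;                   /* entry belongs to a live worker */
    Xid64 oldest_nonremovable_xid;  /* INVALID_XID until advertised */
} WorkerSlot;

/*
 * Return the xmin the conflict-detection slot could move to.
 * The slot may only advance, and only once every active worker
 * has advertised a value; otherwise the current xmin is kept.
 */
Xid64
compute_slot_xmin(const WorkerSlot *workers, size_t n, Xid64 current_xmin)
{
    Xid64 candidate = INVALID_XID;

    for (size_t i = 0; i < n; i++)
    {
        if (!workers[i].in_use)
            continue;

        /* One worker has not reported yet: do not advance. */
        if (workers[i].oldest_nonremovable_xid == INVALID_XID)
            return current_xmin;

        /* Track the minimum across all active workers. */
        if (candidate == INVALID_XID ||
            workers[i].oldest_nonremovable_xid < candidate)
            candidate = workers[i].oldest_nonremovable_xid;
    }

    /* No active workers, or the minimum would move xmin backwards. */
    if (candidate == INVALID_XID || candidate <= current_xmin)
        return current_xmin;

    return candidate;
}
```

Taking the minimum across workers is the conservative choice here: a
dead tuple is kept as long as any one subscription might still need it
to detect update_deleted.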

--
With Regards,
Amit Kapila.


