Re: In logical replication concurrent update of partition key creates a duplicate record on standby. - Mailing list pgsql-hackers

From: amul sul
Subject: Re: In logical replication concurrent update of partition key creates a duplicate record on standby.
Date:
Msg-id: CAAJ_b96ehpQOnwNakOedeGuEUCmjOYJipDbhmBdy34PeypjUrg@mail.gmail.com
In response to: Re: In logical replication concurrent update of partition key creates a duplicate record on standby.  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: In logical replication concurrent update of partition key creates a duplicate record on standby.  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List: pgsql-hackers
On Wed, Feb 7, 2018 at 6:00 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Feb 7, 2018 at 3:42 PM, amul sul <sulamul@gmail.com> wrote:
>> On Wed, Feb 7, 2018 at 3:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>> On 7 February 2018 at 13:53, amul sul <sulamul@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> If an update of the partition key involves tuple movement from one
>>>> partition to another, it is carried out as a separate delete on the
>>>> source partition plus an insert on the destination partition.
>>>>
>>>> In logical replication, if an update is performed on the master and the
>>>> standby at the same moment, the replication worker tries to replicate the
>>>> delete + insert operations on the standby. While replaying the master's
>>>> changes on the standby, the worker logs a "concurrent update, retrying"
>>>> message for the delete (because the update on the standby has already
>>>> deleted the tuple) and moves on to replay the next insert operation. The
>>>> standby's own update also performed the same delete + insert as part of
>>>> its partition-key update, with the result that two records end up
>>>> inserted on the standby.
>>>
>>> A first thought on how to resolve this makes me wonder if we can
>>> manage to pass some information through logical decoding indicating that
>>> the delete is part of a partition-key update. This is analogous to how we
>>> set some information locally in the tuple by setting
>>> tp.t_data->t_ctid.ip_blkid to InvalidBlockNumber.
>>>
>>
>> +1,
>>
>
> I also mentioned the same thing in the other thread [1], but I think
> that alone won't solve the dual-record problem you are seeing.  I
> think we also need to do something for the next insert, as you are suggesting.
>
>> Also, if the worker fails to replay the delete operation on the standby,
>> then we need to decide what the next step will be: should we skip the
>> following insert operation, error out, or something else?
>>
>
> That would be tricky; do you see any simple way of doing either of those?
>

Not really. As in ExecUpdate for an update of the partition key, if the delete
fails then the subsequent insert could be skipped; but you are correct, it might
be trickier than I think -- there is no guarantee that the next insert operation
the replication worker is trying to replicate is part of the partition-key
update mechanism.  How can one identify that an insert operation on one relation
is related to a previous delete operation on some other relation?

Regards,
Amul

