Re: Single transaction in the tablesync worker? - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Single transaction in the tablesync worker?
Date
Msg-id CAA4eK1+-Qgq1SrsMz8vufd2-yOVuj0H-PaTYTxe-e6krY702kg@mail.gmail.com
In response to Re: Single transaction in the tablesync worker?  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Single transaction in the tablesync worker?
Re: Single transaction in the tablesync worker?
List pgsql-hackers
On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> TODO / Known Issues:
>
> * the current implementation of tablesync drop slot (e.g. from
> DropSubscription or finish_sync_worker) regenerates the tablesync slot
> name so it knows what slot to drop.
>

If you always drop the slot in finish_sync_worker, then in which case
do you need to drop it during DropSubscription? Is it when the table
sync workers have crashed?

> The current code might be ok for
> normal use cases, but if there is an ALTER SUBSCRIPTION ... SET
> (slot_name = newname) it would fail to be able to find the tablesync
> slot.
>

Sure, but the same will be true for the apply worker slot as well. I
agree the problem would be more for table sync workers but I think we
can solve it, see below.

> * I think if there are crashed tablesync workers then they are not
> known to DropSubscription. So this might be a problem to cleanup slots
> and/or origin tracking belonging to those unknown workers.
>

Yeah, I think we can do two things to avoid this and the previous
problem. (a) We can generate the slot_name for the table sync worker
based only on subscription_id and rel_id. (b) Immediately after
creating the slot, we can advance the replication origin to the
position (origin_startpos) we get from walrcv_create_slot, which will
help us to start from the right location.
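
To illustrate (a), here is a minimal standalone sketch, not code from
the patch. The helper name tablesync_slot_name, the "pg_%u_sync_%u"
format, and the Oid typedef are assumptions made only for this example;
the point is that the name depends on nothing but the two ids, so
anyone who knows them can recompute it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for PostgreSQL's Oid so that this sketch compiles on its own. */
typedef uint32_t Oid;

/* Matches PostgreSQL's default NAMEDATALEN (64), which bounds slot names. */
#define SLOT_NAME_BUFSIZE 64

/*
 * Hypothetical helper: build a tablesync slot name from the subscription
 * id and the relation id only. The "pg_%u_sync_%u" format is an assumption
 * for illustration. Returns 0 on success, -1 if the name does not fit.
 */
static int
tablesync_slot_name(Oid subid, Oid relid, char *buf, size_t buflen)
{
    int n;

    assert(buf != NULL && buflen > 0);

    n = snprintf(buf, buflen, "pg_%u_sync_%u",
                 (unsigned int) subid, (unsigned int) relid);

    /* snprintf reports the untruncated length; reject truncated names. */
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return 0;
}
```

With a name like this, DropSubscription could rebuild the slot name for
each relation of the subscription without consulting the subscription's
slot_name or any worker state.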

Do you see anything which will still not be addressed after doing the above?

I understand why you are trying to create this patch atop the logical
decoding of 2PC patch, but I think it is better to create this as an
independent patch and then use it to test the 2PC problem. Also, please
explain what kind of testing you did to ensure that it works properly
when the table sync worker restarts after a crash.

-- 
With Regards,
Amit Kapila.


