Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date
Msg-id CAA4eK1+PsvbjSH7yiCdCn=zcpAdM01MK=wdOuq879JcQqu6jwg@mail.gmail.com
In response to Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Melih Mutlu <m.melihmutlu@gmail.com>)
Responses Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
List pgsql-hackers
On Thu, Jul 28, 2022 at 9:32 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
>>
>> Why do you need to drop the replication slot after step 4? Won't just
>> clearing the required info from the catalog be sufficient?
>
>
> The replication slots that we read from the catalog will not be used for anything else after we're done with syncing the table which the rep slot belongs to.
> It's removed from the catalog when the sync is completed and it basically becomes a slot that is not linked to any table or worker. That's why I think it should be dropped rather than left behind.
>
> Note that if a worker dies and its replication slot continues to exist, that slot will only be used to complete the sync process of the one table that the dead worker was syncing but couldn't finish.
> When that particular table is synced and becomes ready, the replication slot has no use anymore.
>

Why can't it be used to sync the other tables, if any?

>>
>> Hmm, I think even if there is an iota of a chance, which I think
>> there is, we can't use worker_pid. Assume that the same worker_pid is
>> assigned to another worker once the worker using it has errored out;
>> the new worker will fail as soon as it tries to create a
>> replication slot.
>
>
> Right. If something like that happens, the worker will fail without doing anything. Then a new one will be launched and that one will continue to do the work.
> The worst case might be hitting a conflicting pid over and over again while replication slots whose names include one of those pids still exist.
> It seems unlikely but possible, yes.
>
>>
>> I feel it would be better, or maybe we need to think of some other
>> identifier, but one thing we need to think about before using a 64-bit
>> unique identifier here is how we will retrieve its last used value
>> after a restart of the server. We may need to store it in a persistent
>> way somewhere.
>
>
> We might consider storing this info in a catalog again. Since this last used value will be different for each subscription, pg_subscription can be a good place to keep that.
>

This sounds reasonable. Let's do this unless we get some better idea.
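
To make the idea concrete, here is a minimal sketch of a per-subscription counter that is persisted before a slot name is handed out. It is only an illustration under stated assumptions: the dict stands in for a column in pg_subscription, and the name last_slot_id, the helper next_sync_slot_name, and the slot name format pg_<subid>_sync_<id> are all hypothetical, not what the patch does.

```python
# Hypothetical stand-in for a per-subscription "last_slot_id" column in
# pg_subscription: maps subid -> last used slot id. Not the real catalog.
catalog = {}

MAX_ID = 2**64 - 1  # upper bound of a 64-bit unsigned identifier


def next_sync_slot_name(catalog, subid):
    """Advance the persisted counter for subid and return a fresh slot name."""
    last = catalog.get(subid, 0)
    if last >= MAX_ID:
        raise OverflowError("slot id counter exhausted")
    new_id = last + 1
    # Store the new value before the name is used, so that a worker
    # starting after a restart reads it back and never reuses new_id.
    catalog[subid] = new_id
    # Assumed name format, for illustration only.
    return f"pg_{subid}_sync_{new_id}"


first = next_sync_slot_name(catalog, 16394)
second = next_sync_slot_name(catalog, 16394)
```

Because the name depends only on the subscription and the stored counter, it cannot collide the way a name built from worker_pid can when the pid is reassigned.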

>>
>> The problems will be similar to the slot name. The origin is used to
>> track the progress of replication, so, if we use the wrong origin name
>> after the restart, it can send the wrong start_streaming position to
>> the publisher.
>
>
> I understand. But origin naming logic is still the same. Its format is like pg_<subid>_<relid>.
> I did not need to change this since it seems to me origins should belong to only one table. The patch does not reuse origins.
> So I don't think this change introduces an issue with origin. What do you think?
>

There is no restriction that an origin must belong to only one
table. What makes you think so?
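
For reference, the naming format quoted above can be written as a tiny helper. This is just a sketch; the function name origin_name is hypothetical. It shows only that the name is derived from the subscription and relation OIDs, and says nothing about whether an origin could be reused across tables.

```python
def origin_name(subid: int, relid: int) -> str:
    """Build an origin name in the pg_<subid>_<relid> form quoted above."""
    return f"pg_{subid}_{relid}"


# Two tables under the same subscription get two different origin names.
name_a = origin_name(16394, 16385)
name_b = origin_name(16394, 16390)
```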

-- 
With Regards,
Amit Kapila.


