Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication - Mailing list pgsql-hackers

From Melih Mutlu
Subject Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date
Msg-id CAGPVpCRdDBRav4AR6NgAg+7HBfokppWwJkmjwzjWeuy1i7HYqA@mail.gmail.com
In response to Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
List pgsql-hackers
Why after step 4, do you need to drop the replication slot? Won't just
clearing the required info from the catalog be sufficient?

The replication slots that we read from the catalog will not be used for anything else once the table that the slot belongs to has been synced.
When the sync completes, the slot is removed from the catalog and becomes a slot that is not linked to any table or worker. That's why I think it should be dropped rather than left behind.

Note that if a worker dies and its replication slot continues to exist, that slot will only be used to complete the sync of the one table that the dead worker was syncing but couldn't finish.
Once that table is synced and becomes ready, the replication slot is of no further use.
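
To make the lifecycle concrete, here is a minimal toy sketch of what I mean. It is not code from the patch: the dict and set only stand in for the catalog and the server's slots, and the function name, slot name, and relid are made up for illustration.

```python
# Toy model only: a dict stands in for the catalog (relid -> slot name)
# and a set stands in for the replication slots that exist on the server.

def finish_table_sync(catalog, slots, relid):
    """Finish syncing one table: unlink its slot from the catalog, then drop it."""
    slot_name = catalog.pop(relid)   # slot info is removed from the catalog
    slots.discard(slot_name)         # the slot is now unlinked, so drop it too
    return slot_name

# A slot left behind by a dead worker, still recorded for table 16384
# (hypothetical relid and slot name).
catalog = {16384: "leftover_sync_slot"}
slots = {"leftover_sync_slot"}

dropped = finish_table_sync(catalog, slots, 16384)
```

After the call, neither the catalog nor the set of slots refers to the leftover slot, which is the state I would like to end up in.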
 
Hmm, I think even if there is an iota of a chance which I think is
there, we can't use worker_pid. Assume, that if the same worker_pid is
assigned to another worker once the worker using it got an error out,
the new worker will fail as soon as it will try to create a
replication slot.

Right. If something like that happens, the worker will fail without doing anything. Then a new one will be launched and will continue the work.
The worst case would be hitting a conflicting pid over and over again while replication slots whose names include one of those pids still exist.
It seems unlikely but possible, yes.
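
The failure mode can be sketched as follows. This is only an illustration under assumptions: the slot name format, the helper names, and the ids are hypothetical and not taken from the patch; the point is just that a name derived from a pid collides when the pid is reused while the old slot still exists.

```python
class SlotExistsError(Exception):
    """Raised when a slot with the requested name already exists."""

def slot_name_for(subid, worker_pid):
    # Hypothetical naming scheme that embeds the worker's pid.
    return f"pg_{subid}_sync_{worker_pid}"

def create_slot(existing_slots, name):
    # Slot names must be unique, so creating a duplicate fails.
    if name in existing_slots:
        raise SlotExistsError(name)
    existing_slots.add(name)

slots = set()

# First worker (pid 4242) creates its slot, then dies; the slot stays behind.
create_slot(slots, slot_name_for(16400, 4242))

# A later worker happens to be assigned the same pid and tries again.
try:
    create_slot(slots, slot_name_for(16400, 4242))
    collided = False
except SlotExistsError:
    collided = True
```

In this toy model the second worker fails immediately and has to be relaunched, which matches the scenario described above.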
 
I feel it would be better or maybe we need to think of some other
identifier but one thing we need to think about before using a 64-bit
unique identifier here is how will we retrieve its last used value
after restart of server. We may need to store it in a persistent way
somewhere.

We might consider storing this info in a catalog as well. Since the last used value will be different for each subscription, pg_subscription could be a good place to keep it.
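
As a rough sketch of the idea, assuming a per-subscription counter that is written to persistent storage before it is handed out: a JSON file stands in for pg_subscription here, and the function name and file layout are purely illustrative.

```python
import json
import os
import tempfile

MASK_64 = 0xFFFFFFFFFFFFFFFF  # keep the identifier within 64 bits

def next_slot_id(path, subid):
    """Return the next identifier for a subscription, persisting it first."""
    state = {}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    key = str(subid)                      # JSON object keys are strings
    value = (state.get(key, 0) + 1) & MASK_64
    state[key] = value
    with open(path, "w") as f:            # stand-in for a catalog update
        json.dump(state, f)
    return value

# The file plays the role of the catalog, so the value survives a "restart".
state_path = os.path.join(tempfile.mkdtemp(), "subscription_state.json")
first = next_slot_id(state_path, 16400)
second = next_slot_id(state_path, 16400)   # continues from the stored value
other = next_slot_id(state_path, 16401)    # each subscription counts separately
```

Because the last used value is read back from storage on every call, nothing has to be remembered in memory across a restart.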
 
The problems will be similar to the slot name. The origin is used to
track the progress of replication, so, if we use the wrong origin name
after the restart, it can send the wrong start_streaming position to
the publisher.

I understand. But the origin naming logic is still the same. Its format is pg_<subid>_<relid>.
I did not need to change this, since it seems to me that each origin should belong to only one table. The patch does not reuse origins.
So I don't think this change introduces an issue with origins. What do you think?
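
For reference, the naming format above can be written out as a tiny sketch. The function name and the ids are illustrative only; the format string is the pg_<subid>_<relid> one mentioned above.

```python
def origin_name(subid, relid):
    # One origin per (subscription, table) pair: pg_<subid>_<relid>
    return f"pg_{subid}_{relid}"

# Two tables in the same subscription get distinct origin names,
# so progress is tracked per table and nothing is shared between them.
name_a = origin_name(16400, 16384)
name_b = origin_name(16400, 16385)
```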

Thanks,
Melih
