Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication - Mailing list pgsql-hackers
From: Melih Mutlu
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Msg-id: CAGPVpCTV57_aji6y4k1PwEVO3h8h2OYkEKiYC4OB=J-ErXufQw@mail.gmail.com
In response to: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
Hi Amit and Dilip,
Thanks for the replies.
> I had a quick look into the patch and it seems it is using the worker
> array index instead of relid while forming the slot name
Yes, I changed the slot names so they include the slot index instead of the relation id.
This was needed because I aimed to decouple replication slots from relations.
> I think that won't work because each time on restart the slot won't be
> fixed. Now, it is possible that we may drop the wrong slot if that
> state of copying rel is SUBREL_STATE_DATASYNC. Also, it is possible
> that while creating a slot, we fail because the same name slot already
> exists due to some other worker which has created that slot has been
> restarted. Also, what about origin_name, won't that have similar
> problems? Also, if the state is already SUBREL_STATE_FINISHEDCOPY, if
> the slot is not the same as we have used in the previous run of a
> particular worker, it may start WAL streaming from a different point
> based on the slot's confirmed_flush_location.
You're right, Amit. In case of a failure, the tablesync phase of a relation may continue with a different worker and replication slot due to this naming change.
It seems the same replication slot should be used from start to end for a relation during tablesync. However, creating/dropping replication slots can be a major overhead in some cases.
It would be nice if these slots could somehow be reused.
To overcome this issue, I've been thinking about making some changes in my patch.
So far, my proposal would be as follows:
Slot naming can be like pg_<sub_id>_<worker_pid> instead of pg_<sub_id>_<slot_index>. This way, each worker can use the same replication slot during its lifetime.
But if a worker is restarted, it will switch to a new replication slot since its pid has changed.
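The naming scheme above can be sketched as a trivial helper; this is only an illustration of the proposed format, not code from the patch:

```c
#include <stdio.h>

/* Build the proposed per-worker slot name, pg_<sub_id>_<worker_pid>.
 * Because the name is tied to the worker's pid, a restarted worker
 * (new pid) naturally moves to a fresh slot name. */
static void
make_slot_name(char *buf, size_t buflen, unsigned int subid, int workerpid)
{
	snprintf(buf, buflen, "pg_%u_%d", subid, workerpid);
}
```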
The pg_subscription_rel catalog can store the replication slot name for each non-ready relation. Then we can find the slot needed to complete tablesync for that particular relation.
If a worker syncs a relation without any error, everything works well and this new replication slot column in the catalog will not be needed.
However, if a worker is restarted due to a failure, the previous run of that worker left its slot behind since it did not exit cleanly.
The restarted worker (with a different pid) will see that the relation is actually in SUBREL_STATE_FINISHEDCOPY and will want to proceed to the catchup step.
Then the worker can look up that relation's replication slot in the pg_subscription_rel catalog (the slot name should be there since the relation state is not ready), and tablesync can proceed with that slot.
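To make the restart flow above concrete, here is a rough standalone sketch. The catalog lookup is stubbed out and the function names are hypothetical, not the actual patch API:

```c
#include <stdio.h>
#include <string.h>

/* Relation states relevant to the flow described above. */
#define SUBREL_STATE_DATASYNC     'd'
#define SUBREL_STATE_FINISHEDCOPY 'f'

/* Stub for reading the proposed slot-name column of pg_subscription_rel;
 * returns NULL when no slot is recorded for the relation. */
static const char *
get_saved_slot_name(unsigned int relid)
{
	return (relid == 16384) ? "pg_10_4242" : NULL;
}

/* Decide which slot a (possibly restarted) tablesync worker uses. */
static void
choose_slot(unsigned int subid, int mypid, unsigned int relid,
			char relstate, char *out, size_t outlen)
{
	const char *saved = get_saved_slot_name(relid);

	if (relstate == SUBREL_STATE_FINISHEDCOPY && saved != NULL)
		snprintf(out, outlen, "%s", saved);	/* resume catchup with the
											 * previous worker's slot */
	else
		snprintf(out, outlen, "pg_%u_%d", subid, mypid);	/* fresh copy */
}
```

The key point is the first branch: a restarted worker resumes with the recorded slot, so WAL streaming continues from the right confirmed_flush_location.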
There might be some cases where replication slots are left behind. One example would be when a slot is removed from the pg_subscription_rel catalog and detached from any relation, but the slot itself could not be dropped for some reason. For such cases, a slot cleanup logic is needed. This cleanup can also be done by tablesync workers.
Whenever a tablesync worker is created, it can look for existing replication slots that do not belong to any relation or any live worker (the slot name includes the pid for that), and drop any it finds.
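The orphan check could look roughly like the sketch below. It is illustrative only: the liveness check is stubbed with kill(pid, 0), whereas the real patch would consult the launcher's worker array, and the name format is the one proposed above:

```c
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return true if a slot named pg_<subid>_<pid> belongs to no live
 * process. Names in another format are left alone. */
static bool
slot_is_orphaned(const char *slotname)
{
	unsigned int subid;
	int			pid;

	if (sscanf(slotname, "pg_%u_%d", &subid, &pid) != 2)
		return false;			/* not a tablesync-style slot name */

	/* kill(pid, 0) fails with ESRCH when no such process exists */
	return (kill(pid, 0) != 0);
}
```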
What do you think about this new way of handling slots? Do you see any points of concern?
I'm currently working on adding this change to the patch and would appreciate any comments.
Thanks,
Melih