I think there is some basic flaw in slot reuse design. Currently, we
copy the table by starting a repeatable read transaction (BEGIN READ
ONLY ISOLATION LEVEL REPEATABLE READ) and create a slot that
establishes a snapshot which is first used for copy and then LSN
returned by it is used in the catchup phase after the copy is done.
The patch won't establish such a snapshot before a table copy as it
won't create a slot each time. If this understanding is correct, I
think we need to use ExportSnapshot/ImportSnapshot functionality to
achieve it or do something else to avoid the problem mentioned.
I did not really think about the snapshot created by replication slot while making this change. Thanks for pointing it out.
I've been thinking about how to fix this issue. There are some points I'm still not sure about.
If the worker will not create a new replication slot, which snapshot should we actually export and then import?
At the line where the worker was supposed to create replication slot but now will reuse an existing slot instead, calling pg_export_snapshot() can export the snapshot instead of CREATE_REPLICATION_SLOT.
However, importing that snapshot into the current transaction may not make any difference since we exported that snapshot from the same transaction. I think this wouldn't make sense.
How else an export/import snapshot logic can be placed in this change?
LSN also should be set accurately. The current change does not handle LSN properly.
I see that CREATE_REPLICATION_SLOT returns consistent_point which indicates the earliest location which streaming can start from. And this consistent_point is used as origin_startpos.
If that's the case, would it make sense to use "confirmed_flush_lsn" of the replication slot in case the slot is being reused?
Since confirmed_flush_lsn can be considered as the safest, earliest location which streaming can start from, I think it would work.
And at this point, with the correct LSN, I'm wondering whether this export/import logic is really necessary if the worker does not create a replication slot. What do you think?
For small tables, it could have a visible performance difference as it
involves database write operations to each time create and drop the
origin. But if we don't want to reuse then also you need to set its
origin_lsn appropriately. Currently (without this patch), after
creating the slot, we directly use the origin_lsn returned by
create_slot API whereas now it won't be the same case as the patch
doesn't create a slot every time.
Correct. For this issue, please consider the LSN logic explained above.
Thanks,
Melih