Hi hackers,
While reading the logical replication apply code in execReplication.c, I noticed that the retry paths in RelationFindReplTupleByIndex and RelationFindReplTupleSeq for concurrent updates and deletes have no test coverage [1]. Specifically, when the same tuple is being updated/deleted on the publisher and subscriber at the same time, the dirty snapshot finds the tuple being modified by another transaction, the apply worker waits and retries the index/sequential scan.
The attached patch adds an injection point before table_tuple_lock and a TAP test exercising these retry paths, hitting both TM_Updated and TM_Deleted.
While working on this, I also noticed minor issues in the conflict handling code:
1/ In RelationFindReplTupleByIndex, ExecMaterializeSlot was called before checking should_refetch_tuple. If the tuple needs to be refetched due to a concurrent modification, this materialization is wasted work. Moved it after the retry check, so it only runs when we've successfully locked the tuple.
2/ In RelationFindReplTupleSeq, ExecCopySlot and a separate TupleTableSlot allocation were unnecessary. Made this function consistent with RelationFindReplTupleByIndex by using outslot directly while scanning the heap, avoiding the extra TTS allocation and copy overhead.
I'm aware that these are not major performance issues in practice, but it keeps the two functions consistent and avoids unnecessary TTS materialize and copy costs.
I also think that we could deduplicate these two functions since the code looks mostly the same, but that would be an overkill IMHO.
Please find the attached patch. Appreciate any feedback. Thank you!
[1]
https://coverage.postgresql.org/src/backend/executor/execReplication.c.gcov.html