Hello, Amit!
> I think it is rather less likely or not possible in a parallel apply
> case because such conflicting updates (updates on the same tuple)
> should be serialized at the publisher itself. So one of the updates
> will be after the commit that has the second update.
Glad to hear that! Even so, such logic looks very fragile to me.
> I haven't tried the test based on your description of the general
> problem with DirtySnapshot scan. In case of logical replication, we
> will LOG update_missing type of conflict and the user may need to take
> some manual action based on that.
Currently it is just DEBUG1, so it will probably be missed by the user.
> * XXX should this be promoted to ereport(LOG) perhaps?
> */
> elog(DEBUG1,
> "logical replication did not find row to be updated "
> "in replication target relation \"%s\"",
> RelationGetRelationName(localrel));
> }
> I have not tried a test so I could
> be wrong as well. I am not sure we can do anything specific to logical
> replication for this but feel free to suggest if you have ideas to
> solve this problem in general or specific to logical replication.
I've implemented a solution that addresses the problem more generally; the patch is attached (see also the link [1]).
Here's a summary of the changes:
* For each tuple skipped because it was deleted, we now accumulate the maximum xmax.
* Before the scan begins, we store the value of the latest completed transaction.
* If no tuples are found in the index, we check the max(xmax) value. If this value is newer than the latest completed transaction stored before the scan, it indicates that a tuple was deleted by another transaction after the scan started. To ensure all tuples are correctly processed, we then rescan the index; a sketch of this retry loop follows.
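To make the idea concrete, here is a minimal C sketch of the retry loop (not the actual patch). find_tuple_with_retry and max_skipped_xmax are illustrative names: the sketch assumes the dirty-snapshot scan machinery has been extended to fill in the accumulator, and ReadNextTransactionId() stands in for capturing the latest completed transaction before the scan. index_rescan()/index_getnext_slot() are the existing index AM calls.

#include "postgres.h"

#include "access/genam.h"
#include "access/relscan.h"
#include "access/transam.h"
#include "executor/tuptable.h"

/*
 * Sketch of the rescan logic described above (not the actual patch).
 *
 * Assumption: the DirtySnapshot scan records, in *max_skipped_xmax, the
 * largest xmax among tuples it skipped because their deleting transaction
 * had already committed.  Both the accumulator and this helper are
 * hypothetical names used for illustration.
 */
static bool
find_tuple_with_retry(IndexScanDesc scan, ScanKey skey, int nkeys,
					  TupleTableSlot *slot,
					  TransactionId *max_skipped_xmax)
{
	for (;;)
	{
		/*
		 * Capture an upper bound on transactions completed so far; the
		 * patch stores the latest completed xid, and in this sketch
		 * ReadNextTransactionId() serves the same purpose.
		 */
		TransactionId before_scan = ReadNextTransactionId();

		*max_skipped_xmax = InvalidTransactionId;
		index_rescan(scan, skey, nkeys, NULL, 0);

		if (index_getnext_slot(scan, ForwardScanDirection, slot))
			return true;		/* found a matching, visible tuple */

		/*
		 * Nothing visible.  If some matching tuple was deleted by a
		 * transaction newer than everything completed before the scan
		 * started, the deletion raced with our scan and the new tuple
		 * version may have been missed: rescan the index.
		 */
		if (!TransactionIdIsValid(*max_skipped_xmax) ||
			TransactionIdPrecedes(*max_skipped_xmax, before_scan))
			return false;		/* the tuple is genuinely absent */
	}
}

Presumably this loop would wrap the existing DirtySnapshot scan, so a concurrent update of the key can no longer surface as a spurious update_missing conflict.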