From: Greg Smith
Subject: Re: ask for review of MERGE
Msg-id: 4CC45BBA.9080400@2ndquadrant.com
In response to: Re: ask for review of MERGE (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: ask for review of MERGE
List: pgsql-hackers
Robert Haas wrote:
> I am also wondering exactly what the semantics are supposed to be
> under concurrency.  We can't assume that it's OK to treat WHEN NOT
> MATCHED as WHEN MATCHED if a unique-key violation is encountered.  The
> join condition could be something else entirely.  I guess we could
> scan the target table with a dirty snapshot and block on any in-doubt
> tuples, similar to what EXCLUDE constraints do internally, but
> throwing MVCC out the window doesn't seem right either.

You've answered Tom's question here about why not to just INSERT and trap the error.  Presuming that a unique key violation is what will come back in that situation, and simply following the other branch when it does, doesn't cover all the ways this command can get used.

I'm now thinking of this problem as similar to the one that happens when a concurrent UPDATE shows up.  To quote from the docs here:

"a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row."

What I think we can do by adding a key reservation is apply the same sort of logic to INSERTs too, given a way to lock an index entry before the row itself is complete.  If MERGE hits a WHERE condition that finds a key lock entry in the index, it has to sit and wait for that other command to finish; there isn't any other sensible behavior I can see in that situation, just like there isn't one for UPDATE right now.

Once the transaction holding the index entry finishes, the MERGE has to start over by re-evaluating the WHEN MATCHED part (the equivalent of the WHERE clause in the UPDATE case).  But it shouldn't need to do the JOIN again, because that condition was already proven to be met by the lock squatting on that index key, without even having the rest of the row there yet to check its data.  If the other transaction commits, the MERGE then follows the WHEN MATCHED path, which in the UPSERT case means doing an UPDATE.  If the transaction that had the key reservation instead rolls back, the re-evaluation of WHEN MATCHED will fail, at which point the MERGE follows the WHEN NOT MATCHED INSERT path.  Since at that point it will itself be the holder of the key reservation it had been waiting on, it is free to do the INSERT with no concerns about race conditions or retries.  In the SERIALIZABLE case, again just follow the same slightly different rules that already exist for concurrent UPDATE commands.

There may be a tricky point here:  we will need to warn people that UPSERT implementations have to reference only the columns of the primary key in the MERGE conditions for this to work as expected, because otherwise they might get back a unique key violation error anyway.  I don't know how other databases deal with that particular problem.
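To make the UPSERT case concrete, here's roughly what it looks like written as a MERGE, using a made-up counters table and the standard syntax the patch is aiming for (treat this as a sketch rather than a test of the current patch):

CREATE TABLE counters (id integer PRIMARY KEY, total integer NOT NULL);

MERGE INTO counters c
USING (VALUES (1, 10)) AS src (id, delta)
ON c.id = src.id
WHEN MATCHED THEN
    UPDATE SET total = c.total + src.delta
WHEN NOT MATCHED THEN
    INSERT (id, total) VALUES (src.id, src.delta);

Because the join condition here is exactly the primary key, two sessions running this for the same id at the same time is the race condition in question.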
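And here is the interleaving the key reservation idea is meant to handle, again with that made-up table; the blocking and re-evaluation described in the comments is the proposed behavior, not something any current code does:

-- Session 1:  id=1 doesn't exist yet, so this takes the WHEN NOT MATCHED
-- path, reserves the key in the index, and inserts the row.
BEGIN;
MERGE INTO counters c
USING (VALUES (1, 10)) AS src (id, delta)
ON c.id = src.id
WHEN MATCHED THEN UPDATE SET total = c.total + src.delta
WHEN NOT MATCHED THEN INSERT (id, total) VALUES (src.id, src.delta);
-- not yet committed

-- Session 2:  runs the same MERGE before session 1 commits.  Under the
-- proposal it finds the key reservation on id=1 and blocks.  If session 1
-- commits, it re-evaluates WHEN MATCHED and does the UPDATE; if session 1
-- rolls back, WHEN MATCHED fails and it does the INSERT itself, since it
-- now holds the reservation.
MERGE INTO counters c
USING (VALUES (1, 10)) AS src (id, delta)
ON c.id = src.id
WHEN MATCHED THEN UPDATE SET total = c.total + src.delta
WHEN NOT MATCHED THEN INSERT (id, total) VALUES (src.id, src.delta);

-- Session 1:
COMMIT;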
I have considered following the somewhat distasteful path of installing SQL Server or Oracle here just to see how they handle some of the pathological test cases I'm now thinking about for this command.

This particular weak spot in MVCC is already there for UPDATE, and this approach seems to me the most logical way to extend the same basic implementation to also eliminate some concurrent MERGE race conditions, including the most common one the UPSERT case is stuck with.  But I have no intention of halting work on the basic MERGE to wait for this problem to be solved, even if I'm completely wrong about what I've outlined above.  That style of thinking--don't even start until a perfect solution for every possible problem is known--is something I consider a problem in this community's development model, one that has contributed to why no work was done on this feature until now.  This one splits nicely into "get the implementation working" and "improve the concurrency" parts, and I haven't heard a good reason yet why those need to be coupled together.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us