Thread: Foreign keys causing conflicts leading to serialization failures
Hello,

Using PostgreSQL 8.2, I have "atable", one of whose columns references a
column in "othertable". I see serialization failures as a result of
*inserts* to atable in the context of:

  '"SELECT 1 FROM ONLY othertable x WHERE "otherid" = $1 FOR SHARE OF x"'
  in 'INSERT INTO atable (otherid, col2, col3) VALUES (.., .., ..)'

My interpretation is that the acquisition of a lock on the row in question
is due to the enforcement of the foreign key constraint, and that, combined
with the fact that this locking is performed at the row level, this creates
a conflict with any concurrent transaction updating that row in othertable,
regardless of whether 'otherid' is touched.

First off, is this correct? If yes:

To me, it would be advantageous if "bogus" conflicts like this were not
generated. Although I realize that serializable transactions are subject to
retries, one still tends to design transactions specifically to avoid
generating conflicts. It is useful to know that a particular transaction is
guaranteed not to generate serialization failures; and if that is not
possible, then at least minimizing the risk is useful. Normally, an INSERT
is conflict-free, and it would be nice to keep it that way. Unfortunately,
the introduction of enforced referential integrity has this negative side
effect.

If my interpretation above is correct, is the use of row-level locking due to:

  (1) it being mandated by a standard?
  (2) "cell"-level locking being inefficient?
  (3) no one having implemented "cell"-level locking?
  (4) there being a problem with having a DELETE correctly conflict with a
      "cell"-level lock?
  (*) something else?

In short, I am wondering whether this behavior is intended or a side-effect
of implementation details.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
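[A minimal sketch of the schema being described, for anyone wanting to reproduce this. The table and column names are taken from the post; the column types and the "payload" column are assumptions for illustration only.]

```sql
-- Hypothetical schema matching the post (types are assumed).
CREATE TABLE othertable (
    otherid integer PRIMARY KEY,
    payload text
);

CREATE TABLE atable (
    id      serial PRIMARY KEY,
    -- The foreign key whose enforcement triggers the implicit
    -- "SELECT 1 ... FOR SHARE" on othertable at INSERT time:
    otherid integer NOT NULL REFERENCES othertable (otherid),
    col2    text,
    col3    text
);

-- The statement that fails inside a serializable transaction when a
-- concurrent transaction has updated the referenced othertable row:
INSERT INTO atable (otherid, col2, col3) VALUES (1, 'x', 'y');
```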
Peter Schuller wrote:
> Using PostgreSQL 8.2, I have "atable", one of whose columns references a
> column in "othertable". I see serialization failures as a result of
> *inserts* to atable in the context of:
>
> '"SELECT 1 FROM ONLY othertable x WHERE "otherid" = $1 FOR SHARE OF
> x"' in 'INSERT INTO atable (otherid, col2, col3) VALUES (.., .., ..)'

A SELECT ... FROM "othertable" ... FOR SHARE won't conflict with a
concurrent update on "atable".

Do I guess right that there was also an UPDATE on the row in "othertable"?

You may have been misled by a message like:

  ERROR:  could not serialize access due to concurrent update
  CONTEXT:  SQL statement "SELECT 1 FROM ONLY "othertable" x
            WHERE "otherid" = $1 FOR SHARE OF x"

This message will be displayed although the statement that causes the
conflict is actually the UPDATE, perhaps because this was the first
statement to acquire a lock on that row in this transaction.

What most likely happens is the following:

Serializable transaction 2 starts and SELECTs something.

Transaction 1 starts and gets a RowShareLock on a row of "othertable"
with the SELECT ... FOR SHARE.

Transaction 1 UPDATEs the row and now holds a RowExclusiveLock on the
table row and the index row as well. The latter is necessary because that
not yet committed UPDATE has also changed the index (even if the indexed
columns did not change, the index will point to a new row now).

Transaction 1 COMMITs. A new table row and a new index row are visible.

Transaction 2 now issues an INSERT on "atable". This requires a
RowShareLock on the index row of the index on "othertable" that is
referenced by the foreign key constraint. But the corresponding index row
has changed since the transaction began, hence the error.

Transaction 2 needs the index entry to verify that the foreign key
constraint is satisfied. It cannot perform the check on the old version of
the index row, as this might introduce inconsistencies. Being
serializable, it must not use the new index entry.
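[The interleaving described above, written out as two concurrent psql sessions. This is a sketch: it assumes the two-table schema from the original post, a "payload" column invented for the example, and an existing row with otherid = 1 in othertable.]

```sql
-- Session 2 (serializable):
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM atable;     -- any query; fixes the snapshot

-- Session 1 (meanwhile):
BEGIN;
SELECT 1 FROM othertable WHERE otherid = 1 FOR SHARE;  -- RowShareLock
UPDATE othertable SET payload = 'changed' WHERE otherid = 1;
COMMIT;                          -- a new row version is now visible

-- Session 2 (continued) -- the FK check fails, even though
-- othertable.otherid itself was never modified:
INSERT INTO atable (otherid, col2, col3) VALUES (1, 'a', 'b');
-- ERROR:  could not serialize access due to concurrent update
-- CONTEXT: SQL statement "SELECT 1 FROM ONLY "othertable" x
--          WHERE "otherid" = $1 FOR SHARE OF x"
```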
One could argue that, as long as the old index entry and the new index
entry have the same values, the transaction could safely proceed. I guess
it is for ease of implementation, design or performance reasons that this
is not done.

Your idea of "cell level locking" will probably not work with PostgreSQL:
since any change in a data row will cause a new row to be created, there
would be no advantage.

Yours,
Laurenz Albe
> A SELECT ... FROM "othertable" ... FOR SHARE won't conflict with a
> concurrent update on "atable".
>
> Do I guess right that there was also an UPDATE on the row in
> "othertable"?

Yes, that was what I meant to convey. Sorry if I was not clear. The point
was that an INSERT to "atable" conflicted with an update to "othertable",
as a result of the foreign key constraint.

> This message will be displayed although the statement that causes
> the conflict is actually the UPDATE, perhaps because this was the
> first statement to acquire a lock on that row in this transaction.

I surmised from the SELECT that it had to be an internally generated
SELECT used to enforce referential integrity, because the only statements
made in the transaction in question were three INSERTs.

(So in this particular case isolation could simply be dropped to a lower
level, but I find this interesting generally because I don't like
generating conflicts that are not "real" conflicts in the application
domain. And this is an actual implicit conflict at the SQL level, which is
even more subtle than the more typical cases like value increments
expressed as updates. As I said, in this case the isolation could be
dropped, but in other cases it might trigger a desire to drop the enforced
referential integrity instead - which is not good.)

[snip]

> Transaction 2 now issues an INSERT on "atable". This requires a
> RowShareLock on the index row of the index on "othertable" that
> is referenced by the foreign key constraint. But the corresponding
> index row has changed since the transaction began, hence the error.

Yes, this matches my "theory".

> Transaction 2 needs the index entry to verify that the foreign
> key constraint is satisfied. It cannot perform the check on the old
> version of the index row as this might introduce inconsistencies.
> Being serializable, it must not use the new index entry.

Yes.
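[For completeness, the "drop isolation to a lower level" workaround mentioned above looks like this. At READ COMMITTED the foreign key check is allowed to see the latest committed version of the referenced row, so the concurrent update no longer forces a serialization failure. Values are illustrative.]

```sql
-- Same INSERT as before, but at the default READ COMMITTED level:
BEGIN ISOLATION LEVEL READ COMMITTED;
INSERT INTO atable (otherid, col2, col3) VALUES (1, 'a', 'b');
COMMIT;  -- succeeds even if othertable's row 1 was concurrently updated
```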
> One could argue that, as long as the old index entry and the new
> index entry have the same values, the transaction could safely proceed.

Yes. :)

Or alternatively, the fact that it was never updated could be tracked. I
guess you might argue that if one, for example, deleted the row and
re-created one with another id, that this would in fact break referential
integrity. Same for updating the relevant column.

But barring implementation reasons, it seems clear that if the row was not
dropped and the relevant column was not touched, the ideal implementation
would allow the INSERT to complete even in a serializable transaction.

> I guess it is for ease of implementation, design or performance reasons
> that this is not done.

This is what I am wondering: whether it is done this way due to
expectation/standard, or as an implementation side effect. In the latter
case it is fixable.

> Your idea of "cell level locking" will probably not work with
> PostgreSQL: since any change in a data row will cause a new row to be
> created, there would be no advantage.

I didn't think of that. I can certainly see that
update-by-tuple-duplication makes it difficult to implement this case
"optimally".

Thanks,

-- 
/ Peter Schuller
Peter Schuller wrote:
[about a serialization error caused by a foreign key constraint]

>> Transaction 2 now issues an INSERT on "atable". This requires a
>> RowShareLock on the index row of the index on "othertable" that
>> is referenced by the foreign key constraint. But the corresponding
>> index row has changed since the transaction began, hence the error.
>>
>> Transaction 2 needs the index entry to verify that the foreign
>> key constraint is satisfied. It cannot perform the check on the old
>> version of the index row as this might introduce inconsistencies.
>> Being serializable, it must not use the new index entry.
>>
>> One could argue that, as long as the old index entry and the new
>> index entry have the same values, the transaction could safely proceed.
>
> Yes. :)
>
> Or alternatively, the fact that it was never updated could be
> tracked. I guess you might argue that if one, for example, deleted the
> row and re-created one with another id, that this would in fact break
> referential integrity. Same for updating the relevant column.

Well, the index entry *was* changed, because it now points somewhere else
in table "othertable".

> But barring implementation reasons, it seems clear that if the row was
> not dropped and the relevant column was not touched, the ideal
> implementation would allow the INSERT to complete even in a
> serializable transaction.
>
>> I guess it is for ease of implementation, design or performance reasons
>> that this is not done.
>
> This is what I am wondering: whether it is done this way due to
> expectation/standard, or as an implementation side effect. In the
> latter case it is fixable.

I don't see how this could break a standard.
Maybe somebody who knows more than me knows the answer :^)

Yours,
Laurenz Albe
"Albe Laurenz" <laurenz.albe@wien.gv.at> writes:
> Peter Schuller wrote:
>> This is what I am wondering: whether it is done this way due to
>> expectation/standard, or as an implementation side effect. In the
>> latter case it is fixable.

> I don't see how this could break a standard.

Actually, I think it does, because we went to great lengths to cause this
case to error out. It would be much simpler, code-wise, if the RI checks
just always used a current snapshot and didn't worry about whether
serializability had been violated.

(Albe's description of the implementation is largely fiction, but the
conclusion is accurate: we throw error if the referenced PK row has been
updated since the serializable transaction started. The exact nature of
the update is not considered.)

			regards, tom lane
Tom Lane wrote:
>>> This is what I am wondering: whether it is done this way due to
>>> expectation/standard, or as an implementation side effect. In the
>>> latter case it is fixable.
>
>> I don't see how this could break a standard.
>
> Actually, I think it does, because we went to great lengths to cause
> this case to error out. It would be much simpler, code-wise, if the
> RI checks just always used a current snapshot and didn't worry about
> whether serializability had been violated.
>
> (Albe's description of the implementation is largely fiction, but the
> conclusion is accurate: we throw error if the referenced PK row has been
> updated since the serializable transaction started. The exact nature
> of the update is not considered.)

I am aware that I know nothing of the implementation and can only describe
the behaviour...

Of course a serializable transaction cannot just use the current index
entry for verifying referential integrity, because then it might not
behave consistently with the transaction snapshot.

What I mean is: if the serializable transaction went out of its way to
check that the update it wants to make is consistent with both its
snapshot and the current index row, it should not violate anything to
allow that update. The index entry would not be changed in that case.

Yours,
Laurenz Albe