Re: I'm no longer puzzled by a foreign key constraint problem - Mailing list pgsql-general
From | Craig Ringer |
---|---|
Subject | Re: I'm no longer puzzled by a foreign key constraint problem |
Date | |
Msg-id | 49138DEA.2080009@postnewspapers.com.au Whole thread Raw |
In response to | Re: I'm no longer puzzled by a foreign key constraint problem (Jonathan Guthrie <jguthrie@brokersys.com>) |
List | pgsql-general |
Jonathan Guthrie wrote: >> Tom Lane pointed out where to go from here re the server logging. It'll >> be interesting to see what turns up once you've got new logs that show >> the backend pid and the involved xid. > > I'll be keeping Mr. Lane's message for the logging tips. For my own > part, since people kept making a strong distinction between the READ > COMMITTED transaction isolation level as opposed to the SERIALIZABLE > transaction level, I concluded that one likely explanation is that the > transaction isolation level was not what I thought it was and put an > explicit "ISOLATION LEVEL READ COMMITTED" in the code that issues the > BEGIN. > > The problem has since gone away. This VERY strongly suggests that your transactions do in fact overlap. You've probably got something like this: Transaction 1 Transaction 2 BEGIN; SELECT schema_version FROM app_info; SELECT create_the_object(); BEGIN; SELECT schema_version FROM app_info; COMMIT; SELECT set_permissions(); COMMIT; In READ COMMITTED isolation this would work fine. Transaction 2 would be able to see the object transaction 1 created because transaction 1 had been committed before the statement "SELECT set_permissions()" was issued. In SERIALIZABLE isolation, because the snapshot is frozen when the first database access occurs in the transaction (in this case "SELECT schema_version ..." but it could be anything) and that happens BEFORE transaction 1 commits, transaction 2 cannot see the work done by transaction 1 even though it's committed. If you have a highly layered system with connection pools, etc, then it seems reasonably likely that you're doing something like "disabling autocommit" (which might BEGIN a transaction and issue a few setup statements) in the connection setup code for your connection pool. If this turns out to be the case you should be able to tell because you'll have connections in the `idle in transaction' state in the output of `select * from pg_stat_activity'. These will mess with VACUUM's ability to clean out dead tuples, causing table and index bloat that'll slow your system down and waste disk space. In the SERIALIZABLE isolation level it'll also cause visibility problems that'll give you major headaches. The best thing to do, though, is configure your logging as Tom Lane suggested and definitively confirm whether the transactions of interest do or do not overlap in time. Even if the problem appears to have gone away, there may be other consequences and it's something you really need to investigate. If that does turn out to be the problem, you also need to look at why you're using the SERIALIZABLE isolation level by default. As per the (excellent and very strongly recommended, I cannot stress it enough) documentation, there are some downsides to using SERIALIZABLE isolation, like the need to be prepared to retry a transaction in case of serialization failure. For most operations READ COMMITTED is entirely safe; you just need to think about concurrency when writing your SQL and avoid "read-modify-write" code (eg SELECT value, add to value, UPDATE value). If you're using an ORM layer then all bets are off, since they *love* to read-modify-write and there's not really any way around it. The ORM should provide optimistic locking to detect conflicts, allowing you to still use READ COMMITTED safely. If it doesn't, you're stuck with SERIALIZABLE and will have to be very, very careful with your connection pooling, transaction lifetimes, etc. Of course, you should be anyway to be VACUUM friendly among other things. > I'm thinking that there may have > been some weird interaction between the business logic that's built in > to the executable and the database layer code that lives in a shared > object file. Perhaps it wasn't the code I changed that did it, but the > fact that I had to recompile to apply the change. Well, remove the change, recompile again, and see if the problem comes back. If it does, you know you've got an issue in your code, and that's important to confirm. -- Craig Ringer
pgsql-general by date: