Thread: I'm puzzled by a foreign key constraint problem
I've been tearing my hair out over this one issue and I'm hoping that someone on this list will have an insight that will shed some light on why the system is doing what it's doing.

I have a database with a number of tables, two of which are projects and resources. We also have a users table and tables for files and folders and such. Files, folders, users, and so forth are considered resources, and their identifiers (UUIDs) are listed in the resources table. Entries in the resources table are used to do things like attach metadata to resources and to set permissions allowing users other than the owner to access specified resources. Because permissions are associated with resources, there is a foreign key constraint between the resourceid column in the permissions table and the resourceid column of the resource table, which is the primary key of the resource table.

The system itself is written in C++, and database wrappers have been created which work with the original database (MS SQL Server); we are porting to Postgres. The stored procedures (250 or so of them) have been converted into Postgres functions using a tool called SQLWays. I'm using named prepared statements for each of the Postgres functions that are called, plus three more, "BEGIN", "COMMIT", and "ROLLBACK", which consist of just the named operation. Those three are necessary because we're returning data in cursors, so each time one of those functions is executed it's done inside a transaction, and a status code is returned which determines whether the transaction is committed or rolled back.

When I create a project, entries in the project table and the resource table are created in a single function. Then, separate functions are called to set the owner's access to the new project. These other functions are failing because of the resourceid foreign key constraint.
Now, I've turned the logging up as far as it will go, and I seem to have verified that the first transaction, the one that populates the resource table, is completed and committed before any calls are made to the function that sets the permissions value. (It's called five times and they all fail.) The logging generated by the foreign key constraint failure includes the UUID for the resource, and I can go back in later and run a query for that record in the resource table, and that query successfully returns a single row. Also, I can go in at a later time and execute the function that sets the permissions and it works. To me, this implies that it's a race condition.

When I first ran into this problem last week I set the foreign key check to deferred and it seemed to stop complaining, although I don't remember what steps (if any) I took to test that conclusion. In any case, it happens with the check set to deferred, so that didn't fix the problem. I don't think it should have had an effect anyway. I spent the afternoon reading documentation about constraints and keys and the various options associated with them.

Anyway, I need for these operations to succeed because the lack of permissions causes odd problems in other parts of the system. I don't even know where to begin looking for the problem, or what magic might be useful in allowing the system to work as intended. I'm sure I'm doing something wrong, but I don't know what. Does this problem ring a bell with anyone? Is there any information that isn't clear from my description, or which I could gather, that would be helpful?
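[Editorial note: as an aside on the deferred check mentioned above, a deferrable foreign key only postpones enforcement until the end of the *same* transaction, which is why it could not help when the insert and the permission-setting run in separate transactions. A sketch, with hypothetical table and constraint names:]

```sql
-- Sketch only: constraint/table names are made up, not from the poster's schema.
-- DEFERRABLE INITIALLY DEFERRED moves the FK check to COMMIT time,
-- but still only within the transaction doing the INSERT.
ALTER TABLE permissions
    ADD CONSTRAINT permissions_resourceid_fkey
    FOREIGN KEY (resourceid) REFERENCES resources (resourceid)
    DEFERRABLE INITIALLY DEFERRED;
```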
Jonathan Guthrie wrote:
> The stored procedures (250 or so of them) have
> been converted into Postgres functions using a tool called SQLWays. I'm
> using named prepared procedures for each of the Postgres functions that
> are called, plus three, "BEGIN", "COMMIT", and "ROLLBACK", which consist
> of the named operation.

You're talking about prepared statements at the ODBC level, right? Is this with client-side or with server-side prepare? PostgreSQL won't actually let you use these in prepared statements at the SQL level:

craig=> PREPARE teststm AS BEGIN;
ERROR:  syntax error at or near "BEGIN"
LINE 1: PREPARE teststm AS BEGIN;

Nor can you use them in a function:

craig=> CREATE OR REPLACE FUNCTION testfn() RETURNS void AS $$
craig$> BEGIN;
craig$> $$ LANGUAGE 'sql';
CREATE FUNCTION
craig=> SELECT testfn();
ERROR:  BEGIN is not allowed in a SQL function
CONTEXT:  SQL function "testfn" during startup

This makes sense, given that it's impossible to invoke a function without a containing transaction; PostgreSQL will implicitly wrap it in a transaction that's committed as soon as the statement is executed. I don't *think* you can use BEGIN etc. in prepared statements at the v3 protocol level for the same reasons, but I'm not 100% certain of that. Given those limitations, I'm assuming you're talking about named prepared statements on the client side.

> Now, I've turned the logging as far up as it will go, and I seem to have
> verified that the first transaction, the one that populates the resource
> table, is completed and committed before any calls are made to the
> function that sets the permissions value.

What transaction isolation level are you using? If you're on the READ COMMITTED level, then yes, all you should need is for the transaction that creates the records of interest to commit before another transaction (even one that was already running) can see the values.
In any case, I'm a little puzzled as to why you're not doing the creation of the initial records and the related permissions records etc. all in the one transaction.

> Anyway, I need for these operations to succeed because the lack of
> permissions causes odd problems in other parts of the system.

That really shows that you need to do it all in one transaction, then.

-- 
Craig Ringer
Jonathan Guthrie wrote:
> When I create a project, entries in the project table and the resource
> table are created in a single function. Then, separate functions are
> called to set the owner's access to the new project. These other
> functions are failing because of the resourceid foreign key constraint.

Have you turned statement logging on? Your message suggests that's the case, but didn't say so explicitly.

Are the two steps:
1. Create project, resource
2. Set access-rights
done in separate connections by any chance? If so it's possible (due to MVCC) that #2 is still seeing the database as it was before #1 committed.

> Anyway, I need for these operations to succeed because the lack of
> permissions causes odd problems in other parts of the system.

If you want both steps to succeed or fail together though, they need to be in the same transaction.

-- 
Richard Huxton
Archonet Ltd
On Tue, 2008-11-04 at 07:49 +0000, Richard Huxton wrote:
> Jonathan Guthrie wrote:
> > When I create a project, entries in the project table and the resource
> > table are created in a single function. Then, separate functions are
> > called to set the owner's access to the new project. These other
> > functions are failing because of the resourceid foreign key constraint.
>
> Have you turned statement logging on? Your message suggests that's the
> case, but didn't say so explicitly.
>
> Are the two steps:
> 1. Create project, resource
> 2. Set access-rights
> done in separate connections by any chance? If so it's possible (due to
> MVCC) that #2 is still seeing the database as it was before #1 committed.

It's possible, likely even. We use a connection pool to manage connections to the database and they're doled out as the system sees fit. However, at some point every update has to finish such that any view of the database will see that update as finished, right?

> > Anyway, I need for these operations to succeed because the lack of
> > permissions causes odd problems in other parts of the system.

> If you want both steps to succeed or fail together though, they need to
> be in the same transaction.

That's what Mr Ringer said, and although I understand that answer and I understand the reason that two people have independently responded with it, I'm dissatisfied with it.

There are two reasons why I'm dissatisfied with that answer. First, my mandate is basically to create an interface layer for Postgres and then port the SQL Server stored procedures without changing how they work. If I change the logic in this part, it will be different not only from the mechanism used in the original SQL Server stored procedure, but also from the logic used in other stored procedures that do similar things. The second reason is that adding permissions doesn't just happen at project creation time.
The software I work on is middleware for the actual client applications, and the client can assign any user permissions to access the project just as soon as it knows the project's ID, which is one of the values returned by the project creation function. If the issue is a difference in views because the requests come in on different connections, then there's a time window during which a valid and unanticipatable request from the client could fail if the request happens to use a different database connection from the one used to create the project.

Anyway, while I agree that adding the permission-setting logic to the project create function seems the simplest approach to dealing with the issue, I'd really rather not change the logic until I've thoroughly explored all other options. I CAN guarantee that the other operations on a project definitely won't begin until the create is committed. So, is there any way of causing a commit to not return until all the views are consistent?
On Tue, Nov 4, 2008 at 11:18 AM, Jonathan Guthrie <jguthrie@brokersys.com> wrote:
> On Tue, 2008-11-04 at 07:49 +0000, Richard Huxton wrote:
>> Jonathan Guthrie wrote:
>> > When I create a project, entries in the project table and the resource
>> > table are created in a single function. Then, separate functions are
>> > called to set the owner's access to the new project. These other
>> > functions are failing because of the resourceid foreign key constraint.
>>
>> Have you turned statement logging on? Your message suggests that's the
>> case, but didn't say so explicitly.
>>
>> Are the two steps:
>> 1. Create project, resource
>> 2. Set access-rights
>> done in separate connections by any chance? If so it's possible (due to
>> MVCC) that #2 is still seeing the database as it was before #1 committed.
>
> It's possible, likely even. We use a connection pool to manage
> connections to the database and they're doled out as the system sees
> fit. However, at some point every update has to finish such that any
> view of the database will see that update as finished, right?

Sure. But if the query to add the permissions is running under an already active transaction, and you're running in serializable mode, it can't see the changes because it started before they were committed.

>> > Anyway, I need for these operations to succeed because the lack of
>> > permissions causes odd problems in other parts of the system.
>
>> If you want both steps to succeed or fail together though, they need to
>> be in the same transaction.
>
> That's what Mr Ringer said, and although I understand that answer and I
> understand the reason that two people have independently responded with
> it, I'm dissatisfied with it.

Mr Ringer was right. So was Jonathan.

> There are two reasons why I'm dissatisfied with that answer. First, my
> mandate is basically to create an interface layer for Postgres and then
> port the SQL Server stored procedures without changing how they work.
Even if they're broken?

> If I change the logic in this part, it will be different not only from
> the mechanism used in the original SQL Server stored procedure, but also
> different from the logic used in other stored procedures that do similar
> things.

Then they might be broken and need fixing as well. Blindly converting broken code that just happened to work is not the best way to approach a project.

> The second reason is because adding permissions doesn't just happen at
> project creation time.

Is there some reason you can't call the external permission-setting function from within the function that adds the users? This would seem the simplest and cleanest solution, since they would then automatically share a transaction. Other operations could still call the permission-setting function as they used to.
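[Editorial note: a rough illustration of that suggestion. The creation function invokes the existing permission function directly, so both run in the caller's single transaction. All table, column, and function names here are hypothetical, not taken from the poster's schema:]

```sql
-- Sketch only: assumes a set_permission(resourceid, userid, level)
-- function already exists, as it does in the poster's system.
CREATE OR REPLACE FUNCTION create_project(p_name text, p_owner uuid)
RETURNS uuid AS $$
DECLARE
    v_rid uuid := gen_new_uuid();  -- however IDs are minted in this system
BEGIN
    INSERT INTO resources (resourceid) VALUES (v_rid);
    INSERT INTO projects  (resourceid, name) VALUES (v_rid, p_name);
    -- Runs inside the same (implicit) transaction as the inserts above,
    -- so the foreign key on permissions.resourceid is always satisfied.
    PERFORM set_permission(v_rid, p_owner, 'owner');
    RETURN v_rid;
END;
$$ LANGUAGE plpgsql;
```

Callers outside project creation could still call set_permission() on its own, exactly as before.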
Jonathan Guthrie wrote:
> On Tue, 2008-11-04 at 07:49 +0000, Richard Huxton wrote:
>> Jonathan Guthrie wrote:
>>> When I create a project, entries in the project table and the resource
>>> table are created in a single function. Then, separate functions are
>>> called to set the owner's access to the new project. These other
>>> functions are failing because of the resourceid foreign key constraint.
>> Have you turned statement logging on? Your message suggests that's the
>> case, but didn't say so explicitly.
>>
>> Are the two steps:
>> 1. Create project, resource
>> 2. Set access-rights
>> done in separate connections by any chance? If so it's possible (due to
>> MVCC) that #2 is still seeing the database as it was before #1 committed.
>
> It's possible, likely even. We use a connection pool to manage
> connections to the database and they're doled out as the system sees
> fit. However, at some point every update has to finish such that any
> view of the database will see that update as finished, right?

You'll need to read the section of the manuals regarding transaction isolation and how it impacts MVCC for full details, but the short answer is "no". A pre-existing transaction might well see the database as it was when its snapshot was first taken. More likely to happen if you have a connection pool that issues BEGINs too early...

>>> Anyway, I need for these operations to succeed because the lack of
>>> permissions causes odd problems in other parts of the system.
>
>> If you want both steps to succeed or fail together though, they need to
>> be in the same transaction.
>
> That's what Mr Ringer said, and although I understand that answer and I
> understand the reason that two people have independently responded with
> it, I'm dissatisfied with it.
>
> There are two reasons why I'm dissatisfied with that answer.
> First, my
> mandate is basically to create an interface layer for Postgres and then
> port the SQL Server stored procedures without changing how they work.
> If I change the logic in this part, it will be different not only from
> the mechanism used in the original SQL Server stored procedure, but also
> different from the logic used in other stored procedures that do similar
> things.

The logic is wrong regardless of whether you use PostgreSQL, SQL Server, Oracle or any other DB though. If you want a guarantee that both actions succeed or fail together you'll need to wrap them in a transaction. What you're saying is that at the moment there is no such guarantee with SQL Server as your database; it just happens to work most (e.g. 99.99%) of the time.

> The second reason is because adding permissions doesn't just happen at
> project creation time. The software I work on is middleware for the
> actual client applications and the client can assign any user
> permissions to access the project just as soon as it knows the project's
> ID, which is one of the values returned by the project creation
> function. If the issue is a difference in views because the requests
> come in on different connections, then there's a time window during
> which a valid and unanticipatable request from the client could fail if
> the request happens to use a connection to communicate with the database
> that is different from the one used to create the project.

This is separate from the issue of both actions succeeding or failing.

> Anyway, while I agree that adding the logic to set permissions to the
> project create function seems the simplest approach to dealing with the
> issue, I'd really rather not change the logic until I've thoroughly
> explored all other options. I CAN guarantee that the other operations
> on a project definitely won't begin until the create is committed. So,
> is there any way of causing a commit to not return until all the views
> are consistent?

It doesn't.
They are. But I think your second connection is fixed to an older snapshot.

Set aside an hour, read through the concurrency control / transaction-isolation section of the manuals, and experiment with two psql sessions open at the same time until you're clear how it all works. It'll probably take 5 mins to find the problem then (in conjunction with statement logging turned on at the server side).

I might be wrong about the cause, but since (1) foreign keys work in PG, (2) you seem to know what you're doing, I'm guessing it's a combination of the subtleties of MVCC and your connection pool interacting.

-- 
Richard Huxton
Archonet Ltd
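[Editorial note: the two-session experiment Richard describes might look something like the following. Table names and the UUID value are made up for illustration; the visibility behavior shown is that of SERIALIZABLE isolation in the 8.x series, where the snapshot is frozen at the transaction's first statement:]

```sql
-- Session 1                           -- Session 2
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT 1;  -- any statement freezes
           -- session 1's snapshot
                                       BEGIN;
                                       INSERT INTO resources (resourceid)
                                           VALUES ('11111111-1111-1111-1111-111111111111');
                                       COMMIT;
SELECT * FROM resources
 WHERE resourceid =
   '11111111-1111-1111-1111-111111111111';
-- 0 rows: the snapshot predates
-- session 2's commit
COMMIT;
SELECT * FROM resources
 WHERE resourceid =
   '11111111-1111-1111-1111-111111111111';
-- now the row is visible
```

Repeating the same experiment with READ COMMITTED in session 1 makes the row visible as soon as session 2 commits.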
Jonathan Guthrie wrote:
> It's possible, likely even. We use a connection pool to manage
> connections to the database and they're doled out as the system sees
> fit. However, at some point every update has to finish such that any
> view of the database will see that update as finished, right?

So that any /new/ snapshot of the state of the database sees it, yes. If you have long-lived transactions at the SERIALIZABLE isolation level, they won't (by design) see changes committed by other transactions after the first statement in the SERIALIZABLE transaction has been issued. See the manual for more information on concurrency, MVCC, and transaction isolation.

> There are two reasons why I'm dissatisfied with that answer. First, my
> mandate is basically to create an interface layer for Postgres and then
> port the SQL Server stored procedures without changing how they work.

This is unlikely to be possible. The two databases are extremely different in some important ways. In particular, MS SQL Server uses a locking approach to concurrency, whereas PostgreSQL uses a multi-version approach (more like Oracle). You should usually be able to make the locking approach work in PostgreSQL, but there *will* be differences you need to think about in the way procedures interact when run concurrently. You may need to add more explicit locking to correct for assumptions that're valid under MS SQL Server but not under PostgreSQL, or adjust your logic to exploit multi-versioning properly instead. You will get much better performance if you adapt your code to the MVCC model instead of trying to stick to using locking for concurrency control.

It doesn't help that PostgreSQL does not at present support true stored procedures. There is no top-level CALLable procedure support; instead PostgreSQL has very powerful functions.
The most important difference this makes is that you CAN NOT perform transaction control operations (BEGIN, ROLLBACK, COMMIT) within any procedural function in PostgreSQL. They are inherently wrapped in a transaction. You *can* RAISE EXCEPTION from PL/PgSQL to trigger a rollback (unless the caller traps and handles the exception), but there's no way to force a commit or begin a new and distinct transaction.

OK, that's not absolutely 100% true. You can do it with dblink. You just don't want to.

Anyway, if your MS SQL Server stored procedures expect to be able to BEGIN a transaction, do some work, COMMIT it, then BEGIN another and do some more work before COMMITting that second piece of work, you're going to have to do some redesign.

> The second reason is because adding permissions doesn't just happen at
> project creation time.

That's fine. Nothing stops you from issuing something like:

BEGIN;
SELECT create_it(blah);
SELECT set_permissions(blah, perms);
COMMIT;

and later using:

SELECT set_permissions(blah, otherperms);

standalone or inside another, unrelated transaction. The point is that if your initial create and the setting of the initial permissions must succeed or fail together, they MUST be done within a single transaction. That is, in fact, the fundamental point of database transactions.

What you should avoid doing is:

TRANSACTION 1                          TRANSACTION 2

BEGIN;
                                       BEGIN;
SELECT create_it(blah);
                                       SELECT set_permissions(blah, perms);
COMMIT;
                                       COMMIT;

... because that just won't work. It sounds like you've got that right, but you might be doing this:

TRANSACTION 1                          TRANSACTION 2

BEGIN;
                                       BEGIN;
                                       SET transaction_isolation = SERIALIZABLE;
                                       -- do something else that triggers
                                       -- freezing of the transaction's snapshot,
                                       -- even something like:
                                       SELECT 1;
SELECT create_it(blah);
COMMIT;
                                       SELECT set_permissions(blah, perms);
                                       COMMIT;

... which will also fail.
> The software I work on is middleware for the
> actual client applications and the client can assign any user
> permissions to access the project just as soon as it knows the project's
> ID, which is one of the values returned by the project creation
> function.

When the project's creation function returns, the created project is not yet visible to other transactions, even ones begun after the function returns. It only becomes visible after the transaction in which the create function was called COMMITs.

This is fine if your later manipulations of the permissions etc. happen within the same transaction as the initial create (as they should). However, if you're trying to refer to the created record from another transaction before the one that created the record has committed, it won't yet be visible. Furthermore, if the transaction trying to refer to the record created by some other transaction is at the SERIALIZABLE isolation level, it won't be able to see the created object at all if the first statement in the transaction was issued before the record of interest was created.

It's likely that READ COMMITTED is what you want anyway, but you really /must/ read the documentation on transaction isolation levels etc. to understand and be sure about what you need.

> If the issue is a difference in views because the requests
> come in on different connections, then there's a time window during
> which a valid and unanticipatable request from the client could fail if
> the request happens to use a connection to communicate with the database
> that is different from the one used to create the project.

It has nothing to do with which connection is used. It's all about the transactions involved; a READ COMMITTED or SERIALIZABLE transaction cannot see another's dirty (uncommitted) changes, and a SERIALIZABLE transaction cannot see committed changes newer than its own start time.
> Anyway, while I agree that adding the logic to set permissions to the
> project create function seems the simplest approach to dealing with the
> issue, I'd really rather not change the logic until I've thoroughly
> explored all other options. I CAN guarantee that the other operations
> on a project definitely won't begin until the create is committed.

OK, then you should be having no issues if you're using the READ COMMITTED isolation level, or if the transaction that performs those other operations begins after the transaction that created the project committed.

> So,
> is there any way of causing a commit to not return until all the views
> are consistent?

They're always internally consistent; a transaction's view of the database state is never contaminated by uncommitted data, and changes happen atomically between statements as far as the transaction is concerned (if it's in READ COMMITTED mode) or never become visible at all (in SERIALIZABLE mode).

Different concurrent transactions' views of the database state are not DESIGNED to be consistent when compared with each other externally by some side channel. Two READ COMMITTED transactions that have no dirty changes will see the same view of the database, but two SERIALIZABLE transactions started at different times won't. Nor will a SERIALIZABLE transaction compared to a READ COMMITTED transaction when changes have been committed after the SERIALIZABLE transaction's snapshot was taken. And, of course, even two READ COMMITTED transactions don't see the same view of the database state if either or both of them have made changes.

-- 
Craig Ringer
On Wed, 2008-11-05 at 04:40 +0900, Craig Ringer wrote:
> The point is that if your initial create and the setting of the initial
> permissions must succeed or fail together, they MUST be done within a
> single transaction. That is, in fact, the fundamental point of database
> transactions.

I understand that. Honestly, I do. If I hadn't ever said that odd things happen when the permissions aren't set, then maybe I could find out what I'm doing wrong.

> What you should avoid doing is:
>
> TRANSACTION 1                          TRANSACTION 2
>
> BEGIN;
>                                        BEGIN;
> SELECT create_it(blah);
>                                        SELECT set_permissions(blah, perms);
> COMMIT;
>                                        COMMIT;
>
> ... because that just won't work. It sounds like you've got that right,
> but you might be doing this:
>
> TRANSACTION 1                          TRANSACTION 2
>
> BEGIN;
>                                        BEGIN;
>                                        SET transaction_isolation = SERIALIZABLE;
>                                        -- do something else that triggers
>                                        -- freezing of the transaction's snapshot,
>                                        -- even something like:
>                                        SELECT 1;
> SELECT create_it(blah);
> COMMIT;
>                                        SELECT set_permissions(blah, perms);
>                                        COMMIT;
>
> ... which will also fail.

The thing is, the C++ code does this:

BEGIN transaction 1
INSERT project
COMMIT

BEGIN transaction 2
SET permissions
COMMIT

or, at least, it's supposed to. Those two operations are not supposed to overlap at all, even if they're on two different connections. I thought I had verified this by looking at the log file. I mean, I can look at the log file and see things like

2008-11-03 16:29:22 CST DEBUG: 00000: StartTransactionCommand

and

2008-11-03 16:29:22 CST DEBUG: 00000: CommitTransactionCommand

where I would expect them to be if what I'm expecting is going on, but the log file doesn't appear to have enough information to see a transaction created, proceed, and then end. That is, how do I know which transaction was started and which one was committed?
I'm kind of confused by lines like this:

2008-11-03 16:29:22 CST DEBUG: 00000: name: unnamed; blockState: INPROGRESS; state: INPROGR, xid/subid/cid: 678145/1/4, nestlvl: 1, children: 678146 678147

Is there an easy explanation somewhere?
Jonathan Guthrie <jguthrie@brokersys.com> writes:
> ... or, at least, it's supposed to. Those two operations are not supposed
> to overlap at all even if they're on two different connections. I
> thought I had verified this by looking at the log file. I mean, I can
> look at the log file and see things like
> 2008-11-03 16:29:22 CST DEBUG: 00000: StartTransactionCommand
> and
> 2008-11-03 16:29:22 CST DEBUG: 00000: CommitTransactionCommand
> where I would expect them to if what I'm expecting is going on, but the
> log file doesn't appear to have enough information to see a transaction
> created, proceed, and then end. That is, how do I know which
> transaction was started and which one was committed?

You need to add more identification info to your log_line_prefix. The PID would be the most reliable way to tie those entries together, but I think there's also an option that writes the transaction ID.

> I'm kind of confused by lines like this:
> 2008-11-03 16:29:22 CST DEBUG: 00000: name: unnamed; blockState: INPROGRESS; state: INPROGR, xid/subid/cid: 678145/1/4, nestlvl: 1, children: 678146 678147
> Is there an easy explanation somewhere?

You'd have to look at the source code to figure out most of the DEBUG-level messages.

regards, tom lane
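[Editorial note: Tom's suggestion might look like this in postgresql.conf. The escapes are documented log_line_prefix placeholders: %m is a millisecond timestamp, %p the backend PID, and %x the transaction ID (%x is available in 8.3 and later):]

```
# postgresql.conf sketch: prefix each log line with a timestamp,
# the backend PID, and the transaction ID so that the
# StartTransactionCommand / CommitTransactionCommand entries from
# different backends can be tied together.
log_line_prefix = '%m [pid=%p xid=%x] '
log_statement = 'all'
```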
Jonathan Guthrie wrote:
> The thing is, the C++ code does this
>
> BEGIN transaction 1
> INSERT project
> COMMIT
>
> BEGIN transaction 2
> SET permissions
> COMMIT
>
> or, at least, it's supposed to.

OK, and we know that if it is doing what it is supposed to, transaction 2 /must/ see the changes made by transaction 1. Either it's not doing what it's supposed to, or you've hit a mind-bogglingly huge bug in PostgreSQL that's never been noticed before. I'm inclined to suspect the former.

Tom Lane pointed out where to go from here re the server logging. It'll be interesting to see what turns up once you've got new logs that show the backend PID and the involved xid.

Sorry about hammering on the point re transaction interleaving and so on. You do see people here who've misunderstood how MVCC visibility works, and I wasn't sure from your mail that your transactions didn't overlap.

-- 
Craig Ringer
On Wed, 2008-11-05 at 12:14 +0900, Craig Ringer wrote:
> Jonathan Guthrie wrote:
>
> > The thing is, the C++ code does this
> >
> > BEGIN transaction 1
> > INSERT project
> > COMMIT
> >
> > BEGIN transaction 2
> > SET permissions
> > COMMIT
> >
> > or, at least, it's supposed to.
>
> OK, and we know that if it is doing what it is supposed to, transaction
> 2 /must/ see the changes made by transaction 1. Either it's not doing
> what it's supposed to, or you've hit a mind bogglingly huge bug in
> PostgreSQL that's never been noticed before. I'm inclined to suspect the
> former.

As am I. In fact, I'm rather counting on it, and that's the reason I posted my question to pgsql-general rather than bugs. As an aside, I prefer any problems I run into to be in my code, because I can most readily fix those.

> Tom Lane pointed out where to go from here re the server logging. It'll
> be interesting to see what turns up once you've got new logs that show
> the backend pid and the involved xid.

I'll be keeping Mr. Lane's message for the logging tips. For my own part, since people kept making a strong distinction between the READ COMMITTED transaction isolation level and the SERIALIZABLE level, I concluded that one likely explanation is that the transaction isolation level was not what I thought it was, and put an explicit "ISOLATION LEVEL READ COMMITTED" in the code that issues the BEGIN.

The problem has since gone away. Of course, with a problem that isn't perfectly understood one must guard carefully against the idea that you've solved a problem that simply doesn't recur, or that the solution is certain to be what you thought it was. I'm thinking that there may have been some weird interaction between the business logic that's built in to the executable and the database layer code that lives in a shared object file. Perhaps it wasn't the code I changed that did it, but the fact that I had to recompile to apply the change.
> Sorry about hammering on the point re transaction interleaving and so
> on. You do see people here who've misunderstood how MVCC visibility
> works, and I wasn't sure from your mail that your transactions didn't
> overlap.

There's nothing to be sorry for. I can now see several deficiencies in my original message that hindered rather than helped the communication. The only thing I can say in my own defense is that it appears likely that if I had known what sort of things I needed to be certain of, and certain to mention, in my initial message, then I wouldn't have had to write the message in the first place.

In any case, my thanks to all of you for your attention and time. I am profoundly grateful for it.
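[Editorial note: for reference, the explicit isolation level Jonathan added can be spelled either per transaction or as a session default, e.g. in connection-pool setup code:]

```sql
-- Per transaction:
BEGIN ISOLATION LEVEL READ COMMITTED;
-- ... work ...
COMMIT;

-- Or as a session-wide default for the connection:
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED;
```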
Jonathan Guthrie wrote:
>> Tom Lane pointed out where to go from here re the server logging. It'll
>> be interesting to see what turns up once you've got new logs that show
>> the backend pid and the involved xid.
>
> I'll be keeping Mr. Lane's message for the logging tips. For my own
> part, since people kept making a strong distinction between the READ
> COMMITTED transaction isolation level as opposed to the SERIALIZABLE
> transaction level, I concluded that one likely explanation is that the
> transaction isolation level was not what I thought it was and put an
> explicit "ISOLATION LEVEL READ COMMITTED" in the code that issues the
> BEGIN.
>
> The problem has since gone away.

This VERY strongly suggests that your transactions do in fact overlap. You've probably got something like this:

Transaction 1                          Transaction 2

BEGIN;
SELECT schema_version FROM app_info;
SELECT create_the_object();
                                       BEGIN;
                                       SELECT schema_version FROM app_info;
COMMIT;
                                       SELECT set_permissions();
                                       COMMIT;

In READ COMMITTED isolation this would work fine. Transaction 2 would be able to see the object transaction 1 created, because transaction 1 had been committed before the statement "SELECT set_permissions()" was issued.

In SERIALIZABLE isolation, because the snapshot is frozen when the first database access occurs in the transaction (in this case "SELECT schema_version ..." but it could be anything), and that happens BEFORE transaction 1 commits, transaction 2 cannot see the work done by transaction 1 even though it's committed.

If you have a highly layered system with connection pools, etc., then it seems reasonably likely that you're doing something like "disabling autocommit" (which might BEGIN a transaction and issue a few setup statements) in the connection setup code for your connection pool. If this turns out to be the case you should be able to tell because you'll have connections in the `idle in transaction' state in the output of `select * from pg_stat_activity'.
These will mess with VACUUM's ability to clean out dead tuples, causing table and index bloat that'll slow your system down and waste disk space. In the SERIALIZABLE isolation level it'll also cause visibility problems that'll give you major headaches.

The best thing to do, though, is configure your logging as Tom Lane suggested and definitively confirm whether the transactions of interest do or do not overlap in time. Even if the problem appears to have gone away, there may be other consequences and it's something you really need to investigate.

If that does turn out to be the problem, you also need to look at why you're using the SERIALIZABLE isolation level by default. As per the (excellent and very strongly recommended, I cannot stress it enough) documentation, there are some downsides to using SERIALIZABLE isolation, like the need to be prepared to retry a transaction in case of serialization failure. For most operations READ COMMITTED is entirely safe; you just need to think about concurrency when writing your SQL and avoid "read-modify-write" code (e.g. SELECT value, add to value, UPDATE value).

If you're using an ORM layer then all bets are off, since they *love* to read-modify-write and there's not really any way around it. The ORM should provide optimistic locking to detect conflicts, allowing you to still use READ COMMITTED safely. If it doesn't, you're stuck with SERIALIZABLE and will have to be very, very careful with your connection pooling, transaction lifetimes, etc. Of course, you should be anyway, to be VACUUM friendly among other things.

> I'm thinking that there may have
> been some weird interaction between the business logic that's built in
> to the executable and the database layer code that lives in a shared
> object file. Perhaps it wasn't the code I changed that did it, but the
> fact that I had to recompile to apply the change.

Well, remove the change, recompile again, and see if the problem comes back.
If it does, you know you've got an issue in your code, and that's important to confirm. -- Craig Ringer
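[Editorial note: the pg_stat_activity check Craig mentions might look like the following. Column names are those of the 8.x series, where the query text column is current_query and an open-but-idle transaction shows up as a marker string:]

```sql
-- Find connections sitting idle inside an open transaction,
-- which pin old snapshots and block VACUUM from reclaiming dead tuples.
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction';
```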