Re: I'm no longer puzzled by a foreign key constraint problem - Mailing list pgsql-general

From Craig Ringer
Subject Re: I'm no longer puzzled by a foreign key constraint problem
Date
Msg-id 49138DEA.2080009@postnewspapers.com.au
Whole thread Raw
In response to Re: I'm no longer puzzled by a foreign key constraint problem  (Jonathan Guthrie <jguthrie@brokersys.com>)
List pgsql-general
Jonathan Guthrie wrote:

>> Tom Lane pointed out where to go from here re the server logging. It'll
>> be interesting to see what turns up once you've got new logs that show
>> the backend pid and the involved xid.
>
> I'll be keeping Mr. Lane's message for the logging tips.  For my own
> part, since people kept making a strong distinction between the READ
> COMMITTED transaction isolation level as opposed to the SERIALIZABLE
> transaction level, I concluded that one likely explanation is that the
> transaction isolation level was not what I thought it was and put an
> explicit "ISOLATION LEVEL READ COMMITTED" in the code that issues the
> BEGIN.
>
> The problem has since gone away.

This VERY strongly suggests that your transactions do in fact overlap.
You've probably got something like this:


Transaction 1                       Transaction 2
BEGIN;
SELECT schema_version FROM app_info;
SELECT create_the_object();
                                    BEGIN;
                                    SELECT schema_version FROM app_info;

COMMIT;
                                    SELECT set_permissions();
                                    COMMIT;


In READ COMMITTED isolation this would work fine. Transaction 2 would be
able to see the object transaction 1 created because transaction 1 had
been committed before the statement "SELECT set_permissions()" was issued.

In SERIALIZABLE isolation, because the snapshot is frozen when the first
database access occurs in the transaction (in this case "SELECT
schema_version ..." but it could be anything) and that happens BEFORE
transaction 1 commits, transaction 2 cannot see the work done by
transaction 1 even though it's committed.

If you have a highly layered system with connection pools, etc, then it
seems reasonably likely that you're doing something like "disabling
autocommit" (which might BEGIN a transaction and issue a few setup
statements) in the connection setup code for your connection pool.

If this turns out to be the case you should be able to tell because
you'll have connections in the `idle in transaction' state in the output
of `select * from pg_stat_activity'. These will mess with VACUUM's
ability to clean out dead tuples, causing table and index bloat that'll
slow your system down and waste disk space. In the SERIALIZABLE
isolation level it'll also cause visibility problems that'll give you
major headaches.

The best thing to do, though, is configure your logging as Tom Lane
suggested and definitively confirm whether the transactions of interest
do or do not overlap in time. Even if the problem appears to have gone
away, there may be other consequences and it's something you really need
to investigate.

If that does turn out to be the problem, you also need to look at why
you're using the SERIALIZABLE isolation level by default. As per the
(excellent and very strongly recommended, I cannot stress it enough)
documentation, there are some downsides to using SERIALIZABLE isolation,
like the need to be prepared to retry a transaction in case of
serialization failure. For most operations READ COMMITTED is entirely
safe; you just need to think about concurrency when writing your SQL and
avoid "read-modify-write" code (eg SELECT value, add to value, UPDATE
value).

If you're using an ORM layer then all bets are off, since they *love* to
read-modify-write and there's not really any way around it. The ORM
should provide optimistic locking to detect conflicts, allowing you to
still use READ COMMITTED safely. If it doesn't, you're stuck with
SERIALIZABLE and will have to be very, very careful with your connection
pooling, transaction lifetimes, etc. Of course, you should be anyway to
be VACUUM friendly among other things.

> I'm thinking that there may have
> been some weird interaction between the business logic that's built in
> to the executable and the database layer code that lives in a shared
> object file.  Perhaps it wasn't the code I changed that did it, but the
> fact that I had to recompile to apply the change.

Well, remove the change, recompile again, and see if the problem comes
back. If it does, you know you've got an issue in your code, and that's
important to confirm.


--
Craig Ringer

pgsql-general by date:

Previous
From: Alan Hodgson
Date:
Subject: Re: serial data type usage
Next
From: Berend Tober
Date:
Subject: Re: serial data type usage