Transactional DDL, but not Serializable

From: Stephen Frost
Greetings,
 We have a curious situation, consider this:
 Process 1:

    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    CREATE TABLE table1 (i integer);
    INSERT INTO table1 VALUES (13);

 Process 2:

    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    CREATE TABLE table2 (i integer);
    INSERT INTO table2 VALUES (123);
    COMMIT;

 Process 1:

    SELECT * FROM pg_class WHERE relname = 'table2';  -- no rows returned
    SELECT * FROM table2;             -- works?!  but at least no records returned
    INSERT INTO table2 VALUES (456);  -- also works..  now we have a tuple in the
                                      -- table which appears to have been added
                                      -- before the table existed..
    COMMIT;
 This happens, of course, because we use SysCache to look up table names
to Oids and that uses SnapshotNow.  In my view, this violates the basic
principle of least surprise and means that while we have transactional
DDL, it's not really serializable (no, I don't particularly care about
that).
 What I do worry about is that the bulk load discussion going on could be
shot down because of this.  We won't let the earlier transaction see any
records in the table today because those tuples have a later xmin, but if
we were to insert those tuples with the frozen XID (as I proposed in the
other thread) then they'd be visible.
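
 For reference, the xmin that drives that visibility test is exposed as a
system column and can be inspected directly:

    -- After Process 2 commits, table2's tuple carries Process 2's xid:
    SELECT xmin, i FROM table2;
    -- The earlier transaction's snapshot rejects that xmin today; a
    -- frozen XID would instead pass every snapshot check.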
 I don't believe fixing this would be terribly difficult and, I believe,
would be similar to what we've done elsewhere (e.g. with indexes):
basically, add a 'createdxmin' column to pg_class and then compare that
against our transaction whenever we're doing table lookups.
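
 As a rough sketch of the information such a check would consult (note
that 'createdxmin' is only a proposed column; what exists today is the
system xmin on the catalog row itself, which records the creating
transaction):

    -- The pg_class row for table2 already records its creating xid:
    SELECT relname, xmin FROM pg_class WHERE relname = 'table2';
    -- The proposal amounts to testing that xid against our snapshot
    -- during name lookup, rather than using SnapshotNow.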
 Thoughts?
     Thanks,
    Stephen

Re: Transactional DDL, but not Serializable

From: Tom Lane
Stephen Frost <sfrost@snowman.net> writes:
>   I don't believe fixing this would be terribly difficult and, I
>   believe, would be similar to what we've done elsewhere (e.g. with
>   indexes): basically, add a 'createdxmin' column to pg_class
>   and then compare that against our transaction whenever we're doing
>   table lookups.

Making DDL serializable is *not* simple, and half-baked hacks won't
make that situation better ...
        regards, tom lane


Re: Transactional DDL, but not Serializable

From: Stephen Frost
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Making DDL serializable is *not* simple, and half-baked hacks won't
> make that situation better ...

Sorry, that obviously didn't come across clearly (I think I've just been
talking to Kevin far too much).

I'm not interested in making them serializable.  I'd like to not have
tables randomly appear during a serializable transaction.
Thanks,
    Stephen

Re: Transactional DDL, but not Serializable

From: Joshua Berkus
> Making DDL serializable is *not* simple, and half-baked hacks won't
> make that situation better ...

That seemed unnecessary.  Whether or not you approve of Stephen's solution, he is dealing with a real issue.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco


Re: Transactional DDL, but not Serializable

From: Stephen Frost
* Joshua Berkus (josh@agliodbs.com) wrote:
> That seemed unnecessary.  Whether or not you approve of Stephen's solution, he is dealing with a real issue.

The solution seemed, to me at least, to have a lot of parallels to an
index's indcheckxmin.  We've dealt with this issue there and have a
precedent for how to deal with it.  Based on discussions with other
folks it sounds like we may be forced to do it for constraints also, and
I think we'd want to try to deal with all of them in a similar way.
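
For anyone following along, indcheckxmin is an actual flag in pg_index:
when set, a query will only use the index once the pg_index row's xmin
is old enough to be visible to that query's snapshot.

    -- List indexes whose usability is gated on their pg_index row's xmin:
    SELECT indexrelid::regclass AS index_name, xmin
      FROM pg_index
     WHERE indcheckxmin;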

Perhaps the current solution for indexes is a hack and should be tossed
out with a wholesale replacement that solves all these problems, which
would certainly be quite a bit of work, but if that's necessary then
let's discuss it and get an idea down on a wiki somewhere about what
that should look like.
Thanks,
    Stephen

Re: Transactional DDL, but not Serializable

From: Tom Lane
Stephen Frost <sfrost@snowman.net> writes:
> Sorry, that obviously didn't come across clearly (I think I've just been
> talking to Kevin far too much).

> I'm not interested in making them serializable.  I'd like to not have
> tables randomly appear during a serializable transaction.

Well, basically, you can't have that.  Example: you have an existing
table with primary key, and while you're in the middle of doing some
long transaction, somebody else creates a table with a foreign-key
reference to the one you're about to do a delete from.  Being
serializable does not excuse you from the obligation to check for
FK violations in that "invisible" table.  It might be acceptable to
fail entirely, but not to act as though the table isn't there.
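
A sketch of that scenario, with illustrative names:

    -- Pre-existing:
    CREATE TABLE parent (id integer PRIMARY KEY);
    INSERT INTO parent VALUES (1);

    -- Session 1:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT count(*) FROM parent;      -- snapshot taken; long work begins

    -- Session 2, concurrently:
    CREATE TABLE child (parent_id integer REFERENCES parent (id));
    INSERT INTO child VALUES (1);

    -- Session 1, continued:
    DELETE FROM parent WHERE id = 1;  -- must fail the FK check against
                                      -- "child", even though "child" is
                                      -- invisible to this snapshot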
        regards, tom lane


Re: Transactional DDL, but not Serializable

From: Stephen Frost
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Well, basically, you can't have that.  Example: you have an existing
> table with primary key, and while you're in the middle of doing some
> long transaction, somebody else creates a table with a foreign-key
> reference to the one you're about to do a delete from.  Being
> serializable does not excuse you from the obligation to check for
> FK violations in that "invisible" table.  It might be acceptable to
> fail entirely, but not to act as though the table isn't there.

That's an excellent example and point.  Is there a 'right' answer (with
regard to the SQL spec, what other databases do, etc)?  When you go to
delete a record from the existing table you could get a FK violation due
to the invisible table, which could end up being rolled back and
removed.

It seems like the semantics around this would call for the adding-FK
transaction to be treated as if the table already existed, and then to
handle this case as we would if there wasn't any DDL involved.  Another
approach might be to wait until commit to check the FK, but that'd
probably be considered unkind.

If the spec doesn't dictate anything and/or we can't find anyone else's
semantics that make sense, I suppose we'll need to define our own.  To
that end, perhaps we should put up something on a wiki or similar to
start capturing these and considering what the 'right' answer would be.

Apologies for my ignorance on this.
Thanks,
    Stephen

Re: Transactional DDL, but not Serializable

From: Tom Lane
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Well, basically, you can't have that.  Example: you have an existing
>> table with primary key, and while you're in the middle of doing some
>> long transaction, somebody else creates a table with a foreign-key
>> reference to the one you're about to do a delete from.  Being
>> serializable does not excuse you from the obligation to check for
>> FK violations in that "invisible" table.  It might be acceptable to
>> fail entirely, but not to act as though the table isn't there.

> That's an excellent example and point.  Is there a 'right' answer (with
> regard to the SQL spec, what other databases do, etc)?

I'm not aware that anybody's got an amazingly satisfactory solution.
PG's answer is of course to use up-to-the-minute DDL regardless of what
the transaction might see for other purposes, which certainly has got
disadvantages if you're hoping for truly serializable behavior.  But I'm
not sure there's a better answer.  You could make an argument for
failing any serializable transaction that's affected by DDL changes that
happen after it started.  I don't know whether that cure would be worse
than the disease.
        regards, tom lane


Re: Transactional DDL, but not Serializable

From: Darren Duncan
Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>>> Well, basically, you can't have that.  Example: you have an existing
>>> table with primary key, and while you're in the middle of doing some
>>> long transaction, somebody else creates a table with a foreign-key
>>> reference to the one you're about to do a delete from.  Being
>>> serializable does not excuse you from the obligation to check for
>>> FK violations in that "invisible" table.  It might be acceptable to
>>> fail entirely, but not to act as though the table isn't there.
> 
>> That's an excellent example and point.  Is there a 'right' answer (with
>> regard to the SQL spec, what other databases do, etc)?
> 
> I'm not aware that anybody's got an amazingly satisfactory solution.
> PG's answer is of course to use up-to-the-minute DDL regardless of what
> the transaction might see for other purposes, which certainly has got
> disadvantages if you're hoping for truly serializable behavior.  But I'm
> not sure there's a better answer.  You could make an argument for
> failing any serializable transaction that's affected by DDL changes that
> happen after it started.  I don't know whether that cure would be worse
> than the disease.

If transaction A commits successfully before transaction B commits, regardless 
of when transaction B started, and transaction A changes or adds any 
constraints on the database, then I would expect transaction B to commit 
successfully only if all of its data changes pass those new or changed 
constraints.

If B were allowed to commit without that being the case, then it would leave the 
database in an inconsistent state, that is, a state where its data doesn't 
conform to its constraints.  A database should always be consistent on 
transaction boundaries, at the very least, if not on statement boundaries.

Whether B's failure happens when it tries to commit or happens earlier, 
based on visibility of A's changes, doesn't matter to me so much (do 
what works best for you/others), but it should fail at some point if it would 
otherwise cause inconsistencies.
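
A sketch of the ordering I have in mind, using a hypothetical table t 
(this shows the outcome the rule requires, not PostgreSQL's actual 
locking behavior, which would block the ALTER until B finishes):

    CREATE TABLE t (i integer);

    -- Transaction B starts first:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    INSERT INTO t VALUES (-1);

    -- Transaction A commits a new constraint while B is still open:
    BEGIN;
    ALTER TABLE t ADD CONSTRAINT t_i_positive CHECK (i > 0);
    COMMIT;

    -- Transaction B:
    COMMIT;  -- under the rule above, this must fail: B's row violates
             -- the constraint that A committed first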

-- Darren Duncan