Thread: Transactional DDL, but not Serializable
Greetings,

We have a curious situation, consider this:

Process 1:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
CREATE TABLE table1 (i integer);
INSERT INTO table1 VALUES (13);

Process 2:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
CREATE TABLE table2 (i integer);
INSERT INTO table2 VALUES (123);
COMMIT;

Process 1:
SELECT * FROM pg_class WHERE relname = 'table2';
-- no rows returned
SELECT * FROM table2;
-- works?! but at least no records returned
INSERT INTO table2 VALUES (456);
-- also works.. now we have a tuple in the table which appears to
-- have been added before the table existed..
COMMIT;

This happens, of course, because we use SysCache to look up table names
to Oids and that uses SnapshotNow.  In my view, this violates the basic
principle of least surprise and means that while we have transactional
DDL, it's not really serializable (no, I don't particularly care about
that).

What I do worry about is that the bulk load discussion going on could be
shot down because of this.  We won't let the earlier transaction see any
records in the table today because those tuples have a later xmin, but
if we were to insert those tuples with the frozen XID (as I proposed in
the other thread) then they'd be visible.

I don't believe fixing this would be terribly difficult and, I believe,
would be similar to what we've done elsewhere (e.g. with indexes):
basically, add a column to pg_class with the 'createdxmin' and then
compare that against our transaction whenever we're doing table lookups.

Thoughts?

	Thanks,

		Stephen
Stephen Frost <sfrost@snowman.net> writes:
> I don't believe fixing this would be terribly difficult and, I
> believe, would be similar to what we've done elsewhere (e.g. with
> indexes): basically, add a column to pg_class with the 'createdxmin'
> and then compare that against our transaction whenever we're doing
> table lookups.

Making DDL serializable is *not* simple, and half-baked hacks won't
make that situation better ...

			regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Making DDL serializable is *not* simple, and half-baked hacks won't
> make that situation better ...

Sorry, that obviously didn't come across clearly (I think I've just
been talking to Kevin far too much).

I'm not interested in making them serializable.  I'd like to not have
tables randomly appear during a serializable transaction.

	Thanks,

		Stephen
> Making DDL serializable is *not* simple, and half-baked hacks won't
> make that situation better ...

That seemed unnecessary.  Whether or not you approve of Stephen's
solution, he is dealing with a real issue.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco
* Joshua Berkus (josh@agliodbs.com) wrote:
> That seemed unnecessary.  Whether or not you approve of Stephen's
> solution, he is dealing with a real issue.

The solution seemed, to me at least, closely parallel to an index's
indcheckxmin.  We've dealt with this issue there and have a precedent
for how to deal with it.  Based on discussions with other folks it
sounds like we may be forced to do it for constraints also, and I think
we'd want to try to deal with all of them in a similar way.

Perhaps the current solution for indexes is a hack and should be tossed
out with a wholesale replacement which solves all these problems, which
would certainly be quite a bit of work, but if that's necessary then
let's discuss it and get an idea down on a wiki somewhere about what
that should look like.

	Thanks,

		Stephen
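[For reference, the index precedent mentioned here is pg_index.indcheckxmin: a flag indicating that the index's pg_index row xmin must be compared against the transaction's snapshot before the index may be used.  A minimal way to see it; note the proposed pg_class 'createdxmin' column is hypothetical and does not exist:]

```sql
-- List indexes whose catalog-row visibility must be checked against
-- the snapshot before the planner may use them:
SELECT indexrelid::regclass AS index_name, indcheckxmin
FROM pg_index
WHERE indcheckxmin;

-- The analogous (hypothetical) table lookup would filter out relations
-- whose creating transaction is invisible to our snapshot, roughly:
--   SELECT oid FROM pg_class
--   WHERE relname = 'table2'
--     AND <createdxmin visible to our snapshot>;
```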
Stephen Frost <sfrost@snowman.net> writes:
> Sorry, that obviously didn't come across clearly (I think I've just
> been talking to Kevin far too much).

> I'm not interested in making them serializable.  I'd like to not have
> tables randomly appear during a serializable transaction.

Well, basically, you can't have that.  Example: you have an existing
table with primary key, and while you're in the middle of doing some
long transaction, somebody else creates a table with a foreign-key
reference to the one you're about to do a delete from.  Being
serializable does not excuse you from the obligation to check for
FK violations in that "invisible" table.  It might be acceptable to
fail entirely, but not to act as though the table isn't there.

			regards, tom lane
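[A minimal session sketching the foreign-key scenario above; all table and column names are invented for illustration:]

```sql
-- Session A: a long-running serializable transaction
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM parent;   -- snapshot taken; 'child' doesn't exist yet

-- Session B, concurrently:
--   CREATE TABLE child (parent_id integer REFERENCES parent (id));
--   INSERT INTO child (parent_id) VALUES (1);
--   COMMIT;

-- Back in session A: 'child' is invisible to our snapshot, but per the
-- argument above the DELETE must still fail with a foreign-key
-- violation, because a committed row in 'child' references id = 1.
DELETE FROM parent WHERE id = 1;
COMMIT;
```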
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Well, basically, you can't have that.  Example: you have an existing
> table with primary key, and while you're in the middle of doing some
> long transaction, somebody else creates a table with a foreign-key
> reference to the one you're about to do a delete from.  Being
> serializable does not excuse you from the obligation to check for
> FK violations in that "invisible" table.  It might be acceptable to
> fail entirely, but not to act as though the table isn't there.

That's an excellent example and point.  Is there a 'right' answer (with
regard to the SQL spec, what other databases do, etc)?  When you go to
delete a record from the existing table you could get a FK violation
due to the invisible table, which could end up being rolled back and
removed.  It seems like the semantics around this would call for the
adding-FK transaction to be treated as if the table did already exist,
and then to handle this case as we would if there wasn't any DDL
involved.  Another approach might be to wait till commit to check the
FK, but that'd probably be considered unkind.

If the spec doesn't dictate anything and/or we can't find anyone else's
semantics that make sense, I suppose we'll need to define our own.  To
that end, perhaps we should put up something on a wiki or similar to
start capturing these cases and considering what the 'right' answer
would be.

Apologies for my ignorance on this.

	Thanks,

		Stephen
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Well, basically, you can't have that.  Example: you have an existing
>> table with primary key, and while you're in the middle of doing some
>> long transaction, somebody else creates a table with a foreign-key
>> reference to the one you're about to do a delete from.  Being
>> serializable does not excuse you from the obligation to check for
>> FK violations in that "invisible" table.  It might be acceptable to
>> fail entirely, but not to act as though the table isn't there.

> That's an excellent example and point.  Is there a 'right' answer
> (with regard to the SQL spec, what other databases do, etc)?

I'm not aware that anybody's got an amazingly satisfactory solution.
PG's answer is of course to use up-to-the-minute DDL regardless of what
the transaction might see for other purposes, which certainly has got
disadvantages if you're hoping for truly serializable behavior.  But
I'm not sure there's a better answer.  You could make an argument for
failing any serializable transaction that's affected by DDL changes
that happen after it started.  I don't know whether that cure would be
worse than the disease.

			regards, tom lane
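[The "fail the transaction" alternative would presumably surface much like today's serialization failures.  A sketch of that hypothetical behavior, continuing the session from the first message; the error message below is invented for illustration and is not an existing PostgreSQL message:]

```sql
-- Process 1, hypothetically, under a "fail on concurrent DDL" rule:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT 1;                          -- snapshot taken
-- (Process 2 creates and commits table2 here, as in the original
--  example at the top of the thread.)
SELECT * FROM table2;
-- ERROR:  could not serialize access due to concurrent DDL
-- (invented message, modeled on SQLSTATE 40001 serialization failures)
```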
Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>>> Well, basically, you can't have that.  Example: you have an existing
>>> table with primary key, and while you're in the middle of doing some
>>> long transaction, somebody else creates a table with a foreign-key
>>> reference to the one you're about to do a delete from.  Being
>>> serializable does not excuse you from the obligation to check for
>>> FK violations in that "invisible" table.  It might be acceptable to
>>> fail entirely, but not to act as though the table isn't there.
>
>> That's an excellent example and point.  Is there a 'right' answer
>> (with regard to the SQL spec, what other databases do, etc)?
>
> I'm not aware that anybody's got an amazingly satisfactory solution.
> PG's answer is of course to use up-to-the-minute DDL regardless of
> what the transaction might see for other purposes, which certainly
> has got disadvantages if you're hoping for truly serializable
> behavior.  But I'm not sure there's a better answer.  You could make
> an argument for failing any serializable transaction that's affected
> by DDL changes that happen after it started.  I don't know whether
> that cure would be worse than the disease.

If transaction A commits successfully before transaction B commits,
regardless of when transaction B started, and transaction A
changes/adds/etc any constraints on the database, then I would expect
transaction B to only commit successfully if all of its data changes
pass those new/changed constraints.

If B were allowed to commit without that being the case, then it would
leave the database in an inconsistent state; that is, a state where its
data doesn't conform to its constraints.  A database should always be
consistent on transaction boundaries, at the very least, if not on
statement boundaries.

Whether B's failure happens when it tries to commit, or earlier based
on visibility of A's changes, doesn't matter to me so much (do what
works best for you/others), but it should fail at some point if it
would otherwise cause inconsistencies.

-- Darren Duncan
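[A sketch of the constraint scenario described above; names are invented, and note that in current PostgreSQL the race is actually prevented by locking rather than by a commit-time re-check:]

```sql
-- Transaction B (serializable) starts and stages some data:
--   BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
--   INSERT INTO t (i) VALUES (-1);

-- Transaction A then tries to add a constraint the staged row violates:
--   ALTER TABLE t ADD CONSTRAINT i_positive CHECK (i > 0);
-- (In practice A blocks here until B finishes, because ALTER TABLE
--  takes an ACCESS EXCLUSIVE lock that conflicts with B's insert lock.)

-- If A could commit first, then under the rule proposed above B's
--   COMMIT;
-- would have to fail rather than leave t containing a row that
-- violates i_positive.
```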