Thread: Scope of constraint names
SQL92 requires named constraints to have names that are unique within their schema. Our past implementation did not require constraint names to be unique at all; as a compromise I suggested requiring constraint names to be unique for any given relation. Rod Taylor's pending pg_constraint patch implements that approach, but I'm beginning to have second thoughts about it. One problem I see is that pg_constraint entries can *only* be associated with relations; so the table has no way to represent constraints associated with domains --- not to mention assertions, which aren't associated with any table at all. I'm in no hurry to try to implement assertions, but domain constraints are definitely interesting. We'd probably have to put domain constraints into a separate table, which is possible but not very attractive. At the SQL level, constraint names seem to be used in only two contexts: DROP CONSTRAINT subcommands of ALTER TABLE and ALTER DOMAIN commands, and SET CONSTRAINTS ... IMMEDIATE/DEFERRED. In the DROP context there's no real need to identify constraints globally, since the associated table or domain name is available, but in SET CONSTRAINTS the syntax doesn't include a table name. Our current implementation of SET CONSTRAINTS changes the behavior of all constraints matching the specified name, which is pretty bogus given the lack of uniqueness. If we don't go over to the SQL92 approach then I think we need some other way of handling SET CONSTRAINTS that allows a more exact specification of the target constraint. A considerable advantage of per-relation constraint names is that a new unique name can be assigned for a nameless constraint while holding only a lock on the target relation. We'd need a global lock to create unique constraint names in the SQL92 semantics. The only way I can see around that would be to use newoid(), or perhaps a dedicated sequence generator, to construct constraint names. The resulting unpredictable constraint names would be horribly messy to deal with in the regression tests, so I'm not eager to do this. Even per-relation uniqueness has some unhappiness: if you have a domain with a named constraint, and you try to use this domain for two columns of a relation, you'll get a constraint name conflict. Inheriting similar constraint names from two different parent relations is also troublesome. We could get around these either by going back to the old no-uniqueness approach, or by being willing to alter constraint names to make them unique (eg, by tacking on "_nnn" when needed). But this doesn't help SET CONSTRAINTS. At the moment I don't much like any of the alternatives. Ideas anyone? regards, tom lane
> One problem I see is that pg_constraint entries can *only* be associated > with relations; so the table has no way to represent constraints > associated with domains --- not to mention assertions, which aren't > associated with any table at all. I'm in no hurry to try to implement > assertions, but domain constraints are definitely interesting. We'd > probably have to put domain constraints into a separate table, which > is possible but not very attractive. Hmmm...there must be some sort of schema that can do both in one table? Even something nastly like: refid Oid of relation or domain type 'r' for relation and 'd' for domain ... > Our current implementation of SET CONSTRAINTS changes the behavior of > all constraints matching the specified name, which is pretty bogus > given the lack of uniqueness. If we don't go over to the SQL92 approach > then I think we need some other way of handling SET CONSTRAINTS that > allows a more exact specification of the target constraint. If we do go over to SQL92, what kind of problems will people have reloading their old schema? Should <unnamed> be excluded from the uniqueness check...? > A considerable advantage of per-relation constraint names is that a new > unique name can be assigned for a nameless constraint while holding only > a lock on the target relation. We'd need a global lock to create unique > constraint names in the SQL92 semantics. Surely adding a foreign key is what you'd call a 'rare' event in a database, occurring once once for millions or queries? Hence, we shouldn't worry about it too much? > The only way I can see around > that would be to use newoid(), or perhaps a dedicated sequence > generator, to construct constraint names. The resulting unpredictable > constraint names would be horribly messy to deal with in the regression > tests, so I'm not eager to do this. Surely you do the ol' loop and test sort of thing...? > Even per-relation uniqueness has some unhappiness: if you have a domain > with a named constraint, and you try to use this domain for two columns > of a relation, you'll get a constraint name conflict. Inheriting > similar constraint names from two different parent relations is also > troublesome. We could get around these either by going back to the > old no-uniqueness approach, or by being willing to alter constraint > names to make them unique (eg, by tacking on "_nnn" when needed). > But this doesn't help SET CONSTRAINTS. > > At the moment I don't much like any of the alternatives. Ideas anyone? If they're both equally evil, then maybe we should consider going the SQL92 way, for compatibilities sake? Chris
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: >> A considerable advantage of per-relation constraint names is that a new >> unique name can be assigned for a nameless constraint while holding only >> a lock on the target relation. We'd need a global lock to create unique >> constraint names in the SQL92 semantics. > Surely adding a foreign key is what you'd call a 'rare' event in a database, > occurring once once for millions or queries? Hence, we shouldn't worry > about it too much? I don't buy that argument even for foreign keys --- and remember that pg_constraint will also hold entries for CHECK, UNIQUE, and PRIMARY KEY constraints. I don't want to have to take a global lock whenever we create an index. >> The only way I can see around >> that would be to use newoid(), or perhaps a dedicated sequence >> generator, to construct constraint names. The resulting unpredictable >> constraint names would be horribly messy to deal with in the regression >> tests, so I'm not eager to do this. > Surely you do the ol' loop and test sort of thing...? How is a static 'expected' file going to do loop-and-test? One possible answer to that is to report all unnamed constraints as "<unnamed>" in error messages, even though they'd have distinct names internally. I don't much care for that approach though, since it might make it hard for users to figure out which internal name to mention in DROP CONSTRAINT. But it'd keep the expected regression output stable. > If they're both equally evil, then maybe we should consider going the SQL92 > way, for compatibilities sake? If the spec didn't seem so brain-damaged on this point, I'd be more eager to follow it. I can't see any advantage in the way they chose to do it. But yeah, I'd lean to following the spec, if we can think of a way around the locking and regression testing issues it creates. regards, tom lane
> I don't buy that argument even for foreign keys --- and remember that > pg_constraint will also hold entries for CHECK, UNIQUE, and PRIMARY KEY > constraints. I don't want to have to take a global lock whenever we > create an index. I don't understand why a global lock is necessary -- and not simply a lock on the pg_constraint table and the relations the constraint is applied to (foreign key locks two, all others one).
Rod Taylor <rbt@zort.ca> writes: >> I don't want to have to take a global lock whenever we >> create an index. > I don't understand why a global lock is necessary -- To be sure we are creating a unique constraint name. > and not simply a lock on the pg_constraint table In this context, a lock on pg_constraint *is* global, because it will mean that no one else can be creating an index on some other table. They'd need to hold that same lock to ensure that *their* chosen constraint name is unique. regards, tom lane
> > and not simply a lock on the pg_constraint table > > In this context, a lock on pg_constraint *is* global, because it will > mean that no one else can be creating an index on some other table. > They'd need to hold that same lock to ensure that *their* chosen > constraint name is unique. So I am understanding correctly. I think it would be a rare event to have more than one person changing the database structure at the same time. Anyway, the index example is a bad example isn't it? It already takes an lock on pg_class which is just as global. Check constraints and foreign key constraints are two that I can see affected in the manner described. Anyway, my current implementation has constraint names unique to the relation only -- not the namespace, although my locking may be excessive in that area.
Rod Taylor <rbt@zort.ca> writes: > I think it would be a rare event to have more than one person changing > the database structure at the same time. I don't buy this assumption --- consider for example two clients creating temp tables. > Anyway, the index example is a > bad example isn't it? It already takes an lock on pg_class which is > just as global. Au contraire; there is no exclusive lock needed at present. regards, tom lane
> > Anyway, the index example is a > > bad example isn't it? It already takes an lock on pg_class which is > > just as global. > > Au contraire; there is no exclusive lock needed at present. Oh.. I thought pg_class was locked for any relation creation. If thats not the case, then I wouldn't want constraints to impose additional limitations. Misunderstanding on my part.
Rod Taylor <rbt@zort.ca> writes: > bad example isn't it? It already takes an lock on pg_class which is > just as global. >> >> Au contraire; there is no exclusive lock needed at present. > Oh.. I thought pg_class was locked for any relation creation. We only take a standard writer's lock (RowExclusiveLock) on system catalogs that we modify. This is mainly to avoid problems from a concurrent VACUUM FULL moving tuples around --- it doesn't prevent other processes from inserting/updating/deleting other rows in the same catalog. Most of the interesting locking done for DDL operations is done on the relation being defined or modified, not on the system catalogs. So we have full concurrency in terms of being able to do DDL operations on different relations at the same time. That's what I don't want to give up. If we want to go over to SQL-spec-compatible naming of constraints (ie, unique within schemas), I think the only way to cope is to generate unique names for nameless constraints using the OID counter --- so they'd really be, say, "$8458321" --- but when reporting constraint violation errors, substitute "<unnamed>" or some other fixed string in the error report for any constraint having a name of this form. Doing the latter would keep the regression test expected outputs stable. Using the OID counter (or a sequence generator) would avoid the locking problem. Another thing we need to think about in any case is coping with constraints that are inherited from a parent table or from a domain that's used as one or more columns' datatype. Whether we think constraints should have per-schema or per-relation names, we lose anyway if we make multiple copies of such a constraint. Perhaps we need to distinguish "original" constraints (which could reasonably be expected to have schema-wide-unique names) from "derived" constraints, which are the check expressions we actually attach to individual columns. I think we'd want to store the derived versions explicitly (eg, with column numbers in Vars adjusted to match the column that's supposed to be tested in a given table), but they'd not be expected to have unique names. This leads to the idea that a pg_constraint entry needs two name fields: its real name, which is unique but might just be an autogenerated "$nnn", and its logical name that we actually report in constraint violation messages. The logical name would be inherited from the "original" constraint when building a "derived" constraint. We'd also set up pg_depend dependencies to link the derived constraints to their original. This would be needed to make ALTER DOMAIN DROP CONSTRAINT work. Thoughts anyone? regards, tom lane
Tom Lane writes: > A considerable advantage of per-relation constraint names is that a new > unique name can be assigned for a nameless constraint while holding only > a lock on the target relation. We'd need a global lock to create unique > constraint names in the SQL92 semantics. Presumably, the field pg_class.relchecks already keeps a count of the number of constraints, so it should be possible to assign numbers easily. > The only way I can see around that would be to use newoid(), or perhaps > a dedicated sequence generator, to construct constraint names. The > resulting unpredictable constraint names would be horribly messy to deal > with in the regression tests, so I'm not eager to do this. Or we simply assign constraint names explicitly in the regression tests. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane writes: >> A considerable advantage of per-relation constraint names is that a new >> unique name can be assigned for a nameless constraint while holding only >> a lock on the target relation. We'd need a global lock to create unique >> constraint names in the SQL92 semantics. > Presumably, the field pg_class.relchecks already keeps a count of the > number of constraints, so it should be possible to assign numbers easily. But pg_class.relchecks is per-relation --- how does it help you assign a globally unique number? After much thought I am coming around to the conclusion that we should name constraints within-schemas (ie, there will be a schema OID column in pg_constraint), but *not require these names to be unique*. DROP CONSTRAINT, SET CONSTRAINTS, etc will act on all constraints matching the target name, as they do now. This will create the minimum risk of breaking existing database schemas, while still allowing us to move some of the way towards SQL compliance --- in particular, SET CONSTRAINTS with a schema-qualified constraint name would work as the spec expects. We would still take care to generate unique-within-a-relation names for nameless constraints, using the same code that exists now, but we'd not enforce this by means of a unique index on pg_constraint. A compromise between that and exact SQL semantics would be to enforce uniqueness of conname + connamespace + conrelid + contypid (the last being a column that links to pg_type for domain constraints; conrelid and contypid are each zero if not relevant). This would have the effect of making relation constraint names unique per-relation, and domain constraint names separately unique per-domain, and also allowing global assertion names that are unique per-schema as in SQL92. This seems a little baroque to me, but maybe it will appeal to others. Comments? regards, tom lane