Thread: Scope of constraint names

Scope of constraint names

From
Tom Lane
Date:
SQL92 requires named constraints to have names that are unique within
their schema.  Our past implementation did not require constraint names
to be unique at all; as a compromise I suggested requiring constraint
names to be unique for any given relation.  Rod Taylor's pending
pg_constraint patch implements that approach, but I'm beginning to have
second thoughts about it.

One problem I see is that pg_constraint entries can *only* be associated
with relations; so the table has no way to represent constraints
associated with domains --- not to mention assertions, which aren't
associated with any table at all.  I'm in no hurry to try to implement
assertions, but domain constraints are definitely interesting.  We'd
probably have to put domain constraints into a separate table, which
is possible but not very attractive.

At the SQL level, constraint names seem to be used in only two
contexts: DROP CONSTRAINT subcommands of ALTER TABLE and ALTER DOMAIN
commands, and SET CONSTRAINTS ... IMMEDIATE/DEFERRED.  In the DROP
context there's no real need to identify constraints globally, since
the associated table or domain name is available, but in SET CONSTRAINTS
the syntax doesn't include a table name.

Our current implementation of SET CONSTRAINTS changes the behavior of
all constraints matching the specified name, which is pretty bogus
given the lack of uniqueness.  If we don't go over to the SQL92 approach
then I think we need some other way of handling SET CONSTRAINTS that
allows a more exact specification of the target constraint.

A considerable advantage of per-relation constraint names is that a new
unique name can be assigned for a nameless constraint while holding only
a lock on the target relation.  We'd need a global lock to create unique
constraint names in the SQL92 semantics.  The only way I can see around
that would be to use newoid(), or perhaps a dedicated sequence
generator, to construct constraint names.  The resulting unpredictable
constraint names would be horribly messy to deal with in the regression
tests, so I'm not eager to do this.

Even per-relation uniqueness has some unhappiness: if you have a domain
with a named constraint, and you try to use this domain for two columns
of a relation, you'll get a constraint name conflict.  Inheriting
similar constraint names from two different parent relations is also
troublesome.  We could get around these either by going back to the
old no-uniqueness approach, or by being willing to alter constraint
names to make them unique (eg, by tacking on "_nnn" when needed).
But this doesn't help SET CONSTRAINTS.

At the moment I don't much like any of the alternatives.  Ideas anyone?
        regards, tom lane




Re: Scope of constraint names

From
"Christopher Kings-Lynne"
Date:
> One problem I see is that pg_constraint entries can *only* be associated
> with relations; so the table has no way to represent constraints
> associated with domains --- not to mention assertions, which aren't
> associated with any table at all.  I'm in no hurry to try to implement
> assertions, but domain constraints are definitely interesting.  We'd
> probably have to put domain constraints into a separate table, which
> is possible but not very attractive.

Hmmm...there must be some sort of schema that can do both in one table?
Even something nastly like:

refid Oid of relation or domain
type 'r' for relation and 'd' for domain
...

> Our current implementation of SET CONSTRAINTS changes the behavior of
> all constraints matching the specified name, which is pretty bogus
> given the lack of uniqueness.  If we don't go over to the SQL92 approach
> then I think we need some other way of handling SET CONSTRAINTS that
> allows a more exact specification of the target constraint.

If we do go over to SQL92, what kind of problems will people have reloading
their old schema?  Should <unnamed> be excluded from the uniqueness
check...?

> A considerable advantage of per-relation constraint names is that a new
> unique name can be assigned for a nameless constraint while holding only
> a lock on the target relation.  We'd need a global lock to create unique
> constraint names in the SQL92 semantics.

Surely adding a foreign key is what you'd call a 'rare' event in a database,
occurring once once for millions or queries?  Hence, we shouldn't worry
about it too much?

> The only way I can see around
> that would be to use newoid(), or perhaps a dedicated sequence
> generator, to construct constraint names.  The resulting unpredictable
> constraint names would be horribly messy to deal with in the regression
> tests, so I'm not eager to do this.

Surely you do the ol' loop and test sort of thing...?

> Even per-relation uniqueness has some unhappiness: if you have a domain
> with a named constraint, and you try to use this domain for two columns
> of a relation, you'll get a constraint name conflict.  Inheriting
> similar constraint names from two different parent relations is also
> troublesome.  We could get around these either by going back to the
> old no-uniqueness approach, or by being willing to alter constraint
> names to make them unique (eg, by tacking on "_nnn" when needed).
> But this doesn't help SET CONSTRAINTS.
>
> At the moment I don't much like any of the alternatives.  Ideas anyone?

If they're both equally evil, then maybe we should consider going the SQL92
way, for compatibilities sake?

Chris





Re: Scope of constraint names

From
Tom Lane
Date:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
>> A considerable advantage of per-relation constraint names is that a new
>> unique name can be assigned for a nameless constraint while holding only
>> a lock on the target relation.  We'd need a global lock to create unique
>> constraint names in the SQL92 semantics.

> Surely adding a foreign key is what you'd call a 'rare' event in a database,
> occurring once once for millions or queries?  Hence, we shouldn't worry
> about it too much?

I don't buy that argument even for foreign keys --- and remember that
pg_constraint will also hold entries for CHECK, UNIQUE, and PRIMARY KEY
constraints.  I don't want to have to take a global lock whenever we
create an index.

>> The only way I can see around
>> that would be to use newoid(), or perhaps a dedicated sequence
>> generator, to construct constraint names.  The resulting unpredictable
>> constraint names would be horribly messy to deal with in the regression
>> tests, so I'm not eager to do this.

> Surely you do the ol' loop and test sort of thing...?

How is a static 'expected' file going to do loop-and-test?

One possible answer to that is to report all unnamed constraints as
"<unnamed>" in error messages, even though they'd have distinct names
internally.  I don't much care for that approach though, since it might
make it hard for users to figure out which internal name to mention in
DROP CONSTRAINT.  But it'd keep the expected regression output stable.

> If they're both equally evil, then maybe we should consider going the SQL92
> way, for compatibilities sake?

If the spec didn't seem so brain-damaged on this point, I'd be more
eager to follow it.  I can't see any advantage in the way they chose
to do it.  But yeah, I'd lean to following the spec, if we can think
of a way around the locking and regression testing issues it creates.
        regards, tom lane




Re: Scope of constraint names

From
Rod Taylor
Date:
> I don't buy that argument even for foreign keys --- and remember that
> pg_constraint will also hold entries for CHECK, UNIQUE, and PRIMARY KEY
> constraints.  I don't want to have to take a global lock whenever we
> create an index.

I don't understand why a global lock is necessary -- and not simply a
lock on the pg_constraint table and the relations the constraint is
applied to (foreign key locks two, all others one).





Re: Scope of constraint names

From
Tom Lane
Date:
Rod Taylor <rbt@zort.ca> writes:
>> I don't want to have to take a global lock whenever we
>> create an index.

> I don't understand why a global lock is necessary --

To be sure we are creating a unique constraint name.

> and not simply a lock on the pg_constraint table

In this context, a lock on pg_constraint *is* global, because it will
mean that no one else can be creating an index on some other table.
They'd need to hold that same lock to ensure that *their* chosen
constraint name is unique.
        regards, tom lane




Re: Scope of constraint names

From
Rod Taylor
Date:
> > and not simply a lock on the pg_constraint table
> 
> In this context, a lock on pg_constraint *is* global, because it will
> mean that no one else can be creating an index on some other table.
> They'd need to hold that same lock to ensure that *their* chosen
> constraint name is unique.

So I am understanding correctly.

I think it would be a rare event to have more than one person changing
the database structure at the same time.  Anyway, the index example is a
bad example isn't it?  It already takes an lock on pg_class which is
just as global.

Check constraints and foreign key constraints are two that I can see
affected in the manner described.


Anyway, my current implementation has constraint names unique to the
relation only -- not the namespace, although my locking may be excessive
in that area.





Re: Scope of constraint names

From
Tom Lane
Date:
Rod Taylor <rbt@zort.ca> writes:
> I think it would be a rare event to have more than one person changing
> the database structure at the same time.

I don't buy this assumption --- consider for example two clients
creating temp tables.

> Anyway, the index example is a
> bad example isn't it?  It already takes an lock on pg_class which is
> just as global.

Au contraire; there is no exclusive lock needed at present.
        regards, tom lane




Re: Scope of constraint names

From
Rod Taylor
Date:
> > Anyway, the index example is a
> > bad example isn't it?  It already takes an lock on pg_class which is
> > just as global.
> 
> Au contraire; there is no exclusive lock needed at present.

Oh..  I thought pg_class was locked for any relation creation.  If thats
not the case, then I wouldn't want constraints to impose additional
limitations.

Misunderstanding on my part.





Re: Scope of constraint names

From
Tom Lane
Date:
Rod Taylor <rbt@zort.ca> writes:
> bad example isn't it?  It already takes an lock on pg_class which is
> just as global.
>> 
>> Au contraire; there is no exclusive lock needed at present.

> Oh..  I thought pg_class was locked for any relation creation.

We only take a standard writer's lock (RowExclusiveLock) on system
catalogs that we modify.  This is mainly to avoid problems from
a concurrent VACUUM FULL moving tuples around --- it doesn't prevent
other processes from inserting/updating/deleting other rows in the
same catalog.

Most of the interesting locking done for DDL operations is done on
the relation being defined or modified, not on the system catalogs.
So we have full concurrency in terms of being able to do DDL operations
on different relations at the same time.  That's what I don't want to
give up.


If we want to go over to SQL-spec-compatible naming of constraints
(ie, unique within schemas), I think the only way to cope is to generate
unique names for nameless constraints using the OID counter --- so
they'd really be, say, "$8458321" --- but when reporting constraint
violation errors, substitute "<unnamed>" or some other fixed string
in the error report for any constraint having a name of this form.
Doing the latter would keep the regression test expected outputs stable.
Using the OID counter (or a sequence generator) would avoid the locking
problem.

Another thing we need to think about in any case is coping with
constraints that are inherited from a parent table or from a domain
that's used as one or more columns' datatype.  Whether we think
constraints should have per-schema or per-relation names, we lose
anyway if we make multiple copies of such a constraint.

Perhaps we need to distinguish "original" constraints (which could
reasonably be expected to have schema-wide-unique names) from "derived"
constraints, which are the check expressions we actually attach to
individual columns.  I think we'd want to store the derived versions
explicitly (eg, with column numbers in Vars adjusted to match the
column that's supposed to be tested in a given table), but they'd not
be expected to have unique names.  This leads to the idea that a
pg_constraint entry needs two name fields: its real name, which is
unique but might just be an autogenerated "$nnn", and its logical name
that we actually report in constraint violation messages.  The logical
name would be inherited from the "original" constraint when building
a "derived" constraint.  We'd also set up pg_depend dependencies to
link the derived constraints to their original.  This would be needed
to make ALTER DOMAIN DROP CONSTRAINT work.

Thoughts anyone?
        regards, tom lane




Re: Scope of constraint names

From
Peter Eisentraut
Date:
Tom Lane writes:

> A considerable advantage of per-relation constraint names is that a new
> unique name can be assigned for a nameless constraint while holding only
> a lock on the target relation.  We'd need a global lock to create unique
> constraint names in the SQL92 semantics.

Presumably, the field pg_class.relchecks already keeps a count of the
number of constraints, so it should be possible to assign numbers easily.

> The only way I can see around that would be to use newoid(), or perhaps
> a dedicated sequence generator, to construct constraint names.  The
> resulting unpredictable constraint names would be horribly messy to deal
> with in the regression tests, so I'm not eager to do this.

Or we simply assign constraint names explicitly in the regression tests.

-- 
Peter Eisentraut   peter_e@gmx.net





Re: Scope of constraint names

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> A considerable advantage of per-relation constraint names is that a new
>> unique name can be assigned for a nameless constraint while holding only
>> a lock on the target relation.  We'd need a global lock to create unique
>> constraint names in the SQL92 semantics.

> Presumably, the field pg_class.relchecks already keeps a count of the
> number of constraints, so it should be possible to assign numbers easily.

But pg_class.relchecks is per-relation --- how does it help you assign a
globally unique number?


After much thought I am coming around to the conclusion that we should
name constraints within-schemas (ie, there will be a schema OID column
in pg_constraint), but *not require these names to be unique*.  DROP
CONSTRAINT, SET CONSTRAINTS, etc will act on all constraints matching
the target name, as they do now.  This will create the minimum risk of
breaking existing database schemas, while still allowing us to move
some of the way towards SQL compliance --- in particular, SET
CONSTRAINTS with a schema-qualified constraint name would work as the
spec expects.

We would still take care to generate unique-within-a-relation names for
nameless constraints, using the same code that exists now, but we'd not
enforce this by means of a unique index on pg_constraint.

A compromise between that and exact SQL semantics would be to enforce
uniqueness of conname + connamespace + conrelid + contypid (the last
being a column that links to pg_type for domain constraints; conrelid
and contypid are each zero if not relevant).  This would have the effect
of making relation constraint names unique per-relation, and domain
constraint names separately unique per-domain, and also allowing global
assertion names that are unique per-schema as in SQL92.  This seems a
little baroque to me, but maybe it will appeal to others.

Comments?
        regards, tom lane