Thread: Multicolumn foreign keys need useless unique indices?
There is a misfeature in 7.2.2 that appears when I have a foreign key that references two columns of a table. Consider following simplified example: CREATE TABLE a ( a int PRIMARY KEY, b int ); CREATE TABLE b ( aref int, bref int, FOREIGN KEY (aref, bref) REFERENCES a(a, b) MATCH FULL ON DELETE CASCADEON UPDATE CASCADE ); I get an error "UNIQUE constraint matching given keys for referenced table "a" not found." because I have unique constraint only on the first field (which is still enough to make the whole combination unique. (b is not even unique))... So I need to add an useless(?) UNIQUE constraint to "(a, b)" for table "a" just to allow creation of multicol FOREIGN KEYs for table "b". And I get NOTICE: CREATE TABLE / UNIQUE will create implicit index 'a_a_key' for table. AFAIK, the extra index only slows down my inserts - it basically contains no usable information... shouldn't the presence of _primary_key_ in multicol foreign key be enough to decide whether the whole key is unique or not? And shouldn't it be enough to find out the tuple in table 'a' corresponding newly inserted tuple in b? Or should I just write my own triggers for checking the integrity of "b"/"bref" column pair to avoid needless index creation? -- Antti Haapala
> AFAIK, the extra index only slows down my inserts - it basically contains > no usable information... Not 100% true. It will speed up cascade delete and update... > shouldn't the presence of _primary_key_ in > multicol foreign key be enough to decide whether the whole key is unique > or not? Hmmm - thinking about it, I don't see why postgres would need the entire thing to be unique...can't think of a reason at the moment. Stephen? Chris
> > AFAIK, the extra index only slows down my inserts - it basically contains > > no usable information... > > Not 100% true. It will speed up cascade delete and update... To clarify things: CREATE TABLE original ( a int PRIMARY KEY, b int ); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'original_pkey' for table 'original' CREATE CREATE TABLE referencer ( aref int, bref int, FOREIGN KEY (aref, bref) REFERENCES original(a, b) MATCH FULL ONDELETE CASCADE ON UPDATE CASCADE ); NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s) ERROR: UNIQUE constraint matching given keys for referenced table "original" not found CREATE TABLE original ( a int PRIMARY KEY, b int, UNIQUE (a,b) ); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'original_pkey' for table 'original' NOTICE: CREATE TABLE / UNIQUE will create implicit index 'original_a_key' for table 'original' CREATE CREATE TABLE referencer ( aref int, bref int, FOREIGN KEY (aref, bref) REFERENCES original(a, b) MATCH FULL ONDELETE CASCADE ON UPDATE CASCADE ); NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s) CREATE ilmo=# \d original Table "original"Column | Type | Modifiers --------+---------+-----------a | integer | not nullb | integer | Primary key: a_pkey Unique keys: a_a_key Triggers: RI_ConstraintTrigger_41250, RI_ConstraintTrigger_41252 ilmo=# \d referencer Table "referencer"Column | Type | Modifiers --------+---------+-----------aref | integer |bref | integer | Triggers: RI_ConstraintTrigger_41248 Actually nothing changes. The unique constraint doesn't add anything new - it allows NULLs in column b and requires that combination (a, b) is unique... and it definitely is because column 'a' is unique (primary key). It just creates a multicol index and adds an useless extra constraint check, while almost the same data is available in index "original_a_pkey". -- Antti Haapala
On Fri, 2002-09-13 at 04:27, Christopher Kings-Lynne wrote: > > AFAIK, the extra index only slows down my inserts - it basically contains > > no usable information... > > Not 100% true. It will speed up cascade delete and update... > > > shouldn't the presence of _primary_key_ in > > multicol foreign key be enough to decide whether the whole key is unique > > or not? > > Hmmm - thinking about it, I don't see why postgres would need the entire > thing to be unique...can't think of a reason at the moment. Stephen? If it's not all unique, you cannot be guaranteed there is a single row with those values in the referenced table. -- Rod Taylor
Rod Taylor <rbt@rbt.ca> writes: > On Fri, 2002-09-13 at 04:27, Christopher Kings-Lynne wrote: >> Hmmm - thinking about it, I don't see why postgres would need the entire >> thing to be unique...can't think of a reason at the moment. Stephen? > If it's not all unique, you cannot be guaranteed there is a single row > with those values in the referenced table. Right. The single-column unique constraint guarantees at most one match, but it isn't helpful for checking if there's at least one match. The spec obviously intends that the index supporting the unique constraint be useful for verifying the existence of a match. I read this in SQL92: a) If the <referenced table and columns> specifies a <reference column list>, then the set of columnnames of that <refer- ence column list> shall be equal to the set of column names in the uniquecolumns of a unique constraint of the refer- enced table. It says "equal to", not "superset of". So we are behaving per spec. regards, tom lane
> Rod Taylor <rbt@rbt.ca> writes: > > On Fri, 2002-09-13 at 04:27, Christopher Kings-Lynne wrote: > >> Hmmm - thinking about it, I don't see why postgres would need the entire > >> thing to be unique...can't think of a reason at the moment. Stephen? > > > If it's not all unique, you cannot be guaranteed there is a single row > > with those values in the referenced table. > > Right. The single-column unique constraint guarantees at most one > match, but it isn't helpful for checking if there's at least one match. > The spec obviously intends that the index supporting the unique > constraint be useful for verifying the existence of a match. > > I read this in SQL92: > > a) If the <referenced table and columns> specifies a <reference > column list>, then the set of column names of that <refer- > ence column list> shall be equal to the set of column names > in the unique columns of a unique constraint of the refer- > enced table. > > It says "equal to", not "superset of". So we are behaving per spec. That's what I used when doing it. It possibly is a stronger than necessary statement but I assumed at the time they had some reason for wanting to define it that way.
On Fri, 13 Sep 2002, Antti Haapala wrote: > > > AFAIK, the extra index only slows down my inserts - it basically contains > > > no usable information... > > > > Not 100% true. It will speed up cascade delete and update... > > To clarify things: > > CREATE TABLE original ( > a int PRIMARY KEY, > b int > ); > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > 'original_pkey' for table 'original' > CREATE > > CREATE TABLE referencer ( > aref int, > bref int, > FOREIGN KEY (aref, bref) REFERENCES original(a, b) > MATCH FULL ON DELETE CASCADE ON UPDATE CASCADE > ); > NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN > KEY check(s) > ERROR: UNIQUE constraint matching given keys for referenced table > "original" not found SQL 92 would want you to normalize and remove bref from referencer since it's redundant. You're storing a reference to a table and some of the dependent values to that reference in another table. That's probably the best workaround, although I assume your real case is more complicated.
On Fri, 2002-09-13 at 16:00, Tom Lane wrote: > Rod Taylor <rbt@rbt.ca> writes: > > On Fri, 2002-09-13 at 04:27, Christopher Kings-Lynne wrote: > >> Hmmm - thinking about it, I don't see why postgres would need the entire > >> thing to be unique...can't think of a reason at the moment. Stephen? > > > If it's not all unique, you cannot be guaranteed there is a single row > > with those values in the referenced table. > > Right. The single-column unique constraint guarantees at most one > match, but it isn't helpful for checking if there's at least one match. Due to postgres's implementation we can't do the 'at least' part using only index anyway - we must check the actual table. > The spec obviously intends that the index supporting the unique > constraint be useful for verifying the existence of a match. Does the spec say _anything_ about implementing unique contraint using an unique index ? > I read this in SQL92: > > a) If the <referenced table and columns> specifies a <reference > column list>, then the set of column names of that <refer- > ence column list> shall be equal to the set of column names > in the unique columns of a unique constraint of the refer- > enced table. > > It says "equal to", not "superset of". So we are behaving per spec. But we are doing it in a suboptimal way. If we have unique index on t.i and we define additional unique constraint on (t.i, t.j), then we don't need the extra unique index to be created - the index on t.i is enough to quarantee the uniqueness of (t.i,t.j) or any set of columns that includes t.i. --------------- Hannu PS. IMHO our unique is still broken as shown by the following: hannu=# create table t(i int unique); NOTICE: CREATE TABLE / UNIQUE will create implicit index 't_i_key' for table 't' CREATE TABLE hannu=# insert into t values(1); INSERT 41555 1 hannu=# insert into t values(2); INSERT 41556 1 hannu=# update t set i=i-1; UPDATE 2 hannu=# update t set i=i+1; ERROR: Cannot insert a duplicate key into unique index t_i_key hannu=# DB2 has no problems doing it: db2 => create table t(i int not null unique) DB20000I The SQL command completed successfully. db2 => insert into t values(1) DB20000I The SQL command completed successfully. db2 => insert into t values(2) DB20000I The SQL command completed successfully. db2 => update t set i=i+1 DB20000I The SQL command completed successfully. db2 => update t set i=i-1 DB20000I The SQL command completed successfully. neither has Oracle SQL> create table t(i int not null unique); Table created. SQL> insert into t values(1); 1 row created. SQL> insert into t values(2); 1 row created. SQL> update t set i=i+1; 2 rows updated. SQL> update t set i=i-1; 2 rows updated. SQL> ---------------- Hannu
> hannu=# update t set i=i+1; > ERROR: Cannot insert a duplicate key into unique index t_i_key A possibility may be to reverse the sequential scan order for the simple cases, but anything any more complex and the check should be deferred till end of statement, rather than checking immediately. -- Rod Taylor
On Fri, 2002-09-13 at 17:42, Rod Taylor wrote: > > > hannu=# update t set i=i+1; > > ERROR: Cannot insert a duplicate key into unique index t_i_key > > A possibility may be to reverse the sequential scan order for the simple > cases, but anything any more complex and the check should be deferred > till end of statement, rather than checking immediately. Or we could keep a 'conflict list' that would be dynamically added to and deleted from during the statement and the statement would be aborted if 1) there were any entries in the list at the end of statement or 2) if the list overflowed at some predefined limit (say 1000 or 100.000 conflicts) during the statement. in our simple case we would have at most 1 conflict in the list at any time. -------------- Hannu
Hannu Krosing <hannu@tm.ee> writes: > If we have unique index on t.i and we define additional unique > constraint on (t.i, t.j), then we don't need the extra unique index to > be created - the index on t.i is enough to quarantee the uniqueness of > (t.i,t.j) or any set of columns that includes t.i. You missed the point: we are concerned about existence of a row, not only uniqueness. regards, tom lane
On Sat, 2002-09-14 at 20:14, Tom Lane wrote: > Hannu Krosing <hannu@tm.ee> writes: > > If we have unique index on t.i and we define additional unique > > constraint on (t.i, t.j), then we don't need the extra unique index to > > be created - the index on t.i is enough to quarantee the uniqueness of > > (t.i,t.j) or any set of columns that includes t.i. > > You missed the point: we are concerned about existence of a row, not only > uniqueness. Maybe I'm missing something, but I'll reiterate my two points 1) to check for existance of a referenced tuple for a foreigh key we have to: * lookup the row in index and * check if the row is live in the relation so the index will help us equally for both cases, as it will point to N entries of which only one can be alive at a time and which all have to be checked. It will be only marginally more work to check if the only live entry does match the non-index columns. And I think that my other point holds as well - there is no need for extra unique index on (redundant) unique constraint that is put over a superset of columns covered by _another_ unique constraint. There will probably be additional work if we want to drop the original constraint, but this is a separate issue. --------------- Hannu
Hannu Krosing <hannu@tm.ee> writes: > so the index will help us equally for both cases, as it will point to N > entries of which only one can be alive at a time and which all have to > be checked. > It will be only marginally more work to check if the only live entry > does match the non-index columns. But the "marginally more work" represents code that does not exist at all right now, and which there's no really convenient place to add AFAIR. This seems to me to be going rather out of our way to support a coding practice that is specifically disallowed by the standard. Something that no one has bothered to ask, but seems to me relevant, is exactly why we should consider it important to support foreign keys of this form? Aren't we talking about a poor schema design in the first place, if the referenced column set covers more than just the unique key of the referenced table? At the very least this is a violation of normalization, and so it's inherently inefficient. > There will probably be additional work if we want to drop the original > constraint, but this is a separate issue. It's not separate, because it's more work that we *will* have to do, to support a feature that is nonstandard and of debatable usefulness. regards, tom lane