Thread: Foreign keys for non-default datatypes, redux

Foreign keys for non-default datatypes, redux

From

Tom Lane

Date:

10 February 2007, 02:36:00

Almost a year ago, we talked about the problem that referential
integrity should be selecting comparison operators on the basis
of b-tree index opclasses, instead of assuming that the appropriate
operator is always named "=":
http://archives.postgresql.org/pgsql-hackers/2006-02/msg00960.php
http://archives.postgresql.org/pgsql-hackers/2006-03/msg00161.php

I'm about to go off and implement that at last.  To refresh folks'
memory, what I think we agreed to was that at the time of definition
of a foreign-key constraint, we should identify the specific equality
operator to be used for (each column of) the constraint.  The method
for doing this is to be:

* First, identify the unique index that is relied on to enforce
uniqueness of the PK entries (we do this already of course).

* Look to see if there is an equality operator in this index's
opfamily accepting exactly the PK and FK data types (ie, "PK = FK").
If so, use that.

* Else, check to see if there is an implicit promotion from the FK
datatype to the PK datatype.  If so, use the equality operator
"PK = PK", which must exist since the opfamily supports an index
on the PK datatype.

* Else fail (this means that the present warning about "inefficient"
foreign keys will become a hard error).

The good thing about this proposal is that we know that we have
identified an operator whose notion of equality is compatible with
the notion of equality being enforced by the unique index, and thus
a lot of potential gotchas with nondefault opclasses go away.

My intention is that we'd record pg_depend entries making the RI
constraint dependent on not only the index, but the specific operators
to use.  This would not have been too critical a year ago given that
opclasses were effectively immutable; but in the current opfamily design
it's entirely likely that we'd select cross-type equality operators that
are considered "loose" and potentially droppable from the opfamily.
So we need dependencies to prevent the operators from disappearing out
from under us.  (Come to think of it, we might want to record
dependencies on the casts too, if we're using implicit casts?)

What I'm thinking about right now is that the ri_triggers.c routines
need to be able to find out which operators they're supposed to use,
so that they can construct the RI queries correctly.  We could possibly
have them dredge the information out of pg_depend, but this seems
inefficient, and I'm not entirely sure how one would match up operators
with columns given only the pg_depend entries.  What I'd like to propose
instead is:

* Add an oid[] column to pg_constraint that stores the equality operator
OIDs for a foreign-key constraint, in the same column order as conkey[]
and confkey[].

* Add an OID column to pg_trigger giving the OID of the constraint
owning the trigger (or 0 if none).  Add this information to struct
Trigger as well, so that it gets passed to trigger functions.

Given the pg_constraint OID, the RI triggers could fetch the constraint
row and look at conkey[], confkey[], and the new operator oid[] array
to determine what they need to know.

This would actually mean that they don't need pg_trigger.tgargs at all.
I am pretty strongly tempted to stop storing anything in tgargs for RI
triggers --- it's ugly, and updating the info during RENAME commands
is a pain in the rear.  On the other hand removing it might break
client-side code that expects to look at tgargs to learn about FK
constraints.  I'd personally think that pg_constraint is a lot easier to
work with, but there might be some code out there left over from way
back before pg_constraint existed --- anyone know of any such issue?
        regards, tom lane

Re: Foreign keys for non-default datatypes, redux

From

Stephan Szabo

Date:

10 February 2007, 17:42:10

On Fri, 9 Feb 2007, Tom Lane wrote:

> Almost a year ago, we talked about the problem that referential
> integrity should be selecting comparison operators on the basis
> of b-tree index opclasses, instead of assuming that the appropriate
> operator is always named "=":
> http://archives.postgresql.org/pgsql-hackers/2006-02/msg00960.php
> http://archives.postgresql.org/pgsql-hackers/2006-03/msg00161.php
>
> I'm about to go off and implement that at last.  To refresh folks'
> memory, what I think we agreed to was that at the time of definition
> of a foreign-key constraint, we should identify the specific equality
> operator to be used for (each column of) the constraint.  The method
> for doing this is to be:
>
> * First, identify the unique index that is relied on to enforce
> uniqueness of the PK entries (we do this already of course).
>
> * Look to see if there is an equality operator in this index's
> opfamily accepting exactly the PK and FK data types (ie, "PK = FK").
> If so, use that.
>
> * Else, check to see if there is an implicit promotion from the FK
> datatype to the PK datatype.  If so, use the equality operator
> "PK = PK", which must exist since the opfamily supports an index
> on the PK datatype.
>
> * Else fail (this means that the present warning about "inefficient"
> foreign keys will become a hard error).

I assume you're speaking of the version where we just change the
constraints to use statements with the OPERATOR() syntax and potential
casts rather than the discussion at the end about changing the pk checks
to avoid planning entirely?

> My intention is that we'd record pg_depend entries making the RI
> constraint dependent on not only the index, but the specific operators
> to use.  This would not have been too critical a year ago given that
> opclasses were effectively immutable; but in the current opfamily design
> it's entirely likely that we'd select cross-type equality operators that
> are considered "loose" and potentially droppable from the opfamily.
> So we need dependencies to prevent the operators from disappearing out
> from under us.  (Come to think of it, we might want to record
> dependencies on the casts too, if we're using implicit casts?)

I think we probably should, so the above seems reasonable to me.

> * Add an oid[] column to pg_constraint that stores the equality operator
> OIDs for a foreign-key constraint, in the same column order as conkey[]
> and confkey[].
>
> * Add an OID column to pg_trigger giving the OID of the constraint
> owning the trigger (or 0 if none).  Add this information to struct
> Trigger as well, so that it gets passed to trigger functions.
>
> Given the pg_constraint OID, the RI triggers could fetch the constraint
> row and look at conkey[], confkey[], and the new operator oid[] array
> to determine what they need to know.
>
> This would actually mean that they don't need pg_trigger.tgargs at all.
> I am pretty strongly tempted to stop storing anything in tgargs for RI
> triggers --- it's ugly, and updating the info during RENAME commands
> is a pain in the rear.  On the other hand removing it might break
> client-side code that expects to look at tgargs to learn about FK
> constraints.  I'd personally think that pg_constraint is a lot easier to
> work with, but there might be some code out there left over from way
> back before pg_constraint existed --- anyone know of any such issue?

I'd say we probably want to keep the tgargs info for at least a version or
two after changing the implementation.  Getting rid of using the args info
sounds like a good idea.  One side question is what should we do about the
places in the current system where it checks for the key sets being empty?
AFAIK, we still don't actually support letting you define a constraint
that way, and I haven't heard any complaints about that, and I'm not even
sure if that actually made it into the spec proper.

Re: Foreign keys for non-default datatypes, redux

From

Tom Lane

Date:

10 February 2007, 18:00:07

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> I assume you're speaking of the version where we just change the
> constraints to use statements with the OPERATOR() syntax and potential
> casts rather than the discussion at the end about changing the pk checks
> to avoid planning entirely?

Yeah, we might get around to doing that someday but I'm not excited
about it right now.  (I'm mainly doing this because it fits in with the
operator-family work I've been doing --- that also got rid of some
unsupportable assumptions about operators being named "=" ...)

> I'd say we probably want to keep the tgargs info for at least a version or
> two after changing the implementation.  Getting rid of using the args info
> sounds like a good idea.

We whack the catalogs around in incompatible ways in every release.  I'm
willing to keep filling tgargs if someone can point to a real use-case,
but not just because there might be code out there somewhere using it.

> One side question is what should we do about the
> places in the current system where it checks for the key sets being empty?

I don't see that this affects that either way.  I can't quite imagine
what the semantics would be, though --- there's no such thing as a
unique constraint with no columns, so how can there be an RI constraint
with none?
        regards, tom lane

Re: Foreign keys for non-default datatypes, redux

From

Stephan Szabo

Date:

10 February 2007, 18:05:25

On Sat, 10 Feb 2007, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > One side question is what should we do about the
> > places in the current system where it checks for the key sets being empty?
>
> I don't see that this affects that either way.  I can't quite imagine
> what the semantics would be, though --- there's no such thing as a
> unique constraint with no columns, so how can there be an RI constraint
> with none?

Well, the code currently has checks with comments based on SQL3
text AFAICT.       /* ----------        * SQL3 11.9 <referential constraint definition>        *      General rules 2)
a):       *              If Rf and Rt are empty (no columns to compare given)        *              constraint is true
if0 < (SELECT COUNT(*) FROM T)        *        *      Note: The special case that no columns are given cannot        *
           occur up to now in Postgres, it's just there for        *              future enhancements.        *
----------       */
 
The reason I was wondering is that it uses tgnargs == 4 as the check, and
if we change the meanings of tgnargs, we'd need to change the check.
Personally, I think we should probably just pull out the special case for
now, but thought it should be brought up.

Re: Foreign keys for non-default datatypes, redux

From

Tom Lane

Date:

10 February 2007, 18:22:36

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> The reason I was wondering is that it uses tgnargs == 4 as the check, and
> if we change the meanings of tgnargs, we'd need to change the check.

Sure, it'd be looking for a zero-length conkeys array instead.
        regards, tom lane

Re: Foreign keys for non-default datatypes, redux

From

Tom Lane

Date:

10 February 2007, 21:05:05

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Fri, 9 Feb 2007, Tom Lane wrote:
>> I am pretty strongly tempted to stop storing anything in tgargs for RI
>> triggers --- it's ugly, and updating the info during RENAME commands
>> is a pain in the rear.

> I'd say we probably want to keep the tgargs info for at least a version or
> two after changing the implementation.  Getting rid of using the args info
> sounds like a good idea.

After digging around in the code for awhile I realized that there's a
potentially bigger backwards-compatibility issue here: if we make the
RI triggers dependent on finding a pg_constraint entry, then foreign
key constraints loaded from dumps from pre-7.3 databases will no longer
work.  Those dumps just contain "CREATE CONSTRAINT TRIGGER" commands
which will not provide enough information.  We can make the triggers
throw errors suggesting that the user drop the triggers and perform
ALTER TABLE ADD CONSTRAINT.  Is that enough, or do we need to try
harder?

It would probably be possible to teach pg_dump to cons up ADD CONSTRAINT
commands when dumping from an old server, but I think it would be a lot
of work (certainly we punted on that idea back in the 7.3 devel cycle)
and I'm not sure there are enough people running pre-7.3 PG for it to
be worth the effort to provide an automated solution.

Comments?
        regards, tom lane

Re: Foreign keys for non-default datatypes, redux

From

"Joshua D. Drake"

Date:

10 February 2007, 21:10:07

> After digging around in the code for awhile I realized that there's a
> potentially bigger backwards-compatibility issue here: if we make the
> RI triggers dependent on finding a pg_constraint entry, then foreign
> key constraints loaded from dumps from pre-7.3 databases will no longer
> work.  Those dumps just contain "CREATE CONSTRAINT TRIGGER" commands
> which will not provide enough information.  We can make the triggers
> throw errors suggesting that the user drop the triggers and perform
> ALTER TABLE ADD CONSTRAINT.  Is that enough, or do we need to try
> harder?

I think it is reasonable to expect that we can not support 7.3 dumps in
that manner considering we are talking about 8.3 ;). We can't be
backward compatible forever.

Further in their right mind is trying to do a 24x7 shop on 7.3. They
could always dump to 8.1 and then to 8.3.

Joshua D. Drake



-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

Re: Foreign keys for non-default datatypes, redux

From

Tom Lane

Date:

10 February 2007, 21:20:19

"Joshua D. Drake" <jd@commandprompt.com> writes:
>> key constraints loaded from dumps from pre-7.3 databases will no longer
>> work.  Those dumps just contain "CREATE CONSTRAINT TRIGGER" commands
>> which will not provide enough information.  We can make the triggers
>> throw errors suggesting that the user drop the triggers and perform
>> ALTER TABLE ADD CONSTRAINT.  Is that enough, or do we need to try
>> harder?

> Further in their right mind is trying to do a 24x7 shop on 7.3. They
> could always dump to 8.1 and then to 8.3.

Actually that wouldn't help: you'd still have a CREATE CONSTRAINT TRIGGER
-based foreign key.  The only thing that'd really fix it is having used
the old contrib "adddepend" utility at some point along the line.  Since
we never forced people to do that, it's fairly likely that some never
did.

[Thinks for a bit...]  It would still work to run adddepend over the
schema even after having loaded it into 8.3, assuming that adddepend
hasn't suffered bit-rot.  We dropped it from contrib because no one
was maintaining it anymore, but AFAIR there was no evidence that it's
actively broken.  So maybe we can just point to that for anyone who
comes along with an upgrade problem.
        regards, tom lane

Re: Foreign keys for non-default datatypes, redux

From

Tom Lane

Date:

12 February 2007, 04:36:46

I wrote:
>> * Add an oid[] column to pg_constraint that stores the equality operator
>> OIDs for a foreign-key constraint, in the same column order as conkey[]
>> and confkey[].

It turns out this isn't sufficient: ri_Check_Pk_Match() wants to
generate PK = PK checks, and the PK = FK operator isn't the right one
for that.  The information I suggested adding to pg_constraint isn't
enough to let it find out which operator is the right one.

We could handle this in a couple of ways:

1. Add yet another column with PK=PK operator OIDs to pg_constraint.

2. Add a column with the underlying PK index's OID to pg_constraint, and
expect ri_Check_Pk_Match to dredge the info from that.  This is probably
possible, but not exactly trivial because of which-column-is-which
considerations.

3. Leave pg_constraint alone and expect ri_Check_Pk_Match to look in
pg_depend to find out the underlying PK index, then proceed as in #2.

From an efficiency standpoint #1 seems the best, and yet it seems a bit
ugly.  Not that the others aren't.  Comments, other ideas?
        regards, tom lane

Re: Foreign keys for non-default datatypes, redux

From

Stephan Szabo

Date:

13 February 2007, 00:42:21

On Mon, 12 Feb 2007, Tom Lane wrote:

> I wrote:
> >> * Add an oid[] column to pg_constraint that stores the equality operator
> >> OIDs for a foreign-key constraint, in the same column order as conkey[]
> >> and confkey[].
>
> It turns out this isn't sufficient: ri_Check_Pk_Match() wants to
> generate PK = PK checks, and the PK = FK operator isn't the right one
> for that.

Ugh, right, for modifications of the pk side with no action to make sure
there isn't a new row with that key.

> The information I suggested adding to pg_constraint isn't enough to let
> it find out which operator is the right one.
>
> We could handle this in a couple of ways:
>
> 1. Add yet another column with PK=PK operator OIDs to pg_constraint.
>
> 2. Add a column with the underlying PK index's OID to pg_constraint, and
> expect ri_Check_Pk_Match to dredge the info from that.  This is probably
> possible, but not exactly trivial because of which-column-is-which
> considerations.
>
> 3. Leave pg_constraint alone and expect ri_Check_Pk_Match to look in
> pg_depend to find out the underlying PK index, then proceed as in #2.
>
> >From an efficiency standpoint #1 seems the best, and yet it seems a bit
> ugly.  Not that the others aren't.  Comments, other ideas?

I think #1, while ugly, is probably less ugly than the others, although I
guess it means even more work if the underlying type of the column is
changed.

Is there any reason to think that in the future we might need more such
things for some constraints?

Re: Foreign keys for non-default datatypes, redux

From

Tom Lane

Date:

13 February 2007, 00:55:18

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> I think #1, while ugly, is probably less ugly than the others, although I
> guess it means even more work if the underlying type of the column is
> changed.

Oy, I hadn't thought of that.  [ considers... ]  I *think* that it'll
work without special code, because ALTER COLUMN TYPE drops and recreates
the constraints, but definitely something to test.  Thanks for the
reminder.

> Is there any reason to think that in the future we might need more such
> things for some constraints?

Um ... more such which, exactly?  And do you have something in mind if
the answer is "yes"?
        regards, tom lane

Re: Foreign keys for non-default datatypes, redux

From

Robert Treat

Date:

13 February 2007, 02:27:42

On Saturday 10 February 2007 13:59, Tom Lane wrote:
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > I'd say we probably want to keep the tgargs info for at least a version
> > or two after changing the implementation.  Getting rid of using the args
> > info sounds like a good idea.
>
> We whack the catalogs around in incompatible ways in every release.  I'm
> willing to keep filling tgargs if someone can point to a real use-case,
> but not just because there might be code out there somewhere using it.
>

I'm pretty sure we use tgargs in phppgadmin, though exactly why escapes me... 
I am thinking it would be to display a triggers arguments?   Assuming we can 
still get all the same information one way or another I suppose we can update 
our code, though right now that code is pretty well intermixed with the 
normal function code iirc (I don't think it has been updated at all in the 
8.x series, so my memory is pretty fuzzy on this), so if I could avoid 
changing it I would... 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

Re: Foreign keys for non-default datatypes, redux

From

"Florian G. Pflug"

Date:

13 February 2007, 09:46:46

Robert Treat wrote:
> On Saturday 10 February 2007 13:59, Tom Lane wrote:
>> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>>> I'd say we probably want to keep the tgargs info for at least a version
>>> or two after changing the implementation.  Getting rid of using the args
>>> info sounds like a good idea.
>> We whack the catalogs around in incompatible ways in every release.  I'm
>> willing to keep filling tgargs if someone can point to a real use-case,
>> but not just because there might be code out there somewhere using it.
>>
> 
> I'm pretty sure we use tgargs in phppgadmin, though exactly why escapes me... 
> I am thinking it would be to display a triggers arguments?   Assuming we can 
> still get all the same information one way or another I suppose we can update 
> our code, though right now that code is pretty well intermixed with the 
> normal function code iirc (I don't think it has been updated at all in the 
> 8.x series, so my memory is pretty fuzzy on this), so if I could avoid 
> changing it I would... 

As far as I understood the proposal, tgargs wouldn't go away, it would just not
be populated for RI triggers. So as long as pgadmin3 doesn't use tgargs to get
information about constraints, pgadmin would be fine I believe...

greetings, Florian Pflug

Re: Foreign keys for non-default datatypes, redux

From

Tom Lane

Date:

13 February 2007, 13:56:23

"Florian G. Pflug" <fgp@phlo.org> writes:
> As far as I understood the proposal, tgargs wouldn't go away, it would
> just not be populated for RI triggers.

Yes, of course.  I wasn't suggesting that we take away the ability to
pass arguments to triggers in general.
        regards, tom lane

Re: Foreign keys for non-default datatypes, redux

From

Tom Lane

Date:

14 February 2007, 01:17:08

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Mon, 12 Feb 2007, Tom Lane wrote:
>> It turns out this isn't sufficient: ri_Check_Pk_Match() wants to
>> generate PK = PK checks, and the PK = FK operator isn't the right one
>> for that.

> Ugh, right, for modifications of the pk side with no action to make sure
> there isn't a new row with that key.

When I looked closer I found out that ri_AttributesEqual() is applied to
pairs of FK values as well as pairs of PK values.  The old coding was
looking up the default btree operator for the types, which is close but
not close enough, if we want to allow for the possibility of unique
indexes built on non-default operator classes.  So I ended up with three
new columns in pg_constraint --- PK=FK, PK=PK, FK=FK operator OIDs.
Anyway it's all done and committed.

It strikes me BTW that the RI mechanism was vulnerable to the same sort
of search_path-based shenanigans that Peter is worried about: any RI
checks triggered within a SECURITY DEFINER function could have been
subverted by substituting a different "=" operator.
        regards, tom lane