Thread: Foreign keys for non-default datatypes, redux
Almost a year ago, we talked about the problem that referential integrity should be selecting comparison operators on the basis of b-tree index opclasses, instead of assuming that the appropriate operator is always named "=": http://archives.postgresql.org/pgsql-hackers/2006-02/msg00960.php http://archives.postgresql.org/pgsql-hackers/2006-03/msg00161.php I'm about to go off and implement that at last. To refresh folks' memory, what I think we agreed to was that at the time of definition of a foreign-key constraint, we should identify the specific equality operator to be used for (each column of) the constraint. The method for doing this is to be: * First, identify the unique index that is relied on to enforce uniqueness of the PK entries (we do this already of course). * Look to see if there is an equality operator in this index's opfamily accepting exactly the PK and FK data types (ie, "PK = FK"). If so, use that. * Else, check to see if there is an implicit promotion from the FK datatype to the PK datatype. If so, use the equality operator "PK = PK", which must exist since the opfamily supports an index on the PK datatype. * Else fail (this means that the present warning about "inefficient" foreign keys will become a hard error). The good thing about this proposal is that we know that we have identified an operator whose notion of equality is compatible with the notion of equality being enforced by the unique index, and thus a lot of potential gotchas with nondefault opclasses go away. My intention is that we'd record pg_depend entries making the RI constraint dependent on not only the index, but the specific operators to use. This would not have been too critical a year ago given that opclasses were effectively immutable; but in the current opfamily design it's entirely likely that we'd select cross-type equality operators that are considered "loose" and potentially droppable from the opfamily. So we need dependencies to prevent the operators from disappearing out from under us. (Come to think of it, we might want to record dependencies on the casts too, if we're using implicit casts?) What I'm thinking about right now is that the ri_triggers.c routines need to be able to find out which operators they're supposed to use, so that they can construct the RI queries correctly. We could possibly have them dredge the information out of pg_depend, but this seems inefficient, and I'm not entirely sure how one would match up operators with columns given only the pg_depend entries. What I'd like to propose instead is: * Add an oid[] column to pg_constraint that stores the equality operator OIDs for a foreign-key constraint, in the same column order as conkey[] and confkey[]. * Add an OID column to pg_trigger giving the OID of the constraint owning the trigger (or 0 if none). Add this information to struct Trigger as well, so that it gets passed to trigger functions. Given the pg_constraint OID, the RI triggers could fetch the constraint row and look at conkey[], confkey[], and the new operator oid[] array to determine what they need to know. This would actually mean that they don't need pg_trigger.tgargs at all. I am pretty strongly tempted to stop storing anything in tgargs for RI triggers --- it's ugly, and updating the info during RENAME commands is a pain in the rear. On the other hand removing it might break client-side code that expects to look at tgargs to learn about FK constraints. I'd personally think that pg_constraint is a lot easier to work with, but there might be some code out there left over from way back before pg_constraint existed --- anyone know of any such issue? regards, tom lane
On Fri, 9 Feb 2007, Tom Lane wrote: > Almost a year ago, we talked about the problem that referential > integrity should be selecting comparison operators on the basis > of b-tree index opclasses, instead of assuming that the appropriate > operator is always named "=": > http://archives.postgresql.org/pgsql-hackers/2006-02/msg00960.php > http://archives.postgresql.org/pgsql-hackers/2006-03/msg00161.php > > I'm about to go off and implement that at last. To refresh folks' > memory, what I think we agreed to was that at the time of definition > of a foreign-key constraint, we should identify the specific equality > operator to be used for (each column of) the constraint. The method > for doing this is to be: > > * First, identify the unique index that is relied on to enforce > uniqueness of the PK entries (we do this already of course). > > * Look to see if there is an equality operator in this index's > opfamily accepting exactly the PK and FK data types (ie, "PK = FK"). > If so, use that. > > * Else, check to see if there is an implicit promotion from the FK > datatype to the PK datatype. If so, use the equality operator > "PK = PK", which must exist since the opfamily supports an index > on the PK datatype. > > * Else fail (this means that the present warning about "inefficient" > foreign keys will become a hard error). I assume you're speaking of the version where we just change the constraints to use statements with the OPERATOR() syntax and potential casts rather than the discussion at the end about changing the pk checks to avoid planning entirely? > My intention is that we'd record pg_depend entries making the RI > constraint dependent on not only the index, but the specific operators > to use. This would not have been too critical a year ago given that > opclasses were effectively immutable; but in the current opfamily design > it's entirely likely that we'd select cross-type equality operators that > are considered "loose" and potentially droppable from the opfamily. > So we need dependencies to prevent the operators from disappearing out > from under us. (Come to think of it, we might want to record > dependencies on the casts too, if we're using implicit casts?) I think we probably should, so the above seems reasonable to me. > * Add an oid[] column to pg_constraint that stores the equality operator > OIDs for a foreign-key constraint, in the same column order as conkey[] > and confkey[]. > > * Add an OID column to pg_trigger giving the OID of the constraint > owning the trigger (or 0 if none). Add this information to struct > Trigger as well, so that it gets passed to trigger functions. > > Given the pg_constraint OID, the RI triggers could fetch the constraint > row and look at conkey[], confkey[], and the new operator oid[] array > to determine what they need to know. > > This would actually mean that they don't need pg_trigger.tgargs at all. > I am pretty strongly tempted to stop storing anything in tgargs for RI > triggers --- it's ugly, and updating the info during RENAME commands > is a pain in the rear. On the other hand removing it might break > client-side code that expects to look at tgargs to learn about FK > constraints. I'd personally think that pg_constraint is a lot easier to > work with, but there might be some code out there left over from way > back before pg_constraint existed --- anyone know of any such issue? I'd say we probably want to keep the tgargs info for at least a version or two after changing the implementation. Getting rid of using the args info sounds like a good idea. One side question is what should we do about the places in the current system where it checks for the key sets being empty? AFAIK, we still don't actually support letting you define a constraint that way, and I haven't heard any complaints about that, and I'm not even sure if that actually made it into the spec proper.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > I assume you're speaking of the version where we just change the > constraints to use statements with the OPERATOR() syntax and potential > casts rather than the discussion at the end about changing the pk checks > to avoid planning entirely? Yeah, we might get around to doing that someday but I'm not excited about it right now. (I'm mainly doing this because it fits in with the operator-family work I've been doing --- that also got rid of some unsupportable assumptions about operators being named "=" ...) > I'd say we probably want to keep the tgargs info for at least a version or > two after changing the implementation. Getting rid of using the args info > sounds like a good idea. We whack the catalogs around in incompatible ways in every release. I'm willing to keep filling tgargs if someone can point to a real use-case, but not just because there might be code out there somewhere using it. > One side question is what should we do about the > places in the current system where it checks for the key sets being empty? I don't see that this affects that either way. I can't quite imagine what the semantics would be, though --- there's no such thing as a unique constraint with no columns, so how can there be an RI constraint with none? regards, tom lane
On Sat, 10 Feb 2007, Tom Lane wrote: > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > > One side question is what should we do about the > > places in the current system where it checks for the key sets being empty? > > I don't see that this affects that either way. I can't quite imagine > what the semantics would be, though --- there's no such thing as a > unique constraint with no columns, so how can there be an RI constraint > with none? Well, the code currently has checks with comments based on SQL3 text AFAICT. /* ---------- * SQL3 11.9 <referential constraint definition> * General rules 2) a): * If Rf and Rt are empty (no columns to compare given) * constraint is true if0 < (SELECT COUNT(*) FROM T) * * Note: The special case that no columns are given cannot * occur up to now in Postgres, it's just there for * future enhancements. * ---------- */ The reason I was wondering is that it uses tgnargs == 4 as the check, and if we change the meanings of tgnargs, we'd need to change the check. Personally, I think we should probably just pull out the special case for now, but thought it should be brought up.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > The reason I was wondering is that it uses tgnargs == 4 as the check, and > if we change the meanings of tgnargs, we'd need to change the check. Sure, it'd be looking for a zero-length conkeys array instead. regards, tom lane
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > On Fri, 9 Feb 2007, Tom Lane wrote: >> I am pretty strongly tempted to stop storing anything in tgargs for RI >> triggers --- it's ugly, and updating the info during RENAME commands >> is a pain in the rear. > I'd say we probably want to keep the tgargs info for at least a version or > two after changing the implementation. Getting rid of using the args info > sounds like a good idea. After digging around in the code for awhile I realized that there's a potentially bigger backwards-compatibility issue here: if we make the RI triggers dependent on finding a pg_constraint entry, then foreign key constraints loaded from dumps from pre-7.3 databases will no longer work. Those dumps just contain "CREATE CONSTRAINT TRIGGER" commands which will not provide enough information. We can make the triggers throw errors suggesting that the user drop the triggers and perform ALTER TABLE ADD CONSTRAINT. Is that enough, or do we need to try harder? It would probably be possible to teach pg_dump to cons up ADD CONSTRAINT commands when dumping from an old server, but I think it would be a lot of work (certainly we punted on that idea back in the 7.3 devel cycle) and I'm not sure there are enough people running pre-7.3 PG for it to be worth the effort to provide an automated solution. Comments? regards, tom lane
> After digging around in the code for awhile I realized that there's a > potentially bigger backwards-compatibility issue here: if we make the > RI triggers dependent on finding a pg_constraint entry, then foreign > key constraints loaded from dumps from pre-7.3 databases will no longer > work. Those dumps just contain "CREATE CONSTRAINT TRIGGER" commands > which will not provide enough information. We can make the triggers > throw errors suggesting that the user drop the triggers and perform > ALTER TABLE ADD CONSTRAINT. Is that enough, or do we need to try > harder? I think it is reasonable to expect that we can not support 7.3 dumps in that manner considering we are talking about 8.3 ;). We can't be backward compatible forever. Further in their right mind is trying to do a 24x7 shop on 7.3. They could always dump to 8.1 and then to 8.3. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
"Joshua D. Drake" <jd@commandprompt.com> writes: >> key constraints loaded from dumps from pre-7.3 databases will no longer >> work. Those dumps just contain "CREATE CONSTRAINT TRIGGER" commands >> which will not provide enough information. We can make the triggers >> throw errors suggesting that the user drop the triggers and perform >> ALTER TABLE ADD CONSTRAINT. Is that enough, or do we need to try >> harder? > Further in their right mind is trying to do a 24x7 shop on 7.3. They > could always dump to 8.1 and then to 8.3. Actually that wouldn't help: you'd still have a CREATE CONSTRAINT TRIGGER -based foreign key. The only thing that'd really fix it is having used the old contrib "adddepend" utility at some point along the line. Since we never forced people to do that, it's fairly likely that some never did. [Thinks for a bit...] It would still work to run adddepend over the schema even after having loaded it into 8.3, assuming that adddepend hasn't suffered bit-rot. We dropped it from contrib because no one was maintaining it anymore, but AFAIR there was no evidence that it's actively broken. So maybe we can just point to that for anyone who comes along with an upgrade problem. regards, tom lane
I wrote: >> * Add an oid[] column to pg_constraint that stores the equality operator >> OIDs for a foreign-key constraint, in the same column order as conkey[] >> and confkey[]. It turns out this isn't sufficient: ri_Check_Pk_Match() wants to generate PK = PK checks, and the PK = FK operator isn't the right one for that. The information I suggested adding to pg_constraint isn't enough to let it find out which operator is the right one. We could handle this in a couple of ways: 1. Add yet another column with PK=PK operator OIDs to pg_constraint. 2. Add a column with the underlying PK index's OID to pg_constraint, and expect ri_Check_Pk_Match to dredge the info from that. This is probably possible, but not exactly trivial because of which-column-is-which considerations. 3. Leave pg_constraint alone and expect ri_Check_Pk_Match to look in pg_depend to find out the underlying PK index, then proceed as in #2. From an efficiency standpoint #1 seems the best, and yet it seems a bit ugly. Not that the others aren't. Comments, other ideas? regards, tom lane
On Mon, 12 Feb 2007, Tom Lane wrote: > I wrote: > >> * Add an oid[] column to pg_constraint that stores the equality operator > >> OIDs for a foreign-key constraint, in the same column order as conkey[] > >> and confkey[]. > > It turns out this isn't sufficient: ri_Check_Pk_Match() wants to > generate PK = PK checks, and the PK = FK operator isn't the right one > for that. Ugh, right, for modifications of the pk side with no action to make sure there isn't a new row with that key. > The information I suggested adding to pg_constraint isn't enough to let > it find out which operator is the right one. > > We could handle this in a couple of ways: > > 1. Add yet another column with PK=PK operator OIDs to pg_constraint. > > 2. Add a column with the underlying PK index's OID to pg_constraint, and > expect ri_Check_Pk_Match to dredge the info from that. This is probably > possible, but not exactly trivial because of which-column-is-which > considerations. > > 3. Leave pg_constraint alone and expect ri_Check_Pk_Match to look in > pg_depend to find out the underlying PK index, then proceed as in #2. > > >From an efficiency standpoint #1 seems the best, and yet it seems a bit > ugly. Not that the others aren't. Comments, other ideas? I think #1, while ugly, is probably less ugly than the others, although I guess it means even more work if the underlying type of the column is changed. Is there any reason to think that in the future we might need more such things for some constraints?
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > I think #1, while ugly, is probably less ugly than the others, although I > guess it means even more work if the underlying type of the column is > changed. Oy, I hadn't thought of that. [ considers... ] I *think* that it'll work without special code, because ALTER COLUMN TYPE drops and recreates the constraints, but definitely something to test. Thanks for the reminder. > Is there any reason to think that in the future we might need more such > things for some constraints? Um ... more such which, exactly? And do you have something in mind if the answer is "yes"? regards, tom lane
On Saturday 10 February 2007 13:59, Tom Lane wrote: > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > > I'd say we probably want to keep the tgargs info for at least a version > > or two after changing the implementation. Getting rid of using the args > > info sounds like a good idea. > > We whack the catalogs around in incompatible ways in every release. I'm > willing to keep filling tgargs if someone can point to a real use-case, > but not just because there might be code out there somewhere using it. > I'm pretty sure we use tgargs in phppgadmin, though exactly why escapes me... I am thinking it would be to display a triggers arguments? Assuming we can still get all the same information one way or another I suppose we can update our code, though right now that code is pretty well intermixed with the normal function code iirc (I don't think it has been updated at all in the 8.x series, so my memory is pretty fuzzy on this), so if I could avoid changing it I would... -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat wrote: > On Saturday 10 February 2007 13:59, Tom Lane wrote: >> Stephan Szabo <sszabo@megazone.bigpanda.com> writes: >>> I'd say we probably want to keep the tgargs info for at least a version >>> or two after changing the implementation. Getting rid of using the args >>> info sounds like a good idea. >> We whack the catalogs around in incompatible ways in every release. I'm >> willing to keep filling tgargs if someone can point to a real use-case, >> but not just because there might be code out there somewhere using it. >> > > I'm pretty sure we use tgargs in phppgadmin, though exactly why escapes me... > I am thinking it would be to display a triggers arguments? Assuming we can > still get all the same information one way or another I suppose we can update > our code, though right now that code is pretty well intermixed with the > normal function code iirc (I don't think it has been updated at all in the > 8.x series, so my memory is pretty fuzzy on this), so if I could avoid > changing it I would... As far as I understood the proposal, tgargs wouldn't go away, it would just not be populated for RI triggers. So as long as pgadmin3 doesn't use tgargs to get information about constraints, pgadmin would be fine I believe... greetings, Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes: > As far as I understood the proposal, tgargs wouldn't go away, it would > just not be populated for RI triggers. Yes, of course. I wasn't suggesting that we take away the ability to pass arguments to triggers in general. regards, tom lane
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > On Mon, 12 Feb 2007, Tom Lane wrote: >> It turns out this isn't sufficient: ri_Check_Pk_Match() wants to >> generate PK = PK checks, and the PK = FK operator isn't the right one >> for that. > Ugh, right, for modifications of the pk side with no action to make sure > there isn't a new row with that key. When I looked closer I found out that ri_AttributesEqual() is applied to pairs of FK values as well as pairs of PK values. The old coding was looking up the default btree operator for the types, which is close but not close enough, if we want to allow for the possibility of unique indexes built on non-default operator classes. So I ended up with three new columns in pg_constraint --- PK=FK, PK=PK, FK=FK operator OIDs. Anyway it's all done and committed. It strikes me BTW that the RI mechanism was vulnerable to the same sort of search_path-based shenanigans that Peter is worried about: any RI checks triggered within a SECURITY DEFINER function could have been subverted by substituting a different "=" operator. regards, tom lane