Thread: complex referential integrity constraints
So, I have the following problem. Suppose you have two kinds of animals, sheep and wolves. Since they have very similar properties, you create a single table to hold both kinds of animals, and an animal_type table to specify the type of each animal: CREATE TABLE animal_type ( id integer not null, name varchar(80) not null, primary key (id) ); INSERT INTO animal_type VALUES (1, 'Sheep'); INSERT INTO animal_type VALUES (2, 'Wolf'); CREATE TABLE animal ( id serial, type_id integer not null references animal_type (id), name varchar(80) not null, age integer not null, weight_in_pounds integer not null, primary key (id) ); The animal_type table is more or less written in stone, but the animal table will be updated frequently. Now, let's suppose that we want to keep track of all of the cases where one animal is mauled by another animal: CREATE TABLE mauling ( id serial, attacker_id integer not null references animal (id), victim_id integer not null references animal (id), attack_time timestamp not null, primary key (id) ); The problem with this is that I have a very unsettled feeling about the foreign key constraints on this table. The victim_id constraint is fine, but the attacker_id constraint is really inadequate, because the attacker CAN NEVER BE A SHEEP. I really want a way to write a constraint that says that the attacker must be an animal, but specifically, a wolf. It would be really nice to be able to write: FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id) Or: CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2; -- and then FOREIGN KEY (attacker_id) REFERENCES INDEX wolves ...but that's entirely speculative syntax. I don't think there's any easy way to do this. (Please tell me I'm wrong.) The problem really comes in when people start modifying the animal table. Every once in a while we have a case where we record something as a wolf, but it turns out to have been a sheep in wolf's clothing. In this case, we want to do something like this: UPDATE animal SET type_id = 1 WHERE id = 572; HOWEVER, this operation MUST NOT be allowed if it turns out there is a row in the mauling table where attacker_id = 572, because that would violate my integrity constraints that says that sheep do not maul. Any suggestions? I've thought about creating rules or triggers to check the conditions, but I'm scared that this could either (a) get really complicated when there are a lot more tables and constraints involved or (b) introduce race conditions. Thanks, ...Robert
"Robert Haas" <Robert.Haas@dyntek.com> writes: > ... The problem with this is that I have a very unsettled feeling about the > foreign key constraints on this table. The victim_id constraint is > fine, but the attacker_id constraint is really inadequate, because the > attacker CAN NEVER BE A SHEEP. I think the only way to do this in SQL is to denormalize a bit. If you copy the animal_type field into the maulings table then you can apply a check constraint there. So FOREIGN KEY (attacker_id, attacker_type_id) REFERENCES animal (id, type_id) ON UPDATE CASCADE CHECK (attacker_type_id != 'sheep') The thing that's still a bit annoying is that you'd have to hard-wire the numerical code for SHEEP into the check constraint; you couldn't really write it symbolically as I did above. Perhaps you should further denormalize and keep real animal type names not codes in the animal type table, thus CREATE TABLE animal_type ( name varchar(80) primary key ); CREATE TABLE animal ( id serial, type varchar(80) references animal_type, ... ); whereupon the maulings table also has real type names not IDs. No doubt some relational-theory maven will come along and slap your wrist for doing this, but he should first explain how to do it without denormalization... Also, I think what you've really done here is created a "poor man's enum". There will probably be real enum types in PG 8.3, which would offer a more efficient solution to the problem of representing animal types. regards, tom lane
On Fri, Feb 16, 2007 at 09:58:56AM -0500, Robert Haas wrote: > So, I have the following problem. > > Suppose you have two kinds of animals, sheep and wolves. Since they > have very similar properties, you create a single table to hold both > kinds of animals, and an animal_type table to specify the type of each > animal: > > CREATE TABLE animal_type ( > id integer not null, > name varchar(80) not null, > primary key (id) > ); > INSERT INTO animal_type VALUES (1, 'Sheep'); > INSERT INTO animal_type VALUES (2, 'Wolf'); > > CREATE TABLE animal ( > id serial, > type_id integer not null references animal_type (id), > name varchar(80) not null, > age integer not null, > weight_in_pounds integer not null, > primary key (id) > ); > > The animal_type table is more or less written in stone, but the animal > table will be updated frequently. Now, let's suppose that we want to > keep track of all of the cases where one animal is mauled by another > animal: > > CREATE TABLE mauling ( > id serial, > attacker_id integer not null references animal (id), > victim_id integer not null references animal (id), > attack_time timestamp not null, > primary key (id) > ); > > The problem with this is that I have a very unsettled feeling about the > foreign key constraints on this table. The victim_id constraint is > fine, but the attacker_id constraint is really inadequate, because the > attacker CAN NEVER BE A SHEEP. I really want a way to write a > constraint that says that the attacker must be an animal, but > specifically, a wolf. > > It would be really nice to be able to write: > > FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id) > > Or: > > CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2; > -- and then > FOREIGN KEY (attacker_id) REFERENCES INDEX wolves > > ...but that's entirely speculative syntax. I don't think there's any > easy way to do this. (Please tell me I'm wrong.) > > The problem really comes in when people start modifying the animal > table. Every once in a while we have a case where we record something > as a wolf, but it turns out to have been a sheep in wolf's clothing. In > this case, we want to do something like this: > > UPDATE animal SET type_id = 1 WHERE id = 572; > > HOWEVER, this operation MUST NOT be allowed if it turns out there is a > row in the mauling table where attacker_id = 572, because that would > violate my integrity constraints that says that sheep do not maul. > > Any suggestions? I've thought about creating rules or triggers to check > the conditions, but I'm scared that this could either (a) get really > complicated when there are a lot more tables and constraints involved or > (b) introduce race conditions. Why don't you add a field in animal_types that is boolean mauler. Then you can add a trigger on the mauling table to raise an error when the attacker_id is an animal type mauler. --elein
Robert Haas wrote: > So, I have the following problem. > > Suppose you have two kinds of animals, sheep and wolves. Since they > have very similar properties, you create a single table to hold both > kinds of animals, and an animal_type table to specify the type of each > animal: > > CREATE TABLE animal_type ( > id integer not null, > name varchar(80) not null, > primary key (id) > ); > INSERT INTO animal_type VALUES (1, 'Sheep'); > INSERT INTO animal_type VALUES (2, 'Wolf'); > > CREATE TABLE animal ( > id serial, > type_id integer not null references animal_type (id), > name varchar(80) not null, > age integer not null, > weight_in_pounds integer not null, > primary key (id) > ); > > The animal_type table is more or less written in stone, but the animal > table will be updated frequently. Now, let's suppose that we want to > keep track of all of the cases where one animal is mauled by another > animal: I kind of get the feeling that you'd want it like this: CREATE TABLE predator ( ... ) INHERITS animal; And then put your foreign key constraints from predator to mauling. You may want to be more accurate about what kind of animals sheep are as well. I haven't really given this much thought though, I'm just quickly reading my mail before starting work ;) -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
>-----Original Message----- >From: pgsql-general-owner@postgresql.org >[mailto:pgsql-general-owner@postgresql.org] On Behalf Of elein >Sent: zondag 18 februari 2007 23:16 >To: Robert Haas >Cc: pgsql-general@postgresql.org >Subject: Re: [GENERAL] complex referential integrity constraints > >On Fri, Feb 16, 2007 at 09:58:56AM -0500, Robert Haas wrote: >> So, I have the following problem. >> >> Suppose you have two kinds of animals, sheep and wolves. Since they >> have very similar properties, you create a single table to hold both >> kinds of animals, and an animal_type table to specify the >type of each >> animal: >> >> CREATE TABLE animal_type ( >> id integer not null, >> name varchar(80) not null, >> primary key (id) >> ); >> INSERT INTO animal_type VALUES (1, 'Sheep'); INSERT INTO animal_type >> VALUES (2, 'Wolf'); >> >> CREATE TABLE animal ( >> id serial, >> type_id integer not null references >animal_type (id), >> name varchar(80) not null, >> age integer not null, >> weight_in_pounds integer not null, >> primary key (id) >> ); >> >> The animal_type table is more or less written in stone, but >the animal >> table will be updated frequently. Now, let's suppose that >we want to >> keep track of all of the cases where one animal is mauled by another >> animal: >> >> CREATE TABLE mauling ( >> id serial, >> attacker_id integer not null references animal (id), >> victim_id integer not null references animal (id), >> attack_time timestamp not null, >> primary key (id) >> ); >> >> The problem with this is that I have a very unsettled feeling about >> the foreign key constraints on this table. The victim_id constraint >> is fine, but the attacker_id constraint is really >inadequate, because >> the attacker CAN NEVER BE A SHEEP. I really want a way to write a >> constraint that says that the attacker must be an animal, but >> specifically, a wolf. >> >> It would be really nice to be able to write: >> >> FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id) >> >> Or: >> >> CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2; >> -- and then >> FOREIGN KEY (attacker_id) REFERENCES INDEX wolves >> >> ...but that's entirely speculative syntax. I don't think >there's any >> easy way to do this. (Please tell me I'm wrong.) >> >> The problem really comes in when people start modifying the animal >> table. Every once in a while we have a case where we record >something >> as a wolf, but it turns out to have been a sheep in wolf's >clothing. >> In this case, we want to do something like this: >> >> UPDATE animal SET type_id = 1 WHERE id = 572; >> >> HOWEVER, this operation MUST NOT be allowed if it turns out >there is a >> row in the mauling table where attacker_id = 572, because that would >> violate my integrity constraints that says that sheep do not maul. >> >> Any suggestions? I've thought about creating rules or triggers to >> check the conditions, but I'm scared that this could either (a) get >> really complicated when there are a lot more tables and constraints >> involved or >> (b) introduce race conditions. > >Why don't you add a field in animal_types that is boolean mauler. >Then you can add a trigger on the mauling table to raise an >error when the attacker_id is an animal type mauler. This is only partial. You need a lot more triggers to guarentee the constraints are enforced. Precisely you need to validate: * mauling on insert/update of attacker_id * animal on update of type_id * animal_type on update of your property Of course you need to think about the MVCC model, such that: Transaction 1 executes INSERT INTO mauling VALUES ('someattacker'), Transaction 2 executes UPDATE animal_type SET mauler = false WHERE name = 'someattacker', such that both transaction happen in parallel. This is perfectly possible and will make it possible to violate the constraint, UNLESS locking of the tuples is done correctly. These contraints are not trivial to implement (unfortunally). It would be great if they where. - Joris
On Fri, Feb 16, 2007 at 09:58:56AM -0500, Robert Haas wrote: > So, I have the following problem. > > Suppose you have two kinds of animals, sheep and wolves. Since they > have very similar properties, you create a single table to hold both > kinds of animals, and an animal_type table to specify the type of each > animal: > > CREATE TABLE animal_type ( > id integer not null, > name varchar(80) not null, > primary key (id) > ); > INSERT INTO animal_type VALUES (1, 'Sheep'); > INSERT INTO animal_type VALUES (2, 'Wolf'); > > CREATE TABLE animal ( > id serial, > type_id integer not null references animal_type (id), > name varchar(80) not null, > age integer not null, > weight_in_pounds integer not null, > primary key (id) > ); > > The animal_type table is more or less written in stone, but the animal > table will be updated frequently. Now, let's suppose that we want to > keep track of all of the cases where one animal is mauled by another > animal: > > CREATE TABLE mauling ( > id serial, > attacker_id integer not null references animal (id), > victim_id integer not null references animal (id), > attack_time timestamp not null, > primary key (id) > ); > > The problem with this is that I have a very unsettled feeling about the > foreign key constraints on this table. The victim_id constraint is > fine, but the attacker_id constraint is really inadequate, because the > attacker CAN NEVER BE A SHEEP. I really want a way to write a > constraint that says that the attacker must be an animal, but > specifically, a wolf. > > It would be really nice to be able to write: > > FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id) > > Or: > > CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2; > -- and then > FOREIGN KEY (attacker_id) REFERENCES INDEX wolves > > ...but that's entirely speculative syntax. I don't think there's any > easy way to do this. (Please tell me I'm wrong.) > > The problem really comes in when people start modifying the animal > table. Every once in a while we have a case where we record something > as a wolf, but it turns out to have been a sheep in wolf's clothing. In > this case, we want to do something like this: > > UPDATE animal SET type_id = 1 WHERE id = 572; > > HOWEVER, this operation MUST NOT be allowed if it turns out there is a > row in the mauling table where attacker_id = 572, because that would > violate my integrity constraints that says that sheep do not maul. > > Any suggestions? I've thought about creating rules or triggers to check > the conditions, but I'm scared that this could either (a) get really > complicated when there are a lot more tables and constraints involved or > (b) introduce race conditions. > > Thanks, > > ...Robert I'd do something like this: CREATE TABLE animal_type ( animal_name TEXT PRIMARY KEY, CHECK(animal_name = trim(animal_name)) ); /* Only one of {Wolf,wolf} can be in the table. */ CREATE UNIQUE INDEX just_one_animal_name ON animal_type(LOWER(animal_name)); CREATE TABLE predator ( animal_name TEXT NOT NULL REFERENCES animal_type(animal_name) ON DELETE CASCADE, PRIMARY KEY(animal_name) ); CREATE TABLE prey ( animal_name TEXT NOT NULL REFERENCES animal_type(animal_name) ON DELETE CASCADE, PRIMARY KEY(animal_name) ); CREATE TABLE mauling ( id SERIAL PRIMARY KEY, attacker_id INTEGER NOT NULL REFERENCES predator (animal_type_id), victim_id INTEGER NOT NULL REFERENCES prey (animal_type_id), attack_time TIMESTAMP WITH TIME ZONE NOT NULL ); Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote!
> I'd do something like this: > > CREATE TABLE animal_type ( > animal_name TEXT PRIMARY KEY, > CHECK(animal_name = trim(animal_name)) > ); > > /* Only one of {Wolf,wolf} can be in the table. */ > > CREATE UNIQUE INDEX just_one_animal_name > ON animal_type(LOWER(animal_name)); > > CREATE TABLE predator ( > animal_name TEXT NOT NULL > REFERENCES animal_type(animal_name) > ON DELETE CASCADE, > PRIMARY KEY(animal_name) > ); > > CREATE TABLE prey ( > animal_name TEXT NOT NULL > REFERENCES animal_type(animal_name) > ON DELETE CASCADE, > PRIMARY KEY(animal_name) > ); > > CREATE TABLE mauling ( > id SERIAL PRIMARY KEY, > attacker_id INTEGER NOT NULL REFERENCES predator (animal_type_id), > victim_id INTEGER NOT NULL REFERENCES prey (animal_type_id), > attack_time TIMESTAMP WITH TIME ZONE NOT NULL > ); Just to add to David's idea, I would create two update-able views that joined animal to predator and another for animal to prey. This way, you only have to insert/update/delete from 1 update-able view rather than two tables. Of course, I am still waiting for the future version of postgresql that will handle this functionality seamlessly using table inheritance. :-) Regards, Richard Broersma Jr.
On Mon, Feb 19, 2007 at 10:52:51AM -0800, Richard Broersma Jr wrote: > > I'd do something like this: > > > > CREATE TABLE animal_type ( > > animal_name TEXT PRIMARY KEY, > > CHECK(animal_name = trim(animal_name)) > > ); > > > > /* Only one of {Wolf,wolf} can be in the table. */ > > > > CREATE UNIQUE INDEX just_one_animal_name > > ON animal_type(LOWER(animal_name)); > > > > CREATE TABLE predator ( > > animal_name TEXT NOT NULL > > REFERENCES animal_type(animal_name) > > ON DELETE CASCADE, > > PRIMARY KEY(animal_name) > > ); > > > > CREATE TABLE prey ( > > animal_name TEXT NOT NULL > > REFERENCES animal_type(animal_name) > > ON DELETE CASCADE, > > PRIMARY KEY(animal_name) > > ); > > > > CREATE TABLE mauling ( > > id SERIAL PRIMARY KEY, > > attacker_id INTEGER NOT NULL REFERENCES predator (animal_type_id), > > victim_id INTEGER NOT NULL REFERENCES prey (animal_type_id), > > attack_time TIMESTAMP WITH TIME ZONE NOT NULL > > ); > > Just to add to David's idea, I would create two update-able views > that joined animal to predator and another for animal to prey. This > way, you only have to insert/update/delete from 1 update-able view > rather than two tables. You could just do a rewrite RULE on predator and prey for each of INSERT, UPDATE and DELETE that has a DO INSTEAD action that writes to animal. This wouldn't handle COPY, though. > Of course, I am still waiting for the future version of postgresql > that will handle this functionality seamlessly using table > inheritance. :-) You mean writeable VIEWs? I think it would be nice to have some cases handled, but there are several kinds of VIEWs I can think of where the only sane way to write to them is to define the writing behavior on a case-by-case basis. Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote!
I partially agree: If people CAN do stupid things, they are 'clever' enough to find a way to actually do it. I've seen them destroy things, by just using a system in a way it was not intended. They effectively found a way to blow away the very thing that part was designed for. But indeed, it's a lot of work, especially if the number of tables that must be referenced increases. I'm a strong supporter for ensuring consistency. Postgres has what it takes to do the job, but it doesn't make my life a lot easier. But it seems to be as good as it gets today... Perhaps we should rather define a 'database' constraint in the order of: "For every mauling, the attacking animal must be of the attacker type" (in a computer understandable manner). From the set theory this should be possible without too much problems, However doing so efficiently might be slightly harder. This might be a fun project and useful for the TODO list. At least it makes it a lot easier (and maintanable) to enforce database-wide constraints. - Joris >-----Original Message----- >From: Robert Haas [mailto:Robert.Haas@dyntek.com] >Sent: woensdag 21 februari 2007 3:37 >To: Joris Dobbelsteen; elein >Cc: pgsql-general@postgresql.org >Subject: RE: [GENERAL] complex referential integrity constraints > >Yes, exactly. And while you might not care about all of those >(e.g. I care about the first two but am not worried about the >third one because I'm the only one who will ever update that >table), writing multiple triggers to enforce each constraint >of this type quickly gets old if there are even a few of them. > It is exponentially harder to write a constraint of this type >than it is to write a simple foreign key constraint. > >...Robert > >-----Original Message----- >From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl] >Sent: Monday, February 19, 2007 5:59 AM >To: elein; Robert Haas >Cc: pgsql-general@postgresql.org >Subject: RE: [GENERAL] complex referential integrity constraints > >>Why don't you add a field in animal_types that is boolean mauler. >>Then you can add a trigger on the mauling table to raise an >error when >>the attacker_id is an animal type mauler. > >This is only partial. You need a lot more triggers to >guarentee the constraints are enforced. >Precisely you need to validate: >* mauling on insert/update of attacker_id >* animal on update of type_id >* animal_type on update of your property > >Of course you need to think about the MVCC model, such that: >Transaction 1 executes >INSERT INTO mauling VALUES ('someattacker'), Transaction 2 >executes UPDATE animal_type SET mauler = false WHERE name = >'someattacker', such that both transaction happen in parallel. > >This is perfectly possible and will make it possible to >violate the constraint, UNLESS locking of the tuples is done correctly. > >These contraints are not trivial to implement (unfortunally). >It would be great if they where. > >- Joris >
>-----Original Message----- >From: Robert Haas [mailto:Robert.Haas@dyntek.com] >Sent: donderdag 22 februari 2007 15:58 >To: Joris Dobbelsteen; elein >Cc: pgsql-general@postgresql.org >Subject: RE: [GENERAL] complex referential integrity constraints > >The ability to make a foreign key reference a specific partial >unique index (rather than just a set of columns that have a >unique index) would solve many problems of this type. As >another example, you might have a table where one of the >columns is "is_deleted boolean not null". By creating a >partial unique index on the primary key of that table "WHERE >NOT is_deleted" and then pointing a foreign key at it, you >could enforce that each row in the child table references a >parent who isn't deleted. > >However, this would break down when there's more one than >intermediate step involved. For example, if you have: > >CREATE TABLE animal_type ( > id serial, > name varchar(60) not null, > is_attacker boolean not null, > primary key (id) >); > >CREATE TABLE animal ( > id serial, > type_id integer not null references animal_type (id), > name varchar(60) not null, > primary key (id) >); > >CREATE TABLE mauling ( > id serial, > attacker_id integer not null references animal (id), > victim_id integer not null references animal (id), > attack_time timestamp with time zone not null, > primary key (id) >); > >It would be easy to enforce the constraint that the attacker >must be an animal of some specific type, but difficult to >enforce the constraint that the attacker must be an animal >whose type, in turn, has a true value for is_attacker. Even worse, I don't you can guarentee that this constraint is enforced at all times. That means, not if you are using triggers. The only option seems using foreign keys and put in a lot of redundant data. >The best idea that I can think of right now to handle multiple >levels of tables is to allow FOREIGN KEY constraints to >references a VIEW, rather than a table. Then you could say: > >CREATE VIEW attackers AS >SELECT a.id FROM animal a, animal_type t WHERE a.type_id = >t.id AND t.attacker; > >...and then FOREIGN KEY (attacker_id) REFERENCES attackers (id). Perhaps "Alban Hertroys" idea solves this problem a little easier. However it lacks the possibility to make quick changes later on (predator is a predator, or you are screwed, no second change). This is not acceptable in problems where such things are decided after object creation or might be changed later on. >This syntax would solve a number of other problems as well, >such as requiring that some record in table A has a parent >either in table P or in table Q. However, I think this would >probably require implementing some kind of materialized view >so that you could actually build an index on the view, and >that opens up a whole new can of worms, because it's not very >difficult to define a view that is costly to update incrementally. You don't need a materialized view to put a database to its knees. You can already do that today, with ease. I wouldn't worry too much about that. If you mean from a syntax I suggested I do not believe it's the 'right' way to define an (materialized) view, rather use a trigger-like style of system. There are some other issues, however. >The problem is really that there is a pretty large gap between >writing a foreign key constraint, which is trivial, and >enforcing a constraint using triggers, which is quite a bit >more complex (and therefore, easy to screw up), because the >foreign key automatically handles all the cases (insert into >child table, update of child table, update of parent table, >delete from parent table) whereas with triggers you have to >address each of those cases individually. Exactly, that is why I suggested such a system. If its not easy to enforce constraints, it will never happen properly. Especially if problems get more complex. >Unfortunately, >something tells me that implementing a more powerful system >for foreign key constraints is a non-trivial project, however >useful it would be. >Still, I'd love to see it in the TODO file, too. Me too, I get the impression that SQL is too weak for most constraints. - Joris >...Robert > >-----Original Message----- >From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl] >Sent: Thursday, February 22, 2007 8:03 AM >To: Robert Haas; elein >Cc: pgsql-general@postgresql.org >Subject: RE: [GENERAL] complex referential integrity constraints > >I partially agree: >If people CAN do stupid things, they are 'clever' enough to >find a way to actually do it. I've seen them destroy things, >by just using a system in a way it was not intended. They >effectively found a way to blow away the very thing that part >was designed for. >But indeed, it's a lot of work, especially if the number of >tables that must be referenced increases. I'm a strong >supporter for ensuring consistency. Postgres has what it takes >to do the job, but it doesn't make my life a lot easier. But >it seems to be as good as it gets today... > >Perhaps we should rather define a 'database' constraint in the >order of: >"For every mauling, the attacking animal must be of the attacker type" >(in a computer understandable manner). From the set theory >this should be possible without too much problems, However >doing so efficiently might be slightly harder. >This might be a fun project and useful for the TODO list. At >least it makes it a lot easier (and maintanable) to enforce >database-wide constraints. > >- Joris > >>-----Original Message----- >>From: Robert Haas [mailto:Robert.Haas@dyntek.com] >>Sent: woensdag 21 februari 2007 3:37 >>To: Joris Dobbelsteen; elein >>Cc: pgsql-general@postgresql.org >>Subject: RE: [GENERAL] complex referential integrity constraints >> >>Yes, exactly. And while you might not care about all of >those (e.g. I >>care about the first two but am not worried about the third >one because >>I'm the only one who will ever update that table), writing multiple >>triggers to enforce each constraint of this type quickly gets old if >>there are even a few of them. >> It is exponentially harder to write a constraint of this >type than it >>is to write a simple foreign key constraint. >> >>...Robert >> >>-----Original Message----- >>From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl] >>Sent: Monday, February 19, 2007 5:59 AM >>To: elein; Robert Haas >>Cc: pgsql-general@postgresql.org >>Subject: RE: [GENERAL] complex referential integrity constraints >> >>>Why don't you add a field in animal_types that is boolean mauler. >>>Then you can add a trigger on the mauling table to raise an >>error when >>>the attacker_id is an animal type mauler. >> >>This is only partial. You need a lot more triggers to guarentee the >>constraints are enforced. >>Precisely you need to validate: >>* mauling on insert/update of attacker_id >>* animal on update of type_id >>* animal_type on update of your property >> >>Of course you need to think about the MVCC model, such that: >>Transaction 1 executes >>INSERT INTO mauling VALUES ('someattacker'), Transaction 2 executes >>UPDATE animal_type SET mauler = false WHERE name = >'someattacker', such >>that both transaction happen in parallel. >> >>This is perfectly possible and will make it possible to violate the >>constraint, UNLESS locking of the tuples is done correctly. >> >>These contraints are not trivial to implement (unfortunally). >>It would be great if they where. >> >>- Joris >> >
On Thu, Feb 22, 2007 at 05:28:35PM +0100, Joris Dobbelsteen wrote: > Even worse, I don't you can guarentee that this constraint is enforced > at all times. That means, not if you are using triggers. > The only option seems using foreign keys and put in a lot of redundant > data. Err, foreign keys are implemented using triggers, so this statement is self-contradictary. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
>-----Original Message----- >From: pgsql-general-owner@postgresql.org >[mailto:pgsql-general-owner@postgresql.org] On Behalf Of >Martijn van Oosterhout >Sent: donderdag 22 februari 2007 18:17 >To: Joris Dobbelsteen >Cc: Robert Haas; pgsql-general@postgresql.org >Subject: Re: [GENERAL] complex referential integrity constraints > >On Thu, Feb 22, 2007 at 05:28:35PM +0100, Joris Dobbelsteen wrote: >> Even worse, I don't you can guarentee that this constraint >is enforced >> at all times. That means, not if you are using triggers. >> The only option seems using foreign keys and put in a lot of >redundant >> data. > >Err, foreign keys are implemented using triggers, so this >statement is self-contradictary. Are you really sure they are executed under the same visibility rules? - Joris
On Thu, 22 Feb 2007, Joris Dobbelsteen wrote: > >-----Original Message----- > >From: pgsql-general-owner@postgresql.org > >[mailto:pgsql-general-owner@postgresql.org] On Behalf Of > >Martijn van Oosterhout > >Sent: donderdag 22 februari 2007 18:17 > >To: Joris Dobbelsteen > >Cc: Robert Haas; pgsql-general@postgresql.org > >Subject: Re: [GENERAL] complex referential integrity constraints > > > >On Thu, Feb 22, 2007 at 05:28:35PM +0100, Joris Dobbelsteen wrote: > >> Even worse, I don't you can guarentee that this constraint > >is enforced > >> at all times. That means, not if you are using triggers. > >> The only option seems using foreign keys and put in a lot of > >redundant > >> data. > > > >Err, foreign keys are implemented using triggers, so this > >statement is self-contradictary. > > Are you really sure they are executed under the same visibility rules? IIRC, the ri triggers use calls that you aren't able to get at in triggers written in any of the PLs, but I think you should be able to replicate the feat in a trigger written in C.
On Thu, Feb 22, 2007 at 06:51:49PM +0100, Joris Dobbelsteen wrote: > >Err, foreign keys are implemented using triggers, so this > >statement is self-contradictary. > > Are you really sure they are executed under the same visibility rules? Reasonably. I have no idea what visibility rules would make any difference at all. AIUI a foreign key just takes a shared lock on the referenced row and all the magic of MVCC makes sure the row exists when the transaction completes. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
The idea here is that a wolf can attack a sheep, or a wolf can attack another wolf, but sheep can't attack anything. I suppose I could list each wolf in both the predator and prey tables, but that seems a bit duplicative (and causes other problems). ...Robert -----Original Message----- From: David Fetter [mailto:david@fetter.org] Sent: Monday, February 19, 2007 1:04 PM To: Robert Haas Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] complex referential integrity constraints On Fri, Feb 16, 2007 at 09:58:56AM -0500, Robert Haas wrote: > So, I have the following problem. > > Suppose you have two kinds of animals, sheep and wolves. Since they > have very similar properties, you create a single table to hold both > kinds of animals, and an animal_type table to specify the type of each > animal: > > CREATE TABLE animal_type ( > id integer not null, > name varchar(80) not null, > primary key (id) > ); > INSERT INTO animal_type VALUES (1, 'Sheep'); > INSERT INTO animal_type VALUES (2, 'Wolf'); > > CREATE TABLE animal ( > id serial, > type_id integer not null references animal_type (id), > name varchar(80) not null, > age integer not null, > weight_in_pounds integer not null, > primary key (id) > ); > > The animal_type table is more or less written in stone, but the animal > table will be updated frequently. Now, let's suppose that we want to > keep track of all of the cases where one animal is mauled by another > animal: > > CREATE TABLE mauling ( > id serial, > attacker_id integer not null references animal (id), > victim_id integer not null references animal (id), > attack_time timestamp not null, > primary key (id) > ); > > The problem with this is that I have a very unsettled feeling about the > foreign key constraints on this table. The victim_id constraint is > fine, but the attacker_id constraint is really inadequate, because the > attacker CAN NEVER BE A SHEEP. I really want a way to write a > constraint that says that the attacker must be an animal, but > specifically, a wolf. > > It would be really nice to be able to write: > > FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id) > > Or: > > CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2; > -- and then > FOREIGN KEY (attacker_id) REFERENCES INDEX wolves > > ...but that's entirely speculative syntax. I don't think there's any > easy way to do this. (Please tell me I'm wrong.) > > The problem really comes in when people start modifying the animal > table. Every once in a while we have a case where we record something > as a wolf, but it turns out to have been a sheep in wolf's clothing. In > this case, we want to do something like this: > > UPDATE animal SET type_id = 1 WHERE id = 572; > > HOWEVER, this operation MUST NOT be allowed if it turns out there is a > row in the mauling table where attacker_id = 572, because that would > violate my integrity constraints that says that sheep do not maul. > > Any suggestions? I've thought about creating rules or triggers to check > the conditions, but I'm scared that this could either (a) get really > complicated when there are a lot more tables and constraints involved or > (b) introduce race conditions. > > Thanks, > > ...Robert I'd do something like this: CREATE TABLE animal_type ( animal_name TEXT PRIMARY KEY, CHECK(animal_name = trim(animal_name)) ); /* Only one of {Wolf,wolf} can be in the table. */ CREATE UNIQUE INDEX just_one_animal_name ON animal_type(LOWER(animal_name)); CREATE TABLE predator ( animal_name TEXT NOT NULL REFERENCES animal_type(animal_name) ON DELETE CASCADE, PRIMARY KEY(animal_name) ); CREATE TABLE prey ( animal_name TEXT NOT NULL REFERENCES animal_type(animal_name) ON DELETE CASCADE, PRIMARY KEY(animal_name) ); CREATE TABLE mauling ( id SERIAL PRIMARY KEY, attacker_id INTEGER NOT NULL REFERENCES predator (animal_type_id), victim_id INTEGER NOT NULL REFERENCES prey (animal_type_id), attack_time TIMESTAMP WITH TIME ZONE NOT NULL ); Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote!
The ability to make a foreign key reference a specific partial unique index (rather than just a set of columns that have a unique index) would solve many problems of this type. As another example, you might have a table where one of the columns is "is_deleted boolean not null". By creating a partial unique index on the primary key of that table "WHERE NOT is_deleted" and then pointing a foreign key at it, you could enforce that each row in the child table references a parent who isn't deleted. However, this would break down when there's more one than intermediate step involved. For example, if you have: CREATE TABLE animal_type ( id serial, name varchar(60) not null, is_attacker boolean not null, primary key (id) ); CREATE TABLE animal ( id serial, type_id integer not null references animal_type (id), name varchar(60) not null, primary key (id) ); CREATE TABLE mauling ( id serial, attacker_id integer not null references animal (id), victim_id integer not null references animal (id), attack_time timestamp with time zone not null, primary key (id) ); It would be easy to enforce the constraint that the attacker must be an animal of some specific type, but difficult to enforce the constraint that the attacker must be an animal whose type, in turn, has a true value for is_attacker. The best idea that I can think of right now to handle multiple levels of tables is to allow FOREIGN KEY constraints to references a VIEW, rather than a table. Then you could say: CREATE VIEW attackers AS SELECT a.id FROM animal a, animal_type t WHERE a.type_id = t.id AND t.attacker; ...and then FOREIGN KEY (attacker_id) REFERENCES attackers (id). This syntax would solve a number of other problems as well, such as requiring that some record in table A has a parent either in table P or in table Q. However, I think this would probably require implementing some kind of materialized view so that you could actually build an index on the view, and that opens up a whole new can of worms, because it's not very difficult to define a view that is costly to update incrementally. The problem is really that there is a pretty large gap between writing a foreign key constraint, which is trivial, and enforcing a constraint using triggers, which is quite a bit more complex (and therefore, easy to screw up), because the foreign key automatically handles all the cases (insert into child table, update of child table, update of parent table, delete from parent table) whereas with triggers you have to address each of those cases individually. Unfortunately, something tells me that implementing a more powerful system for foreign key constraints is a non-trivial project, however useful it would be. Still, I'd love to see it in the TODO file, too. ...Robert -----Original Message----- From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl] Sent: Thursday, February 22, 2007 8:03 AM To: Robert Haas; elein Cc: pgsql-general@postgresql.org Subject: RE: [GENERAL] complex referential integrity constraints I partially agree: If people CAN do stupid things, they are 'clever' enough to find a way to actually do it. I've seen them destroy things, by just using a system in a way it was not intended. They effectively found a way to blow away the very thing that part was designed for. But indeed, it's a lot of work, especially if the number of tables that must be referenced increases. I'm a strong supporter for ensuring consistency. Postgres has what it takes to do the job, but it doesn't make my life a lot easier. But it seems to be as good as it gets today... Perhaps we should rather define a 'database' constraint in the order of: "For every mauling, the attacking animal must be of the attacker type" (in a computer understandable manner). From the set theory this should be possible without too much problems, However doing so efficiently might be slightly harder. This might be a fun project and useful for the TODO list. At least it makes it a lot easier (and maintanable) to enforce database-wide constraints. - Joris >-----Original Message----- >From: Robert Haas [mailto:Robert.Haas@dyntek.com] >Sent: woensdag 21 februari 2007 3:37 >To: Joris Dobbelsteen; elein >Cc: pgsql-general@postgresql.org >Subject: RE: [GENERAL] complex referential integrity constraints > >Yes, exactly. And while you might not care about all of those >(e.g. I care about the first two but am not worried about the >third one because I'm the only one who will ever update that >table), writing multiple triggers to enforce each constraint >of this type quickly gets old if there are even a few of them. > It is exponentially harder to write a constraint of this type >than it is to write a simple foreign key constraint. > >...Robert > >-----Original Message----- >From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl] >Sent: Monday, February 19, 2007 5:59 AM >To: elein; Robert Haas >Cc: pgsql-general@postgresql.org >Subject: RE: [GENERAL] complex referential integrity constraints > >>Why don't you add a field in animal_types that is boolean mauler. >>Then you can add a trigger on the mauling table to raise an >error when >>the attacker_id is an animal type mauler. > >This is only partial. You need a lot more triggers to >guarentee the constraints are enforced. >Precisely you need to validate: >* mauling on insert/update of attacker_id >* animal on update of type_id >* animal_type on update of your property > >Of course you need to think about the MVCC model, such that: >Transaction 1 executes >INSERT INTO mauling VALUES ('someattacker'), Transaction 2 >executes UPDATE animal_type SET mauler = false WHERE name = >'someattacker', such that both transaction happen in parallel. > >This is perfectly possible and will make it possible to >violate the constraint, UNLESS locking of the tuples is done correctly. > >These contraints are not trivial to implement (unfortunally). >It would be great if they where. > >- Joris >
Yes, exactly. And while you might not care about all of those (e.g. I care about the first two but am not worried about the third one because I'm the only one who will ever update that table), writing multiple triggers to enforce each constraint of this type quickly gets old if there are even a few of them. It is exponentially harder to write a constraint of this type than it is to write a simple foreign key constraint. ...Robert -----Original Message----- From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl] Sent: Monday, February 19, 2007 5:59 AM To: elein; Robert Haas Cc: pgsql-general@postgresql.org Subject: RE: [GENERAL] complex referential integrity constraints >Why don't you add a field in animal_types that is boolean mauler. >Then you can add a trigger on the mauling table to raise an >error when the attacker_id is an animal type mauler. This is only partial. You need a lot more triggers to guarentee the constraints are enforced. Precisely you need to validate: * mauling on insert/update of attacker_id * animal on update of type_id * animal_type on update of your property Of course you need to think about the MVCC model, such that: Transaction 1 executes INSERT INTO mauling VALUES ('someattacker'), Transaction 2 executes UPDATE animal_type SET mauler = false WHERE name = 'someattacker', such that both transaction happen in parallel. This is perfectly possible and will make it possible to violate the constraint, UNLESS locking of the tuples is done correctly. These contraints are not trivial to implement (unfortunally). It would be great if they where. - Joris
>-----Original Message----- >From: Martijn van Oosterhout [mailto:kleptog@svana.org] >Sent: donderdag 22 februari 2007 23:15 >To: Joris Dobbelsteen >Cc: Robert Haas; pgsql-general@postgresql.org >Subject: Re: [GENERAL] complex referential integrity constraints > >On Thu, Feb 22, 2007 at 06:51:49PM +0100, Joris Dobbelsteen wrote: >> >Err, foreign keys are implemented using triggers, so this statement >> >is self-contradictary. >> >> Are you really sure they are executed under the same >visibility rules? > >Reasonably. I have no idea what visibility rules would make >any difference at all. AIUI a foreign key just takes a shared >lock on the referenced row and all the magic of MVCC makes >sure the row exists when the transaction completes. Try this: (sorry for any typo's in SQL, if they exist) CREATE TABLE a (val integer NOT NULL PRIMARY KEY); CREATE TABLE b (val integer NOT NULL, val2 integer NOT NULL, PRIMARY KEY (val, val2); -- we will be doing foreign key ourselves INSERT INTO a VALUES (1); INSERT INTO a VALUES (2); INSERT INTO a VALUES (1,1); INSERT INTO a VALUES (2,2); -- Now two transaction (T1, T2) T1: BEGIN ISOLATION LEVEL SERIALIZABLE; T2: BEGIN ISOLATION LEVEL SERIALIZABLE; -- Lets see what we have got. T1: SELECT * FROM a; T1: SELECT * FROM b; T2: SELECT * FROM a; T2: SELECT * FROM b; -- lets insert something... T2: INSERT INTO a VALUES (2,100); -- results in a lock being acquired T2: SELECT 1 FROM a x WHERE val = 2 FOR SHARE ON x; -- this is your lock -- Ok, done for now... T2: COMMIT; -- now the lock is gone -- This means T1 doesn't see the row, right? T1: SELECT * FROM b; -- now lets delete T1: DELETE FROM a WHERE val = 2; -- on cascade delete, thus: T1: DELETE FROM b WHERE val = 2; -- won't see new tuple (serializable isolation) T1: COMMIT; SELECT * FROM b; val val2 2 100 Sorry, constraint wasn't enforced ;) It does matter. Now try it with this: CREATE TABLE b (val integer NOT NULL, val2 integer NOT NULL, PRIMARY KEY (val, val2), FOREIGN KEY val REFERENCES a(val) ON UPDATE CASCADE ON DELETE CASCADE); That won't inhibit this behaviour, but proberly enforces the constraint (as one would have expected). I believe T2 will abort as in the manual. Your statement might be correct, but it doesn't take enough account of how the visibility rules under MVCC are played. It seems the foreign keys (as well as primary keys) have there rules applied differently, they see that row and will cause an abort. - Joris
>-----Original Message----- >From: Stephan Szabo [mailto:sszabo@megazone.bigpanda.com] >Sent: donderdag 22 februari 2007 23:13 >To: Joris Dobbelsteen >Cc: Martijn van Oosterhout; Robert Haas; pgsql-general@postgresql.org >Subject: Re: [GENERAL] complex referential integrity constraints > >On Thu, 22 Feb 2007, Joris Dobbelsteen wrote: > >> >-----Original Message----- >> >From: pgsql-general-owner@postgresql.org >> >[mailto:pgsql-general-owner@postgresql.org] On Behalf Of >Martijn van >> >Oosterhout >> >Sent: donderdag 22 februari 2007 18:17 >> >To: Joris Dobbelsteen >> >Cc: Robert Haas; pgsql-general@postgresql.org >> >Subject: Re: [GENERAL] complex referential integrity constraints >> > >> >On Thu, Feb 22, 2007 at 05:28:35PM +0100, Joris Dobbelsteen wrote: >> >> Even worse, I don't you can guarentee that this constraint >> >is enforced >> >> at all times. That means, not if you are using triggers. >> >> The only option seems using foreign keys and put in a lot of >> >redundant >> >> data. >> > >> >Err, foreign keys are implemented using triggers, so this statement >> >is self-contradictary. >> >> Are you really sure they are executed under the same >visibility rules? > >IIRC, the ri triggers use calls that you aren't able to get at >in triggers written in any of the PLs, but I think you should >be able to replicate the feat in a trigger written in C. Why they never did that? No need or no pratical situations where it went wrong? IMHO it should become possible to expose this functionality to the PL languages or as part of the trigger system? The current system can be shown to be underpowered to enforce constraints. It seems a bit cumbersome to have C functions for this purpose. However I must admit that looking through the postgres code doesn't make it much clearer to me whats actually going for these kind of tricky problems... - Joris
On Fri, Feb 23, 2007 at 12:41:25AM +0100, Joris Dobbelsteen wrote: > >Reasonably. I have no idea what visibility rules would make > >any difference at all. AIUI a foreign key just takes a shared > >lock on the referenced row and all the magic of MVCC makes > >sure the row exists when the transaction completes. > > Try this: > (sorry for any typo's in SQL, if they exist) <snip> Well, I took a look at the RI code and the only stuff I saw that looked interesting was this: utils/adt/ri_triggers.c: if (IsXactIsoLevelSerializable && detectNewRows) { CommandCounterIncrement(); /* be sure all my own work is visible */ test_snapshot = CopySnapshot(GetLatestSnapshot()); crosscheck_snapshot = CopySnapshot(GetTransactionSnapshot()); } It then proceeds to use that snapshot to execute the query to get the share lock. It's probably true that other PL's can't do this directly. Not sure how to deal with that. I got confused because I thought the first version of RI did use straight pl/pgsql functions, so I thought that was enough. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
Robert Haas wrote: > The idea here is that a wolf can attack a sheep, or a wolf can attack > another wolf, but sheep can't attack anything. I suppose I could list > each wolf in both the predator and prey tables, but that seems a bit > duplicative (and causes other problems). > > ...Robert I'm quite certain a wolf is much more likely to attack a sheep than to attack another wolf, and even more unlikely to attack for example a lion. It seems to me that just the fact that it can isn't enough information. It looks like you need "weighted constraints"; there's 0 chance that a sheep attacks a wolf, but there's >0 chance that a wolf attacks a sheep, >0 chance it attacks a wolf and >0 chance it attacks a lion. The exact numbers will vary, and I have absolutely no idea what they would be like. It probably requires some kind of ranking system that adjusts according to the known animals and their likelihood to attack eachother. I'm pretty sure you can't get this done without defining some triggers. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys wrote: > Robert Haas wrote: > > The idea here is that a wolf can attack a sheep, or a wolf can attack > > another wolf, but sheep can't attack anything. I suppose I could list > > each wolf in both the predator and prey tables, but that seems a bit > > duplicative (and causes other problems). > > I'm quite certain a wolf is much more likely to attack a sheep than to > attack another wolf, and even more unlikely to attack for example a > lion. It seems to me that just the fact that it can isn't enough > information. > > It looks like you need "weighted constraints"; there's 0 chance that a > sheep attacks a wolf, but there's >0 chance that a wolf attacks a sheep, > >0 chance it attacks a wolf and >0 chance it attacks a lion. The exact > numbers will vary, and I have absolutely no idea what they would be > like. It probably requires some kind of ranking system that adjusts > according to the known animals and their likelihood to attack eachother. Depending on what you're modelling, even this could be too simple -- for example, while a single wolf is unlikely to attack a lion, a pack of wolves have a lot more probability of doing so. Do you keep packs of wolves in your barn? If so, watch your lions. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
>-----Original Message----- >From: Martijn van Oosterhout [mailto:kleptog@svana.org] >Sent: vrijdag 23 februari 2007 9:50 >To: Joris Dobbelsteen >Cc: pgsql-general@postgresql.org >Subject: Re: [GENERAL] complex referential integrity constraints > >On Fri, Feb 23, 2007 at 12:41:25AM +0100, Joris Dobbelsteen wrote: >> >Reasonably. I have no idea what visibility rules would make any >> >difference at all. AIUI a foreign key just takes a shared >lock on the >> >referenced row and all the magic of MVCC makes sure the row exists >> >when the transaction completes. >> >> Try this: >> (sorry for any typo's in SQL, if they exist) > ><snip> > >Well, I took a look at the RI code and the only stuff I saw >that looked interesting was this: > >utils/adt/ri_triggers.c: > if (IsXactIsoLevelSerializable && detectNewRows) > { > CommandCounterIncrement(); /* be >sure all my own work is visible */ > test_snapshot = CopySnapshot(GetLatestSnapshot()); > crosscheck_snapshot = >CopySnapshot(GetTransactionSnapshot()); > } > >It then proceeds to use that snapshot to execute the query to >get the share lock. > >It's probably true that other PL's can't do this directly. Not >sure how to deal with that. I got confused because I thought >the first version of RI did use straight pl/pgsql functions, >so I thought that was enough. You got it right... /* * SPI_execute_snapshot -- identical to SPI_execute_plan, except that we allow * the caller to specify exactly which snapshots to use. This is currently * not documented in spi.sgml because it is only intended for use by RI * triggers. * * Passing snapshot == InvalidSnapshot will select the normal behavior of * fetching a new snapshot for each query. */ int SPI_execute_snapshot(void *plan, Datum *Values, const char *Nulls, Snapshot snapshot, Snapshot crosscheck_snapshot, bool read_only, long tcount) They got the point right: only intended for use by RI triggers. That's exactly the type I'm trying to build ;) They are exposed to the C versions (its in include/executor/spi.h), but to me it looks a bit cumbersome to have triggers written in C. What would be a good way to expose this to normal PL triggers? Since this would open a new set of possibilities... As part of a "create trigger ... for referencial integrity"? As an extension to a statement? Special construct in the languages? - Joris
On Fri, 23 Feb 2007, Joris Dobbelsteen wrote: > >-----Original Message----- > >From: Martijn van Oosterhout [mailto:kleptog@svana.org] > >Sent: vrijdag 23 februari 2007 9:50 > >To: Joris Dobbelsteen > >Cc: pgsql-general@postgresql.org > >Subject: Re: [GENERAL] complex referential integrity constraints > > > >On Fri, Feb 23, 2007 at 12:41:25AM +0100, Joris Dobbelsteen wrote: > >> >Reasonably. I have no idea what visibility rules would make any > >> >difference at all. AIUI a foreign key just takes a shared > >lock on the > >> >referenced row and all the magic of MVCC makes sure the row exists > >> >when the transaction completes. > >> > >> Try this: > >> (sorry for any typo's in SQL, if they exist) > > > ><snip> > > > >Well, I took a look at the RI code and the only stuff I saw > >that looked interesting was this: > > > >utils/adt/ri_triggers.c: > > if (IsXactIsoLevelSerializable && detectNewRows) > > { > > CommandCounterIncrement(); /* be > >sure all my own work is visible */ > > test_snapshot = CopySnapshot(GetLatestSnapshot()); > > crosscheck_snapshot = > >CopySnapshot(GetTransactionSnapshot()); > > } > > > >It then proceeds to use that snapshot to execute the query to > >get the share lock. > > > >It's probably true that other PL's can't do this directly. Not > >sure how to deal with that. I got confused because I thought > >the first version of RI did use straight pl/pgsql functions, > >so I thought that was enough. > > You got it right... > > /* > * SPI_execute_snapshot -- identical to SPI_execute_plan, except that we > allow > * the caller to specify exactly which snapshots to use. This is > currently > * not documented in spi.sgml because it is only intended for use by RI > * triggers. > * > * Passing snapshot == InvalidSnapshot will select the normal behavior > of > * fetching a new snapshot for each query. > */ > int > SPI_execute_snapshot(void *plan, > Datum *Values, const char > *Nulls, > Snapshot snapshot, Snapshot > crosscheck_snapshot, > bool read_only, long tcount) > > They got the point right: only intended for use by RI triggers. That's > exactly the type I'm trying to build ;) > They are exposed to the C versions (its in include/executor/spi.h), but > to me it looks a bit cumbersome to have triggers written in C. I was wondering if some sort of generator might work. Something that would take what you're trying to do and generate the triggers for you, but I haven't really worked out what that'd look like. > What would be a good way to expose this to normal PL triggers? Since > this would open a new set of possibilities... > > As part of a "create trigger ... for referencial integrity"? > As an extension to a statement? > Special construct in the languages? I think the first thing to do is to figure out what such triggers need to do. Does such a trigger need to potentially run some queries on the normal snapshot? Does it potentially need different snapshots for different statements or is only one special snapshot sufficient? And other such questions. From there, a -hackers discussion might be meaningful.
On Fri, Feb 23, 2007 at 09:39:52AM -0500, Robert Haas wrote: > Actually, what would be really nice is if there were just a button I > could push that would make all of my data automatically correct. > Can that go into 8.3? Thanks, ...Robert Oh, no problem. Just compile with -ldwim ;) Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote! http://www.postgresql.org/about/donate
to attack eachother. > > Depending on what you're modelling, even this could be too simple -- for > example, while a single wolf is unlikely to attack a lion, a pack of > wolves have a lot more probability of doing so. > > Do you keep packs of wolves in your barn? If so, watch your lions. Well from the previous thread that discussed the use of the <animal> table and sub-set tables <prey> and <preditor>, if a preditor can attach a prey item or preditor item, then a table relation only needs to be created between <preditor> and <animal>. This way only preditors can attack, but they can attach any other animal preditor or prey. Regards, Richard Broersma Jr.
I don't understand what a weighted constraint would mean. Either the attacker_id can be a wolf, or it can't. Knowing that it is only 1% likely over the long haul is insufficient to disallow any particular transaction. It's certainly true that the constraint as stated is insufficient to guarantee that the table will contain good data. For example if we looked at the maulings table and wolves were always mauling other wolves but never sheep, we would naturally want to dig into that a little more and find out why they weren't picking easier targets. But this is neither here nor there, because NO constraint (foreign key, check, or what have you) is ever strong enough to ensure that the data in a table is completely clean. At least as I understand it, the purpose of these constraints is to allow us to write application code which relies on certain basic invariants being true, i.e. so that we can join animal to animal_type and not have to worry about rows dropping out because some animals had an invalid type, or rows getting added because there are two animal_type records with the same id. Besides, the problem as stated is a proxy for some real problem which is part of a non-zoological project the details of which (a) would take too long to explain and (b) should probably not be posted to a public mailing list. :-) So far, the best ideas I've seen have been: (a) Tom Lane's idea of denormalizing by copying the animal type column into the maulings table with ON UPDATE CASCADE, and then adding a CHECK constraint on that column, and (b) Creating a separate table called "wolf" and some triggers that ensure that the wolf table will always contain the subset of IDs from the animal table where the type_id is that of a wolf, with a foreign key constraint from that id column back to animal with "on delete cascade". This ensures that nobody can delete a wolf or change it into a sheep if it has maulings, but permits it otherwise. For what it's worth, I've adopted the latter solution for the present. Unfortunately, it's too much work to do it everywhere it would be nice to have, so I'm just doing it in some really critical cases and hoping that the others don't break. Thanks, ...Robert -----Original Message----- From: Alban Hertroys [mailto:alban@magproductions.nl] Sent: Friday, February 23, 2007 4:02 AM To: Robert Haas Cc: David Fetter; pgsql-general@postgresql.org Subject: Re: [GENERAL] complex referential integrity constraints Robert Haas wrote: > The idea here is that a wolf can attack a sheep, or a wolf can attack > another wolf, but sheep can't attack anything. I suppose I could list > each wolf in both the predator and prey tables, but that seems a bit > duplicative (and causes other problems). > > ...Robert I'm quite certain a wolf is much more likely to attack a sheep than to attack another wolf, and even more unlikely to attack for example a lion. It seems to me that just the fact that it can isn't enough information. It looks like you need "weighted constraints"; there's 0 chance that a sheep attacks a wolf, but there's >0 chance that a wolf attacks a sheep, >0 chance it attacks a wolf and >0 chance it attacks a lion. The exact numbers will vary, and I have absolutely no idea what they would be like. It probably requires some kind of ranking system that adjusts according to the known animals and their likelihood to attack eachother. I'm pretty sure you can't get this done without defining some triggers. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Actually, what would be really nice is if there were just a button I could push that would make all of my data automatically correct. Can that go into 8.3? Thanks, ...Robert -----Original Message----- From: Alvaro Herrera [mailto:alvherre@commandprompt.com] Sent: Friday, February 23, 2007 9:35 AM To: Alban Hertroys Cc: Robert Haas; David Fetter; pgsql-general@postgresql.org Subject: Re: [GENERAL] complex referential integrity constraints Alban Hertroys wrote: > Robert Haas wrote: > > The idea here is that a wolf can attack a sheep, or a wolf can attack > > another wolf, but sheep can't attack anything. I suppose I could list > > each wolf in both the predator and prey tables, but that seems a bit > > duplicative (and causes other problems). > > I'm quite certain a wolf is much more likely to attack a sheep than to > attack another wolf, and even more unlikely to attack for example a > lion. It seems to me that just the fact that it can isn't enough > information. > > It looks like you need "weighted constraints"; there's 0 chance that a > sheep attacks a wolf, but there's >0 chance that a wolf attacks a sheep, > >0 chance it attacks a wolf and >0 chance it attacks a lion. The exact > numbers will vary, and I have absolutely no idea what they would be > like. It probably requires some kind of ranking system that adjusts > according to the known animals and their likelihood to attack eachother. Depending on what you're modelling, even this could be too simple -- for example, while a single wolf is unlikely to attack a lion, a pack of wolves have a lot more probability of doing so. Do you keep packs of wolves in your barn? If so, watch your lions. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Robert Haas wrote: > I don't understand what a weighted constraint would mean. Either the > attacker_id can be a wolf, or it can't. Knowing that it is only 1% > likely over the long haul is insufficient to disallow any particular > transaction. Basically I suggested to combine the constraint with a probability. If the probability of one animal attacking another is 0% it can't attack the other animal - that's a strict constraint. The other way around it can, and you'll also immediately know how likely that is to happen. An added bonus is that you can generalize certain constraints. If certain animals are less than - say 25% - likely to attack other certain other animals you could determine that the attacked animal is not in fact prey. An example would probably be wolves attacking other wolves (or predators in general). For relations that depend on an animal being prey, a constraint would be that this number be <25%. In this discussion it is also not entirely defined what attacking means. Is a ram defending his horde from wolves attacking (wolves)? I realise this all started from an analogy to a real problem, so most of this is probably not very relevant to your actual problem. No less interesting, though. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Robert Haas wrote: > I sort of think that this kind of stuff belongs in a separate table > somehow. For example in this case I would want to: I wasn't suggesting otherwise. A constraint is a constraint; whether it involves an extra table or not wasn't really relevant until now (apparently). > CREATE TABLE attack_probability ( > attacker_type_id integer not null references animal_type (id), > victim_type_id integer not null references animal_type (id), > percent_chance integer not null, Personally I would prefer something that can hold decimals, a numeric fe. And I usually add cascading behaviour definitions explicitly. > primary key (attacker_type_id, victim_type_id) > ); > > ...and then declare that the CONSTRAINT on the "maulings" table is that > if I look up the types of the attacker and victim, that pair of types > must be present in the attack_probability table with percent_chance > 0. This was pretty much what I had in mind. Not _exactly_, as I didn't concern myself with the implementation details yet. But it pretty much boils down to what you suggested, yes. > I guess my point here is that I think in your proposal you are letting > domain-specific data creep into the schema. It's the job of the schema > to enforce integrity constraints, but not to know specifically how > things work. The fact (if it is a fact) that the chance of one type of > animal attacking another can be captured as a probability (rather than, > say, one probability for the day time and another probability for the > night time, or one probability for each specific animal rather than each > animal type, or I don't know what's going on and want to infer the > probabilities from the data after I've gathered it) is domain-specific. > I don't really want the information about attack probabilities (or > whatever) to be something that's hardcoded in my schema; I want it to be > part of the data in the schema, with the schema enforcing such > constraints on that data as I may see fit to define. I don't want to > have to select things out of system tables to find out attack > probabilities. Also, as a practical matter, I suspect that such a setup > would result in an absurdly complex constraint language. I was merely pointing out the possibility of such a constraint - maybe "business rule" is more appropriate (but isn't that a constraint as well?). In the original suggestions this wasn't possible, it simply stated that wolves cannot attack lions - which isn't necessarily true. I suppose that's the point I was trying to make in general; the whole who-attacks-who business isn't that black-and-white - more like different shades of gray. In the end you'll have to make a decision about what combinations are possible depending on what you want to get out of your database. It may not be necessary to get into this much detail - or it may. Not for me to say. > ...Robert -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
I sort of think that this kind of stuff belongs in a separate table somehow. For example in this case I would want to: CREATE TABLE attack_probability ( attacker_type_id integer not null references animal_type (id), victim_type_id integer not null references animal_type (id), percent_chance integer not null, primary key (attacker_type_id, victim_type_id) ); ...and then declare that the CONSTRAINT on the "maulings" table is that if I look up the types of the attacker and victim, that pair of types must be present in the attack_probability table with percent_chance > 0. I guess my point here is that I think in your proposal you are letting domain-specific data creep into the schema. It's the job of the schema to enforce integrity constraints, but not to know specifically how things work. The fact (if it is a fact) that the chance of one type of animal attacking another can be captured as a probability (rather than, say, one probability for the day time and another probability for the night time, or one probability for each specific animal rather than each animal type, or I don't know what's going on and want to infer the probabilities from the data after I've gathered it) is domain-specific. I don't really want the information about attack probabilities (or whatever) to be something that's hardcoded in my schema; I want it to be part of the data in the schema, with the schema enforcing such constraints on that data as I may see fit to define. I don't want to have to select things out of system tables to find out attack probabilities. Also, as a practical matter, I suspect that such a setup would result in an absurdly complex constraint language. ...Robert -----Original Message----- From: Alban Hertroys [mailto:alban@magproductions.nl] Sent: Monday, February 26, 2007 4:15 AM To: Robert Haas Cc: David Fetter; pgsql-general@postgresql.org Subject: Re: [GENERAL] complex referential integrity constraints Robert Haas wrote: > I don't understand what a weighted constraint would mean. Either the > attacker_id can be a wolf, or it can't. Knowing that it is only 1% > likely over the long haul is insufficient to disallow any particular > transaction. Basically I suggested to combine the constraint with a probability. If the probability of one animal attacking another is 0% it can't attack the other animal - that's a strict constraint. The other way around it can, and you'll also immediately know how likely that is to happen. An added bonus is that you can generalize certain constraints. If certain animals are less than - say 25% - likely to attack other certain other animals you could determine that the attacked animal is not in fact prey. An example would probably be wolves attacking other wolves (or predators in general). For relations that depend on an animal being prey, a constraint would be that this number be <25%. In this discussion it is also not entirely defined what attacking means. Is a ram defending his horde from wolves attacking (wolves)? I realise this all started from an analogy to a real problem, so most of this is probably not very relevant to your actual problem. No less interesting, though. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //