Thread: complex referential integrity constraints

complex referential integrity constraints

From

"Robert Haas"

Date:

18 February 2007, 11:26:43

So, I have the following problem.

Suppose you have two kinds of animals, sheep and wolves.  Since they
have very similar properties, you create a single table to hold both
kinds of animals, and an animal_type table to specify the type of each
animal:

CREATE TABLE animal_type (
    id            integer not null,
    name            varchar(80) not null,
    primary key (id)
);
INSERT INTO animal_type VALUES (1, 'Sheep');
INSERT INTO animal_type VALUES (2, 'Wolf');

CREATE TABLE animal (
    id            serial,
    type_id             integer not null references animal_type (id),
    name            varchar(80) not null,
    age            integer not null,
    weight_in_pounds    integer not null,
    primary key (id)
);

The animal_type table is more or less written in stone, but the animal
table will be updated frequently.  Now, let's suppose that we want to
keep track of all of the cases where one animal is mauled by another
animal:

CREATE TABLE mauling (
    id                  serial,
    attacker_id         integer not null references animal (id),
    victim_id           integer not null references animal (id),
    attack_time         timestamp not null,
    primary key (id)
);

The problem with this is that I have a very unsettled feeling about the
foreign key constraints on this table.  The victim_id constraint is
fine, but the attacker_id constraint is really inadequate, because the
attacker CAN NEVER BE A SHEEP.  I really want a way to write a
constraint that says that the attacker must be an animal, but
specifically, a wolf.

It would be really nice to be able to write:

FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id)

Or:

CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2;
-- and then
FOREIGN KEY (attacker_id) REFERENCES INDEX wolves

...but that's entirely speculative syntax.  I don't think there's any
easy way to do this.  (Please tell me I'm wrong.)

The problem really comes in when people start modifying the animal
table.  Every once in a while we have a case where we record something
as a wolf, but it turns out to have been a sheep in wolf's clothing.  In
this case, we want to do something like this:

UPDATE animal SET type_id = 1 WHERE id = 572;

HOWEVER, this operation MUST NOT be allowed if it turns out there is a
row in the mauling table where attacker_id = 572, because that would
violate my integrity constraints that says that sheep do not maul.

Any suggestions?  I've thought about creating rules or triggers to check
the conditions, but I'm scared that this could either (a) get really
complicated when there are a lot more tables and constraints involved or
(b) introduce race conditions.

Thanks,

...Robert

Re: complex referential integrity constraints

From

Tom Lane

Date:

18 February 2007, 17:08:28

"Robert Haas" <Robert.Haas@dyntek.com> writes:
> ... The problem with this is that I have a very unsettled feeling about the
> foreign key constraints on this table.  The victim_id constraint is
> fine, but the attacker_id constraint is really inadequate, because the
> attacker CAN NEVER BE A SHEEP.

I think the only way to do this in SQL is to denormalize a bit.  If you
copy the animal_type field into the maulings table then you can apply a
check constraint there.  So

 FOREIGN KEY (attacker_id, attacker_type_id) REFERENCES animal (id, type_id)
 ON UPDATE CASCADE

 CHECK (attacker_type_id != 'sheep')

The thing that's still a bit annoying is that you'd have to hard-wire
the numerical code for SHEEP into the check constraint; you couldn't
really write it symbolically as I did above.  Perhaps you should further
denormalize and keep real animal type names not codes in the animal
type table, thus

CREATE TABLE animal_type (
    name            varchar(80) primary key
);

CREATE TABLE animal (
    id            serial,
    type                varchar(80) references animal_type,
    ...
);

whereupon the maulings table also has real type names not IDs.

No doubt some relational-theory maven will come along and slap your
wrist for doing this, but he should first explain how to do it without
denormalization...

Also, I think what you've really done here is created a "poor man's
enum".  There will probably be real enum types in PG 8.3, which would
offer a more efficient solution to the problem of representing animal
types.

            regards, tom lane

Re: complex referential integrity constraints

From

elein

Date:

18 February 2007, 21:14:56

On Fri, Feb 16, 2007 at 09:58:56AM -0500, Robert Haas wrote:
> So, I have the following problem.
>
> Suppose you have two kinds of animals, sheep and wolves.  Since they
> have very similar properties, you create a single table to hold both
> kinds of animals, and an animal_type table to specify the type of each
> animal:
>
> CREATE TABLE animal_type (
>     id            integer not null,
>     name            varchar(80) not null,
>     primary key (id)
> );
> INSERT INTO animal_type VALUES (1, 'Sheep');
> INSERT INTO animal_type VALUES (2, 'Wolf');
>
> CREATE TABLE animal (
>     id            serial,
>     type_id             integer not null references animal_type (id),
>     name            varchar(80) not null,
>     age            integer not null,
>     weight_in_pounds    integer not null,
>     primary key (id)
> );
>
> The animal_type table is more or less written in stone, but the animal
> table will be updated frequently.  Now, let's suppose that we want to
> keep track of all of the cases where one animal is mauled by another
> animal:
>
> CREATE TABLE mauling (
>     id                  serial,
>     attacker_id         integer not null references animal (id),
>     victim_id           integer not null references animal (id),
>     attack_time         timestamp not null,
>     primary key (id)
> );
>
> The problem with this is that I have a very unsettled feeling about the
> foreign key constraints on this table.  The victim_id constraint is
> fine, but the attacker_id constraint is really inadequate, because the
> attacker CAN NEVER BE A SHEEP.  I really want a way to write a
> constraint that says that the attacker must be an animal, but
> specifically, a wolf.
>
> It would be really nice to be able to write:
>
> FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id)
>
> Or:
>
> CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2;
> -- and then
> FOREIGN KEY (attacker_id) REFERENCES INDEX wolves
>
> ...but that's entirely speculative syntax.  I don't think there's any
> easy way to do this.  (Please tell me I'm wrong.)
>
> The problem really comes in when people start modifying the animal
> table.  Every once in a while we have a case where we record something
> as a wolf, but it turns out to have been a sheep in wolf's clothing.  In
> this case, we want to do something like this:
>
> UPDATE animal SET type_id = 1 WHERE id = 572;
>
> HOWEVER, this operation MUST NOT be allowed if it turns out there is a
> row in the mauling table where attacker_id = 572, because that would
> violate my integrity constraints that says that sheep do not maul.
>
> Any suggestions?  I've thought about creating rules or triggers to check
> the conditions, but I'm scared that this could either (a) get really
> complicated when there are a lot more tables and constraints involved or
> (b) introduce race conditions.

Why don't you add a field in animal_types that is boolean mauler.
Then you can add a trigger on the mauling table to raise an error
when the attacker_id is an animal type mauler.

--elein

Re: complex referential integrity constraints

From

Alban Hertroys

Date:

19 February 2007, 08:12:58

Robert Haas wrote:
> So, I have the following problem.
>
> Suppose you have two kinds of animals, sheep and wolves.  Since they
> have very similar properties, you create a single table to hold both
> kinds of animals, and an animal_type table to specify the type of each
> animal:
>
> CREATE TABLE animal_type (
>     id            integer not null,
>     name            varchar(80) not null,
>     primary key (id)
> );
> INSERT INTO animal_type VALUES (1, 'Sheep');
> INSERT INTO animal_type VALUES (2, 'Wolf');
>
> CREATE TABLE animal (
>     id            serial,
>     type_id             integer not null references animal_type (id),
>     name            varchar(80) not null,
>     age            integer not null,
>     weight_in_pounds    integer not null,
>     primary key (id)
> );
>
> The animal_type table is more or less written in stone, but the animal
> table will be updated frequently.  Now, let's suppose that we want to
> keep track of all of the cases where one animal is mauled by another
> animal:

I kind of get the feeling that you'd want it like this:

CREATE TABLE predator (
...
) INHERITS animal;

And then put your foreign key constraints from predator to mauling.

You may want to be more accurate about what kind of animals sheep are as
well.

I haven't really given this much thought though, I'm just quickly
reading my mail before starting work ;)

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: complex referential integrity constraints

From

"Joris Dobbelsteen"

Date:

19 February 2007, 09:58:16

>-----Original Message-----
>From: pgsql-general-owner@postgresql.org
>[mailto:pgsql-general-owner@postgresql.org] On Behalf Of elein
>Sent: zondag 18 februari 2007 23:16
>To: Robert Haas
>Cc: pgsql-general@postgresql.org
>Subject: Re: [GENERAL] complex referential integrity constraints
>
>On Fri, Feb 16, 2007 at 09:58:56AM -0500, Robert Haas wrote:
>> So, I have the following problem.
>>
>> Suppose you have two kinds of animals, sheep and wolves.  Since they
>> have very similar properties, you create a single table to hold both
>> kinds of animals, and an animal_type table to specify the
>type of each
>> animal:
>>
>> CREATE TABLE animal_type (
>>     id            integer not null,
>>     name            varchar(80) not null,
>>     primary key (id)
>> );
>> INSERT INTO animal_type VALUES (1, 'Sheep'); INSERT INTO animal_type
>> VALUES (2, 'Wolf');
>>
>> CREATE TABLE animal (
>>     id            serial,
>>     type_id             integer not null references
>animal_type (id),
>>     name            varchar(80) not null,
>>     age            integer not null,
>>     weight_in_pounds    integer not null,
>>     primary key (id)
>> );
>>
>> The animal_type table is more or less written in stone, but
>the animal
>> table will be updated frequently.  Now, let's suppose that
>we want to
>> keep track of all of the cases where one animal is mauled by another
>> animal:
>>
>> CREATE TABLE mauling (
>>     id                  serial,
>>     attacker_id         integer not null references animal (id),
>>     victim_id           integer not null references animal (id),
>>     attack_time         timestamp not null,
>>     primary key (id)
>> );
>>
>> The problem with this is that I have a very unsettled feeling about
>> the foreign key constraints on this table.  The victim_id constraint
>> is fine, but the attacker_id constraint is really
>inadequate, because
>> the attacker CAN NEVER BE A SHEEP.  I really want a way to write a
>> constraint that says that the attacker must be an animal, but
>> specifically, a wolf.
>>
>> It would be really nice to be able to write:
>>
>> FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id)
>>
>> Or:
>>
>> CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2;
>> -- and then
>> FOREIGN KEY (attacker_id) REFERENCES INDEX wolves
>>
>> ...but that's entirely speculative syntax.  I don't think
>there's any
>> easy way to do this.  (Please tell me I'm wrong.)
>>
>> The problem really comes in when people start modifying the animal
>> table.  Every once in a while we have a case where we record
>something
>> as a wolf, but it turns out to have been a sheep in wolf's
>clothing.
>> In this case, we want to do something like this:
>>
>> UPDATE animal SET type_id = 1 WHERE id = 572;
>>
>> HOWEVER, this operation MUST NOT be allowed if it turns out
>there is a
>> row in the mauling table where attacker_id = 572, because that would
>> violate my integrity constraints that says that sheep do not maul.
>>
>> Any suggestions?  I've thought about creating rules or triggers to
>> check the conditions, but I'm scared that this could either (a) get
>> really complicated when there are a lot more tables and constraints
>> involved or
>> (b) introduce race conditions.
>
>Why don't you add a field in animal_types that is boolean mauler.
>Then you can add a trigger on the mauling table to raise an
>error when the attacker_id is an animal type mauler.

This is only partial. You need a lot more triggers to guarentee the
constraints are enforced.
Precisely you need to validate:
* mauling on insert/update of attacker_id
* animal on update of type_id
* animal_type on update of your property

Of course you need to think about the MVCC model, such that:
Transaction 1 executes
INSERT INTO mauling VALUES ('someattacker'),
Transaction 2 executes
UPDATE animal_type SET mauler = false WHERE name = 'someattacker',
such that both transaction happen in parallel.

This is perfectly possible and will make it possible to violate the
constraint, UNLESS locking of the tuples is done correctly.

These contraints are not trivial to implement (unfortunally). It would
be great if they where.

- Joris

Re: complex referential integrity constraints

From

David Fetter

Date:

19 February 2007, 17:04:02

On Fri, Feb 16, 2007 at 09:58:56AM -0500, Robert Haas wrote:
> So, I have the following problem.
>
> Suppose you have two kinds of animals, sheep and wolves.  Since they
> have very similar properties, you create a single table to hold both
> kinds of animals, and an animal_type table to specify the type of each
> animal:
>
> CREATE TABLE animal_type (
>     id            integer not null,
>     name            varchar(80) not null,
>     primary key (id)
> );
> INSERT INTO animal_type VALUES (1, 'Sheep');
> INSERT INTO animal_type VALUES (2, 'Wolf');
>
> CREATE TABLE animal (
>     id            serial,
>     type_id             integer not null references animal_type (id),
>     name            varchar(80) not null,
>     age            integer not null,
>     weight_in_pounds    integer not null,
>     primary key (id)
> );
>
> The animal_type table is more or less written in stone, but the animal
> table will be updated frequently.  Now, let's suppose that we want to
> keep track of all of the cases where one animal is mauled by another
> animal:
>
> CREATE TABLE mauling (
>     id                  serial,
>     attacker_id         integer not null references animal (id),
>     victim_id           integer not null references animal (id),
>     attack_time         timestamp not null,
>     primary key (id)
> );
>
> The problem with this is that I have a very unsettled feeling about the
> foreign key constraints on this table.  The victim_id constraint is
> fine, but the attacker_id constraint is really inadequate, because the
> attacker CAN NEVER BE A SHEEP.  I really want a way to write a
> constraint that says that the attacker must be an animal, but
> specifically, a wolf.
>
> It would be really nice to be able to write:
>
> FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id)
>
> Or:
>
> CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2;
> -- and then
> FOREIGN KEY (attacker_id) REFERENCES INDEX wolves
>
> ...but that's entirely speculative syntax.  I don't think there's any
> easy way to do this.  (Please tell me I'm wrong.)
>
> The problem really comes in when people start modifying the animal
> table.  Every once in a while we have a case where we record something
> as a wolf, but it turns out to have been a sheep in wolf's clothing.  In
> this case, we want to do something like this:
>
> UPDATE animal SET type_id = 1 WHERE id = 572;
>
> HOWEVER, this operation MUST NOT be allowed if it turns out there is a
> row in the mauling table where attacker_id = 572, because that would
> violate my integrity constraints that says that sheep do not maul.
>
> Any suggestions?  I've thought about creating rules or triggers to check
> the conditions, but I'm scared that this could either (a) get really
> complicated when there are a lot more tables and constraints involved or
> (b) introduce race conditions.
>
> Thanks,
>
> ...Robert

I'd do something like this:

CREATE TABLE animal_type (
    animal_name  TEXT PRIMARY KEY,
    CHECK(animal_name = trim(animal_name))
);

/* Only one of {Wolf,wolf} can be in the table. */

CREATE UNIQUE INDEX just_one_animal_name
    ON animal_type(LOWER(animal_name));

CREATE TABLE predator (
    animal_name TEXT NOT NULL
                REFERENCES animal_type(animal_name)
                ON DELETE CASCADE,
    PRIMARY KEY(animal_name)
);

CREATE TABLE prey (
    animal_name TEXT NOT NULL
                REFERENCES animal_type(animal_name)
                ON DELETE CASCADE,
    PRIMARY KEY(animal_name)
);

CREATE TABLE mauling (
    id             SERIAL PRIMARY KEY,
    attacker_id    INTEGER NOT NULL REFERENCES predator (animal_type_id),
    victim_id      INTEGER NOT NULL REFERENCES prey (animal_type_id),
    attack_time    TIMESTAMP WITH TIME ZONE NOT NULL
);

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!

Re: complex referential integrity constraints

From

Richard Broersma Jr

Date:

19 February 2007, 17:52:59

> I'd do something like this:
>
> CREATE TABLE animal_type (
>     animal_name  TEXT PRIMARY KEY,
>     CHECK(animal_name = trim(animal_name))
> );
>
> /* Only one of {Wolf,wolf} can be in the table. */
>
> CREATE UNIQUE INDEX just_one_animal_name
>     ON animal_type(LOWER(animal_name));
>
> CREATE TABLE predator (
>     animal_name TEXT NOT NULL
>                 REFERENCES animal_type(animal_name)
>                 ON DELETE CASCADE,
>     PRIMARY KEY(animal_name)
> );
>
> CREATE TABLE prey (
>     animal_name TEXT NOT NULL
>                 REFERENCES animal_type(animal_name)
>                 ON DELETE CASCADE,
>     PRIMARY KEY(animal_name)
> );
>
> CREATE TABLE mauling (
>     id             SERIAL PRIMARY KEY,
>     attacker_id    INTEGER NOT NULL REFERENCES predator (animal_type_id),
>     victim_id      INTEGER NOT NULL REFERENCES prey (animal_type_id),
>     attack_time    TIMESTAMP WITH TIME ZONE NOT NULL
> );

Just to add to David's idea, I would create two update-able views that joined animal to predator
and another for animal to prey.  This way, you only have to insert/update/delete from 1
update-able view rather than two tables.

Of course, I am still waiting for the future version of postgresql that will handle this
functionality seamlessly using table inheritance. :-)

Regards,
Richard Broersma Jr.

Re: complex referential integrity constraints

From

David Fetter

Date:

20 February 2007, 02:08:13

On Mon, Feb 19, 2007 at 10:52:51AM -0800, Richard Broersma Jr wrote:
> > I'd do something like this:
> >
> > CREATE TABLE animal_type (
> >     animal_name  TEXT PRIMARY KEY,
> >     CHECK(animal_name = trim(animal_name))
> > );
> >
> > /* Only one of {Wolf,wolf} can be in the table. */
> >
> > CREATE UNIQUE INDEX just_one_animal_name
> >     ON animal_type(LOWER(animal_name));
> >
> > CREATE TABLE predator (
> >     animal_name TEXT NOT NULL
> >                 REFERENCES animal_type(animal_name)
> >                 ON DELETE CASCADE,
> >     PRIMARY KEY(animal_name)
> > );
> >
> > CREATE TABLE prey (
> >     animal_name TEXT NOT NULL
> >                 REFERENCES animal_type(animal_name)
> >                 ON DELETE CASCADE,
> >     PRIMARY KEY(animal_name)
> > );
> >
> > CREATE TABLE mauling (
> >     id             SERIAL PRIMARY KEY,
> >     attacker_id    INTEGER NOT NULL REFERENCES predator (animal_type_id),
> >     victim_id      INTEGER NOT NULL REFERENCES prey (animal_type_id),
> >     attack_time    TIMESTAMP WITH TIME ZONE NOT NULL
> > );
>
> Just to add to David's idea, I would create two update-able views
> that joined animal to predator and another for animal to prey.  This
> way, you only have to insert/update/delete from 1 update-able view
> rather than two tables.

You could just do a rewrite RULE on predator and prey for each of
INSERT, UPDATE and DELETE that has a DO INSTEAD action that writes to
animal.  This wouldn't handle COPY, though.

> Of course, I am still waiting for the future version of postgresql
> that will handle this functionality seamlessly using table
> inheritance. :-)

You mean writeable VIEWs?  I think it would be nice to have some cases
handled, but there are several kinds of VIEWs I can think of where the
only sane way to write to them is to define the writing behavior on a
case-by-case basis.

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!

Re: complex referential integrity constraints

From

"Joris Dobbelsteen"

Date:

22 February 2007, 12:02:46

I partially agree:
If people CAN do stupid things, they are 'clever' enough to find a way
to actually do it. I've seen them destroy things, by just using a system
in a way it was not intended. They effectively found a way to blow away
the very thing that part was designed for.
But indeed, it's a lot of work, especially if the number of tables that
must be referenced increases. I'm a strong supporter for ensuring
consistency. Postgres has what it takes to do the job, but it doesn't
make my life a lot easier. But it seems to be as good as it gets
today...

Perhaps we should rather define a 'database' constraint in the order of:
"For every mauling, the attacking animal must be of the attacker type"
(in a computer understandable manner). From the set theory this should
be possible without too much problems, However doing so efficiently
might be slightly harder.
This might be a fun project and useful for the TODO list. At least it
makes it a lot easier (and maintanable) to enforce database-wide
constraints.

- Joris

>-----Original Message-----
>From: Robert Haas [mailto:Robert.Haas@dyntek.com]
>Sent: woensdag 21 februari 2007 3:37
>To: Joris Dobbelsteen; elein
>Cc: pgsql-general@postgresql.org
>Subject: RE: [GENERAL] complex referential integrity constraints
>
>Yes, exactly.  And while you might not care about all of those
>(e.g. I care about the first two but am not worried about the
>third one because I'm the only one who will ever update that
>table), writing multiple triggers to enforce each constraint
>of this type quickly gets old if there are even a few of them.
> It is exponentially harder to write a constraint of this type
>than it is to write a simple foreign key constraint.
>
>...Robert
>
>-----Original Message-----
>From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl]
>Sent: Monday, February 19, 2007 5:59 AM
>To: elein; Robert Haas
>Cc: pgsql-general@postgresql.org
>Subject: RE: [GENERAL] complex referential integrity constraints
>
>>Why don't you add a field in animal_types that is boolean mauler.
>>Then you can add a trigger on the mauling table to raise an
>error when
>>the attacker_id is an animal type mauler.
>
>This is only partial. You need a lot more triggers to
>guarentee the constraints are enforced.
>Precisely you need to validate:
>* mauling on insert/update of attacker_id
>* animal on update of type_id
>* animal_type on update of your property
>
>Of course you need to think about the MVCC model, such that:
>Transaction 1 executes
>INSERT INTO mauling VALUES ('someattacker'), Transaction 2
>executes UPDATE animal_type SET mauler = false WHERE name =
>'someattacker', such that both transaction happen in parallel.
>
>This is perfectly possible and will make it possible to
>violate the constraint, UNLESS locking of the tuples is done correctly.
>
>These contraints are not trivial to implement (unfortunally).
>It would be great if they where.
>
>- Joris
>

Re: complex referential integrity constraints

From

"Joris Dobbelsteen"

Date:

22 February 2007, 15:28:26

>-----Original Message-----
>From: Robert Haas [mailto:Robert.Haas@dyntek.com]
>Sent: donderdag 22 februari 2007 15:58
>To: Joris Dobbelsteen; elein
>Cc: pgsql-general@postgresql.org
>Subject: RE: [GENERAL] complex referential integrity constraints
>
>The ability to make a foreign key reference a specific partial
>unique index (rather than just a set of columns that have a
>unique index) would solve many problems of this type.  As
>another example, you might have a table where one of the
>columns is "is_deleted boolean not null".  By creating a
>partial unique index on the primary key of that table "WHERE
>NOT is_deleted" and then pointing a foreign key at it, you
>could enforce that each row in the child table references a
>parent who isn't deleted.
>
>However, this would break down when there's more one than
>intermediate step involved.  For example, if you have:
>
>CREATE TABLE animal_type (
>      id            serial,
>    name            varchar(60) not null,
>    is_attacker        boolean not null,
>    primary key (id)
>);
>
>CREATE TABLE animal (
>    id            serial,
>    type_id        integer not null references animal_type (id),
>    name            varchar(60) not null,
>    primary key (id)
>);
>
>CREATE TABLE mauling (
>    id            serial,
>    attacker_id        integer not null references animal (id),
>    victim_id        integer not null references animal (id),
>    attack_time        timestamp with time zone not null,
>    primary key (id)
>);
>
>It would be easy to enforce the constraint that the attacker
>must be an animal of some specific type, but difficult to
>enforce the constraint that the attacker must be an animal
>whose type, in turn, has a true value for is_attacker.

Even worse, I don't you can guarentee that this constraint is enforced
at all times. That means, not if you are using triggers.
The only option seems using foreign keys and put in a lot of redundant
data.

>The best idea that I can think of right now to handle multiple
>levels of tables is to allow FOREIGN KEY constraints to
>references a VIEW, rather than a table.  Then you could say:
>
>CREATE VIEW attackers AS
>SELECT a.id FROM animal a, animal_type t WHERE a.type_id =
>t.id AND t.attacker;
>
>...and then FOREIGN KEY (attacker_id) REFERENCES attackers (id).

Perhaps "Alban Hertroys" idea solves this problem a little easier.
However it lacks the possibility to make quick changes later on
(predator is a predator, or you are screwed, no second change).
This is not acceptable in problems where such things are decided after
object creation or might be changed later on.

>This syntax would solve a number of other problems as well,
>such as requiring that some record in table A has a parent
>either in table P or in table Q. However, I think this would
>probably require implementing some kind of materialized view
>so that you could actually build an index on the view, and
>that opens up a whole new can of worms, because it's not very
>difficult to define a view that is costly to update incrementally.

You don't need a materialized view to put a database to its knees. You
can already do that today, with ease. I wouldn't worry too much about
that.
If you mean from a syntax I suggested I do not believe it's the 'right'
way to define an (materialized) view, rather use a trigger-like style of
system. There are some other issues, however.

>The problem is really that there is a pretty large gap between
>writing a foreign key constraint, which is trivial, and
>enforcing a constraint using triggers, which is quite a bit
>more complex (and therefore, easy to screw up), because the
>foreign key automatically handles all the cases (insert into
>child table, update of child table, update of parent table,
>delete from parent table) whereas with triggers you have to
>address each of those cases individually.

Exactly, that is why I suggested such a system. If its not easy to
enforce constraints, it will never happen properly. Especially if
problems get more complex.

>Unfortunately,
>something tells me that implementing a more powerful system
>for foreign key constraints is a non-trivial project, however
>useful it would be.
>Still, I'd love to see it in the TODO file, too.

Me too, I get the impression that SQL is too weak for most constraints.

- Joris

>...Robert
>
>-----Original Message-----
>From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl]
>Sent: Thursday, February 22, 2007 8:03 AM
>To: Robert Haas; elein
>Cc: pgsql-general@postgresql.org
>Subject: RE: [GENERAL] complex referential integrity constraints
>
>I partially agree:
>If people CAN do stupid things, they are 'clever' enough to
>find a way to actually do it. I've seen them destroy things,
>by just using a system in a way it was not intended. They
>effectively found a way to blow away the very thing that part
>was designed for.
>But indeed, it's a lot of work, especially if the number of
>tables that must be referenced increases. I'm a strong
>supporter for ensuring consistency. Postgres has what it takes
>to do the job, but it doesn't make my life a lot easier. But
>it seems to be as good as it gets today...
>
>Perhaps we should rather define a 'database' constraint in the
>order of:
>"For every mauling, the attacking animal must be of the attacker type"
>(in a computer understandable manner). From the set theory
>this should be possible without too much problems, However
>doing so efficiently might be slightly harder.
>This might be a fun project and useful for the TODO list. At
>least it makes it a lot easier (and maintanable) to enforce
>database-wide constraints.
>
>- Joris
>
>>-----Original Message-----
>>From: Robert Haas [mailto:Robert.Haas@dyntek.com]
>>Sent: woensdag 21 februari 2007 3:37
>>To: Joris Dobbelsteen; elein
>>Cc: pgsql-general@postgresql.org
>>Subject: RE: [GENERAL] complex referential integrity constraints
>>
>>Yes, exactly.  And while you might not care about all of
>those (e.g. I
>>care about the first two but am not worried about the third
>one because
>>I'm the only one who will ever update that table), writing multiple
>>triggers to enforce each constraint of this type quickly gets old if
>>there are even a few of them.
>> It is exponentially harder to write a constraint of this
>type than it
>>is to write a simple foreign key constraint.
>>
>>...Robert
>>
>>-----Original Message-----
>>From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl]
>>Sent: Monday, February 19, 2007 5:59 AM
>>To: elein; Robert Haas
>>Cc: pgsql-general@postgresql.org
>>Subject: RE: [GENERAL] complex referential integrity constraints
>>
>>>Why don't you add a field in animal_types that is boolean mauler.
>>>Then you can add a trigger on the mauling table to raise an
>>error when
>>>the attacker_id is an animal type mauler.
>>
>>This is only partial. You need a lot more triggers to guarentee the
>>constraints are enforced.
>>Precisely you need to validate:
>>* mauling on insert/update of attacker_id
>>* animal on update of type_id
>>* animal_type on update of your property
>>
>>Of course you need to think about the MVCC model, such that:
>>Transaction 1 executes
>>INSERT INTO mauling VALUES ('someattacker'), Transaction 2 executes
>>UPDATE animal_type SET mauler = false WHERE name =
>'someattacker', such
>>that both transaction happen in parallel.
>>
>>This is perfectly possible and will make it possible to violate the
>>constraint, UNLESS locking of the tuples is done correctly.
>>
>>These contraints are not trivial to implement (unfortunally).
>>It would be great if they where.
>>
>>- Joris
>>
>

Re: complex referential integrity constraints

From

Martijn van Oosterhout

Date:

22 February 2007, 16:17:44

On Thu, Feb 22, 2007 at 05:28:35PM +0100, Joris Dobbelsteen wrote:
> Even worse, I don't you can guarentee that this constraint is enforced
> at all times. That means, not if you are using triggers.
> The only option seems using foreign keys and put in a lot of redundant
> data.

Err, foreign keys are implemented using triggers, so this statement is
self-contradictary.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

signature.asc

Re: complex referential integrity constraints

From

"Joris Dobbelsteen"

Date:

22 February 2007, 16:51:46

>-----Original Message-----
>From: pgsql-general-owner@postgresql.org
>[mailto:pgsql-general-owner@postgresql.org] On Behalf Of
>Martijn van Oosterhout
>Sent: donderdag 22 februari 2007 18:17
>To: Joris Dobbelsteen
>Cc: Robert Haas; pgsql-general@postgresql.org
>Subject: Re: [GENERAL] complex referential integrity constraints
>
>On Thu, Feb 22, 2007 at 05:28:35PM +0100, Joris Dobbelsteen wrote:
>> Even worse, I don't you can guarentee that this constraint
>is enforced
>> at all times. That means, not if you are using triggers.
>> The only option seems using foreign keys and put in a lot of
>redundant
>> data.
>
>Err, foreign keys are implemented using triggers, so this
>statement is self-contradictary.

Are you really sure they are executed under the same visibility rules?

- Joris

Re: complex referential integrity constraints

From

Stephan Szabo

Date:

22 February 2007, 21:13:31

On Thu, 22 Feb 2007, Joris Dobbelsteen wrote:

> >-----Original Message-----
> >From: pgsql-general-owner@postgresql.org
> >[mailto:pgsql-general-owner@postgresql.org] On Behalf Of
> >Martijn van Oosterhout
> >Sent: donderdag 22 februari 2007 18:17
> >To: Joris Dobbelsteen
> >Cc: Robert Haas; pgsql-general@postgresql.org
> >Subject: Re: [GENERAL] complex referential integrity constraints
> >
> >On Thu, Feb 22, 2007 at 05:28:35PM +0100, Joris Dobbelsteen wrote:
> >> Even worse, I don't you can guarentee that this constraint
> >is enforced
> >> at all times. That means, not if you are using triggers.
> >> The only option seems using foreign keys and put in a lot of
> >redundant
> >> data.
> >
> >Err, foreign keys are implemented using triggers, so this
> >statement is self-contradictary.
>
> Are you really sure they are executed under the same visibility rules?

IIRC, the ri triggers use calls that you aren't able to get at in
triggers written in any of the PLs, but I think you should be able to
replicate the feat in a trigger written in C.

Re: complex referential integrity constraints

From

Martijn van Oosterhout

Date:

22 February 2007, 21:15:30

On Thu, Feb 22, 2007 at 06:51:49PM +0100, Joris Dobbelsteen wrote:
> >Err, foreign keys are implemented using triggers, so this
> >statement is self-contradictary.
>
> Are you really sure they are executed under the same visibility rules?

Reasonably. I have no idea what visibility rules would make any
difference at all. AIUI a foreign key just takes a shared lock on the
referenced row and all the magic of MVCC makes sure the row exists when
the transaction completes.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

signature.asc

Re: complex referential integrity constraints

From

"Robert Haas"

Date:

22 February 2007, 21:32:37

The idea here is that a wolf can attack a sheep, or a wolf can attack
another wolf, but sheep can't attack anything.  I suppose I could list
each wolf in both the predator and prey tables, but that seems a bit
duplicative (and causes other problems).

...Robert

-----Original Message-----
From: David Fetter [mailto:david@fetter.org]
Sent: Monday, February 19, 2007 1:04 PM
To: Robert Haas
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex referential integrity constraints

On Fri, Feb 16, 2007 at 09:58:56AM -0500, Robert Haas wrote:
> So, I have the following problem.
>
> Suppose you have two kinds of animals, sheep and wolves.  Since they
> have very similar properties, you create a single table to hold both
> kinds of animals, and an animal_type table to specify the type of each
> animal:
>
> CREATE TABLE animal_type (
>     id            integer not null,
>     name            varchar(80) not null,
>     primary key (id)
> );
> INSERT INTO animal_type VALUES (1, 'Sheep');
> INSERT INTO animal_type VALUES (2, 'Wolf');
>
> CREATE TABLE animal (
>     id            serial,
>     type_id             integer not null references animal_type (id),
>     name            varchar(80) not null,
>     age            integer not null,
>     weight_in_pounds    integer not null,
>     primary key (id)
> );
>
> The animal_type table is more or less written in stone, but the animal
> table will be updated frequently.  Now, let's suppose that we want to
> keep track of all of the cases where one animal is mauled by another
> animal:
>
> CREATE TABLE mauling (
>     id                  serial,
>     attacker_id         integer not null references animal (id),
>     victim_id           integer not null references animal (id),
>     attack_time         timestamp not null,
>     primary key (id)
> );
>
> The problem with this is that I have a very unsettled feeling about
the
> foreign key constraints on this table.  The victim_id constraint is
> fine, but the attacker_id constraint is really inadequate, because the
> attacker CAN NEVER BE A SHEEP.  I really want a way to write a
> constraint that says that the attacker must be an animal, but
> specifically, a wolf.
>
> It would be really nice to be able to write:
>
> FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id)
>
> Or:
>
> CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2;
> -- and then
> FOREIGN KEY (attacker_id) REFERENCES INDEX wolves
>
> ...but that's entirely speculative syntax.  I don't think there's any
> easy way to do this.  (Please tell me I'm wrong.)
>
> The problem really comes in when people start modifying the animal
> table.  Every once in a while we have a case where we record something
> as a wolf, but it turns out to have been a sheep in wolf's clothing.
In
> this case, we want to do something like this:
>
> UPDATE animal SET type_id = 1 WHERE id = 572;
>
> HOWEVER, this operation MUST NOT be allowed if it turns out there is a
> row in the mauling table where attacker_id = 572, because that would
> violate my integrity constraints that says that sheep do not maul.
>
> Any suggestions?  I've thought about creating rules or triggers to
check
> the conditions, but I'm scared that this could either (a) get really
> complicated when there are a lot more tables and constraints involved
or
> (b) introduce race conditions.
>
> Thanks,
>
> ...Robert

I'd do something like this:

CREATE TABLE animal_type (
    animal_name  TEXT PRIMARY KEY,
    CHECK(animal_name = trim(animal_name))
);

/* Only one of {Wolf,wolf} can be in the table. */

CREATE UNIQUE INDEX just_one_animal_name
    ON animal_type(LOWER(animal_name));

CREATE TABLE predator (
    animal_name TEXT NOT NULL
                REFERENCES animal_type(animal_name)
                ON DELETE CASCADE,
    PRIMARY KEY(animal_name)
);

CREATE TABLE prey (
    animal_name TEXT NOT NULL
                REFERENCES animal_type(animal_name)
                ON DELETE CASCADE,
    PRIMARY KEY(animal_name)
);

CREATE TABLE mauling (
    id             SERIAL PRIMARY KEY,
    attacker_id    INTEGER NOT NULL REFERENCES predator
(animal_type_id),
    victim_id      INTEGER NOT NULL REFERENCES prey (animal_type_id),
    attack_time    TIMESTAMP WITH TIME ZONE NOT NULL
);

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!

Re: complex referential integrity constraints

From

"Robert Haas"

Date:

22 February 2007, 21:32:43

The ability to make a foreign key reference a specific partial unique
index (rather than just a set of columns that have a unique index) would
solve many problems of this type.  As another example, you might have a
table where one of the columns is "is_deleted boolean not null".  By
creating a partial unique index on the primary key of that table "WHERE
NOT is_deleted" and then pointing a foreign key at it, you could enforce
that each row in the child table references a parent who isn't deleted.

However, this would break down when there's more one than intermediate
step involved.  For example, if you have:

CREATE TABLE animal_type (
      id            serial,
    name            varchar(60) not null,
    is_attacker        boolean not null,
    primary key (id)
);

CREATE TABLE animal (
    id            serial,
    type_id        integer not null references animal_type (id),
    name            varchar(60) not null,
    primary key (id)
);

CREATE TABLE mauling (
    id            serial,
    attacker_id        integer not null references animal (id),
    victim_id        integer not null references animal (id),
    attack_time        timestamp with time zone not null,
    primary key (id)
);

It would be easy to enforce the constraint that the attacker must be an
animal of some specific type, but difficult to enforce the constraint
that the attacker must be an animal whose type, in turn, has a true
value for is_attacker.

The best idea that I can think of right now to handle multiple levels of
tables is to allow FOREIGN KEY constraints to references a VIEW, rather
than a table.  Then you could say:

CREATE VIEW attackers AS
SELECT a.id FROM animal a, animal_type t WHERE a.type_id = t.id AND
t.attacker;

...and then FOREIGN KEY (attacker_id) REFERENCES attackers (id).

This syntax would solve a number of other problems as well, such as
requiring that some record in table A has a parent either in table P or
in table Q.  However, I think this would probably require implementing
some kind of materialized view so that you could actually build an index
on the view, and that opens up a whole new can of worms, because it's
not very difficult to define a view that is costly to update
incrementally.

The problem is really that there is a pretty large gap between writing a
foreign key constraint, which is trivial, and enforcing a constraint
using triggers, which is quite a bit more complex (and therefore, easy
to screw up), because the foreign key automatically handles all the
cases (insert into child table, update of child table, update of parent
table, delete from parent table) whereas with triggers you have to
address each of those cases individually.  Unfortunately, something
tells me that implementing a more powerful system for foreign key
constraints is a non-trivial project, however useful it would be.
Still, I'd love to see it in the TODO file, too.

...Robert

-----Original Message-----
From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl]
Sent: Thursday, February 22, 2007 8:03 AM
To: Robert Haas; elein
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] complex referential integrity constraints

I partially agree:
If people CAN do stupid things, they are 'clever' enough to find a way
to actually do it. I've seen them destroy things, by just using a system
in a way it was not intended. They effectively found a way to blow away
the very thing that part was designed for.
But indeed, it's a lot of work, especially if the number of tables that
must be referenced increases. I'm a strong supporter for ensuring
consistency. Postgres has what it takes to do the job, but it doesn't
make my life a lot easier. But it seems to be as good as it gets
today...

Perhaps we should rather define a 'database' constraint in the order of:
"For every mauling, the attacking animal must be of the attacker type"
(in a computer understandable manner). From the set theory this should
be possible without too much problems, However doing so efficiently
might be slightly harder.
This might be a fun project and useful for the TODO list. At least it
makes it a lot easier (and maintanable) to enforce database-wide
constraints.

- Joris

>-----Original Message-----
>From: Robert Haas [mailto:Robert.Haas@dyntek.com]
>Sent: woensdag 21 februari 2007 3:37
>To: Joris Dobbelsteen; elein
>Cc: pgsql-general@postgresql.org
>Subject: RE: [GENERAL] complex referential integrity constraints
>
>Yes, exactly.  And while you might not care about all of those
>(e.g. I care about the first two but am not worried about the
>third one because I'm the only one who will ever update that
>table), writing multiple triggers to enforce each constraint
>of this type quickly gets old if there are even a few of them.
> It is exponentially harder to write a constraint of this type
>than it is to write a simple foreign key constraint.
>
>...Robert
>
>-----Original Message-----
>From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl]
>Sent: Monday, February 19, 2007 5:59 AM
>To: elein; Robert Haas
>Cc: pgsql-general@postgresql.org
>Subject: RE: [GENERAL] complex referential integrity constraints
>
>>Why don't you add a field in animal_types that is boolean mauler.
>>Then you can add a trigger on the mauling table to raise an
>error when
>>the attacker_id is an animal type mauler.
>
>This is only partial. You need a lot more triggers to
>guarentee the constraints are enforced.
>Precisely you need to validate:
>* mauling on insert/update of attacker_id
>* animal on update of type_id
>* animal_type on update of your property
>
>Of course you need to think about the MVCC model, such that:
>Transaction 1 executes
>INSERT INTO mauling VALUES ('someattacker'), Transaction 2
>executes UPDATE animal_type SET mauler = false WHERE name =
>'someattacker', such that both transaction happen in parallel.
>
>This is perfectly possible and will make it possible to
>violate the constraint, UNLESS locking of the tuples is done correctly.
>
>These contraints are not trivial to implement (unfortunally).
>It would be great if they where.
>
>- Joris
>

Re: complex referential integrity constraints

From

"Robert Haas"

Date:

22 February 2007, 21:32:46

Yes, exactly.  And while you might not care about all of those (e.g. I
care about the first two but am not worried about the third one because
I'm the only one who will ever update that table), writing multiple
triggers to enforce each constraint of this type quickly gets old if
there are even a few of them.  It is exponentially harder to write a
constraint of this type than it is to write a simple foreign key
constraint.

...Robert

-----Original Message-----
From: Joris Dobbelsteen [mailto:Joris@familiedobbelsteen.nl]
Sent: Monday, February 19, 2007 5:59 AM
To: elein; Robert Haas
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] complex referential integrity constraints

>Why don't you add a field in animal_types that is boolean mauler.
>Then you can add a trigger on the mauling table to raise an
>error when the attacker_id is an animal type mauler.

This is only partial. You need a lot more triggers to guarentee the
constraints are enforced.
Precisely you need to validate:
* mauling on insert/update of attacker_id
* animal on update of type_id
* animal_type on update of your property

Of course you need to think about the MVCC model, such that:
Transaction 1 executes
INSERT INTO mauling VALUES ('someattacker'),
Transaction 2 executes
UPDATE animal_type SET mauler = false WHERE name = 'someattacker',
such that both transaction happen in parallel.

This is perfectly possible and will make it possible to violate the
constraint, UNLESS locking of the tuples is done correctly.

These contraints are not trivial to implement (unfortunally). It would
be great if they where.

- Joris

Re: complex referential integrity constraints

From

"Joris Dobbelsteen"

Date:

22 February 2007, 23:31:09

>-----Original Message-----
>From: Martijn van Oosterhout [mailto:kleptog@svana.org]
>Sent: donderdag 22 februari 2007 23:15
>To: Joris Dobbelsteen
>Cc: Robert Haas; pgsql-general@postgresql.org
>Subject: Re: [GENERAL] complex referential integrity constraints
>
>On Thu, Feb 22, 2007 at 06:51:49PM +0100, Joris Dobbelsteen wrote:
>> >Err, foreign keys are implemented using triggers, so this statement
>> >is self-contradictary.
>>
>> Are you really sure they are executed under the same
>visibility rules?
>
>Reasonably. I have no idea what visibility rules would make
>any difference at all. AIUI a foreign key just takes a shared
>lock on the referenced row and all the magic of MVCC makes
>sure the row exists when the transaction completes.

Try this:
(sorry for any typo's in SQL, if they exist)

CREATE TABLE a (val integer NOT NULL PRIMARY KEY);
CREATE TABLE b (val integer NOT NULL, val2 integer NOT NULL, PRIMARY KEY
(val, val2);
-- we will be doing foreign key ourselves
INSERT INTO a VALUES (1);
INSERT INTO a VALUES (2);
INSERT INTO a VALUES (1,1);
INSERT INTO a VALUES (2,2);

-- Now two transaction (T1, T2)
T1: BEGIN ISOLATION LEVEL SERIALIZABLE;
T2: BEGIN ISOLATION LEVEL SERIALIZABLE;
-- Lets see what we have got.
T1: SELECT * FROM a;
T1: SELECT * FROM b;
T2: SELECT * FROM a;
T2: SELECT * FROM b;
-- lets insert something...
T2: INSERT INTO a VALUES (2,100);
-- results in a lock being acquired
T2: SELECT 1 FROM a x WHERE val = 2 FOR SHARE ON x;    -- this is your
lock
-- Ok, done for now...
T2: COMMIT;                    -- now the lock is gone
-- This means T1 doesn't see the row, right?
T1: SELECT * FROM b;
-- now lets delete
T1: DELETE FROM a WHERE val = 2;
-- on cascade delete, thus:
T1: DELETE FROM b WHERE val = 2;    -- won't see new tuple
(serializable isolation)
T1: COMMIT;

SELECT * FROM b;
val    val2
2    100

Sorry, constraint wasn't enforced ;)
It does matter.

Now try it with this:
CREATE TABLE b (val integer NOT NULL, val2 integer NOT NULL, PRIMARY KEY
(val, val2), FOREIGN KEY val REFERENCES a(val) ON UPDATE CASCADE ON
DELETE CASCADE);
That won't inhibit this behaviour, but proberly enforces the constraint
(as one would have expected). I believe T2 will abort as in the manual.

Your statement might be correct, but it doesn't take enough account of
how the visibility rules under MVCC are played. It seems the foreign
keys (as well as primary keys) have there rules applied differently,
they see that row and will cause an abort.

- Joris

Re: complex referential integrity constraints

From

"Joris Dobbelsteen"

Date:

22 February 2007, 23:31:11

>-----Original Message-----
>From: Stephan Szabo [mailto:sszabo@megazone.bigpanda.com]
>Sent: donderdag 22 februari 2007 23:13
>To: Joris Dobbelsteen
>Cc: Martijn van Oosterhout; Robert Haas; pgsql-general@postgresql.org
>Subject: Re: [GENERAL] complex referential integrity constraints
>
>On Thu, 22 Feb 2007, Joris Dobbelsteen wrote:
>
>> >-----Original Message-----
>> >From: pgsql-general-owner@postgresql.org
>> >[mailto:pgsql-general-owner@postgresql.org] On Behalf Of
>Martijn van
>> >Oosterhout
>> >Sent: donderdag 22 februari 2007 18:17
>> >To: Joris Dobbelsteen
>> >Cc: Robert Haas; pgsql-general@postgresql.org
>> >Subject: Re: [GENERAL] complex referential integrity constraints
>> >
>> >On Thu, Feb 22, 2007 at 05:28:35PM +0100, Joris Dobbelsteen wrote:
>> >> Even worse, I don't you can guarentee that this constraint
>> >is enforced
>> >> at all times. That means, not if you are using triggers.
>> >> The only option seems using foreign keys and put in a lot of
>> >redundant
>> >> data.
>> >
>> >Err, foreign keys are implemented using triggers, so this statement
>> >is self-contradictary.
>>
>> Are you really sure they are executed under the same
>visibility rules?
>
>IIRC, the ri triggers use calls that you aren't able to get at
>in triggers written in any of the PLs, but I think you should
>be able to replicate the feat in a trigger written in C.

Why they never did that? No need or no pratical situations where it went
wrong?

IMHO it should become possible to expose this functionality to the PL
languages or as part of the trigger system? The current system can be
shown to be underpowered to enforce constraints. It seems a bit
cumbersome to have C functions for this purpose.

However I must admit that looking through the postgres code doesn't make
it much clearer to me whats actually going for these kind of tricky
problems...

- Joris

Re: complex referential integrity constraints

From

Martijn van Oosterhout

Date:

23 February 2007, 07:50:07

On Fri, Feb 23, 2007 at 12:41:25AM +0100, Joris Dobbelsteen wrote:
> >Reasonably. I have no idea what visibility rules would make
> >any difference at all. AIUI a foreign key just takes a shared
> >lock on the referenced row and all the magic of MVCC makes
> >sure the row exists when the transaction completes.
>
> Try this:
> (sorry for any typo's in SQL, if they exist)

<snip>

Well, I took a look at the RI code and the only stuff I saw that looked
interesting was this:

utils/adt/ri_triggers.c:
        if (IsXactIsoLevelSerializable && detectNewRows)
        {
                CommandCounterIncrement();              /* be sure all my own work is visible */
                test_snapshot = CopySnapshot(GetLatestSnapshot());
                crosscheck_snapshot = CopySnapshot(GetTransactionSnapshot());
        }

It then proceeds to use that snapshot to execute the query to get the share lock.

It's probably true that other PL's can't do this directly. Not sure how
to deal with that. I got confused because I thought the first version
of RI did use straight pl/pgsql functions, so I thought that was
enough.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

signature.asc

Re: complex referential integrity constraints

From

Alban Hertroys

Date:

23 February 2007, 08:02:03

Robert Haas wrote:
> The idea here is that a wolf can attack a sheep, or a wolf can attack
> another wolf, but sheep can't attack anything.  I suppose I could list
> each wolf in both the predator and prey tables, but that seems a bit
> duplicative (and causes other problems).
>
> ...Robert

I'm quite certain a wolf is much more likely to attack a sheep than to
attack another wolf, and even more unlikely to attack for example a
lion. It seems to me that just the fact that it can isn't enough
information.

It looks like you need "weighted constraints"; there's 0 chance that a
sheep attacks a wolf, but there's >0 chance that a wolf attacks a sheep,
>0 chance it attacks a wolf and >0 chance it attacks a lion. The exact
numbers will vary, and I have absolutely no idea what they would be
like. It probably requires some kind of ranking system that adjusts
according to the known animals and their likelihood to attack eachother.

I'm pretty sure you can't get this done without defining some triggers.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: complex referential integrity constraints

From

Alvaro Herrera

Date:

23 February 2007, 13:36:03

Alban Hertroys wrote:
> Robert Haas wrote:
> > The idea here is that a wolf can attack a sheep, or a wolf can attack
> > another wolf, but sheep can't attack anything.  I suppose I could list
> > each wolf in both the predator and prey tables, but that seems a bit
> > duplicative (and causes other problems).
>
> I'm quite certain a wolf is much more likely to attack a sheep than to
> attack another wolf, and even more unlikely to attack for example a
> lion. It seems to me that just the fact that it can isn't enough
> information.
>
> It looks like you need "weighted constraints"; there's 0 chance that a
> sheep attacks a wolf, but there's >0 chance that a wolf attacks a sheep,
> >0 chance it attacks a wolf and >0 chance it attacks a lion. The exact
> numbers will vary, and I have absolutely no idea what they would be
> like. It probably requires some kind of ranking system that adjusts
> according to the known animals and their likelihood to attack eachother.

Depending on what you're modelling, even this could be too simple -- for
example, while a single wolf is unlikely to attack a lion, a pack of
wolves have a lot more probability of doing so.

Do you keep packs of wolves in your barn?  If so, watch your lions.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: complex referential integrity constraints

From

"Joris Dobbelsteen"

Date:

23 February 2007, 14:09:54

>-----Original Message-----
>From: Martijn van Oosterhout [mailto:kleptog@svana.org]
>Sent: vrijdag 23 februari 2007 9:50
>To: Joris Dobbelsteen
>Cc: pgsql-general@postgresql.org
>Subject: Re: [GENERAL] complex referential integrity constraints
>
>On Fri, Feb 23, 2007 at 12:41:25AM +0100, Joris Dobbelsteen wrote:
>> >Reasonably. I have no idea what visibility rules would make any
>> >difference at all. AIUI a foreign key just takes a shared
>lock on the
>> >referenced row and all the magic of MVCC makes sure the row exists
>> >when the transaction completes.
>>
>> Try this:
>> (sorry for any typo's in SQL, if they exist)
>
><snip>
>
>Well, I took a look at the RI code and the only stuff I saw
>that looked interesting was this:
>
>utils/adt/ri_triggers.c:
>        if (IsXactIsoLevelSerializable && detectNewRows)
>        {
>                CommandCounterIncrement();              /* be
>sure all my own work is visible */
>                test_snapshot = CopySnapshot(GetLatestSnapshot());
>                crosscheck_snapshot =
>CopySnapshot(GetTransactionSnapshot());
>        }
>
>It then proceeds to use that snapshot to execute the query to
>get the share lock.
>
>It's probably true that other PL's can't do this directly. Not
>sure how to deal with that. I got confused because I thought
>the first version of RI did use straight pl/pgsql functions,
>so I thought that was enough.

You got it right...

/*
 * SPI_execute_snapshot -- identical to SPI_execute_plan, except that we
allow
 * the caller to specify exactly which snapshots to use.  This is
currently
 * not documented in spi.sgml because it is only intended for use by RI
 * triggers.
 *
 * Passing snapshot == InvalidSnapshot will select the normal behavior
of
 * fetching a new snapshot for each query.
 */
int
SPI_execute_snapshot(void *plan,
                     Datum *Values, const char
*Nulls,
                     Snapshot snapshot, Snapshot
crosscheck_snapshot,
                     bool read_only, long tcount)

They got the point right: only intended for use by RI triggers. That's
exactly the type I'm trying to build ;)
They are exposed to the C versions (its in include/executor/spi.h), but
to me it looks a bit cumbersome to have triggers written in C.

What would be a good way to expose this to normal PL triggers? Since
this would open a new set of possibilities...

As part of a "create trigger ... for referencial integrity"?
As an extension to a statement?
Special construct in the languages?

- Joris

Re: complex referential integrity constraints

From

Stephan Szabo

Date:

23 February 2007, 14:27:40

On Fri, 23 Feb 2007, Joris Dobbelsteen wrote:

> >-----Original Message-----
> >From: Martijn van Oosterhout [mailto:kleptog@svana.org]
> >Sent: vrijdag 23 februari 2007 9:50
> >To: Joris Dobbelsteen
> >Cc: pgsql-general@postgresql.org
> >Subject: Re: [GENERAL] complex referential integrity constraints
> >
> >On Fri, Feb 23, 2007 at 12:41:25AM +0100, Joris Dobbelsteen wrote:
> >> >Reasonably. I have no idea what visibility rules would make any
> >> >difference at all. AIUI a foreign key just takes a shared
> >lock on the
> >> >referenced row and all the magic of MVCC makes sure the row exists
> >> >when the transaction completes.
> >>
> >> Try this:
> >> (sorry for any typo's in SQL, if they exist)
> >
> ><snip>
> >
> >Well, I took a look at the RI code and the only stuff I saw
> >that looked interesting was this:
> >
> >utils/adt/ri_triggers.c:
> >        if (IsXactIsoLevelSerializable && detectNewRows)
> >        {
> >                CommandCounterIncrement();              /* be
> >sure all my own work is visible */
> >                test_snapshot = CopySnapshot(GetLatestSnapshot());
> >                crosscheck_snapshot =
> >CopySnapshot(GetTransactionSnapshot());
> >        }
> >
> >It then proceeds to use that snapshot to execute the query to
> >get the share lock.
> >
> >It's probably true that other PL's can't do this directly. Not
> >sure how to deal with that. I got confused because I thought
> >the first version of RI did use straight pl/pgsql functions,
> >so I thought that was enough.
>
> You got it right...
>
> /*
>  * SPI_execute_snapshot -- identical to SPI_execute_plan, except that we
> allow
>  * the caller to specify exactly which snapshots to use.  This is
> currently
>  * not documented in spi.sgml because it is only intended for use by RI
>  * triggers.
>  *
>  * Passing snapshot == InvalidSnapshot will select the normal behavior
> of
>  * fetching a new snapshot for each query.
>  */
> int
> SPI_execute_snapshot(void *plan,
>                      Datum *Values, const char
> *Nulls,
>                      Snapshot snapshot, Snapshot
> crosscheck_snapshot,
>                      bool read_only, long tcount)
>
> They got the point right: only intended for use by RI triggers. That's
> exactly the type I'm trying to build ;)
> They are exposed to the C versions (its in include/executor/spi.h), but
> to me it looks a bit cumbersome to have triggers written in C.

I was wondering if some sort of generator might work. Something that would
take what you're trying to do and generate the triggers for you, but I
haven't really worked out what that'd look like.

> What would be a good way to expose this to normal PL triggers? Since
> this would open a new set of possibilities...
>
> As part of a "create trigger ... for referencial integrity"?
> As an extension to a statement?
> Special construct in the languages?

I think the first thing to do is to figure out what such triggers need to
do. Does such a trigger need to potentially run some queries on the normal
snapshot? Does it potentially need different snapshots for different
statements or is only one special snapshot sufficient? And other such
questions. From there, a -hackers discussion might be meaningful.

Re: complex referential integrity constraints

From

David Fetter

Date:

23 February 2007, 21:08:44

On Fri, Feb 23, 2007 at 09:39:52AM -0500, Robert Haas wrote:
> Actually, what would be really nice is if there were just a button I
> could push that would make all of my data automatically correct.
> Can that go into 8.3?  Thanks, ...Robert

Oh, no problem.  Just compile with -ldwim ;)

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!
http://www.postgresql.org/about/donate

Re: complex referential integrity constraints

From

Richard Broersma Jr

Date:

23 February 2007, 22:59:04

to attack eachother.
>
> Depending on what you're modelling, even this could be too simple -- for
> example, while a single wolf is unlikely to attack a lion, a pack of
> wolves have a lot more probability of doing so.
>
> Do you keep packs of wolves in your barn?  If so, watch your lions.

Well from the previous thread that discussed the use of the <animal> table and sub-set tables
<prey> and <preditor>, if a preditor can attach a prey item or preditor item, then a table
relation only needs to be created between <preditor> and <animal>.  This way only preditors can
attack, but they can attach any other animal preditor or prey.

Regards,
Richard Broersma Jr.

Re: complex referential integrity constraints

From

"Robert Haas"

Date:

25 February 2007, 16:11:37

I don't understand what a weighted constraint would mean.  Either the
attacker_id can be a wolf, or it can't.  Knowing that it is only 1%
likely over the long haul is insufficient to disallow any particular
transaction.

It's certainly true that the constraint as stated is insufficient to
guarantee that the table will contain good data.  For example if we
looked at the maulings table and wolves were always mauling other wolves
but never sheep, we would naturally want to dig into that a little more
and find out why they weren't picking easier targets.  But this is
neither here nor there, because NO constraint (foreign key, check, or
what have you) is ever strong enough to ensure that the data in a table
is completely clean.  At least as I understand it, the purpose of these
constraints is to allow us to write application code which relies on
certain basic invariants being true, i.e. so that we can join animal to
animal_type and not have to worry about rows dropping out because some
animals had an invalid type, or rows getting added because there are two
animal_type records with the same id.

Besides, the problem as stated is a proxy for some real problem which is
part of a non-zoological project the details of which (a) would take too
long to explain and (b) should probably not be posted to a public
mailing list.  :-)

So far, the best ideas I've seen have been:

(a) Tom Lane's idea of denormalizing by copying the animal type column
into the maulings table with ON UPDATE CASCADE, and then adding a CHECK
constraint on that column, and

(b) Creating a separate table called "wolf" and some triggers that
ensure that the wolf table will always contain the subset of IDs from
the animal table where the type_id is that of a wolf, with a foreign key
constraint from that id column back to animal with "on delete cascade".
This ensures that nobody can delete a wolf or change it into a sheep if
it has maulings, but permits it otherwise.

For what it's worth, I've adopted the latter solution for the present.
Unfortunately, it's too much work to do it everywhere it would be nice
to have, so I'm just doing it in some really critical cases and hoping
that the others don't break.

Thanks,

...Robert

-----Original Message-----
From: Alban Hertroys [mailto:alban@magproductions.nl]
Sent: Friday, February 23, 2007 4:02 AM
To: Robert Haas
Cc: David Fetter; pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex referential integrity constraints

Robert Haas wrote:
> The idea here is that a wolf can attack a sheep, or a wolf can attack
> another wolf, but sheep can't attack anything.  I suppose I could list
> each wolf in both the predator and prey tables, but that seems a bit
> duplicative (and causes other problems).
>
> ...Robert

I'm quite certain a wolf is much more likely to attack a sheep than to
attack another wolf, and even more unlikely to attack for example a
lion. It seems to me that just the fact that it can isn't enough
information.

It looks like you need "weighted constraints"; there's 0 chance that a
sheep attacks a wolf, but there's >0 chance that a wolf attacks a sheep,
>0 chance it attacks a wolf and >0 chance it attacks a lion. The exact
numbers will vary, and I have absolutely no idea what they would be
like. It probably requires some kind of ranking system that adjusts
according to the known animals and their likelihood to attack eachother.

I'm pretty sure you can't get this done without defining some triggers.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: complex referential integrity constraints

From

"Robert Haas"

Date:

25 February 2007, 16:11:41

Actually, what would be really nice is if there were just a button I
could push that would make all of my data automatically correct.  Can
that go into 8.3?  Thanks, ...Robert

-----Original Message-----
From: Alvaro Herrera [mailto:alvherre@commandprompt.com]
Sent: Friday, February 23, 2007 9:35 AM
To: Alban Hertroys
Cc: Robert Haas; David Fetter; pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex referential integrity constraints

Alban Hertroys wrote:
> Robert Haas wrote:
> > The idea here is that a wolf can attack a sheep, or a wolf can
attack
> > another wolf, but sheep can't attack anything.  I suppose I could
list
> > each wolf in both the predator and prey tables, but that seems a bit
> > duplicative (and causes other problems).
>
> I'm quite certain a wolf is much more likely to attack a sheep than to
> attack another wolf, and even more unlikely to attack for example a
> lion. It seems to me that just the fact that it can isn't enough
> information.
>
> It looks like you need "weighted constraints"; there's 0 chance that a
> sheep attacks a wolf, but there's >0 chance that a wolf attacks a
sheep,
> >0 chance it attacks a wolf and >0 chance it attacks a lion. The exact
> numbers will vary, and I have absolutely no idea what they would be
> like. It probably requires some kind of ranking system that adjusts
> according to the known animals and their likelihood to attack
eachother.

Depending on what you're modelling, even this could be too simple -- for
example, while a single wolf is unlikely to attack a lion, a pack of
wolves have a lot more probability of doing so.

Do you keep packs of wolves in your barn?  If so, watch your lions.

--
Alvaro Herrera
http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: complex referential integrity constraints

From

Alban Hertroys

Date:

26 February 2007, 08:15:32

Robert Haas wrote:
> I don't understand what a weighted constraint would mean.  Either the
> attacker_id can be a wolf, or it can't.  Knowing that it is only 1%
> likely over the long haul is insufficient to disallow any particular
> transaction.

Basically I suggested to combine the constraint with a probability. If
the probability of one animal attacking another is 0% it can't attack
the other animal - that's a strict constraint. The other way around it
can, and you'll also immediately know how likely that is to happen.

An added bonus is that you can generalize certain constraints. If
certain animals are less than - say 25% - likely to attack other certain
 other animals you could determine that the attacked animal is not in
fact prey. An example would probably be wolves attacking other wolves
(or predators in general). For relations that depend on an animal being
prey, a constraint would be that this number be <25%.

In this discussion it is also not entirely defined what attacking means.
Is a ram defending his horde from wolves attacking (wolves)?

I realise this all started from an analogy to a real problem, so most of
this is probably not very relevant to your actual problem. No less
interesting, though.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: complex referential integrity constraints

From

Alban Hertroys

Date:

26 February 2007, 14:57:59

Robert Haas wrote:
> I sort of think that this kind of stuff belongs in a separate table
> somehow.  For example in this case I would want to:

I wasn't suggesting otherwise. A constraint is a constraint; whether it
involves an extra table or not wasn't really relevant until now
(apparently).

> CREATE TABLE attack_probability (
>     attacker_type_id integer not null references animal_type (id),
>     victim_type_id integer not null references animal_type (id),
>     percent_chance integer not null,

Personally I would prefer something that can hold decimals, a numeric
fe. And I usually add cascading behaviour definitions explicitly.

>     primary key (attacker_type_id, victim_type_id)
> );
>
> ...and then declare that the CONSTRAINT on the "maulings" table is that
> if I look up the types of the attacker and victim, that pair of types
> must be present in the attack_probability table with percent_chance > 0.

This was pretty much what I had in mind. Not _exactly_, as I didn't
concern myself with the implementation details yet. But it pretty much
boils down to what you suggested, yes.

> I guess my point here is that I think in your proposal you are letting
> domain-specific data creep into the schema.  It's the job of the schema
> to enforce integrity constraints, but not to know specifically how
> things work.  The fact (if it is a fact) that the chance of one type of
> animal attacking another can be captured as a probability (rather than,
> say, one probability for the day time and another probability for the
> night time, or one probability for each specific animal rather than each
> animal type, or I don't know what's going on and want to infer the
> probabilities from the data after I've gathered it) is domain-specific.
> I don't really want the information about attack probabilities (or
> whatever) to be something that's hardcoded in my schema; I want it to be
> part of the data in the schema, with the schema enforcing such
> constraints on that data as I may see fit to define.  I don't want to
> have to select things out of system tables to find out attack
> probabilities.  Also, as a practical matter, I suspect that such a setup
> would result in an absurdly complex constraint language.

I was merely pointing out the possibility of such a constraint - maybe
"business rule" is more appropriate (but isn't that a constraint as well?).
In the original suggestions this wasn't possible, it simply stated that
wolves cannot attack lions - which isn't necessarily true.

I suppose that's the point I was trying to make in general; the whole
who-attacks-who business isn't that black-and-white - more like
different shades of gray.

In the end you'll have to make a decision about what combinations are
possible depending on what you want to get out of your database. It may
not be necessary to get into this much detail - or it may. Not for me to
say.

> ...Robert

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: complex referential integrity constraints

From

"Robert Haas"

Date:

26 February 2007, 15:21:28

I sort of think that this kind of stuff belongs in a separate table
somehow.  For example in this case I would want to:

CREATE TABLE attack_probability (
    attacker_type_id integer not null references animal_type (id),
    victim_type_id integer not null references animal_type (id),
    percent_chance integer not null,
    primary key (attacker_type_id, victim_type_id)
);

...and then declare that the CONSTRAINT on the "maulings" table is that
if I look up the types of the attacker and victim, that pair of types
must be present in the attack_probability table with percent_chance > 0.

I guess my point here is that I think in your proposal you are letting
domain-specific data creep into the schema.  It's the job of the schema
to enforce integrity constraints, but not to know specifically how
things work.  The fact (if it is a fact) that the chance of one type of
animal attacking another can be captured as a probability (rather than,
say, one probability for the day time and another probability for the
night time, or one probability for each specific animal rather than each
animal type, or I don't know what's going on and want to infer the
probabilities from the data after I've gathered it) is domain-specific.
I don't really want the information about attack probabilities (or
whatever) to be something that's hardcoded in my schema; I want it to be
part of the data in the schema, with the schema enforcing such
constraints on that data as I may see fit to define.  I don't want to
have to select things out of system tables to find out attack
probabilities.  Also, as a practical matter, I suspect that such a setup
would result in an absurdly complex constraint language.

...Robert

-----Original Message-----
From: Alban Hertroys [mailto:alban@magproductions.nl]
Sent: Monday, February 26, 2007 4:15 AM
To: Robert Haas
Cc: David Fetter; pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex referential integrity constraints

Robert Haas wrote:
> I don't understand what a weighted constraint would mean.  Either the
> attacker_id can be a wolf, or it can't.  Knowing that it is only 1%
> likely over the long haul is insufficient to disallow any particular
> transaction.

Basically I suggested to combine the constraint with a probability. If
the probability of one animal attacking another is 0% it can't attack
the other animal - that's a strict constraint. The other way around it
can, and you'll also immediately know how likely that is to happen.

An added bonus is that you can generalize certain constraints. If
certain animals are less than - say 25% - likely to attack other certain
 other animals you could determine that the attacked animal is not in
fact prey. An example would probably be wolves attacking other wolves
(or predators in general). For relations that depend on an animal being
prey, a constraint would be that this number be <25%.

In this discussion it is also not entirely defined what attacking means.
Is a ram defending his horde from wolves attacking (wolves)?

I realise this all started from an analogy to a real problem, so most of
this is probably not very relevant to your actual problem. No less
interesting, though.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //