Thread: Referential integrity (foreign keys) across multiple tables

Referential integrity (foreign keys) across multiple tables

From
Richard Jones
Date:
Simplified schema:
 create table hosts (   id serial primary key,   hostname text not null );
 create table pages (   id serial primary key,   hostid int not null references hosts (id),   url text not null,
unique(hostid, url) );
 
 create table page_contents (   pageid int not null references pages (id),   section text not null );

(There are many hosts, many pages per host, and many page_contents
sections per page).

Now I want to add a column to page_contents, say called link_name,
which is going to reference the pages.url column for the particular
host that this page belongs to.

Something like:
 alter table page_contents add link_name text; alter table page_contents   add constraint foo foreign key (p.hostid,
link_name)  references pages (hostid, url)   where p.id = pageid;
 

Obviously that second statement isn't going to compile.

I don't want to add the hostid column to page_contents table because I
have a lot of old code accessing the database which would be hard to
change (the old code would no longer be able to insert page_contents
rows).

Is this possible somehow?  Perhaps by adding a second table?  Do I
have to use triggers, and if so is that as robust as referential
integrity?

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com


Re: Referential integrity (foreign keys) across multiple tables

From
Bruno Wolff III
Date:
On Sat, Jul 22, 2006 at 14:32:57 +0100, Richard Jones <rich@annexia.org> wrote:
> 
> Now I want to add a column to page_contents, say called link_name,
> which is going to reference the pages.url column for the particular
> host that this page belongs to.

What are you trying to accomplish by this? The information is available
by doing a join. If you are trying to simplify things for applications,
you can probably do it with a view or rules depending on whether you
want to have an updatable view. If you are denormalizing for performance
and want constraints to maintain consistancy, then you probably want
to push the hostid down to page_contents as well as the url. These could
both be set with a trigger. (I think a rule could be used as well.)


Re: Referential integrity (foreign keys) across multiple tables

From
Richard Jones
Date:
On Sun, Jul 23, 2006 at 01:32:37PM -0500, Bruno Wolff III wrote:
> On Sat, Jul 22, 2006 at 14:32:57 +0100,
>   Richard Jones <rich@annexia.org> wrote:
> > 
> > Now I want to add a column to page_contents, say called link_name,
> > which is going to reference the pages.url column for the particular
> > host that this page belongs to.
> 
> What are you trying to accomplish by this?

Data integrity.

> The information is available by doing a join. If you are trying to
> simplify things for applications, you can probably do it with a view
> or rules depending on whether you want to have an updatable view. If
> you are denormalizing for performance and want constraints to
> maintain consistancy, then you probably want to push the hostid down
> to page_contents as well as the url. These could both be set with a
> trigger. (I think a rule could be used as well.)

So if I get this right, I should use a trigger to ensure that the old
code causes the hostid field to be set in page_contents?

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com


Re: Referential integrity (foreign keys) across multiple tables

From
"Aaron Bono"
Date:
On 7/22/06, Richard Jones <rich@annexia.org> wrote:

Simplified schema:

  create table hosts (
    id serial primary key,
    hostname text not null
  );

  create table pages (
    id serial primary key,
    hostid int not null references hosts (id),
    url text not null,
    unique (hostid, url)
  );

  create table page_contents (
    pageid int not null references pages (id),
    section text not null
  );

(There are many hosts, many pages per host, and many page_contents
sections per page).

Now I want to add a column to page_contents, say called link_name,
which is going to reference the pages.url column for the particular
host that this page belongs to.

Something like:

  alter table page_contents add link_name text;
  alter table page_contents
    add constraint foo foreign key (p.hostid, link_name)
    references pages (hostid, url)
    where p.id = pageid;

 
The easiest, and not necessarily elegant, way to acomplish this is to create linkid rather than link_name and make it a foreign key to pages.id.  Then add a trigger that checks to make sure the pages you link to from page_contents to pages is for the same host.  If not, raise an exception.

Another option is to do this:

 create table page_contents (
   hostid int not null,
   url text not null,
   linkhostid int,
   linkurl text,
   section text not null,
   foreign key (hostid, url) references pages (hostid, url),
   foreign key (linkhostid, linkurl) references pages (hostid, url)
 );

Or if you really want to restructure things:

 create table hosts (
   id serial primary key,
   hostname text not null
 );

 create table pages (
   id serial primary key,
   url text not null,
   unique (url)
 );

 create table page_contents (
   pageid int not null references pages (id),
   hostsid int not null references hosts (id),
   linkpageid int references pages(id),
   section text not null
 );

That should give you some options to play with.

As a side comment, you should also name your primary key columns more meaningfully.  Use hosts.hostsid and pages.pagesid, not hosts.id and pages.id.  When you begin writing large queries, the use of the column name id all over the place will make your queries more prone to error, harder to read and harder to write.

==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
==================================================================

Re: Referential integrity (foreign keys) across multiple tables

From
Bruno Wolff III
Date:
On Mon, Jul 24, 2006 at 09:59:07 +0100, Richard Jones <rich@annexia.org> wrote:
> On Sun, Jul 23, 2006 at 01:32:37PM -0500, Bruno Wolff III wrote:
> > On Sat, Jul 22, 2006 at 14:32:57 +0100,
> >   Richard Jones <rich@annexia.org> wrote:
> > > 
> > > Now I want to add a column to page_contents, say called link_name,
> > > which is going to reference the pages.url column for the particular
> > > host that this page belongs to.
> > 
> > What are you trying to accomplish by this?
> 
> Data integrity.

This doesn't make sense in isolation. If that is all you are trying to do,
then you don't need to do anything to the database design as the information
is already there. The application just needs to do a join when querying the
data.

> > The information is available by doing a join. If you are trying to
> > simplify things for applications, you can probably do it with a view
> > or rules depending on whether you want to have an updatable view. If
> > you are denormalizing for performance and want constraints to
> > maintain consistancy, then you probably want to push the hostid down
> > to page_contents as well as the url. These could both be set with a
> > trigger. (I think a rule could be used as well.)
> 
> So if I get this right, I should use a trigger to ensure that the old
> code causes the hostid field to be set in page_contents?

No unless you are trying to do something else in addition to maintaining
data integrity.


Re: Referential integrity (foreign keys) across multiple tables

From
Richard Jones
Date:
On Mon, Jul 24, 2006 at 12:51:48PM -0500, Bruno Wolff III wrote:
> On Mon, Jul 24, 2006 at 09:59:07 +0100,
>   Richard Jones <rich@annexia.org> wrote:
> > On Sun, Jul 23, 2006 at 01:32:37PM -0500, Bruno Wolff III wrote:
> > > On Sat, Jul 22, 2006 at 14:32:57 +0100,
> > >   Richard Jones <rich@annexia.org> wrote:
> > > > 
> > > > Now I want to add a column to page_contents, say called link_name,
> > > > which is going to reference the pages.url column for the particular
> > > > host that this page belongs to.
> > > 
> > > What are you trying to accomplish by this?
> > 
> > Data integrity.
> 
> This doesn't make sense in isolation. If that is all you are trying to do,
> then you don't need to do anything to the database design as the information
> is already there. The application just needs to do a join when querying the
> data.

I'm not sure what this means.  By "data integrity" I just meant that I
don't want applications to create page_contents.link_name fields which
could point to non-existent URLs.  (A URL consists of a particular
hostid and url, since you're not allowed to have one host pointing to
pages on another, and this is where the requirement for a foreign key
which spans two tables comes from).  Perhaps I meant "data
consistency"?  Anyway without some sort of check, be it a reference or
a trigger -- assuming a trigger is possible -- then an application
might create such a bad link.

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com


Re: Referential integrity (foreign keys) across multiple tables

From
Richard Jones
Date:
On Mon, Jul 24, 2006 at 12:26:15PM -0500, Aaron Bono wrote:
> On 7/22/06, Richard Jones <rich@annexia.org> wrote:
> >  create table hosts (
> >    id serial primary key,
> >    hostname text not null
> >  );
> >
> >  create table pages (
> >    id serial primary key,
> >    hostid int not null references hosts (id),
> >    url text not null,
> >    unique (hostid, url)
> >  );
> >
> >  create table page_contents (
> >    pageid int not null references pages (id),
> >    section text not null
> >  );
> >
> >(There are many hosts, many pages per host, and many page_contents
> >sections per page).
> >
> >Now I want to add a column to page_contents, say called link_name,
> >which is going to reference the pages.url column for the particular
> >host that this page belongs to.
> >
> >Something like:
> >
> >  alter table page_contents add link_name text;
> >  alter table page_contents
> >    add constraint foo foreign key (p.hostid, link_name)
> >    references pages (hostid, url)
> >    where p.id = pageid;
> 
> The easiest, and not necessarily elegant, way to acomplish this is to create
> linkid rather than link_name and make it a foreign key to pages.id.

Unfortunately this isn't possible :-(  My schema above is simplified a
little too far.  In the real schema we keep old versions of pages
around in the pages table (we move the 'url' field to a 'url_deleted'
field so that the unique (hostid, url) isn't violated by the new
version of the page).  This means that the pageid can be updated, so
link_name must store a url, not a pageid.

> Then add a trigger that checks to make sure the pages you link to
> from page_contents to pages is for the same host.  If not, raise an
> exception.

I think though that this suggestion is right.  I'm not sure what
difference it makes if it's link_name or linkid, but it looks like
I'll have to write a trigger for this.  It doesn't seem like there's a
way using just ordinary foreign keys.

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com


Re: Referential integrity (foreign keys) across multiple tables

From
Richard Huxton
Date:
Richard Jones wrote:
> Simplified schema:
> 
>   create table hosts (
>     id serial primary key,
>     hostname text not null
>   );
> 
>   create table pages (
>     id serial primary key,
>     hostid int not null references hosts (id),
>     url text not null,
>     unique (hostid, url)
>   );
> 
>   create table page_contents (
>     pageid int not null references pages (id),
>     section text not null
>   );
> 
> (There are many hosts, many pages per host, and many page_contents
> sections per page).
> 
> Now I want to add a column to page_contents, say called link_name,
> which is going to reference the pages.url column for the particular
> host that this page belongs to.

Ah! It's only from reading later messages that I realise you're trying 
to ensure that a link in page_contents can only reference pages on the 
same host as itself.

> Something like:
> 
>   alter table page_contents add link_name text;
>   alter table page_contents
>     add constraint foo foreign key (p.hostid, link_name)
>     references pages (hostid, url)
>     where p.id = pageid;
> 
> Obviously that second statement isn't going to compile.
> 
> I don't want to add the hostid column to page_contents table because I
> have a lot of old code accessing the database which would be hard to
> change (the old code would no longer be able to insert page_contents
> rows).

I'm sure you know, but for the benefit of the list the problem here is 
that the surrogate primary-key on "pages" has concealed valuable 
information (i.e. the hostid) from the "page_contents" table. 
Unfortunate that you can't update the application [* see rant below]

> Is this possible somehow?  Perhaps by adding a second table?  Do I
> have to use triggers, and if so is that as robust as referential
> integrity?

Write triggers to enforce the host dependency. Just make a table of all 
the columns involved on all three tables and decide what (if anything) 
should happen when a value is inserted/updated/deleted. Write your 
triggers then write a short test script. Oh, test it with concurrent 
transactions too - just to make sure it's doing what you'd expect.

The only reason hand-built triggers would be less reliable than built-in 
foreign-keys is because they'll have had less testing.

The other alternative would be to add the hostid column to page_contents 
and use a BEFORE trigger to set it based on the pageid the application 
inserts. Then you can have a straightforward foreign-key. If your 
application can't cope with having a column added to the table you might 
have to mask this with a view.

[*minor rant follows:
Why is it that when changes to the database schema break applications 
because the column-names are hard-wired, it is the RDBMS that is at 
fault? I'm prepared to bet good money (up to the value of say, a whole 
pound) that the primary key is just being used anonymously in the 
application and that if the database interface had been able to refer to 
it as "column(s) referencing pages primary-key" then Richard wouldn't 
have a problem to post about.
And it's relational databases that aren't "agile" enough for the modern 
world, not most of your programming languages.
]

--   Richard Huxton  Archonet Ltd


Re: Referential integrity (foreign keys) across multiple tables

From
Bruno Wolff III
Date:
On Mon, Jul 24, 2006 at 18:53:20 +0100, Richard Jones <rich@annexia.org> wrote:
> 
> I'm not sure what this means.  By "data integrity" I just meant that I
> don't want applications to create page_contents.link_name fields which
> could point to non-existent URLs.  (A URL consists of a particular
> hostid and url, since you're not allowed to have one host pointing to
> pages on another, and this is where the requirement for a foreign key
> which spans two tables comes from).  Perhaps I meant "data
> consistency"?  Anyway without some sort of check, be it a reference or
> a trigger -- assuming a trigger is possible -- then an application
> might create such a bad link.

I think part of the problem might have been over simplification of the
problem. In your example there was no reason to create that new column since
the information was available by doing a join between two tables and this
would prevent a problem with data being out of sync.

I see from another message in this thread that there is really more to
this example. It may be that you still don't need the new column, but I
haven't examined the new example carefully to say one way or the other.