Thread: Code tables, conditional foreign keys?

Code tables, conditional foreign keys?

From
Benjamin Smith
Date:
I have some questions about the best way to best use foreign keys in complex
schemas. It's becoming cumbersome to manage a large set of foreign keys - is
there a better way?


// FOUNDATIONAL //


Let's say that you want to keep addresses, and one of the values that you need
to keep is the state. So you have two tables defined:


create table states
( state varchar unique);
create table customers
(... state varchar not null references states(state), ...);


If you want to be a bit more "pure", you might do it like this:


create table states
(id serial primary key, state varchar(2), description varchar);
create table customers
(... states_id integer not null references states(id), ...);


So far, so good. But when you have a large number of fields with foreign key
references, you end up with a bazillion reference tables, all with very
similar layouts. EG:


create table customer_types
(id serial primary key, title varchar(4), description varchar);
create table customer_taxcode
(id serial primary key, title varchar(4), description varchar);
...
create table customers
(...
customer_types_id integer not null references customer_types(id),
customer_taxcode_id integer not null references customer_taxcode(id),
...);


Getting the appropriate code tables from all these different tables becomes
cumbersome, just because there are SO MANY tables to get these values from.


So the next idea is to create a master set of code tables and foreign key to
there, but this has its own set of problems EG:


create table codetables
(
id serial primary key,
table varchar unique not null
);
create table codevalues
(
id serial primary key,
codetables_id integer not null references codetables(id),
value varchar not null,
unique(codetables_id, value)
);
create table customers
(
customer_types_id integer not null references codevalues(id),
customer_taxcode_id integer references codevalues(id),
)


How do you know that taxcode_id references the correct set of code values? You
could use a dual foreign key, but then you have to have a field for each and
every codetable you reference, eg:


insert into codetables(33, 'customertypes');
insert into codevalues(codetables_id, value) values (33, 'Gubbmint');
create table customers
(
customer_types_id integer not null,
customer_taxcode_id integer,
custtypes not null default 33, -- the codetables.id for customer types
taxcodes not null default 34, -- the codetables.id for taxcodes
foreign key (custtypes, customer_types_id)
references codevalues(codetables_id, id),
foreign key (taxcodes, customer_taxcode_id)
references codevalues(codetables_id, id)
);


This also becomes cumbersome. Is there a better way? Is there some way to do
this not covered in the docs? What would be ideal is to treat the reference
something like a join - might be something like:


....
create table customers
(
...
customer_taxcode_id integer not null references codevalues(id) ON
codevalues.codetables_id = 33,
...
)
...



Thanks!


-Ben
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Re: Code tables, conditional foreign keys?

From
Scott Marlowe
Date:
On Fri, May 22, 2009 at 4:10 PM, Benjamin Smith
<lists@benjamindsmith.com> wrote:
> I have some questions about the best way to best use foreign keys in complex
> schemas. It's becoming cumbersome to manage a large set of foreign keys - is
> there a better way?
>
> // FOUNDATIONAL //
>
> Let's say that you want to keep addresses, and one of the values that you
> need
> to keep is the state. So you have two tables defined:
>
> create table states
> ( state varchar unique);
> create table customers
> (... state varchar not null references states(state), ...);
>
> If you want to be a bit more "pure", you might do it like this:
>
> create table states
> (id serial primary key, state varchar(2), description varchar);
> create table customers
> (... states_id integer not null references states(id), ...);

In this type of instance, where you're looking up mostly static, small
data sets like state names, it's usually better to FK to the actual
value.  The reason is obvious, you don't have to join to the state
table to get the state, it's right there in the customer table.

> So far, so good. But when you have a large number of fields with foreign key
> references, you end up with a bazillion reference tables, all with very
> similar layouts. EG:
>
> create table customer_types
> (id serial primary key, title varchar(4), description varchar);
> create table customer_taxcode
> (id serial primary key, title varchar(4), description varchar);
> ...
> create table customers
> (...
> customer_types_id integer not null references customer_types(id),
> customer_taxcode_id integer not null references customer_taxcode(id),
> ...);
>
> Getting the appropriate code tables from all these different tables becomes
> cumbersome, just because there are SO MANY tables to get these values from.
>
> So the next idea is to create a master set of code tables and foreign key to
> there, but this has its own set of problems EG:

You're going to a LOT of effort to use surrogate keys.

> This also becomes cumbersome. Is there a better way? Is there some way to do
> this not covered in the docs? What would be ideal is to treat the reference
> something like a join - might be something like:

Yeah, natural keys.  They're a natural fit for many of these types of data.

Re: Code tables, conditional foreign keys?

From
Rodrigo E. De León Plicet
Date:
On Fri, May 22, 2009 at 10:38 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Fri, May 22, 2009 at 4:10 PM, Benjamin Smith
> <lists@benjamindsmith.com> wrote:
>>
>> Is there a better way?
>
> Yeah, natural keys.

+1.

Also, what Ben described reeks of EAV.

Ben, please read:

http://joecelkothesqlapprentice.blogspot.com/2006/04/using-one-table-vs-many.html

... and avoid traveling down that (ugly) path.

Good luck.

Re: Code tables, conditional foreign keys?

From
Conrad Lender
Date:
On 23/05/09 06:05, Rodrigo E. De León Plicet wrote:
>>> Is there a better way?
>>
>> Yeah, natural keys.
>
> +1.
>
> Also, what Ben described reeks of EAV.
>
> Ben, please read:
>
> http://joecelkothesqlapprentice.blogspot.com/2006/04/using-one-table-vs-many.html

Ah yes, the great Celko. I had the honor of being "lectured" by him on a
similar topic, and to tell the truth, I'm still unconvinced. I had
inherited a generally well-designed database with a clear relational
structure, apart from one table about which I was uncertain. This table
collected several groups of attributes/values that were not necessarily
related to each other, and could (should?) be split into separate
tables. It was nowhere near as bad as the example in the linked article,
and we did have check constraints in place. The values were used to
classify and sometimes "tag" rows in other tables. I hesitated to break
this table up, because that would have meant

 - doubling the number of existing tables (70 to 140)
 - significant code changes all over the place
 - having a lot of very small tables with identical layout
     "id" (surrogate key)
     "entry" (string; used to refer to this row by the application)
     "label" (string; visible on the front end)

Here's an example of the value groups that were contained in the table:

fax status:
  pending, active, sent, error
department:
  office, accounting, it, legal, experts
deadline type:
  official, unofficial
...

Using the "entry" field as natural keys would have been possible in some
places, but not everywhere, and the labels still had to be editable by
the users. Some of the "entry" strings were rather verbose (>40
characters), which would have made the other tables a lot larger. It
also didn't "feel right" to have so many tiny tables with only 2-5 rows.

Is it really advisable to put all these values into 70 separate tables
with the exact same layout? I don't quite see the benefit.


  - Conrad

Re: Code tables, conditional foreign keys?

From
Scott Marlowe
Date:
On Sat, May 23, 2009 at 12:27 AM, Conrad Lender <crlender@gmail.com> wrote:
> On 23/05/09 06:05, Rodrigo E. De León Plicet wrote:
>>>> Is there a better way?
>>>
>>> Yeah, natural keys.
>>
>> +1.
>>
>> Also, what Ben described reeks of EAV.
>>
>> Ben, please read:
>>
>> http://joecelkothesqlapprentice.blogspot.com/2006/04/using-one-table-vs-many.html
>
> Ah yes, the great Celko. I had the honor of being "lectured" by him on a
> similar topic, and to tell the truth, I'm still unconvinced. I had
> inherited a generally well-designed database with a clear relational
> structure, apart from one table about which I was uncertain. This table
> collected several groups of attributes/values that were not necessarily
> related to each other, and could (should?) be split into separate
> tables. It was nowhere near as bad as the example in the linked article,
> and we did have check constraints in place. The values were used to
> classify and sometimes "tag" rows in other tables. I hesitated to break
> this table up, because that would have meant
>
>  - doubling the number of existing tables (70 to 140)
>  - significant code changes all over the place
>  - having a lot of very small tables with identical layout
>     "id" (surrogate key)
>     "entry" (string; used to refer to this row by the application)
>     "label" (string; visible on the front end)
>
> Here's an example of the value groups that were contained in the table:
>
> fax status:
>  pending, active, sent, error
> department:
>  office, accounting, it, legal, experts
> deadline type:
>  official, unofficial
> ...
>
> Using the "entry" field as natural keys would have been possible in some
> places, but not everywhere, and the labels still had to be editable by
> the users. Some of the "entry" strings were rather verbose (>40
> characters), which would have made the other tables a lot larger. It
> also didn't "feel right" to have so many tiny tables with only 2-5 rows.
>
> Is it really advisable to put all these values into 70 separate tables
> with the exact same layout? I don't quite see the benefit.

I've used EAV for some stuff, and it definitely has its place, but if
the data set in an EAV tableset is large enough, it becomes unweildy
quickly.  For large amounts of data, it's usually better to bite the
bullet and make a table for it.  Since the data in EAV tables have no
strong meaning in and of themself, it's often hard for another
developer to grok what someone was doing with those data.

However, the whole point of using natural keys is that if you need the
attributes from the master table that's FK'ed to a natural key lookup
table, then you don't have to actually hit any other table, just the
main one.  So, small to medium finite and well defined sets like
states etc. are coherent, but cost little retrieve.

Sure there's bound to be large less well defined sets that you don't
always want, and for those a surrogate key works just fine.

PostgreSQL doesn't do as well as some dbs with huge star schemas,
where everything is in its own table and you always join for
everything, once the number of tables gets really large.

I have a great deal of respect for Celko.  I don't always agree with
everything he says, but most of it makes a lot of sense to me.

Re: Code tables, conditional foreign keys?

From
Alban Hertroys
Date:
> Here's an example of the value groups that were contained in the
> table:
>
> fax status:
>  pending, active, sent, error
> department:
>  office, accounting, it, legal, experts
> deadline type:
>  official, unofficial
> ...

> Is it really advisable to put all these values into 70 separate tables
> with the exact same layout? I don't quite see the benefit.

You could use the ENUM type for that (http://www.postgresql.org/docs/current/static/datatype-enum.html
), although that works best if these values are really static. If
users should be able to edit them they're probably not the best choice.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4a17ca8210091499713462!



Re: Code tables, conditional foreign keys?

From
Conrad Lender
Date:
On 23/05/09 09:34, Scott Marlowe wrote:
> I have a great deal of respect for Celko.  I don't always agree with
> everything he says, but most of it makes a lot of sense to me.

I didn't intend any disrespect to Joe Celko. I have read a number of his
articles, which tend to be well written and informative. Last year, when
I posted to comp.databases asking for advice on whether to refactor that
table, he wrote "You will have to throw it all out and start over with a
relational design", "Throw away the idiot who did the EAV. This is not a
good design -- in fact, it is not a design at all", and "This is basic
stuff!!" Then he copied the same EAV example that was linked earlier by
Rodrigo, claiming that "someone like me" had suggested it. With all the
respect I have for Mr. Celko, that was hardly helpful, as that example
and the situation I had described were quite different. It also did not
encourage me to follow his advice and start from scratch (and fire my
boss, who was the mentioned "idiot").

I understand the problems that can arise from bad design choices, and I
know that Celko is vehemently opposed to anything that resembles EAV,
but I felt that in our case "throwing it all away" would be excessive.
We had safeguards to ensure referential integrity, and keeping the
values in the same table allowed us to let users manage them all with
the same form. So I guess it's like Stefan Keller said in a different
thread today: "Know when to break the rules."


  - Conrad

Re: Code tables, conditional foreign keys?

From
Lew
Date:
Conrad Lender wrote:
> I didn't intend any disrespect to Joe Celko. I have read a number of his
> articles, which tend to be well written and informative. Last year, when
> I posted to comp.databases asking for advice on whether to refactor that
> table, he wrote "You will have to throw it all out and start over with a
> relational design", "Throw away the idiot who did the EAV. This is not a
> good design -- in fact, it is not a design at all", and "This is basic
> stuff!!" Then he copied the same EAV example that was linked earlier by
> Rodrigo, claiming that "someone like me" had suggested it. With all the
> respect I have for Mr. Celko, that was hardly helpful, as that example
> and the situation I had described were quite different. It also did not
> encourage me to follow his advice and start from scratch (and fire my
> boss, who was the mentioned "idiot").

If we fired every boss who actually is an idiot there would be about half the
number of bosses.

All kidding aside, why is the boss specifying a database architecture?  That
is not the boss's job.

> I understand the problems that can arise from bad design choices, and I
> know that Celko is vehemently opposed to anything that resembles EAV,

For good reasons.

> but I felt that in our case "throwing it all away" would be excessive.

Perhaps not.  I had a situation some years ago where a supervisor would not
let me normalize a database and consequently the project nearly failed.
Fortunately, the company assigned a new team lead/project manager who did the
normalization or it would have been a disaster.  Trying to make a bad approach
work is often, if not always, more expensive than replacing it with a good
approach.

> We had safeguards to ensure referential integrity, and keeping the
> values in the same table allowed us to let users manage them all with
> the same form. So I guess it's like Stefan Keller said in a different
> thread today: "Know when to break the rules."

Managing all the values in the same form is not intrinsically connected to
whether one stores the values in an EAV layout.

Telling oneself that one should know when to break the rules is not the same
as knowing when to break the rules.  They are the rules for good reason.

All I'm saying is that EAV is a very problematic approach.  I've been on
projects that tried to use it, and while that didn't make me an expert on the
matter by any means, it gave me some cause to trust Mr. Celko's opinion on the
matter.

--
Lew

Re: Code tables, conditional foreign keys?

From
Joseph Conway
Date:
Conrad Lender wrote:
> So I guess it's like Stefan Keller said in a different
> thread today: "Know when to break the rules."

Amen. As I have seen posted on another recent thread, anyone who tells
you that one way is *always* correct, and the other is *always*
incorrect, is lying.

Joe


Re: Code tables, conditional foreign keys?

From
Benjamin Smith
Date:
"A deep unwavering belief is a sure sign that you're missing something."
                                                            -- Unknown


I had no intention of sparking an ideological discussion.


I read Joe's article reference previously - a simple case for using a normalized database. I would pretty much agree with his assessment from beginning to end. However, he really doesn't address my scenario at all. I'm not trying to mash my database together into a single table, I'm trying to deal with the fact that we have hundreds of tables with nearly identical syntax, but further, deal with the concept of "code tables".


See our product has to work in many venues, and each venue has their own set of code-table data that they'd like to support. Worse, they often use similar values for the different things, so the "natural key" is just not natural. Sometimes I've seen venues "re-using" the old code table value from previous years to mean new things in current/future years. Yes, this is a bad, bad, bad idea but it was still there and it's still my job to deal with it.


Surrogate keys are used to make sure that 15 to mean "BS College Degree" in venue A aren't confused with 15 to mean "No High School Education" in another venue. They cover a similar value, EG: applicant's educational level. Some values don't translate at all, (EG: differing representations of vocational arts) so using our own code table set and then translating doesn't work consistently, either.


So we have multiple, distinct sets of data to be used within a single field. Either that, or we create massive data tables with every possible different set of otherwise similar data, each of which has a foreign key to a table with a slightly different name, which is, far and away, even uglier. (EG: applicants.ca_edlevel, applicants.or_edlevel applicants.nv_edlevel, applicants.southca_edlevel...)


educational level is one example, there are hundreds that we have to deal with!


So.... back to the first question: is there a way to have a conditional foreign key?


On Saturday 23 May 2009 17:22:36 Lew wrote:
> Conrad Lender wrote:
> > I didn't intend any disrespect to Joe Celko. I have read a number of his
> > articles, which tend to be well written and informative. Last year, when
> > I posted to comp.databases asking for advice on whether to refactor that
> > table, he wrote "You will have to throw it all out and start over with a
> > relational design", "Throw away the idiot who did the EAV. This is not a
> > good design -- in fact, it is not a design at all", and "This is basic
> > stuff!!" Then he copied the same EAV example that was linked earlier by
> > Rodrigo, claiming that "someone like me" had suggested it. With all the
> > respect I have for Mr. Celko, that was hardly helpful, as that example
> > and the situation I had described were quite different. It also did not
> > encourage me to follow his advice and start from scratch (and fire my
> > boss, who was the mentioned "idiot").
>
> If we fired every boss who actually is an idiot there would be about half
> the number of bosses.
>
> All kidding aside, why is the boss specifying a database architecture?
> That is not the boss's job.
>
> > I understand the problems that can arise from bad design choices, and I
> > know that Celko is vehemently opposed to anything that resembles EAV,
>
> For good reasons.
>
> > but I felt that in our case "throwing it all away" would be excessive.
>
> Perhaps not. I had a situation some years ago where a supervisor would not
> let me normalize a database and consequently the project nearly failed.
> Fortunately, the company assigned a new team lead/project manager who did
> the normalization or it would have been a disaster. Trying to make a bad
> approach work is often, if not always, more expensive than replacing it
> with a good approach.
>
> > We had safeguards to ensure referential integrity, and keeping the
> > values in the same table allowed us to let users manage them all with
> > the same form. So I guess it's like Stefan Keller said in a different
> > thread today: "Know when to break the rules."
>
> Managing all the values in the same form is not intrinsically connected to
> whether one stores the values in an EAV layout.
>
> Telling oneself that one should know when to break the rules is not the
> same as knowing when to break the rules. They are the rules for good
> reason.
>
> All I'm saying is that EAV is a very problematic approach. I've been on
> projects that tried to use it, and while that didn't make me an expert on
> the matter by any means, it gave me some cause to trust Mr. Celko's opinion
> on the matter.
>
> --
> Lew
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.




--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Re: Code tables, conditional foreign keys?

From
Conrad Lender
Date:
On 26/05/09 20:48, Benjamin Smith wrote:
> "A deep unwavering belief is a sure sign that you're missing
> something." -- Unknown
>
> I had no intention of sparking an ideological discussion.

I know, my apologies for going off-topic. I just had a deja-vu when I
saw Celko's article about EAV disasters mentioned again in a very
similar context.

> So.... back to the first question: is there a way to have a
> conditional foreign key?

I can only suggest what we've done in this situation. We had a table
setup similar to yours (tables like "customer" referencing many small
sets like customer type or education level, with the exact same
structure). All of the small sets were combined in one table (which is
what reminded people of EAV design). Using your original example ...

create table codetables
        (
        id serial primary key,
        name varchar unique not null
        );
create table codevalues
        (
        id serial primary key,
        codetables_id integer not null references codetables(id),
        value varchar not null,
        unique(codetables_id, value)
        );
create table customers
        (
        customer_types_id integer not null references codevalues(id),
        customer_taxcode_id integer references codevalues(id),
        )

... you need to make sure that customer_types_id references the correct
codetable set within codevalues. To do this, we added CHECK constraints
in our tables:

CREATE TABLE customer (
    ...
    customer_type_id  INT    NOT NULL,

    -- this is the standard FK to codevalues
    CONSTRAINT fk_customer_type_id
        FOREIGN KEY (customer_type_id)
        REFERENCES codevalues (id),

    -- this makes sure that the correct set is referenced
    CONSTRAINT check_customer_type
        CHECK (belongs_to_codetable('customer_type', customer_type_id))
);

CREATE FUNCTION belongs_to_codetable (VARCHAR(255), INT)
RETURNS BOOLEAN
AS '
    SELECT EXISTS (
        SELECT 1
          FROM codetables ct
          JOIN codevalues cv
            ON cv.codetables_id = ct.id
           AND ct.name = $1
           AND cv.id = $2
    )
' LANGUAGE 'SQL';

We used different names, so this is untested, but in principle it should
do what you require.

Whether this is a good design or not... I'm still not sure. Joe Celko
would grill me for doing something like this.


  - Conrad