Re: Distributed keys - Mailing list pgsql-sql

From Chris Travers
Subject Re: Distributed keys
Date
Msg-id 008c01c3ca91$9b3b2880$1d00053d@winxp
Whole thread Raw
In response to Distributed keys  (Michael Glaesemann <grzm@myrealbox.com>)
List pgsql-sql
It seems to me that if the inheritance/fireign key behavior was changed so
that foreign key constraints could exist within the entire inheritance tree,
this problem would be solved.

According to previous posts, the behavior will probably change at some point
but does not appear to be a priority at the moment.

If it were possible to allow FK constraints to work against the inheritance
tree rather than a single table therein you could have managers, teachers,
and subs as tables inherited tables from employees and the problem would be
solved.

Currently a workaround I can see is:
Hide the actual tables in a shadow schema, and inherit as above.
Have each table be represented as a view in the public schema joining the
table to another table storing the employee unique identifiers.  Place
unique constraints on the unique identifiers table.  Create rules for
inserting, updating, and deleting the records.  Have the Employee view
search the entire inheritance tree.

However, this is assuming that the data you are storing for the employees
differs substantially depending on position.  If this is not the case, you
would do better by having a single employee table and include a field
indicating whether the employee is a manager, teacher, or sub.

Best Wishes,
Chris Travers


----- Original Message -----
From: "Michael Glaesemann" <grzm@myrealbox.com>
To: <pgsql-sql@postgresql.org>
Sent: Wednesday, December 24, 2003 4:42 AM
Subject: [SQL] Distributed keys


Hello all!

An area of the relational database model that is not included in the
SQL standard are distributed keys and distributed foreign keys. A quick
example (Case 1):

employees (id, name);
schools (name, location);
teachers (employee_id, school_name);
subs (employee_id);
managers (employee_id, school_name);

with constraints
unique employees(id)
teachers(employee_id) references employees(id)
teachers(school_name) references schools(name)
subs(employee_id) references employees(id)
managers(employee_id) references employees(id)

The idea is that employees must be in one (and only one) of either
teachers, subs, or managers. Currently, this might be represented in
something like (Case 2)

employees (id, name, employee_type, school_name);
schools (name, location);
employee_types (type);

with constraints
employees(employee_type) references employee_types(type)
employees(school_name) references schools(name)

where employee_types includes "teacher", "sub", and "manager"

Or it might be represented with a number of rules or triggers that
perform all of the necessary checking.

employees(school_name) can't have a not null constraint because if the
employee is a sub, they aren't associated with a school.

Using the terms "distributed key" and "foreign distributed key", in the
first case employee_id is a "distributed key" in that it must occur in
only one of the tables teachers, subs, or managers. Distributed keys
are similar in concept to primary keys—they must be unique. This
guarantees an employee_id in teachers is not found in subs or managers,
an employee_id in subs is not found in managers or teachers, and an
employee_id in managers is not found in subs or teachers.

employees(id) is a foreign distributed key in teachers, subs, and
managers (as employee_id). Foreign distributed keys are similar in
concept to foreign keys in that employees(id) must be referenced by a
single tuple in one of teachers, subs, or managers.

Another use would be in this situation (something I'm working on right
now): I want to link comments by employees by employee_id, but comments
from non-employees by name (as they don't have an id).

comments(id, comment);
comments_nonemployees(comment_id, name);
comments_employees(comment_id, employee_id);

with constraints
comments_nonemployees(comment_id) references comments(id)
comments_employees(comment_id) references comments(id)
and comments(id) must be listed in either
comments_nonemployees(comment_id) or comments_employees(comment_id)

I haven't looked very far into how to implement distributed keys and
foreign distributed keys in PostgreSQL beyond briefly looking at the
pg_constraint system table, thinking a distributed key would be
something making employee_id unique in teachers(employee_id) UNION
subs(employee_id) UNION managers(employee_id). A distributed key is
distributed over a number of tables, rather than a single one, so
there'd have to be a list of relid-attnum pairs, rather than a single
relid-attnum pair, such as conrelid and conkey in pg_constraint. Here's
a brief sketch of the idea:

pg_distributed
distname name the name of the distributed key constraint
distrelid oid the relid of one of the tables involved in the
distributed keys
distkey int2[] a list of the attnum of the columns of the table with
oid distrelid involved in the distributed key
distforkey bool true if foreign distributed key
distfrelid oid if a foreign distributed key, the relid of the the
referenced table, else 0
distfkey int2[] if a foreign distributed key, a list of the attnum of
the columns of the table with oid distfrelid referenced by the foreign
distributed key, else 0

In pg_distributed, distname, distfrelid, distfkey would be the same in
every tuple involved in the distributed key, while distrelid and
distkey would vary. Basically a one-to-many relation. These are just
some thoughts, and the first time I've looked at the PostgreSQL
internals, so I wouldn't be suprised if this approach is wrongheaded.

I know PostgreSQL endeavors to adhere closely to the SQL standard, and
I think this is important as SQL *is* a standard. One of the things
that makes PostgreSQL great in my opinion is that it goes beyond the
standard in some areas where users and developers have found useful,
such as the PostgreSQL rule system. I think distributed key support
would extend the usefulness of PostgreSQL without hindering SQL
conformance.

I'm interested in hearing what others have to say, especially along the
lines of implementation. I haven't seen much discussion of distributed
keys on the lists (other than Josh Berkus), so perhaps there isn't much
interest. Would there be any foreseeable opposition if I (and/or
others) worked on this?

Regards,

Michael Glaesemann
grzm myrealbox com


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command   (send "unregister YourEmailAddressHere" to
majordomo@postgresql.org)





pgsql-sql by date:

Previous
From: "Iain"
Date:
Subject: not in vs not exists - vastly diferent performance
Next
From: "Andy Lewis"
Date:
Subject: Point and function help