Distributed keys - Mailing list pgsql-sql
From | Michael Glaesemann |
---|---|
Subject | Distributed keys |
Date | |
Msg-id | E537DC48-3590-11D8-97F0-000A95C88220@myrealbox.com Whole thread Raw |
Responses |
Re: Distributed keys
|
List | pgsql-sql |
Hello all! An area of the relational database model that is not included in the SQL standard are distributed keys and distributed foreign keys. A quick example (Case 1): employees (id, name); schools (name, location); teachers (employee_id, school_name); subs (employee_id); managers (employee_id, school_name); with constraints unique employees(id) teachers(employee_id) references employees(id) teachers(school_name) references schools(name) subs(employee_id) references employees(id) managers(employee_id) references employees(id) The idea is that employees must be in one (and only one) of either teachers, subs, or managers. Currently, this might be represented in something like (Case 2) employees (id, name, employee_type, school_name); schools (name, location); employee_types (type); with constraints employees(employee_type) references employee_types(type) employees(school_name) references schools(name) where employee_types includes "teacher", "sub", and "manager" Or it might be represented with a number of rules or triggers that perform all of the necessary checking. employees(school_name) can't have a not null constraint because if the employee is a sub, they aren't associated with a school. Using the terms "distributed key" and "foreign distributed key", in the first case employee_id is a "distributed key" in that it must occur in only one of the tables teachers, subs, or managers. Distributed keys are similar in concept to primary keys—they must be unique. This guarantees an employee_id in teachers is not found in subs or managers, an employee_id in subs is not found in managers or teachers, and an employee_id in managers is not found in subs or teachers. employees(id) is a foreign distributed key in teachers, subs, and managers (as employee_id). Foreign distributed keys are similar in concept to foreign keys in that employees(id) must be referenced by a single tuple in one of teachers, subs, or managers. Another use would be in this situation (something I'm working on right now): I want to link comments by employees by employee_id, but comments from non-employees by name (as they don't have an id). comments(id, comment); comments_nonemployees(comment_id, name); comments_employees(comment_id, employee_id); with constraints comments_nonemployees(comment_id) references comments(id) comments_employees(comment_id) references comments(id) and comments(id) must be listed in either comments_nonemployees(comment_id) or comments_employees(comment_id) I haven't looked very far into how to implement distributed keys and foreign distributed keys in PostgreSQL beyond briefly looking at the pg_constraint system table, thinking a distributed key would be something making employee_id unique in teachers(employee_id) UNION subs(employee_id) UNION managers(employee_id). A distributed key is distributed over a number of tables, rather than a single one, so there'd have to be a list of relid-attnum pairs, rather than a single relid-attnum pair, such as conrelid and conkey in pg_constraint. Here's a brief sketch of the idea: pg_distributeddistname name the name of the distributed key constraintdistrelid oid the relid of one of the tablesinvolved in the distributed keysdistkey int2[] a list of the attnum of the columns of the table with oid distrelid involved in the distributed keydistforkey bool true if foreign distributed keydistfrelid oid ifa foreign distributed key, the relid of the the referenced table, else 0distfkey int2[] if a foreign distributed key, a list of the attnum of the columns of the table with oid distfrelid referenced by the foreign distributed key, else 0 In pg_distributed, distname, distfrelid, distfkey would be the same in every tuple involved in the distributed key, while distrelid and distkey would vary. Basically a one-to-many relation. These are just some thoughts, and the first time I've looked at the PostgreSQL internals, so I wouldn't be suprised if this approach is wrongheaded. I know PostgreSQL endeavors to adhere closely to the SQL standard, and I think this is important as SQL *is* a standard. One of the things that makes PostgreSQL great in my opinion is that it goes beyond the standard in some areas where users and developers have found useful, such as the PostgreSQL rule system. I think distributed key support would extend the usefulness of PostgreSQL without hindering SQL conformance. I'm interested in hearing what others have to say, especially along the lines of implementation. I haven't seen much discussion of distributed keys on the lists (other than Josh Berkus), so perhaps there isn't much interest. Would there be any foreseeable opposition if I (and/or others) worked on this? Regards, Michael Glaesemann grzm myrealbox com