Distributed keys - Mailing list pgsql-sql

From Michael Glaesemann
Subject Distributed keys
Date
Msg-id E537DC48-3590-11D8-97F0-000A95C88220@myrealbox.com
Whole thread Raw
Responses Re: Distributed keys
List pgsql-sql
Hello all!

An area of the relational database model that is not included in the
SQL standard are distributed keys and distributed foreign keys. A quick
example (Case 1):

employees (id, name);
schools (name, location);
teachers (employee_id, school_name);
subs (employee_id);
managers (employee_id, school_name);

with constraints
unique employees(id)
teachers(employee_id) references employees(id)
teachers(school_name) references schools(name)
subs(employee_id) references employees(id)
managers(employee_id) references employees(id)

The idea is that employees must be in one (and only one) of either
teachers, subs, or managers. Currently, this might be represented in
something like (Case 2)

employees (id, name, employee_type, school_name);
schools (name, location);
employee_types (type);

with constraints
employees(employee_type) references employee_types(type)
employees(school_name) references schools(name)

where employee_types includes "teacher", "sub", and "manager"

Or it might be represented with a number of rules or triggers that
perform all of the necessary checking.

employees(school_name) can't have a not null constraint because if the
employee is a sub, they aren't associated with a school.

Using the terms "distributed key" and "foreign distributed key", in the
first case employee_id is a "distributed key" in that it must occur in
only one of the tables teachers, subs, or managers. Distributed keys
are similar in concept to primary keys—they must be unique. This
guarantees an employee_id in teachers is not found in subs or managers,
an employee_id in subs is not found in managers or teachers, and an
employee_id in managers is not found in subs or teachers.

employees(id) is a foreign distributed key in teachers, subs, and
managers (as employee_id). Foreign distributed keys are similar in
concept to foreign keys in that employees(id) must be referenced by a
single tuple in one of teachers, subs, or managers.

Another use would be in this situation (something I'm working on right
now): I want to link comments by employees by employee_id, but comments
from non-employees by name (as they don't have an id).

comments(id, comment);
comments_nonemployees(comment_id, name);
comments_employees(comment_id, employee_id);

with constraints
comments_nonemployees(comment_id) references comments(id)
comments_employees(comment_id) references comments(id)
and comments(id) must be listed in either
comments_nonemployees(comment_id) or comments_employees(comment_id)

I haven't looked very far into how to implement distributed keys and
foreign distributed keys in PostgreSQL beyond briefly looking at the
pg_constraint system table, thinking a distributed key would be
something making employee_id unique in teachers(employee_id) UNION
subs(employee_id) UNION managers(employee_id). A distributed key is
distributed over a number of tables, rather than a single one, so
there'd have to be a list of relid-attnum pairs, rather than a single
relid-attnum pair, such as conrelid and conkey in pg_constraint. Here's
a brief sketch of the idea:

pg_distributeddistname    name    the name of the distributed key constraintdistrelid    oid    the relid of one of the
tablesinvolved in the  
distributed keysdistkey    int2[]    a list of the attnum of the columns of the table with
oid distrelid involved in the distributed keydistforkey    bool    true if foreign distributed keydistfrelid    oid
ifa foreign distributed key, the relid of the the  
referenced table, else 0distfkey    int2[]    if a foreign distributed key, a list of the attnum of
the columns of the table with oid distfrelid referenced by the foreign
distributed key, else 0

In pg_distributed, distname, distfrelid, distfkey would be the same in
every tuple involved in the distributed key, while distrelid and
distkey would vary. Basically a one-to-many relation. These are just
some thoughts, and the first time I've looked at the PostgreSQL
internals, so I wouldn't be suprised if this approach is wrongheaded.

I know PostgreSQL endeavors to adhere closely to the SQL standard, and
I think this is important as SQL *is* a standard. One of the things
that makes PostgreSQL great in my opinion is that it goes beyond the
standard in some areas where users and developers have found useful,
such as the PostgreSQL rule system. I think distributed key support
would extend the usefulness of PostgreSQL without hindering SQL
conformance.

I'm interested in hearing what others have to say, especially along the
lines of implementation. I haven't seen much discussion of distributed
keys on the lists (other than Josh Berkus), so perhaps there isn't much
interest. Would there be any foreseeable opposition if I (and/or
others) worked on this?

Regards,

Michael Glaesemann
grzm myrealbox com



pgsql-sql by date:

Previous
From: Chris Gamache
Date:
Subject: Historic Query using a view/function ?
Next
From: Tom Lane
Date:
Subject: Re: Distributed keys