Determining logically unique entities across many partially complete rows where at least one column matches - Mailing list pgsql-sql
From | Jamie Tufnell |
---|---|
Subject | Determining logically unique entities across many partially complete rows where at least one column matches |
Date | |
Msg-id | b0a4f3350908102319w2e536f1eyd29609f0effff77c@mail.gmail.com Whole thread Raw |
Responses |
Re: Determining logically unique entities across many partially
complete rows where at least one column matches
|
List | pgsql-sql |
Hi, I am faced with a modeling problem and thought I'd see if anyone has run into something similar and can offer some advice. Basically my problem domain is cataloguing "snippets of information" about "entities" which are loosely identified. Entities can be identified up to 3 different methods (email, phone or openid.) Entities can have zero or many emails, phone numbers and openids. The only restriction is they must have at least one value in one of those three columns. Some sample data: snippet #1 email: null phone: +1234567890 openid: john@myopenid.net information: This is snippet #1 snippet #2 email: foo@bar.com phone: null openid: johnny.name information: This is snippet #2 At this point snippet #1 and #2 could refer to different entities. snippet #3 email: bar@baz.com phone: +1234567890 openid: johnny.name information: This is snippet #3 But now all three snippets definitely refer to the same entity, as far as we're concerned: Entity: 1 OpenIDs: johnny.name, john@myopenid.net Phones: +1234567890 Emails: foo@bar.com, bar@baz.com So as far as modeling this goes, I'm stuck between: 1. Normalizing as usual with some serious triggers to maintain the relationships. 2. Just having a snippets table with these fields inline and make these inferences at query time. 3. Something in between. 4. Using a document store like CouchDB. The kinds of queries I need to do right now (which will no doubt change): * Return all snippets. * Return all distinct entities. * Find all id for a distinct entity given a single piece of id. * Find all snippets for a distinct entity. To do it in one table, I am thinking something like this: create table snippets ( id serial not null primary key, email text, phone_number text, openid text, information text notnull, check (email is not null or phone_number is not null or openid is not null) ); with queries like: * Find all snippets for one distinct entity, searching by openid: select * from snippets where phone_number = (select phone_number from snippets where openid = 'john@myopenid.net') or email = (select email from snippets where openid = 'john@myopenid.net') or openid in (select openid from snippets where phone_number = (select phone_number from snippets where openid = 'john@myopenid.net') or email = (select email from snippets where openid = 'john@myopenid.net')); Or if I was to model as usual I am thinking something like this: create table entities ( id serial not null primary key ); create table entity_has_email ( entity_id integer not null references entities (id) on delete cascade on update cascade,email text not null unique ); create table entity_has_phone_number ( entity_id integer not null references entities (id) on delete cascade on updatecascade, phone_number text not null unique ); create table entity_has_openid ( entity_id integer not null references entities (id) on delete cascade on update cascade,openid text not null unique ); create table snippets ( id serial not null primary key, entity_id integer not null references entities (id) on delete cascadeon update cascade, information text not null ); (followed by a mass of on insert/update/delete triggers) with queries like: * Find all snippets for a distinct entity, by one identifying field: select s.* from snippets s join entity_has_email e on s.entity_id = e.id join entity_has_phone_number p on s.entity_id =p.id join entity_has_openid o on s.entity_id = o.id where o.openid = 'john@myopenid.net'; Another option, sort of half way between the two could be: create table snippets ( id serial not null primary key, entity_id integer not null references entities (id) on delete cascadeon update cascade, information text not null ); create table entities ( id serial not null primary key, email text, phone_number text, openid text, check (email is not nullor phone_number is not null or openid is not null) ); * Find all snippets for a distinct entity, by openid = 'john@myopenid.net' select * from snippets where entity_id in ( select id from entities where phone_number = (select phone_number from entities where openid = 'john@myopenid.net')or email = (select email from entities where openid = 'john@myopenid.net') or openid in (select openidfrom entities where phone_number = (select phone_number from entities where openid = 'john@myopenid.net' or email = (select email from entities where openid = 'john@myopenid.net') )); At this point I am leaning towards that last method. Has anyone had to model something similar? Did you use one of these methods or something else? Any/all comments appreciated! J