Determining logically unique entities across many partially complete rows where at least one column matches - Mailing list pgsql-sql

From Jamie Tufnell
Subject Determining logically unique entities across many partially complete rows where at least one column matches
Date
Msg-id b0a4f3350908102319w2e536f1eyd29609f0effff77c@mail.gmail.com
Whole thread Raw
Responses Re: Determining logically unique entities across many partially complete rows where at least one column matches  (Rob Sargent <robjsargent@gmail.com>)
List pgsql-sql
Hi,

I am faced with a modeling problem and thought I'd see if anyone has run
into something similar and can offer some advice.

Basically my problem domain is cataloguing "snippets of information" about
"entities" which are loosely identified.

Entities can be identified up to 3 different methods (email, phone or
openid.)

Entities can have zero or many emails, phone numbers and openids.  The
only restriction is they must have at least one value in one of those three
columns.


Some sample data:

snippet #1
email: null
phone: +1234567890
openid: john@myopenid.net
information: This is snippet #1

snippet #2
email: foo@bar.com
phone: null
openid: johnny.name
information: This is snippet #2

At this point snippet #1 and #2 could refer to different entities.

snippet #3
email: bar@baz.com
phone: +1234567890
openid: johnny.name
information: This is snippet #3

But now all three snippets definitely refer to the same entity, as far as
we're concerned:

Entity: 1
OpenIDs: johnny.name, john@myopenid.net
Phones: +1234567890
Emails: foo@bar.com, bar@baz.com

So as far as modeling this goes, I'm stuck between:

1. Normalizing as usual with some serious triggers to maintain the  relationships.
2. Just having a snippets table with these fields inline and make these  inferences at query time.
3. Something in between.
4. Using a document store like CouchDB.

The kinds of queries I need to do right now (which will no doubt change):

* Return all snippets.
* Return all distinct entities.
* Find all id for a distinct entity given a single piece of id.
* Find all snippets for a distinct entity.

To do it in one table, I am thinking something like this:

create table snippets ( id serial not null primary key, email text, phone_number text, openid text, information text
notnull, check (email is not null or        phone_number is not null or openid is not null)
 
);

with queries like:

* Find all snippets for one distinct entity, searching by openid:

select * from snippets
where phone_number = (select phone_number from snippets where openid = 'john@myopenid.net')
or email = (select email from snippets where openid = 'john@myopenid.net')
or openid in (select openid from snippets  where phone_number =   (select phone_number from snippets where openid =
'john@myopenid.net') or email =   (select email from snippets where openid = 'john@myopenid.net'));
 


Or if I was to model as usual I am thinking something like this:

create table entities ( id serial not null primary key
);

create table entity_has_email ( entity_id integer not null   references entities (id) on delete cascade on update
cascade,email text not null unique
 
);

create table entity_has_phone_number ( entity_id integer not null   references entities (id) on delete cascade on
updatecascade, phone_number text not null unique
 
);

create table entity_has_openid ( entity_id integer not null   references entities (id) on delete cascade on update
cascade,openid text not null unique
 
);

create table snippets ( id serial not null primary key, entity_id integer not null   references entities (id) on delete
cascadeon update cascade, information text not null
 
);

(followed by a mass of on insert/update/delete triggers)

with queries like:

* Find all snippets for a distinct entity, by one identifying field:

select s.* from snippets s join entity_has_email e on s.entity_id = e.id join entity_has_phone_number p on s.entity_id
=p.id join entity_has_openid o on s.entity_id = o.id
 
where o.openid = 'john@myopenid.net';

Another option, sort of half way between the two could be:

create table snippets ( id serial not null primary key, entity_id integer not null   references entities (id) on delete
cascadeon update cascade, information text not null
 
);

create table entities ( id serial not null primary key, email text, phone_number text, openid text, check (email is not
nullor        phone_number is not null or openid is not null)
 
);

* Find all snippets for a distinct entity, by openid = 'john@myopenid.net'

select * from snippets
where entity_id in ( select id from entities where phone_number =   (select phone_number from entities where openid =
'john@myopenid.net')or email =   (select email from entities where openid = 'john@myopenid.net') or openid in   (select
openidfrom entities    where phone_number =      (select phone_number from entities where openid = 'john@myopenid.net'
 or email =      (select email from entities where openid = 'john@myopenid.net') ));
 

At this point I am leaning towards that last method.

Has anyone had to model something similar?  Did you use one of these
methods or something else?

Any/all comments appreciated!

J


pgsql-sql by date:

Previous
From: "A. Kretschmer"
Date:
Subject: Re: Import (.CVS File) to postgreSql
Next
From: "D'Arcy J.M. Cain"
Date:
Subject: Re: Updating a specific number of rows in pl/pgsql