Thread: Schema design question

Schema design question

From
Ben
Date:
I'm working on a project which requires me to keep track of objects,
each of which can have an arbitrary number of attributes. Although
there will be many attributes that an object can have, the data types
of those attributes won't be all that varried (int, float, text,
boolean, date, etc.). My question is: what tradeoffs have you seen in
picking a solution to this problem?

In other words:

create table attrs (id serial primary key, name text);
create table obj (id serial primary key, name text);
create table att (oid int references obj.id, aid int references
attrs.id, value text);

(everything smashed down to text by the application)

versus:

create table attrs (id serial primary key, name text);
create table obj (id serial primary key, name text);
create table att_int (oid int references obj.id, aid int references
attrs.id, value int);
create table att_float (oid int references obj.id, aid int references
attrs.id, value float);
create table att_text (oid int references obj.id, aid int references
attrs.id, value text);
create table att_bool (oid int references obj.id, aid int references
attrs.id, value boolean);
create table att_date (oid int references obj.id, aid int references
attrs.id, value date);

(everything kept distinct, probably with the application using stored
procs)

or even versus:

create table attrs (id serial primary key, name text);
create table obj (id serial primary key, name text);
create table att (oid int references obj.id, aid int references
attrs.id,
    value_int int, value_float float, value_text text, value_bool bool,
value_date date);

(the worst or the best of both worlds?)

The first certainly seems simplest, while the second seems like it
offers much better ability for constraint checking and probably more
compact storage. The second also seems like a perfect example for
inheritence, except that I don't believe inheritence allows for some
usuful things like uniqueness, so if I wanted to make my primary key
in the att_* tables
(oid,aid), then I dont' think I could use inheritence.

The third option doesn't seem all that different than the second
option to me, though maybe it has benefits I'm not seeing.


Anyway, this seems like a common problem without a perfect solution,
and I'm sure people must have hindsight opinions on how they solved
it. Your thoughts?

Re: Schema design question

From
Craig Ringer
Date:
Ben wrote:

> create table attrs (id serial primary key, name text);
> create table obj (id serial primary key, name text);
> create table att (oid int references obj.id, aid int references attrs.id,
>     value_int int, value_float float, value_text text, value_bool bool,
> value_date date);

I think I saw mention here of the DB using a bitmap in the tuple header
to avoid storing NULL fields.

If that's the case (don't trust my word on it), then combined with a
CHECK constraint that ensures that at most one of your typed fields may
be not null, this option might at least prove to be the most efficient.
However, it won't be fun to query.

Storing them all as text won't be much fun to query, which I'd consider
another argument for the many-types tuple. '2' > '11' =  't',  '002' <>
'2', etc.

--
Craig Ringer

Re: Schema design question

From
Ron Mayer
Date:
Ben wrote:
> I'm working on a project which requires me to keep track of objects,
> each of which can have an arbitrary number of attributes. Although there
> will be many attributes that an object can have,...
> Anyway, this seems like a common problem without a perfect solution, and
> I'm sure people must have hindsight opinions on how they solved it. Your
> thoughts?

If each attribute can only occur once for any object, and if your
queries are based more on exact matches of the values rather than
range queries, you might want to look into the hstore module:
http://www.postgresql.org/docs/current/static/hstore.html

If attributes can occur more than once, you might even want to
serialize the whole object as some text format (yaml, xml, etc)
instead.





Re: Schema design question

From
"Andrej Ricnik-Bay"
Date:
On 29/03/2008, Ben <bench@silentmedia.com> wrote:
> I'm working on a project which requires me to keep track of objects,
>  each of which can have an arbitrary number of attributes. Although
>  there will be many attributes that an object can have, the data types
>  of those attributes won't be all that varried (int, float, text,
>  boolean, date, etc.).

And a somewhat unorthodox suggestion for the list ... would it
be worthwhile considering a different storage mechanism all
together, like maybe an LDAP directory type of thing?  The query
language is admittedly very limited.

Cheers,
Andrej

--
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm

Re: Schema design question

From
Ben
Date:
On Tue, 1 Apr 2008, Andrej Ricnik-Bay wrote:

> On 29/03/2008, Ben <bench@silentmedia.com> wrote:
>> I'm working on a project which requires me to keep track of objects,
>>  each of which can have an arbitrary number of attributes. Although
>>  there will be many attributes that an object can have, the data types
>>  of those attributes won't be all that varried (int, float, text,
>>  boolean, date, etc.).
>
> And a somewhat unorthodox suggestion for the list ... would it
> be worthwhile considering a different storage mechanism all
> together, like maybe an LDAP directory type of thing?  The query
> language is admittedly very limited.

Hm.... worthwhile for some, perhaps, but not for this project. This is
only a small part of a much larger whole, most of which fits quite well
into SQL. Thanks though! I think I'm leaning towards the
one-table-with-many-columns approach, as it seems to require the simpliest
set of constraints while still giving me type-safety.