Thread: Schema design question
I'm working on a project which requires me to keep track of objects, each of which can have an arbitrary number of attributes. Although there will be many attributes that an object can have, the data types of those attributes won't be all that varried (int, float, text, boolean, date, etc.). My question is: what tradeoffs have you seen in picking a solution to this problem? In other words: create table attrs (id serial primary key, name text); create table obj (id serial primary key, name text); create table att (oid int references obj.id, aid int references attrs.id, value text); (everything smashed down to text by the application) versus: create table attrs (id serial primary key, name text); create table obj (id serial primary key, name text); create table att_int (oid int references obj.id, aid int references attrs.id, value int); create table att_float (oid int references obj.id, aid int references attrs.id, value float); create table att_text (oid int references obj.id, aid int references attrs.id, value text); create table att_bool (oid int references obj.id, aid int references attrs.id, value boolean); create table att_date (oid int references obj.id, aid int references attrs.id, value date); (everything kept distinct, probably with the application using stored procs) or even versus: create table attrs (id serial primary key, name text); create table obj (id serial primary key, name text); create table att (oid int references obj.id, aid int references attrs.id, value_int int, value_float float, value_text text, value_bool bool, value_date date); (the worst or the best of both worlds?) The first certainly seems simplest, while the second seems like it offers much better ability for constraint checking and probably more compact storage. The second also seems like a perfect example for inheritence, except that I don't believe inheritence allows for some usuful things like uniqueness, so if I wanted to make my primary key in the att_* tables (oid,aid), then I dont' think I could use inheritence. The third option doesn't seem all that different than the second option to me, though maybe it has benefits I'm not seeing. Anyway, this seems like a common problem without a perfect solution, and I'm sure people must have hindsight opinions on how they solved it. Your thoughts?
Ben wrote: > create table attrs (id serial primary key, name text); > create table obj (id serial primary key, name text); > create table att (oid int references obj.id, aid int references attrs.id, > value_int int, value_float float, value_text text, value_bool bool, > value_date date); I think I saw mention here of the DB using a bitmap in the tuple header to avoid storing NULL fields. If that's the case (don't trust my word on it), then combined with a CHECK constraint that ensures that at most one of your typed fields may be not null, this option might at least prove to be the most efficient. However, it won't be fun to query. Storing them all as text won't be much fun to query, which I'd consider another argument for the many-types tuple. '2' > '11' = 't', '002' <> '2', etc. -- Craig Ringer
Ben wrote: > I'm working on a project which requires me to keep track of objects, > each of which can have an arbitrary number of attributes. Although there > will be many attributes that an object can have,... > Anyway, this seems like a common problem without a perfect solution, and > I'm sure people must have hindsight opinions on how they solved it. Your > thoughts? If each attribute can only occur once for any object, and if your queries are based more on exact matches of the values rather than range queries, you might want to look into the hstore module: http://www.postgresql.org/docs/current/static/hstore.html If attributes can occur more than once, you might even want to serialize the whole object as some text format (yaml, xml, etc) instead.
On 29/03/2008, Ben <bench@silentmedia.com> wrote: > I'm working on a project which requires me to keep track of objects, > each of which can have an arbitrary number of attributes. Although > there will be many attributes that an object can have, the data types > of those attributes won't be all that varried (int, float, text, > boolean, date, etc.). And a somewhat unorthodox suggestion for the list ... would it be worthwhile considering a different storage mechanism all together, like maybe an LDAP directory type of thing? The query language is admittedly very limited. Cheers, Andrej -- Please don't top post, and don't use HTML e-Mail :} Make your quotes concise. http://www.american.edu/econ/notes/htmlmail.htm
On Tue, 1 Apr 2008, Andrej Ricnik-Bay wrote: > On 29/03/2008, Ben <bench@silentmedia.com> wrote: >> I'm working on a project which requires me to keep track of objects, >> each of which can have an arbitrary number of attributes. Although >> there will be many attributes that an object can have, the data types >> of those attributes won't be all that varried (int, float, text, >> boolean, date, etc.). > > And a somewhat unorthodox suggestion for the list ... would it > be worthwhile considering a different storage mechanism all > together, like maybe an LDAP directory type of thing? The query > language is admittedly very limited. Hm.... worthwhile for some, perhaps, but not for this project. This is only a small part of a much larger whole, most of which fits quite well into SQL. Thanks though! I think I'm leaning towards the one-table-with-many-columns approach, as it seems to require the simpliest set of constraints while still giving me type-safety.