Re: Research and EAV models - Mailing list pgsql-general

From Greg Smith
Subject Re: Research and EAV models
Date
Msg-id alpine.GSO.2.01.0910231811090.16414@westnet.com
Whole thread Raw
In response to Research and EAV models  ("Leif B. Kristensen" <leif@solumslekt.org>)
Responses Re: Research and EAV models
List pgsql-general
On Fri, 23 Oct 2009, Leif B. Kristensen wrote:

> I'm a researcher type, and I've made an EAV model that suits me well in
> my genealogy research. How can you associate an essentially unknown
> number of sundry "events" to a "person" without an EAV model?

CREATE TABLE events(person text,key text,value text);

You seem to think there's some sort of fundamental difference between the
two models.  There isn't.  The tuples of a standard relational row are
simply a fixed set of key/value pairs, but you can always store EAV data
explicitly like this simple example.  Similarly, you can always decompose
relational data into an equivalent EAV set, where the keys are the column
names.

The difference between the two data models is that knowing the keys in
advance allows you to do optimizations when looking up the data that
result in faster queries than any really generic system can do.  Trying to
optimize queries against huge EAV sets will just end up reinventing
concepts like indexes if you want them to run well, and then you're back
to having to specify which are the important keys/columns in advance.

Your thinking this is a business/research distinction isn't right, it
comes down to the size of the data set and how fast/expressive the queries
against them are.  In the research realm, you can watch this struggle play
in things like bioinformatics, where there's giant stacks of very similar
data they need to sort through.  What often ends up happening is you have
scientists start with a simple EAV system, then watch it completely fall
down under real-world load and complicated queries once it gets past
prototype.  Then they try to patch it for a while by reinventing SQL query
and storage concepts (badly).

Eventually someone with traditional database background comes along, notes
that the data you really need is for the most part predetermined, rewrites
that into something more amenable to standard SQL, and then the whole
thing performs better for that subset.  But now you've got stuff like
ALTER TABLE to add a column every time you want to track something new,
and people don't like that when the schema is still fluid.  So they start
putting stuff into EAV stores for their next project...and the cycle
begins anew.

Ultimately you can be really flexible in how your store your data, or you
can get good query performance, but it's quite hard to do both.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

pgsql-general by date:

Previous
From: Karsten Hilbert
Date:
Subject: Re: Research and EAV models
Next
From: Sydney Puente
Date:
Subject: drop view and recreate - for sync