Thread: [OT] "advanced" database design (long)

[OT] "advanced" database design (long)

From
vladimir konrad
Date:
Hello,

I think that I understand basic relational theory, but then I had an
idea. What I would like to know is whether this is sometimes done, or
whether I am possibly mad... Also, I do not know the terminology for
this kind of thing, so I do not know where and what to look for.

Basically, instead of adding a field to a table every time there is a
need for one, have the table split in two: one part holds the identity
(id) and the other holds the attributes (linked to that id). For
example, modelling a subject (person):

subject:
    id
    entered_on (date when entered to the database)

sadg (subject attribute group):
    id
    value (string - name for a group of attributes, e.g. postal address)

sad (subject attribute definition):
    id
    sadg (references sadg(id))
    name (the name of the attribute - e.g. e-mail address, gender)
    app_type (string - what data type to use in the application for sa.value)
    db_type (string - what cast to use in the database for sa.value)

sa (subject attribute):
    id
    subject (references subject(id))
    sad (references sad(id))
    value (string, holds the attribute value)

sads (subject attribute definition set) - what values the sa can have:
    id
    sad (references sad(id))
    value (string)
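Spelled out as DDL, the layout above might look like this (a sketch only; the serial/text column types and the not-null choices are my assumptions, not part of the original description):

```sql
create table subject (
    id serial primary key,
    entered_on date not null default current_date
);

create table sadg (
    id serial primary key,
    value text not null  -- group name, e.g. 'postal address'
);

create table sad (
    id serial primary key,
    sadg integer not null references sadg(id),
    name text not null,      -- e.g. 'e-mail address', 'gender'
    app_type text not null,  -- data type to use in the application
    db_type text not null    -- cast to apply to sa.value in the database
);

create table sa (
    id serial primary key,
    subject integer not null references subject(id),
    sad integer not null references sad(id),
    value text
);

create table sads (
    id serial primary key,
    sad integer not null references sad(id),
    value text not null
);
```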

Basically, if in the future the user decides that the subject should
have a new attribute, he can simply add an "attribute definition" and
an attribute definition set (if any), and the application would handle
it without re-designing tables, queries, and screens (i.e. when
displaying the sadg "postal address", the user interface would always
pull all attributes in that group, and the lay-out would automatically
handle the new attributes).

The advantage I see is that no db development would be needed to add
new fields (here "attributes"), and the user could do this himself.

The disadvantage I see is that the model is hard to work with (e.g. how
do I see a subject, with its attributes, as a table - could a cross
join be used for this?). Also, hand-writing queries against this would
be hard (possibly needed if the user would like to write custom reports).
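For what it is worth, the usual way to view a subject together with its attributes as one row is conditional aggregation over joins rather than a cross join - a sketch against the sa/sad tables described above, with made-up attribute names:

```sql
select s.id,
       max(case when d.name = 'e-mail address' then a.value end) as email,
       max(case when d.name = 'gender' then a.value end) as gender
from subject s
left join sa a on a.subject = s.id
left join sad d on d.id = a.sad
group by s.id;
```

Every new attribute needs another `max(case ...)` branch, which is one concrete reason hand-written reports against this model get painful.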

Do people do this kind of thing (or have I taken it too far)? If yes, I
would be grateful for pointers to examples or any other info on this...

Vlad

Re: [OT] "advanced" database design (long)

From
Thomas Pundt
Date:
Hi,

vladimir konrad wrote:
> I think that I understand basic relational theory but then I had an
> idea. What I would like to know if this is sometimes done or that I am
> possibly mad... Also, I do not know the terminology for this kind of
> thing so I do not know where and what to look for.
>
> Basically, instead of adding field to a table every time there is a
> need for it, have a table split in two: one holds identity (id) and one
> holds the attributes (linked to this id). For example, modelling
> subject (person):

[example stripped]

> The advantage I see, is that to add new fields (here "attributes") no
> db development would be needed and user could do this.
>
> The disadvantages I see is that the model is hard to work with (i.e. how
> do I see subject (with attributes) as a table - could cross join be
> used for this?. Also, hand writing the queries for this would be hard
> (possibly needed if user would like to write custom reports).
>
> Do people do this kind of thing (or I took it too far)? If yes, I would
> be grateful for pointers to examples or any other info on this...

Yes, this is known as the Entity-Attribute-Value (EAV) model (cf. Wikipedia).

IMO, most times its disadvantages (it can be very hard to write
performant queries compared to the traditional row-based model)
outweigh what you gain (in flexibility) in relational databases. But it
sure has its use cases.

Ciao,
Thomas

Re: [OT] "advanced" database design (long)

From
vladimir konrad
Date:
> Yes, this is known as eg. Entity-Attribute-Value model (cf.
> wikipedia).

Thank you for the pointer and term. This will get me started.

> IMO, most times its disadvantages (it can be very hard to write
> performant queries compared to the traditional row-based model)
> outweigh what you gain (in flexibility) in relational databases. But
> it sure has its use cases.

Obviously, I will have to think it through more...

Vlad

Re: [OT] "advanced" database design (long)

From
Lewis Cunningham
Date:
--- vladimir konrad <vk@dsl.pipex.com> wrote:

> I think that I understand basic relational theory but then I had an
> idea.
> Basically, instead of adding field to a table every time there is a
> need for it, have a table split in two: one holds identity (id) and
> one holds the attributes (linked to this id).
> Basically, if in the future user decides that the subject should
> have a new attribute, he can simply add "attribute definition" and
> attribute_definition_set (if any) and the application would handle

Basically, you would be creating your own data dictionary (i.e.
system catalog) on top of the db data dictionary.  The database
already comes with a way to easily add columns: ddl.  I have seen
newbie database designers reinvent this method a hundred times.  The
performance hits and complexity of querying data would far outweigh
any perceived maintenance gain.

My .02.

LewisC




Lewis R Cunningham

An Expert's Guide to Oracle Technology
http://blogs.ittoolbox.com/oracle/guide/

LewisC's Random Thoughts
http://lewiscsrandomthoughts.blogspot.com/



Re: [OT] "advanced" database design (long)

From
Bill Moran
Date:
vladimir konrad <vk@dsl.pipex.com> wrote:
>
> Hello,
>
> I think that I understand basic relational theory but then I had an
> idea. What I would like to know if this is sometimes done or that I am
> possibly mad... Also, I do not know the terminology for this kind of
> thing so I do not know where and what to look for.
>
> Basically, instead of adding field to a table every time there is a
> need for it, have a table split in two: one holds identity (id) and one
> holds the attributes (linked to this id). For example, modelling
> subject (person):
>
> subject:
>     id
>     entered_on (date when entered to the database)
>
> sadg (subject attribute group):
>     id
>     value (string - name for group of attributes, e.g. postal
> address)
>
> sad (subject attribute definition):
>     id
>     sadg (references sadg(id))
>     name (the name of the attribute - e.g. e-mail address, gender)
>     app_type (string - what data type to use in application for
> sa.value)
>     db_type (string - what cast to use in database for sa.value)
>
> sa (subject attribute):
>     id
>     subject (references subject(id))
>     sad    (references sad(id)
>     value    (string, holds the attribute value)
>
> sads (subject attribute definition set) - what values the sa can have:
>     id
>     sad (references sad(id))
>     value (string)
>
> Basically, if in the future user decides that the subject should have
> a new attribute, he can simply add "attribute definition" and
> attribute_definition_set (if any) and the application would handle it
> without re-designing table, queries, and screens (i.e. the user
> interface when displaying sadg: "postal address" would always pull all
> attributes in that group and lay-out would automatically handle the new
> attributes).

Sounds like LDAP.

--
Bill Moran
http://www.potentialtech.com

Re: [OT] "advanced" database design (long)

From
vladimir konrad
Date:
> Basically, you would be creating your own data dictionary (i.e.
> system catalog) on top of the db data dictionary.  The database
> already comes with a way to easily add columns: ddl.  I have seen
> newbie database designers reinvent this method a hundred times.  The
> performance hits and complexity of querying data would far outweigh
> any perceived maintenance gain.

And here I was thinking that this would make me a famous database
designer ;-)

Well, given the deadline for this + other concerns, it looks like
implementing my own type system is overkill (+ it would be impossible
for "normal" users to write report queries).

Thank you all for the input; I will do it the "normal" way.

Vlad

Re: [OT] "advanced" database design (long)

From
David Fetter
Date:
On Sat, Feb 02, 2008 at 01:38:19PM +0100, Thomas Pundt wrote:
> Hi,
>
> vladimir konrad wrote:
>> I think that I understand basic relational theory but

Clearly, you'll have to revisit that thought.

> [example stripped]
>
> Yes, this is known as eg. Entity-Attribute-Value model (cf.
> wikipedia).
>
> IMO most times its disadvantages (it can be very hard to write
> performant queries compared to the traditional row based model)

Make that, "impossible."  The "flexibility" stems from fear of making
a design decision.

The second and smaller price is having the system bog down entirely
and have to be scrapped, whether it's 3 months down the line, or 3
years.

The math beneath this is that query complexity goes up like O(E!A!V!)
for Entity, Attribute and Value.

The first price, though, and by far the biggest, is that it's
impossible to maintain any kind of data integrity in such a system, as
such constraints, by their nature, are application-dependent.  Two
applications means you're violating the SPOT (Single Point of Truth)
Rule, and that in turn means your data turns quickly into
incomprehensible gibberish.

> outweigh what you gain (in flexibility) in relational databases.
> But it sure has its use cases.

Why, yes.  I encourage all my competitors to use it. ;)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [OT] "advanced" database design (long)

From
Shane Ambler
Date:
Lewis Cunningham wrote:
> --- vladimir konrad <vk@dsl.pipex.com> wrote:
>
>> I think that I understand basic relational theory but then I had an
>> idea.
>> Basically, instead of adding field to a table every time there is a
>> need for it, have a table split in two: one holds identity (id) and
>> one holds the attributes (linked to this id).
>> Basically, if in the future user decides that the subject should
>> have a new attribute, he can simply add "attribute definition" and
>> attribute_definition_set (if any) and the application would handle
>
> Basically, you would be creating your own data dictionary (i.e.
> system catalog) on top of the db data dictionary.  The database
> already comes with a way to easily add columns: ddl.

If you have some part of your app that needs to "select" the list of
columns in a table you should look at
http://www.postgresql.org/docs/8.2/interactive/catalogs.html
particularly pg_class and pg_attribute
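For example, a query along these lines lists the columns of a given table (a sketch; 'subject' is just a placeholder table name):

```sql
select a.attname, format_type(a.atttypid, a.atttypmod) as data_type
from pg_attribute a
join pg_class c on c.oid = a.attrelid
where c.relname = 'subject'
  and a.attnum > 0          -- skip system columns
  and not a.attisdropped    -- skip dropped columns
order by a.attnum;
```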



--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

Re: [OT] "advanced" database design (long)

From
vladimir konrad
Date:
> If you have some part of your app that needs to "select" the list of
> columns in a table you should look at
> http://www.postgresql.org/docs/8.2/interactive/catalogs.html
> particularly pg_class and pg_attribute

Thanks, this could come handy.

Vlad

Re: [OT] "advanced" database design (long)

From
"Karsten Hilbert"
Date:
David Fetter wrote:

> The math beneath this is that query complexity goes up like O(E!A!V!)
> for Entity, Attribute and Value.
Makes sense.

> The first price, though, and by far the biggest, is that it's
> impossible to maintain any kind of data integrity in such a system, as
> such constraints, by their nature, are application-dependent.  Two
> applications means you're violating the SPOT (Single Point of Truth)
> Rule, and that in turn means your data turns quickly into
> incomprehensible gibberish.

Starts making sense, too.

Could you give advice (or pointers to what I should look at) on how one
would (roughly) "properly" schemafy the following requirements:

- unknown number of differing paper forms to print data on
- user fills in on-screen masks to aggregate data for printing

Intent: use the database to store a) definitions for on-screen masks,
b) definitions for printout (how to place the data), and c) the
denormalized data eventually put into form instances (the normalized
source data is already in the database).

There seem to be three basic approaches:

- one table per form def plus one per form type holding content
- one table holding form defs as, say, XML to be parsed client-side
  plus another table holding form data as XML, too
- EAV: tables holding form defs, field defs, form instances pointing
  to form defs, and field data pointing to field defs and form instances
  each with all the relevant foreign keys

The first requires DDL whenever a form is added by a user.

The second requires client-side logic, making form reuse across
clients a lot harder (a SPOT violation?).
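That second approach could be as small as two tables, with all structure pushed into the stored documents (a sketch; the names are invented, and plain text stands in for a dedicated XML type):

```sql
create table form_def (
    id serial primary key,
    name text unique not null,
    definition text not null   -- on-screen mask / printout layout, e.g. as XML
);

create table form_instance (
    id serial primary key,
    form_def integer not null references form_def(id),
    filled_on date not null default current_date,
    content text not null      -- the denormalized form data, e.g. as XML
);
```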

The third sounds OK -- but seems to be of the apparently dreaded EAV type.

What am I missing? Where should I get a clue?

Thanks,
Karsten Hilbert, MD
wiki.gnumed.de


Re: [OT] "advanced" database design (long)

From
vladimir konrad
Date:
Hello,

>> vladimir konrad wrote:
>>> I think that I understand basic relational theory but

> Clearly, you'll have to revisit that thought.

Usually I have one table per "entity" modelled (and the table holds
fields describing that entity).

E.g. subject would have name fields and a date-of-birth field (fields
related directly to the subject), postal_address would be a separate
table, and subject_postal_address would be a linking table between
postal_address and subject:

subject <-- subject_postal_address --> postal_address

This way, the postal_address can be made unique (with constraints) and
linked to other entities where the postal_address is needed.
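In DDL, that shape might look like this (a sketch; the address columns and the uniqueness constraint are my assumptions, and it assumes the subject table from earlier):

```sql
create table postal_address (
    id serial primary key,
    line1 text not null,
    town text not null,
    postcode text not null,
    unique (line1, town, postcode)  -- the address itself is kept unique
);

-- pure linking table: one row per subject/address pair
create table subject_postal_address (
    subject integer not null references subject(id),
    postal_address integer not null references postal_address(id),
    primary key (subject, postal_address)
);
```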

The system I am developing has to handle "tests" (for rowing athletes):
1. how many meters the athlete did in 10 minutes (the result is a float)
2. how long it took the athlete to do 5 kilometres (the result is a time)

So it looks like I need a table for each of 1 and 2 (because of the
different data types describing the test).

> The math beneath this is that query complexity goes up like O(E!A!V!)
> for Entity, Attribute and Value.

Well, when I thought about it, my gut feeling was that I was opening a
can of worms - it would push the complexity into code (a bad place to
have it).

> The first price, though, and by far the biggest, is that it's
> impossible to maintain any kind of data integrity in such a system, as
> such constraints, by their nature, are application-dependent.  Two
> applications means you're violating the SPOT (Single Point of Truth)
> Rule, and that in turn means your data turns quickly into
> incomprehensible gibberish.

It could be implemented inside the database server (stored procedures,
views and such), but it would still be complex, hard, long, and, as you
said, badly performing (your point about complexity made that clear).

Vlad


Re: [OT] "advanced" database design (long)

From
Shane Ambler
Date:
vladimir konrad wrote:

> The system I am developing has to handle "tests" (for rowing athletes):
> 1. how many meters athlete did in 10 minutes (result is float)
> 2. how long it took the athlete to do 5 kilo-meters (the result is time)
>
> So it looks that I need table for each of 1 and 2 (because of different
> data types describing the test).
>

Are the tests so different that you need to segregate the data?
I see them both as the time taken to travel a distance. The only
difference is whether the time or the distance is used to end the
measurement.

Personally, I would think that one table with the athlete's id, a date
(or timestamp, to allow more than one test a day), a time interval,
and a distance would suffice.

For 1. the time interval would always be 10 mins; for 2. the distance
would always be 5 km. To get individual test stats you can use WHERE
time = '10 mins' or WHERE distance = 5.0

You could even create test_views with the where clause pre-determined.

Worst case would be another column flagging the test type.

From there you can also add 5, 15, 20, 30, or 40 minute tests, or even
2, 2.5, 7.5, or 10 km tests, without changing your structure.
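The one-table idea could be sketched like this (column types and names are my guesses; it assumes athlete ids live in the subject table discussed earlier):

```sql
create table ergo_test (
    id serial primary key,
    athlete integer not null references subject(id),
    taken_on timestamp not null,
    duration interval not null,  -- time rowed
    distance real not null       -- metres covered
);

-- one view per standard test, with the WHERE clause pre-determined
create view ten_minute_tests as
    select * from ergo_test where duration = interval '10 minutes';

create view five_km_tests as
    select * from ergo_test where distance = 5000.0;
```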

I also see multiple samples for a single run: time at 1km, 2km, 3km,
4km, 5km (or at 2min, 4min, 6min...) - you could see whether they can
maintain the speed over the distance, or at what distance/time they
wear out and slow down. (Maybe they give too much in the first 2km, so
that they just crawl in the last 2.)

Maybe sub-times can be a second table.




--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

Re: [OT] "advanced" database design (long)

From
vladimir konrad
Date:
Hello, and thanks

> Are the tests that different that you need to segregate the data?
> I see them both as being the time taken to travel a distance. The
> only difference is whether the time or distance is used to end the
> measurement.

Good point (I realised this after posting, when I dug deeper into the
design). What I have now looks like this (linking fields omitted):

[subject]
  1
  |
  n
[ergo: distance(float), time(interval), taken_on(date)]
  n                       n
  |                       |
  1                       1
[ergo_rate]             [ergo_set]


The same test can be taken at different rates (e.g. 20 strokes per
minute), hence the [ergo_rate] (there are only a few rates they
always use).

The [ergo_set] determines what value was "set" before the test (a bad
name, but maybe I will think of something better). So, if the distance
is set, it points to (1, 'distance') in [ergo_set]; if the time, then
to (2, 'time'). The user chooses what is "set" before recording the
test. It also makes it possible to ask "give me all 2000m ergo test
results done at this rate".

> Worst case would be another column flagging the test type.

Why do you think this is the worst case?

> I also see multiple samples for a single run. Time at 1km, 2km, 3km,
> 4km, 5km (or at 2min, 4 min, 6min...) - you could see whether they
> can maintain the speed over the distance or at what distance/time
> they wear out and slow down. (maybe they give too much in the first
> 2km so that they just crawl in the last 2)
> Maybe sub-times can be a second table.

They call them splits (I have to check whether it is done for ergos,
but it is definitely done for racing on the water).

In ergo case, I would have extra table [ergo_split]:

[ergo] <-- [ergo_split: distance(float), clock(time)]

(they record the time a watch shows them, therefore time and not
interval)
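As DDL, that might be (a sketch; types assumed, and it presumes an [ergo] table as in the diagram above):

```sql
create table ergo_split (
    id serial primary key,
    ergo integer not null references ergo(id),
    distance real not null,  -- metres into the test
    clock time not null      -- what the watch showed, hence time, not interval
);
```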

Vlad

Re: [OT] "advanced" database design (long)

From
Shane Ambler
Date:
vladimir konrad wrote:

>> Worst case would be another column flagging the test type.
>
> Why do you think this is the worst case?
>

Bad choice of words - I was just referring to using the WHERE clause to
extract one particular test; if that is insufficient, you can use a
test column to track which test a row refers to.



--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

Re: [OT] "advanced" database design (long)

From
"Alex Turner"
Date:
I'm not a database expert, but wouldn't

create table attribute (
  attribute_id int,
  attribute text
);

create table value (
  value_id int,
  value text
);

create table attribute_value (
  entity_id int,
  attribute_id int,
  value_id int
);

give you a lot fewer pages to load than building a table with, say, 90 columns that are all null, which would result in better rather than worse performance?

Alex

On Feb 2, 2008 9:15 AM, Lewis Cunningham <lewisc@rocketmail.com> wrote:

> Basically, you would be creating your own data dictionary (i.e.
> system catalog) on top of the db data dictionary.  The database
> already comes with a way to easily add columns: ddl.  I have seen
> newbie database designers reinvent this method a hundred times.  The
> performance hits and complexity of querying data would far outweigh
> any perceived maintenance gain.
>
> My .02.
>
> LewisC




Re: [OT] "advanced" database design (long)

From
"Masse Jacques"
Date:

 

Hello,

[quoted messages from Alex Turner and Lewis Cunningham stripped]
               
This model is known as Entity-Attribute-Value and is not well appreciated by relational designers. I think it is not relational, but I use it as storage for data (in some cases I do not know the -relational- database structure where the data will eventually be stored). It's like a truck container used for an office removal :).

Imho, don't use it with a complex database structure; as a minimum, use lookup tables for Value (to avoid uncontrolled new parameters) and add a column to store the attribute type.

Jacques Massé

Re: [OT] "advanced" database design (long)

From
"Dawid Kuroczko"
Date:
On Feb 4, 2008 5:14 AM, Alex Turner <armtuk@gmail.com> wrote:
> I'm not a database expert, but wouldn't
>
[...]
>
> give you a lot fewer pages to load than building a table with, say, 90
> columns that are all null, which would result in better rather than
> worse performance?

Well, PostgreSQL's NULLs occupy almost no space - or rather, a bit of
space: one bit exactly. ;-)  I am pretty sure that, storage-wise,
NULLs are more efficient.

   Regards,
      Dawid

Re: [OT] "advanced" database design (long)

From
Jorge Godoy
Date:
Em Monday 04 February 2008 07:03:47 Dawid Kuroczko escreveu:
> On Feb 4, 2008 5:14 AM, Alex Turner <armtuk@gmail.com> wrote:
> > I'm not a database expert, but wouldn't
>
> [...]
>
> > give you a lot fewer pages to load than building a table with, say,
> > 90 columns that are all null, which would result in better rather
> > than worse performance?
>
> Well, PostgreSQL's NULLs occupy almost no space - or rather, a bit of
> space: one bit exactly. ;-)  I am pretty sure that, storage-wise,
> NULLs are more efficient.

I'd say 1 byte for every 8 NULLs instead of 1 bit.  If you only have 1
NULL, it will cost you 1 byte (not 1 bit).  If you have 9, it will cost
you 2 bytes (not 9 bits).

--
Jorge Godoy      <jgodoy@gmail.com>


Re: [OT] "advanced" database design (long)

From
"Scott Marlowe"
Date:
On Feb 3, 2008 10:14 PM, Alex Turner <armtuk@gmail.com> wrote:
> I"m not a database expert, but wouldn't
>
> create table attribute (
>   attribute_id int
>   attribute text
> )
>
> create table value (
>   value_id int
>   value text
> )
>
> create table attribute_value (
>    entity_id int
>   attribute_id int
>   value_id int
> )
>
> give you a lot less  pages to load than building a table with say 90 columns
> in it that are all null, which would result in better rather than worse
> performance?

But you're giving us a choice between two bad methodologies.

Properly normalized, you'd not have a table with 90 nullable columns,
but a set of related tables where you'd only need to store things in
the subordinate tables for the relative data points.

The worst thing about EAV is that it makes it very hard to figure out
what the heck is going on by just looking at the database schema.
It's easy to develop and hard to maintain.  We had a person do
something like that last place I worked and it took weeks for a new
developer to figure it out and replace it with a properly relational
model, because there were little corner cases all through the code
that kept popping up.
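To make the "subordinate tables" point concrete, a sketch with invented names - each optional cluster of fields becomes its own table, and a row exists only when there is actual data:

```sql
create table person (
    id serial primary key,
    name text not null
);

-- only people who actually have a postal address get a row here
create table person_address (
    person integer primary key references person(id),
    street text not null,
    town text not null
);

-- only people with a recorded birth date get a row here
create table person_birth (
    person integer primary key references person(id),
    born_on date not null
);
```

The absent-attribute case is then an absent row, not a NULL column, and each table still carries real constraints.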

Re: [OT] "advanced" database design (long)

From
Ivan Sergio Borgonovo
Date:
On Sat, 2 Feb 2008 09:45:57 -0800
David Fetter <david@fetter.org> wrote:

> On Sat, Feb 02, 2008 at 01:38:19PM +0100, Thomas Pundt wrote:
> > Hi,
> >
> > vladimir konrad wrote:
> >> I think that I understand basic relational theory but
>
> Clearly, you'll have to revisit that thought.
>
> > [example stripped]
> >
> > Yes, this is known as eg. Entity-Attribute-Value model (cf.
> > wikipedia).
> >
> > IMO most times its disadvantages (it can be very hard to write
> > performant queries compared to the traditional row based model)
>
> Make that, "impossible."  The "flexibility" stems from fear of
> making a design decision.

> The second and smaller price is having the system bog down entirely
> and have to be scrapped, whether it's 3 months down the line, or 3
> years.
>
> The math beneath this is that query complexity goes up like
> O(E!A!V!) for Entity, Attribute and Value.

> The first price, though, and by far the biggest, is that it's
> impossible to maintain any kind of data integrity in such a system,
> as such constraints, by their nature, are application-dependent.
> Two applications means you're violating the SPOT (Single Point of
> Truth) Rule, and that in turn means your data turns quickly into
> incomprehensible gibberish.
>
> > weigh higher than you gain (in flexibility) in relational
> > databases. But it sure has its uses cases.
>
> Why, yes.  I encourage all my competitors to use it. ;)

There should be some standard refactoring technique *and* tool for
moving from the EAV model to something saner in the long run.

Say you still don't know what the users will need, and you want to
give them a chance to experiment with which properties they are most
comfortable with (to build up a better model), and you don't want to
give users access to DDL; or suppose this part is not a critical part
of the application and you want to let the users customise it without
access to the DDL...

I didn't take the time to really think about how you could stay far
from major pain when you start from an EAV model and want to
normalise stuff...
I'd go for a naive/brute-force approach.
As usual, I'd try to encapsulate this kind of stuff and centralise
the "access points"; then, once I have to refactor it, I'd just grep
for some kind of signature and fix it case by case.

But well... if I knew in advance there was a standard technique
(tool?) for building up EAV models in a way that they can be
"easily" refactored, I'd be happier ;)


--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: [OT] "advanced" database design (long)

From
Gregory Stark
Date:
"Scott Marlowe" <scott.marlowe@gmail.com> writes:

> On Feb 3, 2008 10:14 PM, Alex Turner <armtuk@gmail.com> wrote:
>
>> give you a lot fewer pages to load than building a table with, say, 90
>> columns that are all null, which would result in better rather than
>> worse performance?

Fwiw, Postgres stores NULLs quite efficiently. Those 90 columns, all of
which are null, would take 12 bytes.

> But you're giving us a choice between two bad methodologies.

But I probably agree with Scott. It depends, though. There are cases
which are inherently awkward, where you will end up with either EAV or
90 mostly-NULL columns.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

Re: [OT] "advanced" database design (long)

From
David Fetter
Date:
On Mon, Feb 04, 2008 at 10:49:51AM +0100, Masse Jacques wrote:
> Hello

[much garbage including HTML trimmed.]

Please set your mail client to text-only.  HTML does not make your
point better, and it annoys the heck out of people whose mail readers
use text.

> This model is known as Entity-Value-Attribute and not well
> appreciated by relational designers.  I think it is not relational,
> but I use it as storage for data (in some case, I don't know the
> database structure -relational- where data will be stored). It's
> like a truck container used for office removal :).

It's more like sitting on a land mine and hoping it won't go off.

> Imho, don't use it with a complex database structure; as a minimum,
> use lookup tables for Value (to avoid uncontrolled new parameters)
> and add a column to store the attribute type.

Putting lipstick on the EAV pig does not help.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [OT] "advanced" database design (long)

From
Tom Lane
Date:
Jorge Godoy <jgodoy@gmail.com> writes:
> Em Monday 04 February 2008 07:03:47 Dawid Kuroczko escreveu:
>> Well, PostgreSQL's NULLs occupy almost no space - or rather, a bit of
>> space: one bit exactly. ;-)  I am pretty sure that, storage-wise,
>> NULLs are more efficient.

> I'd say 1 byte every 8 NULLs instead of 1 bit.  If you only have 1 NULL, it
> will cost you 1 byte (not 1 bit).  If you have 9, it will cost you 2 bytes
> (not 9 bits).

This is not quite right --- the amount of space used doesn't change if
you have more or fewer nulls in a row.  A null bitmap is present in a
row if there are any nulls at all in the row, and its size will be equal
to the defined number of columns in the table.  As you say, there's
padding overhead too ...
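One way to see the bitmap's effect for yourself (a sketch; pg_column_size reports the size of the row as a datum, and exact numbers vary with version and alignment, so no figures are claimed here):

```sql
create table t (a int, b int, c int);
insert into t values (1, 2, 3);        -- no null bitmap needed
insert into t values (1, null, null);  -- null bitmap present
select a, pg_column_size(t) as row_bytes from t;
```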

            regards, tom lane

Re: [OT] "advanced" database design (long)

From
"Alex Turner"
Date:
How do you normalize 90 arbitrary attributes away into subordinate tables?  There will still be 90 of them.  You can split them up into multiple tables, but I would think that would just make joins a pain and potentially bog down the query planner.

Alex

On Feb 4, 2008 7:09 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Feb 3, 2008 10:14 PM, Alex Turner <armtuk@gmail.com> wrote:
> I'm not a database expert, but wouldn't
>
> create table attribute (
>   attribute_id int,
>   attribute text
> );
>
> create table value (
>   value_id int,
>   value text
> );
>
> create table attribute_value (
>   entity_id int,
>   attribute_id int,
>   value_id int
> );
>
> give you a lot fewer pages to load than building a table with, say, 90 columns
> in it that are all null, which would result in better rather than worse
> performance?

But you're giving us a choice between two bad methodologies.

Properly normalized, you'd not have a table with 90 nullable columns,
but a set of related tables where you'd only need to store things in
the subordinate tables for the relevant data points.

The worst thing about EAV is that it makes it very hard to figure out
what the heck is going on by just looking at the database schema.
It's easy to develop and hard to maintain.  We had a person do
something like that at the last place I worked, and it took weeks for
a new developer to figure it out and replace it with a properly
relational model, because there were little corner cases all through
the code that kept popping up.
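A minimal sketch of the normalized alternative Scott describes, shown in SQLite for brevity (table and column names here are illustrative, not from the thread): optional data lives in a subordinate table with real, typed columns, so the main table carries no NULL-filled fields, and a LEFT JOIN reassembles the logical row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Only people who actually have a postal address get a row here,
-- so person itself needs no nullable address columns.
CREATE TABLE postal_address (
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    street    TEXT NOT NULL,
    city      TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, 'Ada'), (2, 'Bob')])
conn.execute("INSERT INTO postal_address VALUES (1, '1 Main St', 'Springfield')")

# A LEFT JOIN reassembles the logical row; Bob simply has no address row.
rows = conn.execute("""
    SELECT p.name, a.city
    FROM person p LEFT JOIN postal_address a USING (person_id)
    ORDER BY p.person_id
""").fetchall()
print(rows)
```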

Re: [OT] "advanced" database design (long)

From
"Alex Turner"
Date:
That is a very awesome system.  I am constantly impressed at the
awesomeness of PostgreSQL.

Alex

On Feb 4, 2008 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jorge Godoy <jgodoy@gmail.com> writes:
> > Em Monday 04 February 2008 07:03:47 Dawid Kuroczko escreveu:
> >> Well, but PostgreSQL's NULLs occupy almost no space, or rather a bit of
> >> space, that is one bit exactly. ;-)  I am pretty much sure that
> >> storage-wise looking NULLs
> >> are more efficient.
>
> > I'd say 1 byte every 8 NULLs instead of 1 bit.  If you only have 1 NULL, it
> > will cost you 1 byte (not 1 bit).  If you have 9, it will cost you 2 bytes
> > (not 9 bits).
>
> This is not quite right --- the amount of space used doesn't change if
> you have more or fewer nulls in a row.  A null bitmap is present in a
> row if there are any nulls at all in the row, and its size will be equal
> to the defined number of columns in the table.  As you say, there's
> padding overhead too ...
>
>                         regards, tom lane
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org/
>

Re: [OT] "advanced" database design (long)

From
"Alex Turner"
Date:
I just thought of another problem: the system can have multiple values
for a single attribute.  How do you normalise that without basically
adding a link table that's just the same thing as given below? (I know
there are array types in PostgreSQL, but there aren't in other DBs and
I'm a fan of keeping products as DB-neutral as possible.)

Alex

On Feb 4, 2008 7:09 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Feb 3, 2008 10:14 PM, Alex Turner <armtuk@gmail.com> wrote:
> > I'm not a database expert, but wouldn't
> >
> > create table attribute (
> >   attribute_id int,
> >   attribute text
> > );
> >
> > create table value (
> >   value_id int,
> >   value text
> > );
> >
> > create table attribute_value (
> >   entity_id int,
> >   attribute_id int,
> >   value_id int
> > );
> >
> > give you a lot fewer pages to load than building a table with, say, 90 columns
> > in it that are all null, which would result in better rather than worse
> > performance?
>
> But you're giving us a choice between two bad methodologies.
>
> Properly normalized, you'd not have a table with 90 nullable columns,
> but a set of related tables where you'd only need to store things in
> the subordinate tables for the relevant data points.
>
> The worst thing about EAV is that it makes it very hard to figure out
> what the heck is going on by just looking at the database schema.
> It's easy to develop and hard to maintain.  We had a person do
> something like that at the last place I worked, and it took weeks for
> a new developer to figure it out and replace it with a properly
> relational model, because there were little corner cases all through
> the code that kept popping up.
>

Re: [OT] "advanced" database design (long)

From
Erik Jones
Date:
On Feb 5, 2008, at 10:17 AM, Alex Turner wrote:

> On Feb 4, 2008 7:09 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Feb 3, 2008 10:14 PM, Alex Turner <armtuk@gmail.com> wrote:
>>> I'm not a database expert, but wouldn't
>>>
>>> create table attribute (
>>>   attribute_id int,
>>>   attribute text
>>> );
>>>
>>> create table value (
>>>   value_id int,
>>>   value text
>>> );
>>>
>>> create table attribute_value (
>>>   entity_id int,
>>>   attribute_id int,
>>>   value_id int
>>> );
>>>
>>> give you a lot fewer pages to load than building a table with, say,
>>> 90 columns in it that are all null, which would result in better
>>> rather than worse performance?
>>
>> But you're giving us a choice between two bad methodologies.
>>
>> Properly normalized, you'd not have a table with 90 nullable columns,
>> but a set of related tables where you'd only need to store things in
>> the subordinate tables for the relevant data points.
>>
>> The worst thing about EAV is that it makes it very hard to figure out
>> what the heck is going on by just looking at the database schema.
>> It's easy to develop and hard to maintain.  We had a person do
>> something like that at the last place I worked, and it took weeks for
>> a new developer to figure it out and replace it with a properly
>> relational model, because there were little corner cases all through
>> the code that kept popping up.
>>
> I just thought of another problem, the system can have multiple values
> for a single attribute.  How do you normalise that without basically
> adding a link table that's just the same thing as given below (I know
> there are array types in Postgresql, but there aren't in other DBs and
> I'm a fan of keeping products as DB neutral as possible)?

No, it wouldn't.  You're confusing mechanics with semantics.  In that
case the data would have meaning and context inherent in the schema
relationship between the two tables that would have the one-to-many
relationship you describe.  In addition, you'd be able to take
advantage of foreign keys and other kinds of constraints to enforce
data integrity.  Those are the intended goals (at a high level) of
relational data theory in the first place.

Seriously, though, go pick up a book on relational database design;
most cover this very topic at some point.
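A small sketch of Erik's point, again in SQLite with illustrative names: a one-to-many child table carries its meaning in the schema itself, and a foreign key enforces integrity in a way a generic attribute/value link table cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is opt-in per connection in SQLite
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- The "multiple values per attribute" case: phone numbers as a
-- meaningful child table, not rows in a generic EAV link table.
CREATE TABLE phone_number (
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    number    TEXT NOT NULL
);
""")
conn.execute("INSERT INTO person VALUES (1, 'Ada')")
conn.executemany("INSERT INTO phone_number VALUES (?, ?)",
                 [(1, '555-0100'), (1, '555-0101')])

# The foreign key rejects a number for a person who does not exist.
try:
    conn.execute("INSERT INTO phone_number VALUES (99, '555-0199')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

ada_numbers = [r[0] for r in conn.execute(
    "SELECT number FROM phone_number WHERE person_id = 1 ORDER BY number")]
print(orphan_rejected, ada_numbers)
```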

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




Re: "advanced" database design (long)

From
SunWuKung
Date:
On febr. 2, 15:15, lew...@rocketmail.com (Lewis Cunningham) wrote:
> --- vladimir konrad <v...@dsl.pipex.com> wrote:
>
> > I think that I understand basic relational theory but then I had an
> > idea.
> > Basically, instead of adding field to a table every time there is a
> > need for it, have a table split in two: one holds identity (id) and
> > one holds the attributes (linked to this id).
> > Basically, if in the future user decides that the subject should
> > have a new attribute, he can simply add "attribute definition" and
> > attribute_definition_set (if any) and the application would handle
>
> Basically, you would be creating your own data dictionary (i.e.
> system catalog) on top of the db data dictionary.  The database
> already comes with a way to easily add columns: ddl.  I have seen
> newbie database designers reinvent this method a hundred times.  The
> performance hits and complexity of querying data would far outweigh
> any perceived maintenance gain.
>
> My .02.
>
> LewisC
>
> Lewis R Cunningham
>
> An Expert's Guide to Oracle Technology: http://blogs.ittoolbox.com/oracle/guide/
>
> LewisC's Random Thoughts: http://lewiscsrandomthoughts.blogspot.com/

I always thought that having nullable columns in a table is a Bad
Thing (http://technet.microsoft.com/en-us/library/ms191178.aspx) and
shows that you try to put different types of entities into the same
table - having 90 in one table ... brrrrr.
I think it's much better to avoid it whenever you have the info, but
when you don't you just have to use the EAV model.
E.g. if I knew what info I wanted to store on a person I could create
columns for that, but since in our application users create the
questionnaires that are used to store info on persons I see little
choice - I must have a subjectID, questionID, value table.
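A minimal sketch (SQLite, illustrative names) of that questionnaire case: because the questions are themselves user data, answers naturally become rows keyed by (subject, question) - which, as noted later in the thread, is a legitimate model rather than EAV, since "question" is a real table with meaning of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (
    question_id INTEGER PRIMARY KEY,
    text        TEXT NOT NULL
);
CREATE TABLE answer (
    subject_id  INTEGER NOT NULL,
    question_id INTEGER NOT NULL REFERENCES question(question_id),
    value       TEXT,
    PRIMARY KEY (subject_id, question_id)
);
""")
conn.execute("INSERT INTO question VALUES (1, 'Favourite colour?')")
conn.execute("INSERT INTO answer VALUES (42, 1, 'green')")

# Reassemble one subject's questionnaire; "question" is a real table,
# not a value hiding inside a generic attribute column.
row = conn.execute("""
    SELECT q.text, a.value
    FROM answer a JOIN question q USING (question_id)
    WHERE a.subject_id = 42
""").fetchone()
print(row)
```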

SWK

Re: [OT] "advanced" database design (long)

From
ptjm@interlog.com (Patrick TJ McPhee)
Date:
In article <33c6269f0802032014i3878ec3co4488b4835ef1e3d8@mail.gmail.com>,
Alex Turner <armtuk@gmail.com> wrote:
%
% I'm not a database expert, but wouldn't
%
% create table attribute (
%   attribute_id int,
%   attribute text
% );
%
% create table value (
%   value_id int,
%   value text
% );
%
% create table attribute_value (
%   entity_id int,
%   attribute_id int,
%   value_id int
% );
%
% give you a lot fewer pages to load than building a table with, say, 90 columns
% in it that are all null, which would result in better rather than worse
% performance?

Suppose you want one row of data. Say it's one of the ones where the
columns aren't all nulls. You look up 90 rows in attribute_value, then
90 rows in attribute, then 90 rows in value. You're probably looking at
3-6 pages of index data, and then somewhere between 3 and 270 pages of
data from the database, for one logical row of data.
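The bounds Patrick gives can be reproduced with trivial arithmetic; this sketch only restates his assumptions (90 attributes, three tables; best case every fetched row in a table shares one heap page, worst case every fetched row lands on its own page), leaving aside the index pages.

```python
# Reconstructing one logical row of 90 attributes under EAV fetches
# 90 rows from each of three tables.
attributes = 90
tables = 3  # attribute_value, attribute, value

# Best case: the fetched rows in each table all share one heap page.
best_case_pages = tables
# Worst case: every fetched row lands on its own heap page.
worst_case_pages = attributes * tables
print(best_case_pages, worst_case_pages)  # 3 270
```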
--

Patrick TJ McPhee
North York  Canada
ptjm@interlog.com

Re: [OT] "advanced" database design (long)

From
"Christopher Browne"
Date:
On Feb 3, 2008 11:14 PM, Alex Turner <armtuk@gmail.com> wrote:
> I'm not a database expert, but wouldn't
>
> create table attribute (
>   attribute_id int,
>   attribute text
> );
>
> create table value (
>   value_id int,
>   value text
> );
>
> create table attribute_value (
>   entity_id int,
>   attribute_id int,
>   value_id int
> );
>
> give you a lot fewer pages to load than building a table with, say, 90 columns
> in it that are all null, which would result in better rather than worse
> performance?

Definitely not.  90 null values will require about 12 bytes of memory
to represent their absence in the "all in one" table.  That's not very
much space.

In contrast, if you need to join out to 80 tables, possibly folded
into some smaller number, you'll *at least* have an index scan,
reading a few pages of data from the secondary table, and then need to
read the pages containing those values that *are* joined in.

That quickly grows to way more than 12 bytes :-)

--
http://linuxfinances.info/info/linuxdistributions.html
"The definition of insanity is doing the same thing over and over and
expecting different results."  -- assortedly attributed to Albert
Einstein, Benjamin Franklin, Rita Mae Brown, and Rudyard Kipling

Re: "advanced" database design (long)

From
"Masse Jacques"
Date:
> On febr. 2, 15:15, lew...@rocketmail.com (Lewis Cunningham) wrote:
> > --- vladimir konrad <v...@dsl.pipex.com> wrote:
> > >
> > > I think that I understand basic relational theory but then I had an
> > > idea.
> > > Basically, instead of adding field to a table every time there is a
> > > need for it, have a table split in two: one holds identity (id) and
> > > one holds the attributes (linked to this id).
> > > Basically, if in the future user decides that the subject should
> > > have a new attribute, he can simply add "attribute definition" and
> > > attribute_definition_set (if any) and the application would handle
> >
> > Basically, you would be creating your own data dictionary (i.e.
> > system catalog) on top of the db data dictionary.  The database
> > already comes with a way to easily add columns: ddl.  I have seen
> > newbie database designers reinvent this method a hundred times.  The
> > performance hits and complexity of querying data would far outweigh
> > any perceived maintenance gain.
> >
> > My .02.
> >
> > LewisC
> >
> > Lewis R Cunningham
> >
> > An Expert's Guide to Oracle Technology: http://blogs.ittoolbox.com/oracle/guide/
> >
> > LewisC's Random Thoughts: http://lewiscsrandomthoughts.blogspot.com/
>
> I always thought that having nullable columns in a table is a Bad
> Thing (http://technet.microsoft.com/en-us/library/ms191178.aspx) and
> shows that you try to put different types of entities into the same
> table - having 90 in one table ... brrrrr.
> I think it's much better to avoid it whenever you have the info, but
> when you don't you just have to use the EAV model.
> E.g. if I knew what info I wanted to store on a person I could create
> columns for that, but since in our application users create the
> questionnaires that are used to store info on persons I see little
> choice - I must have a subjectID, questionID, value table.
>
> SWK
>
>
I use it in the same way: at recording time, I only know one kind of
entity (words) with a value stored as varchar.  XML might be better,
but I have long used SQL to do this kind of job.

Re: "advanced" database design (long)

From
Lew
Date:
SunWuKung wrote:
> I always thought that having nullable columns in a table is a Bad
> Thing (http://technet.microsoft.com/en-us/library/ms191178.aspx) and

Ridiculous.  The argument provided in that article is specious and likely SQL
Server-specific.  NULLable columns should occur wherever your data model calls
for them, typically when you want to have a marker for "unknown" data.  The
advice in that article to move NULLable columns off to a separate table will
actually cause worse, manual "special handling that increases the complexity
of data operations" than the built-in and optimized handling the engine
provides for NULLs.  You should ignore this terrible advice.

> shows that you try to put different type of entities into the same
> table - having 90 in a column ... brrrrr.

Is that a technical evaluation?  As another respondent stated upthread, 90
NULLable columns is possibly a sign of a bad data model.

> I think its much better to avoid it whenever you have the info but
> when you don't you just have to use the EAV model.

Also ridiculous.  You should never "have to use" the EAV so-called "model".

> E.g. If I knew what info I wanted to store on a person I could create
> columns for that, but since in our application users create the
> questionnaires that is used to store info on persons I see little
> choice - I must have a subjectID, questionID, value table.

That's not EAV.  When you're modeling a questionnaire, "subject", "question"
and "answer" (as I interpret your meaning for "value" here) is natural.  EAV
would have a row with "question" as a value in a column, not the name of a
column as you suggest.

It's very hard to actually think in EAV.  The mind naturally thinks of things
like "question" being a column, but in EAV that wouldn't be; "question" would
be a value of a generic column in some row that represents a fragment of the
question being described.  The difficulty of conceptualizing data structures
as EAV is one of the big strikes against it.  The quoted citation evidences
that difficulty quite well - even trying to come up with an example of an EAV
structure wound up with a non-EAV description.

--
Lew

Re: "advanced" database design (long)

From
Jeff Davis
Date:
On Sun, 2008-02-10 at 03:08 -0500, Lew wrote:
> SunWuKung wrote:
> > I always thought that having nullable columns in a table is a Bad
> > Thing (http://technet.microsoft.com/en-us/library/ms191178.aspx) and
>
> Ridiculous.  The argument provided in that article is specious and likely SQL
> Server-specific.  NULLable columns should occur wherever your data model calls
> for them, typically when you want to have a marker for "unknown" data.  The
> advice in that article to move NULLable columns off to a separate table will
> actually cause worse, manual "special handling that increases the complexity
> of data operations" than the built-in and optimized handling the engine
> provides for NULLs.  You should ignore this terrible advice.

I disagree that it's ridiculous. There are good arguments for avoiding
NULLs, not the least of which is that they can lead to very unintuitive
query results due to 3VL (three-valued logic).
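A classic example of the kind of 3VL surprise mentioned above (shown here with SQLite; the behaviour is standard SQL): NOT IN against a set that contains a NULL can never evaluate to true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])

# 2 <> 1 is true, but 2 <> NULL is unknown, so the whole NOT IN
# predicate evaluates to unknown and the row is filtered out.
hits = conn.execute("SELECT 1 WHERE 2 NOT IN (SELECT x FROM t)").fetchall()
print(hits)  # [] -- no rows, even though 2 is plainly not in {1}
```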

I think the passage in question (from the above URL) is reasonably good
advice. They recommend vertical partitioning to avoid NULLs, and I think
that is a very good design strategy in many cases.

> > I think its much better to avoid it whenever you have the info but
> > when you don't you just have to use the EAV model.
>
> Also ridiculous.  You should never "have to use" the EAV so-called "model".

Agreed.

> question being described.  The difficulty of conceptualizing data structures
> as EAV is one of the big strikes against it.  The quoted citation evidences

Agreed. It has many other strikes as well: for instance, it's difficult
to form any kind of meaningful predicate for a relation in an EAV
design.

Regards,
    Jeff Davis