Thread: Dynamic table

Dynamic table

From
A B
Date:
Hi.
I have a little problem (and a suggestion for a solution) that I
wondered if anyone would care to comment on.

I have a standard table filled with customers (with a unique customer
id, names etc.) and for each customer I need to store some integer
values. The problem is that the number of integer values that should
be stored WILL change over time (it can both increase and decrease).
It will most likely grow from zero to 10-18 and then perhaps add 1 or
remove one value per year but basically be about 10-18 or so.

I must be able to add a new integer value to all customers, and remove
an integer value from all customers. Altering the table by adding and
dropping columns would theoretically solve it, but since columns are
not really dropped, just hidden away, and there is a 1600-column
limit on tables as I understand it, this would crash the application
at some point in the future, or at least it will waste a lot of
disk space.

Another thing I must be able to do is alter values for a specific
integer value for one customer or all customers simultaneously. This can
be like "change every value 4 to 8 for the third integer value".
And I must be able to quickly select all the integers for a specific
customer for further processing. There will also be access to single
integer values for a specific customer. It's hard to say how much
compared to "get all integer values" for a specific customer. All
customers will be equally accessed.

As I see it I have these options.

Method A) an "add-only" dynamic table that re-uses columns that are no
longer used.

First create a metatable
CREATE TABLE Columnspec (id integer, columnname varchar(10), name
varchar(30), defaultvalue integer, active boolean default true );
That will keep track of the dynamically added columns.

and a table for all the integer values
CREATE TABLE customerValues (customer_id integer references customer);
Initially it will be empty.

and adding a new integer value called "shoe size" with a default value
of 4 would mean running these two queries:
insert into Columnspec (id,columnname,name,defaultvalue) values (1,
'col1', 'shoe size', 4);
alter table customerValues add column col1 integer default 4;

removing a column would mean an operation like
update Columnspec set active=false where id=5;

then adding a new column would mean reusing an inactive column, with
operations like the following:
update Columnspec set active=true, name='hair length', defaultvalue=10
where id=5;
update customerValues set col5=10; -- to give all customers the default value.

One could perhaps alter the default value of the column called 'col5'
in the example above and also alter the names of the columns, but that
is just cosmetic.
The only drawback I can see with this right now is that if the number
of columns decreases, it will waste space.

Method B)
The EAV way.
With a meta table
CREATE TABLE metadata (id integer primary key, name varchar(30),
defaultvalue integer);

and then the values
CREATE TABLE eav (customer_id references customer, key integer
references metadata on delete cascade , value integer,
unique(customer_id, key)  );

With c customers and k integer fields, method A requires at least c*(k+1)
integers of storage, while B requires c*k*3. A factor of about 3 doesn't
sound like much wasted space, but of course it can make a difference.
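To make the comparison concrete, here is a sketch of how the required operations would look against the eav table above (the key numbers and the customer id are of course made up, and I assume the customer table's key column is called id):

```sql
-- "change every value 4 to 8 for the third integer value" (key = 3):
UPDATE eav SET value = 8 WHERE key = 3 AND value = 4;

-- all integer values for one customer:
SELECT key, value FROM eav WHERE customer_id = 42 ORDER BY key;

-- a single integer value for one customer:
SELECT value FROM eav WHERE customer_id = 42 AND key = 3;

-- adding a new integer value (key 7, default 4) to all customers:
INSERT INTO metadata (id, name, defaultvalue) VALUES (7, 'some name', 4);
INSERT INTO eav (customer_id, key, value)
SELECT id, 7, 4 FROM customer;

-- removing an integer value from all customers (the cascade also
-- deletes the eav rows):
DELETE FROM metadata WHERE id = 7;
```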

Method C)
Have a metadata table as usual, and then store the values in an array.
I must admit that I have not looked enough at arrays yet.
The drawback I can see right now will be the access to a specific
value in the array, but I guess PostgreSQL is fast...

Method D)
As method B), but add a view that aggregates the values into a complete
set for each customer, if that will speed anything up....
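For example (just a sketch; I have not measured whether it actually speeds anything up), the view could collect each customer's values into one row:

```sql
-- A sketch of such a view over the eav table from method B.
-- Note: array_agg(... ORDER BY ...) needs a recent PostgreSQL;
-- on older versions one can sort in a subquery instead.
CREATE VIEW customer_value_sets AS
SELECT customer_id,
       array_agg(value ORDER BY key) AS all_values
FROM eav
GROUP BY customer_id;
```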

Any comments on the pros/cons with these strategies?

Re: Dynamic table

From
Sam Mason
Date:
On Tue, Jun 16, 2009 at 09:11:20AM +0200, A B wrote:
> I have a standard table filled with customers (with a unique customer
> id, names etc.) and for each customer I need to store some integer
> values. The problem is that the number of integer values that should
> be stored WILL change over time (it can both increase and decrease).
> It will most likely grow from zero to 10-18 and then perhaps add 1 or
> remove one value per year but basically be about 10-18 or so.
>
> I must be able to add a new integer value to all customers, and remove
> an integer value from all customers  Altering the table by adding and
> deleting columns would theoretically solve it, but since columns are
> not  really dropped, just hidden away, and there is a 1600 column
> limit on tables as I understand it, this would crash the application
> at some time in the future, or at least it will waste a lot of
> discspace.

> Method B)
> The EAV way.
> With a meta table
> CREATE TABLE metadata (id integer primary key, name varchar(30),
> defaultvalue integer);
>
> and then the values
> CREATE TABLE eav (customer_id references customer, key integer
> references metadata on delete cascade , value integer,
> unique(customer_id, key)  );

The way you described the problem the EAV solution sounds like the best
match--not sure if I'd use your synthetic keys though, they will save a
bit of space on disk but queries will be much more complicated to write.
EAV style solutions are rarely good/easy to maintain when the problem
changes so maybe you can take a step back from the problem and solve it
some other way.

The examples you gave (i.e. shoe size, hair length) would fit normal
table columns much better.  Maybe just use a set of tables for each set
of related attributes (i.e. physical attributes, personal preferences,
not sure what would fit here).  You'd also be able to use appropriate
data types/constraints, for example shoe size could have two columns one
for the measurement system (i.e. European, US, UK...) and the other for
the actual measurement.
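As a sketch of what I mean (the table and column names are made up purely for illustration):

```sql
-- One table per group of related attributes, instead of one generic
-- key/value table; each column gets an appropriate type and constraint.
CREATE TABLE physical_attributes (
    customer_id      integer PRIMARY KEY REFERENCES customer,
    shoe_size_system text CHECK (shoe_size_system IN ('EU', 'US', 'UK')),
    shoe_size        numeric,
    hair_length_cm   integer CHECK (hair_length_cm >= 0)
);
```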

Just had a quick flick through your previous posts, and I'd probably
stick with the multiple tables approach.  It's the most natural fit to
relational databases, and until you know more about the problem (i.e.
you've experienced the data you're going to be getting and the ways it's
going to change) you can't do much better.

--
  Sam  http://samason.me.uk/

Re: Dynamic table

From
A B
Date:
> The way you described the problem the EAV solution sounds like the best
> match--not sure if I'd use your synthetic keys though, they will save a
> bit of space on disk but queries will be much more complicated to write.
I guess I'll have to build procedures for all the complicated queries
whenever I add or remove an integer value.
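Something along these lines, perhaps (just a sketch against the metadata/eav tables from method B, assuming the customer table's key column is called id):

```sql
-- Hypothetical helper: register a new integer value in metadata and
-- give every existing customer the default.
CREATE FUNCTION add_integer_value(p_id integer, p_name varchar,
                                  p_default integer)
RETURNS void AS $$
BEGIN
    INSERT INTO metadata (id, name, defaultvalue)
    VALUES (p_id, p_name, p_default);

    INSERT INTO eav (customer_id, key, value)
    SELECT id, p_id, p_default FROM customer;
END;
$$ LANGUAGE plpgsql;
```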


> EAV style solutions are rarely good/easy to maintain when the problem
> changes so maybe you can take a step back from the problem and solve it
> some other way.
That's what I keep reading about EAV :-(


> The examples you gave (i.e. shoe size, hair length) would fit normal
> table columns much better.
Sorry, shoe size was not a good example, think of it as  <random
string>  instead of shoe size. The data/name is nothing you can relate
to in any way or build special columns for or treat in other ways.


> Just had a quick flick through your previous posts; and I'd probably
> stick with the multiple tables approach.  It's the most natural fit to
> relational databases and until you know more about the problem (i.e.
> you've experienced the data your going to be getting and the ways it's
> going to change) you can't do much better.

One table per integer is one way that I have not thought about. Thanks!

Re: Dynamic table

From
Greg Stark
Date:
On Tue, Jun 16, 2009 at 12:21 PM, A B<gentosaker@gmail.com> wrote:
>
>> Just had a quick flick through your previous posts; and I'd probably
>> stick with the multiple tables approach.  It's the most natural fit to
>> relational databases and until you know more about the problem (i.e.
>> you've experienced the data your going to be getting and the ways it's
>> going to change) you can't do much better.
>
> One table per integer is one way that I have not thought about. Thanks!

You said you wanted to quickly select all the values for a given user.
That does not jibe well with having to use a separate table for each
integer.

I don't think it's fair to call this EAV, actually. It sounds
like the integers are a collection of things which represent the same
thing. I.e., they're all bank balances or all distances driven, just for
different time periods. Storing all objects representing the same
thing in the same column is just a regular normalized table, not EAV.

You might want to consider partitioning the table of integers by type
to facilitate dropping the old ones. But that's an administrative
trick, not a fundamental schema design decision.
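A sketch of that trick, using inheritance-based partitioning (the table and range are illustrative):

```sql
-- Parent table holds no rows itself; one child table per range of keys,
-- with a CHECK constraint so constraint exclusion can skip partitions.
CREATE TABLE customer_ints (
    customer_id integer NOT NULL,
    key         integer NOT NULL,
    value       integer
);

CREATE TABLE customer_ints_k1_10 (
    CHECK (key BETWEEN 1 AND 10)
) INHERITS (customer_ints);

-- Dropping a whole batch of obsolete keys then becomes cheap:
-- DROP TABLE customer_ints_k1_10;
```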

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: Dynamic table

From
A B
Date:


2009/6/16 Greg Stark <gsstark@mit.edu>
On Tue, Jun 16, 2009 at 12:21 PM, A B<gentosaker@gmail.com> wrote:

> I don't think think it's fair to call this EAV actually. It sounds
> like the integers are a collection of things which represent the same
> thing. Ie, they're all bank balances or all distances driven, just for
> different time periods. Storing all objects representing the same
> thing in the same column is just a regular normalized table, not EAV.
>
> You might want to consider partitioning the table of integers by type
> to facilitate dropping the old ones. But that's an administrative
> trick, not a fundamental schema design decision.

Unfortunately there is no "type" here. One can make no distinction between the integers, and one cannot tell when they will be added or dropped.

Re: Dynamic table

From
A B
Date:
2009/6/16 A B <gentosaker@gmail.com>:
>
> 2009/6/16 Greg Stark <gsstark@mit.edu>
>>
>> I don't think think it's fair to call this EAV actually. It sounds
>> like the integers are a collection of things which represent the same
>> thing. Ie, they're all bank balances or all distances driven, just for
>> different time periods. Storing all objects representing the same
>> thing in the same column is just a regular normalized table, not EAV.
>>
>> You might want to consider partitioning the table of integers by type
>> to facilitate dropping the old ones. But that's an administrative
>> trick, not a fundamental schema design decision.
>
> Unfortunatly there is no "type" here. One can make no distinction between
> the integers and one can not tell when they are added or dropped.
>

Or did I misunderstand you Greg?

Re: Dynamic table

From
Sam Mason
Date:
On Tue, Jun 16, 2009 at 01:21:42PM +0200, A B wrote:
> > The examples you gave (i.e. shoe size, hair length) would fit normal
> > table columns much better.
> Sorry, shoe size was not a good example, think of it as  <random
> string>  instead of shoe size. The data/name is nothing you can relate
> to in any way or build special columns for or treat in other ways.

It's this decision that makes me think you've backed yourself into a
corner where EAV is your only solution.  EAV is "nice" because it allows
you to express arbitrary pairings between keys and values (i.e. it's a
"map" of arbitrary size, like a hashtable in other languages), but it has
its downsides in that all values are constrained to be of some specific
type and it doesn't allow you to exploit the power/expressiveness of
databases very easily.

> > Just had a quick flick through your previous posts; and I'd probably
> > stick with the multiple tables approach.  It's the most natural fit to
> > relational databases and until you know more about the problem (i.e.
> > you've experienced the data your going to be getting and the ways it's
> > going to change) you can't do much better.
>
> One table per integer is one way that I have not thought about. Thanks!

That's not what I meant at all; I meant to group several related
integer values into a single table.  This would be a more natural fit
for the example names you gave.

Your problem currently sounds very much like an exam question; you
seem to be making decisions arbitrarily, without showing any real data.
When you deal with real problems in the real world you're normally
making compromises when you model things, and hence the decisions
wouldn't be as forced as you're making them.  Design is about picking
and choosing between compromises, and without knowing what the choices
are you can't design anything; this is one of the problems with tests.

--
  Sam  http://samason.me.uk/

Re: Dynamic table

From
Erik Jones
Date:
On Jun 16, 2009, at 12:11 AM, A B wrote:

> Hi.
> I have a little problem (and a suggestion for a solution) that I
> wondered if anyone would care to comment on.
>
> I have a standard table filled with customers (with a unique customer
> id, names etc.) and for each customer I need to store some integer
> values. The problem is that the number of integer values that should
> be stored WILL change over time (it can both increase and decrease).
> It will most likely grow from zero to 10-18 and then perhaps add 1 or
> remove one value per year but basically be about 10-18 or so.
>
> I must be able to add a new integer value to all customers, and remove
> an integer value from all customers  Altering the table by adding and
> deleting columns would theoretically solve it, but since columns are
> not  really dropped, just hidden away, and there is a 1600 column
> limit on tables as I understand it, this would crash the application
> at some time in the future, or at least it will waste a lot of
> discspace.
>
> Other things I must be able to do is to alter values for a specific
> integer value for one customer or all customers simultaneous. This can
> be like "change all value 4 to 8 for the third integer value".
> And I must be able to quickly select all the integers for a specific
> customer for further processing. There will also be access to single
> integer values for a specific customer. It's hard to say how much
> compared to "get all integer values" for a specific customer. All
> customers will be equally accessed.
>
> As I see it I have these options.
>
> Method C)
> Have a metadata table as usual, and then store the values in an array.
> I must admit that I have not looked enough at arrays yet.
> The drawback I can see right now will be the access to a specific
> value in the array, but I guess PostgreSQL is fast...

You may want to try this.

pagila=# create table test (id serial primary key, a int[]);
NOTICE:  CREATE TABLE will create implicit sequence "test_id_seq" for
serial column "test.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"test_pkey" for table "test"
CREATE TABLE
Time: 12.844 ms

-- let's populate it with some starter data
pagila=# insert into test (a) values ('{1}'), ('{2}'), ('{3}');
INSERT 0 3
Time: 2.127 ms
pagila=# select * from test;
  id |  a
----+-----
   1 | {1}
   2 | {2}
   3 | {3}
(3 rows)

Time: 2.823 ms

-- so, let's say you want to add a second integer value of 5 to all
rows (remember, postgres arrays are 1-based):
pagila=# update test set a[2] = 5;
UPDATE 3
Time: 1.157 ms
pagila=# select * from test;
  id |   a
----+-------
   1 | {1,5}
   2 | {2,5}
   3 | {3,5}
(3 rows)

Time: 0.445 ms

-- delete the first integer value for just id=1, the key thing here is
that you use update to delete an individual value in an array
pagila=# update test set a[1] = null where id = 1;
UPDATE 1
Time: 1.688 ms
pagila=# select * from test;
  id |    a
----+----------
   2 | {2,5}
   3 | {3,5}
   1 | {NULL,5}
(3 rows)

Time: 0.527 ms

-- get integer value 1 for all rows
pagila=# select a[1] from test;
  a
----
   2
   3
  \N
(3 rows)

Time: 0.489 ms

-- you can even skip positions
pagila=# update test set a[5] = 10;
UPDATE 3
Time: 1.180 ms
pagila=# select * from test;
  id |           a
----+-----------------------
   2 | {2,5,NULL,NULL,10}
   3 | {3,5,NULL,NULL,10}
   1 | {NULL,5,NULL,NULL,10}
(3 rows)

Time: 0.431 ms

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k






Re: Dynamic table

From
A B
Date:
> Your problem is currently sounding very much like an exam question; you
> seem to be arbitrarily making decisions without showing any real data.
> When you deal with real problems in the real world you're normally
> making compromises when you model things and hence the decisions
> wouldn't be as forced as you're making it.  Design is about picking and
> choosing between compromises and without knowing what the choices are
> you can't design anything, this is one of the problems with tests.

I wish it was just an exam question....

Show real data?
Well here it is:

customer | value1 | value2 | value3
       1 |      5 |      3 |      4
       2 |      8 |      2 |     10

I hope you can believe me when I say that the names
value1, value2, value3 really are impossible to relate to. I will not
decide upon those; someone else is going to do that in the future. I
just have to build a structure that can handle all cases from "shoe
size" to "number of atoms in a person's body" (well, perhaps I can say
that the values will be in the 0...100 range, but that's about it).
There is really nothing else I know about how the system will be used
by other people. :-( I just know that I have to let them enter data
like this since it is a third party that needs this kind of data.

<confession>yes I took some random values for the "real data" since I
don't know anything else about the data. </confession>

Re: Dynamic table

From
Jasen Betts
Date:
On 2009-06-17, A B <gentosaker@gmail.com> wrote:
>> Your problem is currently sounding very much like an exam question; you
>> seem to be arbitrarily making decisions without showing any real data.
>> When you deal with real problems in the real world you're normally
>> making compromises when you model things and hence the decisions
>> wouldn't be as forced as you're making it.  Design is about picking and
>> choosing between compromises and without knowing what the choices are
>> you can't design anything, this is one of the problems with tests.
>
> I wish it was just an exam question....
>
> Show real data?
> Well here it is:
>
> customer | value1 | value2 | value3|
>     1             5           3          4
>     2             8           2          10
>
> I hope you can believe me when I say that the names
> value1,value2,value3 really are impossible to relate. I will not
> decide upon those. Someone else is going to do that in the future. I
> just have to make a structure that can handle all cases from "shoe
> size" to "number of atoms in persons body"   (well, perhaps I can say
> that the values will be in the 0...100 range, but that's about it.
> There is really nothing else I know about how the system will be used
> by other people. :-( I just know that I have to make them enter data
> like this since it is a third party that need this kind of data.
>
><confession>yes I took some random values for the "real data" since I
> don't know anything else about the data. </confession>

I like the table-per-column approach.

And also the array approach; arrays of numbers of the sizes you discuss
are much faster than going to the hard disk.

What sorts of queries will be most common?



Re: Dynamic table

From
"Hartman, Matthew"
Date:
Isn't a many-to-one relationship the classic example of a child table?

Have one parent table that assigns a primary key: "PARENT" with "PARENT_ID". Have a child table that has a name and a
value column: "CHILD" with "PARENT_ID", "COLUMN_NAME", and "COLUMN_VALUE". Perform joins as you see fit to build up the
structure at times. Use arrays if you'd like, or perform multiple joins, whichever.
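A sketch of that layout (names follow the ones above; the join is just illustrative):

```sql
CREATE TABLE parent (
    parent_id serial PRIMARY KEY
);

CREATE TABLE child (
    parent_id    integer NOT NULL REFERENCES parent,
    column_name  text NOT NULL,
    column_value integer,
    UNIQUE (parent_id, column_name)
);

-- Build up one wide row per parent: one join per wanted "column".
SELECT p.parent_id,
       c1.column_value AS value1,
       c2.column_value AS value2
FROM parent p
LEFT JOIN child c1 ON c1.parent_id = p.parent_id
                  AND c1.column_name = 'value1'
LEFT JOIN child c2 ON c2.parent_id = p.parent_id
                  AND c2.column_name = 'value2';
```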


Matthew Hartman
Programmer/Analyst
Information Management
Kingston General Hospital, ICP
(613) 544-2631 x4294

________________________________

From: pgsql-general-owner@postgresql.org on behalf of Jasen Betts
Sent: Sat 20-Jun-09 12:14 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Dynamic table


