Thread: proper use of array datatype

proper use of array datatype

From
"Eric Andrews"
Date:
hello all,

I am not much of a schema designer and have a general questoin about
the proper use of the array datatype. In my example, I have
destinations, and destinations can have multiple boxes, and inside
those boxes are a set of contents. what I want to do is search and
basically "mine" data from the content sets. do I use an array
datatype for the content column, or is there a better more efficient
way to go about this?

From http://www.postgresql.org/docs/8.1/interactive/arrays.html

Tip:  Arrays are not sets; searching for specific array elements may
be a sign of database misdesign. Consider using a separate table with
a row for each item that would be an array element. This will be
easier to search, and is likely to scale up better to large numbers of
elements.

thanks,
eric

Re: proper use of array datatype

From
"Nikolay Samokhvalov"
Date:
On 8/1/06, Eric Andrews <eric.m.andrews@gmail.com> wrote:
> ... and inside
> those boxes are a set of contents. what I want to do is search and
> basically "mine" data from the content sets. do I use an array
> datatype for the content column, or is there a better more efficient
> way to go about this?

What kind of content?
Is it possible to design regular table for it (a set of properties is
clear a priori)?

BTW, there is some interesting constrib module - contrib/hstore - that
allows to work with structures similar to perl's hashes. And, one more
- contrib/ltree - that provides the tree-like structures. Both are
based on GiST - that means support of index and, therefore, good
perfomance. Consider using these extensions.

--
Best regards,
Nikolay

Re: proper use of array datatype

From
"Eric Andrews"
Date:
On 8/1/06, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
> On 8/1/06, Eric Andrews <eric.m.andrews@gmail.com> wrote:
> > ... and inside
> > those boxes are a set of contents. what I want to do is search and
> > basically "mine" data from the content sets. do I use an array
> > datatype for the content column, or is there a better more efficient
> > way to go about this?
>
> What kind of content?
> Is it possible to design regular table for it (a set of properties is
> clear a priori)?

a set of ID numbers and no not really. the boxes are unique to a
destination and the content list/set is unique to the box. I'd have a
bajillion tables :(

> BTW, there is some interesting constrib module - contrib/hstore - that
> allows to work with structures similar to perl's hashes. And, one more
> - contrib/ltree - that provides the tree-like structures. Both are
> based on GiST - that means support of index and, therefore, good
> perfomance. Consider using these extensions.
id like to stay away from addons if i can avoid it you know?

thanks,
eric

Re: proper use of array datatype

From
Erik Jones
Date:
Eric Andrews wrote:
> hello all,
>
> I am not much of a schema designer and have a general questoin about
> the proper use of the array datatype. In my example, I have
> destinations, and destinations can have multiple boxes, and inside
> those boxes are a set of contents. what I want to do is search and
> basically "mine" data from the content sets. do I use an array
> datatype for the content column, or is there a better more efficient
> way to go about this?
>
> From http://www.postgresql.org/docs/8.1/interactive/arrays.html
>
> Tip:  Arrays are not sets; searching for specific array elements may
> be a sign of database misdesign. Consider using a separate table with
> a row for each item that would be an array element. This will be
> easier to search, and is likely to scale up better to large numbers of
> elements.
Yeah, I've never considered arrays to be good data-types for columns.
One possible solution to what (I think) you're trying to do, is to have
a text or varchar column in which you store multiple values separated by
some delimiter (such as ::) that will not occur in the actual option
names.  Then you can write rules to handle
inserting/updating/selecting/deleting options (which would boil down to
string operations).  Or, you could just do the string manipulation
directly in your queries, whichever is easiest for you.

Here's a link to an article that discusses using inheritance for dynamic
content questionnaires (but, it may be overkill for what you  need):

http://www.varlena.com/GeneralBits/110.php

--
erik jones <erik@myemma.com>
software development
emma(r)


Re: proper use of array datatype

From
Reece Hart
Date:
Eric Andrews wrote:
> I am not much of a schema designer and have a general questoin about
> the proper use of the array datatype. In my example, I have
> destinations, and destinations can have multiple boxes, and inside
> those boxes are a set of contents. what I want to do is search and
> basically "mine" data from the content sets.

I would use arrays exclusively for data sets for which each datum is meaningless by itself (for example, a single coordinate in 3D, although there are better ways to handle points in postgresql). I would recommend against using arrays for any data you wish to mine, and instead recast these has-a relationships as many-to-one joins across at least two tables. For example, a row from the table destination has-a (joins to) rows from boxes, and a box has-a (joins to) contents.

The same argument goes for a similar representation such as concatenated values in a text field. The fundamental principle is that it's relatively easy to turn join separate data into a set of values or concatenated list, but it's quite cumbersome to turn a set of values into easily searchable data (i.e., it's often expensive to "peek" inside the structure of the data for a single value). Furthermore, it's difficult or impossible to write check or foreign key constraints on data within such a structure.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0

Re: proper use of array datatype

From
"Eric Andrews"
Date:


On 8/1/06, Reece Hart <reece@harts.net> wrote:
Eric Andrews wrote:
> I am not much of a schema designer and have a general questoin about
> the proper use of the array datatype. In my example, I have
> destinations, and destinations can have multiple boxes, and inside
> those boxes are a set of contents. what I want to do is search and
> basically "mine" data from the content sets.

I would use arrays exclusively for data sets for which each datum is meaningless by itself (for example, a single coordinate in 3D, although there are better ways to handle points in postgresql). I would recommend against using arrays for any data you wish to mine, and instead recast these has-a relationships as many-to-one joins across at least two tables. For example, a row from the table destination has-a (joins to) rows from boxes, and a box has-a (joins to) contents.


how would these tables look though? I cant have a table for each set of contents in a box...
 

The same argument goes for a similar representation such as concatenated values in a text field. The fundamental principle is that it's relatively easy to turn join separate data into a set of values or concatenated list, but it's quite cumbersome to turn a set of values into easily searchable data (i.e., it's often expensive to "peek" inside the structure of the data for a single value). Furthermore, it's difficult or impossible to write check or foreign key constraints on data within such a structure.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0

Re: proper use of array datatype

From
Roman Neuhauser
Date:
# eric.m.andrews@gmail.com / 2006-08-02 10:49:01 -0700:
> On 8/1/06, Reece Hart <reece@harts.net> wrote:
> >
> > Eric Andrews wrote:
> >> I am not much of a schema designer and have a general questoin about
> >> the proper use of the array datatype. In my example, I have
> >> destinations, and destinations can have multiple boxes, and inside
> >> those boxes are a set of contents. what I want to do is search and
> >> basically "mine" data from the content sets.
> >
> >I would use arrays exclusively for data sets for which each datum is
> >meaningless by itself (for example, a single coordinate in 3D, although
> >there are better ways to handle points in postgresql). I would recommend
> >against using arrays for any data you wish to mine, and instead recast
> >these
> >has-a relationships as many-to-one joins across at least two tables. For
> >example, a row from the table destination has-a (joins to) rows from boxes,
> >and a box has-a (joins to) contents.
> >
>
>
> how would these tables look though? I cant have a table for each set of
> contents in a box...

    You need to rotate your brains 90 degrees. You cant have a distinct
    set of columns (a table) for each set, but you can have have a
    distinct set of rows (a set) for each, ummm, set. The language suggests
    it's a better model, and indeed it is:

    CREATE TABLE destination (
      destid SERIAL PRIMARY KEY,
      destname VARCHAR
      -- ...
    );
    CREATE TABLE box (
      boxid SERIAL PRIMARY KEY,
      destid INT REFERENCES destination (destid)
      -- ...
    );
    CREATE TABLE box_contents (
      boxid SERIAL REFERENCES box (boxid),
      thing TEXT
      -- ...
    );

    SELECT * FROM box_contents
    JOIN box USING (boxid)
    JOIN destination USING (destid)
    WHERE destination.destname = 'foo';

--
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE.             http://bash.org/?255991