Thread: proper use of array datatype
hello all, I am not much of a schema designer and have a general questoin about the proper use of the array datatype. In my example, I have destinations, and destinations can have multiple boxes, and inside those boxes are a set of contents. what I want to do is search and basically "mine" data from the content sets. do I use an array datatype for the content column, or is there a better more efficient way to go about this? From http://www.postgresql.org/docs/8.1/interactive/arrays.html Tip: Arrays are not sets; searching for specific array elements may be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale up better to large numbers of elements. thanks, eric
On 8/1/06, Eric Andrews <eric.m.andrews@gmail.com> wrote: > ... and inside > those boxes are a set of contents. what I want to do is search and > basically "mine" data from the content sets. do I use an array > datatype for the content column, or is there a better more efficient > way to go about this? What kind of content? Is it possible to design regular table for it (a set of properties is clear a priori)? BTW, there is some interesting constrib module - contrib/hstore - that allows to work with structures similar to perl's hashes. And, one more - contrib/ltree - that provides the tree-like structures. Both are based on GiST - that means support of index and, therefore, good perfomance. Consider using these extensions. -- Best regards, Nikolay
On 8/1/06, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote: > On 8/1/06, Eric Andrews <eric.m.andrews@gmail.com> wrote: > > ... and inside > > those boxes are a set of contents. what I want to do is search and > > basically "mine" data from the content sets. do I use an array > > datatype for the content column, or is there a better more efficient > > way to go about this? > > What kind of content? > Is it possible to design regular table for it (a set of properties is > clear a priori)? a set of ID numbers and no not really. the boxes are unique to a destination and the content list/set is unique to the box. I'd have a bajillion tables :( > BTW, there is some interesting constrib module - contrib/hstore - that > allows to work with structures similar to perl's hashes. And, one more > - contrib/ltree - that provides the tree-like structures. Both are > based on GiST - that means support of index and, therefore, good > perfomance. Consider using these extensions. id like to stay away from addons if i can avoid it you know? thanks, eric
Eric Andrews wrote: > hello all, > > I am not much of a schema designer and have a general questoin about > the proper use of the array datatype. In my example, I have > destinations, and destinations can have multiple boxes, and inside > those boxes are a set of contents. what I want to do is search and > basically "mine" data from the content sets. do I use an array > datatype for the content column, or is there a better more efficient > way to go about this? > > From http://www.postgresql.org/docs/8.1/interactive/arrays.html > > Tip: Arrays are not sets; searching for specific array elements may > be a sign of database misdesign. Consider using a separate table with > a row for each item that would be an array element. This will be > easier to search, and is likely to scale up better to large numbers of > elements. Yeah, I've never considered arrays to be good data-types for columns. One possible solution to what (I think) you're trying to do, is to have a text or varchar column in which you store multiple values separated by some delimiter (such as ::) that will not occur in the actual option names. Then you can write rules to handle inserting/updating/selecting/deleting options (which would boil down to string operations). Or, you could just do the string manipulation directly in your queries, whichever is easiest for you. Here's a link to an article that discusses using inheritance for dynamic content questionnaires (but, it may be overkill for what you need): http://www.varlena.com/GeneralBits/110.php -- erik jones <erik@myemma.com> software development emma(r)
Eric Andrews wrote:
> I am not much of a schema designer and have a general questoin about
> the proper use of the array datatype. In my example, I have
> destinations, and destinations can have multiple boxes, and inside
> those boxes are a set of contents. what I want to do is search and
> basically "mine" data from the content sets.
I would use arrays exclusively for data sets for which each datum is meaningless by itself (for example, a single coordinate in 3D, although there are better ways to handle points in postgresql). I would recommend against using arrays for any data you wish to mine, and instead recast these has-a relationships as many-to-one joins across at least two tables. For example, a row from the table destination has-a (joins to) rows from boxes, and a box has-a (joins to) contents.
The same argument goes for a similar representation such as concatenated values in a text field. The fundamental principle is that it's relatively easy to turn join separate data into a set of values or concatenated list, but it's quite cumbersome to turn a set of values into easily searchable data (i.e., it's often expensive to "peek" inside the structure of the data for a single value). Furthermore, it's difficult or impossible to write check or foreign key constraints on data within such a structure.
-Reece
> I am not much of a schema designer and have a general questoin about
> the proper use of the array datatype. In my example, I have
> destinations, and destinations can have multiple boxes, and inside
> those boxes are a set of contents. what I want to do is search and
> basically "mine" data from the content sets.
I would use arrays exclusively for data sets for which each datum is meaningless by itself (for example, a single coordinate in 3D, although there are better ways to handle points in postgresql). I would recommend against using arrays for any data you wish to mine, and instead recast these has-a relationships as many-to-one joins across at least two tables. For example, a row from the table destination has-a (joins to) rows from boxes, and a box has-a (joins to) contents.
The same argument goes for a similar representation such as concatenated values in a text field. The fundamental principle is that it's relatively easy to turn join separate data into a set of values or concatenated list, but it's quite cumbersome to turn a set of values into easily searchable data (i.e., it's often expensive to "peek" inside the structure of the data for a single value). Furthermore, it's difficult or impossible to write check or foreign key constraints on data within such a structure.
-Reece
-- Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0 |
On 8/1/06, Reece Hart <reece@harts.net> wrote:
how would these tables look though? I cant have a table for each set of contents in a box...
Eric Andrews wrote:
> I am not much of a schema designer and have a general questoin about
> the proper use of the array datatype. In my example, I have
> destinations, and destinations can have multiple boxes, and inside
> those boxes are a set of contents. what I want to do is search and
> basically "mine" data from the content sets.I would use arrays exclusively for data sets for which each datum is meaningless by itself (for example, a single coordinate in 3D, although there are better ways to handle points in postgresql). I would recommend against using arrays for any data you wish to mine, and instead recast these has-a relationships as many-to-one joins across at least two tables. For example, a row from the table destination has-a (joins to) rows from boxes, and a box has-a (joins to) contents.
how would these tables look though? I cant have a table for each set of contents in a box...
The same argument goes for a similar representation such as concatenated values in a text field. The fundamental principle is that it's relatively easy to turn join separate data into a set of values or concatenated list, but it's quite cumbersome to turn a set of values into easily searchable data (i.e., it's often expensive to "peek" inside the structure of the data for a single value). Furthermore, it's difficult or impossible to write check or foreign key constraints on data within such a structure.
-Reece
--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
# eric.m.andrews@gmail.com / 2006-08-02 10:49:01 -0700: > On 8/1/06, Reece Hart <reece@harts.net> wrote: > > > > Eric Andrews wrote: > >> I am not much of a schema designer and have a general questoin about > >> the proper use of the array datatype. In my example, I have > >> destinations, and destinations can have multiple boxes, and inside > >> those boxes are a set of contents. what I want to do is search and > >> basically "mine" data from the content sets. > > > >I would use arrays exclusively for data sets for which each datum is > >meaningless by itself (for example, a single coordinate in 3D, although > >there are better ways to handle points in postgresql). I would recommend > >against using arrays for any data you wish to mine, and instead recast > >these > >has-a relationships as many-to-one joins across at least two tables. For > >example, a row from the table destination has-a (joins to) rows from boxes, > >and a box has-a (joins to) contents. > > > > > how would these tables look though? I cant have a table for each set of > contents in a box... You need to rotate your brains 90 degrees. You cant have a distinct set of columns (a table) for each set, but you can have have a distinct set of rows (a set) for each, ummm, set. The language suggests it's a better model, and indeed it is: CREATE TABLE destination ( destid SERIAL PRIMARY KEY, destname VARCHAR -- ... ); CREATE TABLE box ( boxid SERIAL PRIMARY KEY, destid INT REFERENCES destination (destid) -- ... ); CREATE TABLE box_contents ( boxid SERIAL REFERENCES box (boxid), thing TEXT -- ... ); SELECT * FROM box_contents JOIN box USING (boxid) JOIN destination USING (destid) WHERE destination.destname = 'foo'; -- How many Vietnam vets does it take to screw in a light bulb? You don't know, man. You don't KNOW. Cause you weren't THERE. http://bash.org/?255991