Re: [GENERAL] arrays - Mailing list pgsql-sql

From Josh Berkus
Subject Re: [GENERAL] arrays
Date
Msg-id web-1734564@davinci.ethosmedia.com
Whole thread Raw
In response to Re: [GENERAL] arrays  (Mike Sosteric <mikes@athabascau.ca>)
Responses Re: [GENERAL] arrays  ("Dan Langille" <dan@langille.org>)
Re: [GENERAL] arrays  (Mike Sosteric <mikes@athabascau.ca>)
Re: [GENERAL] arrays  (Roland Roberts <roland@astrofoto.org>)
List pgsql-sql
Mike,

> We are currently developing a database to host some complicated, XMl
> layered data. We have chosen postgres because of its ability to store
> multidimensional arrays. We feel that using these will allow us to
> simplify the database structure considerably by storing some data in
> multidimensional arrays. 

Hmmm ... I'm curious; what kind of data do you feel could be
*simplified* by multi-dimensional arrays?   

> However, we currently have some dissenters who believe that using the
> multidimensional arrays will make queries slower and unneccesarily
> complicated. 

They're correct, especially about the latter.

> 1) are SQL queries slower when extracting data from multidimensional
> arrays

Yes, but this is fixable; see the Intarray package in /contrib.

> 2) are table joins more difficult or unneccesarily complicated

Yes.

> 3) can you do selects on only a portion of a multidimensional array.

Yes.

> That
> is, if you were storing multilanguage titles in a two dimensional
> array, 
> 
> [en], "english title"
> [fr], "french title"
> 
> could you select where title[0] = 'en'

Yes.

> I know these may sound like terribily stupid questions. but we need
> some
> quick guidance before proceeding with a schema that relies on these
> advanced data features of postgres

The problem you will be facing is that Arrays are one of the
fundamentally *Non-Relational* features that Postgresql supports for a
limited set of specialized purposes (mostly buffer tables, procedures,
and porting from MySQL).   As such, incorporating arrays into any kind
of complex schema will drive you to drink ... and is 95% likely more
easily done through tables and sub-tables, in any case.  

Let's take your example of "title", and say we wanted to use it in a
join:

SELECT movie.name, movie.show_date, movie.title_lang, title.translation
FROM movies JOIN title_langs ON (movie.title_lang[1] = title_langs.lang OR movie.title_lang[2] =
title_langs.lang OR movie.title_lang[3] = title_langs.lang ... )

... as you can see, the join is extremely painful.   Let alone
constructing a query like "Select all movies with titles only in
English and French and one other language."  (try it, really)

Then there's the not insignificant annoyance of getting data into and
out of multi-dimensional arrays,  which must constantly be parsed into
text strings.  And the fact that you will have to keep track, in your
middleware code, of what the ordinal numbers of arrays mean, since
array elements are fundamentally ordered.   (BTW, Postgres arrays begin
at 1, not 0)

Now, I know at least one person who is using arrays to store scientific
data.  However, that data arrives in his lab in the form of matrices,
and is not used for joins or query criteria beyond a simple "where"
clause.

As such, I'd reccommend one of two approaches for you:

1) Post some of your schema ideas here, and let us show you how they
are better done relationally.   The relational data model has 30 years
of thought behind it -- it can solve a lot of problems.

2) Shift over to an XML database or a full-blown OODB (like Cache').

Good luck.

-Josh Berkus





pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: [GENERAL] arrays
Next
From: "Dan Langille"
Date:
Subject: Re: [GENERAL] arrays