On 1/26/07, Bruno Wolff III <bruno@wolff.to> wrote:
> On Thu, Jan 25, 2007 at 10:47:50 -0700,
> Isaac Ben <ib.zero@gmail.com> wrote:
> >
> > The data is gene expression data with 20,000 dimensions. Part of the
> > project I'm working on is to discover what dimensions are truly
> > independent. But to start with I need to have
> > all of the data available in a master table to do analysis on. After
> > the analysis I hope to derive subsets of much lower dimensionality.
>
> Are you actually planning to do the analysis in Postgres? This doesn't seem
> like a real good fit for that kind of task. (Though I haven't played with
> the R stuff, and that might be good for doing that kind of analysis.)
I plan on accessing the data with postgres via python and R. The main
reason for putting the data in postgres is that postgres handles large
data sets well, and it will allow me to pull subsets easily, if slowly.
>
> If you do put this in postgres, it seems the two most natural things are
> to use arrays to store the dimension values or to have a table with a key
> of the gene and the dimension and have another column with the value of
> that dimension for that gene.
Yeah, I received a tip from someone regarding the use of arrays, and I
think that I will be using that. Thanks for the tips.
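
To make sure I understand the two layouts you describe, here is a rough
Python sketch of how I picture them. The gene names, values, and function
names are made up for illustration, and this is plain Python rather than
actual Postgres code, so treat it as an assumption about the shape of the
data and nothing more.

```python
# Sketch only: gene names, values, and helper names are hypothetical.
from typing import Dict, List, Tuple

# Layout 1: one row per gene, with the dimension values kept in an array.
array_rows: Dict[str, List[float]] = {
    "gene_a": [0.5, 1.2, 3.1],
    "gene_b": [2.0, 0.0, 4.4],
}


def to_long(rows: Dict[str, List[float]]) -> List[Tuple[str, int, float]]:
    """Layout 2: one row per (gene, dimension) key plus a value column."""
    return [
        (gene, dim, value)
        for gene, values in rows.items()
        # Dimensions are numbered from 1, matching Postgres array subscripts.
        for dim, value in enumerate(values, start=1)
    ]


def subset(rows: Dict[str, List[float]], dims: List[int]) -> Dict[str, List[float]]:
    """Pull a lower-dimensional subset from the array layout (1-based dims)."""
    return {gene: [values[d - 1] for d in dims] for gene, values in rows.items()}


long_rows = to_long(array_rows)
small = subset(array_rows, [1, 3])
```

The array layout keeps one row per gene, while the keyed layout grows to
(genes x dimensions) rows but makes picking individual dimensions a plain
filter on the dimension column.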
IB