Thread: column limit
Hi,

I'm trying to create a table with 20,000 columns of type int2, but I keep getting an error saying the limit is 1600. According to this message
http://archives.postgresql.org/pgsql-admin/2001-01/msg00199.php
it can be increased, but only up to about 6400. Can anyone tell me how to get 20,000 columns?

Thanks,
IB
On 01/25/07 09:34, Isaac Ben wrote:
> Hi,
> I'm trying to create a table with 20,000 columns of type int2, but I
> keep getting the error message that the limit is 1600. According to
> this message
> http://archives.postgresql.org/pgsql-admin/2001-01/msg00199.php
> it can be increased, but only up to about 6400. Can anyone tell me
> how to get 20,000 columns?

Why the heck do you need 20 *thousand* columns?

Assuming, though, that you know what you're doing and the design isn't horribly botched, arrays might be what you want. Normalizing the table might be better, and vertically partitioning it would be a big performance win if you don't need all 20 *thousand* columns at the same time.
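[Editor's note: the array suggestion above might look like the following sketch. Table and column names are illustrative, not from the thread.]

```sql
-- One row per gene; all 20,000 int2 values live in a single
-- smallint array instead of 20,000 separate columns.
CREATE TABLE gene_expression (
    gene_id    integer PRIMARY KEY,
    expression smallint[]          -- e.g. 20,000 elements
);

-- Pull one dimension (PostgreSQL arrays are 1-based):
SELECT gene_id, expression[42] FROM gene_expression;

-- Pull a contiguous slice of dimensions:
SELECT gene_id, expression[100:199] FROM gene_expression;
```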
Hi,

Sorry, I forgot to post back to the list instead of just replying to individual responders. The data is gene expression data with 20,000 dimensions. Part of the project I'm working on is to discover which dimensions are truly independent. But to start with I need to have all of the data available in a master table to do analysis on. After the analysis I hope to derive subsets of much lower dimensionality.

IB
Isaac Ben Jeppsen

On 1/25/07, David Brain <dbrain@bandwidth.com> wrote:
> Hi,
>
> Seeing as how no one has asked this question yet - I have to ask, why do
> you need 20,000 columns? I'm sure I'm not the only one who is curious.
>
> It's hard to think of a situation where this couldn't be solved by using
> a schema with parent/child tables to have 20,000 rows per record rather
> than 20,000 columns.
>
> David.
>
> Isaac Ben wrote:
> [snip]
>
> --
> David Brain - bandwidth.com
> dbrain@bandwidth.com
> 919.297.1078
On Thu, Jan 25, 2007 at 08:34:08 -0700, Isaac Ben <ib.zero@gmail.com> wrote:
> Hi,
> I'm trying to create a table with 20,000 columns of type int2, but I
> keep getting the error message that the limit is 1600. According to
> this message http://archives.postgresql.org/pgsql-admin/2001-01/msg00199.php
> it can be increased, but only up to about 6400. Can anyone tell me
> how to get 20,000 columns?

Can you explain what you are really trying to do? It is unlikely that using 20,000 columns is the best way to solve your problem. If we know what you are really trying to do, we may be able to make some other suggestions.

One thing you might start looking at is using arrays.
On Thu, Jan 25, 2007 at 10:47:50AM -0700, Isaac Ben wrote:
> The data is gene expression data with 20,000 dimensions. Part of the
> project I'm working on is to discover what dimensions are truly
> independent. But to start with I need to have all of the data
> available in a master table to do analysis on. After the analysis I
> hope to derive subsets of much lower dimensionality.

Even if you managed to hack the server enough to make that work (which is debatable), performance is going to suck. The system is simply not set up for that kind of thing. Use either arrays, or normalize the data into a separate table.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
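[Editor's note: "normalize the data into a separate table" might look like this sketch, with one row per (gene, dimension) pair; names are illustrative.]

```sql
-- One narrow row per value instead of one 20,000-column row per gene.
CREATE TABLE expression_value (
    gene_id   integer  NOT NULL,
    dimension smallint NOT NULL,   -- 1 .. 20000
    value     smallint NOT NULL,
    PRIMARY KEY (gene_id, dimension)
);

-- Fetch only the dimensions you care about:
SELECT gene_id, dimension, value
FROM expression_value
WHERE dimension IN (17, 42, 10001);
```

The primary key doubles as an index, so pulling a small subset of dimensions stays cheap even with 20,000 × N rows in the table.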
On Thu, Jan 25, 2007 at 10:47:50 -0700, Isaac Ben <ib.zero@gmail.com> wrote:
> The data is gene expression data with 20,000 dimensions. Part of the
> project I'm working on is to discover what dimensions are truly
> independent. But to start with I need to have all of the data
> available in a master table to do analysis on. After the analysis I
> hope to derive subsets of much lower dimensionality.

Are you actually planning to do the analysis in Postgres? This doesn't seem like a really good fit for that kind of task. (Though I haven't played with the R stuff, and that might be good for doing that kind of analysis.)

If you do put this in Postgres, it seems the two most natural things are to use arrays to store the dimension values, or to have a table with a key of the gene and the dimension and another column with the value of that dimension for that gene.
On 1/26/07, Bruno Wolff III <bruno@wolff.to> wrote:
> [snip]
>
> Are you actually planning to do the analysis in Postgres? This doesn't seem
> like a really good fit for that kind of task. (Though I haven't played with
> the R stuff, and that might be good for doing that kind of analysis.)

I plan on accessing the data in postgres via Python and R. The main reason for putting the data in postgres is that postgres handles large data sets well, and it will allow me to pull subsets easily, if slowly.

> If you do put this in Postgres, it seems the two most natural things are
> to use arrays to store the dimension values, or to have a table with a key
> of the gene and the dimension and another column with the value of
> that dimension for that gene.

Yeah, I received a tip from someone regarding the use of arrays, and I think that I will be using that. Thanks for the tips.

IB
On 01/26/07 13:37, Isaac Ben wrote:
> On 1/26/07, Bruno Wolff III <bruno@wolff.to> wrote:
>> On Thu, Jan 25, 2007 at 10:47:50 -0700, Isaac Ben
>> <ib.zero@gmail.com> wrote:
[snip]
> I plan on accessing the data in postgres via Python and R. The
> main reason for putting the data in postgres is that postgres
> handles large data sets well, and it will allow me to pull subsets
> easily, if slowly.

I wonder if sed/grep/awk (or just perl) could rapidly do your row and column pre-filtering?

>> If you do put this in Postgres, it seems the two most natural
>> things are to use arrays to store the dimension values, or to
>> have a table with a key of the gene and the dimension and
>> another column with the value of that dimension for that gene.
>
> Yeah, I received a tip from someone regarding the use of arrays,
> and I think that I will be using that. Thanks for the tips.
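[Editor's note: the sed/grep/awk pre-filtering idea above, as a toy sketch. It assumes the raw data is a whitespace-separated text file with one gene per row; the filename and the 5-column matrix standing in for the real 20,000-column one are made up.]

```shell
# Build a tiny stand-in matrix: gene id in field 1,
# "dimensions" in the remaining fields.
printf 'g1 10 11 12 13\ng2 20 21 22 23\n' > expression_matrix.txt

# Project out the gene id plus dimensions 2 and 4 (fields 3 and 5)
# before the data ever reaches the database; for the real file the
# field numbers would just be larger, e.g. $43 for dimension 42.
awk '{ print $1, $3, $5 }' expression_matrix.txt > subset.txt

cat subset.txt
# g1 11 13
# g2 21 23
```

Because awk streams line by line, this scales to files far wider and longer than anything you would want to load into a 20,000-column table.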