Thread: Beyond the 1600 columns limit on windows
Beyond the 1600 columns limit on windows
From
"Evandro's mailing lists (Please, don't send personal messages to this address)"
Date:
Hi guys,
I would like to know if it is possible to have more than 1600 columns on windows without recompiling postgres.
Regards
-Evandro
--
Evandro M Leite Jr
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages
Evandro's mailing lists (Please, don't send personal messages to this address) wrote:
> Hi guys,
> I would like to know if it is possible to have more than 1600 columns on
> windows without recompiling postgres.

I don't think so.

Are you sure you need more than 1600 columns? That's many more than I've ever
wanted or needed. If you can share some details of the problem you are trying
to solve, perhaps someone can see a different solution for you.

--
Richard Huxton
Archonet Ltd
Evandro's mailing lists (Please, don't send personal messages to this address) wrote:
> Hi guys,
>
> I would like to know if it is possible to have more than 1600 columns on
> windows without recompiling postgres.

I would like to know who on earth needs 1600 columns and even beyond?
Hint: you can have practically unlimited rows in your n:m table :-)
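A minimal SQL sketch of the n:m-style layout being hinted at here (all table
and column names are invented for illustration, not taken from the thread):

CREATE TABLE observation (
    case_id   integer NOT NULL,   -- the row of the would-be wide table
    attribute text    NOT NULL,   -- the would-be column name
    value     text,               -- the cell value
    PRIMARY KEY (case_id, attribute)
);

-- Adding a new attribute is just another row, not another column,
-- so the 1600-column limit is never approached:
INSERT INTO observation VALUES (1, 'beak-colour', 'red');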
Re: Beyond the 1600 columns limit on windows
From
"Evandro's mailing lists (Please, don't send personal messages to this address)"
Date:
I'm doing a PhD in data mining and I need more than 1600 columns. I got an error message saying that I cannot use more than 1600 columns.
It is happening because I have to change categorical values to binary, creating new columns. Do you know if Oracle can handle it?
--
Evandro M Leite Jr.
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Mobile 079 068 70740 Office 023 8055 3644 Home 023 8055 9160
On 11/8/05, Tino Wildenhain <tino@wildenhain.de> wrote:
Evandro's mailing lists (Please, don't send personal messages to this
address) wrote:
> Hi guys,
>
> I would like to know if it is possible to have more than 1600 columns on
> windows without recompiling postgres.
>
I would like to know who on earth needs 1600 columns and even beyond?
Hint: you can have practically unlimited rows in your n:m table :-)
--
Evandro M Leite Jr
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages
Evandro's mailing lists (Please, don't send personal messages to this address) wrote:
> I'm doing a PhD in data mining and I need more than 1600 columns. I got an
> error message saying that I can not use more than 1600 columns.
> It is happening because I have to change categorical values to binary
> creating new columns.

Perhaps you don't want a relational database at all if you are stretching it
to match your client application in this way.

Do I have it right that you have something like:

Table: bird_sighting_facts (bird, category, value)
 1 | wingspan     | 120mm
 2 | beak-colour  | red
 3 | chest-colour | blue
 ...

And are converting it into:

expanded_bird_facts (bird, cat_wingspan, cat_beak_colour, cat_chest_colour, ...)

In which case, since you'll almost certainly be throwing away any relational
integrity you had in the first case, I'd just throw a lightweight wrapper
around some dbfile files or similar.

--
Richard Huxton
Archonet Ltd
"Evandro's mailing lists (Please, don't send personal messages to this address)" <listasjr@gmail.com> writes: > I'm doing a PhD in data mining and I need more than 1600 columns. I don't think so --- consider redesigning your data model instead. For instance, maybe you could combine similar columns into an array. Or split the table into an m:n linking structure. Even coming close to that implementation limit suggests bad SQL design; if we thought it was a realistic problem we would have increased it long ago... regards, tom lane
Evandro's mailing lists (Please, don't send personal messages to this address) wrote:
> I'm doing a PhD in data mining and I need more than 1600 columns. I
> got an error message saying that I can not use more than 1600 columns.
>
> It is happening because I have to change categorical values to
> binary creating new columns. Do you know if oracle can handle it?

Pardon, but as a PhD student you should be able to do sensible database
design. Even if you could have more than 1600 columns, you could not expect
very good performance with it (on nearly any database). I'd strongly
recommend replanning your table layout. You can get help here if you provide
more information on your plans.
Alex Stapleton wrote:
>
> On 8 Nov 2005, at 12:50, Tino Wildenhain wrote:
>
>> Evandro's mailing lists (Please, don't send personal messages to this
>> address) wrote:
>>
>>> Hi guys,
>>> I would like to know if it is possible to have more than 1600
>>> columns on windows without recompiling postgres.
>>
>> I would like to know who on earth needs 1600 columns and even beyond?
>> Hint: you can have practically unlimited rows in your n:m table :-)
>
> Well this screams random arbitrary limit to me. Why does this limit
> exist? What ever happened to the holy 0,1,infinity triumvirate?

I guess it eases implementation, and there is no reason to go so high on
columns either. The limit could even be lower without it hurting, but 1600
seems skyrocket high enough (read: unlimited :-)
On Tue, 2005-11-08 at 09:45, Tino Wildenhain wrote:
> Alex Stapleton wrote:
> >
> > On 8 Nov 2005, at 12:50, Tino Wildenhain wrote:
> >
> >> Evandro's mailing lists (Please, don't send personal messages to this
> >> address) wrote:
> >>
> >>> Hi guys,
> >>> I would like to know if it is possible to have more than 1600
> >>> columns on windows without recompiling postgres.
> >>
> >> I would like to know who on earth needs 1600 columns and even beyond?
> >> Hint: you can have practically unlimited rows in your n:m table :-)
> >
> > Well this screams random arbitrary limit to me. Why does this limit
> > exist? What ever happened to the holy 0,1,infinity triumvirate?
>
> I guess it eases implementation, and there is no reason to go so high
> on columns either. The limit could even be lower without it hurting,
> but 1600 seems skyrocket high enough (read: unlimited :-)

I'd have to vote with Tino here. Why worry about an arbitrary limit you
should never really be approaching anyway? If a table has more than several
dozen columns, you've likely missed some important step of normalization.
Once you near 100 columns, something is usually horribly wrong. I cannot
imagine having a table that actually needed 1600 or more columns.

And, Evandro, nothing is free. If someone went to the trouble of removing the
limit of 1600, we'd probably pay in some other way, most likely with poor
performance. There are other, far more important features to work on, I'd
think.
Tino Wildenhain <tino@wildenhain.de> writes:
> Alex Stapleton wrote:
>> Well this screams random arbitrary limit to me. Why does this limit
>> exist? What ever happened to the holy 0,1,infinity triumvirate?

> I guess it eases implementation, and there is no reason to go so high
> on columns either. The limit could even be lower without it hurting,
> but 1600 seems skyrocket high enough (read: unlimited :-)

The rationale is laid out in excruciating detail in src/include/access/htup.h:

/*
 * MaxTupleAttributeNumber limits the number of (user) columns in a tuple.
 * The key limit on this value is that the size of the fixed overhead for
 * a tuple, plus the size of the null-values bitmap (at 1 bit per column),
 * plus MAXALIGN alignment, must fit into t_hoff which is uint8.  On most
 * machines the upper limit without making t_hoff wider would be a little
 * over 1700.  We use round numbers here and for MaxHeapAttributeNumber
 * so that alterations in HeapTupleHeaderData layout won't change the
 * supported max number of columns.
 */
#define MaxTupleAttributeNumber 1664    /* 8 * 208 */

/*----------
 * MaxHeapAttributeNumber limits the number of (user) columns in a table.
 * This should be somewhat less than MaxTupleAttributeNumber.  It must be
 * at least one less, else we will fail to do UPDATEs on a maximal-width
 * table (because UPDATE has to form working tuples that include CTID).
 * In practice we want some additional daylight so that we can gracefully
 * support operations that add hidden "resjunk" columns, for example
 * SELECT * FROM wide_table ORDER BY foo, bar, baz.
 * In any case, depending on column data types you will likely be running
 * into the disk-block-based limit on overall tuple size if you have more
 * than a thousand or so columns.  TOAST won't help.
 *----------
 */
#define MaxHeapAttributeNumber  1600    /* 8 * 200 */

regards, tom lane
>>>>> "Evandro's" == Evandro's mailing lists (Please, don't send personal messages to this address) <listasjr@gmail.com>writes: [I would have replied to your personal address, but I'm not about to copy it from a footer.] Evandro's> I'm doing a PhD in data mining and I need more than 1600 columns. I got an Evandro's> error message saying that I can not use more than 1600 columns. Evandro's> It is happening because I have to change categorical values to binary Evandro's> creating new columns. Do you know if oracle can handle it? /me boggles You are doing a PhD in data mining, and you have a table that needs more than 1600 columns? /me gasps What are they *teaching* these days? If you have a design that has more than 20 or so columns, you're probably already not normalizing properly. There just aren't *that* many attributes of a object before you should start factoring parts of it out, even if it means creating some 1-1 tables. In programming, if I ever see someone name a sequence of variables, like "thing1" and "thing2", I know there's going to be trouble ahead, because that should have been a different data structure. Similarly, I bet some of your columns are "foo1" and "foo2". Signs of brokenness in the design. Or do you really have 1600 *different* attributes, none of which have a number in their name? That requires a serious amount of creativity. :) -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
On 8 Nov 2005, at 12:50, Tino Wildenhain wrote:

> Evandro's mailing lists (Please, don't send personal messages to
> this address) wrote:
>
>> Hi guys,
>> I would like to know if it is possible to have more than 1600
>> columns on windows without recompiling postgres.
>
> I would like to know who on earth needs 1600 columns and even beyond?
> Hint: you can have practically unlimited rows in your n:m table :-)

Well this screams random arbitrary limit to me. Why does this limit
exist? What ever happened to the holy 0,1,infinity triumvirate?
On 8 Nov 2005, at 16:06, Scott Marlowe wrote:

> On Tue, 2005-11-08 at 09:45, Tino Wildenhain wrote:
>
>> Alex Stapleton wrote:
>>
>>> On 8 Nov 2005, at 12:50, Tino Wildenhain wrote:
>>>
>>>> Evandro's mailing lists (Please, don't send personal messages
>>>> to this address) wrote:
>>>>
>>>>> Hi guys,
>>>>> I would like to know if it is possible to have more than 1600
>>>>> columns on windows without recompiling postgres.
>>>>
>>>> I would like to know who on earth needs 1600 columns and even
>>>> beyond?
>>>> Hint: you can have practically unlimited rows in your n:m table :-)
>>>
>>> Well this screams random arbitrary limit to me. Why does this limit
>>> exist? What ever happened to the holy 0,1,infinity triumvirate?
>>
>> I guess it eases implementation, and there is no reason to go so high
>> on columns either. The limit could even be lower without it hurting,
>> but 1600 seems skyrocket high enough (read: unlimited :-)
>
> I'd have to vote with Tino here. Why worry about an arbitrary limit
> you should never really be approaching anyway? If a table has more
> than several dozen columns, you've likely missed some important step
> of normalization. Once you near 100 columns, something is usually
> horribly wrong. I cannot imagine having a table that actually needed
> 1600 or more columns.
>
> And, Evandro, nothing is free. If someone went to the trouble of
> removing the limit of 1600, we'd probably pay in some other way, most
> likely with poor performance. There are other, far more important
> features to work on, I'd think.

Oh wait, PG is written in C isn't it. I guess fixed size things are a
bit easier to deal with. Pardon me then :)
Re: Beyond the 1600 columns limit on windows
From
"Evandro's mailing lists (Please, don't send personal messages to this address)"
Date:
Sorry,
It has nothing to do with normalisation. It is a program for scientific applications.
Data values are broken into columns to allow multiple linear regression and multivariate regression tree computations.
Even SPSS, the most well-known statistics software, uses the same approach and data structure that my software uses.
Probably I should use another data structure, but it would not be as efficient and practical as the one I use now.
Many thanks
-Evandro
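If the binary (dummy-variable) columns exist only to feed the regression
code, one hedged alternative is to keep a long-format table and build the
expansion at query time instead of storing 1600+ physical columns. A sketch,
with invented names and only two categories shown:

CREATE TABLE case_category (
    case_id  integer NOT NULL,
    category text    NOT NULL     -- e.g. 'red', 'blue'
);

-- Dummy variables materialised in the query result, not in the table layout:
SELECT case_id,
       max(CASE WHEN category = 'red'  THEN 1 ELSE 0 END) AS is_red,
       max(CASE WHEN category = 'blue' THEN 1 ELSE 0 END) AS is_blue
FROM case_category
GROUP BY case_id;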
On 08 Nov 2005 05:30:07 -0800, Randal L. Schwartz <merlyn@stonehenge.com> wrote:
>>>>> "Evandro's" == Evandro's mailing lists (Please, don't send personal messages to this address) < listasjr@gmail.com> writes:
[I would have replied to your personal address, but I'm not about
to copy it from a footer.]
Evandro's> I'm doing a PhD in data mining and I need more than 1600 columns. I got an
Evandro's> error message saying that I can not use more than 1600 columns.
Evandro's> It is happening because I have to change categorical values to binary
Evandro's> creating new columns. Do you know if oracle can handle it?
/me boggles
You are doing a PhD in data mining, and you have a table that needs
more than 1600 columns?
/me gasps
What are they *teaching* these days?
If you have a design that has more than 20 or so columns, you're
probably already not normalizing properly. There just aren't *that*
many attributes of an object before you should start factoring parts of
it out, even if it means creating some 1-1 tables.
In programming, if I ever see someone name a sequence of variables,
like "thing1" and "thing2", I know there's going to be trouble ahead,
because that should have been a different data structure. Similarly,
I bet some of your columns are "foo1" and "foo2". Signs of brokenness
in the design.
Or do you really have 1600 *different* attributes, none of which have
a number in their name? That requires a serious amount of
creativity. :)
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
--
Evandro M Leite Jr
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages
> >>> Well this screams random arbitrary limit to me. Why does this limit
> >>> exist? What ever happened to the holy 0,1,infinity triumvirate?
> >>
> >> I guess it eases implementation, and there is no reason to go so high
> >> on columns either. The limit could even be lower without it hurting,
> >> but 1600 seems skyrocket high enough (read: unlimited :-)

It is probably what fits in a single block.
Evandro's mailing lists (Please, don't send personal messages to this address) wrote:
> Sorry,
> It has nothing to do with normalisation. It is a program for scientific
> applications.

It has everything to do with normalisation. You appear to be pushing
application presentation issues into the structure of your database. If SQL
allowed you, this would break 1NF.

> Data values are broken into columns to allow multiple linear regression
> and multivariate regression tree computations.

Sounds like you want an array then (or perhaps several arrays).

> Even SPSS, the most well-known statistics software, uses the same approach
> and data structure that my software uses.

Ah - and they've made a good choice?

> Probably I should use another data structure, but it would not be as
> efficient and practical as the one I use now.

The structure you use inside your application and the data definition used by
the database are two separate things. You presumably are doing some
transformation of data on fetching it anyway - I'd switch rows-columns over
then.

--
Richard Huxton
Archonet Ltd
Evandro's mailing lists (Please, don't send personal messages to this address) wrote:
> It has nothing to do with normalisation. It is a program for
> scientific applications.
> Data values are broken into columns to allow multiple linear regression
> and multivariate regression tree computations.

Having done similar things in the past, I wonder if your current DB
design includes a column for every feature-value combination:

instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72
---------------------------------------------------------------------------
        42       True       False         False
        43      False        True         False
        44      False       False          True
...

This is likely to be extremely sparse, and you might use a sparse
representation accordingly. As several folks have suggested, the
representation in the database needn't be the same as in your code.

> Even SPSS, the most well-known statistics software, uses the same approach
> and data structure that my software uses.
> Probably I should use another data structure, but it would not be as
> efficient and practical as the one I use now.

The point is that, if you want to use Postgres, this is not in fact
efficient and practical. In fact, it might be the case that mapping
from a sparse DB representation to your internal data structures is
=more= efficient than naively using the same representation in both
places.

- John D. Burger
  MITRE
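A hedged sketch of the sparse representation described here: store only the
cells that are true, one row each (table and column names are invented):

CREATE TABLE instance_feature (
    instance_id integer NOT NULL,
    feature     text    NOT NULL,   -- e.g. 'color=red', 'height=71'
    PRIMARY KEY (instance_id, feature)
);

-- Anything not stored is implicitly False, so sparsity costs nothing.
-- "Is instance 42 red?" becomes an existence test:
SELECT EXISTS (SELECT 1 FROM instance_feature
               WHERE instance_id = 42 AND feature = 'color=red');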
On Tue, Nov 08, 2005 at 02:14:58PM -0500, John D. Burger wrote:
> Evandro's mailing lists (Please, don't send personal messages to this
> address) wrote:
>
> > It has nothing to do with normalisation. It is a program for
> > scientific applications.
> > Data values are broken into columns to allow multiple linear regression
> > and multivariate regression tree computations.
>
> Having done similar things in the past, I wonder if your current DB
> design includes a column for every feature-value combination:
>
> instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72
> ---------------------------------------------------------------------------
>         42       True       False         False
>         43      False        True         False
>         44      False       False          True
> ...
>
> This is likely to be extremely sparse, and you might use a sparse
> representation accordingly. As several folks have suggested, the
> representation in the database needn't be the same as in your code.
>
> > Even SPSS, the most well-known statistics software, uses the same
> > approach and data structure that my software uses.
> > Probably I should use another data structure, but it would not be as
> > efficient and practical as the one I use now.
>
> The point is that, if you want to use Postgres, this is not in fact
> efficient and practical. In fact, it might be the case that mapping
> from a sparse DB representation to your internal data structures is
> =more= efficient than naively using the same representation in both
> places.

s/Postgres/just about any database/

BTW, even if you're doing logic in the database, that doesn't mean you have
to stick with the way you're representing things. There's ways to get the
same info via conventional SQL that doesn't involve building a huge crosstab.

Something interesting is that the data structure presented here looks a hell
of a lot like a bitmap index, something new in 8.1 (well, at least bitmap
index scans).

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
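As one illustration of the "conventional SQL, no crosstab" point (reusing the
hypothetical instance_feature table sketched above), a typical question can
be answered with a plain join and never requires pivoting out to a wide
table:

-- How often does each feature co-occur with 'color=red'?
SELECT f2.feature, count(*) AS n
FROM instance_feature f1
JOIN instance_feature f2 USING (instance_id)
WHERE f1.feature = 'color=red'
  AND f2.feature <> f1.feature
GROUP BY f2.feature
ORDER BY n DESC;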
Re: Beyond the 1600 columns limit on windows
From
"Evandro's mailing lists (Please, don't send personal messages to this address)"
Date:
Yes, it is exactly that. I will follow your advice and create an abstraction layer for the data access that will return the sparse dataset using the standard dataset as input.
There is just one thing I disagree with: you said the performance is not good, right? However, it is practical! Nothing is easier and more practical than keeping the sparse representation inside of the database for my application.
On 11/8/05, John D. Burger <john@mitre.org> wrote:
Evandro's mailing lists (Please, don't send personal messages to this
address) wrote:
> It has nothing to do with normalisation. It is a program for
> scientific applications.
> Data values are broken into columns to allow multiple linear regression
> and multivariate regression tree computations.
Having done similar things in the past, I wonder if your current DB
design includes a column for every feature-value combination:
instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72
---------------------------------------------------------------------------
        42       True       False         False
        43      False        True         False
        44      False       False          True
...
This is likely to be extremely sparse, and you might use a sparse
representation accordingly. As several folks have suggested, the
representation in the database needn't be the same as in your code.
> Even SPSS, the most well-known statistics software, uses the same approach
> and data structure that my software uses.
> Probably I should use another data structure, but it would not be as
> efficient and practical as the one I use now.
The point is that, if you want to use Postgres, this is not in fact
efficient and practical. In fact, it might be the case that mapping
from a sparse DB representation to your internal data structures is
=more= efficient than naively using the same representation in both
places.
- John D. Burger
MITRE
--
Evandro M Leite Jr
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages
Evandro's mailing lists (Please, don't send personal messages to this address) wrote:
> Yes, it is exactly that. I will follow your advice and create an
> abstraction layer for the data access that will return the sparse
> dataset using the standard dataset as input.
>
> There is just one thing I disagree with: you said the performance is
> not good, right? However, it is practical! Nothing is easier and more
> practical than keeping the sparse representation inside of the database
> for my application.

I bet even your application would profit from not handling sparse data. You
could just "not" insert them into your tree instead of having to jump over
empty elements. And there is always a way to lazily abstract the data to
some frontend. (While I doubt anybody can actually scroll wide enough on a
screen to see all 1600 columns ;)
Evandro's mailing lists (Please, don't send personal messages to this address) wrote:
> Yes, it is exactly that. I will follow your advice and create an
> abstraction layer for the data access that will return the sparse
> dataset using the standard dataset as input.
>
> There is just one thing I disagree with: you said the performance is
> not good, right? However, it is practical! Nothing is easier and more
> practical than keeping the sparse representation inside of the
> database for my application.

I think that you misunderstand the problems that come from doing it that way.

The basic issue is this: the point of creating a system using an RDBMS is to
separate your data from your application. This allows for various methods of
data mining later. If you are presenting information in your DB this way, you
are breaking that assumption and so you get very little (if any) benefit from
using PostgreSQL instead of something like BDB.

Database design is usually about concise and unambiguous representation of
data, and normalization is a part of this. This allows various applications
or ad hoc queries to be able to draw against the data in meaningful ways.

The traditional way of representing sparse data is to use a join:

CREATE TABLE bird (
    id SERIAL PRIMARY KEY,
    description TEXT,
    bname TEXT
);

CREATE TABLE bird_color (
    bird_id INT REFERENCES bird (id),
    color TEXT,
    is_color BOOL,
    UNIQUE (bird_id, color)
);

Now, you can:

SELECT bname FROM bird
WHERE (SELECT is_color FROM bird_color
       WHERE color = 'red' AND bird_id = bird.id);

The point is not that this makes it easier or harder from the standpoint of
your application, but that it makes it easier to build secondary apps against
the same data set, and that it avoids various ambiguities that could result
from secondary data entry/analysis apps.

People have suggested using an array, and that would be possible as well
(undecided whether this breaks first normal form), but this will certainly
cause more headache with secondary apps.

Please understand that PostgreSQL is designed around a world where these
secondary apps inevitably get built, while a lot of commercial, off-the-shelf
software assumes that they won't be. This is why many of us question the DB
design of these apps.

I hope that this helps explain some of the whys of this thread.

Best Wishes,
Chris Travers
Metatron Technology Consulting
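Purely as a usage sketch of the bird/bird_color layout shown above (the data
values here are invented), populating and querying the normalised form stays
simple:

INSERT INTO bird (description, bname) VALUES ('small garden bird', 'robin');

INSERT INTO bird_color (bird_id, color, is_color)
SELECT id, 'red', true FROM bird WHERE bname = 'robin';

-- All red birds, via a join rather than a 1600-column crosstab:
SELECT b.bname
FROM bird b
JOIN bird_color c ON c.bird_id = b.id
WHERE c.color = 'red' AND c.is_color;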