Thread: Beyond the 1600 columns limit on windows

Beyond the 1600 columns limit on windows

From
"Evandro's mailing lists (Please, don't send personal messages to this address)"
Date:
Hi guys,
 
I would like to know if it is possible to have more than 1600 columns on windows without recompiling postgres.
 
Regards
-Evandro

--
Evandro M Leite Jr
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages

Re: Beyond the 1600 columns limit on windows

From
Richard Huxton
Date:
Evandro's mailing lists (Please, don't send personal messages to this
address) wrote:
> Hi guys,
>  I would like to know if it is possible to have more than 1600 columns on
> windows without recompiling postgres.

I don't think so. Are you sure you need more than 1600 columns? That's
many more than I've ever wanted or needed.

If you can share some details of the problem you are trying to solve,
perhaps someone can see a different solution for you.
--
   Richard Huxton
   Archonet Ltd

Re: Beyond the 1600 columns limit on windows

From
Tino Wildenhain
Date:
Evandro's mailing lists (Please, don't send personal messages to this
address) schrieb:
> Hi guys,
>
> I would like to know if it is possible to have more than 1600 columns on
> windows without recompiling postgres.
>
I would like to know who on earth needs 1600 columns and even beyond?
Hint: you can have practically unlimited rows in your n:m table :-)
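Tino's n:m hint amounts to storing one row per (instance, attribute) pair instead of one column per attribute. A minimal sketch of that long-format layout, using Python's sqlite3 purely as a self-contained stand-in (table and column names are hypothetical; the DDL carries over to PostgreSQL):

```python
import sqlite3

# Long-format ("n:m") layout: one row per (instance, attribute) pair
# instead of one column per attribute.  Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instance_attr (
        instance_id INTEGER,
        attr_name   TEXT,
        attr_value  TEXT,
        UNIQUE (instance_id, attr_name)
    )
""")
conn.executemany("INSERT INTO instance_attr VALUES (?, ?, ?)", [
    (42, "beak_colour", "red"),
    (42, "wingspan", "120mm"),
    (43, "beak_colour", "blue"),
])

# Any number of attributes fits: the limit moves from columns to rows.
n = conn.execute(
    "SELECT COUNT(*) FROM instance_attr WHERE instance_id = 42"
).fetchone()[0]
print(n)  # 2 attributes stored for instance 42
```

Adding a new attribute is then an INSERT, not an ALTER TABLE, so the 1600-column limit never comes into play.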

Re: Beyond the 1600 columns limit on windows

From
"Evandro's mailing lists (Please, don't send personal messages to this address)"
Date:
I'm doing a PhD in data mining and I need more than 1600 columns. I got an error message saying that I cannot use more than 1600 columns.
 
It is happening because I have to change categorical values to binary, creating new columns. Do you know if Oracle can handle it?
 
 
 --
Evandro M Leite Jr.
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Mobile 079 068 70740 Office 023 8055 3644 Home 023 8055 9160

 
On 11/8/05, Tino Wildenhain <tino@wildenhain.de> wrote:
Evandro's mailing lists (Please, don't send personal messages to this
address) schrieb:
> Hi guys,
>
> I would like to know if it is possible to have more than 1600 columns on
> windows without recompiling postgres.
>
I would like to know who on earth needs 1600 columns and even beyond?
Hint: you can have practically unlimited rows in your n:m table :-)


Re: Beyond the 1600 columns limit on windows

From
Richard Huxton
Date:
Evandro's mailing lists (Please, don't send personal messages to this
address) wrote:
> I'm doing a PhD in data mining and I need more than 1600 columns. I got an
> error message saying that I can not use more than 1600 columns.
>  It is happening because I have to change categorical values to binary
> creating new columns.

Perhaps you don't want a relational database at all if you are
stretching it to match your client application in this way. Do I have it
right that you have something like

Table: bird_sighting_facts (bird, category, value)
   1 | wingspan     | 120mm
   2 | beak-colour  | red
   3 | chest-colour | blue
...

And are converting it into:
   expanded_bird_facts (bird, cat_wingspan, cat_beak_colour,
                        cat_chest_colour, ...)

In which case, since you'll almost certainly be throwing away any
relational integrity you had in the first case, I'd just throw a
lightweight wrapper around some dbfile files or similar.

--
   Richard Huxton
   Archonet Ltd

Re: Beyond the 1600 columns limit on windows

From
Tom Lane
Date:
"Evandro's mailing lists (Please, don't send personal messages to this address)" <listasjr@gmail.com> writes:
> I'm doing a PhD in data mining and I need more than 1600 columns.

I don't think so --- consider redesigning your data model instead.
For instance, maybe you could combine similar columns into an array.
Or split the table into an m:n linking structure.  Even coming close
to that implementation limit suggests bad SQL design; if we thought
it was a realistic problem we would have increased it long ago...

            regards, tom lane

Re: Beyond the 1600 columns limit on windows

From
Tino Wildenhain
Date:
Evandro's mailing lists (Please, don't send personal messages to this
address) schrieb:
> I'm doing a PhD in data mining and I need more than 1600 columns. I
> got an error message saying that I can not use more than 1600 columns.
>
> It is happening because I have to change categorical values to
> binary creating new columns. Do you know if oracle can handle it?

Pardon, but as a PhD student you should be able to do sensible database
design. Even if you could have more than 1600 columns, you could not
expect very good performance with them (on nearly any database).

I'd strongly recommend replanning your table layout. You can get help
here if you provide more information on your plans.


Re: Beyond the 1600 columns limit on windows

From
Tino Wildenhain
Date:
Alex Stapleton schrieb:
>
> On 8 Nov 2005, at 12:50, Tino Wildenhain wrote:
>
>> Evandro's mailing lists (Please, don't send personal messages to  this
>> address) schrieb:
>>
>>> Hi guys,
>>>  I would like to know if it is possible to have more than 1600
>>> columns on windows without recompiling postgres.
>>>
>>>
>> I would like to know who on earth needs 1600 columns and even beyond?
>> Hint: you can have practically unlimited rows in your n:m table :-)
>>
>
> Well this screams random arbitrary limit to me. Why does this limit
> exist? What ever happened to the holy 0,1,infinity triumvirate?

I guess it eases implementation, and there is no reason to go so high
on columns either. The limit could even be lower without hurting
anyone, but 1600 seems sky-high enough (read: unlimited :-)


Re: Beyond the 1600 columns limit on windows

From
Scott Marlowe
Date:
On Tue, 2005-11-08 at 09:45, Tino Wildenhain wrote:
> Alex Stapleton schrieb:
> > Well this screams random arbitrary limit to me. Why does this limit
> > exist? What ever happened to the holy 0,1,infinity triumvirate?
>
> I guess it eases implementation and there is no reason to go so high
> on columns either. The limit could even be lower without hurting
> anyone, but 1600 seems sky-high enough (read: unlimited :-)

I'd have to vote with Tino here.  Why worry about an arbitrary limit you
should never really be approaching anyway?  If a table has more than
several dozen columns, you've likely missed some important step of
normalization.  Once you near 100 columns, something is usually horribly
wrong.  I cannot imagine having a table that actually needed 1600 or
more columns.

And, Evandro, nothing is free.  If someone went to the trouble of
removing the limit of 1600, we'd probably pay in some other way, most
likely with poor performance.  There are other, far more important
features to work on, I'd think.

Re: Beyond the 1600 columns limit on windows

From
Tom Lane
Date:
Tino Wildenhain <tino@wildenhain.de> writes:
> Alex Stapleton schrieb:
>> Well this screams random arbitrary limit to me. Why does this limit
>> exist? What ever happened to the holy 0,1,infinity triumvirate?

> I guess it eases implementation and there is no reason to go so high
> on columns either. The limit could even be lower without hurting
> anyone, but 1600 seems sky-high enough (read: unlimited :-)

The rationale is laid out in excruciating detail in src/include/access/htup.h:

/*
 * MaxTupleAttributeNumber limits the number of (user) columns in a tuple.
 * The key limit on this value is that the size of the fixed overhead for
 * a tuple, plus the size of the null-values bitmap (at 1 bit per column),
 * plus MAXALIGN alignment, must fit into t_hoff which is uint8.  On most
 * machines the upper limit without making t_hoff wider would be a little
 * over 1700.  We use round numbers here and for MaxHeapAttributeNumber
 * so that alterations in HeapTupleHeaderData layout won't change the
 * supported max number of columns.
 */
#define MaxTupleAttributeNumber 1664    /* 8 * 208 */

/*----------
 * MaxHeapAttributeNumber limits the number of (user) columns in a table.
 * This should be somewhat less than MaxTupleAttributeNumber.  It must be
 * at least one less, else we will fail to do UPDATEs on a maximal-width
 * table (because UPDATE has to form working tuples that include CTID).
 * In practice we want some additional daylight so that we can gracefully
 * support operations that add hidden "resjunk" columns, for example
 * SELECT * FROM wide_table ORDER BY foo, bar, baz.
 * In any case, depending on column data types you will likely be running
 * into the disk-block-based limit on overall tuple size if you have more
 * than a thousand or so columns.  TOAST won't help.
 *----------
 */
#define MaxHeapAttributeNumber    1600    /* 8 * 200 */
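The "a little over 1700" figure can be sanity-checked with some back-of-the-envelope arithmetic. This is a hedged sketch: 8-byte MAXALIGN and a fixed tuple overhead of 31 bytes are assumptions here, and the exact overhead varies by platform and PostgreSQL version.

```python
# Hedged sanity check of the "~1700" ceiling described in htup.h.
# t_hoff is a uint8, so the data offset it stores cannot exceed 255,
# and it must itself be MAXALIGN'd (8-byte alignment assumed).
MAXALIGN = 8
max_t_hoff = (255 // MAXALIGN) * MAXALIGN   # 248: largest aligned offset
fixed_overhead = 31                         # assumed header size; varies by version
bitmap_bytes = max_t_hoff - fixed_overhead  # space left for the null bitmap
max_columns = bitmap_bytes * 8              # one null-bitmap bit per column
print(max_columns)  # 1736 -- "a little over 1700"
```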

            regards, tom lane

Re: Beyond the 1600 columns limit on windows

From
merlyn@stonehenge.com (Randal L. Schwartz)
Date:
>>>>> "Evandro's" == Evandro's mailing lists (Please, don't send personal messages to this address)
<listasjr@gmail.com> writes:

[I would have replied to your personal address, but I'm not about
to copy it from a footer.]

Evandro's> I'm doing a PhD in data mining and I need more than 1600 columns. I got an
Evandro's> error message saying that I can not use more than 1600 columns.
Evandro's>  It is happening because I have to change categorical values to binary
Evandro's> creating new columns. Do you know if oracle can handle it?

/me boggles

You are doing a PhD in data mining, and you have a table that needs
more than 1600 columns?

/me gasps

What are they *teaching* these days?

If you have a design that has more than 20 or so columns, you're
probably already not normalizing properly.  There just aren't *that*
many attributes of an object before you should start factoring parts of
it out, even if it means creating some 1-1 tables.

In programming, if I ever see someone name a sequence of variables,
like "thing1" and "thing2", I know there's going to be trouble ahead,
because that should have been a different data structure.  Similarly,
I bet some of your columns are "foo1" and "foo2".  Signs of brokenness
in the design.

Or do you really have 1600 *different* attributes, none of which have
a number in their name?  That requires a serious amount of
creativity. :)

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Re: Beyond the 1600 columns limit on windows

From
Alex Stapleton
Date:
On 8 Nov 2005, at 12:50, Tino Wildenhain wrote:

> Evandro's mailing lists (Please, don't send personal messages to
> this address) schrieb:
>
>> Hi guys,
>>  I would like to know if it is possible to have more than 1600
>> columns on windows without recompiling postgres.
>>
>>
> I would like to know who on earth needs 1600 columns and even beyond?
> Hint: you can have practically unlimited rows in your n:m table :-)
>

Well this screams random arbitrary limit to me. Why does this limit
exist? What ever happened to the holy 0,1,infinity triumvirate?

Re: Beyond the 1600 columns limit on windows

From
Alex Stapleton
Date:
On 8 Nov 2005, at 16:06, Scott Marlowe wrote:

> On Tue, 2005-11-08 at 09:45, Tino Wildenhain wrote:
> > I guess it eases implementation and there is no reason to go so high
> > on columns either. The limit could even be lower without hurting
> > anyone, but 1600 seems sky-high enough (read: unlimited :-)
>
> I'd have to vote with Tino here.  Why worry about an arbitrary limit you
> should never really be approaching anyway?  If a table has more than
> several dozen columns, you've likely missed some important step of
> normalization.  Once you near 100 columns, something is usually horribly
> wrong.  I cannot imagine having a table that actually needed 1600 or
> more columns.
>
> And, Evandro, nothing is free.  If someone went to the trouble of
> removing the limit of 1600, we'd probably pay in some other way, most
> likely with poor performance.  There are other, far more important
> features to work on, I'd think.

Oh wait, PG is written in C, isn't it? I guess fixed-size things are a
bit easier to deal with. Pardon me then :)

Re: Beyond the 1600 columns limit on windows

From
"Evandro's mailing lists (Please, don't send personal messages to this address)"
Date:
Sorry,
 
It has nothing to do with normalisation.  It is a program for scientific applications.
Data values are broken into columns to allow multiple linear regression and multivariate regression tree computations.
 
Even SPSS, the most well-known statistics software, uses the same approach and data structure that my software uses.
Probably I should use another data structure, but it would not be as efficient and practical as the one I use now.
 
Many thanks
-Evandro

 
On 08 Nov 2005 05:30:07 -0800, Randal L. Schwartz <merlyn@stonehenge.com> wrote:
> You are doing a PhD in data mining, and you have a table that needs
> more than 1600 columns?
>
> If you have a design that has more than 20 or so columns, you're
> probably already not normalizing properly.



--
Evandro M Leite Jr
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages

Re: Beyond the 1600 columns limit on windows

From
Bruno Wolff III
Date:
> >>>
> >>>Well this screams random arbitrary limit to me. Why does this limit
> >>>exist? What ever happened to the holy 0,1,infinity triumvirate?
> >>>
> >>
> >>I guess it eases implementation and there is no reason to go so high
> >>on columns either. The limit could even be lower without hurting
> >>anyone, but 1600 seems sky-high enough (read: unlimited :-)

It is probably what fits in a single block.

Re: Beyond the 1600 columns limit on windows

From
Richard Huxton
Date:
Evandro's mailing lists (Please, don't send personal messages to this
address) wrote:
> Sorry,
>  It has nothing to do with normalisation. It is a program for scientific
> applications.

It has everything to do with normalisation. You appear to be pushing
application presentation issues into the structure of your database. If
SQL allowed you, this would break 1NF.

> Data values are broken into column to allow multiple linear regression and
> multivariate regression trees computations.

Sounds like you want an array then (or perhaps several arrays).

>  Even SPSS the most well-known statistic sw uses the same approach and data
> structure that my software uses.

Ah - and they've made a good choice?

> Probably I should use another data structure but would not be as eficient
> and practical as the one I use now.

The structure you use inside your application and the data definition
used by the database are two separate things. You presumably are doing
some transformation of the data on fetching it anyway - I'd swap rows
and columns over at that point.

--
   Richard Huxton
   Archonet Ltd

Re: Beyond the 1600 columns limit on windows

From
"John D. Burger"
Date:
Evandro's mailing lists (Please, don't send personal messages to this
address) wrote:

> It has nothing to do with normalisation.  It is a program for
> scientific applications.
> Data values are broken into column to allow multiple linear regression
> and multivariate regression trees computations.

Having done similar things in the past, I wonder if your current DB
design includes a column for every feature-value combination:

instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72
--------------------------------------------------------------------------
42          True       False      False
43          False      True       False
44          False      False      True
...

This is likely to be extremely sparse, and you might use a sparse
representation accordingly.  As several folks have suggested, the
representation in the database needn't be the same as in your code.

> Even SPSS the most well-known statistic sw uses the same approach and
> data structure that my software uses.
> Probably I should use another data structure but would not be as
> eficient and practical as the one I use now.

The point is that, if you want to use Postgres, this is not in fact
efficient and practical.  In fact, it might be the case that mapping
from a sparse DB representation to your internal data structures is
=more= efficient than naively using the same representation in both
places.
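As a sketch of the mapping John describes (all names and the feature encoding here are hypothetical): the dense 0/1 vectors a regression routine wants can be built cheaply in application memory from sparse long-format rows, with the column space derived from the data actually present.

```python
# Sketch: expand sparse (instance, feature, value) rows into dense 0/1
# vectors in application memory, not in the table layout.
rows = [
    (42, "colour", "red"),
    (43, "colour", "blue"),
    (44, "colour", "yellow"),
]

# Build the column space once, from the feature-value pairs that occur.
features = sorted({f"{feat}={val}" for _, feat, val in rows})

def densify(instance_id):
    # One 0/1 entry per known feature-value combination.
    present = {f"{feat}={val}" for iid, feat, val in rows if iid == instance_id}
    return [1 if f in present else 0 for f in features]

print(features)     # ['colour=blue', 'colour=red', 'colour=yellow']
print(densify(42))  # [0, 1, 0]
```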

- John D. Burger
   MITRE

Re: Beyond the 1600 columns limit on windows

From
"Jim C. Nasby"
Date:
On Tue, Nov 08, 2005 at 02:14:58PM -0500, John D. Burger wrote:
> Evandro's mailing lists (Please, don't send personal messages to this
> address) wrote:
>
> >It has nothing to do with normalisation.  It is a program for
> >scientific applications.
> >Data values are broken into column to allow multiple linear regression
> >and multivariate regression trees computations.
>
> Having done similar things in the past, I wonder if your current DB
> design includes a column for every feature-value combination:
>
> instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72
> --------------------------------------------------------------------------
> 42          True       False      False
> 43          False      True       False
> 44          False      False      True
> ...
>
> This is likely to be extremely sparse, and you might use a sparse
> representation accordingly.  As several folks have suggested, the
> representation in the database needn't be the same as in your code.
>
> >Even SPSS the most well-known statistic sw uses the same approach and
> >data structure that my software uses.
> >Probably I should use another data structure but would not be as
> >eficient and practical as the one I use now.
>
> The point is that, if you want to use Postgres, this is not in fact
> efficient and practical.  In fact, it might be the case that mapping
> from a sparse DB representation to your internal data structures is
> =more= efficient than naively using the same representation in both
> places.

s/Postgres/just about any database/

BTW, even if you're doing logic in the database, that doesn't mean you
have to stick with the way you're representing things. There are ways to
get the same info via conventional SQL that don't involve building a
huge crosstab.
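To sketch Jim's point with a self-contained example (sqlite3 stands in for PostgreSQL here; schema and names are hypothetical): a question like "which instances are red?" needs only a WHERE clause on the long-format table, not a 1600-column crosstab.

```python
import sqlite3

# Conventional SQL over the long-format table instead of a huge crosstab.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instance_attr (instance_id INTEGER, attr TEXT, value TEXT)"
)
conn.executemany("INSERT INTO instance_attr VALUES (?, ?, ?)", [
    (42, "colour", "red"),
    (43, "colour", "blue"),
    (44, "colour", "red"),
])

# "Which instances are red?" -- a filter, not a column per value.
red = [r[0] for r in conn.execute(
    "SELECT instance_id FROM instance_attr"
    " WHERE attr = 'colour' AND value = 'red' ORDER BY instance_id"
)]
print(red)  # [42, 44]
```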

Something interesting is that the data structure presented here looks a
hell of a lot like a bitmap index, something new in 8.1 (well, at least
bitmap index scans).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Beyond the 1600 columns limit on windows

From
"Evandro's mailing lists (Please, don't send personal messages to this address)"
Date:
Yes, it is exactly that. I will follow your advice and create an abstraction layer for the data access that will return the sparse dataset using the standard dataset as input.
 
There is just one thing I disagree with: you said that the performance is not good, right? However, it is practical! Nothing is easier and more practical than keeping the sparse representation inside the database for my application.

 
On 11/8/05, John D. Burger <john@mitre.org> wrote:
> Having done similar things in the past, I wonder if your current DB
> design includes a column for every feature-value combination.  This is
> likely to be extremely sparse, and you might use a sparse
> representation accordingly.  As several folks have suggested, the
> representation in the database needn't be the same as in your code.


--
Evandro M Leite Jr
PhD Student & Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages

Re: Beyond the 1600 columns limit on windows

From
Tino Wildenhain
Date:
Evandro's mailing lists (Please, don't send personal messages to this
address) schrieb:
> Yes, it is exactly that. I will follow your advice and create an
> abstraction layer for the data access that will return the sparse
> dataset using the standard dataset as input.
>
> There is just one thing I disagree with: you said that the performance
> is not good, right? However, it is practical! Nothing is easier and
> more practical than keeping the sparse representation inside the
> database for my application.

I bet even your application would profit from not handling sparse data.
You could simply "not" insert the empty values into your tree instead
of having to jump over empty elements.

And there is always a way to lazily abstract the data for some
frontend (while I doubt anybody can actually scroll wide enough on
a screen to see all 1600 columns ;)


Re: Beyond the 1600 columns limit on windows

From
Chris Travers
Date:
Evandro's mailing lists (Please, don't send personal messages to this
address) wrote:

> Yes, it is exactly that. I will follow your advice and create an
> abstraction layer for the data access that will return the sparse
> dataset using the standard dataset as input.
>
> There is just one thing I disagree with: you said that the performance
> is not good, right? However, it is practical! Nothing is easier and
> more practical than keeping the sparse representation inside the
> database for my application.

I think that you misunderstand the problems that come from doing it that
way.

The basic issue is this:  the point of creating a system using an RDBMS
is to separate your data from your application.  This allows for various
methods of data mining later.  If you are presenting information in your
DB this way, you are breaking that assumption and so you get very little
(if any) benefit from using PostgreSQL instead of something like BDB.
Database design is usually about concise and unambiguous representation
of data and normalization is a part of this.  This allows various
applications or ad hoc queries to be able to draw against the data in
meaningful ways.

The traditional way of representing sparse data is to use a join.

CREATE TABLE bird (
    id SERIAL PRIMARY KEY,
    description TEXT,
    bname TEXT
);

CREATE TABLE bird_color (
    bird_id INT REFERENCES bird (id),
    color TEXT,
    is_color BOOL,
    UNIQUE (bird_id, color)
);

Now, you can:

SELECT bname FROM bird
 WHERE (SELECT is_color FROM bird_color
         WHERE color = 'red' AND bird_id = bird.id);

The point is not that this makes it easier or harder from the
standpoint of your application, but that it makes it easier to build
secondary apps against the same data set, and that it avoids various
ambiguities that could result from secondary data entry/analysis apps.

People have suggested using an array, and that would be possible as well
(I'm undecided whether this breaks first normal form), but it will
certainly cause more headaches with secondary apps.  Please understand
that PostgreSQL is designed around a world where these secondary apps
inevitably get built, while a lot of commercial, off-the-shelf software
assumes that they won't be.  This is why many of us question the DB
design of these apps.

I hope that this helps explain some of the why's of this thread.

Best Wishes,
Chris Travers
Metatron Technology Consulting