Thread: More than 1600 columns?

More than 1600 columns?

From
"Mark Mitchell"
Date:

Greetings, PostgreSQL gurus…

 

From my googling I know this has been discussed before, and from what I have read the consensus seems to be that you can recompile Postgres and increase the block size from 8k to 16k or 32k to double or quadruple this limit.

I recompiled the latest 9.0.1 source RPM on CentOS 5 x64, setting --with-blocksize=32, but when I tested it I still got the error "ERROR: tables can have at most 1600 columns".

 

pg_controldata verifies the block size is

 

"Database block size:                         32768"

 

I also tried increasing the WAL block size to see if that had any effect, but it does not. So is what I have read wrong? Is there a hard limit of 1600 that you cannot get around?

 

- Mark

Re: More than 1600 columns?

From
Tom Lane
Date:
"Mark Mitchell" <mmitchell@riccagroup.com> writes:
> Is there a hard limit of 1600 that you cannot get around?

Yes.

Generally, wanting more than a few dozen columns is a good sign that you
need to rethink your schema design.  What are you trying to accomplish
exactly?

            regards, tom lane

Re: More than 1600 columns?

From
John R Pierce
Date:
On 11/11/10 9:24 PM, Tom Lane wrote:
> "Mark Mitchell"<mmitchell@riccagroup.com>  writes:
>> Is there a hard limit of 1600 that you cannot get around?
> Yes.
>
> Generally, wanting more than a few dozen columns is a good sign that you
> need to rethink your schema design.  What are you trying to accomplish
> exactly?
>

Indeed.  I'd say a good read on 'data normalization' and the Third
Normal Form would be in order.

Relational databases are *not* spreadsheets (and, for that matter,
spreadsheets make lousy relational databases).

If these 1600+ elements come from an ORM, you probably need to rethink
your object model, as no sane object class should have that many members.



Re: More than 1600 columns?

From
Dmitriy Igrishin
Date:
Hey Mark,

Yeah, I can't imagine an entity in a real project with more than 100
columns. It's a rare case. But if your entities (rows/tuples) of some
class (table) can contain a variable set of columns (properties), you can
look at the hstore contrib module.
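
For example, a minimal sketch of that approach (assuming contrib/hstore
is installed; the table and key names are only invented for illustration):

CREATE TABLE survey_response (
    id      serial PRIMARY KEY,
    answers hstore
);

INSERT INTO survey_response (answers)
VALUES ('q1 => yes, q2 => 42');

-- look up a single property by key
SELECT answers -> 'q2' FROM survey_response;

-- GiST index for efficient containment queries
CREATE INDEX survey_response_answers_idx
    ON survey_response USING gist (answers);

SELECT id FROM survey_response WHERE answers @> 'q1 => yes';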




--
// Dmitriy.


Re: More than 1600 columns?

From
Peter Bex
Date:
On Fri, Nov 12, 2010 at 10:17:50AM +0300, Dmitriy Igrishin wrote:
> Hey Mark,
>
> Yeah, I can't imagine an entity in a real project with more than 100
> columns. It's a rare case. But if your entities (rows/tuples) of some
> class (table) can contain a variable set of columns (properties), you can
> look at the hstore contrib module.

What can also work extremely well is storing the data in an array.
If you need to access the array based on more meaningful keys you could
store key/index pairs in another table.

This approach only works well if you have multiple arrays with the same
layout.  You probably also need to build up your query dynamically if
you need to access variable numbers of datapoints.
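
A rough sketch of that layout (all names invented for illustration):

CREATE TABLE answer_key (
    question text PRIMARY KEY,
    position int NOT NULL      -- index into the answers array
);

CREATE TABLE survey_response (
    id      serial PRIMARY KEY,
    answers text[]
);

-- fetch one datapoint by its meaningful key
SELECT r.answers[k.position]
FROM survey_response r, answer_key k
WHERE k.question = 'age';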

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                            -- Donald Knuth

Re: More than 1600 columns?

From
Dmitriy Igrishin
Date:
Hey Peter,

Unfortunately, there are no indexes on arrays (only on expressions).
With hstore we can easily create a GiST index for efficient access.




--
// Dmitriy.


Re: More than 1600 columns?

From
Peter Bex
Date:
On Fri, Nov 12, 2010 at 10:43:14AM +0300, Dmitriy Igrishin wrote:
> Hey Peter,
>
> Unfortunately, there are no indexes on arrays (only on expressions).
> With hstore we can easily create a GiST index for efficient access.

True.  In my project I only ever needed to search on a particular key,
and I made sure that that key always had a fixed position in the array.
You can then create an index on the expression that extracts that
element from the array.
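
Roughly like this (a sketch assuming a table survey_response with a
text[] column answers, and that the key of interest is always at
position 1; names are invented):

CREATE INDEX survey_response_key_idx
    ON survey_response ((answers[1]));

-- a query like this can then use the index
SELECT id FROM survey_response WHERE answers[1] = '42';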

Cheers,
Peter
--
http://sjamaan.ath.cx

Re: More than 1600 columns?

From
Pavel Stehule
Date:
2010/11/12 Dmitriy Igrishin <dmitigr@gmail.com>:
> Hey Peter,
>
> Unfortunately, there are no indexes on arrays (only on expressions).

There are indexes on arrays - you can use GiST and GIN indexes.
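
For example (a quick sketch; GIN has default operator classes for the
built-in array types, and the names here are invented):

CREATE TABLE t (id serial PRIMARY KEY, tags int[]);

CREATE INDEX t_tags_gin ON t USING gin (tags);

-- supports the array operators @>, <@ and &&, e.g.:
SELECT id FROM t WHERE tags @> ARRAY[42];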

regards

Pavel



Re: More than 1600 columns?

From
Dmitriy Igrishin
Date:
Hey Pavel,

Really? I am sorry, I'll try!




--
// Dmitriy.


Re: More than 1600 columns?

From
Dmitriy Igrishin
Date:
Well, it's possible to create GIN indexes on arrays of built-in data
types without extra effort, because the GIN access method has default
operator classes for them. But that's not true for GiST:

SELECT am.amname AS index_method, opc.opcname AS opclass_name
FROM pg_am am, pg_opclass opc
WHERE opc.opcmethod = am.oid AND am.amname = 'gist'
ORDER BY index_method, opclass_name;

 index_method | opclass_name
--------------+--------------
 gist         | box_ops
 gist         | circle_ops
 gist         | point_ops
 gist         | poly_ops
 gist         | tsquery_ops
 gist         | tsvector_ops

With hstore I don't need to care about it at all.

But thank you very much for pointing me to indexes on arrays!




--
// Dmitriy.


Re: More than 1600 columns?

From
Pavel Stehule
Date:
2010/11/12 Dmitriy Igrishin <dmitigr@gmail.com>:
> Well, it's possible to create GIN indexes on arrays of built-in data
> types without extra effort, because the GIN access method has default
> operator classes for them. But that's not true for GiST:

Now that's true :)

You need to install the intarray contrib module for GiST.
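
Something like this (an untested sketch with invented names; intarray
covers integer arrays only):

CREATE TABLE t (id serial PRIMARY KEY, ids int[]);

-- gist__int_ops comes from contrib/intarray
CREATE INDEX t_ids_gist ON t USING gist (ids gist__int_ops);

SELECT id FROM t WHERE ids @> ARRAY[1, 2];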

Pavel


Re: More than 1600 columns?

From
John R Pierce
Date:
I looked up the OP's domain.  They develop medical research data
analysis software.


That sort of software and the kinds of data analysis they do torture
SQL databases.  These 1600+ element rows are likely very sparse.



Re: More than 1600 columns?

From
Thomas Kellerer
Date:
Peter Bex, 12.11.2010 08:36:
> What can also work extremely well is storing the data in an array.
> If you need to access the array based on more meaningful keys you could
> store key/index pairs in another table.
>

The hstore module would also be a viable alternative - and it's indexable as well.

Thomas

Re: More than 1600 columns?

From
Peter Bex
Date:
On Fri, Nov 12, 2010 at 10:18:31AM +0100, Thomas Kellerer wrote:
> The hstore module would also be a viable alternative - and it's indexable
> as well.

That's what the post I replied to mentioned :)

Also, when I looked at hstore, it had a limitation on its length which
made it a little dangerous to use for me:
"In the current implementation, neither the key nor the value string
 can exceed 65535 bytes in length"
[http://www.postgresql.org/docs/8.4/interactive/hstore.html]

It looks like this limitation has been lifted in 9.0.  Good news!

Cheers,
Peter
--
http://sjamaan.ath.cx

Re: More than 1600 columns?

From
Dmitriy Igrishin
Date:
Sounds like semi-structured data handling. For tasks like this, hstore would be ideal, IMO.




--
// Dmitriy.


Re: More than 1600 columns?

From
"Mark Mitchell"
Date:
Yes, I understand that this is "bad design," but what we are doing is
storing each form field in a survey in its own column. For very long
surveys we end up with thousands of elements.

I know storing in an array is possible, but it makes it so much easier to
query the data set when each element is in its own field. I had lots of
comments on why I should not do this and on the possible alternatives, and
I thank everyone for their input, but no one answered the question about
compiling with a higher block size to get more columns. Can anyone answer
that?

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, November 12, 2010 12:24 AM
To: Mark Mitchell
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] More than 1600 columns?

"Mark Mitchell" <mmitchell@riccagroup.com> writes:
> Is there a hard limit of 1600 that you cannot get around?

Yes.

Generally, wanting more than a few dozen columns is a good sign that you
need to rethink your schema design.  What are you trying to accomplish
exactly?

            regards, tom lane


Re: More than 1600 columns?

From
"Mark Mitchell"
Date:

hstore does look like it could work out better than the mother lode of columns we have now, so long as the index support is good and there are no limits on value length.

It seems that the limit was lifted in 9.0? Is that true?

I was not aware of hstore when we started using Postgres. Thanks for the info!

 

And yes, we do data analysis that tortures SQL, but SQL allows us to do so many things quickly and less painfully. Better to torture the machines than to torture ourselves….

 

- Mark

 

 


Re: More than 1600 columns?

From
Rob Sargent
Date:

On 11/12/2010 08:38 AM, Mark Mitchell wrote:
> Yes, I understand that this is "bad design," but what we are doing is
> storing each form field in a survey in its own column. For very long
> surveys we end up with thousands of elements.
>
> I know storing in an array is possible, but it makes it so much easier to
> query the data set when each element is in its own field. [...] No one
> answered the question about compiling with a higher block size to get
> more columns. Can anyone answer that?
>
You can answer this yourself.  Save chunks of the survey each in their
own table, all keyed with a single id.  I'm betting you don't write all
1600 fields at once (or you're willing to seriously piss off the data
entry staff when stuff happens while trying to save the last "page"):

select * from table1, table2, ... where table1.id = table2.id and
table2.id = table3.id ...

Then you don't have to ensure that your custom postgres is "everywhere
you want to be".

Re: More than 1600 columns?

From
Tom Lane
Date:
"Mark Mitchell" <mmitchell@riccagroup.com> writes:
> I know storing in an array is possible, but it makes it so much easier to
> query the data set when each element is in its own field. I had lots of
> comments on why I should not do this and on the possible alternatives, and
> I thank everyone for their input, but no one answered the question about
> compiling with a higher block size to get more columns. Can anyone answer
> that?

Yes, I did answer it: there is no such compilation option.

If you were willing to run a very nonstandard version of Postgres, you
could try widening t_hoff (see src/include/access/htup.h) but there is
nobody who can tell you what the fallout from that might be.  One big
concern that I would have is the likelihood of O(N^2) behavior on very
long query targetlists.

On the whole I think you'd be a lot better off looking into hstore,
especially the improved 9.0 version.

            regards, tom lane

Re: More than 1600 columns?

From
"Mark Mitchell"
Date:
Apologies, Tom; I did not see that you had answered "yes" to my question about the hard limit.
You have all been very helpful, I will give up on the 1600+ columns and look into using hstore.

Cheers

- Mark



Re: More than 1600 columns?

From
"Clark C. Evans"
Date:
On Fri, 12 Nov 2010, Tom Lane wrote:
> Generally, wanting more than a few dozen columns is a
> good sign that you need to rethink your schema design.
> What are you trying to accomplish exactly?

Generally speaking, yes.  However, survey instruments are a
very legitimate case where this is not only correct but
quite desirable.  Sure -- in an ideal world you might be
able to turn your instrument into a highly normalized
structure.  However, it's simply not practical.

PostgreSQL absolutely should look into supporting this.  While
the storage may not be ideal, it is an increasingly common case.
Perhaps the storage engine could permit a set of columns to be
tagged for ``sparse column storage``?

On Fri, 12 Nov 2010, Mark Michell wrote:
> Yes, I understand that this is "bad design," but what we are doing is
> storing each form field in a survey in its own column. For very long
> surveys we end up with thousands of elements.

It's *not* necessarily bad design.  If you've got a survey
instrument that has 1000 unique data points, it is a reasonable
thing to do.  We (Prometheus Research) have these sorts of
things all the time.

#1  You want to store and query them based on specific fields.

#2  You want to enforce the data integrity of each column
    by using types, enums and other techniques.

#3  You want to integrate this structure with the other
    aspects of your database.

> I know storing in an array is possible but it makes it
> so much easier to query the data set when each element
> is in its own field.  comments on why I should not do this

Using HSTORE or using an EAV table completely undermines
the ability to ensure column-level typing, constraints,
and... use SQL.  With arrays / EAV table structure, you
end up having to re-invent your query language. It is a
non-solution.

> the possible alternatives

The work-around we use is to break our instrument into
one table per ``section``.  We also save each section as the
user completes that part of the instrument.  This has the
advantage of allowing you to model repeating sections.  We
typically enforce column uniqueness across all sections.
This requires your application and query tool to be a bit
smart about making a virtual table with all 1000+ columns.
Kinda sucks, but it works.
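
A minimal sketch of that work-around (names invented; the UNIQUE
constraint on the FK column is what gives the 1-to-0/1 relationship):

CREATE TABLE instrument (
    id serial PRIMARY KEY
);

CREATE TABLE section_demographics (
    instrument_id int NOT NULL UNIQUE REFERENCES instrument (id),
    age int,
    sex char(1)
);

CREATE TABLE section_history (
    instrument_id int NOT NULL UNIQUE REFERENCES instrument (id),
    smoker boolean
);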

...

I guess the other option is to look at a RDF data store where
you use SPARQL to query the data.  OpenJena SDB is implemented
as a 3-table storage in PostgreSQL if you wish to keep the same
database for both your assessment storage and other tables. I'm
not lovin' this option, but it's on the table for our company.

Instead, I'd absolutely love to see PostgreSQL add a variant
scalar type (for when the type of a column changes over the
lifetime of the instrument) and a column storage for Nth+
and additional columns.  For now, the work-around above sort
of works for us.

Best,

Clark

Re: More than 1600 columns?

From
Dann Corbit
Date:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Clark C. Evans
> Sent: Friday, November 12, 2010 12:25 PM
> To: Mark Mitchell; 'Tom Lane'
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] More than 1600 columns?
>
> On Fri, 12 Nov 2010, Tom Lane wrote:
> > Generally, wanting more than a few dozen columns is a
> > good sign that you need to rethink your schema design.
> > What are you trying to accomplish exactly?
>
> Generally speaking, yes.  However, survey instruments are a
> very legitimate case where this is not only correct but
> quite desirable.  Sure -- in an ideal world you might be
> able to turn your instrument into a highly normalized
> structure.  However, it's simply not practical.

Two tables with a foreign key is not practical?

> PostgreSQL absolutely should look into supporting this.  While
> the storage may not be ideal, it is an increasingly common case.
> Perhaps the storage engine could permit a set of columns to be
> tagged for ``sparse column storage``?

Reminds me of a quote from the movie Pinocchio:
"Give a boy enough rope and he'll hang himself."
Is a fundamental design change for the purpose of allowing massive denormalization really a good idea?

> On Fri, 12 Nov 2010, Mark Michell wrote:
> > Yes, I understand that this is "bad design," but what we are doing is
> > storing each form field in a survey in its own column. For very long
> > surveys we end up with thousands of elements.
>
> It's *not* necessarily bad design.  If you've got a survey
> instrument that has 1000 unique data points, it is a reasonable
> thing to do.  We (Prometheus Research) have these sorts of
> things all the time.
>
> #1  You want to store and query them based on specific fields.

Use a join

> #2  You want to enforce the data integrity of each column
>     by using types, enums and other techniques.

Non sequitur.

> #3  You want to integrate this structure with the other
>     aspects of your database.

Foreign keys

> > I know storing in an array is possible but it makes it
> > so much easier to query the data set when each element
> > is in its own field.  comments on why I should not do this
>
> Using HSTORE or using an EAV table completely undermines
> the ability to ensure column-level typing, constraints,
> and... use SQL.  With arrays / EAV table structure, you
> end up having to re-invent your query language. It is a
> non-solution.
>
> > the possible alternatives
>
> The work-around we use is to break our instrument into
> one table per ``section``.  We also save each section as the
> user completes that part of the instrument.  This has the
> advantage of allowing you to model repeating sections.  We
> typically enforce column uniqueness across all sections.
> This requires your application and query tool to be a bit
> smart about making a virtual table with all 1000+ columns.
> Kinda sucks, but it works.
>
> ...
>
> I guess the other option is to look at a RDF data store where
> you use SPARQL to query the data.  OpenJena SDB is implemented
> as a 3-table storage in PostgreSQL if you wish to keep the same
> database for both your assessment storage and other tables. I'm
> not lovin' this option, but it's on the table for our company.
>
> Instead, I'd absolutely love to see PostgreSQL add a variant
> scalar type (for when the type of a column changes over the
> lifetime of the instrument) and a column storage for Nth+
> and additional columns.  For now, the work-around above sort
> of works for us.

Sometimes it is a good idea to denormalize.  For instance, performance might improve if you put bill_to_address and
ship_to_address in the customer record instead of joining against an address table.  But if you have thousands of
different child types, it is a good idea to ask yourself about the value of denormalization versus the cost.  There is
a reason that BCNF was invented in 1974.

Normally, I am all for the idea of making a database more capable.  But if that design change encourages terrible
practice, I can think of better ways to expend programmer time and effort (namely on changes that encourage good
practice and increased productivity).

I would be very curious to hear of a real case where there is an advantage to having many thousands of columns in a
table versus using a child table.  I have never seen such a thing in real life.  I am not opposed to the idea that such
a thing is possible.  It is just that I have always found the child table is simply better.  If (for access) the single
table seems simpler, then a view can be used.
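
For instance (an untested sketch with invented names):

CREATE TABLE survey (
    id      serial PRIMARY KEY,
    started timestamptz
);

CREATE TABLE survey_detail (
    survey_id int NOT NULL UNIQUE REFERENCES survey (id),
    q1 text,
    q2 text
);

-- the view makes the pair look like one wide table
CREATE VIEW survey_flat AS
SELECT s.id, s.started, d.q1, d.q2
FROM survey s
LEFT JOIN survey_detail d ON d.survey_id = s.id;

SELECT q1 FROM survey_flat WHERE id = 1;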

IMO-YMMV



Re: More than 1600 columns?

From
"Clark C. Evans"
Date:
On Fri, 12 Nov 2010 21:10 +0000, "Dann Corbit" wrote:
> If (for access) the single table seems simpler, then
> a view can be used.

Even if you "partition" the columns in the instrument
over N tables, you still can't query it in a single
result set.   The limit is quite deep in PostgreSQL
and extends to tuples, including views and in-memory
query results.

I find that partitioning does work, but it requires extra
care on the part of the application developer that really
shouldn't be necessary.

Best,

Clark

Re: More than 1600 columns?

From
Dmitriy Igrishin
Date:
I can't imagine how to maintain a database with tables with
1600 columns... I can't imagine how to simply work with this
garbage of data via SQL...




--
// Dmitriy.


Re: More than 1600 columns?

From
Rob Sargent
Date:

On 11/12/2010 02:25 PM, Clark C. Evans wrote:
> On Fri, 12 Nov 2010 21:10 +0000, "Dann Corbit" wrote:
>> If (for access) the single table seems simpler, then
>> a view can be used.
>
> Even if you "partition" the columns in the instrument
> over N tables, you still can't query it in a single
> result set.   The limit is quite deep in PostgreSQL
> and extends to tuples, including views and in-memory
> query results.
>
> I find that partitioning does work, but it requires extra
> care on the part of the application developer that really
> shouldn't be necessary.
>
> Best,
>
> Clark
>
A single questionnaire has 1600 questions.  Fine.  That all of those
questions/answers are definitive to a single analyzable entity boggles
the mind.  There are no sections like "if previous answer was no, skip
the next section" or "choose any of the following values which apply"?
No demographics block for example?  I see the point of "getting it all
back simply" but there's also the cost of always having all the other
stuff when I'm only interested in (hypothetical) sections 1, 5, and 7.

Re: More than 1600 columns?

From
"Mark Mitchell"
Date:

These tables are created and maintained automatically by software.  A table may be created with some initial columns, and as respondents take a survey and we collect data, each data point is stored in a column; these columns are created on the fly by the application.

There is no need to "maintain" it, as the software does that for us. While it may seem like bad practice, it has actually worked out very well for us for years. When we reach the column limit we create sub-tables, but when reporting on this data it becomes an issue because you have to know which data point is in which sub-table; that is what I am trying to get around.

I think it's very obvious that the Postgres developers have no interest in going over 1600 columns in the foreseeable future, which forces us to find creative ways around it, but I just don't see why it has to be this way. Even views, which are not even stored entities, have this limit. I'm sure there are good reasons in terms of how the Postgres code works, which is way over my head, but when you look at the comparison of database limitations ( http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems#Limits ) it seems that in every area except column name size and number of columns, Postgres has huge advantages over other systems.

 


Re: More than 1600 columns?

From
Dmitriy Igrishin
Date:
Mark,

For these purposes (dynamic properties) we have hstore in PostgreSQL,
which works very well. Any table in PostgreSQL is a static composite type
as well. I don't clearly understand why you need thousands of NULLable
columns in huge tables (which also makes NOT NULL constraints useless)
rather than a well-defined, compact table with just one (statically
allocated) column of type hstore. Again, I think hstore is an ideal
solution here.
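
A small sketch of what I mean (illustrative names only):

CREATE TABLE response (
    id    serial PRIMARY KEY,
    props hstore NOT NULL DEFAULT ''::hstore
);

INSERT INTO response (props) VALUES ('q1 => yes');

-- a brand-new "column" appears with no DDL at all
UPDATE response SET props = props || 'q2 => 42'::hstore WHERE id = 1;

SELECT props -> 'q2' FROM response WHERE id = 1;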


--
// Dmitriy.


Re: More than 1600 columns?

From
Tom Lane
Date:
"Mark Mitchell" <mmitchell@riccagroup.com> writes:
> I think it's very obvious that the Postgres developers have no interest
> in going over 1600 columns in the foreseeable future, which forces us to
> find creative ways around it, but I just don't see why it has to be this
> way.

Well, it's a tradeoff.  Supporting > 1600 columns would require widening
t_hoff, which means another byte occupied by row headers, which is a
data structure that we have sweated blood to minimize and aren't eager
to bloat just to support what seems extremely dubious database design
practice.  The other possible inefficiencies are minor by comparison
to that objection: larger row headers are a cost that will be paid by
*every* user of Postgres.

            regards, tom lane

Re: More than 1600 columns?

From
"Clark C. Evans"
Date:
Tom,

Thank you for the helpful response.

It is very reasonable for the PostgreSQL developers to
decide this isn't a common enough problem to justify the
effort to change and/or the runtime cost.  For example,
I'd rather advocate for other features myself (such as CUBE).

The solution "in the field" is to shard the columns into
sets (I call them facets).  Your instrument table then
has N subordinate tables with a 1 to 0/1 relationship
implemented by placing a UNIQUE key on the FK columns.
The consequence is that application software has to
manage the column-based partitioning.

The suggestion to not use a relational model (HSTORE,
XML, or EAV table) is also valid for some use cases.
However, it often replaces one problem with another:
you now need to write your own query language.  IMHO,
if you go this far, you should switch to RDF+SPARQL.

What would be most helpful though is if the answer to
this question stop being an attack on the business
requirement analysis, database design skills, and/or
sanity of the requester.  It's a limitation of
PostgreSQL's implementation; a deliberate performance
trade-off that is infeasible to change.  That's fine.
PostgreSQL is a fantastic database -- it's welcome to
have a few limitations.

Best,

Clark


Re: More than 1600 columns?

From
Tom Lane
Date:
"Clark C. Evans" <cce@clarkevans.com> writes:
> What would be most helpful though is if the answer to
> this question stop being an attack on the business
> requirement analysis, database design skills, and/or
> sanity of the requester.  It's a limitation of
> PostgreSQL's implementation; a deliberate performance
> trade-off that is infeasible to change.

Just for the record: I don't think it's *infeasible* to change it.
What I'm saying is that it would be a bad tradeoff for the vast
majority of users.

I could imagine accepting a patch that provides a compile-time option
to change the limit.  The core of it would be something like

+    #ifdef SUPPORT_RIDICULOUSLY_MANY_COLUMNS
+        uint16    t_hoff;        /* sizeof header incl. bitmap, padding */
+    #else
        uint8    t_hoff;        /* sizeof header incl. bitmap, padding */
+    #endif

plus whatever other fallout ensues elsewhere.  But somebody would have
to step up to develop and test such a patch, and keep on testing it to
ensure no bit-rot sets in, because it seems very unlikely that any
mainstream distributions would ever choose to enable the option.
I don't think any of the core developers have any interest in hacking
on this; we have bigger fish to fry.  So it'd be a matter of someone
scratching their own itch.

            regards, tom lane