Thread: maximum "target list" (maximum columns)

maximum "target list" (maximum columns)

From: Shaun Cutts
The opinion of database developers is that using more than 1664 columns is bad design, so the current maximum number
of columns is not onerous.

When I simply built applications on top of databases, I was of the same opinion.

However, my job now is to perform some sort of analysis on data from elsewhere. Over the course of a couple of weeks I
need to do something with data that is often structured into tables of many thousands of columns. Figuring out the
appropriate form is part of manipulating the data, something for which I’d like to use postgres. My applications, such
as they are, handle metadata, not data.

Operations such as crosstab or json_populate_record can easily create rows with many more than 1664 fields. I also use
data such as census data, with many different survey fields all applying to geographic areas. For a given application
only some are relevant, but ideally I’d create a materialized view for a given project with the relevant data. In the
best case, the “library” version of the dataset would have just a few very wide tables. Storing the data in json is
possible, but it is inconvenient, slower, and often means that type info must be stored separately.

In short, IMHO the database shouldn’t force me to structure my data before I understand it, and being able to query it
is how I come to understand it. At some scale, practicality dictates that one needs to take “special measures” to
handle data that is large in breadth as well as in depth. But that boundary should be on the order of millions of
columns, not thousands.

Is there a reason besides “it’s bad design” to disallow tables with many columns?



Re: maximum "target list" (maximum columns)

From: Tom Lane
Shaun Cutts <shauncutts@factfiber.com> writes:
> Is there a reason besides “it’s bad design” to disallow tables with many
> columns?

There are numerous implementation problems you'd have to overcome.
There are illuminating comments in htup_details.h:

 * MaxTupleAttributeNumber limits the number of (user) columns in a tuple.
 * The key limit on this value is that the size of the fixed overhead for
 * a tuple, plus the size of the null-values bitmap (at 1 bit per column),
 * plus MAXALIGN alignment, must fit into t_hoff which is uint8.  On most
 * machines the upper limit without making t_hoff wider would be a little
 * over 1700.  We use round numbers here and for MaxHeapAttributeNumber
 * so that alterations in HeapTupleHeaderData layout won't change the
 * supported max number of columns.

and

 * In any case, depending on column data types you will likely be running
 * into the disk-block-based limit on overall tuple size if you have more
 * than a thousand or so columns.  TOAST won't help.
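
For what it's worth, both limits are easy to see from psql; a rough sketch (there's nothing special about 2000,
it's just comfortably over both limits):

    -- Exceeding MaxHeapAttributeNumber (1600) at table creation:
    DO $$
    BEGIN
      EXECUTE format('CREATE TABLE too_wide (%s)',
        (SELECT string_agg(format('c%s int', i), ', ')
         FROM generate_series(1, 2000) AS i));
    END $$;
    -- ERROR:  tables can have at most 1600 columns

    -- Exceeding MaxTupleAttributeNumber (1664) in a query's target list:
    DO $$
    BEGIN
      EXECUTE format('SELECT %s',
        (SELECT string_agg(i::text, ', ')
         FROM generate_series(1, 2000) AS i));
    END $$;
    -- ERROR:  target lists can have at most 1664 entries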

Also, in the past we've noted planner and executor behaviors that are
O(N^2) in the number of columns in a query result.  I'm not sure those are
all gone, but if not you'd be in for some pain with very wide queries.

If someone were to throw lots of effort at the problem, and not care
about preserving on-disk database compatibility, no doubt all these
things could be dealt with.  But I don't see it getting to the top of
the community's TODO list.

            regards, tom lane


Re: maximum "target list" (maximum columns)

From: Gavin Flower
On 16/06/16 16:39, Shaun Cutts wrote:
> Is there a reason besides “it’s bad design” to disallow tables with many columns?

I've not had to deal with extremely fat database tables myself.

However, back when I was a COBOL programmer: one system had a very big
record type, more than 10 times bigger than anything else I'd ever dealt
with.  It was an extreme pain to debug problems with such a large record
(try poring over hex dumps on lineflow!), so I expect it would be
similarly painful with a very large database table.

Could you spread the huge number of columns across a number of database
tables with some sort of logical grouping?
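
Something along these lines, say (a rough sketch only, with invented, census-flavoured names):

    -- One narrow "spine" table per geographic area, plus one table per
    -- logical group of survey fields, all keyed on area_id.
    CREATE TABLE area (
        area_id bigint PRIMARY KEY,
        name    text NOT NULL
    );

    CREATE TABLE area_housing (
        area_id      bigint PRIMARY KEY REFERENCES area,
        total_units  integer,
        vacancy_rate numeric
    );

    CREATE TABLE area_income (
        area_id       bigint PRIMARY KEY REFERENCES area,
        median_income numeric,
        gini          numeric
    );

    -- Reassemble a wide row only when a query actually needs one.
    CREATE VIEW area_wide AS
    SELECT a.area_id, a.name,
           h.total_units, h.vacancy_rate,
           i.median_income, i.gini
    FROM area a
    LEFT JOIN area_housing h USING (area_id)
    LEFT JOIN area_income  i USING (area_id);

The reassembled view is still subject to the same column limits, of course, so this only helps if no single
query really needs thousands of columns at once.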


Cheers,
Gavin



Re: maximum "target list" (maximum columns)

From: John R Pierce
On 6/15/2016 9:39 PM, Shaun Cutts wrote:
> Is there a reason besides “it’s bad design” to disallow tables with many columns?

it would require a significantly different internal architecture, one
optimized for said huge rows.



--
john r pierce, recycling bits in santa cruz