Re: More then 1600 columns? - Mailing list pgsql-general

From Dann Corbit
Subject Re: More then 1600 columns?
Date
Msg-id 87F42982BF2B434F831FCEF4C45FC33E4207CA38@EXCHANGE.corporate.connx.com
In response to Re: More then 1600 columns?  ("Clark C. Evans" <cce@clarkevans.com>)
Responses Re: More then 1600 columns?  ("Clark C. Evans" <cce@clarkevans.com>)
List pgsql-general
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Clark C. Evans
> Sent: Friday, November 12, 2010 12:25 PM
> To: Mark Mitchell; 'Tom Lane'
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] More then 1600 columns?
>
> On Fri, 12 Nov 2010, Tom Lane wrote:
> > Generally, wanting more than a few dozen columns is a
> > good sign that you need to rethink your schema design.
> > What are you trying to accomplish exactly?
>
> Generally speaking, yes.  However, survey instruments are a
> very legitimate case where this is not only correct but
> quite desirable.  Sure -- in an ideal world you might be
> able to turn your instrument into a highly normalized
> structure.  However, it's simply not practical.

Two tables with a foreign key is not practical?
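
Something along these lines, say -- a toy sketch, with names I just
made up:

CREATE TABLE survey_response (
    response_id serial PRIMARY KEY,
    respondent  text NOT NULL,
    taken_on    timestamptz NOT NULL DEFAULT now()
);

-- one row per answered question, rather than one column per question
CREATE TABLE survey_answer (
    response_id integer NOT NULL REFERENCES survey_response,
    question    text    NOT NULL,
    value       text,
    PRIMARY KEY (response_id, question)
);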

> PostgreSQL absolutely should look into supporting this.  While
> the storage may not be ideal, it is an increasingly common case.
> Perhaps the storage engine could permit a set of columns to be
> tagged for ``sparse column storage``?

Reminds me of a quote from the movie Pinocchio:
"Give a boy enough rope and he'll hang himself."
Is a fundamental design change for the purpose of allowing massive denormalization really a good idea?

> On Fri, 12 Nov 2010, Mark Mitchell wrote:
> > Yes I understand that this is "bad design" but what we are doing is
> > storing each form field in a survey in its own column. For very long
> > surveys we end up with thousands of elements.
>
> It's *not* necessarily bad design.  If you've got a survey
> instrument that has 1000 unique data points, it is a reasonable
> thing to do.  We (Prometheus Research) have these sorts of
> things all the time.
>
> #1  You want to store and query them based on specific fields.

Use a join (see the sketch after point #3 below)

> #2  You want to enforce the data integrity of each column
>     by using types, enums and other techniques.

Non sequitur.  A child table's columns can be typed and constrained just as well.

> #3  You want to integrate this structure with the other
>     aspects of your database.

Foreign keys
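
Concretely, continuing the toy schema I sketched above (all names
are mine, just an illustration):

-- #1: query on a specific field via a join
SELECT r.respondent
  FROM survey_response r
  JOIN survey_answer   a ON a.response_id = r.response_id
 WHERE a.question = 'smoker'
   AND a.value    = 'yes';

-- #2: per-question integrity can still be declared
ALTER TABLE survey_answer
  ADD CONSTRAINT smoker_is_yes_or_no
  CHECK (question <> 'smoker' OR value IN ('yes', 'no'));

-- #3: the REFERENCES clause above already ties the answers
-- into the rest of the schema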

> > I know storing in an array is possible but it makes it
> > so much easier to query the data set when each element
> > is in its own field.  comments on why I should not do this
>
> Using HSTORE or using an EAV table completely undermines
> the ability to ensure column-level typing, constraints,
> and... use SQL.  With arrays / EAV table structure, you
> end up having to re-invent your query language. It is a
> non-solution.
>
> > the possible alternatives
>
> The work-around we use is to break our instrument into
> one table per ``section``.  We also save each section as the
> user completes that part of the instrument.  This has the
> advantage of allowing you to model repeating sections.  We
> typically enforce column uniqueness across all sections.
> This requires your application and query tool to be a bit
> smart about making a virtual table with all 1000+ columns.
> Kinda sucks, but it works.
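
If I follow, that virtual table is easy enough to spell as a view
over the section tables.  A rough sketch, with invented section and
column names (reusing my hypothetical survey_response table):

CREATE TABLE section_demographics (
    response_id integer PRIMARY KEY REFERENCES survey_response,
    age         integer CHECK (age >= 0),
    gender      text
);

CREATE TABLE section_history (
    response_id integer PRIMARY KEY REFERENCES survey_response,
    smoker      boolean,
    pack_years  numeric
);

-- the "virtual table" presenting all the columns in one place
CREATE VIEW instrument AS
SELECT r.response_id, d.age, d.gender, h.smoker, h.pack_years
  FROM survey_response r
  LEFT JOIN section_demographics d USING (response_id)
  LEFT JOIN section_history      h USING (response_id);
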
>
> ...
>
> I guess the other option is to look at a RDF data store where
> you use SPARQL to query the data.  OpenJena SDB is implemented
> as a 3-table storage in PostgreSQL if you wish to keep the same
> database for both your assessment storage and other tables. I'm
> not lovin' this option, but it's on the table for our company.
>
> Instead, I'd absolutely love to see PostgreSQL add a variant
> scalar type (for when the type of a column changes over the
> lifetime of the instrument) and a column storage for Nth+
> and additional columns.  For now, the work-around above sort
> of works for us.

Sometimes it is a good idea to denormalize.  For instance, performance might improve if you put bill_to_address and
ship_to_address in the customer record instead of joining against an address table.  But if you have thousands of
different child types, it is a good idea to ask yourself about the value of denormalization versus the cost.  There is a
reason that BCNF was invented in 1974.
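
To make the address example concrete (another toy sketch, my names):

-- denormalized: two fixed address roles ride along in the row
CREATE TABLE customer (
    customer_id     serial PRIMARY KEY,
    name            text NOT NULL,
    bill_to_address text,
    ship_to_address text
);

-- the normalized alternative would pay a join on every lookup:
--   customer(customer_id, name,
--            bill_to integer REFERENCES address,
--            ship_to integer REFERENCES address)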

Normally, I am all for the idea of making a database more capable.  But if that design change encourages terrible
practice, I can think of better ways to expend programmer time and effort (namely on changes that encourage good practice
and increased productivity).

I would be very curious to hear of a real case where there is an advantage to having many thousands of columns in a
table versus using a child table.  I have never seen such a thing in real life.  I am not opposed to the idea that such
a thing is possible.  It is just that I have always found the child table to be simply better.  If (for access) the
single table seems simpler, then a view can be used.
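
For instance, a CASE-based pivot over a generic child table gives
back the wide, one-row-per-response shape (question names here are
illustrative):

CREATE VIEW survey_wide AS
SELECT response_id,
       max(CASE WHEN question = 'q1' THEN value END) AS q1,
       max(CASE WHEN question = 'q2' THEN value END) AS q2
  FROM survey_answer
 GROUP BY response_id;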

IMO-YMMV


