Re: Beyond the 1600 columns limit on windows - Mailing list pgsql-general

From John D. Burger
Subject Re: Beyond the 1600 columns limit on windows
Msg-id ef5f06ce42ae2d3ad304504be6d5b7fc@mitre.org
In response to Re: Beyond the 1600 columns limit on windows  ("Evandro's mailing lists (Please, don't send personal messages to this address)" <listasjr@gmail.com>)
Responses Re: Beyond the 1600 columns limit on windows  ("Jim C. Nasby" <jnasby@pervasive.com>)
Re: Beyond the 1600 columns limit on windows  ("Evandro's mailing lists (Please, don't send personal messages to this address)" <listasjr@gmail.com>)
List pgsql-general
Evandro's mailing lists (Please, don't send personal messages to this
address) wrote:

> It has nothing to do with normalisation.  It is a program for
> scientific applications.
> Data values are broken into columns to allow multiple linear regression
> and multivariate regression trees computations.

Having done similar things in the past, I wonder if your current DB
design includes a column for every feature-value combination:

instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72
--------------------------------------------------------------------------
42          True       False       False
43          False      True        False
44          False      False       True
...

This is likely to be extremely sparse, and you might use a sparse
representation accordingly.  As several folks have suggested, the
representation in the database needn't be the same as in your code.
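
To make the sparse idea concrete, here is a minimal sketch in Python. It assumes the wide layout from the table above and keeps only the cells that are True, as (instanceID, feature, value) triples. The names wide_rows and to_sparse are made up for illustration, and the three-column layout is just one possible sparse design, not a claim about your schema.

```python
# Hypothetical wide rows, mirroring the table above:
# one boolean column per feature=value combination.
wide_rows = {
    42: {"color=red": True, "color=blue": False, "color=yellow": False},
    43: {"color=red": False, "color=blue": True, "color=yellow": False},
    44: {"color=red": False, "color=blue": False, "color=yellow": True},
}


def to_sparse(wide):
    """Keep only the True cells, as (instance_id, feature, value) triples."""
    triples = []
    for instance_id, cells in sorted(wide.items()):
        for column, is_set in cells.items():
            if is_set:
                # Split "color=red" into the feature name and its value.
                feature, value = column.split("=", 1)
                triples.append((instance_id, feature, value))
    return triples


sparse_rows = to_sparse(wide_rows)
for row in sparse_rows:
    print(row)
```

Each triple would correspond to one row in a narrow three-column table, so the number of database columns stays fixed no matter how many feature-value combinations exist.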

> Even SPSS the most well-known statistic sw uses the same approach and
> data structure that my software uses.
> Probably I should use another data structure but would not be as
> efficient and practical as the one I use now.

The point is that, if you want to use Postgres, this is not efficient
or practical.  Indeed, mapping from a sparse DB representation to your
internal data structures might well be =more= efficient than naively
using the same representation in both places.
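
As a rough illustration of that mapping, the sketch below expands sparse (instanceID, feature, value) rows, such as a query might return, into the dense 0/1 indicator matrix that regression code typically expects. The function name to_dense and the sample triples are assumptions for the example. No database access is shown, and the column order is simply alphabetical.

```python
# Hypothetical rows as they might come back from a narrow
# (instance_id, feature, value) table.
triples = [
    (42, "color", "red"),
    (43, "color", "blue"),
    (44, "color", "yellow"),
]


def to_dense(rows):
    """Expand sparse triples into (instance_ids, columns, 0/1 matrix)."""
    instance_ids = sorted({iid for iid, _, _ in rows})
    columns = sorted({f"{feature}={value}" for _, feature, value in rows})
    row_of = {iid: r for r, iid in enumerate(instance_ids)}
    col_of = {name: c for c, name in enumerate(columns)}
    # Start from all zeros; only the cells present in the sparse rows are set.
    matrix = [[0] * len(columns) for _ in instance_ids]
    for iid, feature, value in rows:
        matrix[row_of[iid]][col_of[f"{feature}={value}"]] = 1
    return instance_ids, columns, matrix


instance_ids, columns, matrix = to_dense(triples)
print(columns)
for iid, row in zip(instance_ids, matrix):
    print(iid, row)
```

The wide matrix then exists only in memory, where there is no 1600-column limit, while the database holds just the non-empty cells.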

- John D. Burger
   MITRE
