
Re: Beyond the 1600 columns limit on windows - Mailing list pgsql-general

From	John D. Burger
Subject	Re: Beyond the 1600 columns limit on windows
Date	November 8, 2005 15:49:18
Msg-id	ef5f06ce42ae2d3ad304504be6d5b7fc@mitre.org
In response to	Re: Beyond the 1600 columns limit on windows ("Evandro's mailing lists (Please, don't send personal messages to this address)" <listasjr@gmail.com>)
List	pgsql-general


Evandro's mailing lists (Please, don't send personal messages to this
address) wrote:

> It has nothing to do with normalisation.  It is a program for
> scientific applications.
> Data values are broken into column to allow multiple linear regression
> and multivariate regression trees computations.

Having done similar things in the past, I wonder if your current DB
design includes a column for every feature-value combination:

instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72
---------------------------------------------------------------------------
42          True       False       False
43          False      True        False
44          False      False       True
...

This is likely to be extremely sparse, and you might use a sparse
representation accordingly.  As several folks have suggested, the
representation in the database needn't be the same as in your code.
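
To make the sparse idea concrete, here is a minimal Python sketch that keeps
only the True cells of a table like the one above, as (instance, feature,
value) triples.  The sample data and the names wide_rows and to_sparse are
invented for illustration; nothing here reflects your actual schema.

```python
# Hypothetical one-hot rows shaped like the table above:
# instance ID -> {"feature=value" column name: bool}
wide_rows = {
    42: {"color=red": True, "color=blue": False, "color=yellow": False},
    43: {"color=red": False, "color=blue": True, "color=yellow": False},
    44: {"color=red": False, "color=blue": False, "color=yellow": True},
}


def to_sparse(wide):
    """Keep only the True cells as (instance_id, feature, value) triples."""
    triples = []
    for instance_id, columns in sorted(wide.items()):
        for column, is_set in columns.items():
            if is_set:
                # Split "color=red" into the feature name and its value.
                feature, value = column.split("=", 1)
                triples.append((instance_id, feature, value))
    return triples


sparse_rows = to_sparse(wide_rows)
```

Each triple would become one row in a narrow three-column table, so the
number of columns no longer grows with the number of feature-value
combinations.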

> Even SPSS the most well-known statistic sw uses the same approach and
> data structure that my software uses.
> Probably I should use another data structure but would not be as
> efficient and practical as the one I use now.

The point is that, if you want to use Postgres, this design is not in
fact efficient or practical.  It may well be that mapping from a sparse
DB representation to your internal data structures is =more= efficient
than naively using the same representation in both places.
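
As a rough sketch of that mapping, the Python below rebuilds a dense 0/1
indicator matrix in memory from sparse (instance, feature, value) rows, such
as a query against a narrow table might return.  The sample rows and the name
to_dense are assumptions made for illustration, not a description of your
software or of any Postgres API.

```python
# Hypothetical sparse rows, e.g. the result of selecting from a
# three-column (instance_id, feature, value) table.
sparse_rows = [
    (42, "color", "red"),
    (43, "color", "blue"),
    (44, "color", "yellow"),
]


def to_dense(triples):
    """Build a dense 0/1 matrix with one column per feature=value pair."""
    instance_ids = sorted({i for i, _, _ in triples})
    columns = sorted({f"{f}={v}" for _, f, v in triples})
    row_index = {i: k for k, i in enumerate(instance_ids)}
    col_index = {c: j for j, c in enumerate(columns)}
    # Start with all zeros; only the cells named by a triple are set.
    matrix = [[0] * len(columns) for _ in instance_ids]
    for i, f, v in triples:
        matrix[row_index[i]][col_index[f"{f}={v}"]] = 1
    return instance_ids, columns, matrix


instance_ids, columns, matrix = to_dense(sparse_rows)
```

The wide layout then exists only in the program's memory, where the
regression code needs it, and never as 1600+ columns in the database.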

- John D. Burger
   MITRE
