Re: SQL - planet redundant data - Mailing list pgsql-general

From John D. Burger
Subject Re: SQL - planet redundant data
Date
Msg-id 6b3cfce4b5447675879272ef37e59e54@mitre.org
Whole thread Raw
In response to SQL - planet redundant data  (Poul Jensen <flyvholm@gfy.ku.dk>)
List pgsql-general
> To store the detailed records the SQL novice would construct one table
> pr. file and exclude any constant columns since these are redundant
> (given in summary table). These detailed tables would then have
> different column schemas - (mn,tmp,wind) for ~58%, (hr,mn,tmp,wind)
> for ~40%, (d_o_y,hr,mn,tmp,wind) for ~2% and (yr,d_o_y,hr,mn,tmp,wind)
> for ~0.005%.

I don't understand this at all - why would you have one table per file?
  Why not effectively append all of your per-file tables into one huge
table, with an additional column indicating which file the data comes
from?  Then you have only two tables, one with a row for each
observation in your data, one with a row for each file.  Some queries
need to join the two tables, but that's not a big deal.

There may be a lot of redundancy in your data, but that's life.  If
it's not deterministic redundancy (e.g., file ID => station ID), then I
would just live with it.

It also seems to me that you may be tying your schema design too
closely to the current way that the data is represented.  Do you really
need to have the data file figure so prominently in your design?  If
you do need to keep track of which file an observation came from, I
might have one huge table of observations, each with an observation ID,
and have another table that maps filenames to ranges of observation IDs
(filename, startObs, endObs).  A range query could then quickly find
which table is responsible for an observation ID.

Hmm, in fact if the redundant values you're worried about come in long
stretches (e.g., temperature is the same for many observations in a
row), I suppose you could do the same thing - map a constant value to
the range of observation IDs for which it holds.  This gets back to
having many tables, though.

- John Burger
   MITRE



pgsql-general by date:

Previous
From: Richard Huxton
Date:
Subject: Re: constraints on composite types
Next
From: "John Zubac"
Date:
Subject: FW: Configuring Postgres to use unix sockets