Re: pgsql and large tables - Mailing list pgsql-general

From Gurupartap Davis
Subject Re: pgsql and large tables
Date
Msg-id 004201c16e3a$3d915de0$0f00a8c0@marlows
In response to Re: pgsql and large tables  (Francisco Reyes <lists@natserv.com>)
Responses Re: pgsql and large tables  (Francisco Reyes <lists@natserv.com>)
List pgsql-general
Sheesh...I think I see what you're saying about normalization.  I've never
had a formal db class, but it looks like you're talking about removing all
the redundancy from the database...and there's a lot in this one(!)

I could have a forecast table like so:

id           serial primary key,
model_id     integer references model (id),
yearmoda     date,
modelruntime smallint

Then a temperature table:

forecast_id  integer references forecast (id),
stn_id       integer references station (id),
modelhr      smallint,
value        numeric(6,2)

...and repeat for relative humidity, dewpoint, etc.
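
Just to make sure I've got the idea right, here's a rough DDL sketch of
that layout.  The model and station lookup tables (and all the names in
them) are my own guesses, not anything from the existing schema:

CREATE TABLE model (
    id   serial PRIMARY KEY,
    name varchar(32) UNIQUE NOT NULL    -- was forecast.model
);

CREATE TABLE station (
    id  serial PRIMARY KEY,
    stn char(4) UNIQUE NOT NULL         -- was forecast.stn
);

CREATE TABLE forecast (
    id           serial PRIMARY KEY,
    model_id     integer NOT NULL REFERENCES model (id),
    yearmoda     date NOT NULL,
    modelruntime smallint NOT NULL,
    UNIQUE (model_id, yearmoda, modelruntime)
);

-- one narrow table per variable: rows exist only where a value does,
-- so the mostly-NULL columns (dwpt, rh, lpre) stop taking up space
CREATE TABLE temperature (
    forecast_id integer NOT NULL REFERENCES forecast (id),
    stn_id      integer NOT NULL REFERENCES station (id),
    modelhr     smallint NOT NULL,
    value       numeric(6,2),
    UNIQUE (forecast_id, stn_id, modelhr)
);

Between them, the two UNIQUE constraints cover the old
(model, stn, yearmoda, modelruntime, modelhr) key.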

zhr, zday, and modelrundate in the current table are redundant, as they
can be derived from the date and modelhr....

It looks like it would save a hella lot of disk space...probably over 50%,
seeing as how certain variables (dwpt, rh, lpre) only show up in a fraction
of the records (i.e., most records have NULL for those fields).

Dang...I don't know if my employer will let me make a change this radical,
though.  We would have to redo every single query in about 50 scripts.
Hrmmm.  I'm a programmer, not a DB admin...but my employer is too cheap
(errr, sorry..."thrifty") to hire a real DB admin :-/
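
For example, a lookup that's a single-table SELECT today would turn into
a join.  Something like this (the model name and station here are made-up
values, just for illustration):

-- old, single-table version:
SELECT modelhr, temp
FROM forecast
WHERE model = 'avn' AND stn = 'KSLC'
  AND yearmoda = '2001-11-15' AND modelruntime = 0;

-- normalized version:
SELECT t.modelhr, t.value
FROM forecast f, model m, station s, temperature t
WHERE m.name = 'avn' AND s.stn = 'KSLC'
  AND f.model_id = m.id
  AND f.yearmoda = '2001-11-15' AND f.modelruntime = 0
  AND t.forecast_id = f.id AND t.stn_id = s.id;

Not hard individually, but multiplied across 50 scripts it adds up.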

>
> > I'm migrating this table from an existing mysql installation...
> > This is what it looks like right now:
> >               Table "forecast"
> >   Attribute   |         Type          | Modifier
> > --------------+-----------------------+----------
> >  zhr          | smallint              |
> >  zday         | smallint              |
> >  model        | character varying(32) |
> >  temp         | numeric(6,2)          |
> >  modelhr      | smallint              | not null
> >  modelruntime | smallint              | not null
> >  modelrundate | smallint              |
> >  stn          | character(4)          | not null
> >  rh           | numeric(6,2)          |
> >  wdsp         | numeric(6,2)          |
> >  wddir        | character varying(2)  |
> >  dwpt         | numeric(6,2)          |
> >  lpre         | numeric(6,2)          |
> >  yearmoda     | date                  | not null
> >
> > It's a table for weather forecasts; a record is identified uniquely by
> > (model, stn, yearmoda, modelruntime, modelhr), although I will rarely
> > have a query that specifies all of those fields.

We need to run the DB 24x7.  Data comes in all day long, one model run at a
time.  A model run covers anywhere from 10 to 40 forecast hours, interpolated
over about 1800 weather stations.  All we do is add data (and, occasionally,
re-insert data that was found to be corrupt)...no deleting, though.  Would
you recommend doing a vacuum analyze after every model run, or would once a
day be sufficient?
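
Just so we're talking about the same thing, what I had in mind is
something like this after each load finishes (table names from the
sketch above):

VACUUM ANALYZE forecast;
VACUUM ANALYZE temperature;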

> You also mentioned issues with downtime. Is this DB going to be used
> 24x7? You need to do vacuum analyze at least after every big update.
>
> How often will your data be updated? Once data is loaded will it be
> changed at all?
>

