Re: pgsql and large tables - Mailing list pgsql-general
From | Gurupartap Davis
---|---
Subject | Re: pgsql and large tables
Date |
Msg-id | 004201c16e3a$3d915de0$0f00a8c0@marlows
In response to | Re: pgsql and large tables (Francisco Reyes <lists@natserv.com>)
Responses | Re: pgsql and large tables
List | pgsql-general
Sheesh...I think I see what you're saying about normalization. I've never had a formal db class, but it looks like you're talking about removing all the redundancy from the database...and there's a lot in this one(!)

I could have a forecast table like so:

    id serial primary key,
    model_id references model (id),
    yearmoda date,
    modelruntime smallint

Then a temperature table:

    forecast_id references forecast (id),
    stn_id references station (id),
    modelhr smallint,
    value numeric(6,2)

...and repeat for relative humidity, dewpoint, etc. (there's a rough DDL sketch at the bottom of this mail).

zhr, zday, and modelrundate in the current table are redundant, as they are derivative of the date and modelhr.

It looks like it would save a hella lot of disk space...probably over 50%, seeing as how certain variables (dwpt, rh, lpre) only show up in a fraction of the records (i.e., most records have NULL for those fields).

Dang...I don't know if my employer will let me make a change this radical, though. We would have to redo every single query in about 50 scripts. Hrmmm. I'm a programmer, not a DB admin...but my employer is too cheap (errr, sorry..."thrifty") to hire a real DB admin :-/

> > I'm migrating this table from an existing mysql installation...
> > This is what it looks like right now:
> >
> >           Table "forecast"
> >    Attribute   |         Type          | Modifier
> >  --------------+-----------------------+----------
> >  zhr           | smallint              |
> >  zday          | smallint              |
> >  model         | character varying(32) |
> >  temp          | numeric(6,2)          |
> >  modelhr       | smallint              | not null
> >  modelruntime  | smallint              | not null
> >  modelrundate  | smallint              |
> >  stn           | character(4)          | not null
> >  rh            | numeric(6,2)          |
> >  wdsp          | numeric(6,2)          |
> >  wddir         | character varying(2)  |
> >  dwpt          | numeric(6,2)          |
> >  lpre          | numeric(6,2)          |
> >  yearmoda      | date                  | not null
> >
> > It's a table for weather forecasts; a record is identified uniquely by
> > (model, stn, yearmoda, modelruntime, modelhr), although I will rarely have
> > a query that specifies all of those fields.

We need to run the db 24x7. Data comes in all day long, one model run at a time. A model run is anywhere from 10-40 hours interpolated over about 1800 weather stations. All we do is add data (and, occasionally, re-insert data that was found corrupt)...no deleting, though. Would you recommend doing a vacuum analyze after every model run, or would once a day be sufficient?

> You also mentioned issues with downtime. Is this DB going to be used
> 24x7? You need to do vacuum analyze at least after every big update.
>
> How often will your data be updated? Once data is loaded will it be
> changed at all?
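P.S. Here's the rough DDL sketch I mentioned for the normalized layout. The model and station lookup tables (and any names, types, and constraints beyond what I listed above) are just a first guess, not a final design:

-- Lookup tables for the values that repeat in every row today
CREATE TABLE model (
    id   serial PRIMARY KEY,
    name varchar(32) NOT NULL UNIQUE
);

CREATE TABLE station (
    id  serial PRIMARY KEY,
    stn char(4) NOT NULL UNIQUE
);

-- One row per model run
CREATE TABLE forecast (
    id           serial PRIMARY KEY,
    model_id     integer NOT NULL REFERENCES model (id),
    yearmoda     date NOT NULL,
    modelruntime smallint NOT NULL
);

-- One narrow table per variable; sparse variables (dwpt, rh, lpre)
-- simply have no row instead of a stored NULL
CREATE TABLE temperature (
    forecast_id integer NOT NULL REFERENCES forecast (id),
    stn_id      integer NOT NULL REFERENCES station (id),
    modelhr     smallint NOT NULL,
    value       numeric(6,2)
);
-- ...and the same shape again for rh, wdsp, dwpt, lpre, etc.

A rewritten query against this would look something like the following (the model/station/date values here are made up):

SELECT t.modelhr, t.value
FROM temperature t, forecast f, model m, station s
WHERE f.id = t.forecast_id
  AND m.id = f.model_id
  AND s.id = t.stn_id
  AND m.name = 'avn'
  AND s.stn = 'KDEN'
  AND f.yearmoda = '2001-11-15'
  AND f.modelruntime = 12;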
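P.P.S. Just so we're talking about the same thing: by "vacuum analyze after every model run" I mean firing off something like this once a run finishes loading (table names from the sketch above):

-- keeps the planner's statistics current and reclaims space
-- from any corrupt rows we deleted and re-inserted
VACUUM ANALYZE forecast;
VACUUM ANALYZE temperature;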