Re: pgsql and large tables - Mailing list pgsql-general

From Francisco Reyes
Subject Re: pgsql and large tables
Date
Msg-id 20011116094724.T62783-100000@zoraida.natserv.net
Whole thread Raw
In response to Re: pgsql and large tables  ("Gurupartap Davis" <partap@yahoo.com>)
Responses Re: pgsql and large tables
List pgsql-general
On Thu, 15 Nov 2001, Gurupartap Davis wrote:

> Sheesh...I think I see what you're saying about normalization.  I've never
> had a formal db class, but it looks like you're talking about removing all
> the redundancy from the database...and there's a lot in this one(!)

The basics of normalization are not all that complex. Most of the time the
biggest issue is truly understanding your data and how you will need to
access it.

> I could have a forecast table like so:

I could try to help you, but I would need the original tables with every
field explained. Then your suggested new design.

> zhr,zday, and modelrundate in the current table are redundant, as they are
> derivitive of the date and modelhr....

If they can be derived AND you will never need to search on them, then
don't store them at all.

> It looks like it would save a hella lot of disk space.

It is not only the space saving, but your queries will run faster.

> seeing as how certain variables (dwpt, rh, lpre) only show up in a fraction
> of the records...(ie, most records have NULL for these fields)

Those could possibly be on a separate table then.

> Dang...I don't know if my employer will let me make a change this radical,
> though.

You could do something in between an optimal design and something which
would be relatively easy to do. Just explain to your employer that these
changes can reduce space consumption and speed up your queries. The
alternative is getting faster hardware and more memory... and even then
queries may not be as fast as they could be just by re-designing your
database.

> We would have to redo every single query in about 50 scripts.

Again talk to your employer. This change will take place once and your
queries and the savings on time and space will be a long term process.

> Hrmmm.  I'm a programmer, not a DB admin...but my employer is too cheap
> (errr, sorry..."thrifty") to hire a real DB admin :-/

If you are not paid by the hour then it may be just an issue of you
getting some help over the net and doing it yourself.
You may want to look into general DB forums, newsgroups, etc... where
people with more experience on design can help you.

Doing a non formal design is fairly simple, it is just a matter of knowing
your data. If you take the time to do what I suggested and explain your
existing tables field by fiend, taking in consideration we may not know
the terms, then people on this forum may be able to help you.

You don't need to go into too many details. Something like
zhr hour of ???. Usually blank/populated, etc..
zday day of ???. Usually blank/populated, etc..
model a description/name??
temp temperature at time/date???

You could also through in 20 records so we see how things are distributed.
In other words which values are often repeated and may be better off on a
separate table. Also which fields are usually blank so perhaps you may
consider putting them on a separate table for the space savings.

> We need to run the db 24x7.  Data comes in all day long,

How about letting the data go into ASCII files and then import them at set
intervals. In particular you want to have time to do your vacuum analyze
at least once per day.

>one model run at a time.
>A model run is anywhere from 10-40 hours
>interpolated over about 1800 weather stations.

Is the model run on the database?


>All we do is add data (and, occasionally, re-insert data
> that was found corrupt)...no deleting, though.  Would you recommend doing a
> vacuum analyze after every model run, or would once a day be sufficient?

Actually in your case I would recommend to do a vacuum analyze after each
large insert, if possible. If not then once a day.

Does the data comes in batches or as a continues stream?

What are the chances of getting more memory as I mentioned on the previous
mail?


pgsql-general by date:

Previous
From: Eric Crampton
Date:
Subject: Poor performance on SCSI machines, good on IDE?
Next
From: Helge Bahmann
Date:
Subject: Re: Poor performance on SCSI machines, good on IDE?