Re: pgsql and large tables - Mailing list pgsql-general
From: Francisco Reyes
Subject: Re: pgsql and large tables
Date:
Msg-id: 20011116094724.T62783-100000@zoraida.natserv.net
In response to: Re: pgsql and large tables ("Gurupartap Davis" <partap@yahoo.com>)
Responses: Re: pgsql and large tables
List: pgsql-general
On Thu, 15 Nov 2001, Gurupartap Davis wrote:

> Sheesh...I think I see what you're saying about normalization. I've never
> had a formal db class, but it looks like you're talking about removing all
> the redundancy from the database...and there's a lot in this one(!)

The basics of normalization are not all that complex. Most of the time the
biggest issue is truly understanding your data and how you will need to
access it.

> I could have a forecast table like so:

I could try to help you, but I would need the original tables with every
field explained, and then your suggested new design.

> zhr, zday, and modelrundate in the current table are redundant, as they
> are derivative of the date and modelhr....

If they can be derived AND you will never need to search on them, then
don't store them at all.

> It looks like it would save a hella lot of disk space.

It is not only the space savings; your queries will run faster too.

> seeing as how certain variables (dwpt, rh, lpre) only show up in a
> fraction of the records...(ie, most records have NULL for these fields)

Those could possibly go on a separate table then.

> Dang...I don't know if my employer will let me make a change this
> radical, though.

You could do something in between an optimal design and something which
would be relatively easy to do. Just explain to your employer that these
changes can reduce space consumption and speed up your queries. The
alternative is getting faster hardware and more memory... and even then
queries may not be as fast as they could be just by redesigning your
database.

> We would have to redo every single query in about 50 scripts.

Again, talk to your employer. The change will take place once, while the
savings in time and space will be long term.

> Hrmmm.
> I'm a programmer, not a DB admin...but my employer is too cheap
> (errr, sorry..."thrifty") to hire a real DB admin :-/

If you are not paid by the hour then it may just be a matter of getting
some help over the net and doing it yourself. You may want to look into
general DB forums, newsgroups, etc., where people with more experience in
design can help you. Doing an informal design is fairly simple; it is just
a matter of knowing your data.

If you take the time to do what I suggested and explain your existing
tables field by field, taking into consideration that we may not know the
terms, then people on this forum may be able to help you. You don't need
to go into too many details. Something like:

zhr    hour of ???. Usually blank/populated, etc.
zday   day of ???. Usually blank/populated, etc.
model  a description/name??
temp   temperature at time/date???

You could also throw in 20 records so we see how things are distributed;
in other words, which values are often repeated and may be better off in a
separate table, and also which fields are usually blank, so perhaps you
may consider putting them in a separate table for the space savings.

> We need to run the db 24x7. Data comes in all day long,

How about letting the data go into ASCII files and then importing them at
set intervals? In particular you want to have time to do your vacuum
analyze at least once per day.

> one model run at a time. A model run is anywhere from 10-40 hours
> interpolated over about 1800 weather stations.

Is the model run on the database?

> All we do is add data (and, occasionally, re-insert data that was found
> corrupt)...no deleting, though. Would you recommend doing a vacuum
> analyze after every model run, or would once a day be sufficient?

Actually, in your case I would recommend doing a vacuum analyze after each
large insert, if possible. If not, then once a day. Does the data come in
batches or as a continuous stream? What are the chances of getting more
memory, as I mentioned in the previous mail?
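To make the "mostly-NULL columns go in a separate table" suggestion concrete, here is a minimal sketch. The column names (dwpt, rh, lpre, temp) come from the thread, but the schema itself is a guess, and sqlite3 stands in for PostgreSQL purely so the example is self-contained; the same split works the same way in Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Main forecast table: only the columns that are (almost) always populated.
cur.execute("""
    CREATE TABLE forecast (
        id      INTEGER PRIMARY KEY,
        station TEXT    NOT NULL,
        date    TEXT    NOT NULL,
        modelhr INTEGER NOT NULL,
        temp    REAL
    )
""")

# The sparse variables (mostly NULL in the wide design) live in a side
# table; a row exists only when at least one of them was actually recorded.
cur.execute("""
    CREATE TABLE forecast_extra (
        forecast_id INTEGER PRIMARY KEY REFERENCES forecast(id),
        dwpt REAL,
        rh   REAL,
        lpre REAL
    )
""")

cur.execute("INSERT INTO forecast VALUES (1, 'KDEN', '2001-11-15', 12, 3.5)")
cur.execute("INSERT INTO forecast VALUES (2, 'KDEN', '2001-11-15', 13, 4.0)")
# Only the first forecast row had the sparse measurements.
cur.execute("INSERT INTO forecast_extra VALUES (1, -2.0, 80.0, 0.1)")

# A LEFT JOIN reproduces the old wide rows on demand, NULLs and all.
rows = cur.execute("""
    SELECT f.station, f.modelhr, f.temp, e.rh
    FROM forecast f
    LEFT JOIN forecast_extra e ON e.forecast_id = f.id
    ORDER BY f.modelhr
""").fetchall()
print(rows)
```

The existing scripts that expect the wide shape could be pointed at a view wrapping that LEFT JOIN, which limits how many of the 50 scripts need rewriting.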
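The "don't store what you can derive" advice can also be sketched. How zhr, zday, and modelrundate relate to the stored date and modelhr is an assumption here (the thread never spells it out); again sqlite3 stands in for PostgreSQL, whose date arithmetic would differ in syntax but not in idea: store one timestamp, derive the rest in a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Store only the valid time and the model hour...
cur.execute("""
    CREATE TABLE forecast (
        valid_time TEXT    NOT NULL,   -- e.g. '2001-11-15 06:00'
        modelhr    INTEGER NOT NULL,
        temp       REAL
    )
""")

# ...and compute the derivable columns on read instead of storing them on
# every row. The derivations below are illustrative guesses: zhr as the
# hour of valid_time, zday as its date, modelrundate as valid_time minus
# modelhr hours.
cur.execute("""
    CREATE VIEW forecast_wide AS
    SELECT valid_time,
           modelhr,
           temp,
           CAST(strftime('%H', valid_time) AS INTEGER)      AS zhr,
           date(valid_time)                                 AS zday,
           datetime(valid_time, '-' || modelhr || ' hours') AS modelrundate
    FROM forecast
""")

cur.execute("INSERT INTO forecast VALUES ('2001-11-15 06:00', 12, 3.5)")
row = cur.execute(
    "SELECT zhr, zday, modelrundate FROM forecast_wide"
).fetchone()
print(row)
```

Nothing redundant hits the disk, and a wrong derivation is fixed once in the view rather than by rewriting millions of stored rows.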