Re: Normalization or Performance - Mailing list pgsql-performance
From | Jim C. Nasby |
---|---|
Subject | Re: Normalization or Performance |
Date | |
Msg-id | 20041202230348.GG41545@decibel.org Whole thread Raw |
In response to | Normalization or Performance (Alvaro Nunes Melo <al_nunes@atua.com.br>) |
List | pgsql-performance |
On Thu, Dec 02, 2004 at 03:05:55PM -0200, Alvaro Nunes Melo wrote: > Hi, > > Before writing this mail, I'd researched a little about this topic, and > got some opinions from guys like Fabien Pascal, who argues that logical > design should be separated from physical design, and other sources. As > this is not fact, I'm writing to you guys, that make things work in real > world. > > We started our first big (for our company standards) project always > thinking in normalization. But once we imported legacy data into the DB, > things got harder. > > One example is the clients status. A client might be active, inactive or > pending (for many reasons). We store all the status a client have since > it is in the system. To check what is the actual status of a client, we > get the last status from this historical status table. This take a > considerable time, so our best results were achieved building a > function that checks the status and indexing this function. The problem > is that indexed functions mus bu immutable, so as you can figure, if the > status change after the creation of the index, the retunr of the > function is still the same. > > What do you suggest for situations like this? Should I add a field to > clients table and store its actual status, keep storing data in the > historical table an control its changes with a trigger? It seems you shouldn't have to resort to this. SELECT status FROM client_status WHERE client_id = blah ORDER BY status_date DESC LIMIT 1 should be pretty fast given an index on client_id, status_date (which should be able to be unique). > There are other situations that are making things difficult to us. For > example, one query must return the total amount a client bought in the > last 90 days. It's taking too long, when we must know it for many > clients, many times. So should I create summarization tables to store > this kind of stuff, update it with a trigger in daily basis (for > example), and solve this problem with one join? This sounds like a more likely candidate for a summary table, though you might not want to use a trigger. Unless you need absolutely up-to-date information it seems like a nightly process to update the totals would be better and more efficient. > Our database is not that big. The larger table has about 7.000.000 rows. > About 50.000 clients, half of them active. All that I'd point out above > uses indexes for queries, etc. But even with this it's not been fast > enough. We have a Dell server for this (I know, the Dell issue), a Dual > Xeon 2.8, SCSI HD, 1 GB mem. Do we need better hardware for our system? Is all this on a single HD? That's going to be a huge bottleneck. You'll be much better off with a mirrored partition for your WAL files and either raid5 or raid10 for the database itself. You'd probably be better off with more memory as well. If you're going to buy a new box instead of upgrade your existing one, I'd recommend going with an Opteron because of it's much better memory bandwidth. For reference, stats.distributed.net is a dual Opteron 244 1.8GHz with 4G ram, a 200G mirror for WAL and the system files and a 6x200G RAID10 for the database (all SATA drives). The largest table 120M rows and 825,000 8k pages. I can scan 1/5th of that table via an index scan in about a minute. (The schema can be found at http://minilink.org/cvs.distributed.net/l3.sql. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
pgsql-performance by date: