Re: Normalization or Performance

From: Jim C. Nasby
Subject: Re: Normalization or Performance
Date: ,
Msg-id: 20041202230348.GG41545@decibel.org
(view: Whole thread, Raw)
In response to: Normalization or Performance  (Alvaro Nunes Melo)
List: pgsql-performance

Tree view

Normalization or Performance  (Alvaro Nunes Melo, )
 Re: Normalization or Performance  (Richard Huxton, )
 Re: Normalization or Performance  ("Jim C. Nasby", )
 Re: Normalization or Performance  ("Iain", )

On Thu, Dec 02, 2004 at 03:05:55PM -0200, Alvaro Nunes Melo wrote:
> Hi,
>
> Before writing this mail, I'd researched a little about this topic, and
> got some opinions from guys like Fabien Pascal, who argues that logical
> design should be separated from physical design, and other sources. As
> this is not fact, I'm writing to you guys, that make things work in real
> world.
>
> We started our first big (for our company standards) project always
> thinking in normalization. But once we imported legacy data into the DB,
> things got harder.
>
> One example is the clients status. A client might be active, inactive or
> pending (for many reasons). We store all the status a client have since
> it is in the system. To check what is the actual status of a client, we
> get the last status from this historical status table. This take a
> considerable time, so our best results were achieved building  a
> function that checks the status and indexing this function. The problem
> is that indexed functions mus bu immutable, so as you can figure, if the
> status change after the creation of the index, the retunr of the
> function is still the same.
>
> What do you suggest for situations like this? Should I add a field to
> clients table and store its actual status, keep storing data in the
> historical table an control its changes with a trigger?

It seems you shouldn't have to resort to this. SELECT status FROM
client_status WHERE client_id = blah ORDER BY status_date DESC LIMIT 1
should be pretty fast given an index on client_id, status_date (which
should be able to be unique).

> There are other situations that are making things difficult to us. For
> example, one query must return the total amount a client bought in the
> last 90 days. It's taking too long, when we must know it for many
> clients, many times. So should I create summarization tables to store
> this kind of stuff, update it with a trigger in daily basis (for
> example), and solve this problem with one join?

This sounds like a more likely candidate for a summary table, though you
might not want to use a trigger. Unless you need absolutely up-to-date
information it seems like a nightly process to update the totals would
be better and more efficient.

> Our database is not that big. The larger table has about 7.000.000 rows.
> About 50.000 clients, half of them active. All that I'd point out above
> uses indexes for queries, etc. But even with this it's not been fast
> enough. We have a Dell server for this (I know, the Dell issue), a Dual
> Xeon 2.8, SCSI HD, 1 GB mem. Do we need better hardware for our system?

Is all this on a single HD? That's going to be a huge bottleneck. You'll
be much better off with a mirrored partition for your WAL files and
either raid5 or raid10 for the database itself. You'd probably be better
off with more memory as well. If you're going to buy a new box instead
of upgrade your existing one, I'd recommend going with an Opteron
because of it's much better memory bandwidth.

For reference, stats.distributed.net is a dual Opteron 244 1.8GHz with
4G ram, a 200G mirror for WAL and the system files and a 6x200G RAID10
for the database (all SATA drives). The largest table 120M rows and
825,000 8k pages. I can scan 1/5th of that table via an index scan in
about a minute. (The schema can be found at
http://minilink.org/cvs.distributed.net/l3.sql.
--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


pgsql-performance by date:

From: Christopher Browne
Date:
Subject: Re: pg replication tools?
From: "Vishal Kashyap @ [SaiHertz]"
Date:
Subject: Re: pg replication tools?