From: Joel Burton <jburton@scw.org>
To: Mike Castle <dalgoda@ix.netcom.com>
Subject: Re: [NOVICE] running totals
Date sent: Tue, 5 Dec 2000 16:01:05 -0500
On 5 Dec 2000, at 11:54, Mike Castle wrote:
>
> I'm seeking advice on the best way to handle something.
>
> I have a series of containers of a variety of fixed sizes and I want to
> put items into them, each of which may be different sizes. (Actually
> it's the length of a video tape and how many hours of TV shows I can put
> onto it, but it's a pretty general problem.)
>
> The question is: is it better to keep a running total on the
> containers and update it every time an item is put into/removed from the
> container, or is it better to do a select every time and recalculate the
> size of the items in the containers?
>
> Or is this one of those in the class: it depends?
>
> Currently I have a trigger plpgsql function that updates a tape by
> adding/subtracting the show just added/deleted from that tape. Since
> intervals are integral types, I'm not worried about round-off
> errors. If the value was a float, I'd suppose one would want to do a
> select at least every so often to account for round-off error.
>
> Also, right now I have the running total as part of the object. This
> means that when the object is not in use, that's an extra field that
> isn't being utilized. Not an issue in this case since the containers
> are almost all in use at any given time. But, when a field of a row is
> updated, is the whole row read from the database, updated, then written
> back out? If so, would it be that in some cases, it would be better to
> break out fields that are updated often into their own table with a
> reference? Which is more efficient?
>
> Thanks,
> mrc (Going book shopping tonight for good Tcl/Tk and general SQL books)
> --
> Mike Castle                  Life is like a clock: You can work constantly
>   dalgoda@ix.netcom.com      and be right all the time, or not work at all
>     www.netcom.com/~dalgoda/ and be right at least twice a day.  -- mrc
>         We are all of us living in the shadow of Manhattan.  -- Watchmen
Like all good things, this can be an "it depends."
Generally speaking (!), it's a better idea not to store calculated
values in your database b/c if the calculation doesn't happen
(trigger was disabled during testing, etc.), your numbers would be
off, and you might never notice. In addition, you might want to
write a query that looks for *certain* totals (such as only action
films), so a total-all trigger might only be useful for some cases
anyway.
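For instance, supposing the tv table in the example below also had a
genre column (it doesn't -- that column is purely hypothetical here),
a per-genre total is just an ordinary filtered query, which no single
running-total column can give you:

-- hypothetical genre column, for illustration only
SELECT tape_id, sum(runtime)
  FROM tv
 WHERE genre = 'action'
 GROUP BY tape_id;

A stored total can only ever answer the one question it was built for.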
[ Mike -- I'm sure if you're creating triggers, you know the basics of
VIEWs. But, since this is pgsql-novice, let me give others some
background.
Skip down to see more about your question. ]
If your database doesn't handle VIEWs, you're always stuck having
to do a complicated SELECT statement to find the current total of
shows on a tape. However, w/ a database like PostgreSQL, you can
create a VIEW that shows the tape info, as well as the total of its
shows.
For ex:
CREATE TABLE tape (
    id        INT  NOT NULL PRIMARY KEY,
    tape_name TEXT NOT NULL UNIQUE
);
-- assume a half-hour show has runtime 1 and a one-hour
-- show has runtime 2 (this could be a float, or better
-- still an interval, etc., but for simplicity:)

CREATE TABLE tv (
    title   TEXT PRIMARY KEY,
    runtime INT  NOT NULL
            CHECK (runtime BETWEEN 1 AND 10),
    tape_id INT  NOT NULL REFERENCES tape
);
INSERT INTO tape VALUES (1, 'Simpsons');
INSERT INTO tape VALUES (2, 'Crime shows');
INSERT INTO tv VALUES ('Simpsons 3-eyed fish episode', 1, 1);
INSERT INTO tv VALUES ('Simpsons Lisa''s Crush on Nelson episode', 1, 1);
INSERT INTO tv VALUES ('Law & Order Episode 23', 1, 2);
You could always

SELECT id, tape_name, sum(runtime)
  FROM tape t,
       tv v
 WHERE t.id = v.tape_id
 GROUP BY id, tape_name;
But that's a pain to do every time! Instead,

CREATE VIEW tape_view AS
SELECT id, tape_name, sum(runtime)
  FROM tape t,
       tv v
 WHERE t.id = v.tape_id
 GROUP BY id, tape_name;
Which means you can just
SELECT * FROM tape_view;
to see that same answer. Yes, it's more computationally intensive
than keeping the trigger and doing a regular SELECT, but, for most of
us, most of the time, the speed difference here is not a big deal at all.
[ Back to your question, Mike ]
I'd let your guide be
(a) how often you insert/update data
    (i.e., how often will your slower TRIGGER run?)
versus
(b) how often you need to SELECT the answer
    (i.e., how often does the semi-slow VIEW run?)
considering
(c) how critical it is that the answer be absolutely
    correct (in case the trigger crashes, is disabled, etc.)
Of course, with a trigger, you could check that the total running
time doesn't exceed the tape running time, etc. Those things could
be done otherwise (by checking in the trigger, even if you weren't
storing the total), but since you're already in the trigger and already
know the total, there wouldn't be much speed loss.
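Just as a rough, untested sketch -- the total_runtime and capacity
columns on tape are made up for this example, and your own trigger no
doubt already looks different -- the add/subtract-and-check idea would
be something like:

-- hypothetical columns: tape.total_runtime and tape.capacity
-- (neither is in the example schema above)
CREATE FUNCTION tv_total_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE tape
           SET total_runtime = total_runtime + NEW.runtime
         WHERE id = NEW.tape_id;
        -- refuse the INSERT if the tape would overflow;
        -- raising here rolls back the UPDATE above as well
        IF (SELECT total_runtime > capacity
              FROM tape WHERE id = NEW.tape_id) THEN
            RAISE EXCEPTION 'tape % is full', NEW.tape_id;
        END IF;
        RETURN NEW;
    ELSE  -- DELETE
        UPDATE tape
           SET total_runtime = total_runtime - OLD.runtime
         WHERE id = OLD.tape_id;
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tv_total
    AFTER INSERT OR DELETE ON tv
    FOR EACH ROW EXECUTE PROCEDURE tv_total_trig();

That only covers INSERT and DELETE; moving a show between tapes
(an UPDATE of tape_id) would need the same treatment.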
Most of the time, I'd choose to calculate, unless speed for SELECTs
was really the primary issue.
--
If you do store the totals, make sure that the total field can't be
edited by hand! Letting that happen would be a critical mistake. The
contrib/ directory of PostgreSQL has an add-on called (I think)
noupdate, which can be used to prevent updates to one column. Also,
just be careful with your trigger--if you DROP the table and reCREATE
it, remember to recreate your trigger, etc. It can be easy to forget
these!
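If the contrib module isn't handy, a hand-rolled version is only a few
lines. Again just an untested sketch, using the same made-up
total_runtime column as above; the pg_trigger_depth() test is there so
the maintenance trigger's own UPDATE can still get through:

-- reject direct edits to the (hypothetical) total_runtime column;
-- an UPDATE issued from inside another trigger (depth 2) is allowed,
-- a plain UPDATE from a client (depth 1) is not
CREATE FUNCTION protect_total() RETURNS trigger AS $$
BEGIN
    IF NEW.total_runtime IS DISTINCT FROM OLD.total_runtime
       AND pg_trigger_depth() < 2 THEN
        RAISE EXCEPTION 'total_runtime is maintained by trigger; do not edit it';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tape_protect_total
    BEFORE UPDATE ON tape
    FOR EACH ROW EXECUTE PROCEDURE protect_total();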
You might also want to run a SELECT query now-and-then to
compare your computed-as-you-go totals with a SUM() query, as
above, to make sure triggers are 100% working.
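Something along these lines would do it (again assuming the made-up
total_runtime column from the sketch above):

-- tapes whose stored total disagrees with the real sum
SELECT t.id, t.tape_name, t.total_runtime,
       coalesce(sum(v.runtime), 0) AS real_total
  FROM tape t
  LEFT JOIN tv v ON v.tape_id = t.id
 GROUP BY t.id, t.tape_name, t.total_runtime
HAVING t.total_runtime <> coalesce(sum(v.runtime), 0);

An empty result means the trigger has been doing its job.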
--
General advice about when to normalize, etc., can be found in a
good database theory book. I have two:
* SQL for Smarties (Celko). [terse and not always 100% clear, but
many helpful ideas about SQL and database design.]
* Introduction to Database Systems (Date). [hardly a true
beginner's intro, but a very good book IMO on database theory.]
I'm not sure I'd say either book was perfect; others may have
better recommendations.
Usually for me, the choice of how to structure data is about 5%
space considerations, 15% speed, about 70% minimizing
duplicates/data mismatches/ensuring queryability, and 10%
whatever's easiest to do.
Good luck,
--
Joel Burton, Director of Information Systems -*- jburton@scw.org
Support Center of Washington (www.scw.org)