From: Joel Burton <jburton@scw.org>
To: Mike Castle <dalgoda@ix.netcom.com>
Subject: Re: [NOVICE] running totals
Date sent: Tue, 5 Dec 2000 16:01:05 -0500
On 5 Dec 2000, at 11:54, Mike Castle wrote:
>
> I'm seeking advice on the best way to handle something.
>
> I have a series of containers of a variety of fixed sizes and I want to
> put items into them, each of which may be different sizes. (Actually
> it's the length of a video tape and how many hours of TV shows I can put
> onto it, but it's a pretty general problem.)
>
> The question is: is it better to keep a running total on the
> containers and update it every time an item is put into/removed from the
> container, or is it better to do a select every time and recalculate the
> size of the items in the containers?
>
> Or is this one of those in the class: it depends?
>
> Currently I have a trigger plpgsql function that updates a tape by
> adding/subtracting the show just added/deleted from that tape. Since
> intervals are integral types, I'm not worried about round-off
> errors. If the value was a float, I'd suppose one would want to do a
> select at least every so often to account for round-off error.
>
> Also, right now I have the running total as part of the object. This
> means that when the object is not in use, that's an extra field that
> isn't being utilized. Not an issue in this case since the containers
> are almost all in use at any given time. But, when a field of a row is
> updated, is the whole row read from the database, updated, then written
> back out? If so, would it be that in some cases, it would be better to
> break out fields that are updated often into their own table with a
> reference? Which is more efficient?
>
> Thanks,
> mrc (Going book shopping tonight for good Tcl/Tk and general SQL books)
> --
> Mike Castle                  Life is like a clock: You can work constantly
>   dalgoda@ix.netcom.com      and be right all the time, or not work at all
>     www.netcom.com/~dalgoda/ and be right at least twice a day.  -- mrc
>         We are all of us living in the shadow of Manhattan.  -- Watchmen
Like all good things, this can be an "it depends."
Generally speaking (!), it's a better idea not to store calculated
values in your database b/c if the calculation doesn't happen
(trigger was disabled during testing, etc.), your numbers would be
off, and you might never notice. In addition, you might want to
write a query that looks for *certain* totals (such as only action
films), so a total-all trigger might only be useful for some cases
anyway.
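For instance, supposing the tv table in the example below also had a
genre column (it doesn't -- that column is purely hypothetical here),
a per-genre total is just an ordinary filtered query, which no single
running-total column can give you:

-- hypothetical genre column, for illustration only
SELECT tape_id, sum(runtime)
  FROM tv
 WHERE genre = 'action'
 GROUP BY tape_id;

A stored total can only ever answer the one question it was built for.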
[ Mike -- I'm sure if you're creating triggers, you know the basics of
VIEWs. But, since this is pgsql-novice, let me give others some
background.
Skip down to see more about your question. ]
If your database doesn't handle VIEWs, you're always stuck having
to do a complicated SELECT statement to find the current total of
shows on a tape. However, w/ a database like PostgreSQL, you can
create a VIEW that shows the tape info, as well as the total of its
shows.
For ex:
CREATE TABLE tape (
    id        INT  NOT NULL PRIMARY KEY,
    tape_name TEXT NOT NULL UNIQUE
);
-- assume a half-hour show has runtime 1 and a one-hour
-- show has runtime 2 (this could be a float, or better
-- still an interval, etc., but for simplicity:)

CREATE TABLE tv (
    title   TEXT PRIMARY KEY,
    runtime INT  NOT NULL
            CHECK (runtime BETWEEN 1 AND 10),
    tape_id INT  NOT NULL REFERENCES tape
);
INSERT INTO tape VALUES (1, 'Simpsons');
INSERT INTO tape VALUES (2, 'Crime shows');
INSERT INTO tv VALUES ('Simpsons 3-eyed fish episode', 1, 1);
INSERT INTO tv VALUES ('Simpsons Lisa''s Crush on Nelson episode', 1, 1);
INSERT INTO tv VALUES ('Law & Order Episode 23', 1, 2);
You could always

SELECT id, tape_name, sum(runtime)
  FROM tape t,
       tv v
 WHERE t.id = v.tape_id
 GROUP BY id, tape_name;
But that's a pain to do every time! Instead,

CREATE VIEW tape_view AS
SELECT id, tape_name, sum(runtime)
  FROM tape t,
       tv v
 WHERE t.id = v.tape_id
 GROUP BY id, tape_name;
Which means you can just
SELECT * FROM tape_view;
to see that same answer. Yes, it's more computationally intensive
than keeping the trigger and doing a regular SELECT, but, for most of
us, most of the time, the speed difference here is not a big deal at all.
[ Back to your question, Mike ]
I'd let your guide be
(a) how often you insert/update data
    (i.e., how often will your slower TRIGGER run?)
versus
(b) how often you need to SELECT the answer
    (i.e., how often does the semi-slow VIEW run?)
considering
(c) how critical it is that the answer be absolutely
    correct (in case the trigger crashes, is disabled, etc.)
Of course, with a trigger, you could check that the total running
time doesn't exceed the tape running time, etc. Those things could
be done otherwise (by checking in the trigger, even if you weren't
storing the total), but since you're already in the trigger and already
know the total, there wouldn't be much speed loss.
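Just as a rough, untested sketch -- the total_runtime and capacity
columns on tape are made up for this example, and your own trigger no
doubt already looks different -- the add/subtract-and-check idea would
be something like:

-- hypothetical columns: tape.total_runtime and tape.capacity
-- (neither is in the example schema above)
CREATE FUNCTION tv_total_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE tape
           SET total_runtime = total_runtime + NEW.runtime
         WHERE id = NEW.tape_id;
        -- refuse the INSERT if the tape would overflow;
        -- raising here rolls back the UPDATE above as well
        IF (SELECT total_runtime > capacity
              FROM tape WHERE id = NEW.tape_id) THEN
            RAISE EXCEPTION 'tape % is full', NEW.tape_id;
        END IF;
        RETURN NEW;
    ELSE  -- DELETE
        UPDATE tape
           SET total_runtime = total_runtime - OLD.runtime
         WHERE id = OLD.tape_id;
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tv_total
    AFTER INSERT OR DELETE ON tv
    FOR EACH ROW EXECUTE PROCEDURE tv_total_trig();

That only covers INSERT and DELETE; moving a show between tapes
(an UPDATE of tape_id) would need the same treatment.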
Most of the time, I'd choose to calculate, unless speed for SELECTs
was really the primary issue.
--
If you do store the totals, make sure that the total field can't be
edited by hand! Letting that happen would be a critical mistake. The
contrib/ directory of PostgreSQL has an add-on called (I think)
noupdate, which can be used to prevent updates to one column. Also,
just be careful with your trigger--if you DROP the table and reCREATE
it, remember to recreate your trigger, etc. It can be easy to forget
these!
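If the contrib module isn't handy, a hand-rolled version is only a few
lines. Again just an untested sketch, using the same made-up
total_runtime column as above; the pg_trigger_depth() test is there so
the maintenance trigger's own UPDATE can still get through:

-- reject direct edits to the (hypothetical) total_runtime column;
-- an UPDATE issued from inside another trigger (depth 2) is allowed,
-- a plain UPDATE from a client (depth 1) is not
CREATE FUNCTION protect_total() RETURNS trigger AS $$
BEGIN
    IF NEW.total_runtime IS DISTINCT FROM OLD.total_runtime
       AND pg_trigger_depth() < 2 THEN
        RAISE EXCEPTION 'total_runtime is maintained by trigger; do not edit it';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tape_protect_total
    BEFORE UPDATE ON tape
    FOR EACH ROW EXECUTE PROCEDURE protect_total();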
You might also want to run a SELECT query now-and-then to
compare your computed-as-you-go totals with a SUM() query, as
above, to make sure triggers are 100% working.
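Something along these lines would do it (again assuming the made-up
total_runtime column from the sketch above):

-- tapes whose stored total disagrees with the real sum
SELECT t.id, t.tape_name, t.total_runtime,
       coalesce(sum(v.runtime), 0) AS real_total
  FROM tape t
  LEFT JOIN tv v ON v.tape_id = t.id
 GROUP BY t.id, t.tape_name, t.total_runtime
HAVING t.total_runtime <> coalesce(sum(v.runtime), 0);

An empty result means the trigger has been doing its job.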
--
General advice about when to normalize, etc., can be found in a
good database theory book. I have two:
* SQL for Smarties (Celko). [terse and not always 100% clear, but
many helpful ideas about SQL and database design.]
* Introduction to Database Systems (Date). [hardly a true
beginner's intro, but a very good book IMO on database theory.]
I'm not sure I'd say either book was perfect; others may have
better recommendations.
Usually for me, the choice of how to structure data is about 5%
space considerations, 15% speed, about 70% minimizing
duplicates/data mismatches/ensuring queryability, and 10%
whatever's easiest to do.
Good luck,
--
Joel Burton, Director of Information Systems -*- jburton@scw.org
Support Center of Washington (www.scw.org)