Thread: Calculated values
Greetings!  What is the 'best way' or 'guiding philosophy', if there is
one, for dealing with calculated values in a pg database?  For example,
say you have a table with a column of floats, and you want to make a
view showing this column as a percent of the total across the column.
Should you

a) create another table with the total, and put triggers on insert,
   update, and delete to maintain the right total, or

b) create the view with a specific subselect to recalculate the total
   at select time.  This has the disadvantage that the total seems to
   be recalculated for each row.

Is there any SQL syntax which can merge a dynamically generated
aggregate, *calculated only once*, into each output row?

Thanks!

-- 
Camm Maguire                                          camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah
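[Editor's note: option (a) above, a trigger-maintained total, can be sketched in miniature. The sketch below uses SQLite from Python purely so it is self-contained and runnable; PostgreSQL would express the same triggers as PL/pgSQL trigger functions, and the table names here are invented for illustration.]

```python
import sqlite3

# Option (a): keep a running total in a side table, maintained by triggers
# on insert, update, and delete. SQLite stands in for PostgreSQL here only
# so the example runs as-is; the technique is the same.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, val REAL);
CREATE TABLE totals (total REAL);
INSERT INTO totals VALUES (0);

CREATE TRIGGER items_ins AFTER INSERT ON items
BEGIN UPDATE totals SET total = total + NEW.val; END;
CREATE TRIGGER items_del AFTER DELETE ON items
BEGIN UPDATE totals SET total = total - OLD.val; END;
CREATE TRIGGER items_upd AFTER UPDATE ON items
BEGIN UPDATE totals SET total = total - OLD.val + NEW.val; END;
""")
con.executemany("INSERT INTO items(val) VALUES (?)",
                [(10.0,), (30.0,), (60.0,)])
con.execute("UPDATE items SET val = 20.0 WHERE id = 1")
total = con.execute("SELECT total FROM totals").fetchone()[0]
print(total)  # 110.0 (100 after the inserts, then -10 +20 on the update)
```

The cost shows up exactly where the thread says it does: every write path must be covered by a trigger, and a missed path silently desynchronizes the cached total.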
Camm Maguire wrote:
> Greetings! What is the 'best way' or 'guiding philosophy' if there is
> one for dealing with calculated values in a pg database? For example,
> say you have a table with a column of floats, and you want to make a
> view showing this column as a percent of the total across the column.
> Should you
>
> a) create another table with the total, and put triggers on insert,
> update, and delete to maintain the right total, or
>
> b) create the view with a specific subselect to recalculate the total
> at select time. This has the disadvantage that the total seems to be
> recalculated for each row. Is there any SQL syntax which can merge a
> dynamically generated aggregate, *calculated only once*, into each
> output row?

Hmmm ... You want to avoid "remerging", if I follow you ...

Could you try:

CREATE VIEW my_view AS
SELECT id, partialsum, (partialsum/totalsum) AS percentage
FROM (SELECT id, SUM(item) AS partialsum GROUP BY id)
JOIN (SELECT SUM(item) AS totalsum);    -- Note : *no* "ON" clause

That way, totalsum should be computed once.

"Caching" computations in a secondary table is a (somewhat hidden) form
of redundancy, and therefore a nice way to ask for trouble ...

Hope this helps ...

-- 
Emmanuel Charpentier
Greetings, and thanks for your reply!

Emmanuel Charpentier <charpent@bacbuc.dyndns.org> writes:
> Camm Maguire wrote:
> >
> > Greetings! What is the 'best way' or 'guiding philosophy' if there is
> > one for dealing with calculated values in a pg database? For example,
> > say you have a table with a column of floats, and you want to make a
> > view showing this column as a percent of the total across the column.
> > Should you
> >
> > a) create another table with the total, and put triggers on insert,
> > update, and delete to maintain the right total, or
> >
> > b) create the view with a specific subselect to recalculate the total
> > at select time. This has the disadvantage that the total seems to be
> > recalculated for each row. Is there any SQL syntax which can merge a
> > dynamically generated aggregate, *calculated only once*, into each
> > output row?
>
> Hmmm ... You want to avoid "remerging", if I follow you ...
>
> Could you try:
>
> CREATE VIEW my_view AS
> SELECT id, partialsum, (partialsum/totalsum) AS percentage
> FROM (SELECT id, SUM(item) AS partialsum GROUP BY id)
> JOIN (SELECT SUM(item) AS totalsum);    -- Note : *no* "ON" clause

I can't seem to get this syntax to work with pg.  No subselects seem to
be accepted in the FROM clause, and JOIN doesn't seem to be a reserved
word at all.  But this kind of thing is exactly what I'm looking for.

Another idea is to write a plpgsql function which returns tuples of a
given table.  The only problem here appears to be that to access the
columns of the result, one has to do something like
'select id(function), percentage(function), ...', which then calls the
function multiple times, once for each column!

Maybe this is better done outside the db.  Any advice most appreciated!

> That way, totalsum should be computed once.
>
> "Caching" computations in a secondary table is a (somewhat hidden) form
> of redundancy, and therefore a nice way to ask for trouble ...
>
> Hope this helps ...
>
> -- 
> Emmanuel Charpentier

-- 
Camm Maguire                                          camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah
Camm Maguire <camm@enhanced.com> writes:
> Emmanuel Charpentier <charpent@bacbuc.dyndns.org> writes:
>> CREATE VIEW my_view AS
>> SELECT id, partialsum, (partialsum/totalsum) AS percentage
>> FROM (SELECT id, SUM(item) AS partialsum GROUP BY id)
>> JOIN (SELECT SUM(item) AS totalsum);    -- Note : *no* "ON" clause

> I can't seem to get this syntax to work with pg.  No subselects seem
> to be accepted in the from clause, and join doesn't seem to be a
> reserved word at all.

Sounds like you are trying to do it in 7.0 or before.  Emmanuel is
relying on 7.1 features --- and the example won't work as given anyway,
since (a) the subselects neglect to specify source tables; (b) you
have to write CROSS JOIN not JOIN if you want to omit ON/USING.

In 7.0 you could accomplish the same thing with temp tables, or more
straightforwardly by something like

    SELECT id,
           SUM(item) AS partialsum,
           SUM(item) / (SELECT SUM(item) FROM table) AS percentage
    FROM table
    GROUP BY id

This relies for efficiency on the poorly-documented fact that the
sub-select will only be evaluated once, since it has no dependency
on the state of the outer select.  (You can check this by seeing that
EXPLAIN shows the subselect as an InitPlan not a SubPlan.)

			regards, tom lane
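[Editor's note: the shape of Tom's query can be exercised end to end. The sketch below runs it against SQLite from Python, purely so it is self-contained; it demonstrates the arithmetic of the uncorrelated scalar subselect, not the InitPlan optimization itself, which is PostgreSQL-specific. Table name and data are invented.]

```python
import sqlite3

# Percent-of-total via an uncorrelated scalar subselect, as in Tom Lane's
# example. Because the inner SELECT references nothing from the outer
# query, PostgreSQL evaluates it once (an InitPlan); here we just verify
# that the query produces the intended per-group percentages.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, item REAL)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(1, 5.0), (1, 5.0), (2, 30.0), (3, 60.0)])
rows = con.execute("""
    SELECT id,
           SUM(item) AS partialsum,
           SUM(item) / (SELECT SUM(item) FROM t) AS percentage
    FROM t
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 10.0, 0.1), (2, 30.0, 0.3), (3, 60.0, 0.6)]
```

In PostgreSQL the check Tom describes is `EXPLAIN` on the same statement: an `InitPlan` node means the total is computed once, a `SubPlan` node means it is re-run per row.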
Greetings, and thanks so much for your reply!

Tom Lane <tgl@sss.pgh.pa.us> writes:
> Camm Maguire <camm@enhanced.com> writes:
> > Emmanuel Charpentier <charpent@bacbuc.dyndns.org> writes:
> >> CREATE VIEW my_view AS
> >> SELECT id, partialsum, (partialsum/totalsum) AS percentage
> >> FROM (SELECT id, SUM(item) AS partialsum GROUP BY id)
> >> JOIN (SELECT SUM(item) AS totalsum);    -- Note : *no* "ON" clause
>
> > I can't seem to get this syntax to work with pg.  No subselects seem
> > to be accepted in the from clause, and join doesn't seem to be a
> > reserved word at all.
>
> Sounds like you are trying to do it in 7.0 or before.  Emmanuel is
> relying on 7.1 features --- and the example won't work as given anyway,
> since (a) the subselects neglect to specify source tables; (b) you
> have to write CROSS JOIN not JOIN if you want to omit ON/USING.

Thanks!  Indeed, I'm using 7.0.3.

> In 7.0 you could accomplish the same thing with temp tables, or more
> straightforwardly by something like
>
>     SELECT id,
>            SUM(item) AS partialsum,
>            SUM(item) / (SELECT SUM(item) FROM table) AS percentage
>     FROM table
>     GROUP BY id
>
> This relies for efficiency on the poorly-documented fact that the
> sub-select will only be evaluated once, since it has no dependency
> on the state of the outer select.  (You can check this by seeing that
> EXPLAIN shows the subselect as an InitPlan not a SubPlan.)

Thanks again.  Alas, EXPLAIN for me still shows a SubPlan, and I can
see why.  Here is my view definition:

create view csp2 as
  SELECT c1.asof,
         c1.ticker,
         c1.sp,
         c1.price AS p1,
         (((100 * float8(c1.sp)) * c1.price) /
          (select sum(price * sp)
           from sprices, sinfo
           where sprices.sinfo = sinfo.sinfo
             and sprices.asof = c1.asof)) AS wt,
         c2.price AS p2,
         c2.split,
         c1.div,
         (100 * c1.ret)
  FROM csp1 c1, csp1 c2, dates
  WHERE c1.asof = dates.asof
    AND c2.asof = dates.nasof
    AND c1.id = c2.id;

What is obviously doing this is the 'sprices.asof = c1.asof' dependency
between the inner and outer select.

Trouble is, my only intention is to be able to use this view with a
constant, albeit 'runtime-selectable', 'asof' or date, as in

    select * from csp2 where asof = '20000103';

Any other suggestions?  This dependency slows the above query down
significantly, resulting in the calculation of the same sum ~1000
times.

Do you also agree with the previous respondent that trying to have a
table of sums, updated dynamically with triggers, is not a good idea?
I'm trying to find the right philosophy for the design of this db, and
am ending up searching for a happy medium between select speed and
insert complexity.

> 			regards, tom lane

Thanks again,

-- 
Camm Maguire                                          camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah
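[Editor's note: one way to break the correlation Camm hit is to compute each date's total once in a derived table and join on the date, instead of referencing the outer row inside the subselect. In PostgreSQL this needs the 7.1 subselect-in-FROM feature Tom mentioned (or a temp table in 7.0). The sketch below uses SQLite from Python so it runs as-is, with invented table and column names.]

```python
import sqlite3

# Aggregate per-date totals once in a derived table, then join on the
# date. No row-by-row correlated subselect: each date's sum is computed
# a single time regardless of how many rows share that date.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE prices (asof TEXT, ticker TEXT, px REAL)")
con.executemany("INSERT INTO prices VALUES (?, ?, ?)", [
    ("20000103", "A", 10.0), ("20000103", "B", 40.0),
    ("20000104", "A", 25.0), ("20000104", "B", 75.0),
])
rows = con.execute("""
    SELECT p.asof, p.ticker, 100.0 * p.px / d.daytotal AS wt
    FROM prices p
    JOIN (SELECT asof, SUM(px) AS daytotal
          FROM prices GROUP BY asof) d
      ON d.asof = p.asof
    ORDER BY p.asof, p.ticker
""").fetchall()
print(rows)
# [('20000103', 'A', 20.0), ('20000103', 'B', 80.0),
#  ('20000104', 'A', 25.0), ('20000104', 'B', 75.0)]
```

A `WHERE p.asof = '20000103'` filter can then be applied on top, and the derived table only ever contributes one total per date.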
Thread: performance

hello.

this might be as much of a general database question as it is a
postgres question...

i have a table with 5 columns... a primary key (integer), three small
(10 character) text fields, and one semi-large (1400 characters) text
field.  note that only a small percentage (5%?) of the rows contain
1400 characters in the 5th column... the other 95% have approx 10
characters.  it has 1100 rows.

the problem is this - queries (command line) often take a very long
time - anywhere from 5-15 seconds - to execute.  the queries use only
the primary key and nothing else in the where clause.  no joins.  a
sample query is:

    select * from weather where weatherid = 12372;

from the command line, it seems that the first query can take a very
long time but subsequent queries happen quickly (< 1 sec).  i'm
guessing that this is the result of caching or something.

do the long times make sense?  what can i do to shorten them?  would a
smaller text field help?  i have no reason to think that this would be
faster or slower in another db, so it might be unrelated to postgres
itself.

i greatly appreciate your help.

chris
First thing to try is to do

    explain select * from weather where weatherid = 12372;

and see if it's doing a sequential scan on your table.  If it is, the
index and table stats may be out of date, and I think you fix those
using vacuum analyze.

One other thing is that if you only have 1100 rows of the sizes you
describe, even a table scan shouldn't take as long as you say unless
the machine is either under serious load or doesn't have enough memory
to run postgres without swapping (or the disk is freaking out with io
errors).

Alex.

On Mon, 29 Jan 2001, chris markiewicz wrote:

> hello.
>
> this might be as much of a general database question as it is a
> postgres question...
>
> i have a table with 5 columns... a primary key (integer), three small
> (10 character) text fields, and one semi-large (1400 characters) text
> field.  note that only a small percentage (5%?) of the rows contain
> 1400 characters in the 5th column... the other 95% have approx 10
> characters.  it has 1100 rows.
>
> the problem is this - queries (command line) often take a very long
> time - anywhere from 5-15 seconds - to execute.  the queries use only
> the primary key and nothing else in the where clause.  no joins.  a
> sample query is:
>
> select * from weather where weatherid = 12372;
>
> from the command line, it seems that the first query can take a very
> long time but subsequent queries happen quickly (< 1 sec).  i'm
> guessing that this is the result of caching or something.
>
> do the long times make sense?  what can i do to shorten them?  would a
> smaller text field help?  i have no reason to think that this would be
> faster or slower in another db, so it might be unrelated to postgres
> itself.
>
> i greatly appreciate your help.
>
> chris
* adb <adb@Beast.COM> [010129 13:40] wrote:
>
> One other thing is that if you only have 1100 rows of the sizes
> you describe, even a table scan shouldn't take as long as you
> say unless the machine is either under serious load or doesn't
> have enough memory to run postgres without swapping. (or the disk is
> freaking out with io errors)

Increasing the amount of shared memory for postgresql can help.
vacuum analyze seems to have done the trick... this is the sort of
thing that happens when a non-dba is doing dba work...

sincerest thanks to all that responded!

chris

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of adb
Sent: Monday, January 29, 2001 4:36 PM
To: chris markiewicz
Cc: pgsql-general@postgresql.org
Subject: Re: performance...

First thing to try is to do

    explain select * from weather where weatherid = 12372;

and see if it's doing a sequential scan on your table.  If it is, the
index and table stats may be out of date, and I think you fix those
using vacuum analyze.

One other thing is that if you only have 1100 rows of the sizes you
describe, even a table scan shouldn't take as long as you say unless
the machine is either under serious load or doesn't have enough memory
to run postgres without swapping (or the disk is freaking out with io
errors).

Alex.

On Mon, 29 Jan 2001, chris markiewicz wrote:

> hello.
>
> this might be as much of a general database question as it is a
> postgres question...
>
> i have a table with 5 columns... a primary key (integer), three small
> (10 character) text fields, and one semi-large (1400 characters) text
> field.  note that only a small percentage (5%?) of the rows contain
> 1400 characters in the 5th column... the other 95% have approx 10
> characters.  it has 1100 rows.
>
> the problem is this - queries (command line) often take a very long
> time - anywhere from 5-15 seconds - to execute.  the queries use only
> the primary key and nothing else in the where clause.  no joins.  a
> sample query is:
>
> select * from weather where weatherid = 12372;
>
> from the command line, it seems that the first query can take a very
> long time but subsequent queries happen quickly (< 1 sec).  i'm
> guessing that this is the result of caching or something.
>
> do the long times make sense?  what can i do to shorten them?  would a
> smaller text field help?  i have no reason to think that this would be
> faster or slower in another db, so it might be unrelated to postgres
> itself.
>
> i greatly appreciate your help.
>
> chris
You have weatherid indexed, right?  If not, index it and run
VACUUM ANALYZE -- if so, then make sure you've run VACUUM ANALYZE
after you created the index.

Also, you might want to EXPLAIN that query to get the query plan
(just type EXPLAIN <query>) if the above doesn't work.  Post the
output of EXPLAIN along with the schema of your tables and perhaps
someone can help some more.

-Mitch

On Mon, 29 Jan 2001, chris markiewicz wrote:

> hello.
>
> this might be as much of a general database question as it is a
> postgres question...
>
> i have a table with 5 columns... a primary key (integer), three small
> (10 character) text fields, and one semi-large (1400 characters) text
> field.  note that only a small percentage (5%?) of the rows contain
> 1400 characters in the 5th column... the other 95% have approx 10
> characters.  it has 1100 rows.
>
> the problem is this - queries (command line) often take a very long
> time - anywhere from 5-15 seconds - to execute.  the queries use only
> the primary key and nothing else in the where clause.  no joins.  a
> sample query is:
>
> select * from weather where weatherid = 12372;
>
> from the command line, it seems that the first query can take a very
> long time but subsequent queries happen quickly (< 1 sec).  i'm
> guessing that this is the result of caching or something.
>
> do the long times make sense?  what can i do to shorten them?  would a
> smaller text field help?  i have no reason to think that this would be
> faster or slower in another db, so it might be unrelated to postgres
> itself.
>
> i greatly appreciate your help.
>
> chris
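[Editor's note: the diagnostic loop this thread converges on - EXPLAIN the query, check for a sequential scan, then index and run the analyzer - can be mimicked in miniature. SQLite's EXPLAIN QUERY PLAN stands in for PostgreSQL's EXPLAIN below, purely as a runnable illustration; the exact plan wording varies by SQLite version.]

```python
import sqlite3

# Reproduce the thread's diagnostic step: ask the planner how it will
# execute the single-row lookup. With weatherid as the (indexed) primary
# key, the plan should show an indexed SEARCH rather than a full SCAN.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE weather (
    weatherid INTEGER PRIMARY KEY,
    a TEXT, b TEXT, c TEXT, report TEXT)""")
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM weather WHERE weatherid = 12372"
).fetchall()
detail = plan[0][-1]
print(detail)  # e.g. "SEARCH weather USING INTEGER PRIMARY KEY (rowid=?)"
```

In PostgreSQL the equivalent check is `EXPLAIN SELECT ...`: "Seq Scan on weather" means the index is missing or the stats are stale (run `VACUUM ANALYZE`), while "Index Scan" means the lookup is already indexed.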