Thread: how to avoid repeating expensive computation in select
I have been searching through the docs and mailing list and haven't found a way to do this, so I thought I would ask thecommunity. I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where the resultis needed both as a returned value and as an expression in the WHERE clause. As a simple example, consider the following query on a table with 'id' and 'value' columns, and an expensive computationrepresented as a function: SELECT id, expensivefunc(value) AS score FROM mytable WHERE id LIKE '%z%' AND expensivefunc(value) > 0.5; It would be great if I could find a way to only compute expensivefunc(value) at most once per row, and not at all if theother WHERE constraints are not satisfied. For this simple case I know that I could rewrite the SELECT as something like the following: WITH other_where AS ( SELECT id, value FROM mytable WHERE id LIKE '%z%' ), calc_scores AS ( SELECT id, expensivefunc(value) AS score FROM other_where ) SELECT id, score from calc_scores WHERE score > 0.5; This works in this simple case, but my guess is that it probably adds a lot of overhead (is this true?), and I also haveto deal with much more complicated scenarios with multiple expensive calculations that may not fit into this kind ofrewrite. Does anyone know of a simpler way to accomplish this? For example, it would be great if there were a function that could reference the Nth select list item so it is only computedonce, like: SELECT id, expensivefunc(value) AS score FROM mytable WHERE id LIKE '%z%' AND sel_list_item(2) > 0.5; or if there were temporary variables in the WHERE expressions like: SELECT id, tmp1 AS score FROM mytable WHERE id LIKE '%z%' AND (tmp1 = expensivefunc(value)) > 0.5; Any ideas anyone! Thanks in advance! Bob
One last question. Are there any pitfalls if I roll my own ability to check for duplicate calls? Since I am using my own defined data type, and my own function, I could do this by: 1. in my data type X, adding fields for: a table oid, a row oid, a copy of a reference to the last 2nd argument, and a copy of the last result (in my case a double) 2. in my function (which takes two X args), if the new table/row/ref-2nd-arg match the last data (saved in the first arg), then return the last result, otherwise compute the new result and save this info to use on the next call This should enable only a single computation of the data for any given row as long as the same args are used as parameters. But, is this safe, or have any pitfalls I would need to look out for? Thanks! Bob --- On Thu, 2/3/11, Tom Lane <tgl@sss.pgh.pa.us> wrote: > From: Tom Lane <tgl@sss.pgh.pa.us> > Subject: Re: [GENERAL] how to avoid repeating expensive computation in select > To: "Bob Price" <rjp_email@yahoo.com> > Cc: "Bill Moran" <wmoran@potentialtech.com>, pgsql-general@postgresql.org > Date: Thursday, February 3, 2011, 5:33 PM > Bob Price <rjp_email@yahoo.com> > writes: > > If I set the COST of expensivefunc high, and label it > IMMUTABLE, will the query executor note that the two > invocations to expensivefunc have the same inputs so it can > only call it once and re-use the result the second time? > > No. There is a myth prevalent among certain wishful > thinkers that > IMMUTABLE does something like that, but it doesn't. > IMMUTABLE only > licenses the planner to fold a call *with constant > arguments* into a > constant result, by executing the function once before the > query > actually starts. Textually distinct calls of a > function are not folded > together in any case. > > > regards, tom lane >
On 2011-02-03 18:07, Bob Price wrote: > I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where theresult is needed both as a returned value and as an expression in the WHERE clause. I think I've seen it said here that PG avoids redundant multiple calculations of an expression. Even so, have you thought about using subqueries? > SELECT id, expensivefunc(value) AS score FROM mytable > WHERE id LIKE '%z%' AND expensivefunc(value)> 0.5; SELECT id, expensivefunc(value) FROM ( (SELECT id, value FROM mytable WHERE id LIKE '%z%') ) WHERE expensivefunc(value) > 0.5; or even SELECT id, score FROM ( SELECT id, expensivefunc(value) AS score FROM ( (SELECT id, value FROM mytable WHERE id LIKE '%z%') ) ) WHERE score > 0.5 -- Orhan Kavrakoğlu orhan@tart.com.tr Tart New Media w : http://www.tart.com.tr t : +90 212 263 0 666 / ext: 142 f : TBA a : TBA
On Tue, Mar 1, 2011 at 2:51 AM, Orhan Kavrakoglu <orhan@tart.com.tr> wrote: >> I would like to know if there is a way in PostgreSQL to avoid repeating an >> expensive computation in a SELECT where the result is needed both as a >> returned value and as an expression in the WHERE clause. > > I think I've seen it said here that PG avoids redundant multiple > calculations of an expression. > > Even so, have you thought about using subqueries? > >> SELECT id, expensivefunc(value) AS score FROM mytable >> WHERE id LIKE '%z%' AND expensivefunc(value)> 0.5; > > SELECT id, expensivefunc(value) FROM ( > (SELECT id, value FROM mytable WHERE id LIKE '%z%') > ) WHERE expensivefunc(value) > 0.5; > > or even > > SELECT id, score FROM ( > SELECT id, expensivefunc(value) AS score FROM ( > (SELECT id, value FROM mytable WHERE id LIKE '%z%') > ) > ) WHERE score > 0.5 you missed the point: even when you use subqueries postgres can inline them, 'unsubquerying' your query. I think the OP nailed probably the best and most logical approach -- use a CTE. It's more formal, and while not super efficient today, isn't terrible. merlin