Thread: how to avoid repeating expensive computation in select

how to avoid repeating expensive computation in select

From

Bob Price

Date:

03 February 2011, 18:35:39

I have been searching through the docs and mailing list and haven't found a way to do this, so I thought I would ask
thecommunity. 

I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where the
resultis needed both as a returned value and as an expression in the WHERE clause. 

As a simple example, consider the following query on a table with 'id' and 'value' columns, and an expensive
computationrepresented as a function: 

  SELECT id, expensivefunc(value) AS score FROM mytable
     WHERE id LIKE '%z%' AND expensivefunc(value) > 0.5;

It would be great if I could find a way to only compute expensivefunc(value) at most once per row, and not at all if
theother WHERE constraints are not satisfied. 

For this simple case I know that I could rewrite the SELECT as something like the following:

WITH other_where AS (
    SELECT id, value FROM mytable WHERE id LIKE '%z%'
  ), calc_scores AS (
    SELECT id, expensivefunc(value) AS score FROM other_where
  )
SELECT id, score from calc_scores WHERE score > 0.5;

This works in this simple case, but my guess is that it probably adds a lot of overhead (is this true?), and I also
haveto deal with much more complicated scenarios with multiple expensive calculations that may not fit into this kind
ofrewrite. 

Does anyone know of a simpler way to accomplish this?

For example, it would be great if there were a function that could reference the Nth select list item so it is only
computedonce, like: 

  SELECT id, expensivefunc(value) AS score FROM mytable
     WHERE id LIKE '%z%' AND sel_list_item(2) > 0.5;

or if there were temporary variables in the WHERE expressions like:

  SELECT id, tmp1 AS score FROM mytable
     WHERE id LIKE '%z%' AND (tmp1 = expensivefunc(value)) > 0.5;

Any ideas anyone!

Thanks in advance!
Bob

Re: how to avoid repeating expensive computation in select

From

Bob Price

Date:

04 February 2011, 15:08:31

One last question.  Are there any pitfalls if I roll my own ability to check for duplicate calls?

Since I am using my own defined data type, and my own function, I could
do this by:

1. in my data type X, adding fields for: a table oid, a row oid, a copy
of a reference to the last 2nd argument, and a copy of the last result
(in my case a double)

2. in my function (which takes two X args), if the new
table/row/ref-2nd-arg match the last data (saved in the first arg), then
return the last result, otherwise compute the new result and save this
info to use on the next call

This should enable only a single computation of the data for any given
row as long as the same args are used as parameters.

But, is this safe, or have any pitfalls I would need to look out for?

Thanks!
Bob

--- On Thu, 2/3/11, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> From: Tom Lane <tgl@sss.pgh.pa.us>
> Subject: Re: [GENERAL] how to avoid repeating expensive computation in select
> To: "Bob Price" <rjp_email@yahoo.com>
> Cc: "Bill Moran" <wmoran@potentialtech.com>, pgsql-general@postgresql.org
> Date: Thursday, February 3, 2011, 5:33 PM
> Bob Price <rjp_email@yahoo.com>
> writes:
> > If I set the COST of expensivefunc high, and label it
> IMMUTABLE, will the query executor note that the two
> invocations to expensivefunc have the same inputs so it can
> only call it once and re-use the result the second time?
>
> No.  There is a myth prevalent among certain wishful
> thinkers that
> IMMUTABLE does something like that, but it doesn't.
> IMMUTABLE only
> licenses the planner to fold a call *with constant
> arguments* into a
> constant result, by executing the function once before the
> query
> actually starts.  Textually distinct calls of a
> function are not folded
> together in any case.
>
>
> regards, tom lane
>

Re: how to avoid repeating expensive computation in select

From

Orhan Kavrakoglu

Date:

01 March 2011, 07:57:55

On 2011-02-03 18:07, Bob Price wrote:
> I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where
theresult is needed both as a returned value and as an expression in the WHERE clause. 

I think I've seen it said here that PG avoids redundant multiple
calculations of an expression.

Even so, have you thought about using subqueries?

>    SELECT id, expensivefunc(value) AS score FROM mytable
>       WHERE id LIKE '%z%' AND expensivefunc(value)>  0.5;

SELECT id, expensivefunc(value) FROM (
(SELECT id, value FROM mytable WHERE id LIKE '%z%')
) WHERE expensivefunc(value) > 0.5;

or even

SELECT id, score FROM (
SELECT id, expensivefunc(value) AS score FROM (
(SELECT id, value FROM mytable WHERE id LIKE '%z%')
)
) WHERE score > 0.5

--
Orhan Kavrakoğlu
orhan@tart.com.tr

Tart New Media
w : http://www.tart.com.tr
t : +90 212 263 0 666 / ext: 142
f : TBA
a : TBA

Re: how to avoid repeating expensive computation in select

From

Merlin Moncure

Date:

02 March 2011, 22:50:33

On Tue, Mar 1, 2011 at 2:51 AM, Orhan Kavrakoglu <orhan@tart.com.tr> wrote:
>> I would like to know if there is a way in PostgreSQL to avoid repeating an
>> expensive computation in a SELECT where the result is needed both as a
>> returned value and as an expression in the WHERE clause.
>
> I think I've seen it said here that PG avoids redundant multiple
> calculations of an expression.
>
> Even so, have you thought about using subqueries?
>
>>   SELECT id, expensivefunc(value) AS score FROM mytable
>>      WHERE id LIKE '%z%' AND expensivefunc(value)>  0.5;
>
> SELECT id, expensivefunc(value) FROM (
> (SELECT id, value FROM mytable WHERE id LIKE '%z%')
> ) WHERE expensivefunc(value) > 0.5;
>
> or even
>
> SELECT id, score FROM (
> SELECT id, expensivefunc(value) AS score FROM (
> (SELECT id, value FROM mytable WHERE id LIKE '%z%')
> )
> ) WHERE score > 0.5

you missed the point: even when you use subqueries postgres can inline
them, 'unsubquerying' your query.  I think the OP nailed probably the
best and most logical approach -- use a CTE.  It's more formal, and
while not super efficient today, isn't terrible.

merlin