Thread: how to avoid repeating expensive computation in select

how to avoid repeating expensive computation in select

From
Bob Price
Date:
I have been searching through the docs and mailing list and haven't found a way to do this, so I thought I would ask
thecommunity. 

I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where the
resultis needed both as a returned value and as an expression in the WHERE clause. 

As a simple example, consider the following query on a table with 'id' and 'value' columns, and an expensive
computationrepresented as a function: 

  SELECT id, expensivefunc(value) AS score FROM mytable
     WHERE id LIKE '%z%' AND expensivefunc(value) > 0.5;

It would be great if I could find a way to only compute expensivefunc(value) at most once per row, and not at all if
theother WHERE constraints are not satisfied. 

For this simple case I know that I could rewrite the SELECT as something like the following:

WITH other_where AS (
    SELECT id, value FROM mytable WHERE id LIKE '%z%'
  ), calc_scores AS (
    SELECT id, expensivefunc(value) AS score FROM other_where
  )
SELECT id, score from calc_scores WHERE score > 0.5;

This works in this simple case, but my guess is that it probably adds a lot of overhead (is this true?), and I also
haveto deal with much more complicated scenarios with multiple expensive calculations that may not fit into this kind
ofrewrite. 

Does anyone know of a simpler way to accomplish this?

For example, it would be great if there were a function that could reference the Nth select list item so it is only
computedonce, like: 

  SELECT id, expensivefunc(value) AS score FROM mytable
     WHERE id LIKE '%z%' AND sel_list_item(2) > 0.5;

or if there were temporary variables in the WHERE expressions like:

  SELECT id, tmp1 AS score FROM mytable
     WHERE id LIKE '%z%' AND (tmp1 = expensivefunc(value)) > 0.5;

Any ideas anyone!

Thanks in advance!
Bob





Re: how to avoid repeating expensive computation in select

From
Bill Moran
Date:
In response to Bob Price <rjp_email@yahoo.com>:

> I have been searching through the docs and mailing list and haven't found a way to do this, so I thought I would ask
thecommunity. 
>
> I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where
theresult is needed both as a returned value and as an expression in the WHERE clause. 
>
> As a simple example, consider the following query on a table with 'id' and 'value' columns, and an expensive
computationrepresented as a function: 
>
>   SELECT id, expensivefunc(value) AS score FROM mytable
>      WHERE id LIKE '%z%' AND expensivefunc(value) > 0.5;
>
> It would be great if I could find a way to only compute expensivefunc(value) at most once per row, and not at all if
theother WHERE constraints are not satisfied. 

Two ways that I can think of:
1) If expensivefunc() doesn't have any side-effects, you can create it
   as IMMUTABLE, which tells PostgreSQL that it can cache the result
   for optimization purposes.  IMMUTABLE is not the default.
2) Create a new column in the table that stores the value of
   expensivefunc(value) and add a trigger to the table to ensure that
   column is updated any time value is changed.  This will slow down
   inserts and updates a bit, but it means you can select/compare the
   generated column directly with no calculation.

Which one of these is more practical for you depends on a number of
factors about the table, the data, and the function.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: how to avoid repeating expensive computation in select

From
"David Johnston"
Date:
Is using a pl/pgsql function a viable option.  Within or without the use of
a function you can create a temporary table to hold the needed intermediate
results.  You can even use a permanent working table and write functions to
perform the needed queries against it.

Especially for expensive calculation you want to consider whether it is
safe/reasonable to pre-calculate and store values instead of running the
calculation during each query.

If you need procedural language capabilities (variables, multiple uses of
the same data) trying to work out a solution in pure transactional SQL can
be difficult or outright impossible; you really need to use the procedural
facilities built into the server OR your application environment.  In other
words put down the hammer and go find yourself a chainsaw :)

David J

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Bob Price
Sent: Thursday, February 03, 2011 12:18 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] how to avoid repeating expensive computation in select

I have been searching through the docs and mailing list and haven't found a
way to do this, so I thought I would ask the community.

I would like to know if there is a way in PostgreSQL to avoid repeating an
expensive computation in a SELECT where the result is needed both as a
returned value and as an expression in the WHERE clause.

As a simple example, consider the following query on a table with 'id' and
'value' columns, and an expensive computation represented as a function:

  SELECT id, expensivefunc(value) AS score FROM mytable
     WHERE id LIKE '%z%' AND expensivefunc(value) > 0.5;

It would be great if I could find a way to only compute expensivefunc(value)
at most once per row, and not at all if the other WHERE constraints are not
satisfied.

For this simple case I know that I could rewrite the SELECT as something
like the following:

WITH other_where AS (
    SELECT id, value FROM mytable WHERE id LIKE '%z%'
  ), calc_scores AS (
    SELECT id, expensivefunc(value) AS score FROM other_where
  )
SELECT id, score from calc_scores WHERE score > 0.5;

This works in this simple case, but my guess is that it probably adds a lot
of overhead (is this true?), and I also have to deal with much more
complicated scenarios with multiple expensive calculations that may not fit
into this kind of rewrite.

Does anyone know of a simpler way to accomplish this?

For example, it would be great if there were a function that could reference
the Nth select list item so it is only computed once, like:

  SELECT id, expensivefunc(value) AS score FROM mytable
     WHERE id LIKE '%z%' AND sel_list_item(2) > 0.5;

or if there were temporary variables in the WHERE expressions like:

  SELECT id, tmp1 AS score FROM mytable
     WHERE id LIKE '%z%' AND (tmp1 = expensivefunc(value)) > 0.5;

Any ideas anyone!

Thanks in advance!
Bob





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: how to avoid repeating expensive computation in select

From
Tom Lane
Date:
Bob Price <rjp_email@yahoo.com> writes:
> I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where
theresult is needed both as a returned value and as an expression in the WHERE clause. 

Use a subselect.  You might need OFFSET 0 to prevent the planner from
"flattening" the subselect, eg

    SELECT whatever FROM
      (SELECT *, expensivefunc(value) AS score FROM mytable OFFSET 0) ss
    WHERE id LIKE '%z%' AND score > 0.5;

Keep in mind that in the above formulation, expensivefunc will be
evaluated at rows that don't pass the LIKE test.  So you probably want
to push down as much as you can into the sub-select's WHERE clause.
The planner will not help you with that if you put in the OFFSET 0
optimization-fence.  It's a good idea to use EXPLAIN (or even better
EXPLAIN VERBOSE, if you're using >= 8.4) to confirm that you're getting
the plan you want.

            regards, tom lane

Re: how to avoid repeating expensive computation in select

From
Pavel Stehule
Date:
Hello

2011/2/3 Tom Lane <tgl@sss.pgh.pa.us>:
> Bob Price <rjp_email@yahoo.com> writes:
>> I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where
theresult is needed both as a returned value and as an expression in the WHERE clause. 
>
> Use a subselect.  You might need OFFSET 0 to prevent the planner from
> "flattening" the subselect, eg
>
>    SELECT whatever FROM
>      (SELECT *, expensivefunc(value) AS score FROM mytable OFFSET 0) ss
>    WHERE id LIKE '%z%' AND score > 0.5;
>
> Keep in mind that in the above formulation, expensivefunc will be
> evaluated at rows that don't pass the LIKE test.  So you probably want
> to push down as much as you can into the sub-select's WHERE clause.
> The planner will not help you with that if you put in the OFFSET 0
> optimization-fence.  It's a good idea to use EXPLAIN (or even better
> EXPLAIN VERBOSE, if you're using >= 8.4) to confirm that you're getting
> the plan you want.

What about to increase a COST value? Can it help?

Regards

Pavel


>
>                        regards, tom lane
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: how to avoid repeating expensive computation in select

From
Bob Price
Date:
Thanks for all of the responses.

If the IMMUTABLE flag on a function does what the docs say then it might satisfy my needs.

Here is a more specific example of what I need to do, with a new custom data type and a new function:

- define new complex data type X
- create table mytable ( id varchar, value X )
- create function expensivefunc(X,X) which is implemented in C
- select id, expensivefunc(value, 'constantdata...'::X) as score
   from mytable where expensivefunc(value, 'constantdata...'::X) > 0.5;

If I set the COST of expensivefunc high, and label it IMMUTABLE, will the query executor note that the two invocations
toexpensivefunc have the same inputs so it can only call it once and re-use the result the second time? 

I imagine that it might be a problem to pass 'constantdata...'::X to both invocations.  I guess that I could create a
one-timeuse function that declared a variable with this 'constantdata...'::X value, and then pass this variable in both
calls.

Would this work?

Thanks again!
Bob



--- On Thu, 2/3/11, Bill Moran <wmoran@potentialtech.com> wrote:

> From: Bill Moran <wmoran@potentialtech.com>
> Subject: Re: [GENERAL] how to avoid repeating expensive computation in select
> To: "Bob Price" <rjp_email@yahoo.com>
> Cc: pgsql-general@postgresql.org
> Date: Thursday, February 3, 2011, 12:23 PM
> In response to Bob Price <rjp_email@yahoo.com>:
>
> > I have been searching through the docs and mailing
> list and haven't found a way to do this, so I thought I
> would ask the community.
> >
> > I would like to know if there is a way in PostgreSQL
> to avoid repeating an expensive computation in a SELECT
> where the result is needed both as a returned value and as
> an expression in the WHERE clause.
> >
> > As a simple example, consider the following query on a
> table with 'id' and 'value' columns, and an expensive
> computation represented as a function:
> >
> >   SELECT id, expensivefunc(value) AS
> score FROM mytable
> >      WHERE id LIKE '%z%' AND
> expensivefunc(value) > 0.5;
> >
> > It would be great if I could find a way to only
> compute expensivefunc(value) at most once per row, and not
> at all if the other WHERE constraints are not satisfied.
>
> Two ways that I can think of:
> 1) If expensivefunc() doesn't have any side-effects, you
> can create it
>    as IMMUTABLE, which tells PostgreSQL that
> it can cache the result
>    for optimization purposes.
> IMMUTABLE is not the default.
> 2) Create a new column in the table that stores the value
> of
>    expensivefunc(value) and add a trigger to
> the table to ensure that
>    column is updated any time value is
> changed.  This will slow down
>    inserts and updates a bit, but it means
> you can select/compare the
>    generated column directly with no
> calculation.
>
> Which one of these is more practical for you depends on a
> number of
> factors about the table, the data, and the function.
>
> --
> Bill Moran
> http://www.potentialtech.com
> http://people.collaborativefusion.com/~wmoran/
>




Re: how to avoid repeating expensive computation in select

From
Nicklas Avén
Date:
That's interesting Tom.


This is a discussion coming up sometimes at PostGIS lists since PostGIS
often handles "expensive" calculations.

Regina wrote a blog post about it:
http://postgresonline.com/journal/archives/113-How-to-force-PostgreSQL-to-use-a-pre-calculated-value.html

I thought the "offset 0" trick was just a dirty hack, but coming from
you, Tom, I assume it is a robust way of doing it.

I also tried some of the queries we tried then, almost 2 years ago, and
I think it seems like PostgreSQL handles this much better in 9.0. Is
that possible?
What was strange then was that PostGIS functions marked immutable also
was recalculated between Select and where-clause and also if used
multiple times in the Select part.

But I think (from very few tests) that the result of the function was
reused in a better way now.

Can that be the case?

Thanks

Nicklas Avén


On Thu, 2011-02-03 at 13:16 -0500, Tom Lane wrote:
> Bob Price <rjp_email@yahoo.com> writes:
> > I would like to know if there is a way in PostgreSQL to avoid repeating an expensive computation in a SELECT where
theresult is needed both as a returned value and as an expression in the WHERE clause. 
>
> Use a subselect.  You might need OFFSET 0 to prevent the planner from
> "flattening" the subselect, eg
>
>     SELECT whatever FROM
>       (SELECT *, expensivefunc(value) AS score FROM mytable OFFSET 0) ss
>     WHERE id LIKE '%z%' AND score > 0.5;
>
> Keep in mind that in the above formulation, expensivefunc will be
> evaluated at rows that don't pass the LIKE test.  So you probably want
> to push down as much as you can into the sub-select's WHERE clause.
> The planner will not help you with that if you put in the OFFSET 0
> optimization-fence.  It's a good idea to use EXPLAIN (or even better
> EXPLAIN VERBOSE, if you're using >= 8.4) to confirm that you're getting
> the plan you want.
>
>             regards, tom lane
>



Re: how to avoid repeating expensive computation in select

From
Tom Lane
Date:
Bob Price <rjp_email@yahoo.com> writes:
> If I set the COST of expensivefunc high, and label it IMMUTABLE, will the query executor note that the two
invocationsto expensivefunc have the same inputs so it can only call it once and re-use the result the second time? 

No.  There is a myth prevalent among certain wishful thinkers that
IMMUTABLE does something like that, but it doesn't.  IMMUTABLE only
licenses the planner to fold a call *with constant arguments* into a
constant result, by executing the function once before the query
actually starts.  Textually distinct calls of a function are not folded
together in any case.

            regards, tom lane

Re: how to avoid repeating expensive computation in select

From
Tom Lane
Date:
Nicklas =?ISO-8859-1?Q?Av=E9n?= <nicklas.aven@jordogskog.no> writes:
> I thought the "offset 0" trick was just a dirty hack, but coming from
> you, Tom, I assume it is a robust way of doing it.

Well, I can't deny it's a dirty hack ... but it's not something we'll
break until we have a better solution.

> I also tried some of the queries we tried then, almost 2 years ago, and
> I think it seems like PostgreSQL handles this much better in 9.0. Is
> that possible?

That observation is too vague to comment on.  There are surely things
that are better in 9.0, but we haven't done anything lately that would
be likely to reduce the number of calls to a user-defined function per
se.

            regards, tom lane