Thread: using calculated column in where-clause

using calculated column in where-clause

From
Patrick Scharrenberg
Date:
Hi!

I'd like to do some calculation with values from the table, show them a
new column and use the values in a where-clause.

Something like this
select a, b , a*b as c from ta where c=2;

But postgresql complains, that column "c" does not exist.

Do I have to repeat the calculation (which might be even more complex
:-) ) in the "where"-clause, or is there a better way?


Thanks in advance.

Best regards
Patrick


Re: using calculated column in where-clause

From
Andreas Kretschmer
Date:
Patrick Scharrenberg <pittipatti@web.de> schrieb:
> Something like this
> select a, b , a*b as c from ta where c=2;
> 
> But postgresql complains, that column "c" does not exist.
> 
> Do I have to repeat the calculation (which might be even more complex

yes.


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°


Re: using calculated column in where-clause

From
"Scott Marlowe"
Date:
On Tue, Jun 17, 2008 at 2:46 PM, Patrick Scharrenberg <pittipatti@web.de> wrote:
> Hi!
>
> I'd like to do some calculation with values from the table, show them a
> new column and use the values in a where-clause.
>
> Something like this
> select a, b , a*b as c from ta where c=2;
>
> But postgresql complains, that column "c" does not exist.
>
> Do I have to repeat the calculation (which might be even more complex
> :-) ) in the "where"-clause, or is there a better way?

Pretty much yes.   Trying to do tricks using subselects may result in
substandard performing query plans.  You can always do something like:

select * from (select a,b,a*b as c) as z where z.c = 2 but if it's
slower don't blame me.

The nice thing here is that you can index on that function, which is
the real issue with performance, since otherwise you'll likely see a
sequential scan every time.

create index ta_atimesb on ta ((a*b));

and from then on the query should run pretty fast. That's really more
important than if you have to put it twice on the same query line.


Re: using calculated column in where-clause

From
Patrick Scharrenberg
Date:
Andreas Kretschmer wrote:

>> Do I have to repeat the calculation (which might be even more complex
> yes.

Short and pregnant! :-)

Thanks!
Patrick


Re: using calculated column in where-clause

From
"Fernando Hevia"
Date:
> -----Mensaje original-----
> De: pgsql-sql-owner@postgresql.org 
> [mailto:pgsql-sql-owner@postgresql.org] En nombre de Patrick 
> Scharrenberg
> Enviado el: Martes, 17 de Junio de 2008 17:46
> Para: pgsql-sql@postgresql.org
> Asunto: [SQL] using calculated column in where-clause
> 
> Hi!
> 
> I'd like to do some calculation with values from the table, 
> show them a new column and use the values in a where-clause.
> 
> Something like this
> select a, b , a*b as c from ta where c=2;
> 
> But postgresql complains, that column "c" does not exist.
> 
> Do I have to repeat the calculation (which might be even more complex
> :-) ) in the "where"-clause, or is there a better way?
> 

For complex calculations I have obtained better performance using nested
queries. For example:

select a, b, c select   ( select a, b, a*b as c from ta) subquery1
where c = 2;

This nesting is probably overhead in such a simple case as this, but in more
complex ones and specially with volatile functions it will provide an
improvement.

Regards,
Fernando.



Re: using calculated column in where-clause

From
"Scott Marlowe"
Date:
On Wed, Jun 18, 2008 at 1:35 PM, Fernando Hevia <fhevia@ip-tel.com.ar> wrote:
>
>> -----Mensaje original-----
>> De: pgsql-sql-owner@postgresql.org
>> [mailto:pgsql-sql-owner@postgresql.org] En nombre de Patrick
>> Scharrenberg
>> Enviado el: Martes, 17 de Junio de 2008 17:46
>> Para: pgsql-sql@postgresql.org
>> Asunto: [SQL] using calculated column in where-clause
>>
>> Hi!
>>
>> I'd like to do some calculation with values from the table,
>> show them a new column and use the values in a where-clause.
>>
>> Something like this
>> select a, b , a*b as c from ta where c=2;
>>
>> But postgresql complains, that column "c" does not exist.
>>
>> Do I have to repeat the calculation (which might be even more complex
>> :-) ) in the "where"-clause, or is there a better way?
>>
>
> For complex calculations I have obtained better performance using nested
> queries. For example:
>
> select a, b, c select
>   ( select a, b, a*b as c from ta) subquery1
> where c = 2;
>
> This nesting is probably overhead in such a simple case as this, but in more
> complex ones and specially with volatile functions it will provide an
> improvement.

I was under the impresion from previous discussions that the query
planner flattened these out to be the same query.  Do you get
different query plans when you re-arrange this way?


Re: using calculated column in where-clause

From
"Fernando Hevia"
Date:
> -----Mensaje original-----
> De: Scott Marlowe [mailto:scott.marlowe@gmail.com]
> Enviado el: Miércoles, 18 de Junio de 2008 17:47
> Para: Fernando Hevia

> >
> > For complex calculations I have obtained better performance using
> > nested queries. For example:
> >
> > select a, b, c select
> >   ( select a, b, a*b as c from ta) subquery1 where c = 2;
> >
> > This nesting is probably overhead in such a simple case as
> this, but
> > in more complex ones and specially with volatile functions it will
> > provide an improvement.
>
> I was under the impresion from previous discussions that the
> query planner flattened these out to be the same query.  Do
> you get different query plans when you re-arrange this way?
>

Take a look at this example (tried on 8.2.7 & 8.1.11):

create or replace function test(p1 integer, p2 integer) returns integer[] as
$BODY$
declare  retval   integer[];
begin  raise info 'called test(%, %)', p1, p2;  retval[0] = p1 + p2;  retval[1] = p1 * p2;  retval[2] = p1 - p2;
returnretval; 
end;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;

-- In this case function test is called three times:
pg=# select (test(1, 2))[0] as sum, (test(1, 2))[1] as prod, (test(1, 2))[2]
as dif;
INFO:  called test(1, 2)
INFO:  called test(1, 2)
INFO:  called test(1, 2)sum | prod | dif
-----+------+-----  3 |    2 |  -1
(1 row)


-- In this case function test is called only once:
pg=# select res[0] as sum, res[1] as prod, res[2] as dif from
pg-# (select (test(1, 2))::integer[] as res) t ;
INFO:  called test(1, 2)sum | prod | dif
-----+------+-----  3 |    2 |  -1
(1 row)

I assume the second form will perform better since test is being called only
once.
I might be missing something in this assumption but at first glance it seems
pretty straightforward.

Regards,
Fernando.



Re: using calculated column in where-clause

From
"Fernando Hevia"
Date:
> > -----Mensaje original-----
> > De: Scott Marlowe [mailto:scott.marlowe@gmail.com] Enviado el:
> > Miércoles, 18 de Junio de 2008 17:47
> > Para: Fernando Hevia
>
> > >
> > > For complex calculations I have obtained better performance using
> > > nested queries. For example:
> > >
> > > select a, b, c select
> > >   ( select a, b, a*b as c from ta) subquery1 where c = 2;
> > >
> > > This nesting is probably overhead in such a simple case as
> > this, but
> > > in more complex ones and specially with volatile
> functions it will
> > > provide an improvement.
> >
> > I was under the impresion from previous discussions that the query
> > planner flattened these out to be the same query.  Do you get
> > different query plans when you re-arrange this way?
> >
>
> Take a look at this example (tried on 8.2.7 & 8.1.11):
>
> create or replace function test(p1 integer, p2 integer)
> returns integer[] as $BODY$ declare
>    retval   integer[];
> begin
>    raise info 'called test(%, %)', p1, p2;
>    retval[0] = p1 + p2;
>    retval[1] = p1 * p2;
>    retval[2] = p1 - p2;
>    return retval;
> end;
> $BODY$
> LANGUAGE 'plpgsql' IMMUTABLE;
>
> -- In this case function test is called three times:
> pg=# select (test(1, 2))[0] as sum, (test(1, 2))[1] as prod,
> (test(1, 2))[2] as dif;
> INFO:  called test(1, 2)
> INFO:  called test(1, 2)
> INFO:  called test(1, 2)
>  sum | prod | dif
> -----+------+-----
>    3 |    2 |  -1
> (1 row)
>
>
> -- In this case function test is called only once:
> pg=# select res[0] as sum, res[1] as prod, res[2] as dif from
> pg-# (select (test(1, 2))::integer[] as res) t ;
> INFO:  called test(1, 2)
>  sum | prod | dif
> -----+------+-----
>    3 |    2 |  -1
> (1 row)
>
> I assume the second form will perform better since test is
> being called only once.
> I might be missing something in this assumption but at first
> glance it seems pretty straightforward.
>
> Regards,
> Fernando.
>
--Follow up--

When I use one of the inner-query columns as a condition for the outer-query
the function is being called again:

pg=# select res[0] as sum, res[1] as prod, res[2] as dif from
pg-# (select (test(1, 2))::integer[] as res) t
pg-# where res[0] = 3;
INFO:  called test(1, 2)
INFO:  called test(1, 2)sum | prod | dif
-----+------+-----  3 |    2 |  -1
(1 row)

Seems this blows away my theory, at least part of it.



Re: using calculated column in where-clause

From
Tom Lane
Date:
"Fernando Hevia" <fhevia@ip-tel.com.ar> writes:
> -- In this case function test is called only once:
> pg=# select res[0] as sum, res[1] as prod, res[2] as dif from
> pg-# (select (test(1, 2))::integer[] as res) t ;

That's an implementation artifact, not a guaranteed behavior;
if you change the example a bit you'll find multiple calls
happening.

(In recent releases you'd actually have a better chance of
not having multiple calls if you'd declared it volatile
instead of immutable.)
        regards, tom lane