Thread: using calculated column in where-clause
Hi! I'd like to do some calculation with values from the table, show them a new column and use the values in a where-clause. Something like this select a, b , a*b as c from ta where c=2; But postgresql complains, that column "c" does not exist. Do I have to repeat the calculation (which might be even more complex :-) ) in the "where"-clause, or is there a better way? Thanks in advance. Best regards Patrick
Patrick Scharrenberg <pittipatti@web.de> schrieb: > Something like this > select a, b , a*b as c from ta where c=2; > > But postgresql complains, that column "c" does not exist. > > Do I have to repeat the calculation (which might be even more complex yes. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
On Tue, Jun 17, 2008 at 2:46 PM, Patrick Scharrenberg <pittipatti@web.de> wrote: > Hi! > > I'd like to do some calculation with values from the table, show them a > new column and use the values in a where-clause. > > Something like this > select a, b , a*b as c from ta where c=2; > > But postgresql complains, that column "c" does not exist. > > Do I have to repeat the calculation (which might be even more complex > :-) ) in the "where"-clause, or is there a better way? Pretty much yes. Trying to do tricks using subselects may result in substandard performing query plans. You can always do something like: select * from (select a,b,a*b as c) as z where z.c = 2 but if it's slower don't blame me. The nice thing here is that you can index on that function, which is the real issue with performance, since otherwise you'll likely see a sequential scan every time. create index ta_atimesb on ta ((a*b)); and from then on the query should run pretty fast. That's really more important than if you have to put it twice on the same query line.
Andreas Kretschmer wrote: >> Do I have to repeat the calculation (which might be even more complex > yes. Short and pregnant! :-) Thanks! Patrick
> -----Mensaje original----- > De: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org] En nombre de Patrick > Scharrenberg > Enviado el: Martes, 17 de Junio de 2008 17:46 > Para: pgsql-sql@postgresql.org > Asunto: [SQL] using calculated column in where-clause > > Hi! > > I'd like to do some calculation with values from the table, > show them a new column and use the values in a where-clause. > > Something like this > select a, b , a*b as c from ta where c=2; > > But postgresql complains, that column "c" does not exist. > > Do I have to repeat the calculation (which might be even more complex > :-) ) in the "where"-clause, or is there a better way? > For complex calculations I have obtained better performance using nested queries. For example: select a, b, c select ( select a, b, a*b as c from ta) subquery1 where c = 2; This nesting is probably overhead in such a simple case as this, but in more complex ones and specially with volatile functions it will provide an improvement. Regards, Fernando.
On Wed, Jun 18, 2008 at 1:35 PM, Fernando Hevia <fhevia@ip-tel.com.ar> wrote: > >> -----Mensaje original----- >> De: pgsql-sql-owner@postgresql.org >> [mailto:pgsql-sql-owner@postgresql.org] En nombre de Patrick >> Scharrenberg >> Enviado el: Martes, 17 de Junio de 2008 17:46 >> Para: pgsql-sql@postgresql.org >> Asunto: [SQL] using calculated column in where-clause >> >> Hi! >> >> I'd like to do some calculation with values from the table, >> show them a new column and use the values in a where-clause. >> >> Something like this >> select a, b , a*b as c from ta where c=2; >> >> But postgresql complains, that column "c" does not exist. >> >> Do I have to repeat the calculation (which might be even more complex >> :-) ) in the "where"-clause, or is there a better way? >> > > For complex calculations I have obtained better performance using nested > queries. For example: > > select a, b, c select > ( select a, b, a*b as c from ta) subquery1 > where c = 2; > > This nesting is probably overhead in such a simple case as this, but in more > complex ones and specially with volatile functions it will provide an > improvement. I was under the impresion from previous discussions that the query planner flattened these out to be the same query. Do you get different query plans when you re-arrange this way?
> -----Mensaje original----- > De: Scott Marlowe [mailto:scott.marlowe@gmail.com] > Enviado el: Miércoles, 18 de Junio de 2008 17:47 > Para: Fernando Hevia > > > > For complex calculations I have obtained better performance using > > nested queries. For example: > > > > select a, b, c select > > ( select a, b, a*b as c from ta) subquery1 where c = 2; > > > > This nesting is probably overhead in such a simple case as > this, but > > in more complex ones and specially with volatile functions it will > > provide an improvement. > > I was under the impresion from previous discussions that the > query planner flattened these out to be the same query. Do > you get different query plans when you re-arrange this way? > Take a look at this example (tried on 8.2.7 & 8.1.11): create or replace function test(p1 integer, p2 integer) returns integer[] as $BODY$ declare retval integer[]; begin raise info 'called test(%, %)', p1, p2; retval[0] = p1 + p2; retval[1] = p1 * p2; retval[2] = p1 - p2; returnretval; end; $BODY$ LANGUAGE 'plpgsql' IMMUTABLE; -- In this case function test is called three times: pg=# select (test(1, 2))[0] as sum, (test(1, 2))[1] as prod, (test(1, 2))[2] as dif; INFO: called test(1, 2) INFO: called test(1, 2) INFO: called test(1, 2)sum | prod | dif -----+------+----- 3 | 2 | -1 (1 row) -- In this case function test is called only once: pg=# select res[0] as sum, res[1] as prod, res[2] as dif from pg-# (select (test(1, 2))::integer[] as res) t ; INFO: called test(1, 2)sum | prod | dif -----+------+----- 3 | 2 | -1 (1 row) I assume the second form will perform better since test is being called only once. I might be missing something in this assumption but at first glance it seems pretty straightforward. Regards, Fernando.
> > -----Mensaje original----- > > De: Scott Marlowe [mailto:scott.marlowe@gmail.com] Enviado el: > > Miércoles, 18 de Junio de 2008 17:47 > > Para: Fernando Hevia > > > > > > > For complex calculations I have obtained better performance using > > > nested queries. For example: > > > > > > select a, b, c select > > > ( select a, b, a*b as c from ta) subquery1 where c = 2; > > > > > > This nesting is probably overhead in such a simple case as > > this, but > > > in more complex ones and specially with volatile > functions it will > > > provide an improvement. > > > > I was under the impresion from previous discussions that the query > > planner flattened these out to be the same query. Do you get > > different query plans when you re-arrange this way? > > > > Take a look at this example (tried on 8.2.7 & 8.1.11): > > create or replace function test(p1 integer, p2 integer) > returns integer[] as $BODY$ declare > retval integer[]; > begin > raise info 'called test(%, %)', p1, p2; > retval[0] = p1 + p2; > retval[1] = p1 * p2; > retval[2] = p1 - p2; > return retval; > end; > $BODY$ > LANGUAGE 'plpgsql' IMMUTABLE; > > -- In this case function test is called three times: > pg=# select (test(1, 2))[0] as sum, (test(1, 2))[1] as prod, > (test(1, 2))[2] as dif; > INFO: called test(1, 2) > INFO: called test(1, 2) > INFO: called test(1, 2) > sum | prod | dif > -----+------+----- > 3 | 2 | -1 > (1 row) > > > -- In this case function test is called only once: > pg=# select res[0] as sum, res[1] as prod, res[2] as dif from > pg-# (select (test(1, 2))::integer[] as res) t ; > INFO: called test(1, 2) > sum | prod | dif > -----+------+----- > 3 | 2 | -1 > (1 row) > > I assume the second form will perform better since test is > being called only once. > I might be missing something in this assumption but at first > glance it seems pretty straightforward. > > Regards, > Fernando. > --Follow up-- When I use one of the inner-query columns as a condition for the outer-query the function is being called again: pg=# select res[0] as sum, res[1] as prod, res[2] as dif from pg-# (select (test(1, 2))::integer[] as res) t pg-# where res[0] = 3; INFO: called test(1, 2) INFO: called test(1, 2)sum | prod | dif -----+------+----- 3 | 2 | -1 (1 row) Seems this blows away my theory, at least part of it.
"Fernando Hevia" <fhevia@ip-tel.com.ar> writes: > -- In this case function test is called only once: > pg=# select res[0] as sum, res[1] as prod, res[2] as dif from > pg-# (select (test(1, 2))::integer[] as res) t ; That's an implementation artifact, not a guaranteed behavior; if you change the example a bit you'll find multiple calls happening. (In recent releases you'd actually have a better chance of not having multiple calls if you'd declared it volatile instead of immutable.) regards, tom lane