Thread: Query optimization with repeated calculations

Query optimization with repeated calculations

From
Robert McGehee
Date:
Hello,
I have a general query optimization question involving repeated calculations.

I frequently want to make views that repeat a calculation over and over again in lots of columns. In the example below, let’s say it’s (a+b), but we can imagine the calculation being much more complicated.

For example:
CREATE VIEW AS
SELECT (a+b)*c as c1, (a+b)*d as d1, (a+b)*e as e1
FROM table;

My question is, is PostgreSQL "smart" enough to cache this (a+b) calculation in the above example, or does it recalculate it in each column? If it recalculates it in each column, would I generally get better performance by using a subquery to convince PostgreSQL to cache the result?  For example:

CREATE VIEW AS
SELECT x*c as c1, x*d as d1, x*e as e1
FROM (SELECT (a+b) as x, * FROM table) x;

Or perhaps I could use a CTE? In some instances, I might need 2 or more subqueries to “cache” all the calculations (if one column depends on another column that depends on another column), and I feel that sometimes Postgres makes better optimization decisions when there are not subqueries involved, but I am not sure.

Obviously I could benchmark everything every single time this comes up, but there is overhead in that and some kind of rule of thumb or general guideline would be helpful as it is a frequent question for me.

Thank you!
Robert

Re: Query optimization with repeated calculations

From
Tom Lane
Date:
Robert McGehee <rmcgehee@gmail.com> writes:
> I frequently want to make views that repeat a calculation over and over
> again in lots of columns. In the example below, let’s say it’s (a+b), but
> we can imagine the calculation being much more complicated.

> For example:
> CREATE VIEW AS
> SELECT (a+b)*c as c1, (a+b)*d as d1, (a+b)*e as e1
> FROM table;

> My question is, is PostgreSQL "smart" enough to cache this (a+b)
> calculation in the above example, or does it recalculate it in each column?

Postgres will not notice that there's duplicate subexpressions.

> If it recalculates it in each column, would I generally get better
> performance by using a subquery to convince PostgreSQL to cache the
> result?  For example:

> CREATE VIEW AS
> SELECT x*c as c1, x*d as d1, x*e as e1
> FROM (SELECT (a+b) as x, * FROM table) x;

Maybe, if the subexpression is expensive enough (a+b probably doesn't
meet that threshold).  You'd need to take care to prevent the
subquery from being "flattened" into the upper query.  Typically
this might require an optimization fence like OFFSET 0, which is
problematic for a view because it could prevent optimizations that
are far more important than avoiding duplicated calculations.
CTEs are likewise a rather blunt tool that could cost more than
they save.

It might work better to put the redundant calculations in a LATERAL
subquery, ie

CREATE VIEW AS
SELECT x*c as c1, x*d as d1, x*e as e1
FROM tab, LATERAL (SELECT a+b as x OFFSET 0) ss;

I haven't really experimented, but I think that this'd dodge
most of the optimization pitfalls, except one: in this formulation,
a+b will be evaluated for every "tab" row even if the particular
use of the view does not demand any of the columns that use "x".

            regards, tom lane