Thread: Query becomes slow when written as view

Query becomes slow when written as view

From
Jan Strube
Date:
Hi,

I have the following query which runs reasonably fast under PostgreSQL=20
9.1.8:

SELECT
b."ISIN",
CASE
WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
WHEN cc."ISIN" IS NOT NULL THEN cc.comment
ELSE get_comment(b."ISIN")
END AS "COMMENT"
FROM dtng."Z_BASE" b
LEFT JOIN dtng.cached_comments cc on b."ISIN" =3D cc."ISIN" AND=20
cc.cache_time >=3D b._last_modified
WHERE b."ISIN" IN (SELECT "ISIN" from dtng."Z_BASE" LIMIT 1)

Here is the query plan:

Nested Loop Left Join (cost=3D0.08..16.65 rows=3D1 width=3D1053)
Join Filter: (cc.cache_time >=3D b._last_modified)
-> Nested Loop (cost=3D0.08..8.67 rows=3D1 width=3D644)
-> HashAggregate (cost=3D0.08..0.09 rows=3D1 width=3D13)
-> Subquery Scan on "ANY_subquery" (cost=3D0.00..0.08 rows=3D1 width=3D13=
)
-> Limit (cost=3D0.00..0.07 rows=3D1 width=3D13)
-> Seq Scan on "Z_BASE" (cost=3D0.00..106515.68 rows=3D1637368 width=3D13=
)
-> Index Scan using "Z_BASE_pkey" on "Z_BASE" b (cost=3D0.00..8.57 rows=3D=
1=20
width=3D644)
Index Cond: (("ISIN")::bpchar =3D ("ANY_subquery"."ISIN")::bpchar)
-> Index Scan using cached_comments_pkey on cached_comments cc=20
(cost=3D0.00..7.71 rows=3D1 width=3D425)
Index Cond: ((b."ISIN")::bpchar =3D ("ISIN")::bpchar)

When I=C2=B4m trying to put this into a view, it becomes extremely slow:

CREATE VIEW export_comments AS
SELECT
b."ISIN",
CASE
WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
WHEN cc."ISIN" IS NOT NULL THEN cc.comment
ELSE get_comment(b."ISIN")
END AS "COMMENT"
FROM dtng."Z_BASE" b
LEFT JOIN dtng.cached_comments cc on b."ISIN" =3D cc."ISIN" AND=20
cc.cache_time >=3D b._last_modified

SELECT *
FROM export_comments
WHERE "ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1)

The query plan now is:

Hash Join (cost=3D79926.52..906644.87 rows=3D818684 width=3D45)
Hash Cond: ((b."ISIN")::bpchar =3D ("ANY_subquery"."ISIN")::bpchar)
-> Hash Left Join (cost=3D79926.42..884049.08 rows=3D1637368 width=3D1053=
)
Hash Cond: ((b."ISIN")::bpchar =3D (cc."ISIN")::bpchar)
Join Filter: (cc.cache_time >=3D b._last_modified)
-> Seq Scan on "Z_BASE" b (cost=3D0.00..106515.68 rows=3D1637368 width=3D=
644)
-> Hash (cost=3D74620.41..74620.41 rows=3D77841 width=3D425)
-> Seq Scan on cached_comments cc (cost=3D0.00..74620.41 rows=3D77841 wid=
th=3D425)
-> Hash (cost=3D0.09..0.09 rows=3D1 width=3D13)
-> HashAggregate (cost=3D0.08..0.09 rows=3D1 width=3D13)
-> Subquery Scan on "ANY_subquery" (cost=3D0.00..0.08 rows=3D1 width=3D13=
)
-> Limit (cost=3D0.00..0.07 rows=3D1 width=3D13)
-> Seq Scan on "Z_BASE" (cost=3D0.00..106515.68 rows=3D1637368 width=3D13=
)

By the way I get the same behaviour and query plan when I try this:

SELECT *
FROM (
-- above view definition
) x
WHERE x."ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1)

We already found out that the problem is the Perl function "get_comment"=20
which is very expensive. In the first case the function is called at=20
most once, but in the second case it is called many times. I believe=20
this is because of the hash join which causes the view to fetch=20
everything from dtng."Z_BASE" first?
The question is, how to avoid this? We tried to set the functions cost=20
from 100 to 10000000 but that did not help. (Because of the architecture=20
of the software that uses this query, we have the constraint that=20
structure of the final WHERE clause (WHERE "ISIN" IN (...)) must not be=20
altered.)

Thanks a lot for any idea,
Jan

Re: Query becomes slow when written as view

From
Merlin Moncure
Date:
On Thu, Feb 14, 2013 at 7:23 AM, Jan Strube <js@deriva.de> wrote:
> Hi,
>
> I have the following query which runs reasonably fast under PostgreSQL
> 9.1.8:
>
> SELECT
> b."ISIN",
> CASE
> WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
> WHEN cc."ISIN" IS NOT NULL THEN cc.comment
> ELSE get_comment(b."ISIN")
> END AS "COMMENT"
> FROM dtng."Z_BASE" b
> LEFT JOIN dtng.cached_comments cc on b."ISIN" =3D cc."ISIN" AND cc.cache_=
time
>>=3D b._last_modified
> WHERE b."ISIN" IN (SELECT "ISIN" from dtng."Z_BASE" LIMIT 1)
>
> Here is the query plan:
>
> Nested Loop Left Join (cost=3D0.08..16.65 rows=3D1 width=3D1053)
> Join Filter: (cc.cache_time >=3D b._last_modified)
> -> Nested Loop (cost=3D0.08..8.67 rows=3D1 width=3D644)
> -> HashAggregate (cost=3D0.08..0.09 rows=3D1 width=3D13)
> -> Subquery Scan on "ANY_subquery" (cost=3D0.00..0.08 rows=3D1 width=3D13=
)
> -> Limit (cost=3D0.00..0.07 rows=3D1 width=3D13)
> -> Seq Scan on "Z_BASE" (cost=3D0.00..106515.68 rows=3D1637368 width=3D13=
)
> -> Index Scan using "Z_BASE_pkey" on "Z_BASE" b (cost=3D0.00..8.57 rows=
=3D1
> width=3D644)
> Index Cond: (("ISIN")::bpchar =3D ("ANY_subquery"."ISIN")::bpchar)
> -> Index Scan using cached_comments_pkey on cached_comments cc
> (cost=3D0.00..7.71 rows=3D1 width=3D425)
> Index Cond: ((b."ISIN")::bpchar =3D ("ISIN")::bpchar)
>
> When I=B4m trying to put this into a view, it becomes extremely slow:
>
> CREATE VIEW export_comments AS
> SELECT
> b."ISIN",
> CASE
> WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
> WHEN cc."ISIN" IS NOT NULL THEN cc.comment
> ELSE get_comment(b."ISIN")
> END AS "COMMENT"
> FROM dtng."Z_BASE" b
> LEFT JOIN dtng.cached_comments cc on b."ISIN" =3D cc."ISIN" AND cc.cache_=
time
>>=3D b._last_modified
>
> SELECT *
> FROM export_comments
> WHERE "ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1)
>
> The query plan now is:
>
> Hash Join (cost=3D79926.52..906644.87 rows=3D818684 width=3D45)
> Hash Cond: ((b."ISIN")::bpchar =3D ("ANY_subquery"."ISIN")::bpchar)
> -> Hash Left Join (cost=3D79926.42..884049.08 rows=3D1637368 width=3D1053=
)
> Hash Cond: ((b."ISIN")::bpchar =3D (cc."ISIN")::bpchar)
> Join Filter: (cc.cache_time >=3D b._last_modified)
> -> Seq Scan on "Z_BASE" b (cost=3D0.00..106515.68 rows=3D1637368 width=3D=
644)
> -> Hash (cost=3D74620.41..74620.41 rows=3D77841 width=3D425)
> -> Seq Scan on cached_comments cc (cost=3D0.00..74620.41 rows=3D77841 wid=
th=3D425)
> -> Hash (cost=3D0.09..0.09 rows=3D1 width=3D13)
> -> HashAggregate (cost=3D0.08..0.09 rows=3D1 width=3D13)
> -> Subquery Scan on "ANY_subquery" (cost=3D0.00..0.08 rows=3D1 width=3D13=
)
> -> Limit (cost=3D0.00..0.07 rows=3D1 width=3D13)
> -> Seq Scan on "Z_BASE" (cost=3D0.00..106515.68 rows=3D1637368 width=3D13=
)
>
> By the way I get the same behaviour and query plan when I try this:
>
> SELECT *
> FROM (
> -- above view definition
> ) x
> WHERE x."ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1)
>
> We already found out that the problem is the Perl function "get_comment"
> which is very expensive. In the first case the function is called at most
> once, but in the second case it is called many times. I believe this is
> because of the hash join which causes the view to fetch everything from
> dtng."Z_BASE" first?
> The question is, how to avoid this? We tried to set the functions cost fr=
om
> 100 to 10000000 but that did not help. (Because of the architecture of th=
e
> software that uses this query, we have the constraint that structure of t=
he
> final WHERE clause (WHERE "ISIN" IN (...)) must not be altered.)
>
> Thanks a lot for any idea,
> Jan

is your function stable/immutable, and if so is it decorated as such.

merlin

Re: Query becomes slow when written as view

From
Jan Strube
Date:
> is your function stable/immutable, and if so is it decorated as such.
>
> merlin

No, it=B4s volatile.

Jan

Re: Query becomes slow when written as view

From
Tom Lane
Date:
Jan Strube <js@deriva.de> writes:
>> is your function stable/immutable, and if so is it decorated as such.

> No, it´s volatile.

Well, that's your problem.  The planner won't push down the IN clause
past the volatile function for fear of changing the query's side-effects.

I'd question whether it's sane to have a view with volatile functions in
it at all.  It certainly won't act much like the normal understanding of
a view ...

            regards, tom lane

Re: Query becomes slow when written as view

From
Jan Strube
Date:
>>> is your function stable/immutable, and if so is it decorated as such.
>>>
>> No, it=B4s volatile.
> Well, that's your problem.  The planner won't push down the IN clause
> past the volatile function for fear of changing the query's side-effect=
s.
>
> I'd question whether it's sane to have a view with volatile functions i=
n
> it at all.  It certainly won't act much like the normal understanding o=
f
> a view ...

I see, thanks for the explanation.
In this case, the side effect is desired. The view should always return=20
a COMMENT. Either directly from one of the tables or generated from the=20
function which stores the COMMENT in cached_comments for the next select.
Is there perhaps a best practice to do a thing like that? Of course we=20
could declare the original function stable and call another volatile=20
function to store the data, as noted in the docs. But that would be=20
cheating...

Regards,
Jan