Thread: Query becomes slow when written as view
Hi,

I have the following query, which runs reasonably fast under PostgreSQL 9.1.8:

SELECT
    b."ISIN",
    CASE
        WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
        WHEN cc."ISIN" IS NOT NULL THEN cc.comment
        ELSE get_comment(b."ISIN")
    END AS "COMMENT"
FROM dtng."Z_BASE" b
LEFT JOIN dtng.cached_comments cc
    ON b."ISIN" = cc."ISIN" AND cc.cache_time >= b._last_modified
WHERE b."ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1);

Here is the query plan:

Nested Loop Left Join  (cost=0.08..16.65 rows=1 width=1053)
  Join Filter: (cc.cache_time >= b._last_modified)
  ->  Nested Loop  (cost=0.08..8.67 rows=1 width=644)
        ->  HashAggregate  (cost=0.08..0.09 rows=1 width=13)
              ->  Subquery Scan on "ANY_subquery"  (cost=0.00..0.08 rows=1 width=13)
                    ->  Limit  (cost=0.00..0.07 rows=1 width=13)
                          ->  Seq Scan on "Z_BASE"  (cost=0.00..106515.68 rows=1637368 width=13)
        ->  Index Scan using "Z_BASE_pkey" on "Z_BASE" b  (cost=0.00..8.57 rows=1 width=644)
              Index Cond: (("ISIN")::bpchar = ("ANY_subquery"."ISIN")::bpchar)
  ->  Index Scan using cached_comments_pkey on cached_comments cc  (cost=0.00..7.71 rows=1 width=425)
        Index Cond: ((b."ISIN")::bpchar = ("ISIN")::bpchar)

When I try to put this into a view, it becomes extremely slow:

CREATE VIEW export_comments AS
SELECT
    b."ISIN",
    CASE
        WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
        WHEN cc."ISIN" IS NOT NULL THEN cc.comment
        ELSE get_comment(b."ISIN")
    END AS "COMMENT"
FROM dtng."Z_BASE" b
LEFT JOIN dtng.cached_comments cc
    ON b."ISIN" = cc."ISIN" AND cc.cache_time >= b._last_modified;

SELECT *
FROM export_comments
WHERE "ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1);

The query plan now is:

Hash Join  (cost=79926.52..906644.87 rows=818684 width=45)
  Hash Cond: ((b."ISIN")::bpchar = ("ANY_subquery"."ISIN")::bpchar)
  ->  Hash Left Join  (cost=79926.42..884049.08 rows=1637368 width=1053)
        Hash Cond: ((b."ISIN")::bpchar = (cc."ISIN")::bpchar)
        Join Filter: (cc.cache_time >= b._last_modified)
        ->  Seq Scan on "Z_BASE" b  (cost=0.00..106515.68 rows=1637368 width=644)
        ->  Hash  (cost=74620.41..74620.41 rows=77841 width=425)
              ->  Seq Scan on cached_comments cc  (cost=0.00..74620.41 rows=77841 width=425)
  ->  Hash  (cost=0.09..0.09 rows=1 width=13)
        ->  HashAggregate  (cost=0.08..0.09 rows=1 width=13)
              ->  Subquery Scan on "ANY_subquery"  (cost=0.00..0.08 rows=1 width=13)
                    ->  Limit  (cost=0.00..0.07 rows=1 width=13)
                          ->  Seq Scan on "Z_BASE"  (cost=0.00..106515.68 rows=1637368 width=13)

By the way, I get the same behaviour and query plan when I try this:

SELECT *
FROM (
    -- above view definition
) x
WHERE x."ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1);

We already found out that the problem is the Perl function get_comment, which is very expensive. In the first case the function is called at most once, but in the second case it is called many times. I believe this is because of the hash join, which causes the view to fetch everything from dtng."Z_BASE" first?

The question is how to avoid this. We tried raising the function's cost from 100 to 10000000, but that did not help. (Because of the architecture of the software that uses this query, we have the constraint that the structure of the final WHERE clause (WHERE "ISIN" IN (...)) must not be altered.)

Thanks a lot for any ideas,
Jan
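[For reference, the cost change described in the message would look something like the following sketch; the text argument type is an assumption, since the thread never shows get_comment's exact signature.]

```sql
-- Tell the planner the function is very expensive (the default COST is 100).
-- This influences plan choice, but as observed in the thread it does not
-- force the IN condition to be pushed down into the view.
ALTER FUNCTION get_comment(text) COST 10000000;
```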
On Thu, Feb 14, 2013 at 7:23 AM, Jan Strube <js@deriva.de> wrote:
> [original message quoted in full; snipped]

is your function stable/immutable, and if so is it decorated as such.

merlin
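[The current decoration can be checked in the system catalogs; a sketch, using the get_comment name from the thread:]

```sql
-- provolatile is 'v' for VOLATILE (the default), 's' for STABLE,
-- and 'i' for IMMUTABLE.
SELECT n.nspname, p.proname, p.provolatile
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE p.proname = 'get_comment';
```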
> is your function stable/immutable, and if so is it decorated as such.
>
> merlin

No, it's volatile.

Jan
Jan Strube <js@deriva.de> writes:
>> is your function stable/immutable, and if so is it decorated as such.

> No, it's volatile.

Well, that's your problem.  The planner won't push down the IN clause
past the volatile function for fear of changing the query's side-effects.

I'd question whether it's sane to have a view with volatile functions in
it at all.  It certainly won't act much like the normal understanding of
a view ...

			regards, tom lane
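[To illustrate the distinction: a function with no side effects can be declared STABLE, and the planner is then free to push conditions past calls to it. A minimal sketch, assuming a text-returning function; the body here is a placeholder, not the Perl function from the thread.]

```sql
-- STABLE promises the function modifies no data and returns the same
-- result for the same arguments within a single statement, so the
-- planner may push the IN clause down past calls to it.
CREATE OR REPLACE FUNCTION get_comment(isin text) RETURNS text AS $$
    SELECT 'no comment available for ' || $1;  -- placeholder lookup only
$$ LANGUAGE sql STABLE;
```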
>>> is your function stable/immutable, and if so is it decorated as such.
>>
>> No, it's volatile.
>
> Well, that's your problem.  The planner won't push down the IN clause
> past the volatile function for fear of changing the query's side-effects.
>
> I'd question whether it's sane to have a view with volatile functions in
> it at all.  It certainly won't act much like the normal understanding of
> a view ...

I see, thanks for the explanation.

In this case the side effect is desired. The view should always return a COMMENT: either directly from one of the tables, or generated by the function, which stores the COMMENT in cached_comments for the next select.

Is there perhaps a best practice for doing something like this? Of course we could declare the original function stable and call another volatile function to store the data, as noted in the docs. But that would be cheating...

Regards,
Jan
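[The "cheating" pattern Jan mentions could be sketched as follows. The store_comment name and the cached_comments column list are hypothetical; only get_comment and dtng.cached_comments appear in the thread. The idea is to isolate the write in a VOLATILE helper while labeling the outer function STABLE, which is a false promise to the planner.]

```sql
-- VOLATILE helper that performs the actual write into the cache table
-- (hypothetical name and column list, for illustration only).
CREATE FUNCTION store_comment(p_isin text, p_comment text) RETURNS void AS $$
    INSERT INTO dtng.cached_comments ("ISIN", comment, cache_time)
    VALUES (p_isin, p_comment, now());
$$ LANGUAGE sql VOLATILE;

-- get_comment would then be declared STABLE while calling store_comment
-- internally. The STABLE label lets the planner push predicates past it,
-- but it misstates the function's behavior -- hence "cheating".
```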