Query becomes slow when written as view - Mailing list pgsql-general

From Jan Strube
Subject Query becomes slow when written as view
Date
Msg-id 511CE547.4040500@deriva.de
Whole thread Raw
Responses Re: Query becomes slow when written as view
List pgsql-general
Hi,

I have the following query which runs reasonably fast under PostgreSQL=20
9.1.8:

SELECT
b."ISIN",
CASE
WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
WHEN cc."ISIN" IS NOT NULL THEN cc.comment
ELSE get_comment(b."ISIN")
END AS "COMMENT"
FROM dtng."Z_BASE" b
LEFT JOIN dtng.cached_comments cc on b."ISIN" =3D cc."ISIN" AND=20
cc.cache_time >=3D b._last_modified
WHERE b."ISIN" IN (SELECT "ISIN" from dtng."Z_BASE" LIMIT 1)

Here is the query plan:

Nested Loop Left Join (cost=3D0.08..16.65 rows=3D1 width=3D1053)
Join Filter: (cc.cache_time >=3D b._last_modified)
-> Nested Loop (cost=3D0.08..8.67 rows=3D1 width=3D644)
-> HashAggregate (cost=3D0.08..0.09 rows=3D1 width=3D13)
-> Subquery Scan on "ANY_subquery" (cost=3D0.00..0.08 rows=3D1 width=3D13=
)
-> Limit (cost=3D0.00..0.07 rows=3D1 width=3D13)
-> Seq Scan on "Z_BASE" (cost=3D0.00..106515.68 rows=3D1637368 width=3D13=
)
-> Index Scan using "Z_BASE_pkey" on "Z_BASE" b (cost=3D0.00..8.57 rows=3D=
1=20
width=3D644)
Index Cond: (("ISIN")::bpchar =3D ("ANY_subquery"."ISIN")::bpchar)
-> Index Scan using cached_comments_pkey on cached_comments cc=20
(cost=3D0.00..7.71 rows=3D1 width=3D425)
Index Cond: ((b."ISIN")::bpchar =3D ("ISIN")::bpchar)

When I=C2=B4m trying to put this into a view, it becomes extremely slow:

CREATE VIEW export_comments AS
SELECT
b."ISIN",
CASE
WHEN b."COMMENT" IS NOT NULL THEN b."COMMENT"
WHEN cc."ISIN" IS NOT NULL THEN cc.comment
ELSE get_comment(b."ISIN")
END AS "COMMENT"
FROM dtng."Z_BASE" b
LEFT JOIN dtng.cached_comments cc on b."ISIN" =3D cc."ISIN" AND=20
cc.cache_time >=3D b._last_modified

SELECT *
FROM export_comments
WHERE "ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1)

The query plan now is:

Hash Join (cost=3D79926.52..906644.87 rows=3D818684 width=3D45)
Hash Cond: ((b."ISIN")::bpchar =3D ("ANY_subquery"."ISIN")::bpchar)
-> Hash Left Join (cost=3D79926.42..884049.08 rows=3D1637368 width=3D1053=
)
Hash Cond: ((b."ISIN")::bpchar =3D (cc."ISIN")::bpchar)
Join Filter: (cc.cache_time >=3D b._last_modified)
-> Seq Scan on "Z_BASE" b (cost=3D0.00..106515.68 rows=3D1637368 width=3D=
644)
-> Hash (cost=3D74620.41..74620.41 rows=3D77841 width=3D425)
-> Seq Scan on cached_comments cc (cost=3D0.00..74620.41 rows=3D77841 wid=
th=3D425)
-> Hash (cost=3D0.09..0.09 rows=3D1 width=3D13)
-> HashAggregate (cost=3D0.08..0.09 rows=3D1 width=3D13)
-> Subquery Scan on "ANY_subquery" (cost=3D0.00..0.08 rows=3D1 width=3D13=
)
-> Limit (cost=3D0.00..0.07 rows=3D1 width=3D13)
-> Seq Scan on "Z_BASE" (cost=3D0.00..106515.68 rows=3D1637368 width=3D13=
)

By the way I get the same behaviour and query plan when I try this:

SELECT *
FROM (
-- above view definition
) x
WHERE x."ISIN" IN (SELECT "ISIN" FROM dtng."Z_BASE" LIMIT 1)

We already found out that the problem is the Perl function "get_comment"=20
which is very expensive. In the first case the function is called at=20
most once, but in the second case it is called many times. I believe=20
this is because of the hash join which causes the view to fetch=20
everything from dtng."Z_BASE" first?
The question is, how to avoid this? We tried to set the functions cost=20
from 100 to 10000000 but that did not help. (Because of the architecture=20
of the software that uses this query, we have the constraint that=20
structure of the final WHERE clause (WHERE "ISIN" IN (...)) must not be=20
altered.)

Thanks a lot for any idea,
Jan

pgsql-general by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Join query query
Next
From: Vick Khera
Date:
Subject: Re: Runtime variations during day