PostgreSQL 8.3.14
OS: Linux Redhat 5.4
Note: I have used the same subject for this email taken from an email: Post=
ed 2011-02-24 13:29:22-08 by "David Johnston", because this seems to be a v=
ery similar observation.
Bug/Problem Summary:
We are using a simple query based on a simple view and the query optimizer =
is not choosing an index.=20
The same query without the view is using an index.
The same query on an almost identical view, but having either removed a sin=
gle column which was generated via a function call or replace the function =
call with equivalent SQL, then the query optimizer is choosing an index.
Other notes:
We observe the same behaviour on two separate databases running on separate=
servers (both the same version).
The genetic query optimizer settings are all on defaults for these versions.
Details;
With the following view:
CREATE OR REPLACE VIEW vu_tbl_news_web AS=20
SELECT=20
n.id, n.id_cmpy, n.id_news, n.id_newshdline,=20
n.s_origcmpyname, n.s_hdline, n.s_news,=20
n.b_amend, n.b_replace,=20
n.dt_publish,=20
n.tsv_hdline, n.tsv_news,=20
n.b_hasorigdoc,=20
(SELECT h.s_hdline FROM tbl_newshdline h WHERE h.id =3D n.id_newshdline LI=
MIT 1) AS s_hdlinetype,=20
get_cmpyname(n.id_contrib) AS s_provider
FROM tbl_news n
WHERE n.dt_publish IS NOT NULL;
Query:
SELECT n.id, n.id_cmpy, n.s_origcmpyname,=20
n.s_hdline, n.dt_publish=20
FROM vu_tbl_news_web n=20
ORDER BY n.dt_publish DESC=20
LIMIT 25
Analyse produces:
Limit (cost=3D180017.37..180017.43 rows=3D25 width=3D80)
-> Sort (cost=3D180017.37..180110.54 rows=3D37267 width=3D80)
Sort Key: n.dt_publish
-> Subquery Scan n (cost=3D0.00..178965.72 rows=3D37267 width=3D8=
0)
-> Seq Scan on tbl_news n (cost=3D0.00..178593.05 rows=3D37=
267 width=3D1152)
Filter: (dt_publish IS NOT NULL)
SubPlan
-> Limit (cost=3D0.00..4.29 rows=3D1 width=3D22)
-> Seq Scan on tbl_newshdline h (cost=3D0.00.=
.4.29 rows=3D1 width=3D22)
Filter: (id =3D $0)
Although the number of rows (37k) is small, there is a lot of data in some =
of the columns so, with a seq. scan its taking nearly 2 seconds. Compared t=
o 16ms when using an index.
Now if I perform the same query without using the view, such as:
SELECT n.id, n.id_cmpy, n.s_origcmpyname,=20
n.s_hdline, n.dt_publish=20
FROM tbl_news n=20
WHERE n.dt_publish IS NOT NULL=20
ORDER BY n.dt_publish DESC=20
LIMIT 25
Analyse produces:
Limit (cost=3D0.00..21.34 rows=3D25 width=3D73)
-> Index Scan Backward using tbl_news_publish1_key on tbl_news n (cost=
=3D0.00..31807.05 rows=3D37267 width=3D73)
Finally I discovered that if I remove a column from the view, that is gener=
ated via a function, then all works as expected?
CREATE OR REPLACE VIEW vu_tbl_news_web3 AS=20
SELECT=20
n.id, n.id_cmpy, n.id_news, n.id_newshdline,=20
n.s_origcmpyname, n.s_hdline, n.s_news,=20
n.b_amend, n.b_replace,=20
n.dt_publish, n.tsv_hdline, n.tsv_news,=20
n.b_hasorigdoc,
(SELECT h.s_hdline FROM tbl_newshdline h WHERE h.id =3D n.id_newshdline LI=
MIT 1) AS s_hdlinetype
FROM tbl_news n
WHERE n.dt_publish IS NOT NULL;
SELECT n.id, n.id_cmpy, n.s_origcmpyname,=20
n.s_hdline, n.dt_publish=20
FROM vu_tbl_news_web3 n=20
ORDER BY n.dt_publish DESC=20
LIMIT 25
Analyse produces:
Limit (cost=3D0.00..21.34 rows=3D25 width=3D73)
-> Index Scan Backward using tbl_news_publish1_key on tbl_news n (cost=
=3D0.00..31807.05 rows=3D37267 width=3D73)
The definition of the function used is:
CREATE OR REPLACE FUNCTION get_cmpyname(integer) RETURNS citext AS
$BODY$
DECLARE
idcmpy ALIAS FOR $1;
sumb citext;
sres citext;
BEGIN
SELECT INTO sumb, sres s_umbname, s_name FROM tbl_cmpy WHERE (id =3D idcmp=
y) LIMIT 1;
IF length(sumb) > 0 THEN
sres :=3D sumb || ' - ' || sres;
END IF;
RETURN sres;
END $BODY$
LANGUAGE 'plpgsql' VOLATILE STRICT SECURITY DEFINER COST 100;
I also noted that if I replace the function call in the view with equivalen=
t SQL:
(SELECT (CASE WHEN length(c.s_umbname) > 0 THEN c.s_umbname || ' - '::citex=
t || c.s_name ELSE c.s_name END) AS s_cmpyname FROM tbl_cmpy c WHERE c.id =
=3D n.id_contrib LIMIT 1) AS s_provider
the problem also goes away.
Regards
Donald Fraser=