Hi,
I am having performance problems running a number of queries
involving views based on non-strict functions. I have reproduced the
problem with the simple test-case below which shows how the query plan
is different depending on whether the view uses strict or non-strict
functions (even though those columns do not appear in the WHERE
clause).
CREATE OR REPLACE FUNCTION times_ten_strict(int) RETURNS int
AS 'SELECT $1*10' LANGUAGE SQL IMMUTABLE STRICT;
CREATE OR REPLACE FUNCTION times_ten_nonstrict(int) RETURNS int
AS 'SELECT COALESCE($1*10, 0)' LANGUAGE SQL IMMUTABLE;
CREATE OR REPLACE FUNCTION setup()
RETURNS void AS
$$
DECLARE
val int;
BEGIN
DROP TABLE IF EXISTS t1 CASCADE;
DROP TABLE IF EXISTS t2 CASCADE;
CREATE TABLE t1
(
a1 int PRIMARY KEY,
b1 int
);
val := 0;
WHILE val < 10000 LOOP
INSERT INTO t1 VALUES(val, val);
val := val+1;
END LOOP;
CREATE TABLE t2
(
a2 int PRIMARY KEY
);
val := 0;
WHILE val < 10000 LOOP
INSERT INTO t2 VALUES(val);
val := val+1;
END LOOP;
CREATE VIEW v2_strict AS SELECT a2, times_ten_strict(a2) AS b2 FROM t2;
CREATE VIEW v2_nonstrict AS SELECT a2, times_ten_nonstrict(a2) AS b2 FROM t2;
END;
$$ LANGUAGE plpgsql;
SELECT setup();
ANALYZE t1;
ANALYZE t2;
EXPLAIN ANALYZE SELECT * FROM t1 LEFT OUTER JOIN v2_strict v2 ON v2.a2=t1.b1 WHERE t1.a1=3;
EXPLAIN ANALYZE SELECT * FROM t1 LEFT OUTER JOIN v2_nonstrict v2 ON v2.a2=t1.b1 WHERE t1.a1=3;
(I know that I don't really need a left outer join in this example,
but my real data does, and it suffers from the same performance
problem, but worse because there is more data and the joins are more
complex.)
The first query, from the view using a strict function has the
expected plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..16.55 rows=1 width=12) (actual time=0.044..0.055 rows=1 loops=1)
-> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=8) (actual time=0.015..0.017 rows=1 loops=1)
Index Cond: (a1 = 3)
-> Index Scan using t2_pkey on t2 (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.016 rows=1 loops=1)
Index Cond: (t2.a2 = t1.b1)
Total runtime: 0.182 ms
However, the second query, which is almost identical except that one
of the columns being returned uses a non-strict function, has the
following plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..413.27 rows=1 width=16) (actual time=0.057..47.511 rows=1 loops=1)
Join Filter: (v2.a2 = t1.b1)
-> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=8) (actual time=0.012..0.019 rows=1 loops=1)
Index Cond: (a1 = 3)
-> Seq Scan on t2 (cost=0.00..180.00 rows=10000 width=4) (actual time=0.016..26.217 rows=10000 loops=1)
Total runtime: 47.644 ms
Rather than using the primary key on t2, it does a full table
scan. With multiple joins, this starts doing nested full table scans
and becomes very inefficient, especially when the tables are much
bigger than this.
Both functions have a volatility of IMMUTABLE, and I don't understand
why the strictness of the function should affect the query plan.
Any ideas what is going on?
Thanks,
Dean.
_________________________________________________________________
100’s of Music vouchers to be won with MSN Music
https://www.musicmashup.co.uk