PostgreSQL versions: 7.4.3, 8.0.0beta1
Joins against set-returning functions can be slow. Here's a simple
example (in 8.0.0beta1, the gen_series function can be replaced
with generate_series):
CREATE FUNCTION gen_series(INTEGER, INTEGER) RETURNS SETOF INTEGER AS '
DECLARE
xstart ALIAS FOR $1;
xend ALIAS FOR $2;
x INTEGER;
BEGIN
FOR x IN xstart .. xend LOOP
RETURN NEXT x;
END LOOP;
RETURN;
END;
' LANGUAGE plpgsql IMMUTABLE STRICT;
CREATE TABLE stuff (
id INTEGER NOT NULL PRIMARY KEY,
item VARCHAR(32) NOT NULL
);
INSERT INTO stuff (id, item)
SELECT id, 'Item ' || id FROM gen_series(1, 100000) AS f(id);
ANALYZE stuff;
Here are two queries; notice how the second query, which uses higher
numbers for the join key, is much slower than the first, apparently
due to an inefficient index scan:
EXPLAIN ANALYZE
SELECT stuff.* FROM stuff JOIN gen_series(1, 10) AS f(id) USING (id);
Merge Join (cost=62.33..2544.33 rows=1001 width=17) (actual time=1.398..1.950 rows=10 loops=1)
Merge Cond: ("outer".id = "inner".id)
-> Index Scan using stuff_pkey on stuff (cost=0.00..2217.00 rows=100000 width=17) (actual time=0.667..0.860
rows=11loops=1)
-> Sort (cost=62.33..64.83 rows=1000 width=4) (actual time=0.646..0.670 rows=10 loops=1)
Sort Key: f.id
-> Function Scan on gen_series f (cost=0.00..12.50 rows=1000 width=4) (actual time=0.403..0.478 rows=10
loops=1)
Total runtime: 2.529 ms
(7 rows)
EXPLAIN ANALYZE
SELECT stuff.* FROM stuff JOIN gen_series(99991, 100000) AS f(id) USING (id);
Merge Join (cost=62.33..2544.33 rows=1001 width=17) (actual time=2907.078..2907.618 rows=10 loops=1)
Merge Cond: ("outer".id = "inner".id)
-> Index Scan using stuff_pkey on stuff (cost=0.00..2217.00 rows=100000 width=17) (actual time=0.107..2270.722
rows=100000loops=1)
-> Sort (cost=62.33..64.83 rows=1000 width=4) (actual time=0.630..0.654 rows=10 loops=1)
Sort Key: f.id
-> Function Scan on gen_series f (cost=0.00..12.50 rows=1000 width=4) (actual time=0.392..0.469 rows=10
loops=1)
Total runtime: 2908.205 ms
(7 rows)
If I turn off enable_mergejoin then both queries are fast:
SET enable_mergejoin TO off;
EXPLAIN ANALYZE
SELECT stuff.* FROM stuff JOIN gen_series(1, 10) AS f(id) USING (id);
Nested Loop (cost=0.00..3038.50 rows=1001 width=17) (actual time=0.600..1.912 rows=10 loops=1)
-> Function Scan on gen_series f (cost=0.00..12.50 rows=1000 width=4) (actual time=0.395..0.482 rows=10 loops=1)
-> Index Scan using stuff_pkey on stuff (cost=0.00..3.01 rows=1 width=17) (actual time=0.091..0.107 rows=1
loops=10)
Index Cond: (stuff.id = "outer".id)
Total runtime: 2.401 ms
(5 rows)
EXPLAIN ANALYZE
SELECT stuff.* FROM stuff JOIN gen_series(99991, 100000) AS f(id) USING (id);
Nested Loop (cost=0.00..3038.50 rows=1001 width=17) (actual time=0.586..1.891 rows=10 loops=1)
-> Function Scan on gen_series f (cost=0.00..12.50 rows=1000 width=4) (actual time=0.394..0.479 rows=10 loops=1)
-> Index Scan using stuff_pkey on stuff (cost=0.00..3.01 rows=1 width=17) (actual time=0.089..0.105 rows=1
loops=10)
Index Cond: (stuff.id = "outer".id)
Total runtime: 2.374 ms
(5 rows)
Is the planner doing something wrong here?
Thanks.
--
Michael Fuhr
http://www.fuhr.org/~mfuhr/