Slow joins against set-returning functions - Mailing list pgsql-performance
From | Michael Fuhr |
---|---|
Subject | Slow joins against set-returning functions |
Date | |
Msg-id | 20040815160014.GA85211@winnie.fuhr.org Whole thread Raw |
Responses |
Re: Slow joins against set-returning functions
|
List | pgsql-performance |
PostgreSQL versions: 7.4.3, 8.0.0beta1 Joins against set-returning functions can be slow. Here's a simple example (in 8.0.0beta1, the gen_series function can be replaced with generate_series): CREATE FUNCTION gen_series(INTEGER, INTEGER) RETURNS SETOF INTEGER AS ' DECLARE xstart ALIAS FOR $1; xend ALIAS FOR $2; x INTEGER; BEGIN FOR x IN xstart .. xend LOOP RETURN NEXT x; END LOOP; RETURN; END; ' LANGUAGE plpgsql IMMUTABLE STRICT; CREATE TABLE stuff ( id INTEGER NOT NULL PRIMARY KEY, item VARCHAR(32) NOT NULL ); INSERT INTO stuff (id, item) SELECT id, 'Item ' || id FROM gen_series(1, 100000) AS f(id); ANALYZE stuff; Here are two queries; notice how the second query, which uses higher numbers for the join key, is much slower than the first, apparently due to an inefficient index scan: EXPLAIN ANALYZE SELECT stuff.* FROM stuff JOIN gen_series(1, 10) AS f(id) USING (id); Merge Join (cost=62.33..2544.33 rows=1001 width=17) (actual time=1.398..1.950 rows=10 loops=1) Merge Cond: ("outer".id = "inner".id) -> Index Scan using stuff_pkey on stuff (cost=0.00..2217.00 rows=100000 width=17) (actual time=0.667..0.860 rows=11loops=1) -> Sort (cost=62.33..64.83 rows=1000 width=4) (actual time=0.646..0.670 rows=10 loops=1) Sort Key: f.id -> Function Scan on gen_series f (cost=0.00..12.50 rows=1000 width=4) (actual time=0.403..0.478 rows=10 loops=1) Total runtime: 2.529 ms (7 rows) EXPLAIN ANALYZE SELECT stuff.* FROM stuff JOIN gen_series(99991, 100000) AS f(id) USING (id); Merge Join (cost=62.33..2544.33 rows=1001 width=17) (actual time=2907.078..2907.618 rows=10 loops=1) Merge Cond: ("outer".id = "inner".id) -> Index Scan using stuff_pkey on stuff (cost=0.00..2217.00 rows=100000 width=17) (actual time=0.107..2270.722 rows=100000loops=1) -> Sort (cost=62.33..64.83 rows=1000 width=4) (actual time=0.630..0.654 rows=10 loops=1) Sort Key: f.id -> Function Scan on gen_series f (cost=0.00..12.50 rows=1000 width=4) (actual time=0.392..0.469 rows=10 loops=1) Total runtime: 2908.205 ms (7 rows) If I turn off enable_mergejoin then both queries are fast: SET enable_mergejoin TO off; EXPLAIN ANALYZE SELECT stuff.* FROM stuff JOIN gen_series(1, 10) AS f(id) USING (id); Nested Loop (cost=0.00..3038.50 rows=1001 width=17) (actual time=0.600..1.912 rows=10 loops=1) -> Function Scan on gen_series f (cost=0.00..12.50 rows=1000 width=4) (actual time=0.395..0.482 rows=10 loops=1) -> Index Scan using stuff_pkey on stuff (cost=0.00..3.01 rows=1 width=17) (actual time=0.091..0.107 rows=1 loops=10) Index Cond: (stuff.id = "outer".id) Total runtime: 2.401 ms (5 rows) EXPLAIN ANALYZE SELECT stuff.* FROM stuff JOIN gen_series(99991, 100000) AS f(id) USING (id); Nested Loop (cost=0.00..3038.50 rows=1001 width=17) (actual time=0.586..1.891 rows=10 loops=1) -> Function Scan on gen_series f (cost=0.00..12.50 rows=1000 width=4) (actual time=0.394..0.479 rows=10 loops=1) -> Index Scan using stuff_pkey on stuff (cost=0.00..3.01 rows=1 width=17) (actual time=0.089..0.105 rows=1 loops=10) Index Cond: (stuff.id = "outer".id) Total runtime: 2.374 ms (5 rows) Is the planner doing something wrong here? Thanks. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
pgsql-performance by date: