Slow joins against set-returning functions - Mailing list pgsql-performance

From Michael Fuhr
Subject Slow joins against set-returning functions
Date
Msg-id 20040815160014.GA85211@winnie.fuhr.org
Whole thread Raw
Responses Re: Slow joins against set-returning functions
List pgsql-performance
PostgreSQL versions: 7.4.3, 8.0.0beta1

Joins against set-returning functions can be slow.  Here's a simple
example (in 8.0.0beta1, the gen_series function can be replaced
with generate_series):

  CREATE FUNCTION gen_series(INTEGER, INTEGER) RETURNS SETOF INTEGER AS '
  DECLARE
      xstart  ALIAS FOR $1;
      xend    ALIAS FOR $2;
      x       INTEGER;
  BEGIN
      FOR x IN xstart .. xend LOOP
          RETURN NEXT x;
      END LOOP;

      RETURN;
  END;
  ' LANGUAGE plpgsql IMMUTABLE STRICT;

  CREATE TABLE stuff (
      id    INTEGER NOT NULL PRIMARY KEY,
      item  VARCHAR(32) NOT NULL
  );

  INSERT INTO stuff (id, item)
    SELECT id, 'Item ' || id FROM gen_series(1, 100000) AS f(id);

  ANALYZE stuff;

Here are two queries; notice how the second query, which uses higher
numbers for the join key, is much slower than the first, apparently
due to an inefficient index scan:

  EXPLAIN ANALYZE
    SELECT stuff.* FROM stuff JOIN gen_series(1, 10) AS f(id) USING (id);

   Merge Join  (cost=62.33..2544.33 rows=1001 width=17) (actual time=1.398..1.950 rows=10 loops=1)
     Merge Cond: ("outer".id = "inner".id)
     ->  Index Scan using stuff_pkey on stuff  (cost=0.00..2217.00 rows=100000 width=17) (actual time=0.667..0.860
rows=11loops=1) 
     ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual time=0.646..0.670 rows=10 loops=1)
           Sort Key: f.id
           ->  Function Scan on gen_series f  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.403..0.478 rows=10
loops=1)
   Total runtime: 2.529 ms
  (7 rows)

  EXPLAIN ANALYZE
    SELECT stuff.* FROM stuff JOIN gen_series(99991, 100000) AS f(id) USING (id);

   Merge Join  (cost=62.33..2544.33 rows=1001 width=17) (actual time=2907.078..2907.618 rows=10 loops=1)
     Merge Cond: ("outer".id = "inner".id)
     ->  Index Scan using stuff_pkey on stuff  (cost=0.00..2217.00 rows=100000 width=17) (actual time=0.107..2270.722
rows=100000loops=1) 
     ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual time=0.630..0.654 rows=10 loops=1)
           Sort Key: f.id
           ->  Function Scan on gen_series f  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.392..0.469 rows=10
loops=1)
   Total runtime: 2908.205 ms
  (7 rows)

If I turn off enable_mergejoin then both queries are fast:

  SET enable_mergejoin TO off;

  EXPLAIN ANALYZE
    SELECT stuff.* FROM stuff JOIN gen_series(1, 10) AS f(id) USING (id);

   Nested Loop  (cost=0.00..3038.50 rows=1001 width=17) (actual time=0.600..1.912 rows=10 loops=1)
     ->  Function Scan on gen_series f  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.395..0.482 rows=10 loops=1)
     ->  Index Scan using stuff_pkey on stuff  (cost=0.00..3.01 rows=1 width=17) (actual time=0.091..0.107 rows=1
loops=10)
           Index Cond: (stuff.id = "outer".id)
   Total runtime: 2.401 ms
  (5 rows)

  EXPLAIN ANALYZE
    SELECT stuff.* FROM stuff JOIN gen_series(99991, 100000) AS f(id) USING (id);

   Nested Loop  (cost=0.00..3038.50 rows=1001 width=17) (actual time=0.586..1.891 rows=10 loops=1)
     ->  Function Scan on gen_series f  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.394..0.479 rows=10 loops=1)
     ->  Index Scan using stuff_pkey on stuff  (cost=0.00..3.01 rows=1 width=17) (actual time=0.089..0.105 rows=1
loops=10)
           Index Cond: (stuff.id = "outer".id)
   Total runtime: 2.374 ms
  (5 rows)

Is the planner doing something wrong here?

Thanks.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Faster with a sub-query then without
Next
From: Martin Foster
Date:
Subject: Re: Faster with a sub-query then without