Poor plan when joining against a union containing a join - Mailing list pgsql-performance

From David Leverton
Subject Poor plan when joining against a union containing a join
Date
Msg-id CALc3eMXjdWDMRr+48R2LH9fimKHD4JG9aWHDfqmmG=SQPTBVeA@mail.gmail.com
Whole thread Raw
Responses Re: Poor plan when joining against a union containing a join
List pgsql-performance
Hi all,

I'm encountering very poor query plans when joining against a union,
where one branch of the union is itself a join: specifically, Postgres
does the entire inside join and then filters the result, rather than
pushing the filters down to the joined tables.  I've provided a
standalone test case below, including some variations that don't show
the problem for comparison.  The fourth query is the problematic one -
I would have expected the Append portion of the plan to be essentially
the same as the one in the second query.

I'm running Postgres 9.2.3 on Oracle Enterprise Linux 5 (some machines
5.3, some 5.6) x86_64, installed using the RPMs from
http://yum.pgrpms.org/.  The problem occurs with identical results
(other than raw speed) on both a rather underpowered test server with
default Postgres settings and a much better live server which I've
attempted to tune somewhat sensibly.

Thanks for any advice.

START TRANSACTION;

CREATE TABLE bundle_contents(
    item_id     INTEGER PRIMARY KEY,
    bundle_type INTEGER NOT NULL
);
INSERT INTO bundle_contents(item_id, bundle_type)
    SELECT i * 10 + j, i
    FROM GENERATE_SERIES(1, 100) i,
         GENERATE_SERIES(1,  10) j;
CREATE INDEX ON bundle_contents(bundle_type, item_id);
ANALYSE bundle_contents;

CREATE TABLE bundle(
    bundle_id   INTEGER PRIMARY KEY,
    bundle_type INTEGER NOT NULL
);
INSERT INTO bundle(bundle_id, bundle_type)
    SELECT i * 1000 + j, i
    FROM GENERATE_SERIES(1, 100) i,
         GENERATE_SERIES(1, 1000) j;
CREATE INDEX ON bundle(bundle_type, bundle_id);
ANALYSE bundle;

CREATE VIEW bundled_item AS
    SELECT bundle_id AS item_id_a, item_id AS item_id_b
    FROM bundle NATURAL JOIN bundle_contents;

CREATE TABLE unbundled_item(
    item_id_a INTEGER,
    item_id_b INTEGER,
    PRIMARY KEY (item_id_a, item_id_b)
);
INSERT INTO unbundled_item(item_id_a, item_id_b)
    SELECT i, 1 FROM GENERATE_SERIES(1001, 100000, 10) i;
INSERT INTO unbundled_item(item_id_a, item_id_b)
    SELECT i, 2 FROM GENERATE_SERIES(1001, 100000, 20) i;
INSERT INTO unbundled_item(item_id_a, item_id_b)
    SELECT i, 3 FROM GENERATE_SERIES(1001, 100000, 23) i;
INSERT INTO unbundled_item(item_id_a, item_id_b)
    SELECT i, 4 FROM GENERATE_SERIES(1001, 100000, 41) i;
CREATE INDEX ON unbundled_item(item_id_b);
ANALYSE unbundled_item;

CREATE VIEW item AS
    SELECT item_id_a, item_id_b FROM bundled_item
    UNION ALL
    SELECT item_id_a, item_id_b FROM unbundled_item;

CREATE TABLE item_reference(
    reference_id INTEGER PRIMARY KEY,
    item_id_a    INTEGER NOT NULL,
    item_id_b    INTEGER NOT NULL
);
INSERT INTO item_reference(reference_id, item_id_a, item_id_b)
    VALUES(1, 1472, 16),
          (2, 1299, 3);
CREATE INDEX ON item_reference(item_id_a, item_id_b);
CREATE INDEX ON item_reference(item_id_b);
ANALYSE item_reference;

-- no union, nice and fast
EXPLAIN (ANALYSE, BUFFERS)
    SELECT *
    FROM bundled_item
    WHERE (item_id_a, item_id_b) = (1472, 16);
/* http://explain.depesz.com/s/1ye
                                                                 QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..16.56 rows=1 width=8) (actual
time=0.040..0.041 rows=1 loops=1)
   Join Filter: (bundle.bundle_type = bundle_contents.bundle_type)
   Buffers: shared hit=8
   ->  Index Scan using bundle_pkey on bundle  (cost=0.00..8.28 rows=1
width=8) (actual time=0.017..0.017 rows=1 loops=1)
         Index Cond: (bundle_id = 1472)
         Buffers: shared hit=4
   ->  Index Scan using bundle_contents_pkey on bundle_contents
(cost=0.00..8.27 rows=1 width=8) (actual time=0.012..0.013 rows=1
loops=1)
         Index Cond: (item_id = 16)
         Buffers: shared hit=4
 Total runtime: 0.085 ms
(10 rows)
*/

-- using the union, but querying it directly rather
-- than joining to it, still fast
EXPLAIN (ANALYSE, BUFFERS)
    SELECT *
    FROM item
    WHERE (item_id_a, item_id_b) = (1472, 16);
/* http://explain.depesz.com/s/rwA

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..24.84 rows=2 width=8) (actual time=0.017..0.068
rows=1 loops=1)
   Buffers: shared hit=6 read=3
   I/O Timings: read=0.032
   ->  Append  (cost=0.00..24.84 rows=2 width=8) (actual
time=0.015..0.066 rows=1 loops=1)
         Buffers: shared hit=6 read=3
         I/O Timings: read=0.032
         ->  Nested Loop  (cost=0.00..16.56 rows=1 width=8) (actual
time=0.015..0.017 rows=1 loops=1)
               Join Filter: (bundle.bundle_type = bundle_contents.bundle_type)
               Buffers: shared hit=6
               ->  Index Scan using bundle_pkey on bundle
(cost=0.00..8.28 rows=1 width=8) (actual time=0.006..0.007 rows=1
loops=1)
                     Index Cond: (bundle_id = 1472)
                     Buffers: shared hit=3
               ->  Index Scan using bundle_contents_pkey on
bundle_contents  (cost=0.00..8.27 rows=1 width=8) (actual
time=0.004..0.005 rows=1 loops=1)
                     Index Cond: (item_id = 16)
                     Buffers: shared hit=3
         ->  Index Scan using unbundled_item_item_id_b_idx on
unbundled_item  (cost=0.00..8.27 rows=1 width=8) (actual
time=0.049..0.049 rows=0 loops=1)
               Index Cond: (item_id_b = 16)
               Filter: (item_id_a = 1472)
               Buffers: shared read=3
               I/O Timings: read=0.032
 Total runtime: 0.116 ms
(21 rows)
*/

-- join but no union, still fast
EXPLAIN (ANALYSE, BUFFERS)
    SELECT *
    FROM bundled_item
    NATURAL JOIN item_reference
    WHERE reference_id = 1;
/* http://explain.depesz.com/s/oOy
                                                                      QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.04..4.50 rows=1 width=12) (actual
time=0.092..0.096 rows=1 loops=1)
   Buffers: shared hit=5 read=3
   I/O Timings: read=0.038
   ->  Merge Join  (cost=1.04..1.33 rows=1 width=16) (actual
time=0.032..0.035 rows=1 loops=1)
         Merge Cond: (bundle_contents.item_id = item_reference.item_id_b)
         Buffers: shared hit=4
         ->  Index Scan using bundle_contents_pkey on bundle_contents
(cost=0.00..43.25 rows=1000 width=8) (actual time=0.010..0.013 rows=7
loops=1)
               Buffers: shared hit=3
         ->  Sort  (cost=1.03..1.04 rows=1 width=12) (actual
time=0.012..0.012 rows=1 loops=1)
               Sort Key: item_reference.item_id_b
               Sort Method: quicksort  Memory: 25kB
               Buffers: shared hit=1
               ->  Seq Scan on item_reference  (cost=0.00..1.02 rows=1
width=12) (actual time=0.005..0.007 rows=1 loops=1)
                     Filter: (reference_id = 1)
                     Rows Removed by Filter: 1
                     Buffers: shared hit=1
   ->  Index Only Scan using bundle_bundle_type_bundle_id_idx on
bundle  (cost=0.00..3.16 rows=1 width=8) (actual time=0.056..0.057
rows=1 loops=1)
         Index Cond: ((bundle_type = bundle_contents.bundle_type) AND
(bundle_id = item_reference.item_id_a))
         Heap Fetches: 1
         Buffers: shared hit=1 read=3
         I/O Timings: read=0.038
 Total runtime: 0.139 ms
(22 rows)
*/

-- join to the union, slow.  the conditions are pushed into the
-- index scan in the non-join branch of the union, but in the join
-- branch they're applied after the entire join is built
EXPLAIN (ANALYSE, BUFFERS)
    SELECT *
    FROM item
    NATURAL JOIN item_reference
    WHERE reference_id = 1;
/* http://explain.depesz.com/s/M73
                                                                     QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=27.50..27979.82 rows=26 width=12) (actual
time=3.230..554.473 rows=1 loops=1)
   Buffers: shared hit=452
   ->  Seq Scan on item_reference  (cost=0.00..1.02 rows=1 width=12)
(actual time=0.013..0.016 rows=1 loops=1)
         Filter: (reference_id = 1)
         Rows Removed by Filter: 1
         Buffers: shared hit=1
   ->  Append  (cost=27.50..27978.78 rows=2 width=8) (actual
time=3.214..554.454 rows=1 loops=1)
         Buffers: shared hit=451
         ->  Subquery Scan on "*SELECT* 1"  (cost=27.50..27970.50
rows=1 width=8) (actual time=3.212..554.422 rows=1 loops=1)
               Filter: ((item_reference.item_id_a = "*SELECT*
1".item_id_a) AND (item_reference.item_id_b = "*SELECT* 1".item_id_b))
               Rows Removed by Filter: 999999
               Buffers: shared hit=448
               ->  Hash Join  (cost=27.50..12970.50 rows=1000000
width=8) (actual time=0.617..344.127 rows=1000000 loops=1)
                     Hash Cond: (bundle.bundle_type =
bundle_contents.bundle_type)
                     Buffers: shared hit=448
                     ->  Seq Scan on bundle  (cost=0.00..1443.00
rows=100000 width=8) (actual time=0.009..22.066 rows=100000 loops=1)
                           Buffers: shared hit=443
                     ->  Hash  (cost=15.00..15.00 rows=1000 width=8)
(actual time=0.594..0.594 rows=1000 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 40kB
                           Buffers: shared hit=5
                           ->  Seq Scan on bundle_contents
(cost=0.00..15.00 rows=1000 width=8) (actual time=0.008..0.207
rows=1000 loops=1)
                                 Buffers: shared hit=5
         ->  Index Only Scan using unbundled_item_pkey on
unbundled_item  (cost=0.00..8.28 rows=1 width=8) (actual
time=0.021..0.021 rows=0 loops=1)
               Index Cond: ((item_id_a = item_reference.item_id_a) AND
(item_id_b = item_reference.item_id_b))
               Heap Fetches: 0
               Buffers: shared hit=3
 Total runtime: 554.533 ms
(27 rows)
*/

ROLLBACK;


pgsql-performance by date:

Previous
From: Julien Cigar
Date:
Subject: Re: Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)
Next
From: Jeff Janes
Date:
Subject: Re: Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)