BUG #5539: Generally incorrect planner estimations on join with UNION ALL - Mailing list pgsql-bugs
From | Maksym Boguk |
---|---|
Subject | BUG #5539: Generally incorrect planner estimations on join with UNION ALL |
Date | |
Msg-id | 201007040047.o640lNpD039563@wwwmaster.postgresql.org Whole thread Raw |
Responses |
Re: BUG #5539: Generally incorrect planner estimations on join
with UNION ALL
|
List | pgsql-bugs |
The following bug has been logged online: Bug reference: 5539 Logged by: Maksym Boguk Email address: Maxim.Boguk@gmail.com PostgreSQL version: 8.4.3 Operating system: FreeBSD 7.2 Description: Generally incorrect planner estimations on join with UNION ALL Details: If query contains an join with '(t1 union all t2) on something' planner estimated rows in results widely incorrect, thus leads to very bad plans if query contained more then that single join. However, looking on explain analyze output, planner have all required information to produce correct estimation. The test case. Step1, initial data preparation: mboguk_billing=# CREATE TABLE t1 AS SELECT v as id FROM generate_series(1,100000) as g(v); SELECT mboguk_billing=# CREATE TABLE t2 AS SELECT v as id FROM generate_series(1,100000) as g(v); SELECT mboguk_billing=# CREATE TABLE t3 AS SELECT v as id,v as value FROM generate_series(1,100000) as g(v); SELECT mboguk_billing=# CREATE INDEX t1_id_key on t1(id); CREATE INDEX mboguk_billing=# CREATE INDEX t2_id_key on t2(id); CREATE INDEX mboguk_billing=# CREATE INDEX t3_id_key on t3(id); CREATE INDEX mboguk_billing=# CREATE INDEX t3_value_key on t3(value); CREATE INDEX mboguk_billing=# ANALYZE t1; ANALYZE mboguk_billing=# ANALYZE t2; ANALYZE mboguk_billing=# ANALYZE t3; ANALYZE Ok now we have 3 table with 100000 rows each. Step 2: test query: EXPLAIN ANALYZE SELECT t3.*,t_all.* FROM t3 JOIN ( SELECT * FROM t1 UNION ALL SELECT * FROM t2 ) as t_all ON t_all.id=t3.id WHERE t3.value=100; QUERY PLAN ---------------------------------------------------------------------------- ---------------------------------------------- Nested Loop (cost=0.00..1.49 rows=1000 width=12) (actual time=0.076..0.116 rows=2 loops=1) Join Filter: (t3.id = t1.id) -> Index Scan using t3_value_key on t3 (cost=0.00..0.49 rows=1 width=8) (actual time=0.038..0.040 rows=1 loops=1) Index Cond: (value = 100) -> Append (cost=0.00..0.98 rows=2 width=4) (actual time=0.018..0.047 rows=2 loops=1) -> Index Scan using t1_id_key on t1 (cost=0.00..0.49 rows=1 width=4) (actual time=0.015..0.019 rows=1 loops=1) Index Cond: (t1.id = t3.id) -> Index Scan using t2_id_key on t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.015..0.017 rows=1 loops=1) Index Cond: (t2.id = t3.id) Total runtime: 0.235 ms (10 rows) Here problems already easy visible: Nested Loop (cost=0.00..1.49 rows=1000 width=12) vs actual: (actual time=0.076..0.116 rows=2 loops=1) 1000 rows predicted (look like hard-coded heuristic value) vs 2 row in reality, but if careful examine key parts of the plan: Nested Loop (cost=0.00..1.49 rows=1000 width=12) (actual time=0.076..0.116 rows=2 loops=1) Join Filter: (t3.id = t1.id) -> Index Scan using t3_value_key on t3 (cost=0.00..0.49 rows=1 width=8) (actual time=0.038..0.040 rows=1 loops=1) Index Cond: (value = 100) -> Append (cost=0.00..0.98 rows=2 width=4) (actual time=0.018..0.047 rows=2 loops=1) notable planner had correctly estimated amount of rows in one side of the join: (Append (cost=0.00..0.98 rows=2 width=4)) and planner hard correctly estimated amount of rows in the another side of the join: Index Scan using t3_value_key on t3 (cost=0.00..0.49 rows=1 width=8) And instead of use 2(from append)*1(from scan on t3)=2 as final row estimation, planner chooses use heuristic value 1000. That isn't problem in this simple case, but once you try to add another join on t_all with some big table you screwed because planner choose slow hash or merge join with that big table instead of fast inner loop over two values. PS: another small question... are Join Filter: (t3.id = t1.id) is really required in that case? And if yes, probably in explain analyze it should produce something like Join Filter: (t3.id = t_all.id) instead. Thanks for your time and attention.
pgsql-bugs by date: