Parameterized-path cost comparisons need some work - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Parameterized-path cost comparisons need some work |
Date | |
Msg-id | 29248.1330468514@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Parameterized-path cost comparisons need some work
Re: Parameterized-path cost comparisons need some work Re: Parameterized-path cost comparisons need some work |
List | pgsql-hackers |
I was experimenting today with a test case sent to me by somebody at Red Hat. The details aren't too important, except that it involves an inner join on the inside of a left join (where it cannot commute with the left join). I can reproduce the behavior with a standard regression test table, if I crank random_page_cost up a bit: regression=# set random_page_cost TO 5; SET regression=# explain select * from tenk1 t1 left join (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2) on t1.hundred= t2.hundred where t1.unique1 = 1; QUERY PLAN ------------------------------------------------------------------------------------------------Nested Loop Left Join (cost=0.00..507.16rows=100 width=732) -> Index Scan using tenk1_unique1 on tenk1 t1 (cost=0.00..10.28 rows=1 width=244) Index Cond: (unique1 = 1) -> Nested Loop (cost=0.00..495.63 rows=100 width=488) -> Index Scanusing tenk1_hundred on tenk1 t2 (cost=0.00..446.98 rows=100 width=244) Index Cond: (t1.hundred = hundred) -> Index Scan using tenk1_unique2 on tenk1 t3 (cost=0.00..0.48 rows=1 width=244) Index Cond:(unique2 = t2.thousand) (8 rows) regression=# set random_page_cost TO 6; SET regression=# explain select * from tenk1 t1 left join (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2) on t1.hundred= t2.hundred where t1.unique1 = 1; QUERY PLAN ---------------------------------------------------------------------------------------------Hash Right Join (cost=928.30..2573.80rows=100 width=732) Hash Cond: (t2.hundred = t1.hundred) -> Hash Join (cost=916.00..2523.00 rows=10000width=488) Hash Cond: (t2.thousand = t3.unique2) -> Seq Scan on tenk1 t2 (cost=0.00..458.00 rows=10000width=244) -> Hash (cost=458.00..458.00 rows=10000 width=244) -> Seq Scan on tenk1 t3 (cost=0.00..458.00rows=10000 width=244) -> Hash (cost=12.29..12.29 rows=1 width=244) -> Index Scan using tenk1_unique1on tenk1 t1 (cost=0.00..12.29 rows=1 width=244) Index Cond: (unique1 = 1) (10 rows) WTF? How did it suddenly fail to find the double-nested-loop plan, and have to settle for a much worse plan? The reason seems to be that for large enough random_page_cost, the seqscan on t2 is costed as cheaper than an indexscan that selects a significant fraction of the table. (Notice the t2 scans are nearly the same cost in these two examples.) That causes add_path to decide that the unparameterized seqscan is unconditionally better than the parameterized indexscan, and throw out the latter. Now it is impossible to form the parameterized nestloop subplan. The flaw in this logic, of course, is that the seqscan might be cheaper than the parameterized indexscan, but *it produces a whole lot more rows*, meaning that any subsequent join will be a lot more expensive. Previously add_path didn't have to worry about that, because all ordinary paths for a given relation produce the same number of rows (and we studiously kept things like inner indexscan paths out of add_path's view of the world). The most obvious thing to do about this is to consider that one path can dominate another on cost only if it is both cheaper *and* produces no more rows. But I'm concerned about the cost of inserting yet another test condition into add_path, which is slow enough already. Has anyone got an idea for a different formulation that would avoid that? regards, tom lane
pgsql-hackers by date: