Parameterized-path cost comparisons need some work - Mailing list pgsql-hackers

I was experimenting today with a test case sent to me by somebody at Red
Hat.  The details aren't too important, except that it involves an inner
join on the inside of a left join (where it cannot commute with the left
join).  I can reproduce the behavior with a standard regression test
table, if I crank random_page_cost up a bit:

regression=# set random_page_cost TO 5;
SET
regression=# explain select * from tenk1 t1 left join (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2) on
t1.hundred= t2.hundred where t1.unique1 = 1;                                          QUERY PLAN
                  
 
------------------------------------------------------------------------------------------------Nested Loop Left Join
(cost=0.00..507.16rows=100 width=732)  ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.00..10.28 rows=1
width=244)       Index Cond: (unique1 = 1)  ->  Nested Loop  (cost=0.00..495.63 rows=100 width=488)        ->  Index
Scanusing tenk1_hundred on tenk1 t2  (cost=0.00..446.98 rows=100 width=244)              Index Cond: (t1.hundred =
hundred)       ->  Index Scan using tenk1_unique2 on tenk1 t3  (cost=0.00..0.48 rows=1 width=244)              Index
Cond:(unique2 = t2.thousand)
 
(8 rows)

regression=# set random_page_cost TO 6;
SET
regression=# explain select * from tenk1 t1 left join (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2) on
t1.hundred= t2.hundred where t1.unique1 = 1;                                        QUERY PLAN
               
 
---------------------------------------------------------------------------------------------Hash Right Join
(cost=928.30..2573.80rows=100 width=732)  Hash Cond: (t2.hundred = t1.hundred)  ->  Hash Join  (cost=916.00..2523.00
rows=10000width=488)        Hash Cond: (t2.thousand = t3.unique2)        ->  Seq Scan on tenk1 t2  (cost=0.00..458.00
rows=10000width=244)        ->  Hash  (cost=458.00..458.00 rows=10000 width=244)              ->  Seq Scan on tenk1 t3
(cost=0.00..458.00rows=10000 width=244)  ->  Hash  (cost=12.29..12.29 rows=1 width=244)        ->  Index Scan using
tenk1_unique1on tenk1 t1  (cost=0.00..12.29 rows=1 width=244)              Index Cond: (unique1 = 1)
 
(10 rows)

WTF?  How did it suddenly fail to find the double-nested-loop plan, and
have to settle for a much worse plan?

The reason seems to be that for large enough random_page_cost, the
seqscan on t2 is costed as cheaper than an indexscan that selects a
significant fraction of the table.  (Notice the t2 scans are nearly the
same cost in these two examples.)  That causes add_path to decide that
the unparameterized seqscan is unconditionally better than the
parameterized indexscan, and throw out the latter.  Now it is impossible
to form the parameterized nestloop subplan.

The flaw in this logic, of course, is that the seqscan might be cheaper
than the parameterized indexscan, but *it produces a whole lot more
rows*, meaning that any subsequent join will be a lot more expensive.
Previously add_path didn't have to worry about that, because all
ordinary paths for a given relation produce the same number of rows
(and we studiously kept things like inner indexscan paths out of
add_path's view of the world).

The most obvious thing to do about this is to consider that one path can
dominate another on cost only if it is both cheaper *and* produces no
more rows.  But I'm concerned about the cost of inserting yet another
test condition into add_path, which is slow enough already.  Has anyone
got an idea for a different formulation that would avoid that?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Jaime Casanova
Date:
Subject: Re: NULL's support in SP-GiST
Next
From: Tom Lane
Date:
Subject: Re: Should we add crc32 in libpgport?