Re: planner chooses unoptimal plan on joins with complex key - Mailing list pgsql-performance

From Tom Lane
Subject Re: planner chooses unoptimal plan on joins with complex key
Date
Msg-id 26416.1201643984@sss.pgh.pa.us
Whole thread Raw
In response to Re: planner chooses unoptimal plan on joins with complex key  ("Dmitry Potapov" <fortune.fish@gmail.com>)
Responses Re: planner chooses unoptimal plan on joins with complex key
List pgsql-performance
"Dmitry Potapov" <fortune.fish@gmail.com> writes:
> 2008/1/25, Tom Lane <tgl@sss.pgh.pa.us>:
>> It's hard to tell whether the planner is just being overoptimistic
>> about the results of ANDing the three join conditions, or if one or
>> more of the basic condition selectivities were misestimated.

> I wonder if it will ever finish the work :( So, for now, the only
> thing I can show is:

> =# explain select 1 from t1, t2 where t1.m1=t2.m1 ;
>                                    QUERY PLAN
> --------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..162742993234.25 rows=57462242912003 width=0)
>    ->  Seq Scan on t2  (cost=0.00..690784.54 rows=30820054 width=4)
>    ->  Index Scan using i_t1_m1 on t1  (cost=0.00..3080.74 rows=175973 width=4)
>          Index Cond: (t1.m1 = t2.m1)
> (4 rows)

So this is estimating the join selectivity for m1 alone as

regression=# select 57462242912003 / (30820054.0 * 21292688);
        ?column?
------------------------
 0.08756260792962293676
(1 row)

which would suggest that there are only 11 or so distinct values for m1;
is that about right?

There's something a bit funny here --- looking at the plan components,
the output row estimate should be about 30820054 * 175973, but it's ten
times that.  I'm not sure if this is just a presentation artifact or
if it points to a real bug.

Could you show us the pg_stats rows for t1.m1 and t2.m1?

The m2 estimate seems nearly dead on, and the m3 estimate is within 10%
or so which is certainly close enough.  Barring the possibility that
there's something wrong underneath that m1 estimate, the products of
the selectivities do indeed work out to give us about 1 row out of the
join with all three conditions.  So my conclusion is that these three
values are not independent, but are highly correlated (in fact, it seems
that m2 alone provides most of the join selectivity).  Do you actually
need to compare all three to make the join, or is one field perhaps
dependent on the other two?  Can you refactor the data to make it so?

            regards, tom lane

pgsql-performance by date:

Previous
From: "Mike Smith"
Date:
Subject: Re: 8x2.5" or 6x3.5" disks
Next
From: "Dmitry Potapov"
Date:
Subject: Re: planner chooses unoptimal plan on joins with complex key