Thread: Bug #4284

Bug #4284

From
"David Rowley"
Date:
<div class="Section1"><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt;
font-family:Arial">I had an email today about an old bug that I reported back in July 2008.</span></font><p
class="MsoNormal"><fontface="Arial" size="2"><span style="font-size:10.0pt; 
font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt;
font-family:Arial"><a
href="http://archives.postgresql.org/pgsql-bugs/2008-07/msg00026.php">http://archives.postgresql.org/pgsql-bugs/2008-07/msg00026.php</a></span></font><p
class="MsoNormal"><fontface="Arial" size="2"><span style="font-size:10.0pt; 
font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt;
font-family:Arial">I didn’t receive any response at the time and I didn’t really follow it up.</span></font><p
class="MsoNormal"><fontface="Arial" size="2"><span style="font-size:10.0pt; 
font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt;
font-family:Arial">My report contained a full re-creation script to reproduce the problem and tonight I’m having the
sameproblem with CVS Head. To my untrained eye it looks like the planner is not properly pushing down the row count.
</span></font><pclass="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; 
font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt;
font-family:Arial">I know this is a busy time for all, but this seems to be distressing a few people. At the time I
didn’tfind anyway apart from disabling nested loops.</span></font><p class="MsoNormal"><font face="Arial"
size="2"><spanstyle="font-size:10.0pt; 
font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt;
font-family:Arial">Would anyone gifted in the art of the query planner be able to look at this?</span></font><p
class="MsoNormal"><fontface="Arial" size="2"><span style="font-size:10.0pt; 
font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt;
font-family:Arial">David.</span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt;
font-family:Arial"> </span></font></div>

Re: Bug #4284

From
Tom Lane
Date:
"David Rowley" <dgrowley@gmail.com> writes:
> My report contained a full re-creation script to reproduce the problem and
> tonight I'm having the same problem with CVS Head. To my untrained eye it
> looks like the planner is not properly pushing down the row count. 

It looks more like a multicolumn selectivity issue to me.  The planner
is supposing that the join condition

ON t1.productiondate = t2.productiondate AND t1.lineid = t2.lineid  AND t1.partcode = t2.partcode

is going to eliminate some fair-size fraction of t1 rows, whereas in
fact the construction of t2 is such that it won't eliminate any of them.
This is less obviously true for the join to t4, but I imagine from the
rowcounts that it's also true there.  So you get an unreasonably small
rowcount for whichever join gets done first, and then the nestloop plan
looks like a good idea for the second join.
        regards, tom lane


Re: Bug #4284

From
"David Rowley"
Date:
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 10 February 2009 22:30
> To: David Rowley
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Bug #4284
>
> "David Rowley" <dgrowley@gmail.com> writes:
> > My report contained a full re-creation script to reproduce the problem
> and
> > tonight I'm having the same problem with CVS Head. To my untrained eye
> it
> > looks like the planner is not properly pushing down the row count.
>
> It looks more like a multicolumn selectivity issue to me.  The planner
> is supposing that the join condition
>
> ON t1.productiondate = t2.productiondate AND t1.lineid = t2.lineid
>    AND t1.partcode = t2.partcode
>
> is going to eliminate some fair-size fraction of t1 rows, whereas in
> fact the construction of t2 is such that it won't eliminate any of them.
> This is less obviously true for the join to t4, but I imagine from the
> rowcounts that it's also true there.  So you get an unreasonably small
> rowcount for whichever join gets done first, and then the nestloop plan
> looks like a good idea for the second join.

At work I've been (unwillingly) working with MS SQL Server 2008. I notice
that when I request a query plan it's an "Estimated" query plan. Perhaps
this is to get around problems with bad row estimates. In theory at least it
should be possible to revert back to another plan after the inner nested
queries have processed and before the join takes place... I know that's a
major change, I'm just theorising.

Apart from that, do you think that this could only be fixed with stats that
span multiple columns? I know this was talked about not so long ago.

David.





Re: Bug #4284

From
"David Rowley"
Date:
Tom Lane Wrote:
> "David Rowley" <dgrowley@gmail.com> writes:
> > My report contained a full re-creation script to reproduce the problem
> and
> > tonight I'm having the same problem with CVS Head. To my untrained eye
> it
> > looks like the planner is not properly pushing down the row count.
>
> It looks more like a multicolumn selectivity issue to me.  The planner
> is supposing that the join condition
>
> ON t1.productiondate = t2.productiondate AND t1.lineid = t2.lineid
>    AND t1.partcode = t2.partcode
>
> is going to eliminate some fair-size fraction of t1 rows, whereas in
> fact the construction of t2 is such that it won't eliminate any of them.
> This is less obviously true for the join to t4, but I imagine from the
> rowcounts that it's also true there.  So you get an unreasonably small
> rowcount for whichever join gets done first, and then the nestloop plan
> looks like a good idea for the second join.

I thought about this after sending my reply to this last night. I remembered
when I created my test case I had to add the other tables to get the nest
loop behaviour. I'm not sure your guess about the multicolumn selectivity
issue is correct. I re-tested with the following query.

EXPLAIN ANALYZE SELECT t1.productiondate,      t1.partcode,      t1.batchcode,      t1.bestbefore
FROM batches t1
LEFT OUTER JOIN (SELECT productiondate,                       lineid,                       partcode,
   SUM(quantity) AS quantity          FROM production          GROUP BY productiondate,partcode,lineid 
) t4 ON t1.productiondate = t4.productiondate AND t1.lineid = t4.lineid AND
t1.partcode = t4.partcode;

The top line of the explain analyze is:
Merge Left Join  (cost=464.51..510.72 rows=4200 width=21) (actual
time=107.872..157.882 rows=4200 loops=1)

Perfect row estimate!

It seems to be something to do with having those other tables in there.

David.




Re: Bug #4284

From
Tom Lane
Date:
"David Rowley" <dgrowley@gmail.com> writes:
> I thought about this after sending my reply to this last night. I remembered
> when I created my test case I had to add the other tables to get the nest
> loop behaviour. I'm not sure your guess about the multicolumn selectivity
> issue is correct. I re-tested with the following query.
> ...
> Perfect row estimate!

The reason for that is that the planner knows that a LEFT JOIN result
can't be smaller than the left input, so whatever join size estimate it
comes up with from statistics will be clamped to be at least that much.
If you change the test case to an inner join you still get the one-row
rowcount estimate, same as for the join to t2.
        regards, tom lane