Thread: Bug #4284
<div class="Section1"><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">I had an email today about an old bug that I reported back in July 2008.</span></font><p class="MsoNormal"><fontface="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial"><a href="http://archives.postgresql.org/pgsql-bugs/2008-07/msg00026.php">http://archives.postgresql.org/pgsql-bugs/2008-07/msg00026.php</a></span></font><p class="MsoNormal"><fontface="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">I didn’t receive any response at the time and I didn’t really follow it up.</span></font><p class="MsoNormal"><fontface="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">My report contained a full re-creation script to reproduce the problem and tonight I’m having the sameproblem with CVS Head. To my untrained eye it looks like the planner is not properly pushing down the row count. </span></font><pclass="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">I know this is a busy time for all, but this seems to be distressing a few people. At the time I didn’tfind anyway apart from disabling nested loops.</span></font><p class="MsoNormal"><font face="Arial" size="2"><spanstyle="font-size:10.0pt; font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">Would anyone gifted in the art of the query planner be able to look at this?</span></font><p class="MsoNormal"><fontface="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">David.</span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial"> </span></font></div>
"David Rowley" <dgrowley@gmail.com> writes: > My report contained a full re-creation script to reproduce the problem and > tonight I'm having the same problem with CVS Head. To my untrained eye it > looks like the planner is not properly pushing down the row count. It looks more like a multicolumn selectivity issue to me. The planner is supposing that the join condition ON t1.productiondate = t2.productiondate AND t1.lineid = t2.lineid AND t1.partcode = t2.partcode is going to eliminate some fair-size fraction of t1 rows, whereas in fact the construction of t2 is such that it won't eliminate any of them. This is less obviously true for the join to t4, but I imagine from the rowcounts that it's also true there. So you get an unreasonably small rowcount for whichever join gets done first, and then the nestloop plan looks like a good idea for the second join. regards, tom lane
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: 10 February 2009 22:30 > To: David Rowley > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Bug #4284 > > "David Rowley" <dgrowley@gmail.com> writes: > > My report contained a full re-creation script to reproduce the problem > and > > tonight I'm having the same problem with CVS Head. To my untrained eye > it > > looks like the planner is not properly pushing down the row count. > > It looks more like a multicolumn selectivity issue to me. The planner > is supposing that the join condition > > ON t1.productiondate = t2.productiondate AND t1.lineid = t2.lineid > AND t1.partcode = t2.partcode > > is going to eliminate some fair-size fraction of t1 rows, whereas in > fact the construction of t2 is such that it won't eliminate any of them. > This is less obviously true for the join to t4, but I imagine from the > rowcounts that it's also true there. So you get an unreasonably small > rowcount for whichever join gets done first, and then the nestloop plan > looks like a good idea for the second join. At work I've been (unwillingly) working with MS SQL Server 2008. I notice that when I request a query plan it's an "Estimated" query plan. Perhaps this is to get around problems with bad row estimates. In theory at least it should be possible to revert back to another plan after the inner nested queries have processed and before the join takes place... I know that's a major change, I'm just theorising. Apart from that, do you think that this could only be fixed with stats that span multiple columns? I know this was talked about not so long ago. David.
Tom Lane Wrote: > "David Rowley" <dgrowley@gmail.com> writes: > > My report contained a full re-creation script to reproduce the problem > and > > tonight I'm having the same problem with CVS Head. To my untrained eye > it > > looks like the planner is not properly pushing down the row count. > > It looks more like a multicolumn selectivity issue to me. The planner > is supposing that the join condition > > ON t1.productiondate = t2.productiondate AND t1.lineid = t2.lineid > AND t1.partcode = t2.partcode > > is going to eliminate some fair-size fraction of t1 rows, whereas in > fact the construction of t2 is such that it won't eliminate any of them. > This is less obviously true for the join to t4, but I imagine from the > rowcounts that it's also true there. So you get an unreasonably small > rowcount for whichever join gets done first, and then the nestloop plan > looks like a good idea for the second join. I thought about this after sending my reply to this last night. I remembered when I created my test case I had to add the other tables to get the nest loop behaviour. I'm not sure your guess about the multicolumn selectivity issue is correct. I re-tested with the following query. EXPLAIN ANALYZE SELECT t1.productiondate, t1.partcode, t1.batchcode, t1.bestbefore FROM batches t1 LEFT OUTER JOIN (SELECT productiondate, lineid, partcode, SUM(quantity) AS quantity FROM production GROUP BY productiondate,partcode,lineid ) t4 ON t1.productiondate = t4.productiondate AND t1.lineid = t4.lineid AND t1.partcode = t4.partcode; The top line of the explain analyze is: Merge Left Join (cost=464.51..510.72 rows=4200 width=21) (actual time=107.872..157.882 rows=4200 loops=1) Perfect row estimate! It seems to be something to do with having those other tables in there. David.
"David Rowley" <dgrowley@gmail.com> writes: > I thought about this after sending my reply to this last night. I remembered > when I created my test case I had to add the other tables to get the nest > loop behaviour. I'm not sure your guess about the multicolumn selectivity > issue is correct. I re-tested with the following query. > ... > Perfect row estimate! The reason for that is that the planner knows that a LEFT JOIN result can't be smaller than the left input, so whatever join size estimate it comes up with from statistics will be clamped to be at least that much. If you change the test case to an inner join you still get the one-row rowcount estimate, same as for the join to t2. regards, tom lane