Thread: Unnecessary repeat condition for a self inner join
<span style="font-family: verdana,sans-serif;">Hi,</span><br style="font-family: verdana,sans-serif;" /><br style="font-family:verdana,sans-serif;" /><span style="font-family: verdana,sans-serif;">I am not sure if this is a simple(... stupid) question but I just wasted two hours optimizing a query, so I thought I should drop in to ask.</span><brstyle="font-family: verdana,sans-serif;" /><br style="font-family: verdana,sans-serif;" /><span style="font-family:verdana,sans-serif;">The only difference between query1 and query2 (below) is that despite an explicitINNER JOIN, I have repeated the same condition for n2 (as given for n1) and this makes a whole lot of differencein performance (since it now uses the same index for n2 that it is using for n1).</span><br style="font-family:verdana,sans-serif;" /><br style="font-family: verdana,sans-serif;" /><span style="font-family: verdana,sans-serif;">Incase of an INNER JOIN, shouldn't the second condition (in Query2) be unnecessary ? <br />Or am I beingunreasonable in this expectation ?</span><br style="font-family: verdana,sans-serif;" /><br style="font-family: verdana,sans-serif;"/><span style="font-family: verdana,sans-serif;">Regards,</span><br style="font-family: verdana,sans-serif;"/><b style="font-family: verdana,sans-serif;">Robins Tharakan</b><br /><br /><span style="font-family:verdana,sans-serif;">p.s.: The query below is just a simplification, and provides only EXPLAIN, but Ithink an EXPLAIN ANALYSE should be unnecessary here. In case anyone still needs it, please do tell.</span><br /><br /><bstyle="font-family: courier new,monospace;">Query 1</b><span style="font-family: courier new,monospace;">:</span><brstyle="font-family: courier new,monospace;" /><br style="font-family: courier new,monospace;"/><span style="font-family: courier new,monospace;">SELECT n1.scheme_code</span><br style="font-family: couriernew,monospace;" /><span style="font-family: courier new,monospace;">FROM nav n1</span><br style="font-family: couriernew,monospace;" /><span style="font-family: courier new,monospace;"> INNER JOIN nav n2 ON n1.scheme_code = n2.scheme_code</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier new,monospace;">WHEREn1.scheme_code BETWEEN 100 AND 200</span><br style="font-family: courier new,monospace;" /><br style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">"Merge Join (cost=903471.23..10248343.37rows=622920912 width=4)"</span><br style="font-family: courier new,monospace;" /><span style="font-family:courier new,monospace;">" Merge Cond: (n1.scheme_code = n2.scheme_code)"</span><br style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">" -> Sort (cost=110929.32..111458.60rows=211712 width=4)"</span><br style="font-family: courier new,monospace;" /><span style="font-family:courier new,monospace;">" Sort Key: n1.scheme_code"</span><br style="font-family: courier new,monospace;"/><span style="font-family: courier new,monospace;">" -> Bitmap Heap Scan on nav n1 (cost=8623.86..92201.54rows=211712 width=4)"</span><br style="font-family: courier new,monospace;" /><span style="font-family:courier new,monospace;">" Recheck Cond: ((scheme_code >= 100) AND (scheme_code <= 200))"</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier new,monospace;">" -> Bitmap Index Scan on pk_fs_nav (cost=0.00..8570.94 rows=211712 width=0)"</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier new,monospace;">" Index Cond: ((scheme_code >= 100) AND (scheme_code <= 200))"</span><br style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">" -> Sort (cost=792541.91..805391.17rows=5139702 width=4)"</span><br style="font-family: courier new,monospace;" /><span style="font-family:courier new,monospace;">" Sort Key: n2.scheme_code"</span><br style="font-family: courier new,monospace;"/><span style="font-family: courier new,monospace;">" -> Seq Scan on nav n2 (cost=0.00..131799.02rows=5139702 width=4)"</span><br style="font-family: courier new,monospace;" /><br style="font-family:courier new,monospace;" /><br style="font-family: courier new,monospace;" /><b style="font-family: couriernew,monospace;">Query 2</b><span style="font-family: courier new,monospace;">:</span><br style="font-family: couriernew,monospace;" /><br style="font-family: courier new,monospace;" /><span style="font-family: courier new,monospace;">SELECTn1.scheme_code</span><br style="font-family: courier new,monospace;" /><span style="font-family: couriernew,monospace;">FROM nav n1</span><br style="font-family: courier new,monospace;" /><span style="font-family: couriernew,monospace;"> INNER JOIN nav n2 ON n1.scheme_code = n2.scheme_code</span><br style="font-family: courier new,monospace;"/><span style="font-family: courier new,monospace;">WHERE n1.scheme_code BETWEEN 100 AND 200</span><br style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;"> AND n2.scheme_code BETWEEN100 AND 200</span><br style="font-family: courier new,monospace;" /><span style="font-family: courier new,monospace;"> </span><br style="font-family: courier new,monospace;" /><br style="font-family: courier new,monospace;"/><span style="font-family: courier new,monospace;">"Merge Join (cost=221858.63..607790.72 rows=25659043width=4)"</span><br style="font-family: courier new,monospace;" /><span style="font-family: courier new,monospace;">" Merge Cond: (n2.scheme_code = n1.scheme_code)"</span><br style="font-family: courier new,monospace;" /><spanstyle="font-family: courier new,monospace;">" -> Sort (cost=110929.32..111458.60 rows=211712 width=4)"</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier new,monospace;">" Sort Key: n2.scheme_code"</span><br style="font-family: courier new,monospace;" /><span style="font-family:courier new,monospace;">" -> Bitmap Heap Scan on nav n2 (cost=8623.86..92201.54 rows=211712width=4)"</span><br style="font-family: courier new,monospace;" /><span style="font-family: courier new,monospace;">" Recheck Cond: ((scheme_code >= 100) AND (scheme_code <= 200))"</span><br style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">" -> BitmapIndex Scan on pk_fs_nav (cost=0.00..8570.94 rows=211712 width=0)"</span><br style="font-family: courier new,monospace;"/><span style="font-family: courier new,monospace;">" Index Cond: ((scheme_code >= 100)AND (scheme_code <= 200))"</span><br style="font-family: courier new,monospace;" /><span style="font-family: couriernew,monospace;">" -> Sort (cost=110929.32..111458.60 rows=211712 width=4)"</span><br style="font-family: couriernew,monospace;" /><span style="font-family: courier new,monospace;">" Sort Key: n1.scheme_code"</span><br style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">" -> Bitmap HeapScan on nav n1 (cost=8623.86..92201.54 rows=211712 width=4)"</span><br style="font-family: courier new,monospace;" /><spanstyle="font-family: courier new,monospace;">" Recheck Cond: ((scheme_code >= 100) AND (scheme_code<= 200))"</span><br style="font-family: courier new,monospace;" /><span style="font-family: courier new,monospace;">" -> Bitmap Index Scan on pk_fs_nav (cost=0.00..8570.94 rows=211712 width=0)"</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier new,monospace;">" Index Cond: ((scheme_code >= 100) AND (scheme_code <= 200))"</span><br style="font-family:courier new,monospace;" /><br style="font-family: courier new,monospace;" /><br />
"Robins Tharakan" <tharakan@gmail.com> writes: > In case of an INNER JOIN, shouldn't the second condition (in Query2) be > unnecessary ? > Or am I being unreasonable in this expectation ? > SELECT n1.scheme_code > FROM nav n1 > INNER JOIN nav n2 ON n1.scheme_code = n2.scheme_code > WHERE n1.scheme_code BETWEEN 100 AND 200 > AND n2.scheme_code BETWEEN 100 AND 200 While the optimizer theoretically could deduce the extra restriction condition, it doesn't attempt to. It's extremely unclear that the extra cycles to look for such cases would be repaid on average, because cases like this aren't that common. The current state of affairs is that the system will deduce implied equality conditions, but not implied inequality conditions. [ thinks for a bit... ] The current policy has been driven in part by the assumption that looking for cases where such a deduction could apply would be pretty expensive. I wonder though whether the recent EquivalenceClass work has changed the landscape. We now store an explicit representation of the btree opclasses associated with each equivalence condition, which is one of the pieces that would be needed to match up the equivalences with inequality conditions. I'm still dubious, but that's at least one less catalog search ... regards, tom lane
While the optimizer theoretically could deduce the extra restriction
condition, it doesn't attempt to. It's extremely unclear that the extra
cycles to look for such cases would be repaid on average, because cases
like this aren't that common. The current state of affairs is that
the system will deduce implied equality conditions, but not implied
inequality conditions.
One good thing is that the equality conditions are taken care of. But I fail to understand why do you believe that the second case is rare. I think the optimizer would (in all self-join inequality conditions) tend towards a table scan, which for a large table is a disaster. (Of course, the index scan would help only if the result-set is small)
Besides, I did a simple test and although you are right about the optimizer deducing implied equality conditions, this holds true only for a direct join. In the second query, the optimizer recommends a table scan even for a simple IN() condition.
Is that normal ?
Regards,
Robins Tharakan
Query 1:
SELECT n1.scheme_code
FROM nav n1
INNER JOIN nav n2 ON n1.scheme_code = n2.scheme_code
WHERE n1.scheme_code = 290
"Nested Loop (cost=0.00..16147232.47 rows=4796100 width=4)"
" -> Index Scan using nav__schemecode_date_lookup3b on nav n1 (cost=0.00..7347.91 rows=2190 width=4)"
" Index Cond: (scheme_code = 290)"
" -> Index Scan using nav__schemecode_date_lookup3b on nav n2 (cost=0.00..7347.91 rows=2190 width=4)"
" Index Cond: (290 = scheme_code)"
Query 2:
SELECT n1.scheme_code
FROM nav n1
INNER JOIN nav n2 ON n1.scheme_code = n2.scheme_code
WHERE n1.scheme_code IN (1, 2)
"Hash Join (cost=206004.00..431864.83 rows=10720451 width=4)"
" Hash Cond: (n1.scheme_code = n2.scheme_code)"
" -> Bitmap Heap Scan on nav n1 (cost=139.62..13663.13 rows=4378 width=4)"
" Recheck Cond: (scheme_code = ANY ('{1,2}'::integer[]))"
" -> Bitmap Index Scan on nav__schemecode_date_lookup3b (cost=0.00..138.53 rows=4378 width=0)"
" Index Cond: (scheme_code = ANY ('{1,2}'::integer[]))"
" -> Hash (cost=112078.06..112078.06 rows=5395306 width=4)"
" -> Seq Scan on nav n2 (cost=0.00..112078.06 rows=5395306 width=4)"
"Robins Tharakan" <tharakan@gmail.com> writes: > Besides, I did a simple test and although you are right about the optimizer > deducing implied equality conditions, this holds true only for a direct > join. In the second query, the optimizer recommends a table scan even for a > simple IN() condition. An IN is not an equivalence condition. regards, tom lane