Thread: Spurious rows returned with left join?
I think I have stumbled on a bug, though I'm not entirely sure about that. Things do seem to get a little fuzzy when using outer joins .... Consider the following: create table t1(t1_a int); insert into t1 values (1); insert into t1 values (2); insert into t1 values (3); create table t2(t2_a int , t2_b int); insert into t2 values (1, 1); insert into t2 values (2, 1); create table t3(t3_b int, t3_c int); insert into t3 values (1, 9); insert into t3 values (1, 10); select * from t1 left join t2 on t1_a = t2_a left join t3 on t2_b = t3_b order by t1_a, t2_b; t1_a | t2_a | t2_b | t3_b | t3_c ------+------+------+------+------ 1 | 1 | 1 | 1 | 9 1 | 1 | 1 | 1 | 10 2 | 2 | 1 | 1 | 9 2 | 2 | 1 | 1 | 10 3 | | | | 3 | | | | Note that I get 2 rows where t1_a = 3. My pgsql version is: PostgreSQL 8.1.3 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) (RedHat ES4 rpms from postgresql.org) I have tried this same select on pgsql ver-7.4.8, CVS tip, as well as SQL Server 2005, MySQL and Firebird, all of which return just one row for t1_a = 3. Since writing the original query I've realized that this is partly a problem with the INNER JOINS before OUTER JOINS query writing principle (law?) and the query should perhaps be better written as: select t1.*, t2.*, t3.* from t2 join t3 on t2_b = t3_b right join t1 on t1_a = t2_a; which does return just one row for t1_a =3 for all postgres versions I have currently available. Edmund
Edmund Bacon <ebacon@onesystem.com> writes: > Consider the following: > ... > Note that I get 2 rows where t1_a = 3. Are you getting a Merge Right Join plan for that? If so, you're likely getting bit by this bug: 2006-03-17 14:38 tgl * src/: backend/executor/nodeMergejoin.c, test/regress/expected/join.out, test/regress/expected/join_1.out, test/regress/sql/join.sql (REL8_1_STABLE), backend/executor/nodeMergejoin.c, test/regress/expected/join.out, test/regress/expected/join_1.out, test/regress/sql/join.sql: Fix bug introduced into mergejoin logic by performance improvement patch of 2005-05-13. When we find that a new inner tuple can't possibly match any outer tuple (because it contains a NULL), we can't immediately skip the tuple when we are in NEXTINNER state. Doing so can lead to emitting multiple copies of the tuple in FillInner mode, because we may rescan the tuple after returning to a previous marked tuple. Instead, proceed to NEXTOUTER state the same as we used to do. After we've found that there's no need to return to the marked position, we can go to SKIPINNER_ADVANCE state instead of SKIP_TEST when the inner tuple is unmatchable; this preserves the performance improvement. Per bug report from Bruce. I also made a couple of cosmetic code rearrangements and added a regression test for the problem. This'll be fixed in 8.1.4, or if you're in a hurry you can get the patch from our CVS server. regards, tom lane
Tom Lane wrote: >Edmund Bacon <ebacon@onesystem.com> writes: > > >>Consider the following: >>... >>Note that I get 2 rows where t1_a = 3. >> >> > >Are you getting a Merge Right Join plan for that? If so, you're likely >getting bit by this bug: > >2006-03-17 14:38 tgl > > > That's correct. After doing an analyze on t1,t2,t3 the plan changed to Hash Left Join, and I got the expected results. >This'll be fixed in 8.1.4, or if you're in a hurry you can get the patch >from our CVS server. > > regards, tom lane > > No great worries. As remarked in original post, by doing the inner join first, I got the right results. I didn't see anything in the archives about this, so I thought it might be an unknown issue. Admittedly, I didn't scour the archives too thouroghly .. Edmund