Thread: BUG #1241: returns different result for the same result with differnt plans.

BUG #1241: returns different result for the same result with differnt plans.

From
"PostgreSQL Bugs List"
Date:
The following bug has been logged online:

Bug reference:      1241
Logged by:          Xiaoyu Wang

Email address:      wangxy@cs.brandeis.edu

PostgreSQL version: 7.4.3

Operating system:   Mandrake Linux 9.2 3.3.1-2mdk

Description:        returns different result for the same result with
differnt plans.

Details:

database: TPC-H with scale factor=1.0
query: 13.sql (TPC-H)

select
  c_count,
  count(*) as custdist
from
  (
    select
      c_custkey,
      count(o_orderkey)
    from
      customer left outer join orders on
        c_custkey = o_custkey
        and o_comment not like '%special%requests'
    group by
      c_custkey
  ) as c_orders (c_custkey, c_count)
group by
  c_count
order by
  custdist desc,
  c_count desc;


I ran the query with enable_mergejoin set to on/off, the results are
different. Postgres chose Merge Left Join when enable_mergejoin is on and
Hash Left Join when it is off. I dumped the results to two files, merge.data
and hash.data. Here is the result when I do a diff:

diff merge.data hash.data
3d2
<        0 |    50004
41a41
>        0 |        4
"PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes:
> Description:        returns different result for the same result with
> differnt plans.

> database: TPC-H with scale factor=1.0
> query: 13.sql (TPC-H)

This is not *nearly* enough information to let someone else reproduce
the problem.  (Heck, I can't even tell which answer is wrong.)
Could you boil it down to a self-contained test script?

            regards, tom lane
"PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes:
> Description:        returns different result for the same result with
> differnt plans.

> I ran the query with enable_mergejoin set to on/off, the results are
> different. Postgres chose Merge Left Join when enable_mergejoin is on and
> Hash Left Join when it is off. I dumped the results to two files, merge.data
> and hash.data. Here is the result when I do a diff:

Found it.  This has actually been broken ever since we implemented outer
joins in PG 7.1.  If the join is large enough to need to be broken into
multiple hash batches, and some of the inner batches are completely empty,
the code would skip processing those batches altogether.  Which is fine ...
unless it's a LEFT JOIN :-(.  Here's the patch for the 7.4 branch:

Index: nodeHashjoin.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/executor/nodeHashjoin.c,v
retrieving revision 1.57.2.1
diff -c -r1.57.2.1 nodeHashjoin.c
*** nodeHashjoin.c    25 Nov 2003 19:17:16 -0000    1.57.2.1
--- nodeHashjoin.c    17 Sep 2004 18:23:05 -0000
***************
*** 602,613 ****
      }

      /*
!      * We can skip over any batches that are empty on either side. Release
!      * associated temp files right away.
       */
      while (newbatch <= nbatch &&
!            (innerBatchSize[newbatch - 1] == 0L ||
!             outerBatchSize[newbatch - 1] == 0L))
      {
          BufFileClose(hashtable->innerBatchFile[newbatch - 1]);
          hashtable->innerBatchFile[newbatch - 1] = NULL;
--- 602,615 ----
      }

      /*
!      * Normally we can skip over any batches that are empty on either side
!      * --- but for JOIN_LEFT, can only skip when left side is empty.
!      * Release associated temp files right away.
       */
      while (newbatch <= nbatch &&
!            (outerBatchSize[newbatch - 1] == 0L ||
!             (innerBatchSize[newbatch - 1] == 0L &&
!              hjstate->js.jointype != JOIN_LEFT)))
      {
          BufFileClose(hashtable->innerBatchFile[newbatch - 1]);
          hashtable->innerBatchFile[newbatch - 1] = NULL;



            regards, tom lane
Mark Shewmaker <mark@primefactor.com> writes:
> (BTW, I had sent my first reply and this privately as I'm far from sure
> of myself in these questions, so but feel free to reply to either
> publicly if you want to, or I can re-reply publicly.)

cc'd to pgbugs in case anyone else is wondering the same.

>> On Fri, 2004-09-17 at 15:10, Tom Lane wrote:
>>> Mark Shewmaker <mark@primefactor.com> writes:
>>>> the code would skip processing those batches altogether.  Which is fine...
>>>> unless it's a LEFT JOIN :-(.
>>
>>> Would the same problem then also exist for right outer joins?
>>
>> Yup, if the planner chose to flip it into a left join and apply hash
>> joining (which is altogether likely --- merge join is the only executor
>> join method that supports right join directly, and even then only in
>> restricted cases).

> Okay, so there doesn't need to be any fix for right joins specifically
> then.  (I had seen the "!= JOIN_LEFT" in your patch, which made me
> wonder whether there was a need for something like a "!= (JOIN_LEFT |
> JOIN_RIGHT)" in there.  Obviously I'm pretty clueless as to whether
> that's really necessary of course.)

It's not.  If you look at ExecInitHashJoin you'll see that it rejects
JOIN_RIGHT, should the planner be buggy enough to ask it to do that.

            regards, tom lane