Re: BUG #14169: Incorrect merge join result in 9.5 - Mailing list pgsql-bugs

From Kevin Grittner
Subject Re: BUG #14169: Incorrect merge join result in 9.5
Msg-id CACjxUsMH8TReUobME3S2BgqN9qifkSo18j8jfAQARn4GRcq9pg@mail.gmail.com
In response to Re: BUG #14169: Incorrect merge join result in 9.5  (Kevin Grittner <kgrittn@gmail.com>)
Responses Re: BUG #14169: Incorrect merge join result in 9.5  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Wed, Jun 1, 2016 at 11:15 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Wed, Jun 1, 2016 at 11:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> yancya@upec.jp writes:
>>> TRAP: FailedAssertion("!(compareResult < 0)", File: "nodeMergejoin.c", Line: 942)
>>
>> This is not a mergejoin logic bug, because nodeMergejoin.c didn't change
>> significantly between 9.4 and 9.5.  It must be that the input data is not
>> being delivered in the expected order.  I first thought that Peter G's
>> sorting optimizations must be at fault, but if you run either of the
>> mergejoin's subplans in isolation, you get correctly sorted data.  What
>> must be happening, then, is that mergejoin's mark/restore operations are
>> confusing the btree indexscan and causing it to deliver the wrong tuple(s)
>> after a restore.
>>
>> Armed with that conclusion about where the bug probably is, I looked
>> through the git history, and soon found that the crash goes away if
>> I manually revert commit 2ed5b87f96d473962ec5230fd820abfeaccb2069.
>>
>> In short: Kevin, you broke mark/restore.  Please fix.
>
> I'm on it.

Fix pushed.  Basically, I reverted an attempt to optimize repeated
restores to the same page.  I had a rather bad thinko there: I
essentially assumed that a restore to the same page was also a
restore to the same mark, so advancing the mark within a page
triggered the bug.  There's probably room to optimize that with
more refined logic, but for now I just reverted the problem code.
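
To illustrate the class of mistake described above, here is a minimal
toy model in Python.  It is not PostgreSQL code and does not reproduce
the reverted btree logic; the names ToyScan, mark_pos, restore_pos and
the "cached" shortcut are all hypothetical.  It only sketches how a
same-page restore shortcut that remembers the first mark on a page goes
wrong once the mark advances within that page.

```python
class ToyScan:
    """Toy page-at-a-time scan with mark/restore (hypothetical, not PostgreSQL)."""

    def __init__(self, pages, buggy):
        self.pages = pages          # list of pages, each a sorted list of values
        self.buggy = buggy          # enable the flawed same-page shortcut
        self.page_no = 0
        self.item_no = -1           # index of the current item on the current page
        self.mark = None            # (page_no, item_no) of the latest mark
        self.cached = None          # position remembered by the shortcut

    def next(self):
        """Advance to the next item, stepping to later pages as needed."""
        self.item_no += 1
        while (self.page_no < len(self.pages)
               and self.item_no >= len(self.pages[self.page_no])):
            self.page_no += 1
            self.item_no = 0
        if self.page_no >= len(self.pages):
            return None
        return self.pages[self.page_no][self.item_no]

    def current(self):
        return self.pages[self.page_no][self.item_no]

    def mark_pos(self):
        self.mark = (self.page_no, self.item_no)
        # Assumed shortcut: remember only the first mark taken on a page,
        # on the (wrong) theory that later marks on that page are identical.
        if self.cached is None or self.cached[0] != self.page_no:
            self.cached = self.mark

    def restore_pos(self):
        if self.buggy and self.cached[0] == self.page_no:
            # Flawed: same page is treated as same mark, so a stale
            # item index is restored.
            self.page_no, self.item_no = self.cached
        else:
            self.page_no, self.item_no = self.mark


def value_after_second_restore(buggy):
    scan = ToyScan([[1, 2, 3, 4]], buggy)
    scan.next()
    scan.next()            # positioned on value 2
    scan.mark_pos()        # first mark on this page
    scan.next()            # value 3
    scan.restore_pos()     # back to value 2 in both variants
    scan.next()            # value 3
    scan.mark_pos()        # mark advances, still on the same page
    scan.next()            # value 4
    scan.restore_pos()
    return scan.current()
```

With buggy=False the second restore lands on the advanced mark; with
buggy=True it lands on the earlier one, so a consumer such as a merge
join would see a tuple it did not expect after the restore.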

I reduced the test case supplied by the OP to something smaller
which still failed the assertion without the patch, but am having
trouble getting it to run in the regression test environment with a
stable plan.  To deal with 9.6beta1 issues I'm setting that aside
for the moment; once I clear the beta issues I will see if I can
get something commit-worthy into the regression tests for this.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
