Thread: BUG #5590: undefined shift behavior

BUG #5590: undefined shift behavior

From
"John Regehr"
Date:
The following bug has been logged online:

Bug reference:      5590
Logged by:          John Regehr
Email address:      regehr@cs.utah.edu
PostgreSQL version: head 8/2/10
Operating system:   OSX
Description:        undefined shift behavior
Details:

During a "make check" the left-shift operator at tsquery_util.c 48:18 is
passed a negative right-hand argument a number of times.

Re: BUG #5590: undefined shift behavior

From
Tom Lane
Date:
"John Regehr" <regehr@cs.utah.edu> writes:
> Bug reference:      5590
> Logged by:          John Regehr
> Email address:      regehr@cs.utah.edu
> PostgreSQL version: head 8/2/10
> Operating system:   OSX
> Description:        undefined shift behavior
> Details:

> During a "make check" the left-shift operator at tsquery_util.c 48:18 is
> passed a negative right-hand argument a number of times.

Hmm.  valcrc is declared as signed int32, so depending on what your
compiler thinks the semantics of % is, this clearly can potentially
happen.  I notice the same problem in makeTSQuerySign() in tsquery_op.c.

The fix is presumably to cast the valcrc value to unsigned int before
executing %.  However, I'm a bit worried about whether this could change
the results, and if it did whether that would invalidate any on-disk
data structures.  Oleg, Teodor, do either TSQuerySign or QTNode.sign
ever get to disk?

John: how did you detect this?

            regards, tom lane

Re: BUG #5590: undefined shift behavior

From
John Regehr
Date:
Hi Tom,

One of my students has hacked Clang to detect integer undefined
behaviors in C, like this shift problem or signed overflows.  This was
the only problem that came up during a "make check" of a postgresql with
this checking turned on, which is pretty cool.

I'd expect to be able to find more problems if I could get hold of a
good fuzz tester for postgresql, or at least some much larger test
inputs. Are there any of these you folks would suggest that I use?

Thanks,

John


On 08/02/2010 09:06 AM, Tom Lane wrote:
> "John Regehr" <regehr@cs.utah.edu> writes:
>> Bug reference:      5590
>> Logged by:          John Regehr
>> Email address:      regehr@cs.utah.edu
>> PostgreSQL version: head 8/2/10
>> Operating system:   OSX
>> Description:        undefined shift behavior
>> Details:
>
>> During a "make check" the left-shift operator at tsquery_util.c 48:18 is
>> passed a negative right-hand argument a number of times.
>
> Hmm.  valcrc is declared as signed int32, so depending on what your
> compiler thinks the semantics of % is, this clearly can potentially
> happen.  I notice the same problem in makeTSQuerySign() in tsquery_op.c.
>
> The fix is presumably to cast the valcrc value to unsigned int before
> executing %.  However, I'm a bit worried about whether this could change
> the results, and if it did whether that would invalidate any on-disk
> data structures.  Oleg, Teodor, do either TSQuerySign or QTNode.sign
> ever get to disk?
>
> John: how did you detect this?
>
>             regards, tom lane
>

Re: BUG #5590: undefined shift behavior

From
Tom Lane
Date:
John Regehr <regehr@cs.utah.edu> writes:
> On 08/02/2010 09:06 AM, Tom Lane wrote:
>> John: how did you detect this?

> One of my students has hacked Clang to detect integer undefined
> behaviors in C, like this shift problem or signed overflows.

Cool.

> This was
> the only problem that came up during a "make check" of a postgresql with
> this checking turned on, which is pretty cool.

Hrm, I'd have expected you to see a few integer overflows during the
regression tests --- we do test that the overflow checks in places
like int4pl work.  You might be interested in this concurrent thread:
http://archives.postgresql.org/pgsql-hackers/2010-08/msg00024.php
particularly the comments about overflow.

> I'd expect to be able to find more problems if I could get hold of a
> good fuzz tester for postgresql, or at least some much larger test
> inputs. Are there any of these you folks would suggest that I use?

Yeah, the PG regression tests aren't amazingly good coverage-wise
(although running the contrib tests as well as core helps --- did you
do that?).  I'm afraid I haven't got a good suggestion for you.

            regards, tom lane

Re: BUG #5590: undefined shift behavior

From
John Regehr
Date:
> Hrm, I'd have expected you to see a few integer overflows during the
> regression tests --- we do test that the overflow checks in places
> like int4pl work.

I saw no signed overflows.  Our patch still has some rough edges, but
this part is pretty well tested.

Perhaps the int4pl checks fire before the overflowing operation can be
performed, stopping them from happening?

Anyway when I get a bit of time I'll stub out the checks and make sure
that I see some checks firing.

>  You might be interested in this concurrent thread:
> http://archives.postgresql.org/pgsql-hackers/2010-08/msg00024.php
> particularly the comments about overflow.

Thanks.  I've been corresponding with Neil about this a bit.

Nice to hear that Clang has -fwrapv now, I hadn't known about that.

> Yeah, the PG regression tests aren't amazingly good coverage-wise
> (although running the contrib tests as well as core helps --- did you
> do that?).  I'm afraid I haven't got a good suggestion for you.

I haven't tried the contrib tests, I'll do that, thanks.

Sooner or later we'll push our patch into LLVM and then anyone testing
your code can help out.

Just randomly: it's trivial to compile postgresql using clang on OSX.
However, I cannot get it to work on Linux with or without Peter
Eisentraut's patches.  It's no big deal since I have a Mac sitting
around but I could get stuff done faster on Linux!

John

Re: BUG #5590: undefined shift behavior

From
John Regehr
Date:
Tom, on the list you said:

"I would be ecstatic if we could write
int4pl like this:

    if (sum_overflows(arg1, arg2))
        elog(ERROR, "overflow");
    else
        return arg1 + arg2;
"

This is effectively what our clang patch does (automatically, for all
integer operations).

Right now our stuff isn't very efficient (~20% slowdown in SPEC INT) but
LLVM does support intrinsics for detecting overflows using processor
flags.  Currently we cannot use these because they're buggy but we're
working to get them fixed.  I'd expect our integer-checked binaries to
be pretty efficient once this is all working.

John





On 8/2/2010 10:16 AM, Tom Lane wrote:
> John Regehr<regehr@cs.utah.edu>  writes:
>> On 08/02/2010 09:06 AM, Tom Lane wrote:
>>> John: how did you detect this?
>
>> One of my students has hacked Clang to detect integer undefined
>> behaviors in C, like this shift problem or signed overflows.
>
> Cool.
>
>> This was
>> the only problem that came up during a "make check" of a postgresql with
>> this checking turned on, which is pretty cool.
>
> Hrm, I'd have expected you to see a few integer overflows during the
> regression tests --- we do test that the overflow checks in places
> like int4pl work.  You might be interested in this concurrent thread:
> http://archives.postgresql.org/pgsql-hackers/2010-08/msg00024.php
> particularly the comments about overflow.
>
>> I'd expect to be able to find more problems if I could get hold of a
>> good fuzz tester for postgresql, or at least some much larger test
>> inputs. Are there any of these you folks would suggest that I use?
>
> Yeah, the PG regression tests aren't amazingly good coverage-wise
> (although running the contrib tests as well as core helps --- did you
> do that?).  I'm afraid I haven't got a good suggestion for you.
>
>             regards, tom lane
>

Re: BUG #5590: undefined shift behavior

From
John Regehr
Date:
Aha-- the -fwrapv flag (which I had though was a nop) screws up our
checks.  Another rough edge to fix.  Removing this flag caused us to
find a bunch of integer overflows.  I'll start reporting them later today.

John

Re: BUG #5590: undefined shift behavior

From
Tom Lane
Date:
I wrote:
> "John Regehr" <regehr@cs.utah.edu> writes:
>> During a "make check" the left-shift operator at tsquery_util.c 48:18 is
>> passed a negative right-hand argument a number of times.

> Hmm.  valcrc is declared as signed int32, so depending on what your
> compiler thinks the semantics of % is, this clearly can potentially
> happen.  I notice the same problem in makeTSQuerySign() in tsquery_op.c.

On further investigation, the reason makeTSQuerySign didn't show up in
John's test is that it's coded like this:

            sign |= ((TSQuerySign) 1) << (ptr->qoperand.valcrc % TSQS_SIGLEN);

where TSQS_SIGLEN is a macro that involves sizeof(TSQuerySign), and
therefore will produce an unsigned result (on most machines anyway).
So the % is done in unsigned arithmetic and there's no problem.
I think it'd still be a good idea to cast valcrc to unsigned int
explicitly here, but this is just for future-proofing not because
there's a real bug ATM.  Which is a good thing, because TSQuerySign
does go to disk so we'd have a backwards-compatibility mess if we
had to change this computation.

The problem in QT2QTN() is real, on the other hand, because it's coded
as

        node->sign = 1 << (in->qoperand.valcrc % 32);

so the modulo is signed.  Some investigation shows that Intel machines
seem to give the right answer anyway, which has masked the problem for
most people.  But I find that on PPC, the result of the shift is *zero*
if valcrc is negative --- haven't checked, but I wonder if that hardware
interprets a negative left shift as a right shift.  So on PPC the "sign"
comes out as zero about half the time.

As best I can tell at the moment, this only results in some inefficiency
(because the Bloom filters don't work as intended) in some not-too-
heavily-used operations; and the QTNode structures aren't written to
disk so there's no compatibility issue from fixing the computation.

To sum up: we should insert these casts in HEAD but at the moment I
don't see a strong reason to back-patch, nor any indication that we'd
be creating a need for a forced initdb.

            regards, tom lane