Thread: Reliance on undefined behaviour in << operator

Reliance on undefined behaviour in << operator

From
Craig Ringer
Date:
Hi all

Our implementation of << is a direct wrapper around the C operator. It
does not check the right-hand side's value.

 Datum int8shl(PG_FUNCTION_ARGS) {         int64           arg1 = PG_GETARG_INT64(0);         int32           arg2 =
PG_GETARG_INT32(1);
         PG_RETURN_INT64(arg1 << arg2); }

This means that an operation like:
    1::bigint << 65

directly relies on the compiler and platforms' handling of the
undefined shift. On x64 intel gcc linux it does a rotation but that's
not AFAIK guaranteed by anything, and we should probably not be
relying on this or exposing it at the user level.



Pg returns:

test=> SELECT BIGINT '1' << 66;?column?
----------       4
(1 row)

A test program:

#include "stdio.h"
int main(int argc, char * argv[])
{   printf("Result is %ld", 1l << 66);   return 0;
}

returns zero when the compiler constant-folds, but when done at runtime:

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char * argv[])
{   const char * num = "66";   printf("Result is %ld", 1l << atoi(num));   return 0;
}


IMO we should specify the behaviour in this case. Then issue a WARNING
that gets promoted to an ERROR in a few versions.

Consideration of << with a negative right-operand, and of
out-of-bounds >>, is probably also needed.

Thoughts?

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Reliance on undefined behaviour in << operator

From
Kyotaro HORIGUCHI
Date:
Hello, I don't think so many people have used shift operators
with too-large or negative shift amount relying on the
undetermined behavior.

But if explicit definition is required, I prefer the result of a
shift operation with too-large shift mount is simplly zero. And
shift left with negative shift amount should do right
shift. Addition to that, no error nor warning won't be needed.

Like this,
  Datum  int8shl(PG_FUNCTION_ARGS)  {          int64           arg1 = PG_GETARG_INT64(0);          int32           arg2
=PG_GETARG_INT32(1);
 
          if (arg2 > 63 || arg2 < -63) PG_RETURN_INT64(0L);          if (arg2 < 0)               PG_RETURN_INT64(arg1
>>(-arg2));
 
          PG_RETURN_INT64(arg1 << arg2);  }

The obvious problem on this is the lack of compatibility with
existing behavior:(

Thoughts? Opinions?

regards,


At Wed, 16 Sep 2015 15:16:27 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in
<CAMsr+YE+0KJuOJfbB2nLVfU+14R50Yi90e_8DewLV9jX+ro1zg@mail.gmail.com>
> Hi all
> 
> Our implementation of << is a direct wrapper around the C operator. It
> does not check the right-hand side's value.
> 
> 
>   Datum
>   int8shl(PG_FUNCTION_ARGS)
>   {
>           int64           arg1 = PG_GETARG_INT64(0);
>           int32           arg2 = PG_GETARG_INT32(1);
> 
>           PG_RETURN_INT64(arg1 << arg2);
>   }
> 
> This means that an operation like:
> 
>      1::bigint << 65
> 
> directly relies on the compiler and platforms' handling of the
> undefined shift. On x64 intel gcc linux it does a rotation but that's
> not AFAIK guaranteed by anything, and we should probably not be
> relying on this or exposing it at the user level.
> 
> 
> 
> Pg returns:
> 
> test=> SELECT BIGINT '1' << 66;
>  ?column?
> ----------
>         4
> (1 row)
> 
> A test program:
> 
> #include "stdio.h"
> int main(int argc, char * argv[])
> {
>     printf("Result is %ld", 1l << 66);
>     return 0;
> }
> 
> returns zero when the compiler constant-folds, but when done at runtime:
> 
> #include <stdio.h>
> #include <stdlib.h>
> int main(int argc, char * argv[])
> {
>     const char * num = "66";
>     printf("Result is %ld", 1l << atoi(num));
>     return 0;
> }
> 
> 
> IMO we should specify the behaviour in this case. Then issue a WARNING
> that gets promoted to an ERROR in a few versions.
> 
> Consideration of << with a negative right-operand, and of
> out-of-bounds >>, is probably also needed.
> 
> Thoughts?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Reliance on undefined behaviour in << operator

From
Robert Haas
Date:
On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Our implementation of << is a direct wrapper around the C operator. It
> does not check the right-hand side's value.
>
>   Datum
>   int8shl(PG_FUNCTION_ARGS)
>   {
>           int64           arg1 = PG_GETARG_INT64(0);
>           int32           arg2 = PG_GETARG_INT32(1);
>
>           PG_RETURN_INT64(arg1 << arg2);
>   }
>
> This means that an operation like:
>
>      1::bigint << 65
>
> directly relies on the compiler and platforms' handling of the
> undefined shift. On x64 intel gcc linux it does a rotation but that's
> not AFAIK guaranteed by anything, and we should probably not be
> relying on this or exposing it at the user level.

I agree.

> Pg returns:
>
> test=> SELECT BIGINT '1' << 66;
>  ?column?
> ----------
>         4
> (1 row)
>
> A test program:
>
> #include "stdio.h"
> int main(int argc, char * argv[])
> {
>     printf("Result is %ld", 1l << 66);
>     return 0;
> }
>
> returns zero when the compiler constant-folds, but when done at runtime:
>
> #include <stdio.h>
> #include <stdlib.h>
> int main(int argc, char * argv[])
> {
>     const char * num = "66";
>     printf("Result is %ld", 1l << atoi(num));
>     return 0;
> }
>
>
> IMO we should specify the behaviour in this case. Then issue a WARNING
> that gets promoted to an ERROR in a few versions.

I disagree.  Such warnings are prone to be really annoying, e.g. by
generating massive log spam.  I'd say that we should either make a
large shift return 0 - which seems like the intuitively right behavior
to me: if you shift in N zeros where N is greater than the the word
size, then you should end up with all zeroes - or throw an error right
away.  I'd vote for the former.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Reliance on undefined behaviour in << operator

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> Our implementation of << is a direct wrapper around the C operator. It
>> does not check the right-hand side's value.
>> ... On x64 intel gcc linux it does a rotation but that's
>> not AFAIK guaranteed by anything, and we should probably not be
>> relying on this or exposing it at the user level.

> I agree.

As far as I'm concerned, what those operators mean is "whatever your
compiler makes them mean".  This is hardly the only place where we expose
platform-dependent behavior --- see also locale dependencies, timezones,
floating point, yadda yadda --- and I do not find it the most compelling
place to start reversing that general approach.
        regards, tom lane



Re: Reliance on undefined behaviour in << operator

From
Robert Haas
Date:
On Wed, Sep 16, 2015 at 3:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> Our implementation of << is a direct wrapper around the C operator. It
>>> does not check the right-hand side's value.
>>> ... On x64 intel gcc linux it does a rotation but that's
>>> not AFAIK guaranteed by anything, and we should probably not be
>>> relying on this or exposing it at the user level.
>
>> I agree.
>
> As far as I'm concerned, what those operators mean is "whatever your
> compiler makes them mean".  This is hardly the only place where we expose
> platform-dependent behavior --- see also locale dependencies, timezones,
> floating point, yadda yadda --- and I do not find it the most compelling
> place to start reversing that general approach.

No, I don't agree, in this case.  You could say that int32 + int32 is
platform-dependent behavior too, when it overflows.  But we don't say
that.  This case seems more like an overflow condition than it does
like platform-dependent behavior that we should just pass through.

Also, I think it's worth noting that, AFAICT, our users %@#! HATE the
places where we expose platform-dependent behavior.  That's why people
keep proposing ICU for collations, and cursing their fate when Red Hat
rolls out a glibc fix that changes some collation's ordering thus
leaving all their indexes "corrupted".

Of course, we're not in a position to eliminate all of those platform
dependencies, because it would require us to integrate with - or
develop - platform-dependent libraries for all of those things.  And
we don't really have the bandwidth for that, or at least it's not
likely the best use of our time.  But all things being equal, I don't
think the fact that we have platform dependencies that are hard to
eliminate means we should keep around the ones that are easy to
eliminate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Reliance on undefined behaviour in << operator

From
Andres Freund
Date:
On 2015-09-16 15:57:04 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> >> Our implementation of << is a direct wrapper around the C operator. It
> >> does not check the right-hand side's value.
> >> ... On x64 intel gcc linux it does a rotation but that's
> >> not AFAIK guaranteed by anything, and we should probably not be
> >> relying on this or exposing it at the user level.
> 
> > I agree.
> 
> As far as I'm concerned, what those operators mean is "whatever your
> compiler makes them mean".

According to C that's undefined behaviour. So in the extreme sense that
could mean that the instruction could trigger a SIGBUS or something.

> This is hardly the only place where we expose
> platform-dependent behavior --- see also locale dependencies, timezones,
> floating point, yadda yadda --- and I do not find it the most compelling
> place to start reversing that general approach.

But in other places We do overflow checks, so I don't think that'd be
reversal of a general approach.

Greetings,

Andres Freund



Re: Reliance on undefined behaviour in << operator

From
"ktm@rice.edu"
Date:
On Wed, Sep 16, 2015 at 03:57:04PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> >> Our implementation of << is a direct wrapper around the C operator. It
> >> does not check the right-hand side's value.
> >> ... On x64 intel gcc linux it does a rotation but that's
> >> not AFAIK guaranteed by anything, and we should probably not be
> >> relying on this or exposing it at the user level.
> 
> > I agree.
> 
> As far as I'm concerned, what those operators mean is "whatever your
> compiler makes them mean".  This is hardly the only place where we expose
> platform-dependent behavior --- see also locale dependencies, timezones,
> floating point, yadda yadda --- and I do not find it the most compelling
> place to start reversing that general approach.
> 
>             regards, tom lane
> 
+1

I tend to agree. Unless the behavior is mandated by the SQL standard, I have
always expected the behavior of those apps to follow that defined by the
compiler.

Regards,
Ken