Thread: Reliance on undefined behaviour in << operator
Hi all Our implementation of << is a direct wrapper around the C operator. It does not check the right-hand side's value. Datum int8shl(PG_FUNCTION_ARGS) { int64 arg1 = PG_GETARG_INT64(0); int32 arg2 = PG_GETARG_INT32(1); PG_RETURN_INT64(arg1 << arg2); } This means that an operation like: 1::bigint << 65 directly relies on the compiler and platforms' handling of the undefined shift. On x64 intel gcc linux it does a rotation but that's not AFAIK guaranteed by anything, and we should probably not be relying on this or exposing it at the user level. Pg returns: test=> SELECT BIGINT '1' << 66;?column? ---------- 4 (1 row) A test program: #include "stdio.h" int main(int argc, char * argv[]) { printf("Result is %ld", 1l << 66); return 0; } returns zero when the compiler constant-folds, but when done at runtime: #include <stdio.h> #include <stdlib.h> int main(int argc, char * argv[]) { const char * num = "66"; printf("Result is %ld", 1l << atoi(num)); return 0; } IMO we should specify the behaviour in this case. Then issue a WARNING that gets promoted to an ERROR in a few versions. Consideration of << with a negative right-operand, and of out-of-bounds >>, is probably also needed. Thoughts? -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hello, I don't think so many people have used shift operators with too-large or negative shift amount relying on the undetermined behavior. But if explicit definition is required, I prefer the result of a shift operation with too-large shift mount is simplly zero. And shift left with negative shift amount should do right shift. Addition to that, no error nor warning won't be needed. Like this, Datum int8shl(PG_FUNCTION_ARGS) { int64 arg1 = PG_GETARG_INT64(0); int32 arg2 =PG_GETARG_INT32(1); if (arg2 > 63 || arg2 < -63) PG_RETURN_INT64(0L); if (arg2 < 0) PG_RETURN_INT64(arg1 >>(-arg2)); PG_RETURN_INT64(arg1 << arg2); } The obvious problem on this is the lack of compatibility with existing behavior:( Thoughts? Opinions? regards, At Wed, 16 Sep 2015 15:16:27 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in <CAMsr+YE+0KJuOJfbB2nLVfU+14R50Yi90e_8DewLV9jX+ro1zg@mail.gmail.com> > Hi all > > Our implementation of << is a direct wrapper around the C operator. It > does not check the right-hand side's value. > > > Datum > int8shl(PG_FUNCTION_ARGS) > { > int64 arg1 = PG_GETARG_INT64(0); > int32 arg2 = PG_GETARG_INT32(1); > > PG_RETURN_INT64(arg1 << arg2); > } > > This means that an operation like: > > 1::bigint << 65 > > directly relies on the compiler and platforms' handling of the > undefined shift. On x64 intel gcc linux it does a rotation but that's > not AFAIK guaranteed by anything, and we should probably not be > relying on this or exposing it at the user level. > > > > Pg returns: > > test=> SELECT BIGINT '1' << 66; > ?column? > ---------- > 4 > (1 row) > > A test program: > > #include "stdio.h" > int main(int argc, char * argv[]) > { > printf("Result is %ld", 1l << 66); > return 0; > } > > returns zero when the compiler constant-folds, but when done at runtime: > > #include <stdio.h> > #include <stdlib.h> > int main(int argc, char * argv[]) > { > const char * num = "66"; > printf("Result is %ld", 1l << atoi(num)); > return 0; > } > > > IMO we should specify the behaviour in this case. Then issue a WARNING > that gets promoted to an ERROR in a few versions. > > Consideration of << with a negative right-operand, and of > out-of-bounds >>, is probably also needed. > > Thoughts? -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > Our implementation of << is a direct wrapper around the C operator. It > does not check the right-hand side's value. > > Datum > int8shl(PG_FUNCTION_ARGS) > { > int64 arg1 = PG_GETARG_INT64(0); > int32 arg2 = PG_GETARG_INT32(1); > > PG_RETURN_INT64(arg1 << arg2); > } > > This means that an operation like: > > 1::bigint << 65 > > directly relies on the compiler and platforms' handling of the > undefined shift. On x64 intel gcc linux it does a rotation but that's > not AFAIK guaranteed by anything, and we should probably not be > relying on this or exposing it at the user level. I agree. > Pg returns: > > test=> SELECT BIGINT '1' << 66; > ?column? > ---------- > 4 > (1 row) > > A test program: > > #include "stdio.h" > int main(int argc, char * argv[]) > { > printf("Result is %ld", 1l << 66); > return 0; > } > > returns zero when the compiler constant-folds, but when done at runtime: > > #include <stdio.h> > #include <stdlib.h> > int main(int argc, char * argv[]) > { > const char * num = "66"; > printf("Result is %ld", 1l << atoi(num)); > return 0; > } > > > IMO we should specify the behaviour in this case. Then issue a WARNING > that gets promoted to an ERROR in a few versions. I disagree. Such warnings are prone to be really annoying, e.g. by generating massive log spam. I'd say that we should either make a large shift return 0 - which seems like the intuitively right behavior to me: if you shift in N zeros where N is greater than the the word size, then you should end up with all zeroes - or throw an error right away. I'd vote for the former. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote: >> Our implementation of << is a direct wrapper around the C operator. It >> does not check the right-hand side's value. >> ... On x64 intel gcc linux it does a rotation but that's >> not AFAIK guaranteed by anything, and we should probably not be >> relying on this or exposing it at the user level. > I agree. As far as I'm concerned, what those operators mean is "whatever your compiler makes them mean". This is hardly the only place where we expose platform-dependent behavior --- see also locale dependencies, timezones, floating point, yadda yadda --- and I do not find it the most compelling place to start reversing that general approach. regards, tom lane
On Wed, Sep 16, 2015 at 3:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote: >>> Our implementation of << is a direct wrapper around the C operator. It >>> does not check the right-hand side's value. >>> ... On x64 intel gcc linux it does a rotation but that's >>> not AFAIK guaranteed by anything, and we should probably not be >>> relying on this or exposing it at the user level. > >> I agree. > > As far as I'm concerned, what those operators mean is "whatever your > compiler makes them mean". This is hardly the only place where we expose > platform-dependent behavior --- see also locale dependencies, timezones, > floating point, yadda yadda --- and I do not find it the most compelling > place to start reversing that general approach. No, I don't agree, in this case. You could say that int32 + int32 is platform-dependent behavior too, when it overflows. But we don't say that. This case seems more like an overflow condition than it does like platform-dependent behavior that we should just pass through. Also, I think it's worth noting that, AFAICT, our users %@#! HATE the places where we expose platform-dependent behavior. That's why people keep proposing ICU for collations, and cursing their fate when Red Hat rolls out a glibc fix that changes some collation's ordering thus leaving all their indexes "corrupted". Of course, we're not in a position to eliminate all of those platform dependencies, because it would require us to integrate with - or develop - platform-dependent libraries for all of those things. And we don't really have the bandwidth for that, or at least it's not likely the best use of our time. But all things being equal, I don't think the fact that we have platform dependencies that are hard to eliminate means we should keep around the ones that are easy to eliminate. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015-09-16 15:57:04 -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > >> Our implementation of << is a direct wrapper around the C operator. It > >> does not check the right-hand side's value. > >> ... On x64 intel gcc linux it does a rotation but that's > >> not AFAIK guaranteed by anything, and we should probably not be > >> relying on this or exposing it at the user level. > > > I agree. > > As far as I'm concerned, what those operators mean is "whatever your > compiler makes them mean". According to C that's undefined behaviour. So in the extreme sense that could mean that the instruction could trigger a SIGBUS or something. > This is hardly the only place where we expose > platform-dependent behavior --- see also locale dependencies, timezones, > floating point, yadda yadda --- and I do not find it the most compelling > place to start reversing that general approach. But in other places We do overflow checks, so I don't think that'd be reversal of a general approach. Greetings, Andres Freund
On Wed, Sep 16, 2015 at 03:57:04PM -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > >> Our implementation of << is a direct wrapper around the C operator. It > >> does not check the right-hand side's value. > >> ... On x64 intel gcc linux it does a rotation but that's > >> not AFAIK guaranteed by anything, and we should probably not be > >> relying on this or exposing it at the user level. > > > I agree. > > As far as I'm concerned, what those operators mean is "whatever your > compiler makes them mean". This is hardly the only place where we expose > platform-dependent behavior --- see also locale dependencies, timezones, > floating point, yadda yadda --- and I do not find it the most compelling > place to start reversing that general approach. > > regards, tom lane > +1 I tend to agree. Unless the behavior is mandated by the SQL standard, I have always expected the behavior of those apps to follow that defined by the compiler. Regards, Ken