Thread: Overflow hazard in pgbench
moonjelly just reported an interesting failure [1]. It seems that with the latest bleeding-edge gcc, this code is misoptimized: /* check random range */ if (imin > imax) { pg_log_error("empty range given to random"); return false; } else if (imax - imin < 0 || (imax - imin) + 1 < 0) { /* prevent int overflows in random functions */ pg_log_error("random range is too large"); return false; } such that the second if-test doesn't fire. Now, according to the C99 spec this code is broken, because the compiler is allowed to assume that signed integer overflow doesn't happen, whereupon the second if-block is provably unreachable. The failure still represents a gcc bug, because we're using -fwrapv which should disable that assumption. However, not all compilers have that switch, so it'd be better to code this in a spec-compliant way. I suggest applying the attached in branches that have the required functions. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=moonjelly&dt=2021-06-26%2007%3A03%3A17 regards, tom lane diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index e61055b6b7..c4023bfa27 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -2450,7 +2450,8 @@ evalStandardFunc(CState *st, case PGBENCH_RANDOM_ZIPFIAN: { int64 imin, - imax; + imax, + delta; Assert(nargs >= 2); @@ -2464,7 +2465,8 @@ evalStandardFunc(CState *st, pg_log_error("empty range given to random"); return false; } - else if (imax - imin < 0 || (imax - imin) + 1 < 0) + else if (pg_sub_s64_overflow(imax, imin, &delta) || + pg_add_s64_overflow(delta, 1, &delta)) { /* prevent int overflows in random functions */ pg_log_error("random range is too large");
I wrote: > ... according to the C99 > spec this code is broken, because the compiler is allowed to assume > that signed integer overflow doesn't happen, whereupon the second > if-block is provably unreachable. The failure still represents a gcc > bug, because we're using -fwrapv which should disable that assumption. > However, not all compilers have that switch, so it'd be better to code > this in a spec-compliant way. BTW, for grins I tried building today's HEAD without -fwrapv, using gcc version 11.1.1 20210531 (Red Hat 11.1.1-3) (GCC) which is the newest version I have at hand. Not very surprisingly, that reproduced the failure shown on moonjelly. However, after adding the patch I proposed, "make check-world" passed! I was not expecting that result; I supposed we still had lots of lurking assumptions of traditional C overflow handling. I'm not in any hurry to remove -fwrapv, because (a) this result doesn't show that we have no such assumptions, only that they must be lurking in darker, poorly-tested corners, and (b) I'm not aware of any reason to think that removing -fwrapv would provide benefits worth taking any risks for. But we may be closer to being able to do without that switch than I thought. regards, tom lane
Hello Tom, > moonjelly just reported an interesting failure [1]. I noticed. I was planning to have a look at it, thanks for digging! > It seems that with the latest bleeding-edge gcc, this code is > misoptimized: > else if (imax - imin < 0 || (imax - imin) + 1 < 0) > { > /* prevent int overflows in random functions */ > pg_log_error("random range is too large"); > return false; > } > > such that the second if-test doesn't fire. Now, according to the C99 > spec this code is broken, because the compiler is allowed to assume > that signed integer overflow doesn't happen, Hmmm, so it is not possible to detect these with standard arithmetic as done by this code. Note that the code was written in 2016, ISTM pre C99 Postgres. I'm unsure about what a C compiler could assume on the previous standard wrt integer arithmetic. > whereupon the second if-block is provably unreachable. Indeed. > The failure still represents a gcc bug, because we're using -fwrapv > which should disable that assumption. Ok, I'll report it. I also see a good point with pgbench tests exercising edge cases. > However, not all compilers have that switch, so it'd be better to code > this in a spec-compliant way. Ok. > I suggest applying the attached in branches that have the required > functions. The latest gcc, recompiled yesterday, is still wrong, as shown by moonjelly current status. The proposed patch does fix the issue in pgbench TAP test. I'd suggest to add unlikely() on all these conditions, as already done elsewhere. See attached version. I confirm that check-world passed with gcc head and its broken -fwrapv. I also recompiled after removing manually -fwrapv: Make check, pgbench TAP tests and check-world all passed. I'm not sure that edge case are well enough tested in postgres to be sure that all is ok just from these runs though. -- Fabien.
Attachment
Fabien COELHO <coelho@cri.ensmp.fr> writes: >> I suggest applying the attached in branches that have the required >> functions. > The proposed patch does fix the issue in pgbench TAP test. I'd suggest to > add unlikely() on all these conditions, as already done elsewhere. See > attached version. Done that way, though I'm skeptical that it makes any measurable difference. > I also recompiled after removing manually -fwrapv: Make check, pgbench TAP > tests and check-world all passed. I'm not sure that edge case are well > enough tested in postgres to be sure that all is ok just from these runs > though. Yeah, I'm afraid that in most places it'd take a specifically-designed test case to expose a problem, if there is one. regards, tom lane
>> The failure still represents a gcc bug, because we're using -fwrapv which >> should disable that assumption. > > Ok, I'll report it. Done at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101254 -- Fabien.
Hello Tom, >>> The failure still represents a gcc bug, because we're using -fwrapv which >>> should disable that assumption. >> >> Ok, I'll report it. > > Done at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101254 Fixed at r12-1916-ga96d8d67d0073a7031c0712bc3fb7759417b2125 https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a96d8d67d0073a7031c0712bc3fb7759417b2125 Just under 10 hours from the bug report… -- Fabien.
Hi, On 2021-06-27 16:21:46 -0400, Tom Lane wrote: > BTW, for grins I tried building today's HEAD without -fwrapv, using > gcc version 11.1.1 20210531 (Red Hat 11.1.1-3) (GCC) > which is the newest version I have at hand. Not very surprisingly, > that reproduced the failure shown on moonjelly. However, after adding > the patch I proposed, "make check-world" passed! I was not expecting > that result; I supposed we still had lots of lurking assumptions of > traditional C overflow handling. We did fix a lot of them a few years back... > I'm not in any hurry to remove -fwrapv, because (a) this result doesn't > show that we have no such assumptions, only that they must be lurking > in darker, poorly-tested corners, and (b) I'm not aware of any reason > to think that removing -fwrapv would provide benefits worth taking any > risks for. But we may be closer to being able to do without that > switch than I thought. Lack of failures after removing frwapv itself doesn't prove that much - very commonly the compiler won't optimize based on the improved knowledge about value range. Additionally we probably don't exercise all effected places in our tests. ubsan is able to catch all signed overflows. The last time I played around with that, tests still were hitting quite a few cases of overflows. But most not in particularly interesting places (e.g. cash_out, RIGHTMOST_ONE()) but also a few where it might be worth being careful about it in case a compiler disregards -fwrapv or doesn't implement it (e.g. _dorand48()). It might be worth setting up a bf animal with ubsan and enabled overflow checking... Greetings, Andres Freund