Thread: Overflow hazard in pgbench

Overflow hazard in pgbench

From
Tom Lane
Date:
moonjelly just reported an interesting failure [1].  It seems that
with the latest bleeding-edge gcc, this code is misoptimized:

                /* check random range */
                if (imin > imax)
                {
                    pg_log_error("empty range given to random");
                    return false;
                }
                else if (imax - imin < 0 || (imax - imin) + 1 < 0)
                {
                    /* prevent int overflows in random functions */
                    pg_log_error("random range is too large");
                    return false;
                }

such that the second if-test doesn't fire.  Now, according to the C99
spec this code is broken, because the compiler is allowed to assume
that signed integer overflow doesn't happen, whereupon the second
if-block is provably unreachable.  The failure still represents a gcc
bug, because we're using -fwrapv which should disable that assumption.
However, not all compilers have that switch, so it'd be better to code
this in a spec-compliant way.  I suggest applying the attached in
branches that have the required functions.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=moonjelly&dt=2021-06-26%2007%3A03%3A17

            regards, tom lane

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e61055b6b7..c4023bfa27 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2450,7 +2450,8 @@ evalStandardFunc(CState *st,
         case PGBENCH_RANDOM_ZIPFIAN:
             {
                 int64        imin,
-                            imax;
+                            imax,
+                            delta;
 
                 Assert(nargs >= 2);
 
@@ -2464,7 +2465,8 @@ evalStandardFunc(CState *st,
                     pg_log_error("empty range given to random");
                     return false;
                 }
-                else if (imax - imin < 0 || (imax - imin) + 1 < 0)
+                else if (pg_sub_s64_overflow(imax, imin, &delta) ||
+                         pg_add_s64_overflow(delta, 1, &delta))
                 {
                     /* prevent int overflows in random functions */
                     pg_log_error("random range is too large");

Re: Overflow hazard in pgbench

From
Tom Lane
Date:
I wrote:
> ... according to the C99
> spec this code is broken, because the compiler is allowed to assume
> that signed integer overflow doesn't happen, whereupon the second
> if-block is provably unreachable.  The failure still represents a gcc
> bug, because we're using -fwrapv which should disable that assumption.
> However, not all compilers have that switch, so it'd be better to code
> this in a spec-compliant way.

BTW, for grins I tried building today's HEAD without -fwrapv, using
    gcc version 11.1.1 20210531 (Red Hat 11.1.1-3) (GCC) 
which is the newest version I have at hand.  Not very surprisingly,
that reproduced the failure shown on moonjelly.  However, after adding
the patch I proposed, "make check-world" passed!  I was not expecting
that result; I supposed we still had lots of lurking assumptions of
traditional C overflow handling.

I'm not in any hurry to remove -fwrapv, because (a) this result doesn't
show that we have no such assumptions, only that they must be lurking
in darker, poorly-tested corners, and (b) I'm not aware of any reason
to think that removing -fwrapv would provide benefits worth taking any
risks for.  But we may be closer to being able to do without that
switch than I thought.

            regards, tom lane



Re: Overflow hazard in pgbench

From
Fabien COELHO
Date:
Hello Tom,

> moonjelly just reported an interesting failure [1].

I noticed. I was planning to have a look at it, thanks for digging!

> It seems that with the latest bleeding-edge gcc, this code is 
> misoptimized:

>                else if (imax - imin < 0 || (imax - imin) + 1 < 0)
>                {
>                    /* prevent int overflows in random functions */
>                    pg_log_error("random range is too large");
>                    return false;
>                }
>
> such that the second if-test doesn't fire.  Now, according to the C99
> spec this code is broken, because the compiler is allowed to assume
> that signed integer overflow doesn't happen,

Hmmm, so it is not possible to detect these with standard arithmetic as 
done by this code. Note that the code was written in 2016, ISTM pre C99 
Postgres. I'm unsure about what a C compiler could assume on the previous 
standard wrt integer arithmetic.

> whereupon the second if-block is provably unreachable.

Indeed.

> The failure still represents a gcc bug, because we're using -fwrapv 
> which should disable that assumption.

Ok, I'll report it.

I also see a good point with pgbench tests exercising edge cases.

> However, not all compilers have that switch, so it'd be better to code
> this in a spec-compliant way.

Ok.

> I suggest applying the attached in branches that have the required 
> functions.

The latest gcc, recompiled yesterday, is still wrong, as shown by 
moonjelly current status.

The proposed patch does fix the issue in pgbench TAP test. I'd suggest to 
add unlikely() on all these conditions, as already done elsewhere. See 
attached version.

I confirm that check-world passed with gcc head and its broken -fwrapv.

I also recompiled after removing manually -fwrapv: Make check, pgbench TAP 
tests and check-world all passed. I'm not sure that edge case are well 
enough tested in postgres to be sure that all is ok just from these runs 
though.

-- 
Fabien.
Attachment

Re: Overflow hazard in pgbench

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> I suggest applying the attached in branches that have the required 
>> functions.

> The proposed patch does fix the issue in pgbench TAP test. I'd suggest to 
> add unlikely() on all these conditions, as already done elsewhere. See 
> attached version.

Done that way, though I'm skeptical that it makes any measurable
difference.

> I also recompiled after removing manually -fwrapv: Make check, pgbench TAP 
> tests and check-world all passed. I'm not sure that edge case are well 
> enough tested in postgres to be sure that all is ok just from these runs 
> though.

Yeah, I'm afraid that in most places it'd take a specifically-designed
test case to expose a problem, if there is one.

            regards, tom lane



Re: Overflow hazard in pgbench

From
Fabien COELHO
Date:
>> The failure still represents a gcc bug, because we're using -fwrapv which 
>> should disable that assumption.
>
> Ok, I'll report it.

Done at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101254

-- 
Fabien.



Re: Overflow hazard in pgbench

From
Fabien COELHO
Date:
Hello Tom,

>>> The failure still represents a gcc bug, because we're using -fwrapv which 
>>> should disable that assumption.
>> 
>> Ok, I'll report it.
>
> Done at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101254

Fixed at r12-1916-ga96d8d67d0073a7031c0712bc3fb7759417b2125

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a96d8d67d0073a7031c0712bc3fb7759417b2125

Just under 10 hours from the bug report…

-- 
Fabien.

Re: Overflow hazard in pgbench

From
Andres Freund
Date:
Hi,

On 2021-06-27 16:21:46 -0400, Tom Lane wrote:
> BTW, for grins I tried building today's HEAD without -fwrapv, using
>     gcc version 11.1.1 20210531 (Red Hat 11.1.1-3) (GCC)
> which is the newest version I have at hand.  Not very surprisingly,
> that reproduced the failure shown on moonjelly.  However, after adding
> the patch I proposed, "make check-world" passed!  I was not expecting
> that result; I supposed we still had lots of lurking assumptions of
> traditional C overflow handling.

We did fix a lot of them a few years back...


> I'm not in any hurry to remove -fwrapv, because (a) this result doesn't
> show that we have no such assumptions, only that they must be lurking
> in darker, poorly-tested corners, and (b) I'm not aware of any reason
> to think that removing -fwrapv would provide benefits worth taking any
> risks for.  But we may be closer to being able to do without that
> switch than I thought.

Lack of failures after removing frwapv itself doesn't prove that much -
very commonly the compiler won't optimize based on the improved
knowledge about value range. Additionally we probably don't exercise all
effected places in our tests.

ubsan is able to catch all signed overflows. The last time I played
around with that, tests still were hitting quite a few cases of
overflows. But most not in particularly interesting places
(e.g. cash_out, RIGHTMOST_ONE()) but also a few where it might be worth
being careful about it in case a compiler disregards -fwrapv or doesn't
implement it (e.g. _dorand48()).

It might be worth setting up a bf animal with ubsan and enabled overflow
checking...

Greetings,

Andres Freund