> On Apr 12, 2026, at 21:17, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 3 Apr 2026 at 15:08, David Rowley <dgrowleyml@gmail.com> wrote:
>> IMO, if we can make bitmapset.c work with INT_MAX members and get a
>> performance increase, then we should do it.
>
> Re-thinking this after a week's holiday, it seems fine to use an
> unsigned 32-bit int rather than a 64-bit int to fix this bug. I'd
> previously been uncertain if there were any guarantees in C to what
> (unsigned int) -1 would return, but going by [1] at 6.3.1.3, it says:
>
> "Otherwise, if the new type is unsigned, the value is converted by
> repeatedly adding or subtracting one more than the maximum value that
> can be represented in the new type until the value is in the range of
> the new type."
>
> So, it seems even on one's complement that -1 as an unsigned int will
> be UINT_MAX. When we add 1 to UINT_MAX, we're guaranteed to get 0, as
> it's unsigned maths and overflows are going to result in a value
> modulus the max value for the type.
>
> That leads me to the attached v2 patch. Compiler Explorer link showing
> the assembly at [2].
>
> Testing the performance, it's better than master, so I got rid of the
> size_t wordnum stuff. We're post-freeze now, so fiddling with other
> optimisations seems a bit off the table as there appears to be no
> performance regression to compensate for now, per:
>
> drowley@amd3990x:~$ gcc test_bms_next3.c -O2 -o test_bms_next3 &&
> ./test_bms_next3
> Benchmarking 100000000 bms_next_member iterations...
> master: 1.18330 seconds
> Patched: 1.05493 seconds
>
> Benchmarking 100000000 bms_prev_member iterations...
> master: 2.94522 seconds
> Patched: 1.86130 seconds
>
> drowley@amd3990x:~$ clang test_bms_next3.c -O2 -o test_bms_next3 &&
> ./test_bms_next3
> Benchmarking 100000000 bms_next_member iterations...
> master: 1.07860 seconds
> Patched: 1.07896 seconds
>
> Benchmarking 100000000 bms_prev_member iterations...
> master: 2.76550 seconds
> Patched: 2.12369 seconds
>
> David
>
> [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
> [2] https://godbolt.org/z/xW96rxd3P
> <test_bms_next3.c><v2-0001-Fix-unlikely-overflow-bug-in-bms_next_member.patch>
I just tried test_bms_next3 on my MacBook, looks like patched bms_prev_member is much faster there. I ran about 10
times,the results were consistent.
```
chaol@ChaodeMacBook-Air test % gcc test_bms_next3.c -O2 -o test_bms_next3
test_bms_next3.c:281:18: warning: format specifies type 'long' but the argument has type 'int64' (aka 'long long')
[-Wformat]
281 | printf("%ld\n", count);
| ~~~ ^~~~~
| %lld
1 warning generated.
chaol@ChaodeMacBook-Air test % ./test_bms_next3
Benchmarking 100000000 bms_next_member iterations...
master: 0.49287 seconds
Patched: 0.47315 seconds
Benchmarking 100000000 bms_prev_member iterations...
master: 1.04644 seconds
Patched: 0.58053 seconds
2800000000
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/