Okay, here is an updated patch set with first drafts of the commit
messages. I'm reasonably happy with these patches, but I'll admit my
justification for ripping out the 32-bit optimizations feels a bit flimsy.
I don't get the impression that we are all that concerned about things like
micro-regressions for popcount on 32-bit builds, but OTOH it isn't hard to
imagine someone objecting to these changes.

I ran the bms_num_members() benchmark on a couple of machines I had nearby:

        apple-m3 (neon)          intel-i5-13500T (sse4.2)
     words   HEAD     v8          words   HEAD     v8
         1     40     25              1     26     10
         2     57     51              2     37     29
         4     75     57              4     55     45
         8    105     56              8     88     51
        16    154     59             16    158     68
        32    265     73             32    296    102
        64    545    103             64    577    209
       128   1027    178            128   1212    423
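
For reference, the operation being timed amounts to counting the set bits
across an array of bitmap words. Below is a minimal portable sketch of that
idea. It is only an illustration, not the code in these patches: the
sketch_* names are hypothetical, the 64-bit word size is an assumption, and
it uses a plain bit-clearing loop rather than any NEON/SSE4.2/AVX-512 path.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not PostgreSQL's implementation): count the set
 * bits in one 64-bit word by repeatedly clearing the lowest set bit.
 */
static inline int
sketch_popcount64(uint64_t w)
{
    int         count = 0;

    while (w != 0)
    {
        w &= w - 1;             /* clear the lowest set bit */
        count++;
    }
    return count;
}

/*
 * Hypothetical helper: sum the population counts of nwords consecutive
 * words, which is the shape of work the "words" column scales.
 */
static uint64_t
sketch_popcount_words(const uint64_t *words, size_t nwords)
{
    uint64_t    total = 0;

    for (size_t i = 0; i < nwords; i++)
        total += (uint64_t) sketch_popcount64(words[i]);
    return total;
}
```

A loop like this does a fixed amount of work per word, so its cost grows
roughly linearly with the word count.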

I was going to run it on machines with SVE/AVX-512, but John already tested
the AVX-512 case [0], and I have no reason to believe that we'll see
regressions on machines with SVE.

[0] https://postgr.es/m/CANWCAZbWLX%3DEDd1Bq-8oGK2ZLVNR4m4BkGe%3D288t2V5oLcqeZA%40mail.gmail.com

--
nathan