On Wed, Jan 10, 2024 at 09:06:08AM +0700, John Naylor wrote:
> If we have say 25 elements, I mean (for SSE2) check the first 16, then
> the last 16. Some will be checked twice, but that's okay.
I finally got around to trying this. 0001 adds this overlapping logic.
0002 is a rebased version of the AVX2 patch (it needed some updates after
commit 9f225e9). And 0003 is a benchmark for test_lfind32(). It runs
pg_lfind32() on an array of the given size 100M times.
I've also attached the results of running this benchmark on my machine at
HEAD, after applying 0001, and after applying both 0001 and 0002. 0001
appears to work pretty well. When there is a small "tail," it regresses a
small amount, but overall, it seems to improve more cases than it harms.
0002 does regress searches on smaller arrays quite a bit, since it
postpones the SIMD optimizations until the arrays are longer. It might be
possible to mitigate by using 2 registers when the "tail" is long enough,
but I have yet to try that.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com