On Wed, Nov 22, 2023 at 12:49:35PM -0600, Nathan Bossart wrote:
> On Wed, Nov 22, 2023 at 02:54:13PM +0200, Ants Aasma wrote:
>> For reference, executing the page checksum 10M times on a AMD 3900X CPU:
>>
>> clang-14 -O2 4.292s (17.8 GiB/s)
>> clang-14 -O2 -msse4.1 2.859s (26.7 GiB/s)
>> clang-14 -O2 -msse4.1 -mavx2 1.378s (55.4 GiB/s)
>
> Nice. I've noticed similar improvements with AVX2 intrinsics in simd.h.

I've alluded to this a few times now, so I figured I'd park the patch and
preliminary benchmarks in a new thread while we iron out how to support
newer instructions (see discussion here [0]).

Using the same benchmark as we did for the SSE2 linear searches in
XidInMVCCSnapshot() (commit 37a6e5d) [1] [2], I see the following:

     writers  sse2  avx2    %
         256  1195  1188   -1
         512   928  1054  +14
        1024   633   716  +13
        2048   332   420  +27
        4096   162   203  +25
        8192   162   182  +12

It's been a while since I ran these benchmarks, but I vaguely recall also
seeing something like a 50% improvement for a dedicated pg_lfind32()
benchmark on long arrays.

As is, the patch likely won't do anything unless you add -mavx2 or
-march=native to your CFLAGS. I don't intend for this patch to be
seriously considered until we have better support for detecting/compiling
AVX2 instructions and a buildfarm machine that uses them.
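
To illustrate why the compiler flags matter, here is a minimal sketch of a
compile-time-guarded AVX2 linear search. This is not the code from the
patch: lfind32_sketch() is a hypothetical name, and the sketch only assumes
that the compiler defines __AVX2__ when it targets AVX2 (as gcc and clang do
with -mavx2, or with -march=native on a CPU that has AVX2). Without that
macro, the vector branch is compiled out and only the scalar loop remains.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __AVX2__
#include <immintrin.h>
#endif

/*
 * Hypothetical helper (not the patch's code): returns true if "key"
 * appears in the first "nelem" entries of "base".
 */
bool
lfind32_sketch(uint32_t key, const uint32_t *base, size_t nelem)
{
	size_t		i = 0;

#ifdef __AVX2__
	/* This branch exists only when the build targets AVX2. */
	const __m256i keys = _mm256_set1_epi32((int) key);

	/* Compare eight 32-bit elements per iteration. */
	for (; i + 8 <= nelem; i += 8)
	{
		__m256i		vals = _mm256_loadu_si256((const __m256i *) &base[i]);
		__m256i		eq = _mm256_cmpeq_epi32(keys, vals);

		/* Any set bit in the byte mask means at least one lane matched. */
		if (_mm256_movemask_epi8(eq) != 0)
			return true;
	}
#endif

	/* Scalar loop: the tail after the vector loop, or everything without AVX2. */
	for (; i < nelem; i++)
	{
		if (base[i] == key)
			return true;
	}

	return false;
}
```

Building this with plain -O2 on a typical x86-64 target yields only the
scalar loop, while adding -mavx2 enables the 8-wide comparison. A purely
compile-time switch like this also means the resulting binary requires AVX2
wherever it runs, which is one reason better detection support is needed.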

I plan to start another thread for AVX2 support for the page checksums.

[0] https://postgr.es/m/20231107024734.GB729644%40nathanxps13
[1] https://postgr.es/m/057a9a95-19d2-05f0-17e2-f46ff20e9b3e@2ndquadrant.com
[2] https://postgr.es/m/20220713170950.GA3116318%40nathanxps13
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com