Use AVX2 for calculating page checksums where available
We already rely on autovectorization for computing page checksums,
but on x86 we can get a further several-fold performance increase by
annotating pg_checksum_block() with a function target attribute for
the AVX2 instruction set extension. Not only does that use 256-bit
registers, it can also use vector multiplication rather than the
vector shifts and adds used in SSE2.
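As a rough illustration of the target-attribute technique described above, here is a minimal sketch and not the code from this commit. All names (block_sum_body, block_sum_default, block_sum_avx2, MIX, N_SUMS, HAVE_AVX2_TARGET) and the seed values are hypothetical. It assumes a GCC-compatible compiler on x86 and a loop body simple enough to autovectorize. The same inlined body is compiled twice, once with default flags and once with AVX2 enabled for that one function only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_SUMS 32               /* hypothetical number of parallel sums */
#define FNV_PRIME 16777619u

#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

/* The attribute is only meaningful with GCC-compatible compilers on x86. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_AVX2_TARGET 1
#endif

/* Illustrative FNV-style mixing step: xor, multiply, shift. */
#define MIX(sum, value) \
    do { \
        uint32_t tmp_ = (sum) ^ (value); \
        (sum) = tmp_ * FNV_PRIME ^ (tmp_ >> 17); \
    } while (0)

/* Shared loop body; inlined into each wrapper so that it is compiled
 * with whatever instruction set the wrapper enables. */
static ALWAYS_INLINE uint32_t
block_sum_body(const uint32_t *words, size_t nrows)
{
    uint32_t sums[N_SUMS];
    uint32_t result = 0;

    assert(words != NULL);
    for (size_t j = 0; j < N_SUMS; j++)
        sums[j] = (uint32_t) (j + 1);   /* hypothetical seeds */

    /* The inner loop is independent per lane, so it can be vectorized. */
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < N_SUMS; j++)
            MIX(sums[j], words[i * N_SUMS + j]);

    for (size_t j = 0; j < N_SUMS; j++)
        result ^= sums[j];
    return result;
}

/* Built with the translation unit's default instruction set. */
static uint32_t
block_sum_default(const uint32_t *words, size_t nrows)
{
    return block_sum_body(words, nrows);
}

#ifdef HAVE_AVX2_TARGET
/* Same body, but the compiler may use AVX2 inside this function.
 * Call it only after confirming the CPU supports AVX2. */
__attribute__((target("avx2")))
static uint32_t
block_sum_avx2(const uint32_t *words, size_t nrows)
{
    return block_sum_body(words, nrows);
}
#endif
```

Both wrappers do the same integer arithmetic, so they return identical results. Only the generated instructions differ.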
Similar to other hardware-specific paths, we set a function pointer
on first use. We don't bother to avoid this on platforms without AVX2
since the overhead of indirect calls doesn't matter for multi-kilobyte
inputs. However, we do arrange things so that only core has the function
pointer mechanism. External programs will continue to build a normal
static function and don't need to be aware of this.
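The "set a function pointer on first use" pattern can be sketched as follows. This is a hypothetical illustration and not the code in checksum.c: the demo_* names and the trivial mixing loop are stand-ins, and the CPU check uses the GCC/Clang builtin __builtin_cpu_supports() rather than whatever helper core actually uses. The pointer starts out aimed at a chooser function, which picks an implementation, stores it, and forwards the first call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__)
#define DEMO_INLINE inline __attribute__((always_inline))
#else
#define DEMO_INLINE inline
#endif

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define DEMO_HAVE_AVX2_TARGET 1
#endif

typedef uint32_t (*demo_checksum_fn) (const uint32_t *words, size_t nwords);

/* Stand-in computation; the real checksum algorithm is different. */
static DEMO_INLINE uint32_t
demo_body(const uint32_t *words, size_t nwords)
{
    uint32_t sum = 0;

    assert(words != NULL);
    for (size_t i = 0; i < nwords; i++)
        sum = (sum ^ words[i]) * 16777619u;
    return sum;
}

static uint32_t
demo_checksum_portable(const uint32_t *words, size_t nwords)
{
    return demo_body(words, nwords);
}

#ifdef DEMO_HAVE_AVX2_TARGET
__attribute__((target("avx2")))
static uint32_t
demo_checksum_avx2(const uint32_t *words, size_t nwords)
{
    return demo_body(words, nwords);
}
#endif

static uint32_t demo_checksum_choose(const uint32_t *words, size_t nwords);

/* Initially points at the chooser, so the first call does the CPU check. */
static demo_checksum_fn demo_checksum_impl = demo_checksum_choose;

static uint32_t
demo_checksum_choose(const uint32_t *words, size_t nwords)
{
    demo_checksum_fn chosen = demo_checksum_portable;

#ifdef DEMO_HAVE_AVX2_TARGET
    if (__builtin_cpu_supports("avx2"))
        chosen = demo_checksum_avx2;
#endif
    demo_checksum_impl = chosen;    /* later calls skip the check */
    return chosen(words, nwords);
}

/* Public entry point: always one indirect call. */
static uint32_t
demo_checksum(const uint32_t *words, size_t nwords)
{
    return demo_checksum_impl(words, nwords);
}
```

In this sketch every caller pays for an indirect call even on machines without AVX2, which mirrors the trade-off described above. A program that does not want the dispatch machinery could call the portable function directly.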
This matters most when using io_uring since in that case the checksum
computation is not done in parallel by IO workers.
Co-authored-by: Matthew Sterrett <matthewsterrett2@gmail.com>
Co-authored-by: Andrew Kim <andrew.kim@intel.com>
Reviewed-by: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru>
Tested-by: Ants Aasma <ants.aasma@cybertec.at>
Tested-by: Stepan Neretin <slpmcf@gmail.com> (earlier version)
Discussion: https://postgr.es/m/CA+vA85_5GTu+HHniSbvvP+8k3=xZO=WE84NPwiKyxztqvpfZ3Q@mail.gmail.com
Discussion: https://postgr.es/m/20250911054220.3784-1-root%40ip-172-31-36-228.ec2.internal
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/5e13b0f240397b210a0af11f83204d0b4f1713c2
Modified Files
--------------
config/c-compiler.m4 | 25 +++++++++++++++
configure | 44 ++++++++++++++++++++++++++
configure.ac | 9 ++++++
meson.build | 27 ++++++++++++++++
src/backend/storage/page/checksum.c | 44 +++++++++++++++++++++++++-
src/include/pg_config.h.in | 3 ++
src/include/port/pg_cpu.h | 3 ++
src/include/storage/checksum_block.inc.c | 42 +++++++++++++++++++++++++
src/include/storage/checksum_impl.h | 53 ++++++++++++--------------------
src/port/pg_cpu_x86.c | 4 +++
10 files changed, 219 insertions(+), 35 deletions(-)