Re: Proposal for enabling auto-vectorization for checksum calculations - Mailing list pgsql-hackers
From | Andrew Kim |
---|---|
Subject | Re: Proposal for enabling auto-vectorization for checksum calculations |
Date | |
Msg-id | CAK64mneN20+sW5WhV+r7hMVo4Rd0z11B6=3L039rWMt1wK3nPg@mail.gmail.com Whole thread Raw |
In response to | Re: Proposal for enabling auto-vectorization for checksum calculations (John Naylor <johncnaylorls@gmail.com>) |
Responses |
Re: Proposal for enabling auto-vectorization for checksum calculations
|
List | pgsql-hackers |
Hi John, Thank you for your detailed and constructive feedback on the checksum AVX2 optimization patch. I've carefully addressed all of your concerns and am pleased to share the updated V6 implementation. V6 Implementation adds SIMD-optimized checksum calculation using AVX2 instructions with automatic fallback to portable implementation, incorporating all of your recommended improvements: 1. Code Organization Consolidated architecture: Moved all checksum logic into a single checksum.c file, eliminating the complexity of separate dispatch files Simplified build integration: Streamlined both autoconf and meson build configurations 2. Safety & Robustness Eliminated dangerous runtime patching: Replaced direct function pointer manipulation with safe dispatch through static function pointers Thread-safe design: All operations are now inherently thread-safe without requiring locks or synchronization 3. Code Readability Removed macro complexity: Replaced PG_DECLARE_CHECKSUM_ISA macros with explicit, clear function declarations PostgreSQL coding compliance: Follows established PostgreSQL conventions throughout Simplified conditional compilation: Removed redundant __x86_64__ guards, relying on configure script's platform detection 4. Compiler Detection & Compatibility Preserved robust testing: Maintained the comprehensive avx2_test function that validates both __attribute__((target("avx2"))) support and AVX2 intrinsics functionality Runtime feature detection: Uses __builtin_cpu_supports("avx2") for reliable CPU capability detection Build cleanly across all library variants (static, shared, server) Compile without warnings under strict compiler flags I believe this V6 implementation fully addresses your concerns while delivering the performance benefits of AVX2 optimization. Please find the V6 patch attached. I welcome any additional feedback you may have. Best regards, Andrew Kim On Wed, Oct 1, 2025 at 10:26 PM John Naylor <johncnaylorls@gmail.com> wrote: > > On Thu, Sep 25, 2025 at 4:50 AM Andrew Kim <tenistarkim@gmail.com> wrote: > > > > Thanks, John. I see the issue now — I’ll attach the entire patch > > series in a single email so it shows up properly in the commitfest and > > gets CI coverage. > > It's still picking up v4, and the archive link doesn't show any > further replies. Something must have happened with the email > threading, since you weren't on the thread at first. Please create an > account and edit the entry to point to a more recent message ID: > > https://commitfest.postgresql.org/patch/5726/ > > > Please find attached v6 of the patchset, updated per your feedback. > > Thanks. (BTW, we discourage top-posting and prefer to cut to size and > use inline responses) > > This is not a complete review, but some architectural thoughts and > some things I've noticed. > > The top of the checksum_impl.h has this: > > * This file exists for the benefit of external programs that may wish to > * check Postgres page checksums. They can #include this to get the code > * referenced by storage/checksum.h. (Note: you may need to redefine > * Assert() as empty to compile this successfully externally.) > > It's going to be a bit tricky to preserve this ability while allowing > the core server and client programs to dispatch to a specialized > implementation, but we should at least try. That means keeping > pg_checksum_block() and pg_checksum_page() where they live now. > > I think a good first refactoring patch would be to move > src/backend/storage/checksum.c (which your patch doesn't even touch) > to src/port (and src/include/storage/checksum.h to src/include/port) > and have all callers use that. With that, I imagine only that > checksum.c file would include checksum_impl.h. > > If that poses a problem, let us know -- we may have to further juggle > things. If that works without issue, we can proceed with the > specialization. On that, just a few things to note here, although the > next patch doesn't need to worry about any of this yet: > > + #if defined(__has_attribute) && __has_attribute (target) > + __attribute__((target("avx2"))) > + #endif > + static int avx2_test(void) > + { > + const char buf@<:@sizeof(__m256i)@:>@; > + __m256i accum = _mm256_loadu_si256((const __m256i *) buf); > + accum = _mm256_add_epi32(accum, accum); > + int result = _mm256_extract_epi32(accum, 0); > + return (int) result; > + }], > > If we're just testing if the target works, we can just use an empty > function, right? > > +#define PG_DECLARE_CHECKSUM_ISA(ISANAME) \ > +uint32 \ > +pg_checksum_block_##ISANAME(const PGChecksummablePage *page); > + > +#define PG_DEFINE_CHECKSUM_ISA(ISANAME) \ > +pg_attribute_target(#ISANAME) \ > +uint32 pg_checksum_block_##ISANAME(const PGChecksummablePage *page) \ > > I find this hard to read compared to just using the actual name. > > +avx2_available(void) > +{ > +#if defined (USE_AVX2_WITH_RUNTIME_CHECK) && defined(__x86_64__) > > Why guard on __x86_64__? > > +PG_DEFINE_CHECKSUM_ISA(default) > +{ > + uint32 sums[N_SUMS], result = 0; > + uint32 i, j; > [...] > > +#ifdef USE_AVX2_WITH_RUNTIME_CHECK > +PG_DEFINE_CHECKSUM_ISA(avx2) > +{ > + uint32 sums[N_SUMS], result = 0; > + uint32 i, j; > [...] > > With the single src/port file idea above, these would just do "return > pg_checksum_block()" (or pg_checksum_page, whichever makes more > sense). > > + if (avx2_available()) > + { > + /* optional: patch pointer so next call goes directly */ > + pg_checksum_block = pg_checksum_block_avx2; > + return pg_checksum_block_avx2(page); > + } > > Not sure what your referring to here by "patching" the pointer, but it > sounds dangerous. Besides, the cost of indirection is basically zero > for multi-kilobyte inputs, so there is not even any motivation to > consider doing differently. > > -- > John Naylor > Amazon Web Services
Attachment
pgsql-hackers by date: