On Fri, Jan 30, 2026 at 03:22:45PM +0700, John Naylor wrote:
> 0001 - I'm pretty sure this is comparable to HEAD if the optimized
> function is pg_popcount_sse42(). Has the AVX512 version been tested
> with 8-byte inputs? That seems to have a lot of pre- and
> post-processing involved. The inline wrapper only bypasses for 7 or
> less bytes.
Here [0] is the latest perf data I can find for the AVX-512 popcount
patch, although it compares against v16, which IIRC lacks a few other
inlining tricks. There's a chance the SSE4.2 version is faster for 8-byte
inputs specifically. I'm not sure we need to worry about that, but I can
do a bit of testing if you'd like.
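
To make the boundary concrete, here is a rough sketch of the kind of
inline wrapper being discussed. This is not the code from the patch: all
of the sketch_* names are made up, the "optimized" routine is just a
stand-in for whatever gets selected at runtime, and the call counter
exists only to show which path was taken. The point is that a 7-byte
input stays inline while an 8-byte input already pays for the call into
the optimized routine, including any pre- and post-processing it does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter, only here to observe which path was taken. */
static int	sketch_optimized_calls = 0;

/* Count the set bits in one byte by clearing the lowest set bit each time. */
static inline unsigned
sketch_byte_bits(unsigned char b)
{
	unsigned	n = 0;

	for (; b != 0; b &= (unsigned char) (b - 1))
		n++;
	return n;
}

/* Stand-in for the runtime-selected routine (e.g. SSE4.2 or AVX-512). */
static uint64_t
sketch_popcount_optimized(const char *buf, size_t bytes)
{
	uint64_t	popcnt = 0;

	sketch_optimized_calls++;
	while (bytes > 0)
	{
		popcnt += sketch_byte_bits((unsigned char) *buf++);
		bytes--;
	}
	return popcnt;
}

/* Inline wrapper: inputs of 7 bytes or fewer never leave this function. */
static inline uint64_t
sketch_popcount(const char *buf, size_t bytes)
{
	if (bytes < 8)
	{
		uint64_t	popcnt = 0;

		while (bytes > 0)
		{
			popcnt += sketch_byte_bits((unsigned char) *buf++);
			bytes--;
		}
		return popcnt;
	}
	return sketch_popcount_optimized(buf, bytes);
}
```
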
> 0002
> - I tried running this on x86-64 with alignment sanitizer and no
> alarms went off during "make check", but adding
> pg_attribute_no_sanitize_alignment() would prevent surprises in the
> future.
Done.
> - I imagine that the old SIZEOF_VOID_P check is superfluous now, since
> the whole file is gated by HAVE_X86_64_POPCNTQ.
I think you're right. There was some concern about this when I was first
adding the SSE4.2-specific pg_popcount() [1], but all the configure-time
checks for HAVE_X86_64_POPCNTQ are restricted to 64-bit x86, so I bet we
could safely assume SIZEOF_VOID_P == 8 in that file.
> - Maybe we can remove the aligned 32-bit path in
> pg_popcount_(masked_)portable(), since that's on-topic for this patch
> and would simplify things further.
IMHO that's a reasonable thing for us to do.
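
For illustration only, a portable routine with just a 64-bit loop and a
byte-wise tail could look roughly like the sketch below. The sketch_*
names are hypothetical, and I'm using memcpy() to load each word so the
example doesn't depend on alignment at all, which is an assumption of
the sketch and not necessarily how the real function would be written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: count set bits by clearing the lowest one each time. */
static inline int
sketch_popcount64(uint64_t word)
{
	int			count = 0;

	while (word != 0)
	{
		word &= word - 1;
		count++;
	}
	return count;
}

/* Portable popcount with only a 64-bit path and a byte-wise tail. */
static uint64_t
sketch_popcount_portable(const char *buf, size_t bytes)
{
	uint64_t	popcnt = 0;

	/* Process 8 bytes at a time; memcpy() avoids any alignment assumption. */
	while (bytes >= sizeof(uint64_t))
	{
		uint64_t	word;

		memcpy(&word, buf, sizeof(word));
		popcnt += sketch_popcount64(word);
		buf += sizeof(word);
		bytes -= sizeof(word);
	}

	/* Process the remaining 0-7 bytes one at a time. */
	while (bytes > 0)
	{
		popcnt += sketch_popcount64((unsigned char) *buf++);
		bytes--;
	}

	return popcnt;
}
```
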
[0] https://postgr.es/m/20240404171828.GA3866970%40nathanxps13
[1] https://postgr.es/m/CAApHDvojPyh6dLKooqjXSZE%3D0Ed590Lq1BxF7WQ9knSggyuJEA%40mail.gmail.com
--
nathan