On Mon, Mar 10, 2025 at 03:48:31PM +0700, John Naylor wrote:
> On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> Overall, I wish we could avoid splitting things into separate files and
>> adding more header file gymnastics, but maybe there isn't much better we
>> can do without overhauling the CPU feature detection code.
>
> I wanted to make an attempt to make this aspect nicer. v13-0002
> incorporates deliberately compact and simple loops for inlined
> constant input into the dispatch function, and leaves the existing
> code alone. This avoids code churn and saves vertical space in the
> copied code. It needs a bit more commentary, but I hope this is a more
> digestible prerequisite to the CLMUL algorithm -- as a reminder, it'll
> be simpler if we can always assume non-constant input can go through a
> function pointer.
That is certainly more readable. FWIW I think it would be entirely
reasonable to replace the pg_crc32c_sse42.c implementation with a call to
this new pg_comp_crc32c_dispatch() function. Of course, you'd have to
split things up like:
pg_attribute_no_sanitize_alignment()
static inline
pg_crc32c
pg_comp_crc32c_sse42_inline(pgcrc32c crc, const void *data, size_t len)
{
const unsigned char *p = data;
#ifdef __x86_64__
for (; len >= 8; p += 8, len -= 8)
crc = _mm_crc32_u64(crc, *(const uint64 *) p);
#endif
for (; len >= 4; p += 4, len -= 4)
crc = _mm_crc32_u32(crc, *(const uint32 *) p);
for (; len > 0; len--)
crc = _mm_crc32_u8(crc, *p++)
return crc;
}
pg_attribute_no_sanitize_alignment()
static inline
pg_crc32c
pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len)
{
if (__builtin_constant_p(len))
return pg_comp_crc32c_sse42_inline(crc, data, len);
return pg_comp_crc32c_sse42(crc, data, len);
}
And then in pg_crc32c_sse42.c:
pg_attribute_no_sanitize_alignment()
pg_attribute_target("sse4.2")
pg_crc32c
pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len)
{
return pg_comp_crc32c_sse42_inline(crc, data, len);
}
Granted, that adds back some of the complexity you were trying to avoid
(and is very similar to your v12 patches), but it's pretty compact and
avoids some code duplication. FTR I don't feel too strongly about this,
but on balance I guess I'd be okay with a tad more complexity than your
v13 patches if it served a useful purpose.
--
nathan