Re: Improve CRC32C performance on SSE4.2 - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Improve CRC32C performance on SSE4.2
Date
Msg-id Z89d-eRP9Tpd5Mq1@nathan
Whole thread Raw
In response to Re: Improve CRC32C performance on SSE4.2  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: Improve CRC32C performance on SSE4.2
List pgsql-hackers
On Mon, Mar 10, 2025 at 03:48:31PM +0700, John Naylor wrote:
> On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> Overall, I wish we could avoid splitting things into separate files and
>> adding more header file gymnastics, but maybe there isn't much better we
>> can do without overhauling the CPU feature detection code.
> 
> I wanted to make an attempt to make this aspect nicer. v13-0002
> incorporates deliberately compact and simple loops for inlined
> constant input into the dispatch function, and leaves the existing
> code alone. This avoids code churn and saves vertical space in the
> copied code. It needs a bit more commentary, but I hope this is a more
> digestible prerequisite to the CLMUL algorithm -- as a reminder, it'll
> be simpler if we can always assume non-constant input can go through a
> function pointer.

That is certainly more readable.  FWIW I think it would be entirely
reasonable to replace the pg_crc32c_sse42.c implementation with a call to
this new pg_comp_crc32c_dispatch() function.  Of course, you'd have to
split things up like:

    pg_attribute_no_sanitize_alignment()
    static inline
    pg_crc32c
    pg_comp_crc32c_sse42_inline(pgcrc32c crc, const void *data, size_t len)
    {
        const unsigned char *p = data;

    #ifdef __x86_64__
        for (; len >= 8; p += 8, len -= 8)
            crc = _mm_crc32_u64(crc, *(const uint64 *) p);
    #endif
        for (; len >= 4; p += 4, len -= 4)
            crc = _mm_crc32_u32(crc, *(const uint32 *) p);
        for (; len > 0; len--)
            crc = _mm_crc32_u8(crc, *p++)
        return crc;
    }

    pg_attribute_no_sanitize_alignment()
    static inline
    pg_crc32c
    pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len)
    {
        if (__builtin_constant_p(len))
            return pg_comp_crc32c_sse42_inline(crc, data, len);
        return pg_comp_crc32c_sse42(crc, data, len);
    }

And then in pg_crc32c_sse42.c:

    pg_attribute_no_sanitize_alignment()
    pg_attribute_target("sse4.2")
    pg_crc32c
    pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len)
    {
        return pg_comp_crc32c_sse42_inline(crc, data, len);
    }

Granted, that adds back some of the complexity you were trying to avoid
(and is very similar to your v12 patches), but it's pretty compact and
avoids some code duplication.  FTR I don't feel too strongly about this,
but on balance I guess I'd be okay with a tad more complexity than your
v13 patches if it served a useful purpose.

-- 
nathan



pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Documentation Edits for pg_createsubscriber
Next
From: Tom Lane
Date:
Subject: Re: Statistics Import and Export: difference in statistics dumped