Re: Improve CRC32C performance on SSE4.2 - Mailing list pgsql-hackers

From John Naylor
Subject Re: Improve CRC32C performance on SSE4.2
Date
Msg-id CANWCAZZjuH8daxK-tcnYbK_sAQdq8R6d3T3iGfL43Dj6DkFv0g@mail.gmail.com
Whole thread Raw
In response to RE: Improve CRC32C performance on SSE4.2  ("Devulapalli, Raghuveer" <raghuveer.devulapalli@intel.com>)
List pgsql-hackers
On Thu, Feb 27, 2025 at 3:53 AM Devulapalli, Raghuveer
<raghuveer.devulapalli@intel.com> wrote:

> > I'm not a fan of exposing these architecture-specific details to places that consult
> > the capabilities. That requires things like
> >
> > +#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_SSE42)
> > [...]
> > +#define PGCPUCAP_CRC32C
> > pg_cpu_have(PG_CPU_FEATURE_ARMV8_CRC32C)
> >
> > I'd prefer to have 1 capability <-> one place to check at runtime for architectures
> > that need that, and to keep architecture details private to the initialization step. .
> > Even for things that test for which function pointer to use, I think it's a cleaner
> > interface to look at one thing.
>
> Isn't that one thing currently pg_cpu_have(FEATURE)?

No, I mean the one thing would be "do we have hardware CRC?", and each
architecture would set (and check if needed) the same slot.

I'm not 100% sure this is the better way, and there may be a
disadvantage I haven't thought of, but this makes the most sense to
me. I'm willing to hear reasons to do it differently, but I'm thinking
those reasons need to be weighed against the loss of abstraction.

> The reason we have the additional PGCPUCAP_CRC32C is because the dispatch for CRC32C is currently defined in the
headerfile common to both ARM and SSE42. We should ideally have separate dispatch for ARM and x86 in their own files
(theway it is in the main branch). 

Just so we're on the same page, the current separate files do
initialization, not dispatch. I'm seeing conflicting design goals
here:

1. Have multiple similar dispatch functions with the only difference
(for now) being certain names (not in the patch, just discussed).
2. Put initialization routines in the same file, even though they have
literally nothing in common.

> This also makes it easier to add more implementations in the future without having to make the dispatch function work
forboth ARM and x86. 

I don't quite understand, since with 0001 it already works (at least
in theory) for 3 architectures with hardware CRC, plus those without,
and I don't see it changing with more architectures.

+/* Access to pg_cpucap for modules that need runtime CPUID information */
+bool pg_cpu_have(int feature_id)
+{
+    return pg_cpucap[feature_id];
 }

Putting this in a separate translation may have to do the decision to
turn v8's #defines into an enum? In any case, this means doing a
function call to decide which function to call. That makes no sense.

The goal of v9/10 was to centralize the initialization from cpuid etc,
but it also did a major re-architecture of the runtime-checks at the
same. That made review more difficult.

+#if defined(__x86_64__) || defined ...
+    pg_cpucap_x86();
+#elif defined(__aarch64__) || defined ...
+    pg_cpucap_arm();
+#endif

I view this another conflict in design goals: After putting everything
in the same file, the routines still have different names and have to
be guarded accordingly. I don't really see the advantage, especially
since these guard symbols don't even match the ones that guard the
actual initialization steps. I tried to imply that in my last review,
but maybe I should have been more explicit.

I think the least painful step is to take the x86 initialization from
v10, which is looking great, but
- keep separate initialization files
- don't whack around the runtime representation, at least not in the same patch

--
John Naylor
Amazon Web Services



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: new commitfest transition guidance
Next
From: vignesh C
Date:
Subject: Re: psql \dh: List High-Level (Root) Tables and Indexes