Re: Improve CRC32C performance on SSE4.2 - Mailing list pgsql-hackers
From | John Naylor |
---|---|
Subject | Re: Improve CRC32C performance on SSE4.2 |
Date | |
Msg-id | CANWCAZZjuH8daxK-tcnYbK_sAQdq8R6d3T3iGfL43Dj6DkFv0g@mail.gmail.com Whole thread Raw |
In response to | RE: Improve CRC32C performance on SSE4.2 ("Devulapalli, Raghuveer" <raghuveer.devulapalli@intel.com>) |
List | pgsql-hackers |
On Thu, Feb 27, 2025 at 3:53 AM Devulapalli, Raghuveer <raghuveer.devulapalli@intel.com> wrote: > > I'm not a fan of exposing these architecture-specific details to places that consult > > the capabilities. That requires things like > > > > +#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_SSE42) > > [...] > > +#define PGCPUCAP_CRC32C > > pg_cpu_have(PG_CPU_FEATURE_ARMV8_CRC32C) > > > > I'd prefer to have 1 capability <-> one place to check at runtime for architectures > > that need that, and to keep architecture details private to the initialization step. . > > Even for things that test for which function pointer to use, I think it's a cleaner > > interface to look at one thing. > > Isn't that one thing currently pg_cpu_have(FEATURE)? No, I mean the one thing would be "do we have hardware CRC?", and each architecture would set (and check if needed) the same slot. I'm not 100% sure this is the better way, and there may be a disadvantage I haven't thought of, but this makes the most sense to me. I'm willing to hear reasons to do it differently, but I'm thinking those reasons need to be weighed against the loss of abstraction. > The reason we have the additional PGCPUCAP_CRC32C is because the dispatch for CRC32C is currently defined in the headerfile common to both ARM and SSE42. We should ideally have separate dispatch for ARM and x86 in their own files (theway it is in the main branch). Just so we're on the same page, the current separate files do initialization, not dispatch. I'm seeing conflicting design goals here: 1. Have multiple similar dispatch functions with the only difference (for now) being certain names (not in the patch, just discussed). 2. Put initialization routines in the same file, even though they have literally nothing in common. > This also makes it easier to add more implementations in the future without having to make the dispatch function work forboth ARM and x86. I don't quite understand, since with 0001 it already works (at least in theory) for 3 architectures with hardware CRC, plus those without, and I don't see it changing with more architectures. +/* Access to pg_cpucap for modules that need runtime CPUID information */ +bool pg_cpu_have(int feature_id) +{ + return pg_cpucap[feature_id]; } Putting this in a separate translation may have to do the decision to turn v8's #defines into an enum? In any case, this means doing a function call to decide which function to call. That makes no sense. The goal of v9/10 was to centralize the initialization from cpuid etc, but it also did a major re-architecture of the runtime-checks at the same. That made review more difficult. +#if defined(__x86_64__) || defined ... + pg_cpucap_x86(); +#elif defined(__aarch64__) || defined ... + pg_cpucap_arm(); +#endif I view this another conflict in design goals: After putting everything in the same file, the routines still have different names and have to be guarded accordingly. I don't really see the advantage, especially since these guard symbols don't even match the ones that guard the actual initialization steps. I tried to imply that in my last review, but maybe I should have been more explicit. I think the least painful step is to take the x86 initialization from v10, which is looking great, but - keep separate initialization files - don't whack around the runtime representation, at least not in the same patch -- John Naylor Amazon Web Services
pgsql-hackers by date: