RE: Improve CRC32C performance on SSE4.2 - Mailing list pgsql-hackers
From | Devulapalli, Raghuveer |
---|---|
Subject | RE: Improve CRC32C performance on SSE4.2 |
Date | |
Msg-id | PH8PR11MB82862B464609340FB0A4BA37FBCD2@PH8PR11MB8286.namprd11.prod.outlook.com Whole thread Raw |
In response to | Re: Improve CRC32C performance on SSE4.2 (John Naylor <johncnaylorls@gmail.com>) |
Responses |
Re: Improve CRC32C performance on SSE4.2
|
List | pgsql-hackers |
Hi John, Going back to your earlier comment: > > > I'm not a fan of exposing these architecture-specific details to > > > places that consult the capabilities. That requires things like +#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_SSE42) > > Isn't that one thing currently pg_cpu_have(FEATURE)? > > No, I mean the one thing would be "do we have hardware CRC?", and each > architecture would set (and check if needed) the same slot. > > I'm not 100% sure this is the better way, and there may be a disadvantage I > haven't thought of, but this makes the most sense to me. I'm willing to hear > reasons to do it differently, but I'm thinking those reasons need to be weighed > against the loss of abstraction. IMO, CPU SIMD (SSE4.2, AVX, etc.) features are a module of their own separate from capabilities/features supported in postgres(CRC32C, bitcount, etc.). Exposing architecture-specific details to the features that need to check against themis a way to make code more modular and reusable. It is reasonable to expect developers writing SIMD specific functionsto naturally understand the runtime architecture requirements for that function which is easily accessible by justchecking the value of pg_cpu_have(PG_CPU_FEATURE_..). If another feature in postgres requires SSE4.2, then the CPU initializationmodule doesn't need to be modified at all. They just have to worry about their feature and its CPU requirements. > Just so we're on the same page, the current separate files do initialization, not > dispatch. I'm seeing conflicting design goals > here: > > 1. Have multiple similar dispatch functions with the only difference (for now) > being certain names (not in the patch, just discussed). > 2. Put initialization routines in the same file, even though they have literally > nothing in common. Sorry for not being clear. I will move the initialization routines to separate files. Just haven’t gotten to it yet withv10. > > > This also makes it easier to add more implementations in the future without > having to make the dispatch function work for both ARM and x86. > > I don't quite understand, since with 0001 it already works (at least in theory) for 3 > architectures with hardware CRC, plus those without, and I don't see it changing > with more architectures. The reason its working across all architectures as of now is because there is only one runtime check for CRC32C feature acrossx86, ARM and LongArch. (PCLMUL dispatch happens inside the SSE42). If and when we add the AVX-512 implementation, won'tthe dispatch logic will look different for x86 and ARM? Along with that, the header file pg_crc32c.h will also looka lot messier with a whole bunch of nested #ifdefs. IMO, the header file should be devoid of any architecture dispatchlogic and simply contain declarations of various implementations (see https://github.com/r-devulap/postgressql/blob/cf80f7375f24d2fb5cd29e95dc73d4988fc6d066/src/include/port/pg_crc32c.hfor example).The dispatch logic should be handled separately in a C file. > > +/* Access to pg_cpucap for modules that need runtime CPUID information > +*/ bool pg_cpu_have(int feature_id) { > + return pg_cpucap[feature_id]; > } > > Putting this in a separate translation may have to do the decision to turn v8's > #defines into an enum? In any case, this means doing a function call to decide > which function to call. That makes no sense. The reason it is a separate translational unit is because I intended pg_cpucap to be a read only variable outside of pg_cpucap.c.If the function overhead is not preferred, then I can get rid of it. > > The goal of v9/10 was to centralize the initialization from cpuid etc, but it also did > a major re-architecture of the runtime-checks at the same. That made review > more difficult. I assumed we wanted to move the runtime checks to a central place and that needed this re-architecture. > > +#if defined(__x86_64__) || defined ... > + pg_cpucap_x86(); > +#elif defined(__aarch64__) || defined ... > + pg_cpucap_arm(); > +#endif > > I view this another conflict in design goals: After putting everything in the same > file, the routines still have different names and have to be guarded accordingly. I > don't really see the advantage, especially since these guard symbols don't even > match the ones that guard the actual initialization steps. I tried to imply that in > my last review, but maybe I should have been more explicit. > > I think the least painful step is to take the x86 initialization from v10, which is > looking great, but > - keep separate initialization files > - don't whack around the runtime representation, at least not in the same patch Sure. I can fix that in v11. Raghuveer
pgsql-hackers by date: