RE: Proposal for Updating CRC32C with AVX-512 Algorithm. - Mailing list pgsql-hackers
From | Amonson, Paul D |
---|---|
Subject | RE: Proposal for Updating CRC32C with AVX-512 Algorithm. |
Date | |
Msg-id | BN0SPR01MB00084DB3E6F61E09F59533FFDCEE2@BN0SPR01MB0008.namprd11.prod.outlook.com |
In response to | Proposal for Updating CRC32C with AVX-512 Algorithm. ("Amonson, Paul D" <paul.d.amonson@intel.com>) |
Responses | Re: Proposal for Updating CRC32C with AVX-512 Algorithm. |
List | pgsql-hackers |
Hi, forgive the top-post, but I have not seen any response to this post.

Thanks,
Paul

> -----Original Message-----
> From: Amonson, Paul D
> Sent: Wednesday, May 1, 2024 8:56 AM
> To: pgsql-hackers@lists.postgresql.org
> Cc: Nathan Bossart <nathandbossart@gmail.com>; Shankaran, Akash
> <akash.shankaran@intel.com>
> Subject: Proposal for Updating CRC32C with AVX-512 Algorithm.
>
> Hi,
>
> Comparing the current SSE4.2 implementation of the CRC32C algorithm in
> Postgres to an optimized AVX-512 algorithm [0], we observed significant
> gains. The result was an average performance multiplier of ~6.6X,
> measured on 3 different Intel products. Details below. The AVX-512 algorithm
> in C is a port of the ISA-L library [1] assembler code.
>
> Workload call size distribution details (write heavy):
> * Average was approximately 1,010 bytes per call
> * ~80% of the calls were under 256 bytes
> * ~20% of the calls were greater than or equal to 256 bytes, up to the max
>   buffer size of 8192
>
> The 256-byte threshold is important because if the buffer is smaller, it
> makes sense to fall back to the existing implementation. This is because the
> AVX-512 algorithm needs a minimum of 256 bytes to operate.
>
> Using the above workload data distribution,
> at 0% calls < 256 bytes, an 841% improvement on average for crc32c
> functionality was observed.
> at 50% calls < 256 bytes, a 758% improvement on average for crc32c
> functionality was observed.
> at 90% calls < 256 bytes, a 44% improvement on average for crc32c
> functionality was observed.
> at 97.6% calls < 256 bytes, the workload's crc32c performance breaks even.
> at 100% calls < 256 bytes, a 14% regression is seen when using the AVX-512
> implementation.
>
> The results above are averages over 3 machines, and were measured on: Intel
> Sapphire Rapids bare metal, and using EC2 on AWS cloud: Intel Sapphire Rapids
> (m7i.2xlarge) and Intel Ice Lake (m6i.2xlarge).
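For illustration, the 256-byte fallback described above could be dispatched along these lines. This is only a minimal sketch, not the proposed patch: the names crc32c_small_path, crc32c_large_path, crc32c_dispatch, crc32c_of and AVX512_MIN_LEN are hypothetical, and both paths are backed by the same portable bitwise CRC32C so that the example runs on any machine. In a real build the two stand-ins would presumably be the existing SSE4.2 routine and the AVX-512 routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buffers shorter than this stay on the existing code path (hypothetical name). */
#define AVX512_MIN_LEN 256

/* Portable bitwise CRC32C (reflected polynomial 0x82F63B78). It stands in
 * for BOTH real implementations so that the sketch runs anywhere. */
static uint32_t
crc32c_bitwise(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Hypothetical stand-in for the existing SSE4.2 implementation. */
static uint32_t
crc32c_small_path(uint32_t crc, const unsigned char *p, size_t len)
{
    return crc32c_bitwise(crc, p, len);
}

/* Hypothetical stand-in for the AVX-512 implementation. */
static uint32_t
crc32c_large_path(uint32_t crc, const unsigned char *p, size_t len)
{
    return crc32c_bitwise(crc, p, len);
}

/* Update a running CRC, choosing the code path by buffer length. */
uint32_t
crc32c_dispatch(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    assert(p != NULL || len == 0);
    if (len < AVX512_MIN_LEN)
        return crc32c_small_path(crc, p, len);
    return crc32c_large_path(crc, p, len);
}

/* One-shot CRC32C with the usual 0xFFFFFFFF initial value and final XOR. */
uint32_t
crc32c_of(const void *data, size_t len)
{
    return crc32c_dispatch(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

With this sketch, crc32c_of("123456789", 9) returns 0xE3069283, the standard CRC-32C check value. Because the dispatcher takes and returns the running CRC, a caller can mix short and long buffers in one checksum without caring which path handled each piece.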
>
> Summary Data (Sapphire Rapids bare metal, AWS m7i-2xl, and AWS m6i-2xl):
> +---------------------+-------------------+-------------------+-------------------+--------------------+
> | Rates in Bytes/us   |    Bare Metal     |    AWS m6i-2xl    |    AWS m7i-2xl    |                    |
> | (Larger is Better)  +---------+---------+---------+---------+---------+---------+ Overall Multiplier |
> |                     | SSE 4.2 | AVX-512 | SSE 4.2 | AVX-512 | SSE 4.2 | AVX-512 |                    |
> +---------------------+---------+---------+---------+---------+---------+---------+--------------------+
> | Numbers 256-8192    |  12,046 |  83,196 |   7,471 |  39,965 |  11,867 |  84,589 |               6.62 |
> +---------------------+---------+---------+---------+---------+---------+---------+--------------------+
> | Numbers 64 - 255    |  16,865 |  15,909 |   9,209 |   7,363 |  12,496 |  10,046 |               0.86 |
> +---------------------+---------+---------+---------+---------+---------+---------+--------------------+
> | Weighted Multiplier [*]                                                         |               1.44 |
> +---------------------------------------------------------------------------------+--------------------+
> There was no evidence of AVX-512 frequency throttling from perf data, which
> stayed steady during the test.
>
> Feedback on this proposed improvement is appreciated. Some questions:
> 1) This AVX-512 ISA-L derived code uses the BSD-3 license [2]. Is this
> compatible with the PostgreSQL License [3]? They both appear to be very
> permissive licenses, but I am not an expert on licenses.
> 2) Is there a preferred benchmark I should run to test this change?
>
> If licensing is a non-issue, I can post the initial patch along with my
> Postgres benchmark function patch for further review.
>
> Thanks,
> Paul
>
> [0] https://www.researchgate.net/publication/263424619_Fast_CRC_computation#full-text
> [1] https://github.com/intel/isa-l
> [2] https://opensource.org/license/bsd-3-clause
> [3] https://opensource.org/license/postgresql
>
> [*] Weights used were 90% of requests less than 256 bytes, 10% greater than
> or equal to 256 bytes.
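As a cross-check of the summary table, the multipliers can be recomputed from the quoted rates. The sketch below rests on two assumptions that the email does not state explicitly: that each "Overall Multiplier" is the ratio of summed AVX-512 rates to summed SSE 4.2 rates across the three machines, and that the weighted figure is a linear blend by call fraction, as footnote [*] suggests. All function and array names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Rates in bytes/us from the summary table, in the order
 * bare metal, AWS m6i-2xl, AWS m7i-2xl. */
const double LARGE_SSE42[3]  = {12046.0, 7471.0, 11867.0};   /* 256-8192 bytes */
const double LARGE_AVX512[3] = {83196.0, 39965.0, 84589.0};
const double SMALL_SSE42[3]  = {16865.0, 9209.0, 12496.0};   /* 64-255 bytes */
const double SMALL_AVX512[3] = {15909.0, 7363.0, 10046.0};

/* Assumed formula: summed AVX-512 rate divided by summed SSE 4.2 rate. */
double
overall_multiplier(const double *sse42, const double *avx512, size_t n)
{
    double sum_sse = 0.0;
    double sum_avx = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
    {
        sum_sse += sse42[i];
        sum_avx += avx512[i];
    }
    return sum_avx / sum_sse;
}

/* Assumed model: linear blend of the two multipliers, weighted by the
 * fraction of calls that are smaller than 256 bytes. */
double
weighted_multiplier(double small_frac, double small_mult, double large_mult)
{
    return small_frac * small_mult + (1.0 - small_frac) * large_mult;
}

/* Fraction of small calls at which the blended multiplier equals 1.0. */
double
break_even_fraction(double small_mult, double large_mult)
{
    return (large_mult - 1.0) / (large_mult - small_mult);
}
```

Under these assumptions the two row multipliers come out at roughly 6.62 and 0.86, the 90%/10% blend at roughly 1.44, and the break-even point at roughly 97.6% small calls, which matches the table and the break-even figure quoted in the message.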