RE: Proposal for Updating CRC32C with AVX-512 Algorithm. - Mailing list pgsql-hackers

From Amonson, Paul D
Subject RE: Proposal for Updating CRC32C with AVX-512 Algorithm.
Date
Msg-id BN0SPR01MB00084DB3E6F61E09F59533FFDCEE2@BN0SPR01MB0008.namprd11.prod.outlook.com
In response to Proposal for Updating CRC32C with AVX-512 Algorithm.  ("Amonson, Paul D" <paul.d.amonson@intel.com>)
Responses Re: Proposal for Updating CRC32C with AVX-512 Algorithm.
List pgsql-hackers
Hi, forgive the top-post, but I have not seen any response to this proposal.

Thanks,
Paul

> -----Original Message-----
> From: Amonson, Paul D
> Sent: Wednesday, May 1, 2024 8:56 AM
> To: pgsql-hackers@lists.postgresql.org
> Cc: Nathan Bossart <nathandbossart@gmail.com>; Shankaran, Akash
> <akash.shankaran@intel.com>
> Subject: Proposal for Updating CRC32C with AVX-512 Algorithm.
>
> Hi,
>
> We compared the current SSE4.2 implementation of the CRC32C algorithm in
> Postgres with an optimized AVX-512 algorithm [0] and observed significant
> gains: on average ~6.6X the throughput for buffers of 256 bytes or more,
> measured on 3 different Intel products. Details below. The AVX-512 algorithm
> in C is a port of the ISA-L library [1] assembler code.
>
> Workload call size distribution details (write heavy):
>    * Average was approximately 1,010 bytes per call
>    * ~80% of the calls were under 256 bytes
>    * ~20% of the calls were greater than or equal to 256 bytes up to the max
> buffer size of 8192
>
> The 256-byte threshold is important: the AVX-512 algorithm needs a minimum
> of 256 bytes to operate, so for smaller buffers it makes sense to fall back
> to the existing implementation.
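
The size-based fallback described above can be sketched roughly as follows. This is only an illustration in Python, not the proposed patch: `choose_path` and `crc32c_dispatch` are hypothetical names, and a plain bitwise CRC-32C stands in for both the SSE4.2 and the AVX-512 code paths, so only the selection logic is shown.

```python
AVX512_MIN_LEN = 256  # minimum buffer size the AVX-512 algorithm needs, per the post


def crc32c_reference(crc: int, data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.

    Stand-in for the real SSE4.2 and AVX-512 routines (assumption: both
    produce the same checksum, so one reference routine can model either).
    """
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def choose_path(length: int) -> str:
    """Pick the implementation by buffer length (hypothetical helper)."""
    return "avx512" if length >= AVX512_MIN_LEN else "sse42"


# Both entries point at the reference routine; a real build would wire in
# the hardware-specific functions here.
IMPLEMENTATIONS = {"sse42": crc32c_reference, "avx512": crc32c_reference}


def crc32c_dispatch(crc: int, data: bytes) -> int:
    """Route small buffers to the existing path, large ones to AVX-512."""
    return IMPLEMENTATIONS[choose_path(len(data))](crc, data)
```

With this shape, a 255-byte buffer takes the existing path and a 256-byte buffer takes the AVX-512 path, and the checksum is the same whichever path is chosen.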
>
> Using the above workload data distribution,
> at 0%    calls < 256 bytes, an 841% improvement on average for crc32c
> functionality was observed.
> at 50%   calls < 256 bytes, a 758% improvement on average for crc32c
> functionality was observed.
> at 90%   calls < 256 bytes, a 44% improvement on average for crc32c
> functionality was observed.
> at 97.6% calls < 256 bytes, the workload's crc32c performance breaks even.
> at 100%  calls < 256 bytes, a 14% regression is seen when using the AVX-512
> implementation.
>
> The results above are averages over 3 machines, and were measured on: Intel
> Sapphire Rapids bare metal, and using EC2 on AWS cloud: Intel Sapphire Rapids
> (m7i.2xlarge) and Intel Ice Lake (m6i.2xlarge).
>
> Summary Data (Sapphire Rapids bare metal, AWS m6i-2xl, and AWS m7i-2xl):
> +---------------------+-------------------+-------------------+-------------------+--------------------+
> | Rates in Bytes/us   |    Bare Metal     |    AWS m6i-2xl    |    AWS m7i-2xl    |                    |
> | (Larger is Better)  +---------+---------+---------+---------+---------+---------+ Overall Multiplier |
> |                     | SSE 4.2 | AVX-512 | SSE 4.2 | AVX-512 | SSE 4.2 | AVX-512 |                    |
> +---------------------+---------+---------+---------+---------+---------+---------+--------------------+
> | Numbers 256-8192    |  12,046 |  83,196 |   7,471 |  39,965 |  11,867 |  84,589 |        6.62        |
> +---------------------+---------+---------+---------+---------+---------+---------+--------------------+
> | Numbers 64 - 255    |  16,865 |  15,909 |   9,209 |   7,363 |  12,496 |  10,046 |        0.86        |
> +---------------------+---------+---------+---------+---------+---------+---------+--------------------+
>                                                     |  Weighted Multiplier [*]    |        1.44        |
>                                                     +-----------------------------+--------------------+
>
> There was no evidence of AVX-512 frequency throttling from perf data, which
> stayed steady during the test.
>
> Feedback on this proposed improvement is appreciated. Some questions:
> 1) This AVX-512 ISA-L-derived code uses the BSD-3-Clause license [2]. Is this compatible
> with the PostgreSQL License [3]? They both appear to be very permissive
> licenses, but I am not an expert on licenses.
> 2) Is there a preferred benchmark I should run to test this change?
>
> If licensing is a non-issue, I can post the initial patch along with my Postgres
> benchmark function patch for further review.
>
> Thanks,
> Paul
>
> [0] https://www.researchgate.net/publication/263424619_Fast_CRC_computation#full-text
> [1] https://github.com/intel/isa-l
> [2] https://opensource.org/license/bsd-3-clause
> [3] https://opensource.org/license/postgresql
>
> [*] Weights used were 90% of requests less than 256 bytes, 10% greater than
> or equal to 256 bytes.
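
For reference, the weighted multiplier and the break-even point can be checked with a few lines of arithmetic. The sketch below assumes a simple linear blend of the two "Overall Multiplier" values from the table (0.86 for calls under 256 bytes, 6.62 for the rest); the function names are illustrative only. Under that assumption it gives roughly the 1.44 weighted multiplier, the 97.6% break-even point, and the 14% regression at 100% small calls. It does not reproduce the 841% and 758% figures quoted for the 0% and 50% cases, so those were presumably derived another way.

```python
SMALL_MULT = 0.86  # overall multiplier for 64-255 byte calls (from the table)
LARGE_MULT = 6.62  # overall multiplier for 256-8192 byte calls (from the table)


def weighted_multiplier(frac_small: float,
                        small_mult: float = SMALL_MULT,
                        large_mult: float = LARGE_MULT) -> float:
    """Linear blend of the two multipliers by call-count fraction (assumed model)."""
    return frac_small * small_mult + (1.0 - frac_small) * large_mult


def break_even_fraction(small_mult: float = SMALL_MULT,
                        large_mult: float = LARGE_MULT) -> float:
    """Fraction of small calls at which the blended multiplier equals 1.0."""
    return (large_mult - 1.0) / (large_mult - small_mult)


if __name__ == "__main__":
    print(f"weighted multiplier at 90% small calls: {weighted_multiplier(0.90):.2f}")
    print(f"break-even fraction of small calls:     {break_even_fraction():.1%}")
    print(f"multiplier at 100% small calls:         {weighted_multiplier(1.00):.2f}")
```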


