Re: [PATCH] CRC32C optimizations using SVE2 on ARM. - Mailing list pgsql-hackers

From Devanga.Susmitha@fujitsu.com
Subject Re: [PATCH] CRC32C optimizations using SVE2 on ARM.
Date
Msg-id OSZPR01MB84994FEF43BA834F706221A78BB0A@OSZPR01MB8499.jpnprd01.prod.outlook.com
In response to Re: [PATCH] CRC32C optimizations using SVE2 on ARM.  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: [PATCH] CRC32C optimizations using SVE2 on ARM.
List pgsql-hackers
> There was already a proposal to use armv8-a+crypto, which is more
> widely available and works on smaller inputs.

Our SVE2 implementation performs better than the armv8-a+crypto proposal in
https://www.postgresql.org/message-id/CANWCAZaKhE%2BRD5KKouUFoxx1EbUNrNhcduM1VQ%3DDkSDadNEFng%40mail.gmail.com

I've benchmarked our SVE2 implementation against armv8-a+crypto. Run time is 28-39% lower for buffer sizes from 512 bytes to 8KB.

Buffer size (bytes)   armv8+crypto (ms)   armv9+SVE2 (ms)   Improvement
                512              28.491             19.37   32.0% faster
               1024              47.145            29.962   36.5% faster
               2048              86.717            52.841   39.1% faster
               4096             165.205           105.626   36.1% faster
               8192             318.103           226.437   28.8% faster
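
The Improvement column appears to be the relative reduction in elapsed time, (baseline - new) / baseline. A minimal sketch for recomputing it from the timings above; the helper name improvement_pct is illustrative only and not part of the patch:

```c
#include <assert.h>

/*
 * Relative reduction in elapsed time, in percent.
 * Hypothetical helper for checking the table; not part of the patch.
 */
static double
improvement_pct(double baseline_ms, double candidate_ms)
{
    assert(baseline_ms > 0.0);
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0;
}
```

For example, improvement_pct(318.103, 226.437) comes out at roughly 28.8, matching the 8192-byte row.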

These buffer sizes are particularly relevant for PostgreSQL workloads:
  • 8KB: Default page size (28.8% faster checksumming)
  • 4KB: Alternative page size configuration (36.1% faster)
  • 512B-2KB: Typical WAL record sizes (32-39% faster)
  • 2KB: TOAST chunk size (39% faster)


While armv8-a+crypto has broader current deployment, SVE2 is already available in production cloud infrastructure: AWS Graviton 4, Ampere AmpereOne, and NVIDIA Grace (all released 2023). As ARMv9 adoption continues, these gains become increasingly relevant.
Rather than choosing one approach over the other, perhaps we could implement both and select between them with runtime CPU detection? We already perform runtime detection for crypto extension availability, so an additional SVE2 check introduces no performance degradation on systems without SVE2, while providing significant gains (28-39%) on systems that support it. This would give optimal performance on capable hardware while maintaining broad compatibility. Please let me know your thoughts.


/* Selected implementation; set by pg_comp_crc32c_choose_armv8(). */
static pg_crc32c (*pg_comp_crc32c_armv8)(pg_crc32c crc, const void *data, size_t len);

void
pg_comp_crc32c_choose_armv8(void)
{
    /* Prefer the fastest implementation the running CPU supports. */
    if (pg_cpu_has_sve2())
        pg_comp_crc32c_armv8 = pg_comp_crc32c_armv8_sve2;
    else if (pg_cpu_has_crypto())
        pg_comp_crc32c_armv8 = pg_comp_crc32c_armv8_crypto;
    else
        pg_comp_crc32c_armv8 = pg_comp_crc32c_sb8;  /* scalar fallback */
}
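
On Linux/AArch64, capability checks like the ones in the sketch above could be backed by getauxval(). The following is a self-contained illustration only, not code from the patch. The names cpu_has_sve2, cpu_has_crc_and_pmull, and choose_crc32c_impl are hypothetical, and it is an assumption that the crypto path needs both the CRC32 and PMULL hwcap bits. On other platforms, or with headers that lack the hwcap macros, the checks report no support and the scalar path is chosen.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>           /* getauxval(), AT_HWCAP, AT_HWCAP2 */
#endif

/* Hypothetical labels for the three candidate implementations. */
typedef enum
{
    CRC32C_IMPL_SCALAR,         /* slicing-by-8 fallback */
    CRC32C_IMPL_CRYPTO,         /* armv8-a+crypto */
    CRC32C_IMPL_SVE2            /* armv9-a+sve2 */
} crc32c_impl;

/* True if the kernel reports SVE2 support; false where it cannot be queried. */
static bool
cpu_has_sve2(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP2_SVE2)
    return (getauxval(AT_HWCAP2) & HWCAP2_SVE2) != 0;
#else
    return false;
#endif
}

/* Assumption: the crypto-based path needs both CRC32 and PMULL. */
static bool
cpu_has_crc_and_pmull(void)
{
#if defined(__linux__) && defined(__aarch64__) && \
    defined(HWCAP_CRC32) && defined(HWCAP_PMULL)
    unsigned long hwcap = getauxval(AT_HWCAP);

    return (hwcap & HWCAP_CRC32) != 0 && (hwcap & HWCAP_PMULL) != 0;
#else
    return false;
#endif
}

/* Pick the best available implementation, in the same order as above. */
static crc32c_impl
choose_crc32c_impl(void)
{
    if (cpu_has_sve2())
        return CRC32C_IMPL_SVE2;
    if (cpu_has_crc_and_pmull())
        return CRC32C_IMPL_CRYPTO;
    return CRC32C_IMPL_SCALAR;
}
```

Because the choice needs to be made only once, with the result cached in a function pointer as in the sketch above, the extra SVE2 test would not add per-call cost.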



Thanks,
Susmitha Devanga.




From: John Naylor <johncnaylorls@gmail.com>
Sent: Friday, December 19, 2025 08:27
To: Susmitha, Devanga <Devanga.Susmitha@fujitsu.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; Hajela, Ragesh <Ragesh.Hajela@fujitsu.com>; Bhattacharya, Chiranmoy <Chiranmoy.Bhattacharya@fujitsu.com>
Subject: Re: [PATCH] CRC32C optimizations using SVE2 on ARM.

On Fri, Dec 19, 2025 at 4:20 AM Devanga.Susmitha@fujitsu.com
<Devanga.Susmitha@fujitsu.com> wrote:
> For architecture-specific functions, we use pg_attribute_target("arch=armv9-a+sve2-aes")

There was already a proposal to use armv8-a+crypto, which is more
widely available and works on smaller inputs. Perhaps you'd be
interested in reviewing and testing?

https://www.postgresql.org/message-id/CANWCAZaKhE%2BRD5KKouUFoxx1EbUNrNhcduM1VQ%3DDkSDadNEFng%40mail.gmail.com

> to ensure precise compilation control without modifying global CFLAGS, enabling a clean integration within PostgreSQL’s build system.

I think the reason we continue to use CFLAGS here was that clang
support for target attributes on Arm is fairly recent. It's probably
too soon to reconsider that.

--
John Naylor
Amazon Web Services
