Re: Proposal for Updating CRC32C with AVX-512 Algorithm. - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Proposal for Updating CRC32C with AVX-512 Algorithm.
Date
Msg-id rxchh5kqwt4j7bwwgpugx75zq7kmslluqsmaxecw5bksypp2es@2zvbgtcbkf3s
Whole thread Raw
In response to RE: Proposal for Updating CRC32C with AVX-512 Algorithm.  ("Amonson, Paul D" <paul.d.amonson@intel.com>)
Responses Re: Proposal for Updating CRC32C with AVX-512 Algorithm.
List pgsql-hackers
Hi,

On 2024-10-30 21:03:20 +0000, Devulapalli, Raghuveer wrote:
> v6: Fixing build failure on Windows/MSVC. 
> 
> Raghuveer

> From b601e7b4ee9f25fd32e9d8d056bb20a03d755a8a Mon Sep 17 00:00:00 2001
> From: Paul Amonson <paul.d.amonson@intel.com>
> Date: Mon, 6 May 2024 08:34:17 -0700
> Subject: [PATCH v6 1/6] Add a Postgres SQL function for crc32c testing.
> 
> Signed-off-by: Paul Amonson <paul.d.amonson@intel.com>
> Signed-off-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com>
> ---
>  src/test/modules/test_crc32c/Makefile         | 20 +++++++++
>  .../modules/test_crc32c/test_crc32c--1.0.sql  |  1 +
>  src/test/modules/test_crc32c/test_crc32c.c    | 41 +++++++++++++++++++
>  .../modules/test_crc32c/test_crc32c.control   |  4 ++
>  4 files changed, 66 insertions(+)
>  create mode 100644 src/test/modules/test_crc32c/Makefile
>  create mode 100644 src/test/modules/test_crc32c/test_crc32c--1.0.sql
>  create mode 100644 src/test/modules/test_crc32c/test_crc32c.c
>  create mode 100644 src/test/modules/test_crc32c/test_crc32c.control

Needs to be integrated with the meson based build as well.



> +/*
> + * drive_crc32c(count: int, num: int) returns bigint
> + *
> + * count is the nuimber of loops to perform
> + *
> + * num is the number byte in the buffer to calculate
> + * crc32c over.
> + */
> +PG_FUNCTION_INFO_V1(drive_crc32c);
> +Datum
> +drive_crc32c(PG_FUNCTION_ARGS)
> +{
> +    int64            count    = PG_GETARG_INT64(0);
> +    int64            num        = PG_GETARG_INT64(1);
> +    pg_crc32c        crc        = 0xFFFFFFFF;
> +    const char*        data    = malloc((size_t)num);

This is computing a crc of uninitialized data. That's
a) undefined behaviour
b) means the return value is basically random
c) often will just CRC a lot of zeroes



> From da26645ec8515e0e6d91e2311a83c3bb6649017e Mon Sep 17 00:00:00 2001
> From: Paul Amonson <paul.d.amonson@intel.com>
> Date: Tue, 23 Jul 2024 11:23:23 -0700
> Subject: [PATCH v6 2/6] Move all HW checks to common file.

Would be good to actually include a justification here.


> --- /dev/null
> +++ b/src/port/pg_hw_feat_check.c
> @@ -0,0 +1,159 @@
> +/*-------------------------------------------------------------------------
> + *
> + * pg_hw_feat_check.c
> + *        Test for hardware features at runtime on x86_64 platforms.
> + *
> + * Copyright (c) 2024, PostgreSQL Global Development Group
> + *
> + * IDENTIFICATION
> + *        src/port/pg_hw_feat_check.c
> + *
> + *-------------------------------------------------------------------------
> + */
> +#include "c.h"
> +
> +#if defined(HAVE__GET_CPUID) || defined(HAVE__GET_CPUID_COUNT)
> +#include <cpuid.h>
> +#endif
> +
> +#include <immintrin.h>
> +
> +#if defined(HAVE__CPUID) || defined(HAVE__CPUIDEX)
> +#include <intrin.h>
> +#endif
> +
> +#include "port/pg_hw_feat_check.h"
> +
> +/* Define names for EXX registers to avoid hard to see bugs in code below. */
> +typedef unsigned int exx_t;
> +typedef enum
> +{
> +    EAX = 0,
> +    EBX = 1,
> +    ECX = 2,
> +    EDX = 3
> +} reg_name;

Shouldn't this be in some x86 sepcific ifdef?


> +# PGAC_AVX512_CRC32_INTRINSICS
> +# ---------------------------
> +# Check if the compiler supports the x86 CRC instructions added in AVX-512,
> +# using the intrinsic functions:
> +
> +# (We don't test the 8-byte variant, _mm_crc32_u64, but it is assumed to
> +# be present if the other ones are, on x86-64 platforms)
> +#
> +# An optional compiler flag can be passed as arguments (e.g. -msse4.2
> +# -mavx512vl -mvpclmulqdq). If the intrinsics are supported, sets
> +# pgac_avx512_crc32_intrinsics, and CFLAGS_CRC.
> +AC_DEFUN([PGAC_AVX512_CRC32_INTRINSICS],
> +[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx512_crc32_intrinsics_$1])])dnl
> +AC_CACHE_CHECK([for _mm512_clmulepi64_epi128, _mm512_clmulepi64_epi128... with CFLAGS=$1], [Ac_cachevar],
> +[pgac_save_CFLAGS=$CFLAGS
> +CFLAGS="$pgac_save_CFLAGS $1"
> +AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <immintrin.h>],
> +  [const unsigned long k1k2[[8]] = {
> +  0xdcb17aa4, 0xb9e02b86, 0xdcb17aa4, 0xb9e02b86,
> +  0xdcb17aa4, 0xb9e02b86, 0xdcb17aa4, 0xb9e02b86};
> +  unsigned char buffer[[512]];
> +  unsigned char *aligned = (unsigned char*)(((size_t)buffer + 64L) & 0xffffffffffc0L);
> +  unsigned long val;
> +  __m512i x0, x1, x2, x3, x4, x5, x6, x7, x8, y5, y6, y7, y8;
> +  __m128i a1, a2;
> +  unsigned int crc = 0xffffffff;
> +  y8 = _mm512_load_si512((__m512i *)aligned);
> +  x0 = _mm512_loadu_si512((__m512i *)k1k2);
> +  x1 = _mm512_loadu_si512((__m512i *)(buffer + 0x00));
> +  x1 = _mm512_xor_si512(x1, _mm512_castsi128_si512(_mm_cvtsi32_si128(crc)));
> +  x5 = _mm512_clmulepi64_epi128(x1, x0, 0x00);
> +  x1 = _mm512_ternarylogic_epi64(x1, x5, y5, 0x96);
> +  a1 = _mm512_extracti32x4_epi32(x1, 3);
> +  a1 = _mm_xor_epi64(a1, _mm512_castsi512_si128(x0));
> +  x0 = _mm512_shuffle_i64x2(x1, x1, 0x4E);
> +  val = _mm_crc32_u64(0, _mm_extract_epi64(a1, 0));
> +  crc = (unsigned int)_mm_crc32_u64(val, _mm_extract_epi64(a1, 1));
> +  return crc != 0;])],
> +  [Ac_cachevar=yes],
> +  [Ac_cachevar=no])
> +CFLAGS="$pgac_save_CFLAGS"])
> +if test x"$Ac_cachevar" = x"yes"; then
> +  CFLAGS_CRC="$1"
> +  pgac_avx512_crc32_intrinsics=yes
> +fi
> +undefine([Ac_cachevar])dnl
> +])# PGAC_AVX512_CRC32_INTRINSICS
> +

Why is all this stuff needed inside a configure check? We don't need to check
entire algorithms to check if we can build and link sepcific instructions, no?




> From a495124ee42cb8f9f206f719b9f2235aff715963 Mon Sep 17 00:00:00 2001
> From: Nathan Bossart <nathan@postgresql.org>
> Date: Wed, 16 Oct 2024 15:57:55 -0500
> Subject: [PATCH v6 5/6] use __attribute__((target(...))) for AVX-512 stuff

Huh, so now we're undoing a bunch of stuff done earlier. Makes this series
pretty hard to review.


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Proposal for Updating CRC32C with AVX-512 Algorithm.
Next
From: Nathan Bossart
Date:
Subject: Re: Proposal for Updating CRC32C with AVX-512 Algorithm.