Thread: [V2] Adding new CRC32C implementation for IBM S390X
Hi, So here is V2 of the crc32c_s390x patch. Changes from V1 are: - added gcc 14-14.2 as known broken compiler (bug was fixed with 14.3) - moved broken compiler check to vx extension compile&link check - made variables global in the extension check - create dependency to getauxval in configure, so we don't compile the code if we won't be able to detect the cpu extension at runtime - moved buffer length check into macro - changed minimal buffer length for crc32c_s390x from 64 to 16 byte - added CFLAGS_CRC to all crc32c_s390x* artifacts - fixed formatting with pgindent - fixed typos in email address -- Eduard Stefes <eduard.stefes@ibm.com>
Attachment
On Thu, Jun 5, 2025 at 2:15 PM Eduard Stefes <Eduard.Stefes@ibm.com> wrote: > > Hi, > > So here is V2 of the crc32c_s390x patch. Changes from V1 are: Hi Eduard, thanks for the update. Note, it's not necessary to change the thread subject, and in fact I came very close to missing this email entirely. Secondly, I haven't seen a response about the copyright. I'm referring to this part in particular: + * Copyright (c) 2025, International Business Machines (IBM) I shared this link in my first reply: https://wiki.postgresql.org/wiki/Developer_FAQ#Do_I_need_to_sign_a_copyright_assignment.3F It says, in part: "May I add my own copyright notice where appropriate? No, please don't. We like to keep the legal information short and crisp. Additionally, we've heard that could possibly pose problems for corporate users." > - added gcc 14-14.2 as known broken compiler (bug was fixed with 14.3) Why do we need to mark this as broken? In my research, it caused compiler errors with those versions -- that itself should cause fallback to sb8. On that note, I'm now wondering if clang compiles and links successfully but the program is broken? Or does it fail to compile? If the latter, we should treat them the same: there is no need for "known broken versions" in the configure checks, as it's just a matter of documentation. > - create dependency to getauxval in configure, so we don't compile the > code if we won't be able to detect the cpu extension at runtime That's just unnecessary clutter, and we don't do that anywhere else. We already have the HAVE_GETAUXVAL symbol to guard the runtime check. As I alluded to before, I'm not in favor of having both direct-call and runtime-check paths here. The reason x86 and Arm have it is because their hardware support works on any length input. Is there any reason to expect auxv.h to be unavailable? Also, lately we have been moving away from having separate *choose.c files, since they add complexity to the build system. I'd suggest looking at src/port/pg_popcount_aarch64.c -- the file is compiled unconditionally but inside it has the appropriate #ifdef's as well as the choice function. See how it handles auxv.h as well. > - moved buffer length check into macro > - changed minimal buffer length for crc32c_s390x from 64 to 16 byte +#define COMP_CRC32C(crc, data, len) \ + ((crc) = (len) < 16 ? pg_comp_crc32c_sb8((crc),(data),(len)) : pg_comp_crc32c((crc), (data), (len))) Your tests demonstrated improvement with 32 bytes and above, and nothing less than 31 makes sense as a minimum because of the 16-byte alignment requirement. I've mentioned that we don't want the 20-byte WAL header computation to have any more overhead than it does now. -- John Naylor Amazon Web Services
Hi, On Wed, 2025-06-11 at 13:48 +0700, John Naylor wrote: > Hi Eduard, thanks for the update. Note, it's not necessary to change > the thread subject, and in fact I came very close to missing this > email entirely. Sorry for the confusion. > Secondly, I haven't seen a response about the copyright. I'm > referring > to this part in particular: > > + * Copyright (c) 2025, International Business Machines (IBM) > > I shared this link in my first reply: > > > It says, in part: > > "May I add my own copyright notice where appropriate? > > No, please don't. We like to keep the legal information short and > crisp. Additionally, we've heard that could possibly pose problems > for > corporate users." > I had to talk to some people about this and get their ok. I'll remove the copyright text. > > - added gcc 14-14.2 as known broken compiler (bug was fixed with > > 14.3) > > Why do we need to mark this as broken? In my research, it caused > compiler errors with those versions -- that itself should cause > fallback to sb8. > > On that note, I'm now wondering if clang compiles and links > successfully but the program is broken? Or does it fail to compile? > If > the latter, we should treat them the same: there is no need for > "known > broken versions" in the configure checks, as it's just a matter of > documentation. > Yes in all cases the compilation will fail and we will fall back to sb8. However personally I think the user should get feedback of *why* something fails. I'll remove the check and document the broken compiler versions. > > - create dependency to getauxval in configure, so we don't compile > > the > > code if we won't be able to detect the cpu extension at runtime > > That's just unnecessary clutter, and we don't do that anywhere else. > We already have the HAVE_GETAUXVAL symbol to guard the runtime check. Noted. I'll remove it. > As I alluded to before, I'm not in favor of having both direct-call > and runtime-check paths here. The reason x86 and Arm have it is > because their hardware support works on any length input. Is there > any > reason to expect auxv.h to be unavailable? I tried to find a reason but did not find any. So I'll remove it. > Also, lately we have been moving away from having separate *choose.c > files, since they add complexity to the build system. I'd suggest > looking at src/port/pg_popcount_aarch64.c -- the file is compiled > unconditionally but inside it has the appropriate #ifdef's as well as > the choice function. See how it handles auxv.h as well. I'll change that. > Your tests demonstrated improvement with 32 bytes and above, and > nothing less than 31 makes sense as a minimum because of the 16-byte > alignment requirement. I've mentioned that we don't want the 20-byte > WAL header computation to have any more overhead than it does now. Sorry, yes that's true I'll change that back. -- Eduard Stefes <eduard.stefes@ibm.com>