Thread: Proposal for enabling auto-vectorization for checksum calculations

Proposal for enabling auto-vectorization for checksum calculations

From
Matthew Sterrett
Date:
Hello,
This patch enables more compiler autovectorization for the checksum
calculations.
This code is particularly well suited for autovectorization, so by just
adding pg_attribute_target and some simple dynamic dispatch logic we
can get improved vectorization.
This gives about a 2x speedup in a synthetic benchmark for
pg_checksum, which is also included as a separate patch file.
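
For readers who have not seen the pattern, here is a minimal sketch of a target attribute combined with runtime dispatch. It is not the patch: the names (mix_words, checksum_block, SKETCH_HAVE_AVX2_TARGET), the lane count, and the mixing step are all hypothetical, and a plain __attribute__((target("avx2"))) stands in for pg_attribute_target. It assumes GCC or Clang on x86-64 for the AVX2 path and falls back to the default build elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_LANES 32            /* hypothetical lane count for this sketch */
#define MIX_PRIME 16777619u   /* FNV prime, used here only as a mixer */

/*
 * Lane-parallel mixing loop. Each lane is independent of the others, so
 * the inner loop is a natural candidate for autovectorization.
 */
static inline uint32_t
mix_words(const uint32_t *data, size_t nwords)
{
    uint32_t lanes[N_LANES];
    uint32_t result = 0;

    assert(nwords % N_LANES == 0);
    for (size_t j = 0; j < N_LANES; j++)
        lanes[j] = (uint32_t) j + 1;
    for (size_t i = 0; i < nwords; i += N_LANES)
        for (size_t j = 0; j < N_LANES; j++)
        {
            uint32_t tmp = lanes[j] ^ data[i + j];

            lanes[j] = (tmp * MIX_PRIME) ^ (tmp >> 17);
        }
    for (size_t j = 0; j < N_LANES; j++)
        result ^= lanes[j];
    return result;
}

/* Built with the default target flags. */
static uint32_t
checksum_default(const uint32_t *data, size_t nwords)
{
    return mix_words(data, nwords);
}

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#define SKETCH_HAVE_AVX2_TARGET 1
/* Same source, but the compiler may use AVX2 inside this function. */
__attribute__((target("avx2")))
static uint32_t
checksum_avx2(const uint32_t *data, size_t nwords)
{
    return mix_words(data, nwords);
}
#endif

static uint32_t checksum_choose(const uint32_t *data, size_t nwords);

/* Starts at the chooser; replaced by the selected variant on first call. */
static uint32_t (*checksum_impl)(const uint32_t *, size_t) = checksum_choose;

static uint32_t
checksum_choose(const uint32_t *data, size_t nwords)
{
    checksum_impl = checksum_default;
#ifdef SKETCH_HAVE_AVX2_TARGET
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        checksum_impl = checksum_avx2;
#endif
    return checksum_impl(data, nwords);
}

/* Public entry point: callers never see which variant runs. */
uint32_t
checksum_block(const uint32_t *data, size_t nwords)
{
    return checksum_impl(data, nwords);
}
```

Both variants compile the same C loop, so they must return the same value; only the instructions the compiler is allowed to emit differ.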

Additionally, another 2x performance increase in the synthetic
benchmark with AVX2 can be obtained if N_SUMS were changed to 64.
However, this would change the results of the checksum. This isn't
included in this patch, but I think it is worth considering for the
future.
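
To illustrate why the lane count affects the result, here is a small sketch with the number of lanes as a parameter. The function name, the lane seeds, and the mixing step are hypothetical stand-ins rather than the real checksum; the point is only that the lane count decides which words are folded into which partial sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANES 64
#define MIX_PRIME 16777619u   /* FNV prime, used here only as a mixer */

/*
 * Word i is folded into lane (i % nlanes), so changing nlanes changes
 * which words share a partial sum and therefore the final value.
 */
static uint32_t
mix_with_lanes(const uint32_t *data, size_t nwords, size_t nlanes)
{
    uint32_t lanes[MAX_LANES];
    uint32_t result = 0;

    assert(nlanes > 0 && nlanes <= MAX_LANES && nwords % nlanes == 0);
    for (size_t j = 0; j < nlanes; j++)
        lanes[j] = (uint32_t) j + 1;
    for (size_t i = 0; i < nwords; i += nlanes)
        for (size_t j = 0; j < nlanes; j++)
        {
            uint32_t tmp = lanes[j] ^ data[i + j];

            lanes[j] = (tmp * MIX_PRIME) ^ (tmp >> 17);
        }
    for (size_t j = 0; j < nlanes; j++)
        result ^= lanes[j];
    return result;
}
```

Calling mix_with_lanes() on the same buffer with 32 and then 64 lanes will in general give two different values, which is why such a change would not be compatible with checksums already stored on disk.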

One additional factor: without explicitly passing an optimization
flag like -O2, the makefile build won't autovectorize any of the code.
However, the meson-based build does this automatically.

Attachment

Re: Proposal for enabling auto-vectorization for checksum calculations

From
Matthew Sterrett
Date:
Hello! I'm still trying to figure out those CI failures; I just wanted
to update things.

From my testing, with this patch repeatedly disabling/enabling
checksums is about 12.4% faster on an approximately 15 GB database.

By the way, I'd love it if anyone could help me figure out how to
replicate a CI failure in the Cirrus CI.
I haven't been able to figure out how to test CI runs locally. Does
anyone know a good method to do that?

Re: Proposal for enabling auto-vectorization for checksum calculations

From
Stepan Neretin
Date:
On Thu, May 8, 2025 at 6:57 AM Matthew Sterrett <matthewsterrett2@gmail.com> wrote:
> Hello! I'm still trying to figure out those CI failures; I just wanted
> to update things.
>
> From my testing, with this patch repeatedly disabling/enabling
> checksums is about 12.4% faster on an approximately 15 GB database.
>
> By the way, I'd love it if anyone could help me figure out how to
> replicate a CI failure in the Cirrus CI.
> I haven't been able to figure out how to test CI runs locally. Does
> anyone know a good method to do that?

Hi Matthew,

Thanks for the patch!

I ran some timing tests:

(without avx2)

Time: 4034.351 ms
SELECT drive_pg_checksum(512);

(with avx2)

Time: 3559.076 ms
SELECT drive_pg_checksum(512);

Also attached two patches that should fix the CI issues.

Best,

Stepan Neretin
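
For reference, the two timings above work out to roughly a 1.13x speedup, or about 11.8% less wall-clock time. A minimal sketch of that arithmetic (the helper names are hypothetical):

```c
#include <assert.h>

/* Ratio of the old time to the new time; values above 1.0 mean faster. */
static double
speedup_factor(double before_ms, double after_ms)
{
    assert(after_ms > 0.0);
    return before_ms / after_ms;
}

/* Share of the original run time that was saved, in percent. */
static double
time_saved_percent(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return (before_ms - after_ms) / before_ms * 100.0;
}
```

With before_ms = 4034.351 and after_ms = 3559.076, speedup_factor() is about 1.13 and time_saved_percent() is about 11.8.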


 

Re: Proposal for enabling auto-vectorization for checksum calculations

From
Matthew Sterrett
Date:
Hello! Thanks for helping me with this.
I'm still trying to figure out what is going on with the Bookworm test
failures. I'm pretty sure this patchset should resolve all the issues
with the macOS build, but I don't think it will help the Linux
failures, unfortunately.

On Sat, May 10, 2025 at 4:02 AM Stepan Neretin <slpmcf@gmail.com> wrote:
> Oops, forgot to attach patches :)
>
> Best,
>
> Stepan Neretin
>
>

Attachment

Re: Proposal for enabling auto-vectorization for checksum calculations

From
Nazir Bilal Yavuz
Date:
Hi,

On Tue, 20 May 2025 at 02:54, Matthew Sterrett
<matthewsterrett2@gmail.com> wrote:
>
> Hello! Thanks for helping me with this.
> I'm still trying to figure out what is going on with the Bookworm test
> failures. I'm pretty sure this patchset should resolve all the issues
> with the macOS build, but I don't think it will help the Linux
> failures, unfortunately.

You can see the failure in the 'log/tmp_install/log/install.log' file
under the artifacts on the CI web page [1].

If you want to replicate that locally:

$ ./configure --with-llvm CLANG="ccache clang-16"
$ make -s -j8 world-bin
$ make -j8 check-world

should be enough. I was able to replicate it with these commands. I
hope these help.

[1] https://cirrus-ci.com/task/4834162550505472

--
Regards,
Nazir Bilal Yavuz
Microsoft



Re: Proposal for enabling auto-vectorization for checksum calculations

From
Matthew Sterrett
Date:
> You can see the failure in the 'log/tmp_install/log/install.log' file
> under the artifacts on the CI web page [1].
>
> If you want to replicate that locally:
>
> $ ./configure --with-llvm CLANG="ccache clang-16"
> $ make -s -j8 world-bin
> $ make -j8 check-world
>
> should be enough. I was able to replicate it with these commands. I
> hope these help.

Thanks so much for helping me figure this out!

Okay, I've determined that versions of LLVM/Clang before 19 crash when
compiling this patch for some reason; it seems that both make
check-world and make install will crash with the affected LLVM
versions.
Unfortunately, what matters seems to be the version of the linker/LTO
optimizer, which I don't think we can check at compile time.
I added a check for Clang>=19, which prevents the crash on my system.
I think it's possible some unusual combination of clang/LLVM might
still crash during the build, but I think this is a reasonable
solution.
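
As an illustration of what a compile-time gate on the compiler version can look like, here is a minimal sketch. The macro SKETCH_USE_TARGET_AVX2 and the helper are hypothetical, the actual check in the patch may be written differently, and treating GCC as unaffected is an assumption. Note that __clang_major__ describes the compiler that processes this file, not the linker or LTO optimizer mentioned above.

```c
#include <assert.h>

/*
 * Enable the AVX2-targeted variant only where the compiler is believed
 * to handle it: Clang 19 or newer, or GCC (assumed to be unaffected).
 */
#if defined(__clang__) && defined(__clang_major__)
#if __clang_major__ >= 19
#define SKETCH_USE_TARGET_AVX2 1
#else
#define SKETCH_USE_TARGET_AVX2 0
#endif
#elif defined(__GNUC__)
#define SKETCH_USE_TARGET_AVX2 1
#else
#define SKETCH_USE_TARGET_AVX2 0
#endif

static_assert(SKETCH_USE_TARGET_AVX2 == 0 || SKETCH_USE_TARGET_AVX2 == 1,
              "flag must be 0 or 1");

/* Reports at run time whether the targeted variant was compiled in. */
static int
sketch_avx2_path_compiled(void)
{
    return SKETCH_USE_TARGET_AVX2;
}
```

Code guarded by #if SKETCH_USE_TARGET_AVX2 would then simply be absent on older Clang versions, leaving only the default build of the function.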

Attachment

Re: Proposal for enabling auto-vectorization for checksum calculations

From
John Naylor
Date:
On Fri, May 23, 2025 at 4:54 AM Matthew Sterrett
<matthewsterrett2@gmail.com> wrote:
> Okay, I've determined that versions of LLVM/Clang before 19 crash when
> compiling this patch for some reason; it seems that both make
> check-world and make install will crash with the affected LLVM
> versions.
> Unfortunately, what matters seems to be the version of the linker/LTO
> optimizer, which I don't think we can check at compile time.
> I added a check for Clang>=19, which prevents the crash on my system.
> I think it's possible some unusual combination of clang/LLVM might
> still crash during the build, but I think this is a reasonable
> solution.

I don't know if this is related to the crashes, but it doesn't seem
like a good idea to #include the function pointer stuff everywhere;
that should probably go into src/port like the others.

--
John Naylor
Amazon Web Services
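
One possible reading of the suggestion above, as a sketch only: the header carries just an extern declaration of the function pointer, and a single .c file holds the definition and the implementations. All names are hypothetical, the two parts are shown in one file so the example compiles on its own, and the real layout under src/port may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- would live in a header (hypothetical: sketch_checksum.h) ---- */
typedef uint32_t (*sketch_checksum_fn)(const uint32_t *data, size_t nwords);

/* Declaration only: every includer shares the one pointer defined below. */
extern sketch_checksum_fn sketch_checksum;

/* ---- would live in exactly one .c file (hypothetical location) ---- */

/* Trivial stand-in implementation: XOR of all words. */
static uint32_t
sketch_checksum_portable(const uint32_t *data, size_t nwords)
{
    uint32_t result = 0;

    assert(data != NULL || nwords == 0);
    for (size_t i = 0; i < nwords; i++)
        result ^= data[i];
    return result;
}

/* The single definition of the pointer. */
sketch_checksum_fn sketch_checksum = sketch_checksum_portable;
```

With this split, files that include the header get only the declaration, so the dispatch code and any target-specific functions are compiled once rather than in every includer.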