Re: [PATCH] SVE popcount support - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: [PATCH] SVE popcount support
Msg-id Z9Cm9j-xLnbaHwxz@nathan
In response to Re: [PATCH] SVE popcount support  ("Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>)
List pgsql-hackers
On Fri, Mar 07, 2025 at 03:20:07AM +0000, Chiranmoy.Bhattacharya@fujitsu.com wrote:
> Sounds good. Let us know your findings.

Alright, here's what I saw on an R8g for drive_popcount(1000000, N):

    8-byte words  master       v5-no-sve    v5-sve      v5-4reg
    1             2.540 ms     2.170 ms     1.807 ms    2.178 ms
    2             2.534 ms     2.180 ms     1.804 ms    2.167 ms
    4             3.988 ms     3.240 ms     1.590 ms    2.879 ms
    8             5.033 ms     4.672 ms     2.175 ms    2.525 ms
    16            8.252 ms     10.916 ms    3.235 ms    3.588 ms
    32            20.932 ms    22.883 ms    5.134 ms    5.395 ms
    64            40.446 ms    45.668 ms    9.817 ms    9.285 ms
    128           66.087 ms    91.386 ms    20.072 ms   17.175 ms
    256           153.852 ms   182.594 ms   40.447 ms   32.212 ms
    512           246.271 ms   300.941 ms   87.116 ms   60.729 ms
    1024          487.180 ms   607.289 ms   180.574 ms  116.948 ms
    2048          969.335 ms   1223.838 ms  363.595 ms  232.575 ms
    4096          1934.646 ms  2472.154 ms  729.525 ms  459.495 ms

(Note that there should be no need to test anything smaller than 8 bytes
because we use the inline version in pg_bitutils.h in that case.)

v5-no-sve is the result of using a function pointer, but pointing to the
"slow" versions instead of the SVE version.  v5-sve is the result of the
latest patch in this thread on a machine with SVE support, and v5-4reg is
the result of the latest patch in this thread modified to process four
registers' worth of data at a time.
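
For anyone following along, here is a rough sketch of the kind of
function-pointer dispatch that the v5-no-sve column measures.  This is not
code from the patch: every name below (sketch_popcount, popcount_choose,
have_fast_popcount, and so on) is hypothetical, the CPU probe is a stub
that always reports "no SVE", and the scalar routine is a generic bit-trick
popcount standing in for the "slow" versions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable per-word popcount (SWAR bit trick); stands in for the "slow" path. */
static inline int
popcount64_portable(uint64_t w)
{
    w = w - ((w >> 1) & 0x5555555555555555ULL);
    w = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
    w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (int) ((w * 0x0101010101010101ULL) >> 56);
}

/* Scalar buffer popcount: 8 bytes at a time, then a byte-wise tail. */
static uint64_t
popcount_slow(const char *buf, size_t bytes)
{
    uint64_t    total = 0;

    while (bytes >= 8)
    {
        uint64_t    w;

        memcpy(&w, buf, 8);     /* avoids unaligned access */
        total += (uint64_t) popcount64_portable(w);
        buf += 8;
        bytes -= 8;
    }
    for (; bytes > 0; bytes--)
        total += (uint64_t) popcount64_portable((unsigned char) *buf++);
    return total;
}

typedef uint64_t (*popcount_fn) (const char *buf, size_t bytes);

static uint64_t popcount_choose(const char *buf, size_t bytes);

/* Starts out pointing at the chooser, which replaces itself on first call. */
static popcount_fn popcount_impl = popcount_choose;

/* Hypothetical CPU probe; a real one would ask the OS about SVE support. */
static int
have_fast_popcount(void)
{
    return 0;
}

static uint64_t
popcount_choose(const char *buf, size_t bytes)
{
    /* Only the scalar routine exists in this sketch, so both arms match. */
    popcount_impl = have_fast_popcount() ? popcount_slow : popcount_slow;
    return popcount_impl(buf, bytes);
}

/* Every caller pays for one indirect call, even on machines without SVE. */
uint64_t
sketch_popcount(const char *buf, size_t bytes)
{
    assert(buf != NULL || bytes == 0);
    return popcount_impl(buf, bytes);
}
```

The point of the sketch is only that callers now go through popcount_impl
on every call, which is what the v5-no-sve column above is measuring
against master.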

The biggest takeaways for me are as follows:

* The 4-register version does show some nice improvements as the data
  grows.
* Machines without SVE will likely incur a rather sizable regression from
  the newly introduced function pointer.

For the latter point, I think we should consider trying to add a separate
Neon implementation that we use as a fallback for machines that don't have
SVE.  My understanding is that Neon is virtually universally supported on
64-bit Arm gear, so that will not only help offset the function pointer
overhead but may even improve performance for a much wider set of machines.
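
To make the idea concrete, a Neon fallback might look roughly like the
sketch below.  This is only an illustration under my own assumptions, not a
proposed patch: sketch_popcount_neon and SKETCH_HAVE_NEON are made-up
names, it processes a single 16-byte register per iteration, and it has not
been benchmarked.  The intrinsics (vld1q_u8, vcntq_u8, vaddlvq_u8) come
from arm_neon.h; on non-Arm builds the vector loop is compiled out and only
the byte-wise loop runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#endif

uint64_t
sketch_popcount_neon(const char *buf, size_t bytes)
{
    const uint8_t *p = (const uint8_t *) buf;
    uint64_t    total = 0;

    assert(buf != NULL || bytes == 0);

#ifdef SKETCH_HAVE_NEON
    /* One 128-bit register (16 bytes) per iteration. */
    while (bytes >= 16)
    {
        uint8x16_t  v = vld1q_u8(p);    /* unaligned loads are fine here */
        uint8x16_t  c = vcntq_u8(v);    /* per-byte bit counts, 0..8 each */

        total += vaddlvq_u8(c);         /* widening horizontal sum, max 128 */
        p += 16;
        bytes -= 16;
    }
#endif

    /* Byte-wise tail; also the whole computation when Neon is unavailable. */
    for (; bytes > 0; bytes--)
    {
        uint8_t     b = *p++;

        while (b)
        {
            b = (uint8_t) (b & (b - 1));    /* clear lowest set bit */
            total++;
        }
    }
    return total;
}
```

A more serious version would presumably accumulate the per-byte counts
across several iterations before doing the horizontal sum, and handle
multiple registers per loop the way v5-4reg does for SVE.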

-- 
nathan


