Hi,
So I worked on the algorithm to also work on buffers between 16-64
bytes. Then I ran the performance measurement on two
dataset[^raw_data_1] [^raw_data_2]. And created two diagrams
[^attachment].
my findings so far:
- the optimized crc32cvx is faster
- the sb8 performance is heavily depending on alignment (see the
ripples every 8 bytes)
- the 8 byte ripple is also visible in the vx implementation. As it can
only perform on 16 or 64 byte chunks, it will still use sb8 for the
remaining bytes.
- there is no obvious speed regression in the vx algorithm. Except
raw_data_2-28 which I assume is a fluke. I am sharing the system with a
bunch of other devs.
I hope this this is acceptable as performance measurement. However we
will setup a dedicated performance test and try to get precise numbers
without side-effects. But it may take some time until we get to that.
I'll post the update on the Code together with the other requested
updates.
cheers, Eddy
[^raw_data_1]
bytes crc32c_sb8 crc32c_vx
4 6.54 ms 6.548 ms
8 4.476 ms 4.47 ms
10 7.346 ms 7.348 ms
12 10.955 ms 10.958 ms
14 14.548 ms 14.546 ms
16 6.837 ms 6.193 ms
32 12.23 ms 6.741 ms
64 22.826 ms 7.6 ms
80 28.536 ms 8.307 ms
96 34.426 ms 9.09 ms
112 40.295 ms 9.844 ms
128 46.053 ms 10.825 ms
144 51.868 ms 11.712 ms
160 65.91 ms 12.122 ms
176 71.649 ms 13.055 ms
192 77.465 ms 11.716 ms
208 83.286 ms 13.532 ms
224 88.991 ms 13.165 ms
240 94.875 ms 13.881 ms
256 100.653 ms 13.147 ms
8192 2967.477 ms 182.911 ms
[^raw_data_2]
bytes crc32c_sb8 crc32c_vx
4 6.543 ms 6.536 ms
8 4.476 ms 4.47 ms
10 7.35 ms 7.345 ms
12 10.96 ms 10.954 ms
14 14.552 ms 14.588 ms
16 6.843 ms 6.189 ms
18 10.253 ms 9.814 ms
24 9.645 ms 9.924 ms
28 15.957 ms 17.211 ms
32 12.226 ms 6.726 ms
36 18.823 ms 14.484 ms
42 17.855 ms 14.271 ms
48 17.342 ms 7.344 ms
52 24.208 ms 15.306 ms
58 23.525 ms 14.695 ms
64 22.818 ms 7.593 ms
On Thu, 2025-05-08 at 05:32 +0700, John Naylor wrote:
> On Wed, May 7, 2025 at 8:15 PM Aleksander Alekseev
> <aleksander@timescale.com> wrote:
> >
> > I didn't review the patch but wanted to point out that when it
> > comes
> > to performance improvements it's typically useful to provide some
> > benchmarks.
>
> +1 -- It's good to have concrete numbers for the commit message, and
> also to verify improvement on short inputs. There is a test harness
> in
> the v7-0002 patch from here:
>
> https://www.postgresql.org/message-id/CANWCAZaD5niydBF6q3V_cjApNV05cw-LpxxFtMbwDPLsz-PjbQ@mail.gmail.com
>
>
> After building, run the "test-crc.sh" script here after executing
> "CREATE EXTENSION test_crc32c;":
>
> https://www.postgresql.org/message-id/CANWCAZahvhE-%2BhtZiUyzPiS5e45ukx5877mD-dHr-KSX6LcdjQ%40mail.gmail.com
>
>
>
>
> --
> John Naylor
> Amazon Web Services