Re: Improve CRC32C performance on SSE4.2 - Mailing list pgsql-hackers

From John Naylor
Subject Re: Improve CRC32C performance on SSE4.2
Date
Msg-id CANWCAZbhL1EAjzZ7s_PBvmG2Mu=1bx1gvzbht5opitdV_Z48Rw@mail.gmail.com
Whole thread Raw
In response to Re: Improve CRC32C performance on SSE4.2  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
On Wed, Mar 5, 2025 at 10:52 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Wed, Mar 05, 2025 at 08:51:21AM +0700, John Naylor wrote:
> > That was my hunch too, but I wanted to be more sure, so I modified the
> > benchmark so it doesn't know the address of the next calculation until
> > it finishes the last calculation so we can hopefully see the latency
> > caused by indirection. It also does an additional calculation on
> > constant 20 bytes, like the WAL header. I also tweaked the length each
> > iteration so the branch predictor maybe has a harder time predicting
> > the constant 20 input. And to make it more challenging, I removed the
> > part that inlined all small inputs, so it inlines only constant
> > inputs:
>
> Would you mind sharing this test?

The test script is the same as here, except I only ran small lengths:

https://www.postgresql.org/message-id/CANWCAZahvhE-%2BhtZiUyzPiS5e45ukx5877mD-dHr-KSX6LcdjQ%40mail.gmail.com

...but I must have forgotten to attach the slightly tweaked patch set,
which I've done now. 0002 modifies the 0001 test module and 0006
reverts inlining non-constant input from 0005, just to see if I could
find a regression from indirection, which I didn't. If we don't need
it, it'd better to avoid inlining loops to keep from bloating the
binary.

> It sounds like you are running a
> workload with a mix of constant/inlined calls and function pointer calls to
> simulate typical usage for WAL, but I'm not 100% sure I'm understanding you
> correctly.

Exactly.

--
John Naylor
Amazon Web Services

Attachment

pgsql-hackers by date:

Previous
From: Jakub Wartak
Date:
Subject: Re: AIO v2.5
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: Selectively invalidate caches in pgoutput module