Re: autovectorize page checksum code included elsewhere - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: autovectorize page checksum code included elsewhere
Msg-id 20231111214943.GA1563304@nathanxps13
In response to Re: autovectorize page checksum code included elsewhere  (John Naylor <johncnaylorls@gmail.com>)
List pgsql-hackers
On Sat, Nov 11, 2023 at 07:38:59PM +0700, John Naylor wrote:
> On Tue, Nov 7, 2023 at 9:47 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> Separately, I'm wondering whether we should consider using CFLAGS_VECTORIZE
>> on the whole tree.  Commit fdea253 seems to be responsible for introducing
>> this targeted autovectorization strategy, and AFAICT this was just done to
>> minimize the impact elsewhere while optimizing page checksums.  Are there
>> fundamental problems with adding CFLAGS_VECTORIZE everywhere?  Or is it
>> just waiting on someone to do the analysis/benchmarking?
> 
> It's already the default for gcc 12 with -O2 (looking further in the
> docs, it uses the "very-cheap" vectorization cost model), so it may be
> worth investigating what the effect of that was. I can't quickly find
> the equivalent info for clang.

My x86 machine is using gcc 9.4.0, which isn't even aware of "very-cheap".
I don't see any difference with any of the cost models, though.  It isn't
until I add -O3 that I see things like inlining pg_checksum_block into
pg_checksum_page.  -O3 is generating far more SSE2 instructions, too.

I'll have to check whether gcc 12 is finding anything else within Postgres
to autovectorize with its "very-cheap" cost model...

> That being the case, if the difference you found was real, it must
> have been due to unrolling loops. What changed in the binary?

For gcc 9.4.0 on x86, the autovectorization flag alone indeed makes no
difference, while the loop unrolling one does.  For Apple clang 14.0.0 on
an M2, both flags seem to generate very different machine code.
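
For anyone who wants to reproduce this kind of comparison outside the tree,
here is a minimal standalone sketch of a checksum-shaped loop. It is not the
Postgres code: sketch_checksum, N_LANES, and MIX_PRIME are hypothetical names,
and the mixing step is only loosely modeled on an FNV-style hash. The idea is
to compile it to assembly with something like "-O2", "-O2 -ftree-vectorize",
"-O2 -funroll-loops", and "-O3" (which I believe are the flags behind
CFLAGS_VECTORIZE and CFLAGS_UNROLL_LOOPS, but please check your build) and
diff the output.

```c
#include <stddef.h>
#include <stdint.h>

#define N_LANES 32              /* hypothetical: number of independent partial sums */
#define MIX_PRIME 16777619u     /* 32-bit FNV prime, used here only for illustration */

/*
 * Fold an array of 32-bit words into one value using N_LANES independent
 * partial sums.  The inner loop has no cross-lane dependency, which is the
 * shape an autovectorizer can map onto SIMD registers.  Trailing words that
 * do not fill a whole row of N_LANES are ignored in this sketch.
 */
uint32_t
sketch_checksum(const uint32_t *words, size_t nwords)
{
    uint32_t    sums[N_LANES] = {0};
    uint32_t    result = 0;
    size_t      i;
    size_t      j;

    /* main loop: one row of N_LANES words per outer iteration */
    for (i = 0; i + N_LANES <= nwords; i += N_LANES)
    {
        for (j = 0; j < N_LANES; j++)
        {
            uint32_t    tmp = sums[j] ^ words[i + j];

            sums[j] = tmp * MIX_PRIME ^ (tmp >> 17);
        }
    }

    /* xor-fold the partial sums into a single result */
    for (j = 0; j < N_LANES; j++)
        result ^= sums[j];

    return result;
}
```

Compiling with -S (for example "gcc -O2 -ftree-vectorize -S sketch.c") and
looking for packed integer instructions in the inner loop should show whether
a given flag changes anything on a given compiler.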

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


