On Wed, Nov 22, 2023 at 02:54:13PM +0200, Ants Aasma wrote:
> On Wed, 22 Nov 2023 at 11:44, John Naylor <johncnaylorls@gmail.com> wrote:
>> Poking in those files a bit, I also see references to building with
>> SSE 4.1. Maybe that's an avenue that we should pursue? (an indirect
>> function call is surely worth it for page-sized data)

Yes, I think we should, but we also need to be careful not to hurt
performance on platforms that aren't able to benefit [0] [1].

There are a couple of other threads about adding support for newer
instructions [2] [3], and properly detecting the availability of these
instructions seems to be a common obstacle. We have a path forward for
stuff that's already using a runtime check (e.g., CRC32C), but I think
we're still trying to figure out what to do for things that must be inlined
(e.g., simd.h).
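
For the runtime-check path, the usual pattern is a function pointer that
starts out pointing at a "chooser", which tests the CPU once and then
repoints itself at the best implementation (similar in spirit to the
CRC32C dispatch). Here's a minimal sketch, assuming GCC or Clang on x86
(`__builtin_cpu_supports()` and `__attribute__((target("avx2")))`). The
function names and the toy word-sum checksum are hypothetical; this is
not pg_checksum_page():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy checksum: a plain wrapping sum, NOT pg_checksum_page(). */
typedef uint32_t (*checksum_fn) (const uint32_t *words, size_t nwords);

static uint32_t
checksum_generic(const uint32_t *words, size_t nwords)
{
	uint32_t	sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum += words[i];
	return sum;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_AVX2_DISPATCH 1

/*
 * Same loop, but the target attribute lets the compiler emit AVX2
 * instructions (e.g., when auto-vectorizing) for this one function, even
 * though the rest of the file is built without -mavx2.
 */
__attribute__((target("avx2")))
static uint32_t
checksum_avx2(const uint32_t *words, size_t nwords)
{
	uint32_t	sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum += words[i];
	return sum;
}
#endif

static uint32_t checksum_choose(const uint32_t *words, size_t nwords);

/* Starts at the chooser; the first call repoints it for all later calls. */
static checksum_fn checksum_impl = checksum_choose;

static uint32_t
checksum_choose(const uint32_t *words, size_t nwords)
{
	checksum_impl = checksum_generic;
#ifdef HAVE_AVX2_DISPATCH
	if (__builtin_cpu_supports("avx2"))
		checksum_impl = checksum_avx2;
#endif
	return checksum_impl(words, nwords);
}

/*
 * Every call goes through the pointer, i.e., one indirect call.  That's
 * fine for page-sized inputs, but it's exactly what inlined code can't do.
 */
static uint32_t
page_checksum(const uint32_t *words, size_t nwords)
{
	assert(words != NULL || nwords == 0);
	return checksum_impl(words, nwords);
}
```

The unsynchronized store to the pointer is harmless in a single-threaded
backend, since every caller would pick the same implementation; a
multithreaded program would want to do the selection once at startup
instead.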

One half-formed idea I have is to introduce some sort of ./configure flag
that enables all the newer instructions that your CPU understands. Such a
build would also skip any existing runtime checks, since the instructions
would be assumed to be available. This option would make it easy to take
advantage of the newer instructions if you are building Postgres for only
your machine (or others just like it).
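
With such a flag, the selection could happen entirely at compile time:
GCC and Clang predefine macros like __SSE2__, __SSE4_1__, and __AVX2__ for
whatever -march=native (or explicit -msse4.1/-mavx2) enables, so inlinable
code can just pick the widest path with the preprocessor and skip the
runtime check. A rough sketch with a hypothetical simd.h-style helper
(buf_has_byte() is made up for illustration and is not the simd.h API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__)
#include <immintrin.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: does buf[0..len) contain the byte c? */
static inline bool
buf_has_byte(const uint8_t *buf, size_t len, uint8_t c)
{
	size_t		i = 0;

	assert(buf != NULL || len == 0);

#if defined(__AVX2__)
	/* 32 bytes per iteration; chosen only when the build enables AVX2. */
	const __m256i needle = _mm256_set1_epi8((char) c);

	for (; i + 32 <= len; i += 32)
	{
		__m256i		chunk = _mm256_loadu_si256((const __m256i *) (buf + i));

		if (_mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, needle)) != 0)
			return true;
	}
#elif defined(__SSE2__)
	/* 16 bytes per iteration; the x86-64 baseline. */
	const __m128i needle = _mm_set1_epi8((char) c);

	for (; i + 16 <= len; i += 16)
	{
		__m128i		chunk = _mm_loadu_si128((const __m128i *) (buf + i));

		if (_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle)) != 0)
			return true;
	}
#endif

	/* Scalar tail, or the whole buffer when no SIMD path was compiled in. */
	for (; i < len; i++)
	{
		if (buf[i] == c)
			return true;
	}
	return false;
}
```

The catch, of course, is that a binary built this way can only run on
CPUs that have everything the build machine had.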

> For reference, executing the page checksum 10M times on a AMD 3900X CPU:
>
> clang-14 -O2 4.292s (17.8 GiB/s)
> clang-14 -O2 -msse4.1 2.859s (26.7 GiB/s)
> clang-14 -O2 -msse4.1 -mavx2 1.378s (55.4 GiB/s)

Nice. I've noticed similar improvements with AVX2 intrinsics in simd.h.
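
For anyone checking the arithmetic, the quoted GiB/s figures line up with
one 8192-byte page (the default BLCKSZ) per iteration, which is my
assumption about how the benchmark was run: 10M x 8192 bytes is about
76.3 GiB, so going from plain -O2 to -msse4.1 -mavx2 is roughly a 3.1x
speedup. A trivial sketch of the conversion:

```c
#include <assert.h>

/* Assumption: one 8192-byte page (default BLCKSZ) checksummed per iteration. */
#define PAGE_BYTES	8192.0
#define ITERATIONS	10000000.0

/* Throughput in GiB/s (1 GiB = 2^30 bytes) for a given total runtime. */
static double
checksum_gib_per_sec(double seconds)
{
	assert(seconds > 0.0);
	return ITERATIONS * PAGE_BYTES / seconds / (1024.0 * 1024.0 * 1024.0);
}
```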

[0] https://postgr.es/m/2613682.1698779776%40sss.pgh.pa.us
[1] https://postgr.es/m/36329.1699325578%40sss.pgh.pa.us
[2] https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
[3] https://postgr.es/m/DB9PR08MB6991329A73923BF8ED4B3422F5DBA@DB9PR08MB6991.eurprd08.prod.outlook.com
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com