On Mon, Oct 30, 2023 at 10:36:01PM -0500, Nathan Bossart wrote:
> I tested pg_waldump -z with 50M 65-byte records for the following
> implementations on an ARM system:
>
> * slicing-by-8 : ~3.08s
> * proposed patches applied (runtime check) : ~2.44s
> * only CRC intrinsics implementation compiled : ~2.42s
> * forced inlining : ~2.38s
>
> Avoiding the runtime check produced a 0.8% improvement, and forced inlining
> produced another 1.7% improvement. In comparison, even the runtime check
> implementation produced a 20.8% improvement over the slicing-by-8 one.

After reflecting on these numbers for a bit, I think I'm still inclined to
do $SUBJECT. I considered the following:

* While it would be nice to gain a couple of percentage points for existing
  hardware, I think we'll still end up doing runtime checks most of the
  time once we add support for newer instructions.

* The performance improvements that the new instructions provide seem
  likely to outweigh these small regressions, especially for workloads with
  larger WAL records [0].

* From my quick scan of a few dozen machines on the buildfarm, it looks
  like the runtime checks are already the norm, so the number of systems
  that would be subject to a regression from v16 to v17 should be pretty
  small, in theory. And this regression seems to be on the order of 1%
  based on the numbers above.

Do folks think this is reasonable? Or should we instead try to squeeze
every last drop out of the current implementations by avoiding function
pointers, forcing inlining, etc.?
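
To make the trade-off concrete, here is a rough sketch of what "runtime
check" means in this context: callers go through a function pointer that
resolves itself on first use. This is only an illustration. The names
crc32c_sw, crc32c_choose, crc32c_impl, and crc32c are made up for this
example and are not the identifiers in the tree, and the hardware routine
is omitted, so the chooser always lands on a plain bitwise fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature shared by every (hypothetical) CRC-32C implementation. */
typedef uint32_t (*crc32c_fn) (uint32_t crc, const void *data, size_t len);

/* Portable bitwise fallback using the reflected CRC-32C polynomial. */
static uint32_t
crc32c_sw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;

    while (len--)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

static uint32_t crc32c_choose(uint32_t crc, const void *data, size_t len);

/* Every call is indirect; the pointer starts out at the chooser. */
static crc32c_fn crc32c_impl = crc32c_choose;

/*
 * First call only: pick an implementation and repoint crc32c_impl.
 * A real build would probe the CPU here and select an intrinsics-based
 * routine when supported; this sketch has only the fallback.
 */
static uint32_t
crc32c_choose(uint32_t crc, const void *data, size_t len)
{
    crc32c_impl = crc32c_sw;
    return crc32c_impl(crc, data, len);
}

/* Convenience wrapper: standard initial value and final inversion. */
static uint32_t
crc32c(const void *data, size_t len)
{
    return crc32c_impl(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

The alternative measured above would amount to dropping crc32c_impl and
calling one implementation directly (possibly inlined) when the build
target is known to support it, which saves the indirect call but only
helps on systems compiled that way.
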
[0] https://postgr.es/m/20231025014539.GA977906%40nathanxps13
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com