Thread: Re: micro-optimize nbtcompare.c routines

Re: micro-optimize nbtcompare.c routines

From: Nathan Bossart
On Fri, Sep 27, 2024 at 02:50:13PM +1200, David Rowley wrote:
> I had been looking at [1] (which I've added your version to now). I
> had been surprised to see gcc emitting different code for the first 3
> versions. Clang does a better job at figuring out they all do the same
> thing and emitting the same code for each.

Interesting.

> I played around with the attached (hacked up) qsort.c to see if there
> was any difference.  Likely function call overhead kills the
> performance anyway. There does not seem to be much difference between
> them. I've not tested with an inlined comparison function.

I'd expect worse performance with the branchless routines in the inlined
case.  However, I recall that clang was able to optimize med3() with the
branchless routines as well as it can with the branching ones, so that may
not always be true.
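
For readers following along, here is a minimal sketch of the two styles of
three-way comparison being discussed.  The function names are hypothetical,
and these are not necessarily the exact variants from the godbolt link or
the patch; they only illustrate what "branching" and "branchless" mean here.

```c
#include <stdint.h>

/* Branching style: explicit if/else chain (illustrative name). */
static int
cmp_branching(int32_t a, int32_t b)
{
    if (a > b)
        return 1;
    else if (a == b)
        return 0;
    else
        return -1;
}

/*
 * Branchless style: each relational expression evaluates to 0 or 1,
 * so the difference is -1, 0, or 1 with no conditional in the source.
 * Subtracting a - b directly would be wrong, since it can overflow.
 */
static int
cmp_branchless(int32_t a, int32_t b)
{
    return (a > b) - (a < b);
}
```

Whether either form compiles to jumps depends on the compiler and the
optimization level, which is why the generated code should be compared with
optimization enabled.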

> Looking at your version, it doesn't look like there's any sort of
> improvement in terms of the instructions. Certainly, for clang, it's
> worse as it adds a shift left instruction and an additional compare.
> No jumps, at least.

I think I may have forgotten to add -O2 when I was inspecting this code
with godbolt.org earlier.  *facepalm*  The different versions look pretty
comparable with that added.

> What's your reasoning for returning INT_MIN and INT_MAX?

That's just for the compile option added by commit c87cb5f, which IIUC is
intended to test that we correctly handle comparisons that return INT_MIN.
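
As a rough illustration of why an INT_MIN result needs care: a caller that
reverses sort order by negating the comparator's result invokes signed
overflow when the result is INT_MIN.  The sketch below uses hypothetical
names and is not the PostgreSQL code; it only shows the hazard and one
overflow-free way to invert a result.

```c
#include <limits.h>

/*
 * Hypothetical comparator that returns the extreme values, similar in
 * spirit to what a stress-test build might do.
 */
static int
cmp_extreme(int a, int b)
{
    if (a > b)
        return INT_MAX;
    if (a < b)
        return INT_MIN;
    return 0;
}

/*
 * Invert a comparison result without negating it.  Writing -result
 * would overflow (undefined behavior) when result == INT_MIN, so the
 * sign is rebuilt from two relational tests instead.
 */
static int
invert_result(int result)
{
    return (result < 0) - (result > 0);
}
```

Callers that only test the sign of the result (< 0, == 0, > 0) are
unaffected by the magnitude; only arithmetic on the result is at risk.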

-- 
nathan



Re: micro-optimize nbtcompare.c routines

From: Nathan Bossart
I've marked this one as Withdrawn.  Apologies for the noise.

-- 
nathan