I created an example that is a little bit closer to the actual code and changed the compiler from C++ to C.
It is interesting to compare the optimizations the compiler chose for version 1 versus version 2: one calls
memcpy and one doesn't. There is a good chance that the inlined memcpy (SSE plus scalar moves per iteration)
will be faster for syscache scans, which I believe are usually small (1-4 keys?).
Probably the only reason to do this patch would be if N is normally large or if this is considered an
improvement in code clarity without a detrimental impact on small N syscache scans.
I realize you only said "possible small optimization". It might be worthwhile to benchmark the code for
different values of N to see whether there is a tipping point either way.
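
As a rough starting point, a timing harness along these lines could be used. This is only a sketch under assumptions: DemoKey, time_copy and run_benchmark are made-up names, DemoKey is a stand-in and not the real scan key struct, and the barrier macro relies on GCC/Clang inline asm (on other compilers it expands to nothing, so the copies might be optimized away).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_KEYS 64

/* GCC/Clang-only barrier: forces the copied data to actually be written. */
#if defined(__GNUC__)
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#define COMPILER_BARRIER() ((void) 0)
#endif

/* Hypothetical stand-in for a scan key entry; NOT the real struct. */
typedef struct DemoKey
{
    int   flags;
    int   attno;
    int   strategy;
    long  argument;
} DemoKey;

DemoKey src_keys[MAX_KEYS];
DemoKey dst_keys[MAX_KEYS];

/*
 * Copy n keys from src_keys to dst_keys, iters times, and return the
 * elapsed CPU time in seconds.  bulk != 0 uses one memcpy for the whole
 * array; bulk == 0 uses one memcpy per element.
 */
double time_copy(int n, long iters, int bulk)
{
    clock_t start;

    assert(n >= 0 && n <= MAX_KEYS);
    start = clock();
    for (long it = 0; it < iters; it++)
    {
        if (bulk)
            memcpy(dst_keys, src_keys, sizeof(DemoKey) * (size_t) n);
        else
            for (int i = 0; i < n; i++)
                memcpy(&dst_keys[i], &src_keys[i], sizeof(DemoKey));
        COMPILER_BARRIER();
    }
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}

/* Print one line of timings per key count. */
void run_benchmark(long iters)
{
    static const int sizes[] = {1, 2, 4, 8, 16, 64};

    for (size_t k = 0; k < sizeof(sizes) / sizeof(sizes[0]); k++)
    {
        double t_loop = time_copy(sizes[k], iters, 0);
        double t_bulk = time_copy(sizes[k], iters, 1);

        printf("n=%2d  loop=%.3fs  bulk=%.3fs\n", sizes[k], t_loop, t_bulk);
    }
}
```

Calling run_benchmark() from main with a large iteration count (tens of millions, say) at -O2 should show whether there is a crossover for a given compiler and machine. The numbers only cover the copy itself, not the rest of the scan setup.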
-- bg
Hi.
In the functions *systable_beginscan* and *systable_beginscan_ordered*,
a small optimization is possible.
The array *idxkey* can be constructed in one go with a single call to memcpy.
The gain might not make much of a difference, but I think it's worth the effort.
Patch attached.
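
To illustrate the idea outside the tree, here is a minimal sketch of the two ways of filling the array. DemoKey, copy_keys_loop and copy_keys_bulk are hypothetical names used only for this example; DemoKey is not the real key struct, and the actual functions do more per key than this copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a scan key entry; NOT the real struct. */
typedef struct DemoKey
{
    int   attno;
    int   strategy;
    long  argument;
} DemoKey;

/* Per-element version: one memcpy call per key, inside the loop. */
void copy_keys_loop(DemoKey *dst, const DemoKey *src, int nkeys)
{
    assert(nkeys >= 0);
    for (int i = 0; i < nkeys; i++)
        memcpy(&dst[i], &src[i], sizeof(DemoKey));
}

/* Bulk version: the whole array is copied with a single memcpy call. */
void copy_keys_bulk(DemoKey *dst, const DemoKey *src, int nkeys)
{
    assert(nkeys >= 0);
    if (nkeys > 0)
        memcpy(dst, src, sizeof(DemoKey) * (size_t) nkeys);
}
```

Both functions leave dst with the same contents; the only difference is how many memcpy calls are made.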
Someone asked me whether -O2 already does this work.
Apparently not.
best regards,
Ranier Vilela