I created an example that is a little closer to the actual code and changed the compiler from C++ to C. The optimizations the compiler chose for version 1 versus version 2 are interesting: one calls memcpy and one doesn't. There is a good chance that the inlined memcpy (SSE plus scalar moves per iteration) will be faster for syscache scans, which I believe are usually small (1-4 keys?).
Probably the only reason to apply this patch would be if N is normally large, or if it is considered an improvement in code clarity without a detrimental impact on small-N syscache scans. I realize you only said "possible small optimization". It might be worthwhile to benchmark the code for different values of N to determine whether there is a tipping point either way.
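
For what it's worth, a rough sketch of such a benchmark might look like the following. This is not the code from the patch: DemoKey, MAX_KEYS, copy_keys_loop, copy_keys_memcpy and time_copy are all hypothetical names, and DemoKey is only a stand-in for the real scan key struct. It simply times a per-element copy loop against a single memcpy for a given number of keys.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define MAX_KEYS 64				/* assumed upper bound, for this sketch only */

/* Hypothetical stand-in for a scan key; NOT the real PostgreSQL struct. */
typedef struct DemoKey
{
	int			attno;
	int			strategy;
	long		argument;
} DemoKey;

typedef void (*copy_fn) (DemoKey *dst, const DemoKey *src, int n);

/* Variant A: copy one key per loop iteration via struct assignment. */
static void
copy_keys_loop(DemoKey *dst, const DemoKey *src, int n)
{
	for (int i = 0; i < n; i++)
		dst[i] = src[i];
}

/* Variant B: copy the whole array with a single memcpy. */
static void
copy_keys_memcpy(DemoKey *dst, const DemoKey *src, int n)
{
	memcpy(dst, src, (size_t) n * sizeof(DemoKey));
}

/*
 * Run fn `iters` times on n keys and return the elapsed CPU time in
 * seconds, or -1.0 if n is outside 1..MAX_KEYS.
 */
static double
time_copy(copy_fn fn, int n, long iters)
{
	DemoKey		src[MAX_KEYS];
	DemoKey		dst[MAX_KEYS];
	volatile long sink = 0;		/* keeps the copies from being optimized away */
	clock_t		start;

	if (n < 1 || n > MAX_KEYS)
		return -1.0;

	for (int i = 0; i < MAX_KEYS; i++)
	{
		src[i].attno = i + 1;
		src[i].strategy = 3;
		src[i].argument = 100 + i;
	}
	memset(dst, 0, sizeof(dst));

	start = clock();
	for (long i = 0; i < iters; i++)
	{
		fn(dst, src, n);
		sink += dst[n - 1].argument;	/* read back the last copied key */
	}
	assert(sink >= 0 || sink < 0);		/* touch sink so it counts as used */

	return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy(copy_keys_loop, n, iters) and time_copy(copy_keys_memcpy, n, iters) for n = 1, 2, 4, 8, ... and comparing the results would show where, if anywhere, the tipping point lies. Two caveats: going through a function pointer may hide exactly the inlining difference we are interested in, and the compiler is free to turn the loop into a memcpy call (or the reverse), so the generated assembly should be checked alongside the timings.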