Thread: gcc -ansi versus SSE4.2 detection
[ this is a bit roundabout, bear with me ]

I noticed that, contrary to project policy, a //-style comment snuck into pg_regress.c a month or two back. I had had the idea that buildfarm member pademelon would complain about such comments, given its stone-age C compiler, but evidently not. After some experimentation it seems that "gcc -ansi" can be used to throw errors for // comments, so I'm planning to enable that flag on dromedary. However I found out that adding -ansi also caused configure to stop selecting "-msse4.2", which seemed odd, since that switch has no bearing on C language conformance. And it fell back to the slicing-by-8 CRC implementation too.

Investigation showed that that's because -ansi causes the compiler to stop defining __SSE4_2__, which configure supposes must get defined if we are targeting an SSE 4.2 processor. This seems a bit dumb to me: if we have determined that "-msse4.2" works, isn't that sufficient evidence that we can use the intrinsics? Or if not, isn't there a less fragile way to find out the target?

Also, it seems like the logic in configure.in is broken in any case: if we are able to compile the intrinsics, should it not pick the runtime-determination option for CRC? It isn't doing that.

regards, tom lane
On 06/05/2015 09:27 PM, Tom Lane wrote:
> [ this is a bit roundabout, bear with me ]
>
> I noticed that, contrary to project policy, a //-style comment snuck into
> pg_regress.c a month or two back. I had had the idea that buildfarm
> member pademelon would complain about such comments, given its stone-age
> C compiler, but evidently not. After some experimentation it seems that
> "gcc -ansi" can be used to throw errors for // comments, so I'm planning
> to enable that flag on dromedary. However I found out that adding -ansi
> also caused configure to stop selecting "-msse4.2", which seemed odd,
> since that switch has no bearing on C language conformance. And it fell
> back to the slicing-by-8 CRC implementation too.

Hmm, that's odd. -ansi has no effect on the CRC implementation on my system.

> Investigation showed that that's because -ansi causes the compiler to
> stop defining __SSE4_2__, which configure supposes must get defined if
> we are targeting an SSE 4.2 processor. This seems a bit dumb to me:
> if we have determined that "-msse4.2" works, isn't that sufficient
> evidence that we can use the intrinsics? Or if not, isn't there a less
> fragile way to find out the target?
>
> Also, it seems like the logic in configure.in is broken in any case:
> if we are able to compile the intrinsics, should it not pick the
> runtime-determination option for CRC? It isn't doing that.

It's quite subtle. The point of the __SSE4_2__ test is to determine if we are targeting a system that has SSE4.2 instructions. If it's defined, then we can assume that SSE4.2 instructions are always available. For example, if you pass CFLAGS=-msse4.2, gcc can freely use SSE4.2 instructions when optimizing, and the produced binary will not work on a system without SSE4.2 support. In that case, __SSE4_2__ is defined, and we don't need the run-time check or the fallback implementation, because we can also freely assume that SSE 4.2 support is available.

If __SSE4_2__ is not defined, but the compiler accepts -msse4.2, that means that the compiler will normally not use SSE4.2 instructions, and the produced binary must work without them. We will use the -msse4.2 flag when compiling pg_crc32c_sse42.c, and at runtime, we check that SSE 4.2 instructions are available before using it.

- Heikki
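[ Editor's sketch, not part of the original message. ] The decision described above can be written out as a small C function. This is only an illustration of the reasoning, not the actual configure logic: the enum, the function names, and the two boolean inputs are hypothetical stand-ins for what configure determines by test-compiling.

```c
/*
 * Hypothetical names throughout; these are not the identifiers that
 * configure or the PostgreSQL sources use.
 */
typedef enum
{
    CRC_SSE42_DIRECT,           /* target always has SSE 4.2; no check needed */
    CRC_SSE42_RUNTIME_CHECK,    /* build SSE 4.2 code, verify the CPU at runtime */
    CRC_SLICING_BY_8            /* portable fallback only */
} crc_strategy;

/*
 * target_defines_sse42:    nonzero if the compiler predefines __SSE4_2__
 *                          with the CFLAGS used for the whole build.
 * compiler_accepts_msse42: nonzero if the intrinsics compile once
 *                          -msse4.2 is added for one source file.
 */
crc_strategy
choose_crc_strategy(int target_defines_sse42, int compiler_accepts_msse42)
{
    if (target_defines_sse42)
        return CRC_SSE42_DIRECT;
    if (compiler_accepts_msse42)
        return CRC_SSE42_RUNTIME_CHECK;
    return CRC_SLICING_BY_8;
}

/* Reports whether this translation unit was compiled with __SSE4_2__ set. */
int
target_defines_sse42(void)
{
#ifdef __SSE4_2__
    return 1;
#else
    return 0;
#endif
}
```

With plain CFLAGS on a typical x86 build, target_defines_sse42() returns 0, so the runtime-check variant would be the expected outcome whenever the compiler accepts -msse4.2.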
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 06/05/2015 09:27 PM, Tom Lane wrote:
>> ... However I found out that adding -ansi
>> also caused configure to stop selecting "-msse4.2", which seemed odd,
>> since that switch has no bearing on C language conformance. And it fell
>> back to the slicing-by-8 CRC implementation too.

> Hmm, that's odd. -ansi has no effect on the CRC implementation on my system.

Ummm ... I was reading the diff backwards. Actually it seems that on dromedary's platform, CFLAGS_SSE42 is set to empty by default, but forcing "-ansi" makes it get set to "-msse4.2". Evidently, (this) gcc will accept the _mm_crc32_foo intrinsics by default normally, but if you say -ansi then it won't accept them unless you also say "-msse4.2".

It looks like the actual reason that we aren't using the runtime-check CRC implementation is that we can't find a way to do "cpuid" on this old version of OS X. Not sure if it's worth the time to look for one; modern versions of OS X do have __get_cpuid().

regards, tom lane
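[ Editor's sketch, not part of the original message. ] For reference, a runtime SSE 4.2 check built on __get_cpuid() might look roughly like the following. It assumes a GCC-compatible compiler that ships <cpuid.h>; the function name cpu_has_sse42 and the macros HAVE_GET_CPUID_SKETCH and CPUID_ECX_SSE42 are hypothetical, and on platforms without <cpuid.h> the sketch simply reports "not available", which corresponds to falling back to the portable implementation.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GET_CPUID_SKETCH 1     /* hypothetical macro for this sketch */
#endif

/* CPUID leaf 1 reports SSE 4.2 support in bit 20 of ECX. */
#define CPUID_ECX_SSE42 (1u << 20)

/*
 * Returns 1 if the running CPU advertises SSE 4.2, otherwise 0.
 * Also returns 0 when no way to execute cpuid is known.
 */
int
cpu_has_sse42(void)
{
#ifdef HAVE_GET_CPUID_SKETCH
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the requested leaf is unsupported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & CPUID_ECX_SSE42) != 0;
#else
    return 0;
#endif
}
```

A caller would typically evaluate this once and then dispatch to either the SSE 4.2 code or the slicing-by-8 code.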
On 06/05/2015 10:07 PM, Tom Lane wrote:
> It looks like the actual reason that we aren't using the runtime-check
> CRC implementation is that we can't find a way to do "cpuid" on this
> old version of OS X. Not sure if it's worth the time to look for one;
> modern versions of OS X do have __get_cpuid().

Doesn't seem worth it to me.

- Heikki