Thread: gcc -ansi versus SSE4.2 detection

gcc -ansi versus SSE4.2 detection

From: Tom Lane
[ this is a bit roundabout, bear with me ]

I noticed that, contrary to project policy, a //-style comment snuck into
pg_regress.c a month or two back.  I had had the idea that buildfarm
member pademelon would complain about such comments, given its stone-age
C compiler, but evidently not.  After some experimentation it seems that
"gcc -ansi" can be used to throw errors for // comments, so I'm planning
to enable that flag on dromedary.  However I found out that adding -ansi
also caused configure to stop selecting "-msse4.2", which seemed odd,
since that switch has no bearing on C language conformance.  And it fell
back to the slicing-by-8 CRC implementation too.

Investigation showed that that's because -ansi causes the compiler to
stop defining __SSE4_2__, which configure supposes must get defined if
we are targeting an SSE 4.2 processor.  This seems a bit dumb to me:
if we have determined that "-msse4.2" works, isn't that sufficient
evidence that we can use the intrinsics?  Or if not, isn't there a less
fragile way to find out the target?

Also, it seems like the logic in configure.in is broken in any case:
if we are able to compile the intrinsics, should it not pick the
runtime-determination option for CRC?  It isn't doing that.
        regards, tom lane



Re: gcc -ansi versus SSE4.2 detection

From: Heikki Linnakangas
On 06/05/2015 09:27 PM, Tom Lane wrote:
> [ this is a bit roundabout, bear with me ]
>
> I noticed that, contrary to project policy, a //-style comment snuck into
> pg_regress.c a month or two back.  I had had the idea that buildfarm
> member pademelon would complain about such comments, given its stone-age
> C compiler, but evidently not.  After some experimentation it seems that
> "gcc -ansi" can be used to throw errors for // comments, so I'm planning
> to enable that flag on dromedary.  However I found out that adding -ansi
> also caused configure to stop selecting "-msse4.2", which seemed odd,
> since that switch has no bearing on C language conformance.  And it fell
> back to the slicing-by-8 CRC implementation too.

Hmm, that's odd. -ansi has no effect on the CRC implementation on my system.

> Investigation showed that that's because -ansi causes the compiler to
> stop defining __SSE4_2__, which configure supposes must get defined if
> we are targeting an SSE 4.2 processor.  This seems a bit dumb to me:
> if we have determined that "-msse4.2" works, isn't that sufficient
> evidence that we can use the intrinsics?  Or if not, isn't there a less
> fragile way to find out the target?
>
> Also, it seems like the logic in configure.in is broken in any case:
> if we are able to compile the intrinsics, should it not pick the
> runtime-determination option for CRC?  It isn't doing that.

It's quite subtle. The point of the __SSE4_2__ test is to determine if
we are targeting a system that has SSE4.2 instructions. If it's defined,
then we can assume that SSE4.2 instructions are always available. For
example, if you pass CFLAGS=-msse4.2, gcc can freely use SSE4.2
instructions when optimizing, and the produced binary will not work on a
system without SSE4.2 support. In that case, __SSE4_2__ is defined, and
we don't need the run-time check or the fallback implementation, because
we can also freely assume that SSE 4.2 support is available.

If __SSE4_2__ is not defined, but the compiler accepts -msse4.2, that
means that the compiler will normally not use SSE4.2 instructions, and
the produced binary must work without them. We will use the -msse4.2
flag when compiling pg_crc32c_sse42.c, and at runtime, we check that SSE
4.2 instructions are available before using that implementation.
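
To make the two cases concrete, here is a minimal single-file sketch. It is
not the actual PostgreSQL code. Assumptions: GCC or Clang; the names crc32c,
crc32c_sse42, crc32c_portable and CRC_HAVE_X86 are made up for illustration;
the target("sse4.2") attribute stands in for compiling a separate file with
-msse4.2; __builtin_cpu_supports() stands in for the cpuid-based run-time
check; and a simple bit-at-a-time loop stands in for the slicing-by-8
fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable bit-at-a-time CRC-32C; a stand-in for the slicing-by-8 fallback. */
uint32_t
crc32c_portable(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
    {
        int         bit;

        crc ^= *p++;
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>
#define CRC_HAVE_X86 1          /* hypothetical macro, for this sketch only */

/*
 * If the whole build does not target SSE 4.2, enable it for this one
 * function only (assumption: replaces a per-file -msse4.2 flag).
 */
#ifndef __SSE4_2__
__attribute__((target("sse4.2")))
#endif
uint32_t
crc32c_sse42(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
        crc = _mm_crc32_u8(crc, *p++);
    return crc;
}
#endif

uint32_t
crc32c(const void *data, size_t len)
{
    uint32_t    crc = 0xFFFFFFFFu;

#if defined(__SSE4_2__)
    /* Case 1: the build targets SSE 4.2, so no check and no fallback. */
    crc = crc32c_sse42(crc, data, len);
#elif defined(CRC_HAVE_X86)
    /* Case 2: intrinsics compile, but the CPU must be checked at run time. */
    if (__builtin_cpu_supports("sse4.2"))
        crc = crc32c_sse42(crc, data, len);
    else
        crc = crc32c_portable(crc, data, len);
#else
    /* Not an x86 target: only the portable code is available. */
    crc = crc32c_portable(crc, data, len);
#endif
    return crc ^ 0xFFFFFFFFu;
}
```

Both paths should agree on the standard CRC-32C check value: crc32c("123456789", 9)
yields 0xE3069283 whichever branch is taken.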

- Heikki




Re: gcc -ansi versus SSE4.2 detection

From: Tom Lane
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 06/05/2015 09:27 PM, Tom Lane wrote:
>> ... However I found out that adding -ansi
>> also caused configure to stop selecting "-msse4.2", which seemed odd,
>> since that switch has no bearing on C language conformance.  And it fell
>> back to the slicing-by-8 CRC implementation too.

> Hmm, that's odd. -ansi has no effect on the CRC implementation on my system.

Ummm ... I was reading the diff backwards.  Actually it seems that on
dromedary's platform, CFLAGS_SSE42 is set to empty by default, but forcing
"-ansi" makes it get set to "-msse4.2".  Evidently (this) gcc accepts
the _mm_crc32_foo intrinsics by default, but if you say -ansi
then it won't accept them unless you also say "-msse4.2".

It looks like the actual reason that we aren't using the runtime-check
CRC implementation is that we can't find a way to do "cpuid" on this
old version of OS X.  Not sure if it's worth the time to look for one;
modern versions of OS X do have __get_cpuid().
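
For reference, a minimal sketch of the kind of run-time check that
__get_cpuid() makes possible, assuming the <cpuid.h> header shipped with GCC
and Clang. The function name cpu_has_sse42 is hypothetical, and this is not
necessarily how the PostgreSQL code is written. CPUID leaf 1 reports SSE 4.2
support in bit 20 of ECX.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns true if the running CPU reports SSE 4.2 support. */
bool
cpu_has_sse42(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the requested leaf is not supported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
        return false;

    /* CPUID leaf 1, ECX bit 20 is the SSE 4.2 feature flag. */
    return (ecx & (1u << 20)) != 0;
#else
    /* Not an x86 target, so there is no cpuid instruction to ask. */
    return false;
#endif
}
```

On a platform without <cpuid.h> or an equivalent, there is no portable way to
ask this question, which is why the build falls back to the non-SSE code.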
        regards, tom lane



Re: gcc -ansi versus SSE4.2 detection

From: Heikki Linnakangas
On 06/05/2015 10:07 PM, Tom Lane wrote:
> It looks like the actual reason that we aren't using the runtime-check
> CRC implementation is that we can't find a way to do "cpuid" on this
> old version of OS X.  Not sure if it's worth the time to look for one;
> modern versions of OS X do have __get_cpuid().

Doesn't seem worth it to me.

- Heikki