On Wed, Nov 23, 2022 at 3:54 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> Going by [1], clang will actually use multiplication by 16 to
> implement the former. gcc is better and shifts left by 4, so likely
> won't improve things for gcc. It seems worth doing it this way for
> anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway.

FWIW, gcc 12.2 generates an imul on my system when compiling in situ. I've found it useful to run godbolt locally* and load the entire PG file (nicer to read than plain objdump), since compilers can make different decisions when going from isolated snippets to code within full functions.

* To run godbolt locally:
- clone from https://github.com/compiler-explorer/compiler-explorer
- install npm 16
- run "make"; when it finishes, it shows the localhost URL
- add the right flags, which in this case were:
-Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -O2 -I/path/to/srcdir/src/include -I/path/to/builddir/src/include -D_GNU_SOURCE
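
For anyone who wants to compare the two forms, here is a minimal sketch of an isolated snippet. It is not the PG code: the function names and the unsigned 64-bit accumulator are made up for illustration, and __builtin_mul_overflow assumes gcc or clang. The code generated for something like this standalone may differ from what the same compiler emits inside the full function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: append one hex digit (0..15) to *acc using
 * multiplication by 16, with the overflow check done by the compiler builtin.
 * Returns false, leaving *acc unchanged, if the result would not fit.
 */
static bool
accum_hex_mul(uint64_t *acc, unsigned digit)
{
	uint64_t	tmp;

	if (__builtin_mul_overflow(*acc, (uint64_t) 16, &tmp))
		return false;

	/* The low 4 bits of tmp are zero and digit < 16, so this cannot carry. */
	*acc = tmp + digit;
	return true;
}

/*
 * Hypothetical helper: the same operation written as a shift left by 4,
 * with a manual range check that needs no builtin.
 */
static bool
accum_hex_shift(uint64_t *acc, unsigned digit)
{
	/* Any value above UINT64_MAX >> 4 would lose its top bits when shifted. */
	if (*acc > (UINT64_MAX >> 4))
		return false;

	*acc = (*acc << 4) | digit;
	return true;
}
```

Both helpers should accept and reject the same inputs, so any difference seen in godbolt is down to instruction selection rather than behavior.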