On Sun, Mar 26, 2017 at 12:22 AM, Andres Freund <andres@anarazel.de> wrote:
>> At least with current gcc (6.3.1 on Fedora 25) at -O2,
>> what I see is multiple places jumping to the same indirect jump
>> instruction :-(. It's not a total disaster: as best I can tell, all the
>> uses of EEO_JUMP remain distinct. But gcc has chosen to implement about
>> 40 of the 71 uses of EEO_NEXT by jumping to the same couple of
>> instructions that increment the "op" register and then do an indirect
>> jump :-(.
>
> Yea, I see some of that too - "usually" when there's more than just the
> jump in common. I think there's some gcc variables that influence this
> (min-crossjump-insns (5), max-goto-duplication-insns (8)). Might be
> worthwhile experimenting with setting them locally via a pragma or such.
> I think Ants wanted to experiment with that, too.

I haven't had the time to research this properly, but initial tests
show that with GCC 6.2, adding

    #pragma GCC optimize ("no-crossjumping")

fixes merging of the op tail jumps.

Some quick and dirty benchmarking suggests that the benefit for the
interpreter itself is about 15% (roughly a 5% overall speedup on a
workload that spends about 1/3 of its time in ExecInterpExpr, since
1/3 * 15% = 5%). My idea of prefetching op->resnull/resvalue into local
vars before the indirect jump gives somewhere between a tiny benefit
and no effect, so it is certainly not worth the extra complexity.
Clang 3.8 does the right thing out of the box and is a couple of
percent faster than GCC with the pragma.

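
To make the pragma placement concrete, here is a minimal sketch of a
computed-goto interpreter. It is not the execExprInterp.c code: ToyOp,
toy_interp and the TOY_* macros are made-up names for illustration, with
TOY_NEXT only loosely mirroring what EEO_NEXT does. The pragma is guarded
so that it is only seen by GCC. Whether each TOY_NEXT really keeps its own
indirect jump still has to be checked in the generated assembly.

```c
#include <stdio.h>

/*
 * Ask GCC not to merge identical instruction tails in the functions that
 * follow. Guarded because this is a GCC-specific pragma.
 */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC optimize ("no-crossjumping")
#endif

/* Hypothetical opcodes for a tiny accumulator machine. */
typedef enum { TOY_ADD, TOY_MUL, TOY_DONE } ToyOpcode;

typedef struct
{
    ToyOpcode code;   /* which step to run */
    long      arg;    /* operand for the step */
} ToyOp;

/* Runs the ops until TOY_DONE and returns the accumulator. */
static long
toy_interp(const ToyOp *op)
{
    /* Labels-as-values (GCC/Clang extension), indexed by ToyOpcode. */
    static void *const dispatch[] = { &&do_add, &&do_mul, &&do_done };
    long acc = 0;

/* Jump to the code for the current op. */
#define TOY_DISPATCH() goto *dispatch[op->code]
/* Advance to the next op and jump; ideally one indirect jump per use. */
#define TOY_NEXT() do { op++; TOY_DISPATCH(); } while (0)

    TOY_DISPATCH();

do_add:
    acc += op->arg;
    TOY_NEXT();

do_mul:
    acc *= op->arg;
    TOY_NEXT();

do_done:
    return acc;

#undef TOY_NEXT
#undef TOY_DISPATCH
}
```

Compiling this with "gcc -O2 -S" with and without the pragma, and counting
the indirect jumps in toy_interp, is a quick way to see whether the tails
got merged.
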
Regards,
Ants Aasma