Re: WIP: Faster Expression Processing v4 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: WIP: Faster Expression Processing v4
Msg-id 5768.1490458935@sss.pgh.pa.us
In response to Re: WIP: Faster Expression Processing v4  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
More random musing ... have you considered making the jump-target fields
in expressions be relative rather than absolute indexes?  That is,
EEO_JUMP would look like
    op += (stepno); \
    EEO_DISPATCH(); \

instead of
    op = &state->steps[stepno]; \
    EEO_DISPATCH(); \

I have not carried out a full patch to make this work, but just making
that one change and examining the generated assembly code looks promising.
Instead of this
    movslq    40(%r14), %r8
    salq      $6, %r8
    addq      24(%rbx), %r8
    movq      %r8, %r14
    jmp       *(%r8)

we get this
    movslq    40(%r14), %rax
    salq      $6, %rax
    addq      %rax, %r14
    jmp       *(%r14)

which certainly looks like it ought to be faster.  Also, the real reason
I got interested in this at all is that with relative jumps, groups of
steps would be position-independent within the steps array, which would
enable some compile-time tricks that seem impractical with the current
definition.
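
To make the position-independence point concrete, here is a minimal
sketch using a toy step machine.  Everything in it (Step, the OP_* codes,
run_relative, run_fragment_at) is hypothetical and only loosely modeled
on the steps array; it is not the executor's actual code.  The jump
offset is added to the current step pointer, so the same fragment gives
the same result wherever it is copied within the array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes for a toy step machine (not PostgreSQL's step types). */
typedef enum { OP_ADD, OP_JUMP_IF_NEG, OP_DONE } OpCode;

typedef struct Step {
    OpCode opcode;
    int    arg;   /* addend for OP_ADD; relative step offset for OP_JUMP_IF_NEG */
} Step;

/* Run steps starting at 'op'; jump targets are relative to the current step. */
static int run_relative(const Step *op, int acc)
{
    for (;;) {
        switch (op->opcode) {
        case OP_ADD:
            acc += op->arg;
            op++;
            break;
        case OP_JUMP_IF_NEG:
            if (acc < 0)
                op += op->arg;   /* relative: the array base is never consulted */
            else
                op++;
            break;
        case OP_DONE:
            return acc;
        }
    }
}

/* A group of steps whose jump offset does not depend on where the group sits. */
static const Step fragment[] = {
    { OP_JUMP_IF_NEG, 2 },   /* skip the following OP_ADD when acc < 0 */
    { OP_ADD, 10 },
    { OP_DONE, 0 },
};

/* Copy the fragment after 'pad' no-op steps, then run from the array start. */
static int run_fragment_at(size_t pad, int acc)
{
    Step   steps[16];
    size_t i;

    if (pad > 13)
        pad = 13;                /* keep the 3-step fragment inside steps[] */
    for (i = 0; i < pad; i++) {
        steps[i].opcode = OP_ADD;
        steps[i].arg = 0;        /* adding zero acts as a no-op */
    }
    memcpy(&steps[pad], fragment, sizeof(fragment));
    return run_relative(steps, acc);
}
```

Calling run_fragment_at(0, 3) and run_fragment_at(5, 3) exercises the
identical fragment bytes at two different positions.  With absolute
indexes, the jump argument would have to be rewritten for each position.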

BTW, now that I've spent a bit of time looking at the generated assembly
code, I'm kind of disinclined to believe any arguments about how we have
better control over branch prediction with the jump-threading
implementation.  At least with current gcc (6.3.1 on Fedora 25) at -O2,
what I see is multiple places jumping to the same indirect jump
instruction :-(.  It's not a total disaster: as best I can tell, all the
uses of EEO_JUMP remain distinct.  But gcc has chosen to implement about
40 of the 71 uses of EEO_NEXT by jumping to the same couple of
instructions that increment the "op" register and then do an indirect
jump :-(.
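
For anyone wanting to check this on their own compiler, a minimal sketch
of jump-threaded dispatch follows.  It relies on the labels-as-values
extension (GCC and Clang, not ISO C), and the opcode names, NEXT macro
and run_threaded function are made up for illustration; they are not the
EEO_* macros.  In the source, each handler ends with its own indirect
jump; whether the object code keeps them separate is up to the compiler,
which is what inspecting the output of "gcc -O2 -S" would show.

```c
/* Toy opcodes, hypothetical; not PostgreSQL's step types. */
enum { T_INC, T_DOUBLE, T_DONE };

/* Interpret an opcode array that ends with T_DONE, starting from 'acc'. */
static int run_threaded(const int *op, int acc)
{
    /* Labels-as-values: a GCC/Clang extension, not ISO C. */
    static void *const dispatch[] = { &&do_inc, &&do_double, &&do_done };

/* Advance to the next opcode and jump straight to its handler. */
#define NEXT() goto *dispatch[*++op]

    goto *dispatch[*op];

do_inc:
    acc += 1;
    NEXT();     /* a distinct indirect jump in the source ... */
do_double:
    acc *= 2;
    NEXT();     /* ... though the compiler may merge it with the one above */
do_done:
    return acc;

#undef NEXT
}
```

Running it on the program T_INC, T_INC, T_DOUBLE, T_DONE with acc = 1
computes (1 + 1 + 1) * 2.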

So it seems that we're at the mercy of gcc's whims as to which instruction
dispatches will be distinguishable to the hardware; which casts a very
dark shadow over any benchmarking-based arguments that X is better than Y
for branch prediction purposes.  Compiler version differences are likely
to matter a lot more than anything we do.
        regards, tom lane


