Re: Avoid stack frame setup in performance critical routines using tail calls - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Avoid stack frame setup in performance critical routines using tail calls
Date
Msg-id 20210720155723.dau4xqsnfq72uih5@alap3.anarazel.de
In response to Re: Avoid stack frame setup in performance critical routines using tail calls  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
Hi,

On 2021-07-20 19:37:46 +1200, David Rowley wrote:
> On Tue, 20 Jul 2021 at 19:04, Andres Freund <andres@anarazel.de> wrote:
> > > * AllocateSetAlloc.txt
> > > * palloc.txt
> > > * percent.txt
> >
> > Huh, that's interesting. You have some control flow enforcement
> > stuff turned on (the endbr64). And it looks like it has a non-zero
> > cost (or maybe it's just skid). Did you enable that intentionally?
> > If not, what compiler/version/distro is it? I think at least on GCC
> > that's -fcf-protection=...
>
> It's Ubuntu 21.04 with gcc 10.3 (specifically gcc version 10.3.0
> (Ubuntu 10.3.0-1ubuntu1))
>
> I've attached the same results from compiling with clang 12
> (12.0.0-3ubuntu1~21.04.1)

It looks like the Ubuntu folks have changed the compiler default so that
CET instrumentation (-fcf-protection) is on.


andres@ubuntu2020:~$ echo 'int foo(void) { return 17;}' > test.c && gcc -O2  -c -o test.o test.c && objdump -S test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:    f3 0f 1e fa              endbr64
   4:    b8 11 00 00 00           mov    $0x11,%eax
   9:    c3                       retq
andres@ubuntu2020:~$ echo 'int foo(void) { return 17;}' > test.c && gcc -O2 -fcf-protection=none -c -o test.o test.c && objdump -S test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:    b8 11 00 00 00           mov    $0x11,%eax
   5:    c3                       retq


Independent of this patch, it might be worth running a benchmark with
the default options, and one with -fcf-protection=none. None of my
machines support CET...

$ cpuid -1|grep CET
      CET_SS: CET shadow stack                 = false
      CET_IBT: CET indirect branch tracking    = false
         XCR0 supported: CET_U state          = false
         XCR0 supported: CET_S state          = false

Here the default (CET on) adds about 40kB of .text, but I can't measure
the CET overhead...

Greetings,

Andres Freund


