On Thu, 26 Feb 2026 at 09:29, Andres Freund <andres@anarazel.de> wrote:
> Huh. It, at least partially, seems to be related to using an integer for
> attnum et al. Due to us using -fwrapv, the compiler can't actually assume that
> an attnum++ won't overflow. An overflow would make the loop trip counts a lot
> more complicated. Even with that I don't understand how it ends up
> generating such crappy code, but since using size_t fixes it...
Thanks. That seems to make the gcc compiled version quite a bit better.
I am still seeing a bit of register spilling, as the TupleDesc is
written to the stack and reloaded back into a register a couple of
times. I've attached the objdump in question.
if (attnum < firstNonGuaranteedAttr)
1c3c: 48 39 e8 cmp rax,rbp
1c3f: 73 7f jae 1cc0 <tts_heap_getsomeattrs+0x110>
1c41: 48 89 54 24 f0 mov QWORD PTR [rsp-0x10],rdx
1c46: 48 8d 74 c2 20 lea rsi,[rdx+rax*8+0x20]
The TupleDesc is then reloaded from the stack into the register at:
off += cattr->attlen;
1f88: 48 8b 54 24 f0 mov rdx,QWORD PTR [rsp-0x10]
I've not found a way to have gcc not do this.
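
For anyone following along, here is a minimal sketch of the int versus
size_t difference described in the quoted text. It is not the code from
the patch: the function names and the attlen array are hypothetical
stand-ins for the deforming loop. Under -fwrapv a signed increment is
defined to wrap, so the compiler cannot simply assume the counter never
overflows; with size_t the counter is already pointer-width and needs no
sign extension when used as an index. Whether a given gcc version
actually generates different code for the two is something to confirm
with objdump.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a per-attribute loop; not PostgreSQL code.
 *
 * int counter: with -fwrapv, attnum++ has defined wraparound semantics,
 * so the compiler must either prove no wrap happens or allow for it
 * when it derives the trip count and the array addressing.
 */
int64_t
sum_lengths_int(const int16_t *attlen, int natts)
{
    int64_t off = 0;

    for (int attnum = 0; attnum < natts; attnum++)
        off += attlen[attnum];
    return off;
}

/*
 * size_t counter: unsigned and pointer-width on common 64-bit targets,
 * so the index can be used for addressing without sign extension.
 */
int64_t
sum_lengths_size_t(const int16_t *attlen, size_t natts)
{
    int64_t off = 0;

    for (size_t attnum = 0; attnum < natts; attnum++)
        off += attlen[attnum];
    return off;
}
```

Compiling both with something like "gcc -O2 -fwrapv -c" and comparing
the "objdump -d -M intel" output is one way to see whether the two loops
differ on a particular compiler.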
I've also resequenced the patches so 0002 contains the sibling call
optimisation for slot_getmissingattrs() and I've applied that tail
call optimisation that you mentioned for slot_getmissingattrs() in
0004.
Benchmark results are in the attached spreadsheet.
David