Re: Make tuple deformation faster - Mailing list pgsql-hackers

From David Rowley
Subject Re: Make tuple deformation faster
Date
Msg-id CAApHDvrRiBF2YnACgcV_=k_bHjUrLczFfU4hQPJO2VRPFczDbA@mail.gmail.com
Whole thread Raw
In response to Re: Make tuple deformation faster  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, 6 Mar 2025 at 10:17, Andres Freund <andres@anarazel.de> wrote:
> FWIW, I am fairly certain that I looked at this at an earlier state of the
> patch, and at least for me the issue wasn't that it was inherently slower to
> use the bitmask, but that it was hard to convince the compiler not generate
> worse code.
>
> IIRC the compiler generated more complicated address gathering instructions
> which are slower on some older microarchitectures, but this is a vague memory.

I've been reading GCC's assembly output with -fverbose-asm. I find it
quite hard to follow as the changes between the 16-byte and 8-byte
CompactAttribute versions are vast (see attached).

A few interesting things jump out. e.g, in master:

# execTuples.c:1080: thisatt->attcacheoff = *offp;
.loc 1 1080 26 is_stmt 0 view .LVU1468
movl %ebp, (%rax) # off, MEM[(int *)_22]

whereas with the 8-byte version, I see:

# execTuples.c:1080: thisatt->attcacheoff = *offp;
.loc 1 1080 26 is_stmt 0 view .LVU1484
movl %ebp, 24(%rax) # off, MEM[(int *)_358 + 24B]

You can see the MOVL in the 8-byte version should amount to an
additional micro op to add 24 to RAX before the dereference.

One interesting thing to note about having CompactAttribute in its
8-byte form is that the compiler is tempted into sharing a register
with the tts_values array before Datum is also 8-bytes. Note the
difference in [1] between the two left compiler outputs and the
right-hand one. You can see RCX is dedicated for addressing
CompactAttribute in the right window, but RAX is used for both arrays
in the left two.  I don't 100% know for sure that's the reason for the
slowness with the full version but it does seem from the fragment I
posted just above that RAX does need 24 bytes added in the 8 bytes
version but not in the 16 byte version, so RAX is certainly not
dedicated and ready pointing to attcacheoff at that point.

Jeff, I'm not sure if I understand this well enough to write a
meaningful comment to explain why we don't use bitflags. With my
current knowledge level on this, it's a bit hand-wavy at best. Are you
content with this, or do you want to see something written into the
header comment for CompactAttribute in the code?

David

[1] https://godbolt.org/z/7hWvqdW6E

Attachment

pgsql-hackers by date:

Previous
From: Erik Wienhold
Date:
Subject: Re: CREATE OR REPLACE MATERIALIZED VIEW
Next
From: Amit Kapila
Date:
Subject: Re: Add an option to skip loading missing publication to avoid logical replication failure