Hi,
On 2018-10-01 22:21:58 -0400, Tom Lane wrote:
> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> > At Tue, 25 Sep 2018 16:45:09 -0700, Andres Freund <andres@anarazel.de> wrote in
> > <20180925234509.3hrrf6tmvy5tfith@alap3.anarazel.de>
> >> On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> >>> Pack the boolean members in TupleTableSlot into a 16 bit tts_flags.
> >>> This reduces the size of TupleTableSlot since each bool member takes
> >>> at least a byte on the platforms where bool is defined as a char.
>
> > About bitfields, an advantage of it is debugger awareness. We
> > don't need to look aside to the definitions of bitwise macros
> > while using debugger. And the current code is preserved in
> > appearance by using it.
>
> FWIW, I would expect a change like this to be a net loss performance-wise
> on most platforms. Testing the contents of a byte-wide variable is pretty
> cheap on any architecture invented later than ~ 1970. Testing a bit,
> though, requires a masking operation that is not free. I am not seeing
> how making TupleTableSlot a little smaller buys that back ... we don't
> normally have that many active slots in a plan.
I measured it as a speedup on x86-64, mainly because we require fewer
instructions to reset a slot into an empty state, but also because there
are fewer loads. Masking a register is just about free; loading from
memory isn't, even if the cacheline is in L1. The other benefit is that
this allows TupleTableSlots to fit into one cacheline more often, and
that's noticeable in cache miss rates.
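
To make the reset argument concrete, here is a rough sketch. The struct,
field, and macro names are hypothetical stand-ins chosen for illustration,
not the actual TupleTableSlot definition or the patch itself. With separate
bool members, a reset is written as one store per member (a compiler may or
may not merge them); with a packed flags word it is a single store, and a
test is one load plus a mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "before" layout: one bool member per piece of state. */
typedef struct SlotBools
{
	bool		is_empty;
	bool		should_free;
	bool		should_free_min;
	bool		slow;
} SlotBools;

/* Hypothetical "after" layout: the same state packed into 16 bits. */
#define SLOT_FLAG_EMPTY				(1 << 0)
#define SLOT_FLAG_SHOULDFREE		(1 << 1)
#define SLOT_FLAG_SHOULDFREEMIN		(1 << 2)
#define SLOT_FLAG_SLOW				(1 << 3)

typedef struct SlotFlags
{
	uint16_t	flags;
} SlotFlags;

/* The packed form is never larger than four separate bools. */
static_assert(sizeof(SlotFlags) <= sizeof(SlotBools),
			  "packed flags should not be larger than separate bools");

/* Reset with separate bools: one assignment per member. */
static inline void
reset_bools(SlotBools *slot)
{
	slot->is_empty = true;
	slot->should_free = false;
	slot->should_free_min = false;
	slot->slow = false;
}

/* Reset with packed flags: a single store of the "empty" state. */
static inline void
reset_flags(SlotFlags *slot)
{
	slot->flags = SLOT_FLAG_EMPTY;
}

/* Testing a bit: one load of the flags word, then a register mask. */
static inline bool
flags_is_empty(const SlotFlags *slot)
{
	return (slot->flags & SLOT_FLAG_EMPTY) != 0;
}
```

This only shows the shape of the code; whether it is actually faster on a
given platform still has to be measured.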
Greetings,
Andres Freund