Re: Reduce TupleHashEntryData struct size by half - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Reduce TupleHashEntryData struct size by half
Date
Msg-id f05cff9e-c9e6-4452-83aa-9f6dd795f92d@iki.fi
Whole thread Raw
In response to Re: Reduce TupleHashEntryData struct size by half  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On 18/11/2024 22:22, Jeff Davis wrote:
> On Mon, 2024-11-18 at 12:13 +0200, Heikki Linnakangas wrote:
>> Hmm, it would seem more straightforward to store it in the beginning,
>> i.e. have something like this:
>>
>> struct {
>>        void *additional;
>>        MinimalTupleData mtup;
>> } ;
> 
> That was my first approach, but it requires an additional memcpy,
> because ExecCopySlotMinimalTuple() does it's own palloc. I used
> repalloc() because it will often have space at the end of the chunk
> anyway, and not need to memcpy(). Maybe that's not significant but it
> did seem detectable in some perf tests.
> 
> But perhaps we can go further and get rid of the "additional" pointer
> and inline the pergroup data and the grouping key tuple into the same
> palloc chunk? That would cut out a palloc and the 8 wasted bytes on the
> pointer.

Sounds like a good idea. Needs some changes to the TupleTableSlotOps 
interface to avoid the memcpy I presume.

>> Come to think of it, how important is it that we use MinimalTuple
>> here
>> at all? Some other representation could be faster to deal with in
>> TupleHashTableMatch() anyway.
> 
> What did you have in mind? That sounds like a good idea orthogonal to
> reducing the bucket size.

Queries that have a only a small number of groups might might benefit 
from storing a plain Datum/isnull array instead of a MinimalTuple. That 
would take more memory when you have a lot of groups though.

> Alternatively, MinimalTuple is not very "minimal", and perhaps we can
> just make it better.

Yeah. It tries to be compatible with HeapTuple, but perhaps we should 
give up on that and pack it more tightly instead.

>>> 0004: Removes the "status" field from TupleHashEntryData, using
>>> firstTuple==NULL to mean "empty", otherwise "in use". Hack: need an
>>> additional "special" pointer value to mean "input slot" now that
>>> NULL
>>> means "empty".
>>
>> +1
> 
> For the FIRSTTUPLE_INPUTSLOT marker, do you think it's cleaner to use
> what I did:
> 
>    const static MinimalTuple FIRSTTUPLE_INPUTSLOT = (MinimalTuple) 0x1;
> 
> or something like:
> 
>    static MinimalTupleData dummy = {0};
>    const static MinimalTuple FIRSTTUPLE_INPUTSLOT = &dummy;
> 
> ?

I think I'd do "(MinimalTuple) 0x1" myself, but no strong opinion.

-- 
Heikki Linnakangas
Neon (https://neon.tech)



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: UNION versus collations
Next
From: Heikki Linnakangas
Date:
Subject: Re: Fix an error while building test_radixtree.c with TEST_SHARED_RT