GISTSTATE is too large

From
Andres Freund
Date:
Hi,

On Twitter it was mentioned [1] that GiST index builds spend a lot of time
in FunctionCall3Coll(). That could be addressed to a good degree by not
using FunctionCall3Coll(), which needs to call InitFunctionCallInfoData()
every time, and instead doing so once by including the FunctionCallInfo
in GISTSTATE.
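To make the per-call cost concrete, here is a toy sketch in C (simplified stand-ins, not the real fmgr.h definitions; FCInfo3, call3_every_time and call3_cached are hypothetical names) contrasting the current re-initialize-every-call pattern with a once-initialized FunctionCallInfo:

```c
#include <string.h>

typedef unsigned long Datum;
typedef struct FmgrInfo FmgrInfo;   /* opaque stand-in for the sketch */
typedef struct NullableDatum { Datum value; char isnull; } NullableDatum;

/* Simplified call-info structure with room for three arguments. */
typedef struct FCInfo3
{
    const FmgrInfo *flinfo;
    unsigned int    fncollation;
    short           nargs;
    NullableDatum   args[3];
} FCInfo3;

/* Roughly what FunctionCall3Coll() does today: build the whole
 * call-info structure from scratch on every single call. */
Datum
call3_every_time(const FmgrInfo *flinfo, unsigned int collation,
                 Datum a, Datum b, Datum c)
{
    FCInfo3 fcinfo;

    memset(&fcinfo, 0, sizeof(fcinfo));   /* InitFunctionCallInfoData */
    fcinfo.flinfo = flinfo;
    fcinfo.fncollation = collation;
    fcinfo.nargs = 3;
    fcinfo.args[0].value = a;
    fcinfo.args[1].value = b;
    fcinfo.args[2].value = c;
    return fcinfo.args[0].value;          /* stand-in for invoking the fn */
}

/* The proposed shape: the structure lives in GISTSTATE and is set up
 * once; each call only refreshes the argument slots. */
Datum
call3_cached(FCInfo3 *fcinfo, Datum a, Datum b, Datum c)
{
    fcinfo->args[0].value = a;
    fcinfo->args[1].value = b;
    fcinfo->args[2].value = c;
    return fcinfo->args[0].value;         /* stand-in for invoking the fn */
}
```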

Which made me look at GISTSTATE's layout. And, uh, I was a bit shocked:
struct GISTSTATE {
        MemoryContext              scanCxt;              /*     0     8 */
        MemoryContext              tempCxt;              /*     8     8 */
        TupleDesc                  leafTupdesc;          /*    16     8 */
        TupleDesc                  nonLeafTupdesc;       /*    24     8 */
        TupleDesc                  fetchTupdesc;         /*    32     8 */
        FmgrInfo                   consistentFn[32];     /*    40  1536 */
        /* --- cacheline 24 boundary (1536 bytes) was 40 bytes ago --- */
        FmgrInfo                   unionFn[32];          /*  1576  1536 */
...
        /* --- cacheline 216 boundary (13824 bytes) was 40 bytes ago --- */
        Oid                        supportCollation[32]; /* 13864   128 */

        /* size: 13992, cachelines: 219, members: 15 */
        /* last cacheline: 40 bytes */
};

So the basic GISTSTATE is 14kB large. And all the information needed to
call the support functions for one attribute is spaced so far apart that
it's guaranteed to be on different cache lines and very unlikely to
be prefetched by the hardware prefetcher.

It seems pretty clear that this should be changed to be something more
like

typedef struct GIST_COL_STATE
{
    FmgrInfo    consistentFn;
    FmgrInfo    unionFn;
    FmgrInfo    compressFn;
    FmgrInfo    decompressFn;
    FmgrInfo    penaltyFn;
    FmgrInfo    picksplitFn;
    FmgrInfo    equalFn;
    FmgrInfo    distanceFn;
    FmgrInfo    fetchFn;

    /* Collations to pass to the support functions */
    Oid            supportCollation;
} GIST_COL_STATE;

typedef struct GISTSTATE
{
    MemoryContext scanCxt;        /* context for scan-lifespan data */
    MemoryContext tempCxt;        /* short-term context for calling functions */

    TupleDesc    leafTupdesc;    /* index's tuple descriptor */
    TupleDesc    nonLeafTupdesc; /* truncated tuple descriptor for non-leaf
                                 * pages */
    TupleDesc    fetchTupdesc;    /* tuple descriptor for tuples returned in an
                                 * index-only scan */

        GIST_COL_STATE column_state[FLEXIBLE_ARRAY_MEMBER];
} GISTSTATE;

with initGISTstate allocating based on
IndexRelationGetNumberOfKeyAttributes() instead of using a constant.

And then subsequently change GIST_COL_STATE to embed the
FunctionCallInfo, rather than initializing them on the stack for every
call.
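As a rough sketch of that allocation scheme (simplified, self-contained C; initGISTstate_sketch is a hypothetical name, and the FmgrInfo stand-in just mimics its 48-byte size on typical 64-bit builds):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct FmgrInfo { char pad[48]; } FmgrInfo;  /* size stand-in */
typedef unsigned int Oid;

typedef struct GIST_COL_STATE
{
    FmgrInfo consistentFn;
    FmgrInfo unionFn;
    FmgrInfo compressFn;
    FmgrInfo decompressFn;
    FmgrInfo penaltyFn;
    FmgrInfo picksplitFn;
    FmgrInfo equalFn;
    FmgrInfo distanceFn;
    FmgrInfo fetchFn;
    Oid      supportCollation;  /* collation to pass to the support fns */
} GIST_COL_STATE;

typedef struct GISTSTATE
{
    /* memory contexts and tuple descriptors elided for the sketch */
    int            nkeyatts;
    GIST_COL_STATE column_state[];  /* C99 flexible array member */
} GISTSTATE;

/* Allocate only as many per-column states as the index actually has
 * key attributes, instead of a fixed INDEX_MAX_KEYS (32) per array. */
GISTSTATE *
initGISTstate_sketch(int nkeyatts)
{
    GISTSTATE *st = malloc(offsetof(GISTSTATE, column_state) +
                           nkeyatts * sizeof(GIST_COL_STATE));

    st->nkeyatts = nkeyatts;
    memset(st->column_state, 0, nkeyatts * sizeof(GIST_COL_STATE));
    return st;
}
```

For a single-column index this shrinks the state from ~14kB to well under a kilobyte.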


I'm not planning on doing this work, but I thought it's sensible to send
to the list anyway.

Greetings,

Andres Freund

[1] https://twitter.com/komzpa/status/1386420422225240065



Re: GISTSTATE is too large

From
Andrey Borodin
Date:

> On 26 Apr 2021, at 03:20, Andres Freund <andres@anarazel.de> wrote:
>
> So the basic GISTSTATE is 14kB large. And all the information needed to
> call the support functions for one attribute is spaced so far apart that
> it's guaranteed to be on different cache lines and very unlikely to
> be prefetched by the hardware prefetcher.
>
> It seems pretty clear that this should be changed to be something more
> like
>
> ...
>
> with initGISTstate allocating based on
> IndexRelationGetNumberOfKeyAttributes() instead of using a constant.
>
> And then subsequently change GIST_COL_STATE to embed the
> FunctionCallInfo, rather than initializing them on the stack for every
> call.
Yes, this makes sense. Also, it's viable to reorder the fields to group the scan and insert routines together; currently they
are interlaced.
Or maybe we could even split the state into an insert state and a scan state.


> I'm not planning on doing this work, but I thought it's sensible to send
> to the list anyway.

Thanks for the idea. I'll give it a shot this summer, unless someone else takes it earlier.
BTW, it would make sense to avoid the penalty call altogether: there are many GiST-based access methods that want to see all
items together to choose the insertion subtree (e.g. R*-tree and RR-tree). Calling the penalty function for each tuple on a page
is often not a good idea at all.

Thanks!

Best regards, Andrey Borodin.


Re: GISTSTATE is too large

From
Andreas Karlsson
Date:
On 4/26/21 12:20 AM, Andres Freund wrote:
> It seems pretty clear that this should be changed to be something more
> like
> 
> [...]
> 
> with initGISTstate allocating based on
> IndexRelationGetNumberOfKeyAttributes() instead of using a constant.
> 
> And then subsequently change GIST_COL_STATE to embed the
> FunctionCallInfo, rather than initializing them on the stack for every
> call.
> 
> 
> I'm not planning on doing this work, but I thought it's sensible to send
> to the list anyway.

I did the first part since it seemed easy enough and an obvious win for 
all workloads.

Andreas

Attachment

Re: GISTSTATE is too large

From
Zhihong Yu
Date:


On Sun, May 30, 2021 at 6:14 AM Andreas Karlsson <andreas@proxel.se> wrote:
On 4/26/21 12:20 AM, Andres Freund wrote:
> It seems pretty clear that this should be changed to be something more
> like
>
> [...]
>
> with initGISTstate allocating based on
> IndexRelationGetNumberOfKeyAttributes() instead of using a constant.
>
> And then subsequently change GIST_COL_STATE to embed the
> FunctionCallInfo, rather than initializing them on the stack for every
> call.
>
>
> I'm not planning on doing this work, but I thought it's sensible to send
> to the list anyway.

I did the first part since it seemed easy enough and an obvious win for
all workloads.

Andreas

Hi,
Minor comment:

+   /* Collations to pass to the support functions */
+   Oid         supportCollation;

 Collations -> Collation
The field used to be an array. Now it is one Oid.

Cheers

Re: GISTSTATE is too large

From
Andres Freund
Date:
Hi,

On 2021-05-30 15:14:33 +0200, Andreas Karlsson wrote:
> I did the first part since it seemed easy enough and an obvious win for all
> workloads.

Cool!


> +typedef struct GIST_COL_STATE
> +{
> +    FmgrInfo    consistentFn;
> +    FmgrInfo    unionFn;
> +    FmgrInfo    compressFn;
> +    FmgrInfo    decompressFn;
> +    FmgrInfo    penaltyFn;
> +    FmgrInfo    picksplitFn;
> +    FmgrInfo    equalFn;
> +    FmgrInfo    distanceFn;
> +    FmgrInfo    fetchFn;
> +
> +    /* Collations to pass to the support functions */
> +    Oid            supportCollation;
> +} GIST_COL_STATE;
> +
>  /*
>   * GISTSTATE: information needed for any GiST index operation
>   *
> @@ -83,18 +99,7 @@ typedef struct GISTSTATE
>      TupleDesc    fetchTupdesc;    /* tuple descriptor for tuples returned in an
>                                   * index-only scan */
>  
> -    FmgrInfo    consistentFn[INDEX_MAX_KEYS];
> -    FmgrInfo    unionFn[INDEX_MAX_KEYS];
> -    FmgrInfo    compressFn[INDEX_MAX_KEYS];
> -    FmgrInfo    decompressFn[INDEX_MAX_KEYS];
> -    FmgrInfo    penaltyFn[INDEX_MAX_KEYS];
> -    FmgrInfo    picksplitFn[INDEX_MAX_KEYS];
> -    FmgrInfo    equalFn[INDEX_MAX_KEYS];
> -    FmgrInfo    distanceFn[INDEX_MAX_KEYS];
> -    FmgrInfo    fetchFn[INDEX_MAX_KEYS];
> -
> -    /* Collations to pass to the support functions */
> -    Oid            supportCollation[INDEX_MAX_KEYS];
> +    GIST_COL_STATE column_state[FLEXIBLE_ARRAY_MEMBER];
>  } GISTSTATE;

This makes me wonder if the better design would be to keep the layout of
dense arrays for each type of function, but to make it denser by
allocating dynamically. As above, GIST_COL_STATE is 436 bytes (I think),
i.e. *well* above a cache line - far enough apart that accessing
different columns' equalFn or such will be hard for the hardware
prefetcher to understand. Presumably not all functions are accessed all
the time.

So we could lay it out as

struct GISTSTATE
{
...
    FmgrInfo        *consistentFn;
    FmgrInfo        *unionFn;
...
}
[ncolumns consistentFns follow]
[ncolumns unionFn's follow]

Which'd likely end up with better cache locality for gist indexes with a
few columns. At the expense of a pointer indirection, of course.
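A minimal sketch of that single-allocation, struct-of-arrays layout (make_giststate is a hypothetical helper; only two of the nine support-function arrays are shown, with a 48-byte FmgrInfo stand-in):

```c
#include <stdlib.h>

typedef struct FmgrInfo { char pad[48]; } FmgrInfo;  /* size stand-in */

/* Per-function dense arrays, all carved out of one allocation, so the
 * entries for neighbouring columns sit on adjacent cache lines. */
typedef struct GISTSTATE
{
    int       ncolumns;
    FmgrInfo *consistentFn;  /* points into the trailing block */
    FmgrInfo *unionFn;
    /* ... the remaining support-function arrays elided ... */
} GISTSTATE;

GISTSTATE *
make_giststate(int ncolumns)
{
    /* header plus, here, 2 dense arrays in a single allocation */
    GISTSTATE *st = calloc(1, sizeof(GISTSTATE) +
                           2 * ncolumns * sizeof(FmgrInfo));
    FmgrInfo  *arrays = (FmgrInfo *) (st + 1);

    st->ncolumns = ncolumns;
    st->consistentFn = arrays;           /* ncolumns consistentFns ... */
    st->unionFn = arrays + ncolumns;     /* ... then ncolumns unionFns */
    return st;
}
```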


Another angle: I don't know how it is for GIST, but for btree, the
FunctionCall2Coll() etc overhead shows up significantly - directly
allocating the FunctionCallInfo and initializing it once, instead of
every call, reduces overhead noticeably (but is a bit complicated to
implement, due to the insertion scan and stuff). I'd be surprised if we
didn't get better performance for gist if it had initialized-once
FunctionCallInfos instead of the FmgrInfos.

And that's not even just true because of needing to re-initialize
FunctionCallInfo on every call, but also because the function call
itself rarely accesses the data from the FmgrInfo, but always accesses
the FunctionCallInfo. And a FunctionCallInfo with 1 argument is the
same size as a FmgrInfo, with 2 it's 16 bytes more. So storing the
once-initialized FunctionCallInfo results in considerably better
locality, since the FmgrInfos no longer need to be in cache as well.

One annoying bit: Right now it's not trivial to declare arrays of
specific-number-of-arguments FunctionCallInfos. See the ugliness of
LOCAL_FCINFO. I think we should consider having a bunch of typedefs for
1..3 argument FunctionCallInfo structs to make that easier. Probably
would still need union trickery, but it'd not be *too* bad I think.
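A sketch of what such typedefs could look like, using the same union trickery as LOCAL_FCINFO (the types here are simplified stand-ins loosely modelled on fmgr.h, FCINFO_FIXED is a hypothetical macro, and sizes assume a typical 64-bit build):

```c
#include <stddef.h>

typedef unsigned long Datum;
typedef struct NullableDatum { Datum value; char isnull; } NullableDatum;

/* Simplified stand-in: the real FunctionCallInfoBaseData likewise ends
 * in a flexible array of args, which is why fixed-size variants need a
 * union to reserve the trailing space. */
typedef struct FunctionCallInfoBaseData
{
    void         *flinfo;
    void         *context;
    void         *resultinfo;
    unsigned int  fncollation;
    char          isnull;
    short         nargs;
    NullableDatum args[];        /* flexible array member */
} FunctionCallInfoBaseData;

/* Fixed-argument variants that can be embedded in structs or arrays. */
#define FCINFO_FIXED(N) \
    union \
    { \
        FunctionCallInfoBaseData fcinfo; \
        char fcinfo_data[offsetof(FunctionCallInfoBaseData, args) + \
                         (N) * sizeof(NullableDatum)]; \
    }

typedef FCINFO_FIXED(1) FunctionCallInfo1Data;
typedef FCINFO_FIXED(2) FunctionCallInfo2Data;
typedef FCINFO_FIXED(3) FunctionCallInfo3Data;
```

With this, an array like FunctionCallInfo2Data fcinfos[ncolumns] becomes an ordinary declaration, and each element can be initialized once and reused for every call.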

Greetings,

Andres Freund