Thread: GISTSTATE is too large
Hi,

On Twitter it was mentioned [1] that GiST index builds spend a lot of time in FunctionCall3Coll(). That could be addressed to a good degree by not using FunctionCall3Coll(), which needs to call InitFunctionCallInfoData() every time, but instead doing so once by including the FunctionCallInfo in GISTSTATE.

Which made me look at GISTSTATE's layout. And, uh, I was a bit shocked:

struct GISTSTATE {
	MemoryContext   scanCxt;               /*     0     8 */
	MemoryContext   tempCxt;               /*     8     8 */
	TupleDesc       leafTupdesc;           /*    16     8 */
	TupleDesc       nonLeafTupdesc;        /*    24     8 */
	TupleDesc       fetchTupdesc;          /*    32     8 */
	FmgrInfo        consistentFn[32];      /*    40  1536 */
	/* --- cacheline 24 boundary (1536 bytes) was 40 bytes ago --- */
	FmgrInfo        unionFn[32];           /*  1576  1536 */
	...
	/* --- cacheline 216 boundary (13824 bytes) was 40 bytes ago --- */
	Oid             supportCollation[32];  /* 13864   128 */

	/* size: 13992, cachelines: 219, members: 15 */
	/* last cacheline: 40 bytes */
};

So the basic GISTSTATE is 14kB large. And all the information needed to call the support functions for one attribute is spaced so far apart that it's guaranteed to be on different cachelines and to be very unlikely to be prefetched by the hardware prefetcher.
It seems pretty clear that this should be changed to be something more like:

typedef struct GIST_COL_STATE
{
	FmgrInfo	consistentFn;
	FmgrInfo	unionFn;
	FmgrInfo	compressFn;
	FmgrInfo	decompressFn;
	FmgrInfo	penaltyFn;
	FmgrInfo	picksplitFn;
	FmgrInfo	equalFn;
	FmgrInfo	distanceFn;
	FmgrInfo	fetchFn;

	/* Collations to pass to the support functions */
	Oid			supportCollation;
} GIST_COL_STATE;

typedef struct GISTSTATE
{
	MemoryContext	scanCxt;		/* context for scan-lifespan data */
	MemoryContext	tempCxt;		/* short-term context for calling functions */

	TupleDesc		leafTupdesc;	/* index's tuple descriptor */
	TupleDesc		nonLeafTupdesc;	/* truncated tuple descriptor for non-leaf
									 * pages */
	TupleDesc		fetchTupdesc;	/* tuple descriptor for tuples returned in
									 * an index-only scan */

	GIST_COL_STATE	column_state[FLEXIBLE_ARRAY_MEMBER];
} GISTSTATE;

with initGISTstate allocating based on IndexRelationGetNumberOfKeyAttributes() instead of using a constant.

And then subsequently change GIST_COL_STATE to embed the FunctionCallInfo, rather than initializing them on the stack for every call.

I'm not planning on doing this work, but I thought it's sensible to send it to the list anyway.

Greetings,

Andres Freund

[1] https://twitter.com/komzpa/status/1386420422225240065
> On 26 Apr 2021, at 03:20, Andres Freund <andres@anarazel.de> wrote:
>
> So the basic GISTSTATE is 14kB large. And all the information needed to
> call support functions for one attribute is spaced so far apart that
> it's guaranteed to be on different cachelines and to be very unlikely to
> be prefetched by the hardware prefetcher.
>
> It seems pretty clear that this should be changed to be something more
> like
>
> ...
>
> with initGISTstate allocating based on
> IndexRelationGetNumberOfKeyAttributes() instead of using a constant.
>
> And then subsequently change GIST_COL_STATE to embed the
> FunctionCallInfo, rather than initializing them on the stack for every
> call.

Yes, this makes sense. Also, it's viable to reorder fields to group scan and insert routines together; currently they are interlaced. Or maybe we could even split the state into an insert state and a scan state.

> I'm not planning on doing this work, but I thought it's sensible to send
> to the list anyway.

Thanks for the idea, I'll give it a shot this summer unless someone else takes it earlier.

BTW, it would make sense to avoid the penalty call entirely: there are many GiST-based access methods that want to see all items together to choose the insertion subtree (e.g. R*-tree and RR-tree). Calling the penalty function for each tuple on a page often is not a good idea at all.

Thanks!

Best regards, Andrey Borodin.
On 4/26/21 12:20 AM, Andres Freund wrote:
> It seems pretty clear that this should be changed to be something more
> like
>
> [...]
>
> with initGISTstate allocating based on
> IndexRelationGetNumberOfKeyAttributes() instead of using a constant.
>
> And then subsequently change GIST_COL_STATE to embed the
> FunctionCallInfo, rather than initializing them on the stack for every
> call.
>
> I'm not planning on doing this work, but I thought it's sensible to send
> to the list anyway.

I did the first part since it seemed easy enough and an obvious win for all workloads.

Andreas
On Sun, May 30, 2021 at 6:14 AM Andreas Karlsson <andreas@proxel.se> wrote:
On 4/26/21 12:20 AM, Andres Freund wrote:
> It seems pretty clear that this should be changed to be something more
> like
>
> [...]
>
> with initGISTstate allocating based on
> IndexRelationGetNumberOfKeyAttributes() instead of using a constant.
>
> And then subsequently change GIST_COL_STATE to embed the
> FunctionCallInfo, rather than initializiing them on the stack for every
> call.
>
>
> I'm not planning on doing this work, but I thought it's sensible to send
> to the list anyway.
I did the first part since it seemed easy enough and an obvious win for
all workloads.
Andreas
Hi,
Minor comment:
+ Oid supportCollation;
Collations -> Collation
The field used to be an array. Now it is one Oid.
Cheers
Hi,

On 2021-05-30 15:14:33 +0200, Andreas Karlsson wrote:
> I did the first part since it seemed easy enough and an obvious win for all
> workloads.

Cool!

> +typedef struct GIST_COL_STATE
> +{
> +	FmgrInfo	consistentFn;
> +	FmgrInfo	unionFn;
> +	FmgrInfo	compressFn;
> +	FmgrInfo	decompressFn;
> +	FmgrInfo	penaltyFn;
> +	FmgrInfo	picksplitFn;
> +	FmgrInfo	equalFn;
> +	FmgrInfo	distanceFn;
> +	FmgrInfo	fetchFn;
> +
> +	/* Collations to pass to the support functions */
> +	Oid			supportCollation;
> +} GIST_COL_STATE;
> +
>  /*
>   * GISTSTATE: information needed for any GiST index operation
>   *
> @@ -83,18 +99,7 @@ typedef struct GISTSTATE
>  	TupleDesc	fetchTupdesc;	/* tuple descriptor for tuples returned in an
>  								 * index-only scan */
>
> -	FmgrInfo	consistentFn[INDEX_MAX_KEYS];
> -	FmgrInfo	unionFn[INDEX_MAX_KEYS];
> -	FmgrInfo	compressFn[INDEX_MAX_KEYS];
> -	FmgrInfo	decompressFn[INDEX_MAX_KEYS];
> -	FmgrInfo	penaltyFn[INDEX_MAX_KEYS];
> -	FmgrInfo	picksplitFn[INDEX_MAX_KEYS];
> -	FmgrInfo	equalFn[INDEX_MAX_KEYS];
> -	FmgrInfo	distanceFn[INDEX_MAX_KEYS];
> -	FmgrInfo	fetchFn[INDEX_MAX_KEYS];
> -
> -	/* Collations to pass to the support functions */
> -	Oid			supportCollation[INDEX_MAX_KEYS];
> +	GIST_COL_STATE column_state[FLEXIBLE_ARRAY_MEMBER];
>  } GISTSTATE;

This makes me wonder if the better design would be to keep the layout of dense arrays for each type of function, but to make it more dense by allocating dynamically. As above, GIST_COL_STATE is 436 bytes (I think), i.e. *well* above a cache line - far enough apart that accessing different columns' equalFn or such will be hard for the hardware prefetcher to understand. Presumably not all functions are accessed all the time.

So we could lay it out as

struct GISTSTATE
{
	...
	FmgrInfo   *consistentFn;
	FmgrInfo   *unionFn;
	...
};
[ncolumns consistentFns follow]
[ncolumns unionFns follow]

which would likely end up with better cache locality for GiST indexes with a few columns - at the expense of a pointer indirection, of course.
Another angle: I don't know how it is for GiST, but for btree the FunctionCall2Coll() etc. overhead shows up significantly - directly allocating the FunctionCallInfo and initializing it once, instead of on every call, reduces overhead noticeably (but is a bit complicated to implement, due to the insertion scan and such).

I'd be surprised if we didn't get better performance for GiST if it had initialized-once FunctionCallInfos instead of the FmgrInfos. And that's not just because of needing to re-initialize the FunctionCallInfo on every call, but also because the function call itself rarely accesses the data from the FmgrInfo, while it always accesses the FunctionCallInfo. A FunctionCallInfo with 1 argument is the same size as an FmgrInfo; with 2 it's 16 bytes more. So storing the once-initialized FunctionCallInfo results in considerably better locality, by not having all the FmgrInfos in cache.

One annoying bit: right now it's not trivial to declare arrays of specific-number-of-arguments FunctionCallInfos; see the ugliness of LOCAL_FCINFO. I think we should consider having a bunch of typedefs for 1..3 argument FunctionCallInfo structs to make that easier. It would probably still need union trickery, but it'd not be *too* bad, I think.

Greetings,

Andres Freund