Thread: Re: Should pg 11 use a lot more memory building an spgist index?
I wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> How about modifying SysScanDescData to have a memory context member,
>> which is created by systable_beginscan and destroyed by endscan?

> I think it would still have problems, in that it would affect in which
> context index_getnext delivers its output.  We could reasonably make
> these sorts of changes in places where the entire index_beginscan/
> index_getnext/index_endscan call structure is in one place, but that
> doesn't apply for the systable functions.

Actually, after looking more closely, index_getnext doesn't return a
palloc'd object anyway, or at least not one that the caller is responsible
for managing.  So maybe that could work.

I was confused about why the memory leak in Bruno's example is so much
larger in HEAD than v11; spgbeginscan does allocate more stuff than
before, but only if numberOfOrderBys > 0, which surely doesn't apply for
the exclusion-check code path.  Eventually I realized that the difference
is because commit 2a6368343 made SpGistScanOpaqueData a good deal larger
than it had been (10080 vs 6264 bytes, on my x86_64 box), so it was just
the failure to pfree that struct that accounted for the bigger leak.

There's another issue with 2a6368343, though: it added a couple of
fmgr_info_copy calls to spgbeginscan.  I don't understand why it'd be a
good idea to do that rather than using the FmgrInfos in the index's
relcache entry directly.  Making a copy every time risks a memory leak,
because spgendscan has no way to free any fn_extra data that gets
allocated by the called function; and by the same token it's inefficient,
because the function has no way to cache fn_extra data across queries.

If we consider only the leak aspect, this coding presents a reason why
we should try to change things as Alvaro suggests.  However, because of
the poor-caching aspect, I think this code is pretty broken anyway,
and once we fix it the issue is much less clear-cut.
(Looking around, it seems like we're very schizophrenic about whether
to copy index relcache support function FmgrInfos or just use them
directly.  But nbtree and hash seem to always do the latter, so I think
it's probably reasonable to standardize on that.)

Anyway, the attached proposed patch for HEAD makes Bruno's problem go
away, and it also fixes an unrelated leak introduced by 2a6368343,
because that commit made spgrescan reallocate so->orderByTypes each time
through.  I think we should apply this, and suitable subsets in the back
branches, to fix the immediate problem.  Then we can work at leisure on a
more invasive HEAD-only patch, if anyone is excited about that.

			regards, tom lane

diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index a63fde2..c883ae9 100644
*** a/src/backend/access/spgist/spgscan.c
--- b/src/backend/access/spgist/spgscan.c
*************** spgbeginscan(Relation rel, int keysz, in
*** 294,303 ****
  	/* Set up indexTupDesc and xs_hitupdesc in case it's an index-only scan */
  	so->indexTupDesc = scan->xs_hitupdesc = RelationGetDescr(rel);
  
  	if (scan->numberOfOrderBys > 0)
  	{
! 		so->zeroDistances = palloc(sizeof(double) * scan->numberOfOrderBys);
! 		so->infDistances = palloc(sizeof(double) * scan->numberOfOrderBys);
  
  		for (i = 0; i < scan->numberOfOrderBys; i++)
  		{
--- 294,311 ----
  	/* Set up indexTupDesc and xs_hitupdesc in case it's an index-only scan */
  	so->indexTupDesc = scan->xs_hitupdesc = RelationGetDescr(rel);
  
+ 	/* Allocate various arrays needed for order-by scans */
  	if (scan->numberOfOrderBys > 0)
  	{
! 		/* This will be filled in spgrescan, but allocate the space here */
! 		so->orderByTypes = (Oid *)
! 			palloc(sizeof(Oid) * scan->numberOfOrderBys);
! 
! 		/* These arrays have constant contents, so we can fill them now */
! 		so->zeroDistances = (double *)
! 			palloc(sizeof(double) * scan->numberOfOrderBys);
! 		so->infDistances = (double *)
! 			palloc(sizeof(double) * scan->numberOfOrderBys);
  
  		for (i = 0; i < scan->numberOfOrderBys; i++)
  		{
*************** spgbeginscan(Relation rel, int keysz, in
*** 305,313 ****
  			so->infDistances[i] = get_float8_infinity();
  		}
  
! 		scan->xs_orderbyvals = palloc0(sizeof(Datum) * scan->numberOfOrderBys);
! 		scan->xs_orderbynulls = palloc(sizeof(bool) * scan->numberOfOrderBys);
! 		memset(scan->xs_orderbynulls, true, sizeof(bool) * scan->numberOfOrderBys);
  	}
  
  	fmgr_info_copy(&so->innerConsistentFn,
--- 313,324 ----
  			so->infDistances[i] = get_float8_infinity();
  		}
  
! 		scan->xs_orderbyvals = (Datum *)
! 			palloc0(sizeof(Datum) * scan->numberOfOrderBys);
! 		scan->xs_orderbynulls = (bool *)
! 			palloc(sizeof(bool) * scan->numberOfOrderBys);
! 		memset(scan->xs_orderbynulls, true,
! 			   sizeof(bool) * scan->numberOfOrderBys);
  	}
  
  	fmgr_info_copy(&so->innerConsistentFn,
*************** spgrescan(IndexScanDesc scan, ScanKey sc
*** 336,341 ****
--- 347,353 ----
  	memmove(scan->keyData, scankey,
  			scan->numberOfKeys * sizeof(ScanKeyData));
  
+ 	/* initialize order-by data if needed */
  	if (orderbys && scan->numberOfOrderBys > 0)
  	{
  		int			i;
*************** spgrescan(IndexScanDesc scan, ScanKey sc
*** 343,350 ****
  		memmove(scan->orderByData, orderbys,
  				scan->numberOfOrderBys * sizeof(ScanKeyData));
  
- 		so->orderByTypes = (Oid *) palloc(sizeof(Oid) * scan->numberOfOrderBys);
- 
  		for (i = 0; i < scan->numberOfOrderBys; i++)
  		{
  			ScanKey		skey = &scan->orderByData[i];
--- 355,360 ----
*************** spgendscan(IndexScanDesc scan)
*** 380,390 ****
--- 390,411 ----
  	MemoryContextDelete(so->tempCxt);
  	MemoryContextDelete(so->traversalCxt);
  
+ 	if (so->keyData)
+ 		pfree(so->keyData);
+ 
+ 	if (so->state.deadTupleStorage)
+ 		pfree(so->state.deadTupleStorage);
+ 
  	if (scan->numberOfOrderBys > 0)
  	{
+ 		pfree(so->orderByTypes);
  		pfree(so->zeroDistances);
  		pfree(so->infDistances);
+ 		pfree(scan->xs_orderbyvals);
+ 		pfree(scan->xs_orderbynulls);
  	}
+ 
+ 	pfree(so);
  }
 
 /*
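The fn_extra point above can be sketched with a toy model (plain C, not
PostgreSQL code; FnInfo, support_fn, and the run_scans_* helpers are
invented stand-ins for FmgrInfo and an SP-GiST support function): using
the relcache entry's FmgrInfo directly lets the support function cache its
setup work across scans, while copying it per scan repeats the setup every
time and orphans whatever the copy cached.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for FmgrInfo: the support function uses fn_extra to cache
 * expensive setup work across calls. */
typedef struct FnInfo
{
    void       *fn_extra;       /* per-FnInfo cache, NULL until first call */
} FnInfo;

static int  setup_count = 0;    /* how many times expensive setup ran */
static int  cache_slot;         /* stands in for the cached data */

/* The "support function": does its expensive setup once per FnInfo. */
static void
support_fn(FnInfo *finfo)
{
    if (finfo->fn_extra == NULL)
    {
        setup_count++;          /* expensive setup happens here */
        finfo->fn_extra = &cache_slot;
    }
}

/* Using the relcache entry's FnInfo directly: setup runs only once,
 * no matter how many scans occur. */
static int
run_scans_direct(FnInfo *relcache_finfo, int nscans)
{
    setup_count = 0;
    for (int i = 0; i < nscans; i++)
        support_fn(relcache_finfo);     /* same FnInfo every scan */
    return setup_count;
}

/* Copying the FnInfo per scan, as spgbeginscan did: setup reruns for
 * every scan, and anything a real copy palloc'd for fn_extra would be
 * leaked when the scan ends. */
static int
run_scans_copying(FnInfo *relcache_finfo, int nscans)
{
    setup_count = 0;
    for (int i = 0; i < nscans; i++)
    {
        FnInfo      copy = *relcache_finfo;     /* fmgr_info_copy analogue */

        support_fn(&copy);      /* cache dies with the copy */
    }
    return setup_count;
}
```

The asymmetry is the whole argument: the direct-use convention (as in
nbtree and hash) gets caching for free, while the copying convention pays
the setup cost per scan and has no safe place to free what it allocated.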
On 2018/10/30 4:48, Tom Lane wrote:
> I was confused about why the memory leak in Bruno's example is so much
> larger in HEAD than v11; spgbeginscan does allocate more stuff than
> before, but only if numberOfOrderBys > 0, which surely doesn't apply for
> the exclusion-check code path.  Eventually I realized that the difference
> is because commit 2a6368343 made SpGistScanOpaqueData a good deal larger
> than it had been (10080 vs 6264 bytes, on my x86_64 box), so it was just
> the failure to pfree that struct that accounted for the bigger leak.
>
> There's another issue with 2a6368343, though: it added a couple of
> fmgr_info_copy calls to spgbeginscan.  I don't understand why it'd be a
> good idea to do that rather than using the FmgrInfos in the index's
> relcache entry directly.  Making a copy every time risks a memory leak,
> because spgendscan has no way to free any fn_extra data that gets
> allocated by the called function; and by the same token it's inefficient,
> because the function has no way to cache fn_extra data across queries.
>
> If we consider only the leak aspect, this coding presents a reason why
> we should try to change things as Alvaro suggests.  However, because of
> the poor-caching aspect, I think this code is pretty broken anyway,
> and once we fix it the issue is much less clear-cut.
>
> (Looking around, it seems like we're very schizophrenic about whether
> to copy index relcache support function FmgrInfos or just use them
> directly.  But nbtree and hash seem to always do the latter, so I think
> it's probably reasonable to standardize on that.)

Agreed about trying to avoid fmgr_info_copy(), at least in this case.

By the way, do I get it right that the reason to want to avoid copying in
this instance is that the currently active context could be a long-lived
one, as in the case of create index, alter table add constraint, etc.
calling the copying code for every tuple?
There are other instances of fmgr_info_copy throughout index AM
implementations, including the helper function
ScanKeyEntryInitializeWithInfo() called from them, but the copies made in
those cases don't appear very leak-prone.

> Anyway, the attached proposed patch for HEAD makes Bruno's problem go
> away, and it also fixes an unrelated leak introduced by 2a6368343
> because it reallocates so->orderByTypes each time through spgrescan.
> I think we should apply this, and suitable subsets in the back branches,
> to fix the immediate problem.  Then we can work at leisure on a more
> invasive HEAD-only patch, if anyone is excited about that.

Makes sense to fix it like this for back-patching.

Thanks,
Amit
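The long-lived-context concern raised above can be illustrated with toy
byte accounting (plain C, not PostgreSQL code; ToyContext, ctx_alloc, and
leaked_after_checks are invented for the sketch): if each per-tuple
exclusion check leaves its scan state, including the 10080-byte
SpGistScanOpaqueData Tom measured, in a context that lives for the whole
command, the leak grows linearly with the number of tuples.

```c
#include <assert.h>
#include <stddef.h>

/* Toy byte-accounting model of a long-lived memory context.  We only
 * count bytes; nothing is really allocated. */
typedef struct ToyContext
{
    size_t      live_bytes;     /* allocated and not yet released */
} ToyContext;

static void
ctx_alloc(ToyContext *cxt, size_t sz)
{
    cxt->live_bytes += sz;
}

/* Model of the unpatched behavior: every per-tuple exclusion check leaves
 * its SpGistScanOpaqueData (10080 bytes on Tom's x86_64 box) in the
 * context that is current for the whole command, and nothing pfrees it. */
static size_t
leaked_after_checks(int ntuples)
{
    ToyContext  command_cxt = {0};

    for (int i = 0; i < ntuples; i++)
        ctx_alloc(&command_cxt, 10080);

    return command_cxt.live_bytes;
}
```

For a 100000-row table this predicts about 10080 * 100000 bytes, i.e.
roughly 1GB, which is in the same ballpark as the growth reported
downthread for Bruno's example.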
On 2018/10/30 4:48, Tom Lane wrote:
> I wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> How about modifying SysScanDescData to have a memory context member,
>>> which is created by systable_beginscan and destroyed by endscan?
>
>> I think it would still have problems, in that it would affect in which
>> context index_getnext delivers its output.  We could reasonably make
>> these sorts of changes in places where the entire index_beginscan/
>> index_getnext/index_endscan call structure is in one place, but that
>> doesn't apply for the systable functions.
>
> Actually, after looking more closely, index_getnext doesn't return a
> palloc'd object anyway, or at least not one that the caller is responsible
> for managing.  So maybe that could work.

Instead of SysScanDescData, could we add one to IndexScanDescData?  It's
somewhat clear that catalog scans don't leak much, but user index scans
can, as seen in this case.

In this particular case, I see that it's IndexCheckExclusion() that
invokes check_exclusion_or_unique_constraint() repeatedly for each heap
tuple after it has finished building the index on the heap.  The latter
performs index_beginscan/endscan() for every heap tuple, but doesn't
bother to release the memory allocated for the IndexScanDesc and its
members.

I've tried to fix that with the attached patches.

0001 adds the ability for callers of index_beginscan to specify a memory
context.  index_beginscan_internal switches to that context before calling
ambeginscan and stores it into a new field xs_scan_cxt of
IndexScanDescData, though maybe storing it is unnecessary.

0002 builds on that and adds the ability for callers of
check_exclusion_or_unique_constraint to specify a memory context.  Using
that, it fixes the leak by teaching IndexCheckExclusion and
ExecInsertIndexTuples to pass a memory context that's reset before calling
check_exclusion_or_unique_constraint() for the next tuple.

The following example shows that the patch works.
With HEAD:

create table foo (a int4range);
alter table foo add exclude using spgist (a with &&);
-- this consumes about 1GB
insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
alter table foo drop constraint foo_a_excl;
-- this too
alter table foo add exclude using spgist (a with &&);

Patched:

create table foo (a int4range);
alter table foo add exclude using spgist (a with &&);
-- memory consumption doesn't grow above 37MB
insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
alter table foo drop constraint foo_a_excl;
-- ditto
alter table foo add exclude using spgist (a with &&);

Thoughts?

Thanks,
Amit
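The shape of the 0002 fix can be sketched with toy byte accounting (plain
C, not PostgreSQL code; ToyContext and the helper names are invented for
the sketch): resetting the caller-supplied context before each exclusion
check keeps the peak scan-state memory flat no matter how many tuples are
checked.

```c
#include <assert.h>
#include <stddef.h>

/* Toy byte-accounting model of a resettable memory context. */
typedef struct ToyContext
{
    size_t      live_bytes;     /* currently "allocated" bytes */
    size_t      peak_bytes;     /* high-water mark */
} ToyContext;

static void
ctx_alloc(ToyContext *cxt, size_t sz)
{
    cxt->live_bytes += sz;
    if (cxt->live_bytes > cxt->peak_bytes)
        cxt->peak_bytes = cxt->live_bytes;
}

static void
ctx_reset(ToyContext *cxt)
{
    cxt->live_bytes = 0;        /* bulk release, like MemoryContextReset */
}

/* One exclusion check: the index scan allocates its scan state (the
 * 10080-byte opaque struct; we ignore the smaller arrays) and never
 * explicitly frees it. */
static void
check_one_tuple(ToyContext *cxt)
{
    ctx_alloc(cxt, 10080);
}

/* IndexCheckExclusion-style loop: reset the caller-owned context before
 * each check, so scan state cannot accumulate across tuples. */
static size_t
peak_after_checks(int ntuples)
{
    ToyContext  check_cxt = {0, 0};

    for (int i = 0; i < ntuples; i++)
    {
        ctx_reset(&check_cxt);
        check_one_tuple(&check_cxt);
    }
    return check_cxt.peak_bytes;
}
```

The peak stays at one scan's worth of state regardless of table size,
which is why the patched run stays bounded while HEAD grows linearly.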
On Tue, Oct 30, 2018 at 7:11 PM Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
> On 2018/10/30 4:48, Tom Lane wrote:
> > I wrote:
> >> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> >>> How about modifying SysScanDescData to have a memory context member,
> >>> which is created by systable_beginscan and destroyed by endscan?
> >
> >> I think it would still have problems, in that it would affect in which
> >> context index_getnext delivers its output.  We could reasonably make
> >> these sorts of changes in places where the entire index_beginscan/
> >> index_getnext/index_endscan call structure is in one place, but that
> >> doesn't apply for the systable functions.
> >
> > Actually, after looking more closely, index_getnext doesn't return a
> > palloc'd object anyway, or at least not one that the caller is
> > responsible for managing.  So maybe that could work.
>
> Instead of SysScanDescData, could we add one to IndexScanDescData?  It's
> somewhat clear that catalog scans don't leak much, but user index scans
> can, as seen in this case.
>
> In this particular case, I see that it's IndexCheckExclusion() that
> invokes check_exclusion_or_unique_constraint() repeatedly for each heap
> tuple after it has finished building the index on the heap.  The latter
> performs index_beginscan/endscan() for every heap tuple, but doesn't
> bother to release the memory allocated for the IndexScanDesc and its
> members.
>
> I've tried to fix that with the attached patches.
>
> 0001 adds the ability for callers of index_beginscan to specify a memory
> context.  index_beginscan_internal switches to that context before
> calling ambeginscan and stores it into a new field xs_scan_cxt of
> IndexScanDescData, though maybe storing it is unnecessary.
>
> 0002 builds on that and adds the ability for callers of
> check_exclusion_or_unique_constraint to specify a memory context.  Using
> that, it fixes the leak by teaching IndexCheckExclusion and
> ExecInsertIndexTuples to pass a memory context that's reset before
> calling check_exclusion_or_unique_constraint() for the next tuple.
>
> The following example shows that the patch works.
>
> With HEAD:
>
> create table foo (a int4range);
> alter table foo add exclude using spgist (a with &&);
> -- this consumes about 1GB
> insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
> alter table foo drop constraint foo_a_excl;
> -- this too
> alter table foo add exclude using spgist (a with &&);
>
> Patched:
>
> create table foo (a int4range);
> alter table foo add exclude using spgist (a with &&);
> -- memory consumption doesn't grow above 37MB
> insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
> alter table foo drop constraint foo_a_excl;
> -- ditto
> alter table foo add exclude using spgist (a with &&);

Sorry, I forgot to try the example with your patch.  Maybe it will have
more or less the same behavior as mine, although I didn't realize that
when I started writing my patch.

Thanks,
Amit
On 2018/10/30 21:27, Amit Langote wrote:
> On Tue, Oct 30, 2018 at 7:11 PM Amit Langote
>> I've tried to fix that with the attached patches.
>>
>> 0001 adds the ability for callers of index_beginscan to specify a memory
>> context.  index_beginscan_internal switches to that context before
>> calling ambeginscan and stores it into a new field xs_scan_cxt of
>> IndexScanDescData, though maybe storing it is unnecessary.
>>
>> 0002 builds on that and adds the ability for callers of
>> check_exclusion_or_unique_constraint to specify a memory context.  Using
>> that, it fixes the leak by teaching IndexCheckExclusion and
>> ExecInsertIndexTuples to pass a memory context that's reset before
>> calling check_exclusion_or_unique_constraint() for the next tuple.
>>
>> The following example shows that the patch works.
>>
>> With HEAD:
>>
>> create table foo (a int4range);
>> alter table foo add exclude using spgist (a with &&);
>> -- this consumes about 1GB
>> insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
>> alter table foo drop constraint foo_a_excl;
>> -- this too
>> alter table foo add exclude using spgist (a with &&);
>>
>> Patched:
>>
>> create table foo (a int4range);
>> alter table foo add exclude using spgist (a with &&);
>> -- memory consumption doesn't grow above 37MB
>> insert into foo select int4range(i,i+1) from generate_series(1, 100000) i;
>> alter table foo drop constraint foo_a_excl;
>> -- ditto
>> alter table foo add exclude using spgist (a with &&);
>
> Sorry, I forgot to try the example with your patch.  Maybe it will have
> more or less the same behavior as mine, although I didn't realize that
> when I started writing my patch.

Yep, I checked that fix-spgist-memory-leaks-1.patch posted upthread gives
almost the same numbers, i.e., the same maximum amount of memory consumed.
Maybe we don't need to spoil the interface of index_beginscan with the new
memory context argument like my patch does, if simply having amendscan
follow its contract would suffice.

Thanks,
Amit
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> Maybe we don't need to spoil the interface of index_beginscan with the
> new memory context argument like my patch does, if simply having
> amendscan follow its contract would suffice.

Yeah, I'm not enamored of changing the API of index_beginscan for this;
the existing convention that it should allocate in CurrentMemoryContext
seems perfectly suitable.  And changing it would create a lot of code
churn, not only for us but for externally-maintained AMs.

What is at stake is changing the API of amendscan, to specify that it
need not release memory because the current context is expected to be
destroyed or reset shortly after ending the scan.  Then, for the small
number of call sites where that wouldn't be true, it's on those callers
to set up a suitable context and switch into it.  Note this is actually
forwards compatible, in that an AM that's still following the convention
of releasing stuff manually would not be broken.

			regards, tom lane
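The division of labor proposed above can be sketched with one more toy
model (plain C, not PostgreSQL code; every name is an invented analogue:
ToyContext for a memory context, toy_beginscan/toy_endscan for the AM
entry points): amendscan frees nothing, and the few callers whose current
context is long-lived wrap each scan in a short-lived context of their
own.

```c
#include <assert.h>
#include <stddef.h>

static size_t process_live_bytes = 0;   /* all outstanding toy allocations */

/* Toy memory context: allocations are charged to it, and deleting it
 * releases everything it owns at once. */
typedef struct ToyContext
{
    size_t      live_bytes;
} ToyContext;

static void
ctx_alloc(ToyContext *cxt, size_t sz)
{
    cxt->live_bytes += sz;
    process_live_bytes += sz;
}

static void
ctx_delete(ToyContext *cxt)
{
    process_live_bytes -= cxt->live_bytes;      /* bulk release, like
                                                 * MemoryContextDelete */
    cxt->live_bytes = 0;
}

/* ambeginscan allocates scan state into the current context ... */
static void
toy_beginscan(ToyContext *current_cxt)
{
    ctx_alloc(current_cxt, 10080);
}

/* ... and under the proposed contract, amendscan pfrees nothing. */
static void
toy_endscan(void)
{
}

/* A call site whose current context is long-lived (IndexCheckExclusion,
 * say) wraps each scan in its own context; nothing accumulates even
 * though endscan releases nothing. */
static size_t
leaked_after_scans(int nscans)
{
    for (int i = 0; i < nscans; i++)
    {
        ToyContext  scan_cxt = {0};     /* AllocSetContextCreate analogue */

        toy_beginscan(&scan_cxt);
        toy_endscan();
        ctx_delete(&scan_cxt);
    }
    return process_live_bytes;
}
```

In real terms the caller-side pattern would be a context create plus
MemoryContextSwitchTo around index_beginscan..index_endscan, followed by
deleting or resetting that context; most existing call sites, whose
current context already dies soon after the scan, would need no change.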
I see that a fix got committed. Thanks! I'll double check it after the point release comes out (which looks like it will be next week) and let you know if there is still a problem.