Re: [HACKERS] aggregation memory leak and fix - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: [HACKERS] aggregation memory leak and fix |
Date | |
Msg-id | 199903211920.OAA28744@candle.pha.pa.us Whole thread Raw |
In response to | Re: [HACKERS] aggregation memory leak and fix (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [HACKERS] aggregation memory leak and fix
|
List | pgsql-hackers |
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > > My only quick solution would seem to be to add a new "expression" memory > > context, that can be cleared after every tuple is processed, clearing > > out temporary values allocated inside an expression. > > Right, this whole problem of growing backend memory use during a large > SELECT (or COPY, or probably a few other things) is one of the things > that we were talking about addressing by revising the memory management > structure. > > I think what we want inside the executor is a distinction between > storage that must live to the end of the statement and storage that is > only needed while processing the current tuple. The second kind of > storage would go into a separate context that gets flushed every so > often. (It could be every tuple, or every dozen or hundred tuples > depending on what seems the best tradeoff of cycles against memory > usage.) > > I'm not sure that just two contexts is enough, either. For example in > SELECT field1, SUM(field2) GROUP BY field1; > the working memory for the SUM aggregate could not be released after > each tuple, but perhaps we don't want it to live for the whole statement > either --- in that case we'd need a per-group context. (This particular > example isn't very convincing, because the same storage for the SUM > *could* be recycled from group to group. But I don't know whether it > actually *is* reused or not. If fresh storage is palloc'd for each > instantiation of SUM then we have a per-group leak in this scenario. > In any case, I'm not sure all aggregate functions have constant memory > requirements that would let them recycle storage across groups.) > > What we need to do is work out what the best set of memory context > definitions is, and then decide on a strategy for making sure that > lower-level routines allocate their return values in the right context. > It'd be nice if the lower-level routines could still call palloc() and > not have to worry about this explicitly --- otherwise we'll break not > only a lot of our own code but perhaps a lot of user code. (User- > specific data types and SPI code all use palloc, no?) Let me make an argument here. Let's suppose that we want to free all the memory used as expression intermediate values after each row is processed. It is my understanding that all these are created in utils/adt/*.c files, and that the entry point to all those functions via fmgr()/fmgr_c(). So, if we go into an expression memory context before calling fmgr/fmgr_c in the executor, and return to the normal context after the function call, all our intermediates are trapped in the expression memory context. At the end of each row, we just free the expression memory context. In almost all cases, the data is stored in tuples, and we can free it. In a few cases like aggregates, we have to save off the value we need to keep before freeing the expression context. In fact, you could even optimize the cleanup to only do free'ing if some expression memory was allocated. In most cases, it is not. In fact the nodeAgg.c patch that I backed out attempted to do that, though because there wasn't code that checked if the Datum was pg_type.typbyval, it didn't work 100%. In fact, a quick look at the executor shows:#$ grep "fmgr(" *.c |detab -t 4execUtils.c: predString = fmgr(F_TEXTOUT, &indexStruct->indpred);nodeGroup.c: val1 = fmgr(typoutput, attr1, typelem,nodeGroup.c: val2 = fmgr(typoutput, attr2,typelem,nodeUnique.c: val1 = fmgr(typoutput, attr1, typelem,nodeUnique.c: val2 = fmgr(typoutput, attr2, typelem,spi.c: return (fmgr(foutoid, val, typelem, #$ grep "fmgr_c(" *.c |detab -t 4execQual.c: return (Datum) fmgr_c(&fcache->func, (FmgrValues *)argV, isNull);nodeAgg.c: value1[aggno] = (Datum) fmgr_c(&aggfns->xfn1,nodeAgg.c: value2[aggno] = (Datum) fmgr_c(&aggfns->xfn2,nodeAgg.c: value1[aggno] = (Datum) fmgr_c(&aggfns->finalfn, The fmgr(out*) calls are probably not an issue, because they are already cleaned up. The only issue are the fmgr_c calls. execQual is the MAJOR one for all expressions, and the nodeAgg calls would have to have some saving of the last entry done. Seems pretty straight-forward to me. The fact is we have a pretty clean memory allocation system. Not sure if a redesign is necessary if we can clean up any per-tuple allocations, and I think this would do the trick. Comments? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
pgsql-hackers by date: