Thread: GiST memory allocation
Memory allocation in access/gist/gist.c is pretty heinous, IMHO. There are retail pallocs and pfrees all over the place, and the requirements for which allocations need to be released and by whom is pretty messy. AFAICS, GiST doesn't take any advantage of the palloc() infrastructure beyond treating palloc() as a better malloc(). One relatively easy fix would be to create a per-backend "GiST context" as a static variable in gist.c. At the external entry points to gist.c (of which there are only three, gistbuild(), gistinsert(), and gistbulkdelete()), the memory context would be initialized if necessary and switched into. palloc() will still be used to allocate memory, but most/all of the retail pfrees can be eliminated; instead, we can reset the private memory context before returning to the caller of the GiST entry point (the observation here is that 99% of the allocations done by gist.c are for internal use only -- we rarely allocate anything that needs to live longer than the current GiST API call). Additional resets can be done where appropriate to reduce memory consumption -- for example, we will definitely need a reset in gistbuildCallback(). I haven't looked that carefully at the various clients of the GiST API (e.g. contrib/btree_gist and so on), but I think this should simplify them as well: rather than needing to ensure they pfree all their allocations, they can now assume that they are invoked in a relatively short-lived memory context. One downside is that the GiST API won't be reentrant, but I don't think that's a major problem. Comments? Is there a better way to clean up GiST's memory allocation? (I looked primarily at gist.c; from a brief glance at gistscan.c, it doesn't seem nearly so bad.) -Neil
Neil Conway <neilc@samurai.com> writes: > AFAICS, GiST doesn't take any advantage of the palloc() infrastructure > beyond treating palloc() as a better malloc(). This is pretty much true of all the index AMs, I think. I looked briefly at using a short-term memory context in the btree code, but gave it up after seeing that many of the allocations have to survive from one index_getnext call to the next. It didn't seem that there would be much net gain in notational simplicity. > (the observation here is that 99% of the allocations done by > gist.c are for internal use only -- we rarely allocate anything that > needs to live longer than the current GiST API call). You sure about that? In btree quite a lot of the allocs need to survive across the current scan. > One relatively easy fix would be to create a per-backend "GiST context" > as a static variable in gist.c. > ... > One downside is that the GiST API won't be reentrant, but I don't think > that's a major problem. I think this is not acceptable. It certainly wouldn't be acceptable for btree --- you couldn't even use a PL-language function as an index operator, because the PL itself would need to do system catalog accesses that could result in re-entrant btree scans. If you got away with it for GiST it would only be because GiST is a stepchild that the system doesn't depend on. That doesn't sound like the direction to go in. You could look at creating per-scan and/or per-tuple-cycle memory contexts during gistbeginscan, keeping pointers to them in the indexscan structure. However this only works for scans --- not sure if there are any data structures that are common to scans and the non-scan operations. regards, tom lane
On Tue, 2004-11-02 at 02:20, Tom Lane wrote: > Neil Conway <neilc@samurai.com> writes: > > (the observation here is that 99% of the allocations done by > > gist.c are for internal use only -- we rarely allocate anything that > > needs to live longer than the current GiST API call). > > You sure about that? In btree quite a lot of the allocs need to survive > across the current scan. I'm specifically talking about gist.c, which just handles index creation, tuple insertion and tuple deletion -- AFAICS we only rarely need to allocate anything that lives beyond the current API call. Scans are handled by gistscan.c. It might be nice to make better use of memory contexts in both, but it's gist.c that is particularly in need at the moment, I think. > I think this is not acceptable. It certainly wouldn't be acceptable for > btree --- you couldn't even use a PL-language function as an index > operator, because the PL itself would need to do system catalog accesses > that could result in re-entrant btree scans. If you got away with it > for GiST it would only be because GiST is a stepchild that the system > doesn't depend on. That doesn't sound like the direction to go in. One alternative is to create memory contexts for each insertion / creation / deletion operation, but that is pretty ugly, and probably inefficient for insertion/deletion. I can't really think of a better solution than a static memory context. I don't think the reentrency will be a problem right now, but if it becomes a problem in the future we can solve it via some book-keeping (e.g. on entry to GiST bump a counter, when the counter == 0 and we're exiting a GiST API then reset the memory context). -Neil
Neil Conway <neilc@samurai.com> writes: > One alternative is to create memory contexts for each insertion / > creation / deletion operation, but that is pretty ugly, and probably > inefficient for insertion/deletion. I don't believe memory context creation is very much worse than a malloc (and it's certainly not that much worse than a context reset). If you can't buy back the time spent by avoiding some retail pfrees, then this whole exercise becomes very questionable anyway. > I can't really think of a better solution than a static memory context. > I don't think the reentrency will be a problem right now, but if it > becomes a problem in the future we can solve it via some book-keeping > (e.g. on entry to GiST bump a counter, when the counter == 0 and we're > exiting a GiST API then reset the memory context). And then you have created an error-recovery cleanup issue. I'm really going to have to say NO, don't do that. In my professional opinion this is a bad decision; especially so when the only real reason for doing it at all is code cleanliness. The cleanup is a bit marginal in the first place, and it is definitely not worth the price of turning re-entrant code into non-re-entrant code, even if you can't yet foresee the day when that will bite you. regards, tom lane
Tom Lane wrote: > I don't believe memory context creation is very much worse than a malloc > (and it's certainly not that much worse than a context reset). > If you can't buy back the time spent by avoiding some retail pfrees, then > this whole exercise becomes very questionable anyway. Hmm, okay -- I'll just create and destroy the contexts for each API call then. Thanks for the advice. -Neil