Thread: GiST: memory allocation, cleanup
Attached is a patch that makes some cleanups and improvements to GiST, as well as a few semi-related cleanups. Changes: - previously, GiST did not make any use of the backend's memory context infrastructure. This made implementing indexes using GiST unnecessarily difficult and fragile: if your implementation of a GiST method leaked memory, this could result in a potentially large memory leak in PostgreSQL itself. This patch changes GiST so that all invocations of used-defined GiST methods are done within a short-lived memory context. In practice, this means GiST implementors need not use pfree, unless they are doing a _lot_ of allocations in their method implementations. As part of this work, memory contexts were also used to cleanup a lot of ugly and difficult to maintain memory management code in gist.c itself. QUESTION: given a ScanKey for an index scan, GiST overwrites the ScanKey's sk_func to point to the user-defined Consistent method (gistrescan() in gistscan.c). When doing a GiST index scan we can ensure that the sk_func is invoked in a short-lived memory context (gistfindnext() in gistget.c). Is it safe to assume that anywhere else in the backend that invokes the ScanKey's sk_func, it is done in a short-lived memory context? - previously, src/include/access/gist.h included both the "public" API that is intended for use by GiST implementors as well as declarations for the GiST implementation itself. I've created a new header file "gist_private.h" and moved the latter declarations there. - remove some "extern" keywords from function definitions in contrib/btree_gist, and do some more minor code cleanup there - minor GiST documentation cleanup - mark the array that indicates NULLs that is passed to index_formtuple() as 'const', and fix the resulting fallout - after doing an index build, there was previously a long comment and a few lines of code that used the # of index tuples and heap tuples (computed by the index build) to update the optimizer's statistics. This code was duplicated 4 times (once for each type of index); I added a function called IndexUpdateStats() in catalog/index.c and changed each index's build function to call that function. - in GiST, move the definition of a bunch of scope-local variables to the most tightly-enclosing scope. In other words, code like: { int a, b; a = ...; if (a) { b = ...; } } was changed to move the definition of "b" into the inner scope. This is per good style, IMHO (and it is consistent with the rest of the PostgreSQL source). - moved gistfreestack() to gistscan.c and made it "static", since it is only used there - cleaned up various other things in gist.c: added some comments, basically tried to impose some sanity Comments welcome. I would like to apply this to 8.1 at some point after we branch for 8.0. I thought I would post this patch to get some feedback before starting on further GiST work (speculatively, adding support for concurrency and making it WAL-safe). -Neil
Attachment
Neil Conway <neilc@samurai.com> writes: > Attached is a patch that makes some cleanups and improvements to GiST, > as well as a few semi-related cleanups. Changes: > ... > QUESTION: given a ScanKey for an index scan, GiST overwrites the > ScanKey's sk_func to point to the user-defined Consistent method > (gistrescan() in gistscan.c). When doing a GiST index scan we can ensure > that the sk_func is invoked in a short-lived memory context > (gistfindnext() in gistget.c). Is it safe to assume that anywhere else > in the backend that invokes the ScanKey's sk_func, it is done in a > short-lived memory context? I think you can assume that noplace else in the backend will invoke the sk_func, period. If it did, it'd be passing the wrong parameters --- the Consistent method doesn't take the same args as a plain indexable operator would. > - mark the array that indicates NULLs that is passed to > index_formtuple() as 'const', and fix the resulting fallout I'm a bit dubious about this, mainly because you did not likewise const-ify the other input arguments; it seems confusing to do a partial const-ification. > - after doing an index build, there was previously a long comment and a > few lines of code that used the # of index tuples and heap tuples > (computed by the index build) to update the optimizer's statistics. This > code was duplicated 4 times (once for each type of index); I added a > function called IndexUpdateStats() in catalog/index.c and changed each > index's build function to call that function. The only thing I don't like about this is that it's not apparent that the function would close the heap & index relations, so the calling code will now look like it's leaking the relation references. Maybe call it IndexCloseAndUpdateStats or something like that? regards, tom lane
On Fri, Nov 05, 2004 at 04:41:00PM +1100, Neil Conway wrote: > Comments welcome. I would like to apply this to 8.1 at some point after > we branch for 8.0. I thought I would post this patch to get some > feedback before starting on further GiST work (speculatively, adding > support for concurrency and making it WAL-safe). What are you going to base your concurrency work on? I wonder because I skimmed through Marcel Kornacker's theses, and for example it mentioned use of predicate locking (among other interesting things). Do you have any ideas yet on how to handle that? -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Oh, oh, las chicas galacianas, lo harán por las perlas, ¡Y las de Arrakis por el agua! Pero si buscas damas Que se consuman como llamas, ¡Prueba una hija de Caladan! (Gurney Halleck)
Alvaro Herrera wrote: > What are you going to base your concurrency work on? I wonder because I > skimmed through Marcel Kornacker's theses, and for example it mentioned > use of predicate locking (among other interesting things). Do you have > any ideas yet on how to handle that? I've read a paper[1] on concurrency and recovery by Kornacker, C. Mohan and Hellerstein, which is presumably pretty close to Kornacker's thesis work (which I'll definitely read later). My understanding is that there's no need to implement predicate locking for GiST any more than for B+-trees -- it's a nice feature to prevent phantoms, but not a requirement for implementing page-level concurrency. So I haven't given any thought to implementing it. If I've misunderstood, please let me know. -Neil [1] http://db.cs.berkeley.edu/papers/sigmod97-gist.pdf
Tom Lane wrote: > I think you can assume that noplace else in the backend will invoke the > sk_func, period. Great, thanks for the info. >>- mark the array that indicates NULLs that is passed to >>index_formtuple() as 'const', and fix the resulting fallout > > I'm a bit dubious about this, mainly because you did not likewise > const-ify the other input arguments; it seems confusing to do a partial > const-ification. Well, "partial const-ification" is the rule rather than the exception in the backend right now. I'll take a look at adding more const qualifiers, but I don't really see why "partial const-ification" is confusing. > The only thing I don't like about this is that it's not apparent that > the function would close the heap & index relations, so the calling code > will now look like it's leaking the relation references. Maybe call it > IndexCloseAndUpdateStats or something like that? Good point -- I'll make that change. -Neil
Neil Conway <neilc@samurai.com> writes: > Tom Lane wrote: >> I'm a bit dubious about this, mainly because you did not likewise >> const-ify the other input arguments; it seems confusing to do a partial >> const-ification. > Well, "partial const-ification" is the rule rather than the exception in > the backend right now. I'll take a look at adding more const qualifiers, > but I don't really see why "partial const-ification" is confusing. In this particular case I think it's confusing because the Datum and nulls arrays are really two halves of a single data structure. Const-ifying just one of them obscures that fact. I'd be happy if you marked both of them const. regards, tom lane