Thread: GiST: memory allocation, cleanup

GiST: memory allocation, cleanup

From
Neil Conway
Date:
Attached is a patch that makes some cleanups and improvements to GiST,
as well as a few semi-related cleanups. Changes:

- previously, GiST did not make any use of the backend's memory context
infrastructure. This made implementing indexes using GiST unnecessarily
difficult and fragile: if your implementation of a GiST method leaked
memory, this could result in a potentially large memory leak in
PostgreSQL itself. This patch changes GiST so that all invocations of
used-defined GiST methods are done within a short-lived memory context.
In practice, this means GiST implementors need not use pfree, unless
they are doing a _lot_ of allocations in their method implementations.

As part of this work, memory contexts were also used to cleanup a lot of
ugly and difficult to maintain memory management code in gist.c itself.

QUESTION: given a ScanKey for an index scan, GiST overwrites the
ScanKey's sk_func to point to the user-defined Consistent method
(gistrescan() in gistscan.c). When doing a GiST index scan we can ensure
that the sk_func is invoked in a short-lived memory context
(gistfindnext() in gistget.c). Is it safe to assume that anywhere else
in the backend that invokes the ScanKey's sk_func, it is done in a
short-lived memory context?

- previously, src/include/access/gist.h included both the "public" API
that is intended for use by GiST implementors as well as declarations
for the GiST implementation itself. I've created a new header file
"gist_private.h" and moved the latter declarations there.

- remove some "extern" keywords from function definitions in
contrib/btree_gist, and do some more minor code cleanup there

- minor GiST documentation cleanup

- mark the array that indicates NULLs that is passed to
index_formtuple() as 'const', and fix the resulting fallout

- after doing an index build, there was previously a long comment and a
few lines of code that used the # of index tuples and heap tuples
(computed by the index build) to update the optimizer's statistics. This
code was duplicated 4 times (once for each type of index); I added a
function called IndexUpdateStats() in catalog/index.c and changed each
index's build function to call that function.

- in GiST, move the definition of a bunch of scope-local variables to
the most tightly-enclosing scope. In other words, code like:

{
    int a, b;

    a = ...;
    if (a)
    {
        b = ...;
    }
}

was changed to move the definition of "b" into the inner scope. This is
per good style, IMHO (and it is consistent with the rest of the
PostgreSQL source).

- moved gistfreestack() to gistscan.c and made it "static", since it is
only used there

- cleaned up various other things in gist.c: added some comments,
basically tried to impose some sanity

Comments welcome. I would like to apply this to 8.1 at some point after
we branch for 8.0. I thought I would post this patch to get some
feedback before starting on further GiST work (speculatively, adding
support for concurrency and making it WAL-safe).

-Neil


Attachment

Re: GiST: memory allocation, cleanup

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Attached is a patch that makes some cleanups and improvements to GiST,
> as well as a few semi-related cleanups. Changes:
> ...

> QUESTION: given a ScanKey for an index scan, GiST overwrites the
> ScanKey's sk_func to point to the user-defined Consistent method
> (gistrescan() in gistscan.c). When doing a GiST index scan we can ensure
> that the sk_func is invoked in a short-lived memory context
> (gistfindnext() in gistget.c). Is it safe to assume that anywhere else
> in the backend that invokes the ScanKey's sk_func, it is done in a
> short-lived memory context?

I think you can assume that noplace else in the backend will invoke the
sk_func, period.  If it did, it'd be passing the wrong parameters ---
the Consistent method doesn't take the same args as a plain indexable
operator would.

> - mark the array that indicates NULLs that is passed to
> index_formtuple() as 'const', and fix the resulting fallout

I'm a bit dubious about this, mainly because you did not likewise
const-ify the other input arguments; it seems confusing to do a partial
const-ification.

> - after doing an index build, there was previously a long comment and a
> few lines of code that used the # of index tuples and heap tuples
> (computed by the index build) to update the optimizer's statistics. This
> code was duplicated 4 times (once for each type of index); I added a
> function called IndexUpdateStats() in catalog/index.c and changed each
> index's build function to call that function.

The only thing I don't like about this is that it's not apparent that
the function would close the heap & index relations, so the calling code
will now look like it's leaking the relation references.  Maybe call it
IndexCloseAndUpdateStats or something like that?

            regards, tom lane

Re: GiST: memory allocation, cleanup

From
Alvaro Herrera
Date:
On Fri, Nov 05, 2004 at 04:41:00PM +1100, Neil Conway wrote:

> Comments welcome. I would like to apply this to 8.1 at some point after
> we branch for 8.0. I thought I would post this patch to get some
> feedback before starting on further GiST work (speculatively, adding
> support for concurrency and making it WAL-safe).

What are you going to base your concurrency work on?  I wonder because I
skimmed through Marcel Kornacker's theses, and for example it mentioned
use of predicate locking (among other interesting things).  Do you have
any ideas yet on how to handle that?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Oh, oh, las chicas galacianas, lo harán por las perlas,
¡Y las de Arrakis por el agua! Pero si buscas damas
Que se consuman como llamas, ¡Prueba una hija de Caladan! (Gurney Halleck)


Re: GiST: memory allocation, cleanup

From
Neil Conway
Date:
Alvaro Herrera wrote:
> What are you going to base your concurrency work on?  I wonder because I
> skimmed through Marcel Kornacker's theses, and for example it mentioned
> use of predicate locking (among other interesting things).  Do you have
> any ideas yet on how to handle that?

I've read a paper[1] on concurrency and recovery by Kornacker, C. Mohan
and Hellerstein, which is presumably pretty close to Kornacker's thesis
work (which I'll definitely read later). My understanding is that
there's no need to implement predicate locking for GiST any more than
for B+-trees -- it's a nice feature to prevent phantoms, but not a
requirement for implementing page-level concurrency. So I haven't given
any thought to implementing it. If I've misunderstood, please let me know.

-Neil

[1] http://db.cs.berkeley.edu/papers/sigmod97-gist.pdf

Re: GiST: memory allocation, cleanup

From
Neil Conway
Date:
Tom Lane wrote:
> I think you can assume that noplace else in the backend will invoke the
> sk_func, period.

Great, thanks for the info.

>>- mark the array that indicates NULLs that is passed to
>>index_formtuple() as 'const', and fix the resulting fallout
>
> I'm a bit dubious about this, mainly because you did not likewise
> const-ify the other input arguments; it seems confusing to do a partial
> const-ification.

Well, "partial const-ification" is the rule rather than the exception in
the backend right now. I'll take a look at adding more const qualifiers,
but I don't really see why "partial const-ification" is confusing.

> The only thing I don't like about this is that it's not apparent that
> the function would close the heap & index relations, so the calling code
> will now look like it's leaking the relation references.  Maybe call it
> IndexCloseAndUpdateStats or something like that?

Good point -- I'll make that change.

-Neil

Re: GiST: memory allocation, cleanup

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Tom Lane wrote:
>> I'm a bit dubious about this, mainly because you did not likewise
>> const-ify the other input arguments; it seems confusing to do a partial
>> const-ification.

> Well, "partial const-ification" is the rule rather than the exception in
> the backend right now. I'll take a look at adding more const qualifiers,
> but I don't really see why "partial const-ification" is confusing.

In this particular case I think it's confusing because the Datum and
nulls arrays are really two halves of a single data structure.
Const-ifying just one of them obscures that fact.  I'd be happy if you
marked both of them const.

            regards, tom lane