Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date
Msg-id 9144.1317402482@sss.pgh.pa.us
Whole thread Raw
In response to Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
List pgsql-hackers
I wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
>> Isn't it possible to cache signature of newitem in gtrgm_penalty
>> like gtrgm_consistent do this for query?

> [ studies that code for awhile ... ]  Ick, what a kluge.

> The main problem with that code is that the cache data gets leaked at
> the conclusion of a scan.  Having just seen the consequences of leaking
> the "giststate", I think this is something we need to fix not emulate.

> I wonder whether it's worth having the GIST code create a special
> scan-lifespan (or insert-lifespan) memory context that could be used
> for cached data such as this?  It's already creating a couple of
> contexts for its own purposes, so one more might not be a big problem.
> We'd have to figure out a way to make that context available to GIST
> support functions, though, as well as something cleaner than fn_extra
> for them to keep pointers in.

I've been chewing on this for awhile and am now thinking that maybe what
gtrgm_consistent is doing isn't that unreasonable.  It's certainly
legitimate for it to use fn_extra to maintain state between calls.
The problem fundamentally is that when a function uses fn_extra to
reference data it keeps in the fn_mcxt context, there's an implicit
contract that fn_extra will survive for the same length of time that the
fn_mcxt context does.  (Otherwise there's no way for the function to
avoid leaking that data after it's been called for the last time using
that FmgrInfo.)  And GIST is violating that assumption: it resets
fn_extra when it does initGISTstate, but fn_mcxt gets set to
CurrentMemoryContext, which in the problematic cases is a query-lifespan
context that will still be around after the GIST indexscan is concluded.

So what I'm thinking we ought to do is redefine things so that
initGISTstate sets fn_mcxt to a context that has the same lifespan as
the GISTSTATE itself does.  We could possibly eliminate a few retail
pfree's in the process, eg by keeping the GISTSTATE itself in that same
context.

Having done that, what gtrgm_consistent is doing would be an officially
supported (dare I suggest even documented?) thing instead of a kluge,
and we could then adopt the same methodology in gtrgm_penalty.

Comments?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: [v9.2] Fix Leaky View Problem
Next
From: Jamie Fox
Date:
Subject: Re: Mismatch of relation names: pg_toast.pg_toast_nnn during pg_upgrade from 8.4 to 9.1