Re: Lifecycle management - Mailing list pgsql-hackers

From Thomas Hallgren
Subject Re: Lifecycle management
Date
Msg-id 435A9C68.5050509@tada.se
In response to Re: Lifecycle management  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: Lifecycle management
List pgsql-hackers
Martijn van Oosterhout wrote:
> On Sat, Oct 22, 2005 at 01:49:16PM +0200, Thomas Hallgren wrote:
>   
>> PL/Java has gone through a series of stability improvements over the 
>> last couple of weeks. Now it's time to perhaps improve things even more 
>> but that requires a little help from PostgreSQL itself (nothing related 
>> to threads though ;-) )
>>
>> PL/Java has various "wrapper" objects for PostgreSQL structures. In 
>> essence, such a wrapper holds on to a pointer to some structure and 
>> dispatch calls to backend functions. The challenge is to make sure that 
>> the wrapped pointer is valid at all times. PL/Java uses three distinct 
>> approaches to accomplish this:
>>     
>
> I'm curious. What objects are you holding pointers to where you don't
> know how long the lifetime is? The backend has pretty clear rules about
> how long something lives for.
>
>   
I guess some of my questions originate in a lack of knowledge about the
rules you mention. I haven't been able to find documentation that
explains them thoroughly, and I haven't been able to fully deduce them
from looking at the backend code (partly due to my own laziness perhaps).
Another reason is that I'm trying to marry two ways of handling object
life cycles: the Java style, using a garbage collector, and the backend
style, stacking MemoryContexts. I want the marriage to be somewhat
generic and resilient to change.

Let's assume that one Java function executes a query through SPI. The
query itself calls another Java function that returns SETOF <complex
type>. Each tuple returned from this query could potentially be used 'as
is' in the caller, i.e. the inner Java function could use the same
wrapper instance as the caller Java function if I had full control over
the life cycle of the HeapTuples that are passed on. At present, I copy
those tuples and use different wrappers.
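
The copy-on-hand-off approach described above can be illustrated with a
minimal, self-contained sketch. It does not use the PostgreSQL API:
ToyArena, arena_alloc, arena_reset and arena_copy are hypothetical
stand-ins for a per-call context and a longer-lived one, and the "tuple"
is just a byte string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 256

/* Toy bump allocator standing in for a memory context (assumption). */
typedef struct ToyArena {
    unsigned char buf[ARENA_SIZE];
    size_t used;
} ToyArena;

/* Hand out n bytes from the arena, or NULL if it is full.
 * No alignment handling: the sketch only stores char data. */
void *arena_alloc(ToyArena *a, size_t n)
{
    if (n > ARENA_SIZE - a->used)
        return NULL;
    void *p = a->buf + a->used;
    a->used += n;
    return p;
}

/* Drop everything at once; individual objects are never freed.
 * The buffer is zeroed so that stale reads are easy to spot. */
void arena_reset(ToyArena *a)
{
    memset(a->buf, 0, sizeof a->buf);
    a->used = 0;
}

/* Copy n bytes into dst so they outlive the arena they came from. */
char *arena_copy(ToyArena *dst, const char *src, size_t n)
{
    char *p = arena_alloc(dst, n);
    if (p != NULL)
        memcpy(p, src, n);
    return p;
}
```

A value copied into the longer-lived arena stays readable after the
per-call arena is reset, at the price of one copy per tuple. Sharing a
single wrapper between the inner and outer function would avoid that
copy, but only if the producer's allocation were known to outlive both.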
> Adding callbacks is going to be a pain, primarily because (AIUI) most
> structures are not explicitly deallocated but simply dropped when the
> memory context is freed. Hence, no callbacks can be called because not
> even the backend knows exactly when the object in question is no longer
> valid. To do so would require registering every object with its
> associated memory context, just so we can call them when the context is
> destroyed. The whole point of the current memory management is to avoid
> that sort of overhead.
>   
OK, I suspected that. My hope was that functions like heap_freetuple()
were guaranteed to be called when a tuple was freed. I realize that
deleting or resetting a MemoryContext makes such calls unnecessary.
> The only other possibility would be to hook into the memory management
> itself so you can be called when the context is reset. Except you still
> don't know the objects in it...
>
>   
I have experimented with code that does this. I extend existing contexts
by swapping function pointers, installing interceptors for certain
calls. I can, for instance, extend the alloc method with something that
builds a doubly linked list through which I can keep track of the
objects that are allocated, each with a pointer to its associated
wrapper. When the context is destroyed or reset, I can traverse that
list. The trouble is that by the time I get hold of the context it's
already too late: some objects have been allocated already, and the
context doesn't expose a method that allows me to iterate over its
objects.
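
The tracking idea can be sketched in a self-contained way. This is a toy
model, not the backend's MemoryContext API: ToyContext, ToyWrapper,
TrackedChunk and the toy_* functions are all hypothetical names, and
malloc/free stand in for the real allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Java-side wrapper object. */
typedef struct ToyWrapper {
    void *native;               /* wrapped pointer; NULL once invalidated */
} ToyWrapper;

/* Header placed in front of every tracked allocation. */
typedef struct TrackedChunk {
    struct TrackedChunk *prev;
    struct TrackedChunk *next;
    ToyWrapper *wrapper;        /* NULL if nothing wraps this chunk */
} TrackedChunk;

/* Toy context: only the head of the doubly linked chunk list. */
typedef struct ToyContext {
    TrackedChunk *head;
} ToyContext;

/* Allocate size bytes and link the chunk into the tracking list.
 * The payload gets pointer alignment only, enough for this sketch. */
void *toy_alloc(ToyContext *ctx, size_t size)
{
    TrackedChunk *chunk = malloc(sizeof(TrackedChunk) + size);
    if (chunk == NULL)
        return NULL;
    chunk->prev = NULL;
    chunk->next = ctx->head;
    chunk->wrapper = NULL;
    if (ctx->head != NULL)
        ctx->head->prev = chunk;
    ctx->head = chunk;
    return chunk + 1;           /* payload starts right after the header */
}

/* Associate a wrapper with a pointer returned by toy_alloc. */
void toy_attach(void *ptr, ToyWrapper *wrapper)
{
    TrackedChunk *chunk = (TrackedChunk *) ptr - 1;
    chunk->wrapper = wrapper;
    wrapper->native = ptr;
}

/* Explicit free: unlink in O(1) (the reason for the double links),
 * invalidate the wrapper, release the memory. */
void toy_free(ToyContext *ctx, void *ptr)
{
    TrackedChunk *chunk = (TrackedChunk *) ptr - 1;
    if (chunk->prev != NULL)
        chunk->prev->next = chunk->next;
    else
        ctx->head = chunk->next;
    if (chunk->next != NULL)
        chunk->next->prev = chunk->prev;
    if (chunk->wrapper != NULL)
        chunk->wrapper->native = NULL;
    free(chunk);
}

/* Reset: walk the list, invalidate wrappers, free every chunk.
 * Returns the number of chunks that were released. */
int toy_reset(ToyContext *ctx)
{
    int freed = 0;
    TrackedChunk *chunk = ctx->head;
    while (chunk != NULL) {
        TrackedChunk *next = chunk->next;
        if (chunk->wrapper != NULL)
            chunk->wrapper->native = NULL;
        free(chunk);
        freed++;
        chunk = next;
    }
    ctx->head = NULL;
    return freed;
}
```

The sketch only works because every allocation goes through toy_alloc
from the start. Anything allocated before the interceptor is installed
has no TrackedChunk header and never appears in the list, which is the
"too late" problem in miniature.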

If I knew that all objects that I look at are indeed allocated in a
MemoryContext, and not on the stack or as part of the allocation of
another object, then I could make assumptions that would enable a
generic and safe way of doing this. From my experience though, I can't
make such assumptions.

Another concern is of course that replacing function pointers in memory
contexts seems a bit dangerous overall. It violates the separation of
concerns between my module and the backend way more than I'm comfortable
with. If there were a mechanism by which I could influence what kind of
context is used for the query nodes etc., things would be different of
course. Then again, what would happen if such a mechanism existed and
several different PLs wanted to influence it in their own ways?
>> - I'd like to know when the return value of a function goes out of 
>> scope. "call-local" is often premature since the structure might survive 
>> and be used in the calling function (which may be Java also).
>>     
>
> When it comes to plan execution, at each node the tuple that is returned
> is assumed to be valid until the next call to that node. If a node
> further up wants to keep it longer (eg Sort node), only then does it
> need to be copied. I don't know what that means in your context.
>   
Nothing probably, since I always copy such nodes and keep them until the
finalizer that destroys the wrapper is called. It would be nice though
if the original producer of the tuple could be told to allocate it in a
designated context from the very start and then *never* free it. That
way, PL/Java would assume full responsibility for the object's
destruction and no copying would be necessary. Today, a HeapTuple that
is returned seems to be freed either by a call to heap_freetuple or by
destroying the context in which it was allocated.
>   
>> Hmm, and the HeapTupleHeader that is passed to RECORD functions, is 
>> there an easy way to transform that into a HeapTuple?
>>     
>
> HeapTupleHeader is a pointer to HeapTupleHeaderData, ie the actual
> data. HeapTuple is a pointer to HeapTupleData which contains a
> HeapTupleHeader and info about the memory context and such. You really
> only deal with the latter unless you're extracting data...
>
> More info would make things a lot clearer.
>   
The primary reason for my desire to wrap the HeapTupleHeader in a
fully-fledged HeapTuple is that a) I can then call heap_copytuple to get
a safe, durable copy and b) I don't need two different wrapper objects
(AFAIK, there is no heap_copytupleheader function).
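
The wrap-then-copy idea can be shown with a small, self-contained
sketch. The types below are toy stand-ins loosely modelled on the
header/tuple split described in the quoted text, not the backend's real
definitions; ToyTupleHeaderData, ToyTupleData and the toy_* functions
are hypothetical names, and malloc stands in for context allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy header: the actual tuple bytes, which carry their own length. */
typedef struct ToyTupleHeaderData {
    size_t total_len;           /* header plus payload, in bytes */
    char   payload[];           /* attribute data follows the header */
} ToyTupleHeaderData;
typedef ToyTupleHeaderData *ToyTupleHeader;

/* Toy tuple: a small descriptor that points at a header. */
typedef struct ToyTupleData {
    size_t         len;         /* copy of the header's total length */
    ToyTupleHeader data;        /* pointer to the tuple bytes */
} ToyTupleData;
typedef ToyTupleData *ToyTuple;

/* Build a header holding n payload bytes (helper for the example). */
ToyTupleHeader toy_make_header(const char *bytes, size_t n)
{
    size_t total = offsetof(ToyTupleHeaderData, payload) + n;
    ToyTupleHeader hdr = malloc(total);
    if (hdr == NULL)
        return NULL;
    hdr->total_len = total;
    memcpy(hdr->payload, bytes, n);
    return hdr;
}

/* Fill a caller-provided descriptor (typically on the stack) so that
 * it refers to an existing header. Nothing is copied here. */
void toy_wrap_header(ToyTupleHeader hdr, ToyTupleData *out)
{
    out->len = hdr->total_len;
    out->data = hdr;
}

/* Durable copy: descriptor and tuple bytes in one allocation, with
 * the data pointer aimed just past the descriptor. */
ToyTuple toy_copytuple(const ToyTupleData *src)
{
    ToyTuple copy = malloc(sizeof(ToyTupleData) + src->len);
    if (copy == NULL)
        return NULL;
    copy->len = src->len;
    copy->data = (ToyTupleHeader) (copy + 1);
    memcpy(copy->data, src->data, src->len);
    return copy;
}
```

With a temporary descriptor on the stack, a single copy routine and a
single wrapper type cover both cases: the copy owns its bytes and stays
valid after the original header has gone away.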

Again, I need advice. I'm not fully aware of all the semantics involved:
how memory contexts are allocated and destroyed, which objects can be
trusted to originate from memory contexts, etc. Pointers to docs or
code that make this clearer would help a great deal.

Kind Regards,
Thomas Hallgren


