Re: Copy data to DSA area - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Copy data to DSA area
Date
Msg-id CA+hUKGKWcWrGave6=_sQ7kWPEKDgFAe7KN9owcVbaSeGwJiieg@mail.gmail.com
In response to RE: Copy data to DSA area  ("Ideriha, Takeshi" <ideriha.takeshi@jp.fujitsu.com>)
Responses Re: Copy data to DSA area  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Tue, Jun 25, 2019 at 12:53 AM Ideriha, Takeshi
<ideriha.takeshi@jp.fujitsu.com> wrote:
> I've rebased the patch to fit the core code rather than extension.
> Regarding shared memory context (ShmContext), I added three
> APIs:
> - CreatePermShmContext
>   create "permanent" context located in shared memory
> - CreateTempShmContext
>   Create "temporary" context located in local memory,
>   Which has buffer to keep track of possible memory leak objects
> - ChangeToPermShmContext
>   Change allocated objects parent to permanent context
>
> When you allocate something, add an object in the Temp ShmContext,
> and re-parent it to Perm ShmContext after it becomes not leaked.

Hi Ideriha-san,

Thanks for working on this!  It's important for us to figure out how
to share caches, so we can make better use of memory.  I think there
are two things going on in this prototyping work:

1.  The temporary/permanent concept.  I think the same problem will
come up in the future when we eventually have a multi-thread model.

2.  Getting a MemoryContext-compatible allocator to work with the
current multi-process model.

So, I'm wondering if we can find a more generic way to do your
ChangeToPermShmContext thing with an API that isn't tied to this
implementation.  Another thing to consider is that we probably still
need an intermediate level of contexts, to hold individual complex
long-lived objects like (say) cached plans.

What do you think about the following?  Even though I know you want to
start with much simpler kinds of cache, I'm looking ahead to the lofty
end-goal of having a shared plan cache.  No doubt, that involves
solving many other problems that don't belong in this thread, but
please indulge me:

1.  We could provide a way to ask any MemoryContext implementation to
create a new context of the same type and set its parent to any
parent, so that it will be automatically deleted when that parent is
deleted.  For example MemoryContextClone(SharedPlanCache,
some_short_lived_context).  That gives you a memory context that will
be deleted with some_short_lived_context (unless you reparent it or
delete it first), but it's of the same type as SharedPlanCache and (in
this particular case) allocates from the same underlying DSA area.
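
In pseudo-code (MemoryContextClone() doesn't exist today, so its name
and signature here are only illustrative):

    /*
     * Hypothetical: get a context of the same type as SharedPlanCache
     * (here, one backed by the same DSA area), but hung off a
     * short-lived parent so it disappears automatically if we error
     * out while building the object.
     */
    MemoryContext plan_cxt;
    MemoryContext oldcxt;

    plan_cxt = MemoryContextClone(SharedPlanCache, some_short_lived_context);

    oldcxt = MemoryContextSwitchTo(plan_cxt);
    /* ... construct/copy the plan tree; allocations land in the DSA area ... */
    MemoryContextSwitchTo(oldcxt);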

In some hypothetical future, SharedPlanCache might use a different
memory context implementation depending on whether you selected a
multi-process or multi-thread build, but when you call
MemoryContextClone(SharedPlanCache, ...) you don't know or care about
that; it just has to be some implementation that supports cloning.

2.  Once you have finished constructing (say) a plan in that memory
context, you can make it 'permanent' by simply calling
MemoryContextSetParent(temp, SharedPlanCache).  That follows existing
practice for how we put stuff into the existing backend-private cache
context.  Otherwise, it is deleted when its current parent is deleted.
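
Continuing that sketch, publication of the finished object is just the
existing reparenting trick (MemoryContextSetParent() is the real API;
plan_cxt and SharedPlanCache are from the sketch above):

    /*
     * Construction succeeded: make the plan permanent by hanging its
     * context under the shared cache context, much as saved plans are
     * reparented under CacheMemoryContext today.  If we had errored
     * out before this line, plan_cxt would have been deleted along
     * with its short-lived parent.
     */
    MemoryContextSetParent(plan_cxt, SharedPlanCache);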

3.  Following existing practice, the cached plan needs to know the
context that it lives in, so that DropCachedPlan() can delete the
whole plan easily according to your LRU scheme (or whatever -- a topic
for another thread).
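
Dropping the entry later would then look much like it does today:
delete the object's own context.  The difference is that the context
pointer has to be reachable from the shared cache entry (the struct
and field names below are invented for illustration):

    typedef struct SharedPlanCacheEntry
    {
        dsa_pointer   plan;      /* root of the plan tree, in the DSA area */
        MemoryContext context;   /* shared context holding the whole plan
                                  * (glossing over how a context pointer is
                                  * made valid across processes) */
        /* ... keys, pins, LRU bookkeeping ... */
    } SharedPlanCacheEntry;

    static void
    drop_shared_plan(SharedPlanCacheEntry *entry)
    {
        /* frees the plan tree and everything else in one go */
        MemoryContextDelete(entry->context);
    }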

One of the implications of the above ideas is that the MemoryContext
object itself has to be in shared memory, and the management of
parent/child links needs to be protected by locks for parts of the
context graph that are shared.  And probably there are other
complications in this area.  I don't claim that any of this is easy,
or necessarily the right way to do it.  But I'm pretty sure these
problems are not artifacts of the multi-process/shared memory model,
they're due to sharing and they'll still be waiting for us in the
multi-threaded future!

+    /* To avoid double free by ShmContextDelete, remove its reference */
+    for (buf = shmContext->temp_buffer; buf != NULL; buf = buf->next)
+    {
+        for (idx = 0; idx < buf->idx; idx++)
+        {
+            if (buf->chunks[idx] == dp)
+            {
+                buf->chunks[idx] = 0;
+                break;
+            }
+        }
+    }

Hmm.  I wonder if we should just make ShmContextFree() do nothing! And
make ShmContextAlloc() allocate (say) 8KB chunks (or larger if needed
for larger allocations) and then hand out small pieces from the
'current' chunk as needed.  Then the only way to free memory is to
destroy contexts, but for the use case being discussed, that might
actually be OK.  I suppose you'd want to call this implementation
something different, like ShmRegionContext, ShmZoneContext or
ShmArenaContext[1].  That's a fairly specialised type of memory
allocator, but it seems like it would suit our usage pattern here: you
build objects to cache them, and then destroy them all at once.  This
would only be efficient if the code that you call to build (or, I
guess, copy) plans doesn't do much palloc/free of temporary stuff,
which would leave dead junk in our context if we did it that way.
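
To make that concrete, here's roughly what I have in mind (names and
details are invented, and a real version would be wired into the
MemoryContext API rather than standalone like this):

    #define SHM_ZONE_BLOCK_SIZE 8192

    typedef struct ShmZoneBlock
    {
        dsa_pointer next;       /* next block owned by this context */
        Size        used;       /* bytes already handed out from payload */
        char        payload[FLEXIBLE_ARRAY_MEMBER];
    } ShmZoneBlock;

    typedef struct ShmZoneContext
    {
        dsa_pointer blocks;     /* head of this context's block list */
    } ShmZoneContext;

    static void *
    ShmZoneAlloc(dsa_area *area, ShmZoneContext *cxt, Size size)
    {
        ShmZoneBlock *block = NULL;

        size = MAXALIGN(size);
        if (cxt->blocks != InvalidDsaPointer)
            block = (ShmZoneBlock *) dsa_get_address(area, cxt->blocks);

        /* start a new block if the current one can't fit this request */
        if (block == NULL ||
            block->used + size >
            SHM_ZONE_BLOCK_SIZE - offsetof(ShmZoneBlock, payload))
        {
            dsa_pointer blockp;

            blockp = dsa_allocate(area,
                                  Max(SHM_ZONE_BLOCK_SIZE,
                                      offsetof(ShmZoneBlock, payload) + size));
            block = (ShmZoneBlock *) dsa_get_address(area, blockp);
            block->next = cxt->blocks;
            block->used = 0;
            cxt->blocks = blockp;
        }

        block->used += size;
        return block->payload + block->used - size;
    }

    static void
    ShmZoneFree(void *ptr)
    {
        /* deliberately a no-op: memory comes back only at context delete */
    }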

Earlier I said that it would probably be OK to use backend-local
memory to track all the objects that need to be freed on error,
because I was considering only the problem of preventing leaks on
error.  I wasn't thinking about point 3 above.  We eventually need to
delete the cached object (plan etc) from the cache (eg
DropCachedPlan()), and so we still need to have all its allocations
together in one place as a context, and that control data needs to be
accessible to other processes.  So in the ShmZoneContext idea, there
would need to be a chunk list allocated in shared memory for eventual
freeing.
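
Continuing the ShmZoneContext sketch, that just means the block list
head (and the context struct itself) lives in the DSA area, so that
whichever backend eventually drops the cache entry can walk the list
and give the blocks back:

    static void
    ShmZoneDelete(dsa_area *area, dsa_pointer cxtp)
    {
        ShmZoneContext *cxt = (ShmZoneContext *) dsa_get_address(area, cxtp);

        /* return every block to the DSA area ... */
        while (cxt->blocks != InvalidDsaPointer)
        {
            ShmZoneBlock *block;
            dsa_pointer  next;

            block = (ShmZoneBlock *) dsa_get_address(area, cxt->blocks);
            next = block->next;
            dsa_free(area, cxt->blocks);
            cxt->blocks = next;
        }

        /* ... and finally the context header itself */
        dsa_free(area, cxtp);
    }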

[1] https://en.wikipedia.org/wiki/Region-based_memory_management

--
Thomas Munro
https://enterprisedb.com
