Thread: Copy data to DSA area
Hi,
Related to my development (putting the relcache and catcache onto shared memory)[1],
I have some questions about how to copy variables into shared memory, especially into a DSA area, and how best to implement that.
Under the current architecture, when initializing some data we often copy it using dedicated functions
such as CreateTupleDescCopyConstr(), datumCopy(), pstrdup() and so on. These copy functions call palloc() and place the
copied data in the current memory context.
But palloc() cannot be used for shared memory. I'm now trying to use DSA (and dshash) to handle data in shared memory,
so dsa_allocate(), for example, is needed instead of palloc(). I came up with three possible implementations.
A. Copy the existing functions and write equivalent DSA-version copy functions such as CreateDSATupleDescCopyConstr(),
datumCopyDSA(), dsa_strdup().
In these functions the logic is the same as today's, but palloc() is replaced with dsa_allocate().
However, this is a brute-force approach, and the duplicated code becomes less readable and maintainable.
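(For illustration only, a minimal sketch of what a plan-A style clone might look like; this is just an example of the pattern, not code from any patch in this thread, and it assumes the usual PostgreSQL headers plus utils/dsa.h:)

    static dsa_pointer
    dsa_strdup(dsa_area *area, const char *in)
    {
        Size        len = strlen(in) + 1;
        dsa_pointer result = dsa_allocate(area, len);

        /* copy into the DSA chunk; callers share the dsa_pointer, not a raw pointer */
        memcpy(dsa_get_address(area, result), in, len);
        return result;
    }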
B. Use the current functions to copy the data into a local memory context temporarily, then copy it again into the DSA area.
This method looks better than plan A because we don't need to write copy-pasted clone functions.
But copying twice would degrade performance.
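(Again purely for illustration, plan B for a TupleDesc might look roughly like this, assuming variables src and area already exist; note that TupleDescCopy() copies only the flat part and drops constraints/defaults, which is exactly the limitation discussed later in this thread:)

    /* first copy: into the local memory context */
    TupleDesc   tmp = CreateTupleDescCopyConstr(src);

    /* second copy: the flat part only, into the DSA area */
    dsa_pointer dp = dsa_allocate(area, TupleDescSize(tmp));

    TupleDescCopy((TupleDesc) dsa_get_address(area, dp), tmp);
    pfree(tmp);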
C. Extend palloc() and MemoryContext.
This is a rough idea, but for instance we could add a new global flag that tells palloc() to use a DSA area instead of a normal MemoryContext.
MemoryContextSwitchToDSA(dsa_area *area) would make subsequent palloc() calls allocate memory from the DSA area,
and MemoryContextSwitchBack(area) would make palloc() behave normally again:
    MemoryContextSwitchToDSA(area);
    palloc(size);
    MemoryContextSwitchBack(area);
Plan C seems handy for DSA users because they can take advantage of existing functions.
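(A minimal sketch of how such a switch could be wired up, assuming a process-local flag that palloc() would consult; the function names follow the proposal above, but everything else is an invented illustration:)

    static dsa_area *palloc_dsa_area = NULL;    /* NULL => normal palloc() behaviour */

    void
    MemoryContextSwitchToDSA(dsa_area *area)
    {
        palloc_dsa_area = area;     /* subsequent palloc() calls would use dsa_allocate() */
    }

    void
    MemoryContextSwitchBack(dsa_area *area)
    {
        Assert(palloc_dsa_area == area);
        palloc_dsa_area = NULL;     /* palloc() reverts to CurrentMemoryContext */
    }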
What do you think about these ideas?
[1]
https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A567B9245%40G01JPEXMBKW04
Regards,
Takeshi Ideriha
On Wed, Nov 7, 2018 at 3:34 PM Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com> wrote:
> Related to my development (putting the relcache and catcache onto shared memory)[1],
> I have some questions about how to copy variables into shared memory, especially into a DSA area, and how best to implement that.
>
> Under the current architecture, when initializing some data we often copy it using dedicated functions
> such as CreateTupleDescCopyConstr(), datumCopy(), pstrdup() and so on. These copy functions call palloc() and place the
> copied data in the current memory context.

Yeah, I faced this problem in typcache.c, and you can see the function share_tupledesc() which copies TupleDesc objects into a DSA area. This doesn't really address the fundamental problem here though... see below.

> But palloc() cannot be used for shared memory. I'm now trying to use DSA (and dshash) to handle data in shared memory,
> so dsa_allocate(), for example, is needed instead of palloc(). I came up with three possible implementations.
>
> A. Copy the existing functions and write equivalent DSA-version copy functions such as CreateDSATupleDescCopyConstr(),
> datumCopyDSA(), dsa_strdup().
> In these functions the logic is the same as today's, but palloc() is replaced with dsa_allocate().
> However, this is a brute-force approach, and the duplicated code becomes less readable and maintainable.
>
> B. Use the current functions to copy the data into a local memory context temporarily, then copy it again into the DSA area.
> This method looks better than plan A because we don't need to write copy-pasted clone functions.
> But copying twice would degrade performance.

It's nice when you can construct objects directly at an address supplied by the caller. In other words, if allocation and object initialisation are two separate steps, you can put the object anywhere you like without copying. That could be on the stack, in an array, inside another object, in regular heap memory, in traditional shared memory, in a DSM segment or in DSA memory. I asked for an alloc/init separation in the Bloom filter code for the same reason. But this still isn't the real problem here...

> C. Extend palloc() and MemoryContext.
> This is a rough idea, but for instance we could add a new global flag that tells palloc() to use a DSA area instead of a normal MemoryContext.
> MemoryContextSwitchToDSA(dsa_area *area) would make subsequent palloc() calls allocate memory from the DSA area,
> and MemoryContextSwitchBack(area) would make palloc() behave normally again:
>     MemoryContextSwitchToDSA(area);
>     palloc(size);
>     MemoryContextSwitchBack(area);
> Plan C seems handy for DSA users because they can take advantage of existing functions.

The problem with plan C is that palloc() has to return a native pointer, but in order to do anything useful with this memory (ie to share it) you also need to get the dsa_pointer, and the palloc() interface doesn't allow for that. Any object that holds a pointer allocated with DSA-hiding-behind-palloc() will be useless for another process.

> What do you think about these ideas?

The real problem is object graphs with pointers. I solved that problem for TupleDesc by making them *almost* entirely flat, in commit c6293249. I say 'almost' because it won't work for constraints or defaults, but that didn't matter for the typcache.c case because it doesn't use those. In other words I danced carefully around the edge of the problem.

In theory, object graphs, trees, lists etc could be implemented in a way that allows for "flat" storage, if they can be allocated in contiguous memory and refer to sub-objects within that space using offsets from the start of the space; then they could be used without having to know whether they are in DSM/DSA memory or regular memory. That seems like a huge project, though. Otherwise they have to hold dsa_pointer, and deal with that in many places. You can see this in the Parallel Hash code: I had to make the hash table data structure able to deal with raw pointers OR dsa_pointer. That would be theoretically doable, but really quite painful, for the whole universe of PostgreSQL node types and data structures.

I know of 3 ideas that would make your idea C work, so that you could share something as complicated as a query plan directly without having to deserialise it to use it:

1. Allow the creation of DSA areas inside the traditional fixed memory segment (instead of DSM), in a fixed-sized space reserved by the postmaster. That is, use dsa.c's ability to allocate and free memory, and possibly free a whole area at once to avoid leaking memory in some cases (like MemoryContexts), but in this mode dsa_pointer would be directly castable to a raw pointer. Then you could provide a regular MemoryContext interface for it, and use it via palloc(), as you said, and all the code that knows how to construct lists and trees and plan nodes etc would All Just Work. It would be your plan C, and all the pointers would be usable in every process, but limited in total size at start-up time.

2. Allow regular DSA in DSM to use raw pointers into DSM segments, by mapping segments at the same address in every backend. This involves reserving a large virtual address range up front in the postmaster, and then managing the space, trapping SEGV to map/unmap segments into parts of that address space as necessary (instead of doing that in dsa_get_address()). AFAIK that would work, but it seems a bit weird to go to such lengths. It would be a kind of home-made simulation of threads. On the other hand, that is what we're already doing in dsa.c, except more slowly due to extra software address translation from funky pseudo-addresses.

3. Something something threads.

--
Thomas Munro
http://www.enterprisedb.com
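(To make the "offsets instead of pointers" idea above concrete, here is a tiny illustrative sketch, not from any patch, of a structure that refers to its sub-objects by offsets from the start of its own contiguous chunk, so the same bytes are valid at whatever address each backend happens to map them:)

    typedef struct FlatStringList
    {
        int     nitems;
        Size    item_offset[FLEXIBLE_ARRAY_MEMBER];     /* offsets from chunk start */
    } FlatStringList;

    /* Works identically whether 'chunk' lives in DSA, a DSM segment or local memory. */
    static const char *
    flat_string_list_item(const char *chunk, int i)
    {
        const FlatStringList *list = (const FlatStringList *) chunk;

        return chunk + list->item_offset[i];
    }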
Hi, thank you for all the comment. It's really helpful. >From: Thomas Munro [mailto:thomas.munro@enterprisedb.com] >Sent: Wednesday, November 7, 2018 1:35 PM > >On Wed, Nov 7, 2018 at 3:34 PM Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com> >wrote: >> Related to my development (putting relcache and catcache onto shared >> memory)[1], >> >> I have some questions about how to copy variables into shared memory, especially >DSA area, and its implementation way. >> >> Under the current architecture when initializing some data, we >> sometimes copy certain data using some specified functions >> >> such as CreateTupleDescCopyConstr(), datumCopy(), pstrdup() and so on. >> These copy functions calls palloc() and allocates the >> >> copied data into current memory context. > >Yeah, I faced this problem in typcache.c, and you can see the function >share_typledesc() which copies TupleDesc objects into a DSA area. >This doesn't really address the fundamental problem here though... see below. I checked share_tupledesc(). My original motivation came from copying tupledesc with constraint like CreateTupleDescCopyConstr(). But yes, as you stated here and bellow without having to copy tupledesc constraint, this method makes sense. > >> But on the shared memory, palloc() cannot be used. Now I'm trying to >> use DSA (and dshash) to handle data on the shared memory >> >> so for example dsa_allocate() is needed instead of palloc(). I hit upon three ways to >implementation. >> >> A. Copy existing functions and write equivalent DSA version copy >> functions like CreateDSATupleDescCopyConstr(), >> >> datumCopyDSA(), dsa_strdup() >> >> In these functions the logic is same as current one but would be replaced palloc() >with dsa_allocate(). >> >> But this way looks too straight forward and code becomes less readable and >maintainable. >> >> B. Using current functions and copy data on local memory context temporarily and >copy it again to DSA area. >> >> This method looks better compared to the plan A because we don't need to write >clone functions with copy-paste. >> >> But copying twice would degrade the performance. > >It's nice when you can construct objects directly at an address supplied by the caller. >In other words, if allocation and object initialization are two separate steps, you can >put the object anywhere you like without copying. That could be on the stack, in an >array, inside another object, in regular heap memory, in traditional shared memory, in >a DSM segment or in DSA memory. I asked for an alloc/init separation in the Bloom >filter code for the same reason. But this still isn't the real problem here... Yes, actually I tried to create a new function TupleDescCopyConstr() which is almost same as TupleDescCopy() except also copying constraints. This is supposed to separate allocation and initialization. But as you pointed out bellow, I had to manage object graph with pointes and faced the difficulty. >> C. Enhance the feature of palloc() and MemoryContext. >> >> This is a rough idea but, for instance, make a new global flag to tell palloc() to >use DSA area instead of normal MemoryContext. >> >> MemoryContextSwitchToDSA(dsa_area *area) indicates following palloc() to >allocate memory to DSA. >> >> And MemoryContextSwitchBack(dsa_area) indicates to palloc is used as normal >one. >> >> MemoryContextSwitchToDSA(dsa_area); >> >> palloc(size); >> >> MemoryContextSwitchBack(dsa_area); >> >> Plan C seems a handy way for DSA user because people can take advantage of >existing functions. 
> >The problem with plan C is that palloc() has to return a native pointer, but in order to >do anything useful with this memory (ie to share it) you also need to get the >dsa_pointer, but the palloc() interface doesn't allow for that. Any object that holds a >pointer allocated with DSA-hiding-behind-palloc() will be useless for another process. Agreed. I didn't have much consideration on this point. >> What do you think about these ideas? > >The real problem is object graphs with pointers. I solved that problem for TupleDesc >by making them *almost* entirely flat, in commit c6293249. I say 'almost' because it >won't work for constraints or defaults, but that didn't matter for the typcache.c case >because it doesn't use those. In other words I danced carefully around the edge of >the problem. > >In theory, object graphs, trees, lists etc could be implemented in a way that allows for >"flat" storage, if they can be allocated in contiguous memory and refer to sub-objects >within that space using offsets from the start of the space, and then they could be used >without having to know whether they are in DSM/DSA memory or regular memory. >That seems like a huge project though. Otherwise they have to hold dsa_pointer, and >deal with that in many places. You can see this in the Parallel Hash code. I had to >make the hash table data structure able to deal with raw pointers OR dsa_pointer. >That's would be theoretically doable, but really quite painful, for the whole universe of >PostgreSQL node types and data structures. Yeah, thank you for summarizing this point. That's one of the difficulty I've faced with but failed to state it in my previous email. I agreed your point. Making everything "flat" is not a realistic choice and holding dsa_pointer here and there and switching native pointer or dsa_pointer using union type or other things can be done but still huge one. >I know of 3 ideas that would make your idea C work, so that you could share >something as complicated as a query plan directly without having to deserialise it to >use it: > >1. Allow the creation of DSA areas inside the traditional fixed memory segment >(instead of DSM), in a fixed-sized space reserved by the postmaster. That is, use >dsa.c's ability to allocate and free memory, and possibly free a whole area at once to >avoid leaking memory in some cases (like MemoryContexts), but in this mode >dsa_pointer would be directly castable to a raw pointer. Then you could provide a >regular MemoryContext interface for it, and use it via palloc(), as you said, and all the >code that knows how to construct lists and trees and plan nodes etc would All Just >Work. It would be your plan C, and all the pointers would be usable in every process, >but limited in total size at start-up time. > >2. Allow regular DSA in DSM to use raw pointers into DSM segments, by mapping >segments at the same address in every backend. This involves reserving a large >virtual address range up front in the postmaster, and then managing the space, >trapping SEGV to map/unmap segments into parts of that address space as necessary >(instead of doing that in dsa_get_address()). AFAIK that would work, but it seems to >be a bit weird to go to such lengths. It would be a kind of home-made simulation of >threads. On the other hand, that is what we're already doing in dsa.c, except more >slowly due to extra software address translation from funky pseudo-addresses. > >3. Something something threads. I'm thinking to go with plan 1. 
Not having to think about address translation seems tempting. Plan 2 (as well as plan 3) looks like a big project.

Regards,
Takeshi Ideriha
On Fri, Nov 9, 2018 at 2:19 PM Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com> wrote: > From: Thomas Munro [mailto:thomas.munro@enterprisedb.com] > >I know of 3 ideas that would make your idea C work, so that you could share > >something as complicated as a query plan directly without having to deserialise it to > >use it: > > > >1. Allow the creation of DSA areas inside the traditional fixed memory segment > >(instead of DSM), in a fixed-sized space reserved by the postmaster. That is, use > >dsa.c's ability to allocate and free memory, and possibly free a whole area at once to > >avoid leaking memory in some cases (like MemoryContexts), but in this mode > >dsa_pointer would be directly castable to a raw pointer. Then you could provide a > >regular MemoryContext interface for it, and use it via palloc(), as you said, and all the > >code that knows how to construct lists and trees and plan nodes etc would All Just > >Work. It would be your plan C, and all the pointers would be usable in every process, > >but limited in total size at start-up time. > > > >2. Allow regular DSA in DSM to use raw pointers into DSM segments, by mapping > >segments at the same address in every backend. This involves reserving a large > >virtual address range up front in the postmaster, and then managing the space, > >trapping SEGV to map/unmap segments into parts of that address space as necessary > >(instead of doing that in dsa_get_address()). AFAIK that would work, but it seems to > >be a bit weird to go to such lengths. It would be a kind of home-made simulation of > >threads. On the other hand, that is what we're already doing in dsa.c, except more > >slowly due to extra software address translation from funky pseudo-addresses. > > > >3. Something something threads. > > I'm thinking to go with plan 1. No need to think about address translation > seems tempting. Plan 2 (as well as plan 3) looks a big project. The existing function dsa_create_in_place() interface was intended to support that, but has never been used in that way so I'm not sure what extra problems will come up. Here are some assorted thoughts: * You can prevent a DSA area from creating extra DSM segments, so that it is constrained to stay entirely in the space you give it, by calling dsa_set_size_limit(area, size) using the same size that you gave to dsa_create_in_place(); now you have a DSA area that manages a single fixed-sized chunk of memory that you gave it, in your case inside the traditional shared memory segment (but it could be anywhere, including inside a DSM segment or another DSA area!) * You can probably write a MemoryContext wrapper for it, if it has only one segment that is in the traditional shared memory segment. You would need to do very simple kind of address translation: the result from palloc() needs to be base + dsa_allocate()'s result, and the argument to pfree() needs to be subtracted from base when dsa_free() is called. That is a version of your idea C that should work AFAIK. * Once you have that working, you now have a new kind of resource management problem on your hands: memory leaks will be cluster-wide and cluster-life-time! That's hard, because the goal is to be able to use arbitrary code in the tree that deals with plans etc, but that code all assumes that it can "throw" (elog()) on errors. PostgreSQL C is generally "garbage collected" (in a way), but in this sketch, that doesn't work anymore: this area *never* goes out of scope and gets cleaned up. 
Generally, languages with exceptions either need garbage collection or scoped destructors to clean up the mess, but in this sketch we don't have that anymore... much like allocating stuff in TopMemoryContext, except worse because it doesn't go away when one backend exits. * I had some ideas about some kind of "allocation rollback" interface: you begin an "allocation transaction", allocate a bunch of stuff (perhaps indirectly, by calling some API that makes query plans or whatever and is totally unaware of this stuff). Then if there is an error, whatever was allocated so far is freed in the usual cleanup paths by a rollback that happens via the resource manager machinery. If you commit, then the allocation becomes permanent. Then you only commit stuff that you promise not to leak (perhaps stuff that has been added to a very carefully managed cluster-wide plan cache). I am not sure of the details, and this might be crazy... -- Thomas Munro http://www.enterprisedb.com
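(A minimal sketch of the address translation described above, assuming a single in-place DSA area living at a fixed address inside the traditional shared memory segment so that the base is the same in every backend; shm_context_alloc/shm_context_free and the globals are invented names for illustration:)

    static dsa_area *shm_area;      /* created with dsa_create_in_place()            */
    static char     *shm_area_base; /* the "place" address of that in-place area     */

    static void *
    shm_context_alloc(Size size)
    {
        dsa_pointer dp = dsa_allocate(shm_area, size);

        /* same base address in every backend, so a raw pointer works everywhere */
        return shm_area_base + dp;
    }

    static void
    shm_context_free(void *ptr)
    {
        dsa_free(shm_area, (dsa_pointer) ((char *) ptr - shm_area_base));
    }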
On Thu, Nov 8, 2018 at 9:05 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > * I had some ideas about some kind of "allocation rollback" interface: > you begin an "allocation transaction", allocate a bunch of stuff > (perhaps indirectly, by calling some API that makes query plans or > whatever and is totally unaware of this stuff). Then if there is an > error, whatever was allocated so far is freed in the usual cleanup > paths by a rollback that happens via the resource manager machinery. > If you commit, then the allocation becomes permanent. Then you only > commit stuff that you promise not to leak (perhaps stuff that has been > added to a very carefully managed cluster-wide plan cache). I am not > sure of the details, and this might be crazy... Hmm, my first thought was that you were just reinventing memory contexts, but it's really not quite the same thing, because you want the allocations to end up owned by a long-lived context when you succeed but a transient context when you fail. Still, if it weren't for the fact that the memory context interface is hostile to dynamic shared memory's map-this-anywhere semantics, I suspect we'd try to find a way to make memory contexts fit the bill, maybe by reparenting contexts or even individual allocations. You could imagine using the sorts of union algorithms that are described in https://en.wikipedia.org/wiki/Disjoint-set_data_structure to get very low asymptotic complexity here. I wonder if it's possible to rethink our current memory context machinery so that it is not so DSM-hostile. At one point, I had the idea of replacing the pointer in the chunk header with an array-offset, which might also avoid repeatedly creating and destroying AllocSetContext objects over and over at high speed. Or you could come up with some intermediate idea: if the value there is MAXALIGN'd, it's a pointer; if not, it's some other kind of identifier that you have to go look up in a table someplace to find the real context. Part of the problem here is that, on the one hand, it's really useful that all memory management in PostgreSQL currently uses a single system: memory contexts. I'd be loathe to change that. On the other hand, there are several different allocation patterns which have noticeably different optimal strategies: 1. allocate for a while and then free everything at once => allocate from a single slab 2. allocate and free for a while and then free anything that's left => allocate from multiple slabs, dividing allocations by size class 3. perform a short series of allocations, freeing everything if we hit an error midway through => keep an allocation log, roll back by retail frees 4. like (3) but with a long series of allocations => allocate from a slab, free the whole slab on error Our current AllocSetContext is not optimal for any of these situations. It uses size classes, but they are rather coarse-grained, which wastes a lot of memory. They also have big headers, which also wastes a lot of memory -- unnecessarily, in the case where we don't need to free anything until the end. The main advantage of AllocSetAlloc is that it's very fast for small contexts while still being able to support individual freeing individual allocations when necessary, though not efficiently. I'm rambling here, I guess... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
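(A rough illustration of the "intermediate idea" mentioned above: a chunk header word that is either a raw MemoryContext pointer or, when not MAXALIGN'd, a tagged index into a lookup table. Purely hypothetical; this is not how aset.c works today, and the table and names are invented:)

    static MemoryContext context_table[1024];   /* registered shareable contexts */

    /* Encode a table index so the value can never look like a MAXALIGN'd pointer. */
    static inline uintptr_t
    chunk_header_from_index(int index)
    {
        return ((uintptr_t) index << 1) | 1;    /* odd value => not MAXALIGN'd */
    }

    static inline MemoryContext
    chunk_header_get_context(uintptr_t header)
    {
        if (header % MAXIMUM_ALIGNOF == 0)
            return (MemoryContext) header;      /* ordinary back pointer   */
        return context_table[header >> 1];      /* look up the real context */
    }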
On Tue, Nov 13, 2018 at 9:45 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Nov 8, 2018 at 9:05 PM Thomas Munro > <thomas.munro@enterprisedb.com> wrote: > > * I had some ideas about some kind of "allocation rollback" interface: > > you begin an "allocation transaction", allocate a bunch of stuff > > (perhaps indirectly, by calling some API that makes query plans or > > whatever and is totally unaware of this stuff). Then if there is an > > error, whatever was allocated so far is freed in the usual cleanup > > paths by a rollback that happens via the resource manager machinery. > > If you commit, then the allocation becomes permanent. Then you only > > commit stuff that you promise not to leak (perhaps stuff that has been > > added to a very carefully managed cluster-wide plan cache). I am not > > sure of the details, and this might be crazy... > > Hmm, my first thought was that you were just reinventing memory > contexts, but it's really not quite the same thing, because you want > the allocations to end up owned by a long-lived context when you > succeed but a transient context when you fail. Still, if it weren't > for the fact that the memory context interface is hostile to dynamic > shared memory's map-this-anywhere semantics, I suspect we'd try to > find a way to make memory contexts fit the bill, maybe by reparenting > contexts or even individual allocations. You could imagine using the > sorts of union algorithms that are described in > https://en.wikipedia.org/wiki/Disjoint-set_data_structure to get very > low asymptotic complexity here. Yeah, I was imagining something that still works with MemoryContexts. Interesting idea re: unions. I haven't got as far as thinking about how you'd actually make that work. But I think we're both describing the same general kind of semantics; you need to be able to build stuff with automatic clean-up on non-local exit, but also keep that stuff for longer once you decide you're ready. Anyway, avoiding all the hard questions about new kinds of foot gun for now, here is a stupid hack that shows a DSA area inside the traditional fixed-address shared memory region, wrapped in a custom (and somewhat defective for now) MemoryContext. It shows a "List" being manipulated by standard PostgreSQL list code that is entirely unaware that it is working in shared memory: postgres=# call hoge_list(3); NOTICE: Contents of list: NOTICE: 1 NOTICE: 2 NOTICE: 3 CALL You can call that procedure from various different backends to add and remove integers from a List. The point is that it could just as easily be a query plan or any other palloc-based stuff in our tree, and the pointers work in any backend. -- Thomas Munro http://www.enterprisedb.com
Attachment
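(The actual code is in the attachment above; purely as a guess at its shape, the demo procedure might look something like the following, where shared_context is a DSA-backed MemoryContext over the fixed-address area and shared_list_head is a List pointer living in shared memory — all of these names are assumptions, not the real patch:)

    #include "postgres.h"
    #include "fmgr.h"
    #include "nodes/pg_list.h"

    /* assumed globals set up elsewhere during shared memory initialization */
    extern MemoryContext shared_context;
    extern List **shared_list_head;

    PG_FUNCTION_INFO_V1(hoge_list);

    Datum
    hoge_list(PG_FUNCTION_ARGS)
    {
        int32       n = PG_GETARG_INT32(0);
        MemoryContext oldcontext;
        ListCell   *lc;

        /* ordinary list code, unaware that it is allocating in shared memory */
        oldcontext = MemoryContextSwitchTo(shared_context);
        *shared_list_head = lappend_int(*shared_list_head, n);
        MemoryContextSwitchTo(oldcontext);

        elog(NOTICE, "Contents of list:");
        foreach(lc, *shared_list_head)
            elog(NOTICE, "%d", lfirst_int(lc));

        PG_RETURN_VOID();
    }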
Thank you for the comment. From: Thomas Munro [mailto:thomas.munro@enterprisedb.com] >> I'm thinking to go with plan 1. No need to think about address >> translation seems tempting. Plan 2 (as well as plan 3) looks a big project. > >The existing function dsa_create_in_place() interface was intended to support that, >but has never been used in that way so I'm not sure what extra problems will come up. >Here are some assorted thoughts: > >* You can prevent a DSA area from creating extra DSM segments, so that it is >constrained to stay entirely in the space you give it, by calling dsa_set_size_limit(area, >size) using the same size that you gave to dsa_create_in_place(); now you have a >DSA area that manages a single fixed-sized chunk of memory that you gave it, in your >case inside the traditional shared memory segment (but it could be anywhere, >including inside a DSM segment or another DSA area!) Yeah, I will use it. >* You can probably write a MemoryContext wrapper for it, if it has only one segment >that is in the traditional shared memory segment. >You would need to do very simple kind of address translation: the result from palloc() >needs to be base + dsa_allocate()'s result, and the argument to pfree() needs to be >subtracted from base when >dsa_free() is called. That is a version of your idea C that should work AFAIK. I didn't notice that if only one segment is used dsa_get_address() is not needed and simple math is enough. >* Once you have that working, you now have a new kind of resource management >problem on your hands: memory leaks will be cluster-wide and cluster-life-time! >That's hard, because the goal is to be able to use arbitrary code in the tree that deals >with plans etc, but that code all assumes that it can "throw" (elog()) on errors. >PostgreSQL C is generally "garbage collected" (in a way), but in this sketch, that >doesn't work anymore: this area *never* goes out of scope and gets cleaned up. >Generally, languages with exceptions either need garbage collection or scoped >destructors to clean up the mess, but in this sketch we don't have that anymore... >much like allocating stuff in TopMemoryContext, except worse because it doesn't go >away when one backend exits. > >* I had some ideas about some kind of "allocation rollback" interface: >you begin an "allocation transaction", allocate a bunch of stuff (perhaps indirectly, by >calling some API that makes query plans or whatever and is totally unaware of this >stuff). Then if there is an error, whatever was allocated so far is freed in the usual >cleanup paths by a rollback that happens via the resource manager machinery. >If you commit, then the allocation becomes permanent. Then you only commit stuff >that you promise not to leak (perhaps stuff that has been added to a very carefully >managed cluster-wide plan cache). I am not sure of the details, and this might be >crazy... Can I check my understanding? The situation you are talking about is the following: Data structure A and B will be allocated on DSA. Data A has a pointer to B. Another data X is already allocated on some area (might be on local heap) and has a pointer variable X->a, which will be linked to A. old_context = MemoryContextSwitchTo(dsa_memory_context); A = palloc(); B = palloc(); A->b = B; X->a = A; MemoryContextSwitchTo(old_context); If error happens in the way of this flow, palloc'd data (A & B) should be freed and pointer to freed data (X->a) should be back to its original one. 
So handling these cases would introduce a "transaction" API like begin_allocate() and end_allocate(). I'm thinking that begin_allocate() starts keeping a record of palloc'd data until end_allocate() is called; if an error occurs, we just pfree() that data. However, to roll back pointers we also need to take note of each pointer's old value. That would introduce another new API, something like points_to_pallocd_data(pointer, new_data), which remembers the old value and then does `X->a = A`. To implement these I still need to think further about how to fit them into, or extend, the existing MemoryContext machinery.

Regards,
Takeshi Ideriha
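(A minimal sketch of the hypothetical points_to_pallocd_data() idea above, just to make the shape of the API concrete; the log structure and all names are invented:)

    typedef struct PointerRollbackEntry
    {
        void      **location;   /* address of the pointer that was overwritten */
        void       *old_value;  /* value to restore on error                   */
    } PointerRollbackEntry;

    static List *pointer_rollback_log = NIL;    /* reset by end_allocate() */

    static void
    points_to_pallocd_data(void **location, void *new_data)
    {
        PointerRollbackEntry *entry = palloc(sizeof(PointerRollbackEntry));

        entry->location = location;
        entry->old_value = *location;
        pointer_rollback_log = lappend(pointer_rollback_log, entry);
        *location = new_data;               /* e.g. X->a = A */
    }

On error, the cleanup path would walk pointer_rollback_log, restore each *location to its old_value, and free the newly allocated data; on end_allocate() the log would simply be discarded.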
From: Thomas Munro [mailto:thomas.munro@enterprisedb.com] > >On Tue, Nov 13, 2018 at 9:45 AM Robert Haas <robertmhaas@gmail.com> wrote: >> On Thu, Nov 8, 2018 at 9:05 PM Thomas Munro >> <thomas.munro@enterprisedb.com> wrote: >> > * I had some ideas about some kind of "allocation rollback" interface: >> > you begin an "allocation transaction", allocate a bunch of stuff >> > (perhaps indirectly, by calling some API that makes query plans or >> > whatever and is totally unaware of this stuff). Then if there is an >> > error, whatever was allocated so far is freed in the usual cleanup >> > paths by a rollback that happens via the resource manager machinery. >> > If you commit, then the allocation becomes permanent. Then you only >> > commit stuff that you promise not to leak (perhaps stuff that has >> > been added to a very carefully managed cluster-wide plan cache). I >> > am not sure of the details, and this might be crazy... >> >> Hmm, my first thought was that you were just reinventing memory >> contexts, but it's really not quite the same thing, because you want >> the allocations to end up owned by a long-lived context when you >> succeed but a transient context when you fail. Still, if it weren't >> for the fact that the memory context interface is hostile to dynamic >> shared memory's map-this-anywhere semantics, I suspect we'd try to >> find a way to make memory contexts fit the bill, maybe by reparenting >> contexts or even individual allocations. You could imagine using the >> sorts of union algorithms that are described in >> https://en.wikipedia.org/wiki/Disjoint-set_data_structure to get very >> low asymptotic complexity here. > >Yeah, I was imagining something that still works with MemoryContexts. >Interesting idea re: unions. I haven't got as far as thinking about how you'd actually >make that work. But I think we're both describing the same general kind of >semantics; you need to be able to build stuff with automatic clean-up on non-local exit, >but also keep that stuff for longer once you decide you're ready. > >Anyway, avoiding all the hard questions about new kinds of foot gun for now, here is a >stupid hack that shows a DSA area inside the traditional fixed-address shared memory >region, wrapped in a custom (and somewhat defective for now) MemoryContext. It >shows a "List" >being manipulated by standard PostgreSQL list code that is entirely unaware that it is >working in shared memory: > >postgres=# call hoge_list(3); >NOTICE: Contents of list: >NOTICE: 1 >NOTICE: 2 >NOTICE: 3 >CALL > >You can call that procedure from various different backends to add and remove >integers from a List. The point is that it could just as easily be a query plan or any >other palloc-based stuff in our tree, and the pointers work in any backend. Thanks for the patch. I know it's a rough sketch but basically that's what I want to do. I tried it and it works well. Regards, Takeshi Ideriha
On Tue, Nov 13, 2018 at 10:59 PM Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com> wrote: > Can I check my understanding? > The situation you are talking about is the following: > Data structure A and B will be allocated on DSA. Data A has a pointer to B. > Another data X is already allocated on some area (might be on local heap) and has a pointer variable X->a, > which will be linked to A. > > old_context = MemoryContextSwitchTo(dsa_memory_context); > A = palloc(); > B = palloc(); > A->b = B; > X->a = A; > MemoryContextSwitchTo(old_context); > > If error happens in the way of this flow, palloc'd data (A & B) should be freed > and pointer to freed data (X->a) should be back to its original one. > So handling these cases introduces an "transaction" API like begin_allocate() and end_allocate(). I wasn't thinking of resetting X->a to its original value, just freeing A and B. For a shared cache in this memory area, you need to make sure that you never leak memory; it must either be referenced by the cache book keeping data structures so that you can find it and manually free it eventually (probably with an LRU system?), or it must somehow be cleaned up if an error occurs before you manage to insert it into the cache. During construction of an object graph, before you attach it to your manual book keeping data structures, there is a danger zone. For example, what if, while copying a query plan or a catcache entry, we successfully allocate a new cache entry with palloc(), but then another palloc() call fails because we run out of memory when copying the keys? It looks like the current catcache.c coding just doesn't care: memory will be leaked permanently in CacheMemoryContext. That's OK because it is process-local. Such a process is probably doomed anyway. If you exit and then start a new backend the problem is gone. Here we need something better, because once this new kind of memory fills up with leaked objects, you have to restart the whole cluster to recover. If the solution involves significantly different control flows (like PG_TRY(), PG_FINALLY() manual cleanup all over the place), then lots of existing palloc-based code will need to be reviewed and rewritten very carefully for use in this new kind of memory context. If it can be done automagically, then more of our existing code will be able to participate in the construction of stuff that we want to put in shared memory. I first thought about this in the context of shared plan caches, a topic that appears on this mailing list from time to time. There are some other fun problems, like cache invalidation for shared caches, space recycling (LRUs etc), and of course locking/concurrency (dsa_allocate() and dsa_free() are concurrency safe, but whatever data structures you build with the memory of course need their own plan for dealing with concurrency). But a strategy for clean-up on errors seems to be a major subproblem to deal with early in the project. I will be very happy if we can come up with something :-) On Wed, Nov 14, 2018 at 12:40 AM Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com> wrote: > From: Thomas Munro [mailto:thomas.munro@enterprisedb.com] > >postgres=# call hoge_list(3); > >NOTICE: Contents of list: > >NOTICE: 1 > >NOTICE: 2 > >NOTICE: 3 > >CALL > > > >You can call that procedure from various different backends to add and remove > >integers from a List. The point is that it could just as easily be a query plan or any > >other palloc-based stuff in our tree, and the pointers work in any backend. > > Thanks for the patch. 
I know it's a rough sketch but basically that's what I want to do. > I tried it and it works well. After your message I couldn't resist a small experiment. But it seems there are plenty more problems to be solved to make it useful... -- Thomas Munro http://www.enterprisedb.com
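(For illustration, the kind of manual-cleanup control flow mentioned above might look like this; the variables area, entry_size and keys, and the helpers copy_keys_into() and insert_into_shared_hash(), are imaginary stand-ins for cache-specific code:)

    dsa_pointer new_entry = dsa_allocate(area, entry_size);

    PG_TRY();
    {
        /* any of this may elog(ERROR), e.g. on out-of-memory */
        copy_keys_into(area, new_entry, keys);
        insert_into_shared_hash(new_entry);   /* now reachable from the cache bookkeeping */
    }
    PG_CATCH();
    {
        dsa_free(area, new_entry);            /* avoid a cluster-lifetime leak */
        PG_RE_THROW();
    }
    PG_END_TRY();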
Hi >From: Thomas Munro [mailto:thomas.munro@enterprisedb.com] >Sent: Wednesday, November 14, 2018 9:50 AM >To: Ideriha, Takeshi/出利葉 健 <ideriha.takeshi@jp.fujitsu.com> > >On Tue, Nov 13, 2018 at 10:59 PM Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com> >wrote: >> Can I check my understanding? >> The situation you are talking about is the following: >> Data structure A and B will be allocated on DSA. Data A has a pointer to B. >> Another data X is already allocated on some area (might be on local >> heap) and has a pointer variable X->a, which will be linked to A. >> >> old_context = MemoryContextSwitchTo(dsa_memory_context); >> A = palloc(); >> B = palloc(); >> A->b = B; >> X->a = A; >> MemoryContextSwitchTo(old_context); >> >> If error happens in the way of this flow, palloc'd data (A & B) should >> be freed and pointer to freed data (X->a) should be back to its original one. >> So handling these cases introduces an "transaction" API like begin_allocate() and >end_allocate(). > >I wasn't thinking of resetting X->a to its original value, just freeing A and B. For a >shared cache in this memory area, you need to make sure that you never leak >memory; it must either be referenced by the cache book keeping data structures so >that you can find it and manually free it eventually (probably with an LRU system?), or >it must somehow be cleaned up if an error occurs before you manage to insert it into >the cache. During construction of an object graph, before you attach it to your >manual book keeping data structures, there is a danger zone. > >For example, what if, while copying a query plan or a catcache entry, we successfully >allocate a new cache entry with palloc(), but then another palloc() call fails because >we run out of memory when copying the keys? It looks like the current catcache.c >coding just doesn't >care: memory will be leaked permanently in CacheMemoryContext. That's OK >because it is process-local. Such a process is probably doomed anyway. If you exit >and then start a new backend the problem is gone. >Here we need something better, because once this new kind of memory fills up with >leaked objects, you have to restart the whole cluster to recover. Thank you for the clarification. I agree that leaving memory leaking in shared memory without freeing it ends up cluster-wide problem. >If the solution involves significantly different control flows (like PG_TRY(), >PG_FINALLY() manual cleanup all over the place), then lots of existing palloc-based >code will need to be reviewed and rewritten very carefully for use in this new kind of >memory context. If it can be done automagically, then more of our existing code will >be able to participate in the construction of stuff that we want to put in shared >memory. About resetting X->a I was thinking about how to treat dangling pointer inside a new memory context machinery instead of using PG_TRY() stuff. That's because putting PG_TRY() all over the place becomes responsible for a developer trying to shared-memory-version MemoryContext and it needs a lot of efforts as you noted. But in my mind even if PG_TRY() is used, it only affects new code uses shared-memory version MemoryContext and doesn't affect current code. I may be missing something here... >I first thought about this in the context of shared plan caches, a topic that appears on >this mailing list from time to time. 
There are some other fun problems, like cache >invalidation for shared caches, space recycling (LRUs etc), and of course >locking/concurrency >(dsa_allocate() and dsa_free() are concurrency safe, but whatever data structures >you build with the memory of course need their own plan for dealing with concurrency). >But a strategy for clean-up on errors seems to be a major subproblem to deal with >early in the project. I will be very happy if we can come up with something :-) Yeah, I hope we can do it somehow. >On Wed, Nov 14, 2018 at 12:40 AM Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com> >wrote: >> From: Thomas Munro [mailto:thomas.munro@enterprisedb.com] >> >postgres=# call hoge_list(3); >> >NOTICE: Contents of list: >> >NOTICE: 1 >> >NOTICE: 2 >> >NOTICE: 3 >> >CALL >> > >> >You can call that procedure from various different backends to add >> >and remove integers from a List. The point is that it could just as >> >easily be a query plan or any other palloc-based stuff in our tree, and the pointers >work in any backend. >> >> Thanks for the patch. I know it's a rough sketch but basically that's what I want to >do. >> I tried it and it works well. > >After your message I couldn't resist a small experiment. But it seems there are >plenty more problems to be solved to make it useful... Thank you for kind remark. By the way I just thought meta variable "hoge" is used only in Japan :) Regards, Takeshi Ideriha
Hello.

At Wed, 28 Nov 2018 05:13:26 +0000, "Ideriha, Takeshi" <ideriha.takeshi@jp.fujitsu.com> wrote in <4E72940DA2BF16479384A86D54D0988A6F3BD73A@G01JPEXMBKW04>
> Hi
>
> >From: Thomas Munro [mailto:thomas.munro@enterprisedb.com]
> >Here we need something better, because once this new kind of memory fills up with
> >leaked objects, you have to restart the whole cluster to recover.
>
> Thank you for the clarification. I agree that leaving memory leaking in shared memory
> without freeing it ends up cluster-wide problem.

If some of the "leaked" objects are linked with other "live" objects, including objects in local memory, we must tell whether each "maybe-leaked" object is actually a leak or not. That looks clearly unreasonable. Maybe we shouldn't manipulate objects in shared memory through such a low-level interface. We would need a kind of resource manager for the structures holding shared objects, and should (perhaps explicitly) register the objects that should be freed when error-exiting from a scope. This seems a bit different from the reset callbacks of the current MemoryContext.

> >If the solution involves significantly different control flows (like PG_TRY(),
> >PG_FINALLY() manual cleanup all over the place), then lots of existing palloc-based
> >code will need to be reviewed and rewritten very carefully for use in this new kind of
> >memory context. If it can be done automagically, then more of our existing code will
> >be able to participate in the construction of stuff that we want to put in shared
> >memory.
>
> About resetting X->a I was thinking about how to treat dangling pointer inside a new memory
> context machinery instead of using PG_TRY() stuff. That's because putting PG_TRY() all over the
> place becomes responsible for a developer trying to shared-memory-version MemoryContext and
> it needs a lot of efforts as you noted. But in my mind even if PG_TRY() is used, it only affects new
> code uses shared-memory version MemoryContext and doesn't affect current code.
> I may be missing something here...

Thomas mentioned existing upper-layer library code using palloc, like pg_list. As I wrote above, pg_list is one such feature that would need to be carefully adjusted.

> >I first thought about this in the context of shared plan caches, a topic that appears on
> >this mailing list from time to time. There are some other fun problems, like cache
> >invalidation for shared caches, space recycling (LRUs etc), and of course
> >locking/concurrency (dsa_allocate() and dsa_free() are concurrency safe, but whatever data structures
> >you build with the memory of course need their own plan for dealing with concurrency).
> >But a strategy for clean-up on errors seems to be a major subproblem to deal with
> >early in the project. I will be very happy if we can come up with something :-)
>
> Yeah, I hope we can do it somehow.

In addition to the recycling issue within a fixed area, I suppose that we will need to "expand" the fixed-address shared memory. PROT_NONE could work for that, but it would break some platforms..

> >On Wed, Nov 14, 2018 at 12:40 AM Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com> wrote:
> >> From: Thomas Munro [mailto:thomas.munro@enterprisedb.com]
> >> >postgres=# call hoge_list(3);
> >> >NOTICE: Contents of list:
> >> >NOTICE: 1
> >> >NOTICE: 2
> >> >NOTICE: 3
> >> >CALL
> >> >
> >> >You can call that procedure from various different backends to add
> >> >and remove integers from a List. The point is that it could just as
> >> >easily be a query plan or any other palloc-based stuff in our tree, and the pointers
> >> >work in any backend.
> >>
> >> Thanks for the patch. I know it's a rough sketch but basically that's what I want to do.
> >> I tried it and it works well.
> >
> >After your message I couldn't resist a small experiment. But it seems there are
> >plenty more problems to be solved to make it useful...
>
> Thank you for kind remark.
> By the way I just thought meta variable "hoge" is used only in Japan :)

I've seen it many, many times on this list, including in my own mail, so surely you know it is already world-wide jargon? :p

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi, thank you for the comment. >-----Original Message----- >From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp] >Sent: Wednesday, November 28, 2018 5:09 PM > >Hello. > >At Wed, 28 Nov 2018 05:13:26 +0000, "Ideriha, Takeshi" ><ideriha.takeshi@jp.fujitsu.com> wrote in ><4E72940DA2BF16479384A86D54D0988A6F3BD73A@G01JPEXMBKW04> >> Hi >> >> >From: Thomas Munro [mailto:thomas.munro@enterprisedb.com] >> >Here we need something better, because once this new kind of memory >> >fills up with leaked objects, you have to restart the whole cluster to recover. >> >> Thank you for the clarification. I agree that leaving memory leaking >> in shared memory without freeing it ends up cluster-wide problem. > >If some of the "leaked" objects are liked each other with other "live" objects including >those on local memory, we must tell whether each "maybe-leaked" object is actually >a leak or not. That looks apprently stupid. Maybe we shouldn't manipulate objects on >shared memory at such a low-level interface. We would need a kind of resource >manager for such structures holding shared objects and should (perhaps explicitly) >register the objects that shuold be free'd on error-exiting from a scope. This seems a >bit different from reset callbacks of the current MemoryContext. Sure. We want to get rid of dangling pointer at the error handling but this should be transparent. >> >If the solution involves significantly different control flows (like >> >PG_TRY(), >> >PG_FINALLY() manual cleanup all over the place), then lots of >> >existing palloc-based code will need to be reviewed and rewritten >> >very carefully for use in this new kind of memory context. If it can >> >be done automagically, then more of our existing code will be able to >> >participate in the construction of stuff that we want to put in shared memory. >> >> About resetting X->a I was thinking about how to treat dangling >> pointer inside a new memory context machinery instead of using >> PG_TRY() stuff. That's because putting PG_TRY() all over the place >> becomes responsible for a developer trying to shared-memory-version >> MemoryContext and it needs a lot of efforts as you noted. But in my mind even if >PG_TRY() is used, it only affects new code uses shared-memory version >MemoryContext and doesn't affect current code. >> I may be missing something here... > >Thomas mentioned exiting upper-layer library code using palloc, like pg_list. As I >wrote above, pg_list is one of such feature that would need to be carefully adjusted. Thank you for the clarification. I think I was misunderstanding. In some previous email I suggested "points_to_pallocd_data(pointer, new_data)" to remember old_value and do reset a pointer at the time of error handling. Yeah, this interface needs to be implemented at existing function wrapping palloc and set the pointer. >> >I first thought about this in the context of shared plan caches, a >> >topic that appears on this mailing list from time to time. There are >> >some other fun problems, like cache invalidation for shared caches, >> >space recycling (LRUs etc), and of course locking/concurrency >> >(dsa_allocate() and dsa_free() are concurrency safe, but whatever >> >data structures you build with the memory of course need their own plan for >dealing with concurrency). >> >But a strategy for clean-up on errors seems to be a major subproblem >> >to deal with early in the project. I will be very happy if we can >> >come up with something :-) >> >> Yeah, I hope we can do it somehow. 
>
> In addition to the recycling issue within a fixed area, I suppose that we will need to
> "expand" the fixed-address shared memory. PROT_NONE could work for that, but it
> would break some platforms..

If we are only talking about DSA created in place inside the Postgres-initialized shared area, I think we can prevent the memory from expanding by using dsa_set_size_limit().

>> Thank you for kind remark.
>> By the way I just thought meta variable "hoge" is used only in Japan :)
>
> I've seen it many, many times on this list, including in my own mail, so surely you know it is already world-wide jargon? :p

Ah, I didn't know that, thanks!

Regards,
Takeshi Ideriha
>From: Ideriha, Takeshi [mailto:ideriha.takeshi@jp.fujitsu.com]
>Sent: Wednesday, December 5, 2018 2:42 PM
>Subject: RE: Copy data to DSA area

Hi,

It's been a long while since we discussed this topic. Let me recap first and then give some thoughts.

Things we seem to have consensus on:
- We want to palloc/pfree transparently in DSA.
- We will use Postgres-initialized shared memory as the DSA area.
- We must not leak memory in shared memory.

Things under discussion:
- How we prevent memory leaks.
- How we prevent dangling pointers after cleaning up about-to-leak objects.

Regarding memory leaks, I think Robert's idea of allocating objects under a temporary context while they are being built and re-parenting them to a permanent one at some point is promising. While objects are being built they live under a temporary DSA MemoryContext, which is a child of TopTransactionContext (if we're inside a transaction), and they are freed all at once if an error happens. To delete all the chunks allocated under the temporary DSA context, we need to search for, or remember, the location of every chunk under that context; unlike AllocSet, we don't have block information that lets us delete them all together.

So I'm thinking of managing dsa_allocated chunks with a singly linked list to keep track of them and delete them: the context holds the head of the list, and every chunk has a pointer to the next allocated chunk. This adds space overhead to every dsa_allocated chunk, which we may want to avoid because shared memory is limited. In that case we could free the pointer area at the point where we are sure the allocation has become permanent.

Another question is when we should consider an allocation successful (i.e. when we are sure the object won't leak). If the allocation is done inside a transaction, we can consider it safe once the transaction commits. Alternatively, since I assume this DSA memory context is for caches such as the relcache, catcache, plancache and so on, a cache entry won't leak once it has been added to a hash table or list, because I assume some eviction mechanism like LRU will be implemented and will erase useless cache entries some time later.

What do you think about these ideas?

Regarding dangling pointers, I think they are also a problem: after cleaning up objects to prevent a memory leak, we have no mechanism to reset the pointers that referenced them. I gave this some thought a while ago, though begin_allocate/end_allocate don't seem like good names; more descriptive names would be something like start_pointing_to_dsa_object_under_construction() and end_pointing_to_dsa_object_under_construction().
https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A6F1F259F%40G01JPEXMBKW04
If we can make sure that such dangling pointers never happen, we don't need this. As Thomas mentioned before, where these interfaces should go needs review, but I couldn't hit upon another solution right now.

Do you have any thoughts?

best regards,
Ideriha, Takeshi
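(To make the chunk-tracking idea in the message above concrete, a minimal sketch — all names invented — of a temporary DSA context that chains its chunks so they can be freed together on error:)

    typedef struct dsa_temp_chunk
    {
        dsa_pointer next;           /* previously allocated chunk in this context */
        /* payload follows */
    } dsa_temp_chunk;

    typedef struct dsa_temp_context
    {
        dsa_area   *area;
        dsa_pointer head;           /* most recently allocated chunk, or InvalidDsaPointer */
    } dsa_temp_context;

    static void *
    dsa_temp_alloc(dsa_temp_context *cxt, Size size)
    {
        dsa_pointer dp = dsa_allocate(cxt->area, sizeof(dsa_temp_chunk) + size);
        dsa_temp_chunk *chunk = dsa_get_address(cxt->area, dp);

        chunk->next = cxt->head;    /* push onto the context's chunk list */
        cxt->head = dp;
        return (char *) chunk + sizeof(dsa_temp_chunk);
    }

    /* On error: walk the list and give everything back to the DSA area. */
    static void
    dsa_temp_free_all(dsa_temp_context *cxt)
    {
        while (DsaPointerIsValid(cxt->head))
        {
            dsa_temp_chunk *chunk = dsa_get_address(cxt->area, cxt->head);
            dsa_pointer next = chunk->next;

            dsa_free(cxt->area, cxt->head);
            cxt->head = next;
        }
    }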
Hi, I've updated Thomas's quick PoC. >From: Ideriha, Takeshi [mailto:ideriha.takeshi@jp.fujitsu.com] >Sent: Wednesday, April 17, 2019 2:07 PM >>From: Ideriha, Takeshi [mailto:ideriha.takeshi@jp.fujitsu.com] >>Sent: Wednesday, December 5, 2018 2:42 PM >>Subject: RE: Copy data to DSA area > >Things under discussion: >- How we prevent memory leak >- How we prevent dangling pointer after cleaning up about-to-leak-objects > >Regarding memory leak, I think Robert's idea that allocate objects under temporal >context while building and re-parent it to permanent one at some point is promising. >While building objects they are under temporal DSA-MemoryContext, which is child of >TopTransactionContext (if it's in the transaction) and are freed all at once when error >happens. >To do delete all the chunks allocated under temporal DSA context, we need to search >or remember all chunks location under the context. Unlike AllocAset we don't have >block information to delete them altogether. > >So I'm thinking to manage dsa_allocated chunks with single linked list to keep track of >them and delete them. >The context has head of linked list and all chunks have pointer to next allocated chunk. >But this way adds space overhead to every dsa_allocated chunk and we maybe want >to avoid it because shared memory size is limited. >In this case, we can free these pointer area at some point when we make sure that >allocation is successful. > >Another debate is when we should think the allocation is successful (when we make >sure object won't leak). >If allocation is done in the transaction, we think if transaction is committed we can >think it's safe. >Or I assume this DSA memory context for cache such as relcache, catcache, plancache >and so on. >In this case cache won't leak once it's added to hash table or list because I assume >some eviction mechanism like LRU will be implemented and it will erase useless cache >some time later. I changed design from my last email. I've introduced "local" and "shared" MemoryContext for DSA. Local means MemoryContext object is allocated on local heap and shared means on shared memory. While dsa_allocating, operation should be done on local context and after developer thinks they don't leak, set local context to shared one. Dsa_pointer to chunks is stored in the array linked by local context. When error happens before chunks become "shared", all chunks are freed by checking the array. On the other hand, chunk gets "committed" and become shared, we don't need that array of pointers. This PoC has three test cases, which is updated version of Thomas's ones. - hoge() : palloc(), pfree() then set the context shared and do it again - hoge_list(): add chunk to shared single linked list and set context to shared - hoge_list_error(): add chunk to linked list but error happens before it becomes shared one, so free the chunk to prevent memory leak Well, after developing PoC, I realized that this PoC doesn't solve the local process is crashed before the context becomes shared because local process keeps track of pointer to chunks. Maybe all of you have already noticed and pointed out this case :) So it needs another work but this poc is a good step for me to advance more. Another thing is that I don't want put backpointer to MemoryContext before every chunks since it has some overhead in limited shared memory. But pfree uses it so I compromised it. And when set local context to shared one, I need to change every backpointer to shared MemoryContext so it has some cost. 
I think there is a more efficient way. (Maybe Robert mentioned it in a previous email?)

>Regarding dangling pointers, I think they are also a problem:
>after cleaning up objects to prevent a memory leak, we have no mechanism to reset
>the pointers that referenced them.

I haven't addressed the dangling pointer yet. In fact, calling hoge_list() after hoge_list_error() has been executed crashes the backend. That seems to be because the allocated chunk is freed after the error but the tail pointer of the shared linked list is not restored, so it becomes a dangling pointer. So this would be a good example of the dangling pointer problem.

Regards,
Takeshi Ideriha
Attachment
Hi,

>From: Ideriha, Takeshi [mailto:ideriha.takeshi@jp.fujitsu.com]
>Sent: Friday, April 26, 2019 11:50 PM
>Well, after developing PoC, I realized that this PoC doesn't solve the local process is
>crashed before the context becomes shared because local process keeps track of
>pointer to chunks.
>Maybe all of you have already noticed and pointed out this case :) So it needs another
>work but this poc is a good step for me to advance more.

I think the key to preventing memory leaks is allocating memory and storing its address into a tracking structure at the same time, and that structure should be trackable from other processes. So I'm going to change the design.

I'll allocate a buffer of pointers to DSA objects which are not yet permanent. This buffer is located in shared memory. For simplicity, the buffer is allocated per process, i.e. there are MaxBackends of them. The developer calls an API to make objects permanent. When objects become permanent, the corresponding pointers are deleted from the buffer. If they fail to become permanent, they are freed from shared memory by checking the buffer, and the corresponding pointers are also deleted from the buffer.

There are two cases of failure: transaction abort and process crash. If a transaction aborts, the process itself is responsible for freeing the objects, and does so. In the case of a process crash, another process needs to take care of it; I'm assuming an on_shmem_exit callback can be used so the objects get freed during shared memory cleanup by the postmaster.

Regarding MemoryContext, one shared MemoryContext lives in shared memory and each chunk has a back pointer to it, for pfree and so on.

I'm going to make a new patch and send it later.

Regards,
Takeshi Ideriha
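(A rough sketch — names and sizes invented — of the per-backend tracking buffer described above, kept in traditional shared memory so that the dsa_pointers of not-yet-permanent objects remain visible independently of the allocating backend's local state:)

    #define MAX_PENDING_DSA_CHUNKS 64   /* arbitrary illustrative limit */

    typedef struct PendingDsaChunks
    {
        int         nchunks;
        dsa_pointer chunks[MAX_PENDING_DSA_CHUNKS];
    } PendingDsaChunks;

    /* One slot per backend, sized by MaxBackends at shared memory creation time. */
    static PendingDsaChunks *pending_chunks_array;

    static void
    record_pending_chunk(dsa_pointer dp)
    {
        PendingDsaChunks *mine = &pending_chunks_array[MyBackendId - 1];

        Assert(mine->nchunks < MAX_PENDING_DSA_CHUNKS);
        mine->chunks[mine->nchunks++] = dp;
    }

    /* Called once the objects are reachable from the cache's own bookkeeping. */
    static void
    make_chunks_permanent(void)
    {
        pending_chunks_array[MyBackendId - 1].nchunks = 0;
    }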
On Wed, May 8, 2019 at 5:29 PM Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com> wrote:
> >From: Ideriha, Takeshi [mailto:ideriha.takeshi@jp.fujitsu.com]
> >Sent: Friday, April 26, 2019 11:50 PM
> >Well, after developing PoC, I realized that this PoC doesn't solve the local process is
> >crashed before the context becomes shared because local process keeps track of
> >pointer to chunks.
> >Maybe all of you have already noticed and pointed out this case :) So it needs another
> >work but this poc is a good step for me to advance more.
>
> I think the point to prevent memory leak is allocating memory and storing its
> address into a structure at the same time. This structure should be trackable from
> other process.

I'm not sure that it's necessarily wrong to keep tracking information in private memory. If any backend crashes, the postmaster will terminate all backends and reinitialise everything anyway, because shared memory might be corrupted.

--
Thomas Munro
https://enterprisedb.com
Hi, Thomas

>-----Original Message-----
>From: Thomas Munro [mailto:thomas.munro@gmail.com]
>Subject: Re: Copy data to DSA area
>
>On Wed, May 8, 2019 at 5:29 PM Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com>
>wrote:
>> >From: Ideriha, Takeshi [mailto:ideriha.takeshi@jp.fujitsu.com]
>> >Sent: Friday, April 26, 2019 11:50 PM Well, after developing PoC, I
>> >realized that this PoC doesn't solve the local process is crashed
>> >before the context becomes shared because local process keeps track
>> >of pointer to chunks.
>> >Maybe all of you have already noticed and pointed out this case :) So
>> >it needs another work but this poc is a good step for me to advance more.
>>
>> I think the point to prevent memory leak is allocating memory and
>> storing its address into a structure at the same time. This structure
>> should be trackable from other process.
>
>I'm not sure that it's necessarily wrong to keep tracking information in private memory.
>If any backend crashes, the postmaster will terminate all backends and reinitialise
>everything anyway, because shared memory might be corrupted.

Thank you very much for the clarification. I hadn't looked into the reinitialization sequence much. I checked that CreateSharedMemoryAndSemaphores() is called again and shared memory gets reinitialized. So I'm going to keep the tracking information in private memory and send a patch.

Regards,
Takeshi Ideriha
Hi,

>>From: Thomas Munro [mailto:thomas.munro@gmail.com]
>>Subject: Re: Copy data to DSA area
>>
>>On Wed, May 8, 2019 at 5:29 PM Ideriha, Takeshi
>><ideriha.takeshi@jp.fujitsu.com>
>>wrote:
>>> >From: Ideriha, Takeshi [mailto:ideriha.takeshi@jp.fujitsu.com]
>>> >Sent: Friday, April 26, 2019 11:50 PM Well, after developing PoC, I
>>> >realized that this PoC doesn't solve the local process is crashed
>>> >before the context becomes shared because local process keeps track
>>> >of pointer to chunks.
>>> >Maybe all of you have already noticed and pointed out this case :)
>>> >So it needs another work but this poc is a good step for me to advance more.
>>>
>>> I think the point to prevent memory leak is allocating memory and
>>> storing its address into a structure at the same time. This structure
>>> should be trackable from other process.
>>
>>I'm not sure that it's necessarily wrong to keep tracking information in private memory.
>>If any backend crashes, the postmaster will terminate all backends and
>>reinitialise everything anyway, because shared memory might be corrupted.
>
>I'm going to put keep tracking information in private
>memory and send a patch.

I've updated the PoC patch. It keeps a memory-tracking buffer in the local process; the old version had the same mechanism, but I refactored the code. Memory leaks while allocating memory seem to be solved thanks to the tracking buffer.

What I haven't addressed yet is memory leaks while freeing objects. In the current sequence a cache entry (e.g. in relcache) is freed after it has been removed from its hash table. If the cache and the hash table become shared, a leak can happen between removal from the hash table and the free: we lose track of the cache objects if an error happens after the entry is unlinked from the hash table. A cache entry is also a graph of objects, so we have to deal with partially freed entries as well. Maybe we need to remember the pointers of the objects before unlinking them from the hash table, and free them all at once only after we are sure all the pointers are registered in the local buffer. Here is one idea to implement this (see the sketch below):
- change the order of removal from the hash table and deletion
- pfree() in the shared memory context does not dsa_free() but just adds the pointer to the local buffer
- remove the link from the hash table after all pfree() calls are done
- then call a function which does the actual dsa_free() by looking at the local buffer

But I'm not sure this is a good solution. Do you have any thoughts?

Regards,
Takeshi Ideriha
Attachment
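A possible shape for the deferred-free ordering listed above (hypothetical names; only dsa_free(), palloc() and the List APIs are existing PostgreSQL facilities):

/*
 * pfree() on a shared chunk only records the dsa_pointer; the actual
 * dsa_free() happens after the entry has been unlinked from the shared hash
 * table, so an error in between cannot lose track of the memory.
 */
#include "postgres.h"
#include "nodes/pg_list.h"
#include "utils/dsa.h"

typedef struct DeferredFree
{
    dsa_pointer dp;
} DeferredFree;

static List *deferred_free_list = NIL;  /* backend-local */

/* Step 1: "free" pieces of the cache entry -- just remember them. */
static void
shared_pfree_deferred(dsa_pointer dp)
{
    DeferredFree *df = palloc(sizeof(DeferredFree));

    df->dp = dp;
    deferred_free_list = lappend(deferred_free_list, df);
}

/* Step 2: unlink the entry from the shared hash table (not shown). */

/* Step 3: now it is safe to really release the memory. */
static void
shared_free_deferred(dsa_area *area)
{
    ListCell   *lc;

    foreach(lc, deferred_free_list)
    {
        DeferredFree *df = (DeferredFree *) lfirst(lc);

        dsa_free(area, df->dp);
    }
    list_free_deep(deferred_free_list);
    deferred_free_list = NIL;
}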
>From: Ideriha, Takeshi [mailto:ideriha.takeshi@jp.fujitsu.com]
>Sent: Friday, April 26, 2019 11:50 PM
>To: 'Kyotaro HORIGUCHI' <horiguchi.kyotaro@lab.ntt.co.jp>;
>thomas.munro@enterprisedb.com; robertmhaas@gmail.com
>Cc: pgsql-hackers@postgresql.org
>Subject: RE: Copy data to DSA area
>
>Hi, I've updated Thomas's quick PoC.

Hi. I've rebased the patch to fit into the core code rather than an extension.

Regarding the shared memory context (ShmContext), I added three APIs (a usage sketch follows below):
- CreatePermShmContext: create a "permanent" context located in shared memory
- CreateTempShmContext: create a "temporary" context located in local memory, which has a buffer to keep track of objects that could leak
- ChangeToPermShmContext: change the parent of allocated objects to the permanent context

When you allocate something, add the object to the temporary ShmContext and re-parent it to the permanent ShmContext once it can no longer leak.

The current method of keeping track of objects and freeing them at rollback works well for the case where both the parent object and the child object pointed to by the parent are deleted, because no dangling pointer remains after rollback. If the child object is freed but the parent object was already allocated in the permanent context, that parent is left with a dangling pointer. On the other hand, if an object is pointed to by an already-permanent object, the just-allocated object won't be leaked, so in such cases we could skip the temporary allocation and allocate it directly in the permanent context. In the rollback case we could just leave it in shared memory and make the upper-layer function handle the "ghost" object in some sensible way. I cannot see a clear solution here. Do you have any thoughts?

MemoryContextMethods are not fully supported yet, but some methods like stats() might be needed.

The current test case is very simple and the same as the previous one, but implemented as an isolation test. It checks that an integer can be stored in shared memory by one process and read by another. It doesn't cover all possible cases, so more cases should be added; I'll do that during the coming CF.

P.S. Thomas, thank you for taking some time at PGCon to discuss this item and the shared catalog cache. It was very helpful. I'm going to send an email about the shared catcache soon.

Regards,
Takeshi Ideriha
Attachment
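As a rough illustration of how the three APIs above might be used together (the signatures of CreateTempShmContext() and ChangeToPermShmContext() are assumptions here; the attached patch is the authoritative definition):

#include "postgres.h"
#include "utils/memutils.h"

void
build_shared_entry_example(MemoryContext perm_shm_context)
{
    /* Temporary context in local memory that tracks not-yet-shared chunks. */
    MemoryContext temp = CreateTempShmContext("entry build", perm_shm_context);
    MemoryContext old = MemoryContextSwitchTo(temp);
    char       *entry;

    /* Allocations go to DSA but are remembered by the temporary context. */
    entry = palloc(128);
    snprintf(entry, 128, "some cached value");

    /* ... link 'entry' into a shared hash table or list here ... */

    /*
     * Once the object is reachable from shared structures and can no longer
     * leak, move its chunks under the permanent context.
     */
    ChangeToPermShmContext(temp, perm_shm_context);

    MemoryContextSwitchTo(old);
}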
On Tue, Jun 25, 2019 at 12:53 AM Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com> wrote:
> I've rebased the patch to fit the core code rather than extension.
> Regarding shared memory context (ShmContext), I added three
> APIs:
> - CreatePermShmContext
> create "permanent" context located in shared memory
> - CreateTempShmContext
> Create "temporary" context located in local memory,
> Which has buffer to keep track of possible memory leak objects
> - ChangeToPermShmContext
> Change allocated objects parent to permanent context
>
> When you allocate something, add an object in the Temp ShmContext,
> and re-parent it to Perm ShmContext after it becomes not leaked.

Hi Ideriha-san,

Thanks for working on this! It's important for us to figure out how to share caches, so we can make better use of memory. I think there are two things going on in this prototyping work:

1. The temporary/permanent concept. I think the same problem will come up in the future when we eventually have a multi-thread model.

2. Getting a MemoryContext-compatible allocator to work with the current multi-process model.

So, I'm wondering if we can find a more generic way to do your ChangeToPermShmContext thing with an API that isn't tied to this implementation. Another thing to consider is that we probably still need an intermediate level of contexts, to hold individual complex long-lived objects like (say) cached plans.

What do you think about the following? Even though I know you want to start with much simpler kinds of cache, I'm looking ahead to the lofty end-goal of having a shared plan cache. No doubt, that involves solving many other problems that don't belong in this thread, but please indulge me:

1. We could provide a way to ask any MemoryContext implementation to create a new context of the same type and set its parent to any parent, so that it will be automatically deleted when that parent is deleted. For example MemoryContextClone(SharedPlanCache, some_short_lived_context). That gives you a memory context that will be deleted with some_short_lived_context (unless you reparent it or delete it first), but it's of the same type as SharedPlanCache and (in this particular case) allocates from the same underlying DSA area. In some hypothetical future, SharedPlanCache might use a different memory context implementation depending on whether you selected a multi-process or multi-thread build, but when you call MemoryContextClone(SharedPlanCache, ...) you don't know or care about that, it just has to be some implementation that supports cloning.

2. Once you have finished constructing (say) a plan in that memory context, you can make it 'permanent' by simply calling MemoryContextSetParent(temp, SharedPlanCache). That follows existing practice for how we put stuff into the existing backend-private cache context. Otherwise, it is deleted when its current parent is deleted.

3. Following existing practice, the cached plan needs to know the context that it lives in, so that DropCachedPlan() can delete the whole plan easily according to your LRU scheme (or whatever -- a topic for another thread).

One of the implications of the above ideas is that the MemoryContext object itself has to be in shared memory, and the management of parent/child links needs to be protected by locks for parts of the context graph that are shared. And probably there are other complications in this area. I don't claim that any of this is easy, or necessarily the right way to do it.
But I'm pretty sure these problems are not artifacts of the multi-process/shared memory model, they're due to sharing and they'll still be waiting for us in the multi-threaded future!

+       /* To avoid double free by ShmContextDelete, remove its reference */
+       for (buf = shmContext->temp_buffer; buf != NULL; buf = buf->next)
+       {
+               for (idx = 0; idx < buf->idx; idx++)
+               {
+                       if (buf->chunks[idx] == dp)
+                       {
+                               buf->chunks[idx] = 0;
+                               break;
+                       }
+               }
+       }

Hmm. I wonder if we should just make ShmContextFree() do nothing! And make ShmContextAlloc() allocate (say) 8KB chunks (or larger if needed for larger allocation) and then hand out small pieces from the 'current' chunk as needed. Then the only way to free memory is to destroy contexts, but for the use case being discussed, that might actually be OK. I suppose you'd want to call this implementation something different, like ShmRegionContext, ShmZoneContext or ShmArenaContext[1]. That's a fairly specialised type of memory allocator, but it seems like it would suit our usage pattern here: you build objects to cache them, and then destroy them all at once. This would only be efficient if the code that you call to build (or, I guess, copy) plans doesn't do much palloc/free of temporary stuff, which would leave dead junk in our context if we did it that way.

Earlier I said that it would probably be OK to use backend-local memory to track all the objects that need to be freed on error, because I was considering only the problem of preventing leaks on error. I wasn't thinking about point 3 above. We eventually need to delete the cached object (plan etc) from the cache (eg DropCachedPlan()), and so we still need to have all its allocations together in one place as a context, and that control data needs to be accessible to other processes. So in the ShmZoneContext idea, there would need to be a chunk list allocated in shared memory for eventual freeing.

[1] https://en.wikipedia.org/wiki/Region-based_memory_management

--
Thomas Munro
https://enterprisedb.com
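To make points 1-3 above concrete, here is a hedged sketch of the lifecycle for a hypothetical shared plan cache. MemoryContextClone() and SharedPlanCache do not exist today and are assumed for illustration; MemoryContextSetParent(), MemoryContextDelete() and palloc0() are existing APIs:

#include "postgres.h"
#include "utils/memutils.h"

extern MemoryContext SharedPlanCache;   /* assumed shared, DSA-backed context */

typedef struct CachedPlanStub
{
    MemoryContext context;      /* the context holding this whole plan */
    /* ... plan data would follow ... */
} CachedPlanStub;

CachedPlanStub *
cache_plan_example(void)
{
    /*
     * 1. Clone: same implementation/area as SharedPlanCache, but initially
     * parented to a short-lived context so it disappears on error.
     */
    MemoryContext plancxt = MemoryContextClone(SharedPlanCache,
                                               CurrentMemoryContext);
    MemoryContext old = MemoryContextSwitchTo(plancxt);
    CachedPlanStub *stub = palloc0(sizeof(CachedPlanStub));

    stub->context = plancxt;    /* 3. the plan remembers its own context */

    /* ... build or copy the plan here ... */

    MemoryContextSwitchTo(old);

    /* 2. Success: make it permanent by reparenting under the shared cache. */
    MemoryContextSetParent(plancxt, SharedPlanCache);

    return stub;
}

/* Later, DropCachedPlan()-style removal is just a context delete. */
void
drop_plan_example(CachedPlanStub *stub)
{
    MemoryContextDelete(stub->context);
}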
On Wed, Jul 10, 2019 at 6:03 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Hmm. I wonder if we should just make ShmContextFree() do nothing! And
> make ShmContextAlloc() allocate (say) 8KB chunks (or larger if needed
> for larger allocation) and then hand out small pieces from the
> 'current' chunk as needed. Then the only way to free memory is to
> destroy contexts, but for the use case being discussed, that might
> actually be OK. I suppose you'd want to call this implementation
> something different, like ShmRegionContext, ShmZoneContext or
> ShmArenaContext[1].

<after sleeping on this>

I guess what I said above is only really appropriate for complex things like plans that have their own contexts so that we can delete them easily "in bulk". I guess it's not true for caches of simpler objects like catcache, that don't want a context for each cached thing and want to free objects "retail" (one by one). So I guess you might want something more like your current patch for (say) SharedCatCache, and something like the above-quoted idea for (say) SharedPlanCache or SharedRelCache.

For an implementation that supports retail free, perhaps you could store the address of the clean-up list element in some extra bytes before the returned pointer, so you don't have to find it by linear search. Next, I suppose you don't want to leave holes in the middle of the array, so perhaps instead of writing NULL there, you could transfer the last item in the array to this location (with associated concurrency problems).

Since I don't think anyone ever said it explicitly, the above discussion is all about how we get to this situation, while making sure that we're mostly solving problems that occur in both multi-process and multi-threaded designs:

shared_metadata_cache = '100MB'
shared_plan_cache = '100MB'

--
Thomas Munro
https://enterprisedb.com
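A sketch of the retail-free bookkeeping suggested above, ignoring the concurrency problems mentioned; the structures here are invented for illustration and only the dsa_* calls are existing APIs:

#include "postgres.h"
#include "utils/dsa.h"

typedef struct CleanupArray
{
    int         nused;
    dsa_pointer chunks[1024];   /* would be resizable in real code */
} CleanupArray;

typedef struct ChunkHeader
{
    int         cleanup_index;  /* where this chunk sits in the array */
} ChunkHeader;                  /* real code would pad to MAXIMUM_ALIGNOF */

static void *
retail_alloc(dsa_area *area, CleanupArray *cleanup, size_t size)
{
    dsa_pointer dp = dsa_allocate(area, sizeof(ChunkHeader) + size);
    ChunkHeader *hdr = (ChunkHeader *) dsa_get_address(area, dp);

    hdr->cleanup_index = cleanup->nused;
    cleanup->chunks[cleanup->nused++] = dp;
    return hdr + 1;             /* user data starts after the header */
}

static void
retail_free(dsa_area *area, CleanupArray *cleanup, void *ptr)
{
    ChunkHeader *hdr = ((ChunkHeader *) ptr) - 1;
    int         idx = hdr->cleanup_index;
    dsa_pointer dp = cleanup->chunks[idx];

    /* fill the hole with the last entry so the array stays dense */
    cleanup->chunks[idx] = cleanup->chunks[--cleanup->nused];

    /* the moved chunk must learn its new position */
    if (idx < cleanup->nused)
    {
        ChunkHeader *moved =
            (ChunkHeader *) dsa_get_address(area, cleanup->chunks[idx]);

        moved->cleanup_index = idx;
    }
    dsa_free(area, dp);
}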
Hi, thank you for the previous two emails.

Thomas Munro <thomas.munro@gmail.com> wrote:
>What do you think about the following? Even though I know you want to start with
>much simpler kinds of cache, I'm looking ahead to the lofty end-goal of having a shared
>plan cache. No doubt, that involves solving many other problems that don't belong in
>this thread, but please indulge me:

My initial motivation came from the shared catcache and relcache, but I also think a shared plan cache is one of the big topics and I'd be very excited if it came true. Making a plan in each backend sometimes becomes an enormous overhead in terms of speed.

>On Wed, Jul 10, 2019 at 6:03 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Hmm. I wonder if we should just make ShmContextFree() do nothing! And
>> make ShmContextAlloc() allocate (say) 8KB chunks (or larger if needed
>> for larger allocation) and then hand out small pieces from the
>> 'current' chunk as needed. Then the only way to free memory is to
>> destroy contexts, but for the use case being discussed, that might
>> actually be OK. I suppose you'd want to call this implementation
>> something different, like ShmRegionContext, ShmZoneContext or
>> ShmArenaContext[1].
>
><after sleeping on this>
>
>I guess what I said above is only really appropriate for complex things like plans that
>have their own contexts so that we can delete them easily "in bulk". I guess it's not
>true for caches of simpler objects like catcache, that don't want a context for each
>cached thing and want to free objects "retail" (one by one). So I guess you might
>want something more like your current patch for (say) SharedCatCache, and
>something like the above-quoted idea for (say) SharedPlanCache or SharedRelCache.

Here's what I take away from this:
- The temporary/permanent concept can be implemented by following the current cached-plan scheme of reparenting, plus adding a general API MemoryContextClone.
- There are at least two allocation strategies. One is ShmZoneContext: region-based allocation, which drops chunks in bulk. This type is useful for caches made of complex object graphs, like the plan cache. In this case the clean-up list (the list of pointers to dsa_allocated objects) needs to live in shared memory, because it is used not only in the error case but also for DropCachedPlan().
- The other is not region-based but allocates and frees objects "retail". I would call this the ShmSimpleContext case; it would be suitable for, say, the catalog cache. For ShmSimpleContext it is OK to keep the clean-up list in local memory, right? Unlike ShmZoneContext, this list is only used on error, and a cache entry can be freed by just traversing its simple object graph, as discussed before.

Regarding the tracking list for cleanup, I'd change the design depending on whether it's ShmSimpleContext or ShmZoneContext, assuming the clean-up list is pointed to by the context object in both types.

For ShmSimpleContext, the list is deleted after the context is reparented to the long-lived one. The context for building, say, a catcache entry is located in shared memory but its tracking list is in local memory, so it seems weird that a shared object points to a local object. But there seems to be no problem, because other processes cannot touch the context until it is reparented.

For ShmZoneContext, it would be OK to keep the list even after reparenting, because both this intermediate context and the list are supposed to be located in shared memory.
Regarding the intermediate-level context, I was wondering whether we need it in the ShmSimpleContext case. Every time we build a catcache entry we would create a MemoryContext object in shared memory, which adds some memory overhead. But right now one context object per catcache entry seems like a simple architecture, and it is also easier to handle in the error case.

>For an implementation that supports retail free, perhaps you could store the address of
>the clean-up list element in some extra bytes before the returned pointer, so you don't
>have to find it by linear search. Next, I suppose you don't want to leave holes in the
>middle of the array, so perhaps instead of writing NULL there, you could transfer the
>last item in the array to this location (with associated concurrency problems).

Sure, I'll fix it.

>Since I don't think anyone ever said it explicitly, the above discussion is all about how
>we get to this situation, while making sure that we're mostly solving problems that
>occur in both multi-process and multi-threaded designs:
>
>shared_metadata_cache = '100MB'
>shared_plan_cache = '100MB'

Yes, that is what we are talking about. Thank you for the clarification.

Regards,
Takeshi Ideriha
Should now this be closed as Returned with Feedback, or are we really waiting on an updated patch in the short term? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, sorry for the wait.

>Thomas Munro <thomas.munro@gmail.com> wrote:
>>What do you think about the following? Even though I know you want to
>>start with much simpler kinds of cache, I'm looking ahead to the lofty
>>end-goal of having a shared plan cache. No doubt, that involves
>>solving many other problems that don't belong in this thread, but please indulge me:
>
>My initial motivation came from shared catcache and relcache but I also think shared
>plan cache is one of the big topics and I'd be very excited if it's come true. Sometimes
>making plan at each backend becomes enormous overhead for speed.
>
>>On Wed, Jul 10, 2019 at 6:03 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>>> Hmm. I wonder if we should just make ShmContextFree() do nothing!
>>> And make ShmContextAlloc() allocate (say) 8KB chunks (or larger if
>>> needed for larger allocation) and then hand out small pieces from the
>>> 'current' chunk as needed. Then the only way to free memory is to
>>> destroy contexts, but for the use case being discussed, that might
>>> actually be OK. I suppose you'd want to call this implementation
>>> something different, like ShmRegionContext, ShmZoneContext or
>>> ShmArenaContext[1].
>>
>><after sleeping on this>
>>
>>I guess what I said above is only really appropriate for complex things
>>like plans that have their own contexts so that we can delete them
>>easily "in bulk". I guess it's not true for caches of simpler objects
>>like catcache, that don't want a context for each cached thing and want
>>to free objects "retail" (one by one). So I guess you might want
>>something more like your current patch for (say) SharedCatCache, and something
>>like the above-quoted idea for (say) SharedPlanCache or SharedRelCache.

I updated the shared memory context for SharedCatCache, which I call ShmRetailContext. I refactored my previous PoC, in which palloc() calls dsa_allocate() every time in a retail way. I also implemented MemoryContextClone(template_context, short_lived_parent_context).

>>For an implementation that supports retail free, perhaps you could
>>store the address of the clean-up list element in some extra bytes
>>before the returned pointer, so you don't have to find it by linear
>>search.

ShmRetailContext is intended to be used by SharedCatCache. Here are some features of the current catcache entries:
1. The number of cache entries is generally larger than for the relcache and plan cache. The relcache is proportional to the number of tables and indexes, whereas the catcache has many more kinds of entries, and some kinds like pg_statistic are proportional to the number of attributes of each table.
2. A cache entry (catctup) is built with only one or two palloc() calls.
3. When a cache entry is evicted from the hash table, it is deleted by pfree() one entry at a time.

Because of point 1, I'd rather not have the extra pointer to the clean-up list element; it would be allocated per catcache entry and take space. And the clean-up list is not very long because of point 2, so a linear search should be fine.

Also because of point 1, I didn't create a MemoryContext header for each catalog cache entry; these headers would live in shared memory and take space. So instead of MemoryContextSetParent() I use ShmRetailContextMoveChunk() (which I called ChangeToPermShmContext() before). This moves only the chunks from the locally allocated parent to the shared parent memory context.

Due to point 3, I think it's also OK not to have a clean-up list in shared memory for ShmRetailContext.
There is no situation where we free all the chunks at once. What do you think about the above?

>>Next, I suppose you don't want to leave holes in the middle of
>>the array, so perhaps instead of writing NULL there, you could transfer the last item
>>in the array to this location (with associated concurrency problems).

Done.

ShmZoneContext for SharedPlanCache and SharedRelCache is not implemented yet, but I'm going to do it following your points.

Regards,
Takeshi Ideriha
Attachment
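For illustration, building a catcache entry with ShmRetailContext as described above might look roughly like this; MemoryContextClone() and ShmRetailContextMoveChunk() are from the proposed patch, and their signatures here are guesses rather than the real definitions:

#include "postgres.h"
#include "utils/memutils.h"

extern MemoryContext SharedCatCacheContext;     /* assumed permanent shared context */

void *
build_catcache_entry_example(Size size)
{
    /*
     * Clone the shared context: allocations use the same DSA area, but the
     * clone is parented to CurrentMemoryContext, so everything is released
     * if an error is raised before the entry is published.
     */
    MemoryContext build_cxt = MemoryContextClone(SharedCatCacheContext,
                                                 CurrentMemoryContext);
    MemoryContext old = MemoryContextSwitchTo(build_cxt);
    void       *entry = palloc(size);   /* one or two pallocs per entry */

    /* ... fill in the entry and insert it into the shared hash table ... */

    MemoryContextSwitchTo(old);

    /*
     * The entry is now reachable from shared structures, so hand its chunks
     * over to the permanent context; no per-entry context header remains.
     */
    ShmRetailContextMoveChunk(build_cxt, SharedCatCacheContext);

    return entry;
}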
Hi,

>ShmZoneContext for SharedPlan and SharedRelCache is not implemented but I'm
>going to do it following your points.

After looking into the existing code, I think the Generation memory context has a similar purpose, so I'll implement ShmZoneContext by referring to it. A generation context manages chunks that have almost the same life span. ShmZoneContext would dsa_allocate() a block, hand out chunks from it, and free the chunks and their block all together (see the sketch below).

Regards,
Takeshi Ideriha
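A minimal sketch of that region-style behaviour on top of DSA (names invented for illustration; only the dsa_* calls and macros are existing APIs, and oversized requests are not handled here):

#include "postgres.h"
#include "utils/dsa.h"

#define ZONE_BLOCK_SIZE 8192

typedef struct ZoneBlock
{
    dsa_pointer self;           /* this block's own dsa_pointer */
    dsa_pointer next;           /* next block in the zone, or InvalidDsaPointer */
    Size        used;           /* bytes handed out so far */
} ZoneBlock;

typedef struct Zone
{
    dsa_area   *area;
    dsa_pointer head;           /* newest block */
} Zone;

static void *
zone_alloc(Zone *zone, Size size)
{
    ZoneBlock  *block = DsaPointerIsValid(zone->head)
        ? (ZoneBlock *) dsa_get_address(zone->area, zone->head)
        : NULL;

    size = MAXALIGN(size);
    if (block == NULL ||
        sizeof(ZoneBlock) + block->used + size > ZONE_BLOCK_SIZE)
    {
        /* current block is full (or absent): get a fresh one from DSA */
        dsa_pointer dp = dsa_allocate(zone->area, ZONE_BLOCK_SIZE);

        block = (ZoneBlock *) dsa_get_address(zone->area, dp);
        block->self = dp;
        block->next = zone->head;
        block->used = 0;
        zone->head = dp;
    }
    block->used += size;
    return ((char *) block) + sizeof(ZoneBlock) + block->used - size;
}

/* Freeing is all-or-nothing: drop every block in the zone. */
static void
zone_reset(Zone *zone)
{
    dsa_pointer dp = zone->head;

    while (DsaPointerIsValid(dp))
    {
        ZoneBlock  *block = (ZoneBlock *) dsa_get_address(zone->area, dp);
        dsa_pointer next = block->next;

        dsa_free(zone->area, dp);
        dp = next;
    }
    zone->head = InvalidDsaPointer;
}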
>>ShmZoneContext for SharedPlan and SharedRelCache is not implemented but
>>I'm going to do it following your points.
>
>After looking into existing code, I'm thinking Generation Memory Context seems to
>have the similar purpose. So I'll implement ShmZoneContext by reference it.
>Generation context manages chunks having almost same life span.
>ShmZoneContext would dsa_allocate() block and hand out chunks and free chunks and
>its block all together.

I added ShmZoneContext to my patch. I haven't added detailed comments or a test set yet, so let me explain how to use it here. I followed Thomas's suggestion.

At startup, a ShmZoneContext is created in shared memory by ShmZoneContextCreateOrigin(). Before allocating memory, another context is created from it and parented to a short-lived context via MemoryContextClone(), so that objects and contexts are automatically freed on error. Then you can use palloc(), which returns a chunk from a dsa_allocated block. When you re-parent it to the long-lived ShmContext with MemoryContextSetParent(), you need to acquire a lock to protect the parent-child links; the LWLock object is obtained with ShmZoneContextGetLock(), and every cloned ShmZoneContext uses the same lock instance. If you want to free the allocated objects, use MemoryContextDelete(). After the context has become long-lived, you need to take the lock again for MemoryContextDelete(), also to protect the MemoryContext parent-child links.

Thinking about the shared relcache/plancache use case, allocation happens only before the parent context is switched to the long-lived shared one, so I think we don't need to take the lock during palloc(). I also think MemoryContextDelete() should be done after the allocated objects have been removed from whatever shared index structure references them (e.g. a hash table or list in shared memory), since other processes may be looking at that structure.

What do you think?

Regards,
Takeshi Ideriha
Attachment
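A hedged walk-through of the ShmZoneContext usage described above; ShmZoneContextCreateOrigin(), ShmZoneContextGetLock() and MemoryContextClone() are from the proposed patch with guessed signatures, while the LWLock and MemoryContext calls are existing PostgreSQL APIs:

#include "postgres.h"
#include "storage/lwlock.h"
#include "utils/memutils.h"

extern MemoryContext ShmZoneOrigin;     /* created once at startup by
                                         * ShmZoneContextCreateOrigin() */

void
shm_zone_example(void)
{
    /* Clone parented to a short-lived context: freed automatically on error. */
    MemoryContext zone = MemoryContextClone(ShmZoneOrigin, CurrentMemoryContext);
    MemoryContext old = MemoryContextSwitchTo(zone);
    LWLock     *lock = ShmZoneContextGetLock();  /* shared by all clones */
    char       *obj;

    obj = palloc(256);          /* chunk carved out of a dsa_allocated block */
    snprintf(obj, 256, "cached data");
    /* ... build a relcache/plancache entry and publish it in shared memory ... */

    MemoryContextSwitchTo(old);

    /* Re-parenting to the long-lived context needs the lock. */
    LWLockAcquire(lock, LW_EXCLUSIVE);
    MemoryContextSetParent(zone, ShmZoneOrigin);
    LWLockRelease(lock);

    /*
     * Much later, after the entry has been unlinked from the shared index
     * structure, the whole zone is dropped in bulk (lock again required).
     */
    LWLockAcquire(lock, LW_EXCLUSIVE);
    MemoryContextDelete(zone);
    LWLockRelease(lock);
}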
On Fri, Oct 18, 2019 at 12:53:16PM +0000, ideriha.takeshi@fujitsu.com wrote: > I added ShmZoneContext to my patch. > I haven't added detailed comments and test set, so let me explain how to > use it here. I followed Thomas' suggestion. The latest patch sent fails to apply. Could you please send a rebase? I am moving this patch to next CF, waiting on author. -- Michael
Attachment
I've marked this patch as returned with feedback. It's been sitting in the CF for a long time without any update (and does not apply anymore). That does not mean we don't want the feature (it seems useful) so feel free to resubmit an updated patch. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services