Thread: Is custom MemoryContext prohibited?
Hello,

I noticed that MemoryContextIsValid(), called by various kinds of memory
context routines, checks the node-tag as follows:

#define MemoryContextIsValid(context) \
  ((context) != NULL && \
   (IsA((context), AllocSetContext) || \
    IsA((context), SlabContext) || \
    IsA((context), GenerationContext)))

It allows only the "known" memory context methods, even though the memory
context mechanism enables extensions to implement custom memory allocators.
There is a node tag nobody uses: T_MemoryContext. It looks to me like a
neutral name for a custom memory context, and there is no reason why the
memory context functions should prevent custom methods.

https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
I recently implemented a custom memory context for shared memory allocation
with portable pointers. It will be used to cache pre-built GPU binary code
and the metadata of Apache Arrow files. However, the assertion check above
requires the extension to set a fake node-tag to avoid a backend crash.
Right now it is harmless to set T_AllocSetContext, but it feels a bit bad.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
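The check and the workaround KaiGai describes can be modeled in a few lines of standalone C. This is an illustrative sketch only, not the real memnodes.h/nodes.h definitions: just the tag check is mirrored, and everything else is omitted.

```c
#include <stddef.h>

/* Simplified stand-ins for the backend's node tags (illustrative only). */
typedef enum NodeTag
{
    T_MemoryContext = 1,        /* the unused "neutral" tag */
    T_AllocSetContext,
    T_SlabContext,
    T_GenerationContext
} NodeTag;

typedef struct MemoryContextData
{
    NodeTag     type;           /* the tag MemoryContextIsValid() inspects */
    /* methods, parent, name, ... omitted for brevity */
} MemoryContextData;

/* Mirrors the macro quoted above: only the three built-in tags pass. */
static int
MemoryContextIsValid(const MemoryContextData *context)
{
    return context != NULL &&
        (context->type == T_AllocSetContext ||
         context->type == T_SlabContext ||
         context->type == T_GenerationContext);
}
```

A context honestly tagged T_MemoryContext fails this check, which is why an extension currently has to lie and stamp T_AllocSetContext instead.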
On Tue, Jan 28, 2020 at 10:55:11AM +0900, Kohei KaiGai wrote:
>Hello,
>
>I noticed MemoryContextIsValid() called by various kinds of memory context
>routines checks its node-tag as follows:
>
>#define MemoryContextIsValid(context) \
>  ((context) != NULL && \
>   (IsA((context), AllocSetContext) || \
>    IsA((context), SlabContext) || \
>    IsA((context), GenerationContext)))
>
>It allows only "known" memory context methods, even though the memory
>context mechanism enables extensions to implement custom memory allocators.
>Here is a node tag nobody used: T_MemoryContext.
>It looks to me T_MemoryContext is a neutral naming for custom memory
>context, and there is no reason why memory context functions should
>prevent custom methods.
>

Good question. I don't think there's an explicit reason not to allow
extensions to define custom memory contexts, and using T_MemoryContext
seems like a possible solution. It's a bit weird though, because all the
actual contexts are kinda "subclasses" of MemoryContext. So maybe adding
T_CustomMemoryContext would be a better choice, but that only works in
master, of course.

Also, it won't work if we need to add memory contexts to equalfuncs.c
etc., but maybe we won't need that - it's more a theoretical issue.

>https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
>I recently implemented a custom memory context for shared memory allocation
>with portable pointers. It shall be used for cache of pre-built gpu
>binary code and metadata cache of apache arrow files.
>However, the assertion check above requires the extension to set a fake
>node-tag to avoid a backend crash. Right now, it is harmless to set
>T_AllocSetContext, but it feels a bit bad.
>

Interesting. Does that mean the shared memory contexts are part of the
same hierarchy as "normal" contexts? That would be a bit confusing, I
think.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2020年1月28日(火) 23:09 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
>
> On Tue, Jan 28, 2020 at 10:55:11AM +0900, Kohei KaiGai wrote:
> >Hello,
> >
> >I noticed MemoryContextIsValid() called by various kinds of memory context
> >routines checks its node-tag as follows:
> >
> >#define MemoryContextIsValid(context) \
> >  ((context) != NULL && \
> >   (IsA((context), AllocSetContext) || \
> >    IsA((context), SlabContext) || \
> >    IsA((context), GenerationContext)))
> >
> >It allows only "known" memory context methods, even though the memory
> >context mechanism enables extensions to implement custom memory allocators.
> >Here is a node tag nobody used: T_MemoryContext.
> >It looks to me T_MemoryContext is a neutral naming for custom memory
> >context, and there is no reason why memory context functions should
> >prevent custom methods.
> >
>
> Good question. I don't think there's an explicit reason not to allow
> extensions to define custom memory contexts, and using T_MemoryContext
> seems like a possible solution. It's a bit weird though, because all the
> actual contexts are kinda "subclasses" of MemoryContext. So maybe adding
> T_CustomMemoryContext would be a better choice, but that only works in
> master, of course.
>
From the standpoint of an extension author, reusing T_MemoryContext and
back-patching the change to MemoryContextIsValid() makes us happy. :)
However, even if we add a new node-tag here, custom memory contexts would
have to keep using a fake node-tag on the back branches for a few years.
It's better than nothing.

> Also, it won't work if we need to add memory contexts to equalfuncs.c
> etc. but maybe won't need that - it's more a theoretical issue.
>
Right now, none of the nodes/XXXfuncs.c files support this class of nodes
related to MemoryContext. It should not be a problem.

> >https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
> >I recently implemented a custom memory context for shared memory allocation
> >with portable pointers. It shall be used for cache of pre-built gpu
> >binary code and metadata cache of apache arrow files.
> >However, the assertion check above requires the extension to set a fake
> >node-tag to avoid a backend crash. Right now, it is harmless to set
> >T_AllocSetContext, but it feels a bit bad.
> >
>
> Interesting. Does that mean the shared memory contexts are part of the
> same hierarchy as "normal" contexts? That would be a bit confusing, I
> think.
>
If this shared memory context is a child of a normal context, likely, it
should be reset or deleted as usual. However, if this shared memory
context acts as a parent of a normal context, it makes a problem when a
different process tries to delete this context, because its child (the
normal context) exists in the creator process only. So, it needs to be
used carefully.

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Tue, Jan 28, 2020 at 10:55:11AM +0900, Kohei KaiGai wrote:
>> I noticed MemoryContextIsValid() called by various kinds of memory context
>> routines checks its node-tag as follows:
>> #define MemoryContextIsValid(context) \
>>   ((context) != NULL && \
>>    (IsA((context), AllocSetContext) || \
>>     IsA((context), SlabContext) || \
>>     IsA((context), GenerationContext)))

> Good question. I don't think there's an explicit reason not to allow
> extensions to define custom memory contexts, and using T_MemoryContext
> seems like a possible solution. It's a bit weird though, because all the
> actual contexts are kinda "subclasses" of MemoryContext. So maybe adding
> T_CustomMemoryContext would be a better choice, but that only works in
> master, of course.

I think the original reason for having distinct node types was to let
individual mcxt functions verify that they were passed the right type of
node ... but I don't see any of them actually doing so, and the
context-methods API makes it a bit hard to credit that we need to. So
there's certainly a case for replacing all three node typecodes with just
MemoryContext. On the other hand, it'd be ugly and it would complicate
debugging: you could not be totally sure which struct type to cast a
pointer-to-MemoryContext to when trying to inspect the contents.

We have a roughly comparable situation with respect to Value --- the node
type codes we use with it, such as T_String, don't have anything to do
with the struct typedef name. I've always found that very confusing when
debugging. When gdb tells me that "*(Node*) address" is T_String, the
first thing I want to do is write "p *(String*) address" and of course
that doesn't work.

I don't actually believe that private context types in extensions is a
very likely use-case, so on the whole I'd just as soon leave this alone.
If we did want to do something, I'd vote for one NodeTag code
T_MemoryContext and then a secondary ID field that's an enum over
specific memory context types. But that doesn't really improve matters
for debugging extension contexts, because they still don't have a way to
add elements to the secondary enum.

regards, tom lane
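Tom's counterproposal — one NodeTag plus a secondary ID — can be sketched the same way. The names below are hypothetical (PostgreSQL has no MemoryContextType enum at this point); the point is that the validity check collapses to a single tag comparison while the subtype only identifies the concrete implementation:

```c
#include <stddef.h>

/* Hypothetical layout for "one NodeTag code T_MemoryContext and then a
 * secondary ID field" -- all names here are made up for illustration. */
typedef enum NodeTag { T_MemoryContext = 1 } NodeTag;

typedef enum MemoryContextType
{
    MCTX_ALLOCSET,
    MCTX_SLAB,
    MCTX_GENERATION
    /* extensions still cannot add entries here -- Tom's caveat */
} MemoryContextType;

typedef struct MemoryContextData
{
    NodeTag     type;           /* always T_MemoryContext */
    MemoryContextType subtype;  /* which allocator implements it */
} MemoryContextData;

/* One check instead of N: */
static int
MemoryContextIsValid(const MemoryContextData *context)
{
    return context != NULL && context->type == T_MemoryContext;
}
```

The caveat in the paragraph above is visible in the enum: a third-party context still has no legitimate value to put in `subtype`.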
On Tue, Jan 28, 2020 at 11:32:49PM +0900, Kohei KaiGai wrote:
>2020年1月28日(火) 23:09 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
>>
>> On Tue, Jan 28, 2020 at 10:55:11AM +0900, Kohei KaiGai wrote:
>> >Hello,
>> >
>> >I noticed MemoryContextIsValid() called by various kinds of memory context
>> >routines checks its node-tag as follows:
>> >
>> >#define MemoryContextIsValid(context) \
>> >  ((context) != NULL && \
>> >   (IsA((context), AllocSetContext) || \
>> >    IsA((context), SlabContext) || \
>> >    IsA((context), GenerationContext)))
>> >
>> >It allows only "known" memory context methods, even though the memory
>> >context mechanism enables extensions to implement custom memory allocators.
>> >Here is a node tag nobody used: T_MemoryContext.
>> >It looks to me T_MemoryContext is a neutral naming for custom memory
>> >context, and there is no reason why memory context functions should
>> >prevent custom methods.
>> >
>>
>> Good question. I don't think there's an explicit reason not to allow
>> extensions to define custom memory contexts, and using T_MemoryContext
>> seems like a possible solution. It's a bit weird though, because all the
>> actual contexts are kinda "subclasses" of MemoryContext. So maybe adding
>> T_CustomMemoryContext would be a better choice, but that only works in
>> master, of course.
>>
>From the standpoint of an extension author, reusing T_MemoryContext and
>back-patching the change to MemoryContextIsValid() makes us happy. :)
>However, even if we add a new node-tag here, custom memory contexts would
>have to keep using a fake node-tag on the back branches for a few years.
>It's better than nothing.
>

Oh, right. I forgot we still need to backpatch this bit. But that seems
like a fairly small amount of code, so it should work.

I think we can't backpatch the addition of T_CustomMemoryContext anyway,
as it essentially breaks ABI, since it changes the values assigned to T_
constants.

>> Also, it won't work if we need to add memory contexts to equalfuncs.c
>> etc. but maybe won't need that - it's more a theoretical issue.
>>
>Right now, none of the nodes/XXXfuncs.c files support this class of nodes
>related to MemoryContext. It should not be a problem.
>

Yes. I did not really mean it as an argument against the patch, it was
meant more like "This could be an issue, but it actually is not." Sorry
if that wasn't clear.

>> >https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
>> >I recently implemented a custom memory context for shared memory allocation
>> >with portable pointers. It shall be used for cache of pre-built gpu
>> >binary code and metadata cache of apache arrow files.
>> >However, the assertion check above requires the extension to set a fake
>> >node-tag to avoid a backend crash. Right now, it is harmless to set
>> >T_AllocSetContext, but it feels a bit bad.
>> >
>>
>> Interesting. Does that mean the shared memory contexts are part of the
>> same hierarchy as "normal" contexts? That would be a bit confusing, I
>> think.
>>
>If this shared memory context is a child of a normal context, likely, it
>should be reset or deleted as usual. However, if this shared memory
>context acts as a parent of a normal context, it makes a problem when a
>different process tries to delete this context, because its child (the
>normal context) exists in the creator process only. So, it needs to be
>used carefully.
>

Yeah, handling the life cycle of a mix of those contexts may be quite
tricky.

But my concern was a bit more general - is it a good idea to hide the
nature of the memory context behind the same API? If you call palloc(),
shouldn't you really know whether you're allocating the stuff in a
regular or shared memory context?

Maybe it makes perfect sense, but my initial impression is that those
seem rather different, so maybe we should keep them separate (in
separate hierarchies or something). But I admit I don't have much
experience with use cases that would require such shmem contexts.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jan 28, 2020 at 10:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't actually believe that private context types in extensions is
> a very likely use-case, so on the whole I'd just as soon leave this
> alone. If we did want to do something, I'd vote for one NodeTag
> code T_MemoryContext and then a secondary ID field that's an enum
> over specific memory context types.

I generally like this idea, but I'd like to propose that we instead
replace the NodeTag with a 4-byte magic number. I was studying how
feasible it would be to make memory contexts available in frontend code,
and it doesn't look all that bad, but one of the downsides is that
nodes/memnodes.h includes nodes/nodes.h, and I think it would not be a
good idea to make frontend code depend on nodes/nodes.h, which seems
like it's really a backend-only piece of infrastructure. Using a magic
number would allow us to avoid that, and it doesn't really cost
anything, because the memory context nodes really don't participate in
any of that infrastructure anyway.

Along with that, I think we could also change MemoryContextIsValid() to
just check the magic number and not validate the type field. I think the
spirit of the code is just to check that we've got some kind of memory
context rather than random garbage in memory, and checking the magic
number is good enough for that. It's possibly a little better than what
we have now, since a node tag is a small integer, which might be more
likely to occur at a random pointer address. At any rate I think it's no
worse.

Proposed patch attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
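A minimal sketch of the magic-number approach. The constant and the trimmed struct are made up for illustration; the actual value and layout are whatever the patch picks:

```c
#include <stdint.h>
#include <stddef.h>

#define MEMORYCONTEXT_MAGIC 0x9f2b3c4dU    /* arbitrary illustrative value */

typedef struct MemoryContextData
{
    uint32_t    magic;          /* replaces the NodeTag entirely */
    /* isReset, methods, parent, ... omitted */
} MemoryContextData;

/* The whole validity check: no list of known context types, so an
 * extension-defined context passes as long as it stamps the magic. */
static int
MemoryContextIsValid(const MemoryContextData *context)
{
    return context != NULL && context->magic == MEMORYCONTEXT_MAGIC;
}
```

Because node tags are small integers, random garbage is somewhat more likely to collide with a tag than with an arbitrary 32-bit value, which is the "possibly a little better" point.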
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 28, 2020 at 10:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't actually believe that private context types in extensions is
>> a very likely use-case, so on the whole I'd just as soon leave this
>> alone. If we did want to do something, I'd vote for one NodeTag
>> code T_MemoryContext and then a secondary ID field that's an enum
>> over specific memory context types.

> I generally like this idea, but I'd like to propose that we instead
> replace the NodeTag with a 4-byte magic number.

Yeah, there's something to be said for that. It's unlikely that it'd
ever make sense for us to have copy/equal/write/read/etc support for
memory context headers, so having them be part of the Node taxonomy
doesn't seem very necessary.

> Along with that, I think we could also change MemoryContextIsValid()
> to just check the magic number and not validate the type field.

Right, that's isomorphic to what I was imagining: there'd be just one
check, not N.

> Proposed patch attached.

I strongly object to having the subtype field be just "char". I want it
to be declared "MemoryContextType", so that gdb will still be telling me
explicitly what struct type this really is. I realize that you probably
did that for alignment reasons, but maybe we could shave the magic
number down to 2 bytes, and then rearrange the field order? Or just not
sweat so much about wasting a longword here. Having those bools up at
the front is pretty ugly anyway.

regards, tom lane
On Tue, Jan 28, 2020 at 11:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I strongly object to having the subtype field be just "char".
> I want it to be declared "MemoryContextType", so that gdb will
> still be telling me explicitly what struct type this really is.
> I realize that you probably did that for alignment reasons, but
> maybe we could shave the magic number down to 2 bytes, and then
> rearrange the field order? Or just not sweat so much about
> wasting a longword here. Having those bools up at the front
> is pretty ugly anyway.

I kind of dislike having the magic number not be the first thing in the
struct, on aesthetic grounds, and possibly on the grounds that somebody
might be examining the initial bytes manually to try to figure out what
they've got, and having the magic number there makes it easier.
Regarding space consumption, I guess this structure is already over 64
bytes and not close to 128 bytes, so adding another 8 bytes probably
isn't very meaningful, but I don't love it. One thing that concerns me a
bit is that if somebody adds their own type of memory context, then
MemoryContextType will have a value that is not actually in the enum. If
compilers are optimizing the code on the assumption that this cannot
occur, do we need to worry about undefined behavior?

Actually, I have what I think is a better idea. I notice that the "type"
field is actually completely unused. We initialize it, and then nothing
in the code ever checks it or relies on it for any purpose. So, we could
have a bug in the code that initializes that field with the wrong value,
for a new context type or in general, and only a developer with a
debugger would ever notice. That's because the thing that really matters
is the 'methods' array. So what I propose is that we nuke the type field
from orbit. If a developer wants to figure out what type of context
they've got, they should print context->methods[0]; gdb will tell you
the function names stored there, and then you'll know *for real* what
type of context you've got.

Here's a v2 that approaches the problem that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 28, 2020 at 11:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I strongly object to having the subtype field be just "char".
>> I want it to be declared "MemoryContextType", so that gdb will
>> still be telling me explicitly what struct type this really is.
>> I realize that you probably did that for alignment reasons, but
>> maybe we could shave the magic number down to 2 bytes, and then
>> rearrange the field order? Or just not sweat so much about
>> wasting a longword here. Having those bools up at the front
>> is pretty ugly anyway.

> I kind of dislike having the magic number not be the first thing in
> the struct on aesthetic grounds,

Huh? I didn't propose that. I was thinking either

    uint16      magic;
    bool        isReset;
    bool        allowInCritSection;
    enum        type;
    ... 64-bit fields follow ...

or

    uint32      magic;
    enum        type;
    bool        isReset;
    bool        allowInCritSection;
    ... 64-bit fields follow ...

where the latter wastes space unless the compiler chooses to fit the
enum into 16 bits, but it's not really our fault if it doesn't.
Besides, what's the reason to think we'll never add any more bools here?
I don't think we need to be that excited about the padding.

> So, we could have a bug in the code that initializes that field with
> the wrong value, for a new context type or in general, and only a
> developer with a debugger would ever notice.

Right, but that is a pretty important use-case.

> That's because the thing
> that really matters is the 'methods' array. So what I propose is that
> we nuke the type field from orbit. If a developer wants to figure out
> what type of context they've got, they should print
> context->methods[0]; gdb will tell you the function names stored
> there, and then you'll know *for real* what type of context you've
> got.

No. No. Just NO. (A) that's overly complex for developers to use, and
(B) you have far too much faith in the debugger producing something
useful. (My experience is that it'll fail to render function pointers
legibly on an awful lot of platforms.) Plus, you won't actually save
any space by removing both of those fields.

If we were going to conclude that we don't really need a magic number,
I'd opt for replacing the NodeTag with an enum MemoryContextType field
that's decoupled from NodeTag. But I don't feel tremendously happy
about not having a magic number. That'd make it noticeably harder to
recognize cases where you're referencing an invalid context pointer.

In the end, trying to shave a couple of bytes from context headers seems
pretty penny-wise and pound-foolish. There are lots of other structs
with significantly higher usage where we've not stopped to worry about
alignment padding, so why here? Personally I'd just put the bools back
at the end where they used to be.

regards, tom lane
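The space trade-off between the two orderings can be checked mechanically. The structs below follow Tom's sketch with a stand-in 64-bit field appended; exact sizes are platform-dependent, so only the relative comparison is safe to assert:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

typedef enum MemoryContextType { MCTX_ASET, MCTX_SLAB, MCTX_GEN } MemoryContextType;

/* "uint16 magic; bool isReset; bool allowInCritSection; enum type;" */
typedef struct HeaderA
{
    uint16_t    magic;
    bool        isReset;
    bool        allowInCritSection;
    MemoryContextType type;
    uint64_t    first_64bit_field;  /* "... 64-bit fields follow ..." */
} HeaderA;

/* "uint32 magic; enum type; bool isReset; bool allowInCritSection;" */
typedef struct HeaderB
{
    uint32_t    magic;
    MemoryContextType type;
    bool        isReset;
    bool        allowInCritSection;
    uint64_t    first_64bit_field;
} HeaderB;
```

On a typical LP64 build with 4-byte enums, HeaderA packs into 16 bytes while HeaderB pads out to 24 before the 64-bit field, which is exactly the "latter wastes space unless the compiler fits the enum into 16 bits" case.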
On Tue, Jan 28, 2020 at 1:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > That's because the thing
> > that really matters is the 'methods' array. So what I propose is that
> > we nuke the type field from orbit. If a developer wants to figure out
> > what type of context they've got, they should print
> > context->methods[0]; gdb will tell you the function names stored
> > there, and then you'll know *for real* what type of context you've
> > got.
>
> No. No. Just NO. (A) that's overly complex for developers to use,
> and (B) you have far too much faith in the debugger producing something
> useful. (My experience is that it'll fail to render function pointers
> legibly on an awful lot of platforms.) Plus, you won't actually save
> any space by removing both of those fields.

I half-expected this reaction, but I think it's unjustified. Two sources
of truth are not better than one, and I don't think that any other place
where we use a vtable-type approach includes a redundant type field just
for decoration. Can you think of a counterexample? After scrounging
around the source tree a bit, the most direct parallel I can find is
probably TupleTableSlot, which contains a pointer to a TupleTableSlotOps,
but not a separate field identifying the slot type. I don't remember you
or anybody objecting that TupleTableSlot should contain both "const
TupleTableSlotOps *const tts_ops" and also "enum TupleTableSlotType",
and I don't think that such a suggestion would have been looked upon
very favorably, not only because it would have made the struct bigger
and served no necessary purpose, but also because having a centralized
list of all TupleTableSlot types flies in the face of the essential goal
of the table AM interface, which is to allow adding new table types (and
a slot type for each) without having to modify core code.

That exact consideration is also relevant here: KaiGai wants to be able
to add his own memory context type in third-party code without having to
modify core code. I've wanted to do that in the past, too. Having to
list all the context types in an enum means that you really can't do
that, which sucks, unless you're willing to lie about the context type
and hope that nobody adds code that cares about it. Is there an
alternate solution that you can propose that does not prevent that?

You might be entirely correct that there are some debuggers that can't
print function pointers correctly. I have not run across one in a really
long time, but I also work mostly on macOS and Linux these days, and
those are pretty mainstream platforms where such problems are less
likely. However, I suspect that there are very few developers who do the
bulk of their work on obscure platforms with poorly-functioning
debuggers. The only time it's likely to come up is if a buggy commit
makes things crash on some random buildfarm critter and we need to debug
from a core dump or whatever. But, if that does happen, we're not dead.
The only possible values of the "methods" pointer -- if only core code
is at issue -- are &AllocSetMethods, &SlabMethods, and
&GenerationMethods, so somebody can just print out those values and
compare them to the pointer they find. That is a lot less convenient
than being able to just print context->methods[0] and see everything,
but if it only comes up when debugging irreproducible crashes on obscure
platforms, it seems OK to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
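The "methods pointer as the single source of truth" idea, as a standalone sketch. The vtable is heavily trimmed relative to the real MemoryContextMethods, and the comparison function just writes out, as code, what "print out those values and compare" would do from a debugger:

```c
#include <stddef.h>

typedef struct MemoryContextMethods
{
    void       *(*alloc) (void *context, size_t size);
    void        (*free_p) (void *context, void *pointer);
    /* reset, delete_context, stats, ... omitted */
} MemoryContextMethods;

typedef struct MemoryContextData
{
    const MemoryContextMethods *methods;    /* the identity lives here */
} MemoryContextData;

/* Do-nothing callbacks; only the vtable addresses matter below. */
static void *stub_alloc(void *context, size_t size)
{ (void) context; (void) size; return NULL; }
static void stub_free(void *context, void *pointer)
{ (void) context; (void) pointer; }

/* Stand-ins for the static vtables in the backend: */
static const MemoryContextMethods AllocSetMethods = { stub_alloc, stub_free };
static const MemoryContextMethods SlabMethods = { stub_alloc, stub_free };

static const char *
ContextTypeName(const MemoryContextData *context)
{
    if (context->methods == &AllocSetMethods)
        return "AllocSet";
    if (context->methods == &SlabMethods)
        return "Slab";
    return "unknown (perhaps extension-defined)";
}
```

Note that an extension's own static vtable simply falls into the last branch: nothing in core has to enumerate it, which is the point of the TupleTableSlotOps comparison.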
On Tue, Jan 28, 2020 at 2:55 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> I noticed MemoryContextIsValid() called by various kinds of memory context
> routines checks its node-tag as follows:
>
> #define MemoryContextIsValid(context) \
>   ((context) != NULL && \
>    (IsA((context), AllocSetContext) || \
>     IsA((context), SlabContext) || \
>     IsA((context), GenerationContext)))
>
> It allows only "known" memory context methods, even though the memory context
> mechanism enables to implement custom memory allocator by extensions.
> Here is a node tag nobody used: T_MemoryContext.
> It looks to me T_MemoryContext is a neutral naming for custom memory context,
> and here is no reason why memory context functions prevents custom methods.
>
> https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
> I recently implemented a custom memory context for shared memory allocation
> with portable pointers. It shall be used for cache of pre-built gpu
> binary code and metadata cache of apache arrow files.
> However, the assertion check above requires extension to set a fake node-tag
> to avoid backend crash. Right now, it is harmless to set T_AllocSetContext, but
> feel a bit bad.

FWIW the code in https://commitfest.postgresql.org/26/2325/ ran into
exactly the same problem while making nearly exactly the same kind of
thing (namely, a MemoryContext backed by space in the main shm area, in
this case reusing the dsa.c allocator).
2020年1月29日(水) 0:56 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
>
> On Tue, Jan 28, 2020 at 11:32:49PM +0900, Kohei KaiGai wrote:
> >2020年1月28日(火) 23:09 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
> >>
> >> On Tue, Jan 28, 2020 at 10:55:11AM +0900, Kohei KaiGai wrote:
> >> >Hello,
> >> >
> >> >I noticed MemoryContextIsValid() called by various kinds of memory context
> >> >routines checks its node-tag as follows:
> >> >
> >> >#define MemoryContextIsValid(context) \
> >> >  ((context) != NULL && \
> >> >   (IsA((context), AllocSetContext) || \
> >> >    IsA((context), SlabContext) || \
> >> >    IsA((context), GenerationContext)))
> >> >
> >> >It allows only "known" memory context methods, even though the memory
> >> >context mechanism enables extensions to implement custom memory allocators.
> >> >Here is a node tag nobody used: T_MemoryContext.
> >> >It looks to me T_MemoryContext is a neutral naming for custom memory
> >> >context, and there is no reason why memory context functions should
> >> >prevent custom methods.
> >> >
> >>
> >> Good question. I don't think there's an explicit reason not to allow
> >> extensions to define custom memory contexts, and using T_MemoryContext
> >> seems like a possible solution. It's a bit weird though, because all the
> >> actual contexts are kinda "subclasses" of MemoryContext. So maybe adding
> >> T_CustomMemoryContext would be a better choice, but that only works in
> >> master, of course.
> >>
> >From the standpoint of an extension author, reusing T_MemoryContext and
> >back-patching the change to MemoryContextIsValid() makes us happy. :)
> >However, even if we add a new node-tag here, custom memory contexts would
> >have to keep using a fake node-tag on the back branches for a few years.
> >It's better than nothing.
> >
>
> Oh, right. I forgot we still need to backpatch this bit. But that seems
> like a fairly small amount of code, so it should work.
>
> I think we can't backpatch the addition of T_CustomMemoryContext anyway
> as it essentially breaks ABI, as it changes the values assigned to T_
> constants.
>
> >> Also, it won't work if we need to add memory contexts to equalfuncs.c
> >> etc. but maybe won't need that - it's more a theoretical issue.
> >>
> >Right now, none of the nodes/XXXfuncs.c files support this class of nodes
> >related to MemoryContext. It should not be a problem.
> >
>
> Yes. I did not really mean it as an argument against the patch, it was
> meant more like "This could be an issue, but it actually is not." Sorry
> if that wasn't clear.
>
> >> >https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
> >> >I recently implemented a custom memory context for shared memory allocation
> >> >with portable pointers. It shall be used for cache of pre-built gpu
> >> >binary code and metadata cache of apache arrow files.
> >> >However, the assertion check above requires the extension to set a fake
> >> >node-tag to avoid a backend crash. Right now, it is harmless to set
> >> >T_AllocSetContext, but it feels a bit bad.
> >> >
> >>
> >> Interesting. Does that mean the shared memory contexts are part of the
> >> same hierarchy as "normal" contexts? That would be a bit confusing, I
> >> think.
> >>
> >If this shared memory context is a child of a normal context, likely, it
> >should be reset or deleted as usual. However, if this shared memory
> >context acts as a parent of a normal context, it makes a problem when a
> >different process tries to delete this context, because its child (the
> >normal context) exists in the creator process only. So, it needs to be
> >used carefully.
> >
>
> Yeah, handling the life cycle of a mix of those contexts may be quite
> tricky.
>
> But my concern was a bit more general - is it a good idea to hide the
> nature of the memory context behind the same API? If you call palloc(),
> shouldn't you really know whether you're allocating the stuff in a
> regular or shared memory context?
>
> Maybe it makes perfect sense, but my initial impression is that those
> seem rather different, so maybe we should keep them separate (in
> separate hierarchies or something). But I admit I don't have much
> experience with use cases that would require such shmem contexts.
>
Yeah, as you mentioned, we have no way to distinguish whether a particular
memory chunk is private memory or shared memory, right now. It is the
responsibility of software developers, and I assume the shared-memory
chunks are used in limited cases where there is a certain reason why
they should be shared.

On the other hand, the situation is the same even with private memory.
We should pay attention to the memory context we allocate a memory chunk
from. For example, a state object that must stay valid during query
execution must be allocated from estate->es_query_cxt. If someone
allocates it from CurrentMemoryContext and it is then implicitly
released, we would consider that a bug.

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
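The "allocate from the right context" discipline can be shown with a toy model. The names mimic PostgreSQL's API shape (CurrentMemoryContext, MemoryContextSwitchTo), but the contexts here just count allocations instead of managing memory:

```c
#include <stddef.h>

/* Toy contexts: each one only counts how many chunks landed in it. */
typedef struct MemoryContextData { int nchunks; } MemoryContextData;
typedef MemoryContextData *MemoryContext;

static MemoryContextData per_tuple_cxt;   /* reset after every row */
static MemoryContextData es_query_cxt;    /* lives for the whole query */

static MemoryContext CurrentMemoryContext = &per_tuple_cxt;

static MemoryContext
MemoryContextSwitchTo(MemoryContext new_context)
{
    MemoryContext old = CurrentMemoryContext;
    CurrentMemoryContext = new_context;
    return old;
}

/* Stand-in for palloc(): the chunk lives exactly as long as whatever
 * CurrentMemoryContext happens to be at the call site. */
static void
toy_palloc(void)
{
    CurrentMemoryContext->nchunks++;
}

/* State that must survive the whole query is switched explicitly,
 * following the usual switch/alloc/switch-back pattern: */
static void
alloc_query_lifetime_state(void)
{
    MemoryContext old = MemoryContextSwitchTo(&es_query_cxt);
    toy_palloc();
    MemoryContextSwitchTo(old);
}
```

Forgetting the switch puts the chunk in the per-tuple context, where it would be "implicitly released" at the next reset — the bug described above.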
2020年1月29日(水) 2:18 Robert Haas <robertmhaas@gmail.com>: > > On Tue, Jan 28, 2020 at 11:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I strongly object to having the subtype field be just "char". > > I want it to be declared "MemoryContextType", so that gdb will > > still be telling me explicitly what struct type this really is. > > I realize that you probably did that for alignment reasons, but > > maybe we could shave the magic number down to 2 bytes, and then > > rearrange the field order? Or just not sweat so much about > > wasting a longword here. Having those bools up at the front > > is pretty ugly anyway. > > I kind of dislike having the magic number not be the first thing in > the struct on aesthetic grounds, and possibly on the grounds that > somebody might be examining the initial bytes manually to try to > figure out what they've got, and having the magic number there makes > it easier. Regarding space consumption, I guess this structure is > already over 64 bytes and not close to 128 bytes, so adding another 8 > bytes probably isn't very meaningful, but I don't love it. One thing > that concerns me a bit is that if somebody adds their own type of > memory context, then MemoryContextType will have a value that is not > actually in the enum. If compilers are optimizing the code on the > assumption that this cannot occur, do we need to worry about undefined > behavior? > > Actually, I have what I think is a better idea. I notice that the > "type" field is actually completely unused. We initialize it, and then > nothing in the code ever checks it or relies on it for any purpose. > So, we could have a bug in the code that initializes that field with > the wrong value, for a new context type or in general, and only a > developer with a debugger would ever notice. That's because the thing > that really matters is the 'methods' array. So what I propose is that > we nuke the type field from orbit. 
If a developer wants to figure out > what type of context they've got, they should print > context->methods[0]; gdb will tell you the function names stored > there, and then you'll know *for real* what type of context you've > got. > > Here's a v2 that approaches the problem that way. > How about having a "const char *name" field in MemoryContextMethods? It is more human-readable for debugging than raw function pointers. We already have a similar field to identify the methods in CustomScanMethods. (It is also used for EXPLAIN, not only debugging...) I like the idea of identifying the memory-context type with a single identifier rather than two. If we had a sub-field id and the memory-context methods separately, we would probably need an Assert() to check their consistency, wouldn't we? Thanks, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei <kaigai@heterodb.com>
On Wed, Jan 29, 2020 at 4:27 Thomas Munro <thomas.munro@gmail.com> wrote: > > On Tue, Jan 28, 2020 at 2:55 PM Kohei KaiGai <kaigai@heterodb.com> wrote: > > I noticed MemoryContextIsValid() called by various kinds of memory context > > routines checks its node-tag as follows: > > > > #define MemoryContextIsValid(context) \ > > ((context) != NULL && \ > > (IsA((context), AllocSetContext) || \ > > IsA((context), SlabContext) || \ > > IsA((context), GenerationContext))) > > > > It allows only "known" memory context methods, even though the memory context > > mechanism enables to implement custom memory allocator by extensions. > > Here is a node tag nobody used: T_MemoryContext. > > It looks to me T_MemoryContext is a neutral naming for custom memory context, > > and here is no reason why memory context functions prevents custom methods. > > > > > > https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243 > > I recently implemented a custom memory context for shared memory allocation > > with portable pointers. It shall be used for cache of pre-built gpu > > binary code and > > metadata cache of apache arrow files. > > However, the assertion check above requires extension to set a fake node-tag > > to avoid backend crash. Right now, it is harmless to set T_AllocSetContext, but > > feel a bit bad. > > FWIW the code in https://commitfest.postgresql.org/26/2325/ ran into > exactly the same problem while making nearly exactly the same kind of > thing (namely, a MemoryContext backed by space in the main shm area, > in this case reusing the dsa.c allocator). > Thanks. I had seen the "Shared Memory Context" thread on pgsql-hackers a few months ago, but missed it until now. The main difference from that approach is the portability of pointers to shared-memory chunks (if I understand correctly). PG-Strom reserves logical address space, but no physical pages, at startup, then maps shared memory segments at fixed addresses on demand.
So pointers are portable across all backend processes, and thus also suitable for building tree structures in shared memory. The article below introduces how our "shared memory context" works. It was originally written in Japanese, and Google Translate may generate unnatural English; however, its figures should explain how it works. https://translate.google.co.jp/translate?hl=ja&sl=auto&tl=en&u=http%3A%2F%2Fkaigai.hatenablog.com%2Fentry%2F2016%2F06%2F19%2F095127 Best regards, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei <kaigai@heterodb.com>
On Wed, Jan 29, 2020 at 2:49 PM Kohei KaiGai <kaigai@heterodb.com> wrote: > 2020年1月29日(水) 4:27 Thomas Munro <thomas.munro@gmail.com>: > > On Tue, Jan 28, 2020 at 2:55 PM Kohei KaiGai <kaigai@heterodb.com> wrote: > > FWIW the code in https://commitfest.postgresql.org/26/2325/ ran into > > exactly the same problem while making nearly exactly the same kind of > > thing (namely, a MemoryContext backed by space in the main shm area, > > in this case reusing the dsa.c allocator). > > > Thanks, I had looked at "Shared Memory Context" title on pgsql-hackers a few > months ago, however, missed the thread right now. > > The main point of the differences from this approach is portability of pointers > to shared memory chunks. (If I understand correctly) > PG-Strom preserves logical address space, but no physical pages, on startup > time, then maps shared memory segment on the fixed address on demand. > So, pointers are portable to all the backend processes, thus, suitable to build > tree structure on shared memory also. Very interesting. PostgreSQL's DSM segments could potentially have been implemented that way (whereas today they are not mapped with MAP_FIXED), but I suppose people were worried about portability problems and ASLR. The Windows port struggles with that stuff. Actually the WIP code in that patch reserves a chunk of space up front in the postmaster, and then puts a DSA allocator inside it. Normally, DSA allocators create/destroy new DSM segments on demand and deal with the address portability stuff through a lot of extra work (this probably makes Parallel Hash Join slightly slower than it should be), but as a special case a DSA allocator can be created in preexisting memory and then not allowed to grow. In exchange for accepting a fixed space, you get normal shareable pointers. 
This means that you can use the resulting weird MemoryContext for stuff like building query plans that can be used by other processes, but when you run out of memory, allocations begin to fail. That WIP code is experimenting with caches that can tolerate running out of memory (or at least that is the intention), so a fixed sized space is OK for that. > This article below introduces how our "shared memory context" works. > It is originally written in Japanese, and Google translate may > generate unnatural > English. However, its figures probably explain how it works. > https://translate.google.co.jp/translate?hl=ja&sl=auto&tl=en&u=http%3A%2F%2Fkaigai.hatenablog.com%2Fentry%2F2016%2F06%2F19%2F095127 Thanks!
On Tue, Jan 28, 2020 at 10:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > No. No. Just NO. (A) that's overly complex for developers to use, > and (B) you have far too much faith in the debugger producing something > useful. (My experience is that it'll fail to render function pointers > legibly on an awful lot of platforms.) Plus, you won't actually save > any space by removing both of those fields. FWIW, I noticed about a year ago that GDB becomes much better at this when you add "set print symbol on" to your .gdbinit file. In theory you shouldn't need to do that to print the symbol that a function pointer points to, I think. At least that's what the documentation says. But in practice this seems to help a lot. I don't recall figuring out a reason for this. Could have been due to GDB being fussier about the declared type of a pointer than it needs to be, or something along those lines. -- Peter Geoghegan
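For reference, the setting Peter mentions is a single line of GDB configuration; with it enabled, printing a context's methods struct shows symbol names alongside the raw addresses. The session below is illustrative only: exact field names and formatting vary by PostgreSQL and GDB version, and the addresses are placeholders.

```
# ~/.gdbinit
set print symbol on
```

A session might then look roughly like:

```
(gdb) p *((MemoryContext) ptr)->methods
$1 = {alloc = 0x... <AllocSetAlloc>, free_p = 0x... <AllocSetFree>, ...}
```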
Peter Geoghegan <pg@bowt.ie> writes: > On Tue, Jan 28, 2020 at 10:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> No. No. Just NO. (A) that's overly complex for developers to use, >> and (B) you have far too much faith in the debugger producing something >> useful. (My experience is that it'll fail to render function pointers >> legibly on an awful lot of platforms.) Plus, you won't actually save >> any space by removing both of those fields. > FWIW, I noticed that GDB becomes much better at this when you add "set > print symbol on" to your .gdbinit dot file about a year ago. Interesting. But I bet there are platform and version dependencies in the mix, too. I'd still not wish to rely on this for debugging. regards, tom lane
On Tue, Jan 28, 2020 at 6:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > FWIW, I noticed that GDB becomes much better at this when you add "set > > print symbol on" to your .gdbinit dot file about a year ago. > > Interesting. But I bet there are platform and version dependencies > in the mix, too. I'd still not wish to rely on this for debugging. I agree that there are a lot of moving pieces here. I wouldn't like to have to rely on this working myself. -- Peter Geoghegan
On Wed, Jan 29, 2020 at 11:06 Thomas Munro <thomas.munro@gmail.com> wrote: > > On Wed, Jan 29, 2020 at 2:49 PM Kohei KaiGai <kaigai@heterodb.com> wrote: > > 2020年1月29日(水) 4:27 Thomas Munro <thomas.munro@gmail.com>: > > > On Tue, Jan 28, 2020 at 2:55 PM Kohei KaiGai <kaigai@heterodb.com> wrote: > > > FWIW the code in https://commitfest.postgresql.org/26/2325/ ran into > > > exactly the same problem while making nearly exactly the same kind of > > > thing (namely, a MemoryContext backed by space in the main shm area, > > > in this case reusing the dsa.c allocator). > > > > > Thanks, I had looked at "Shared Memory Context" title on pgsql-hackers a few > > months ago, however, missed the thread right now. > > > > The main point of the differences from this approach is portability of pointers > > to shared memory chunks. (If I understand correctly) > > PG-Strom preserves logical address space, but no physical pages, on startup > > time, then maps shared memory segment on the fixed address on demand. > > So, pointers are portable to all the backend processes, thus, suitable to build > > tree structure on shared memory also. > > Very interesting. PostgreSQL's DSM segments could potentially have > been implemented that way (whereas today they are not mapped with > MAP_FIXED), but I suppose people were worried about portability > problems and ASLR. The Windows port struggles with that stuff. > Yes. I'm not certain whether Windows can support behavior equivalent to mmap(..., PROT_NONE) and SIGSEGV/SIGBUS handling. That is also a reason why PG-Strom (which is Linux-only) wants to have its own shared-memory management logic, at least for now. Thanks, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei <kaigai@heterodb.com>
Hi, On 2020-01-28 11:10:47 -0500, Robert Haas wrote: > I generally like this idea, but I'd like to propose that we instead > replace the NodeTag with a 4-byte magic number. I was studying how > feasible it would be to make memory contexts available in frontend > code, and it doesn't look all that bad, but one of the downsides is > that nodes/memnodes.h includes nodes/nodes.h, and I think it would not > be a good idea to make frontend code depend on nodes/nodes.h, which > seems like it's really a backend-only piece of infrastructure. Using a > magic number would allow us to avoid that, and it doesn't really cost > anything, because the memory context nodes really don't participate in > any of that infrastructure anyway. Hm. I kinda like the idea of still having one NodeTag identifying memory contexts, and then some additional field identifying the actual type. Being able to continue to rely on IsA() etc imo is nice. I think nodes.h itself only would be a problem for frontend code because we put a lot of other stuff too. We should just separate the actually generic stuff out. I think it's going to be like 2 seconds once we have memory contexts until we're e.g. going to want to also have pg_list.h - which is harder without knowing the tags. It seems like a good idea to still have an additional identifier for each node type, for some cross checking. How about just frobbing the pointer to the MemoryContextMethod slightly, and storing that in an additional field? That'd be something fairly unlikely to ever be a false positive, and it doesn't require dereferencing any additional memory. Greetings, Andres Freund
On Tue, Jan 28, 2020 at 9:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Interesting. But I bet there are platform and version dependencies > in the mix, too. I'd still not wish to rely on this for debugging. Hmm. What if we put a "const char *name" in the methods array? Then even if you couldn't print out the function pointers, you would at least see the name. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Feb 3, 2020 at 7:26 PM Andres Freund <andres@anarazel.de> wrote: > Hm. I kinda like the idea of still having one NodeTag identifying memory > contexts, and then some additional field identifying the actual > type. Being able to continue to rely on IsA() etc imo is nice. I think > nodes.h itself only would be a problem for frontend code because we put > a lot of other stuff too. We should just separate the actually generic > stuff out. I think it's going to be like 2 seconds once we have memory > contexts until we're e.g. going to want to also have pg_list.h - which > is harder without knowing the tags. The problem with IsA() is that it assumes that you've got all the node tags that can ever exist in one big enum. I don't see how to make that work once you extend the system to work with more than one program. I think it will be really confusing if frontend code starts reusing random backend data structures. Like, fundamental things like List, sure, that should be exposed. But if people start creating Plan or FDW objects in the frontend, it's just going to be chaos. And I don't think we want new objects that people may add for frontend code to be visible to backend code, either. > It seems like a good idea to still have an additional identifier for > each node type, for some cross checking. How about just frobbing the > pointer to the MemoryContextMethod slightly, and storing that in an > additional field? That'd be something fairly unlikely to ever be a false > positive, and it doesn't require dereferencing any additional memory. That would be fine as an internal sanity check, but if Tom is unhappy with the idea of having to try to make sense of a function pointer, he's probably going to be even less happy about trying to make sense of a frobbed pointer. And I would actually agree with him on that point. I think we're all pursuing slightly different goals here. 
KaiGai's main goal is to make it possible for third-party code to add new kinds of memory contexts. My main goal is to make memory contexts not depend on backend-only infrastructure. Tom is concerned about debuggability. Your concern here is about sanity checking. There's some overlap between those goals but the absolute best thing for any given one of them might be really bad for one of the other ones; hopefully we can find some compromise that gets everybody the things they care about most. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Hmm. What if we put a "const char *name" in the methods array? Then > even if you couldn't print out the function pointers, you would at > least see the name. Yeah, that idea had occurred to me too. It'd definitely be better than relying on the ability to interpret function pointers, and there might be other uses for it besides manual debugging (eg if we had an outfuncs function for MemoryContext, it could print that). So I'd be a bit in favor of adding that independently of this discussion. I still think that it'd be inconvenient for debugging, though, compared to having an enum field right in the context. You'll have to do an extra step to discover the context's type, and if you jump to the wrong conclusion and do, say, p *(AllocSetContext *) ptr_value when it's really some other context type, there won't be anything as obvious as "type = T_GenerationContext" in what is printed to tell you you were wrong. So I really want to also have an enum field of some sort, and it does not seem to me that there's any compelling reason not to have one. regards, tom lane
On Wed, Feb 5, 2020 at 10:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > So I really want to also have an enum > field of some sort, and it does not seem to me that there's any > compelling reason not to have one. I mean, then it's not extensible, right? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Feb 3, 2020 at 7:26 PM Andres Freund <andres@anarazel.de> wrote: >> It seems like a good idea to still have an additional identifier for >> each node type, for some cross checking. How about just frobbing the >> pointer to the MemoryContextMethod slightly, and storing that in an >> additional field? That'd be something fairly unlikely to ever be a false >> positive, and it doesn't require dereferencing any additional memory. > That would be fine as an internal sanity check, but if Tom is unhappy > with the idea of having to try to make sense of a function pointer, > he's probably going to be even less happy about trying to make sense > of a frobbed pointer. And I would actually agree with him on that > point. Yeah, it seems a bit overcomplicated for what it accomplishes. > I think we're all pursuing slightly different goals here. KaiGai's > main goal is to make it possible for third-party code to add new kinds > of memory contexts. My main goal is to make memory contexts not depend > on backend-only infrastructure. Tom is concerned about debuggability. > Your concern here is about sanity checking. There's some overlap > between those goals but the absolute best thing for any given one of > them might be really bad for one of the other ones; hopefully we can > find some compromise that gets everybody the things they care about > most. Good summary. I think that the combination of a magic number to identify "this is a memory context struct" and an enum to identify the specific type of context should meet all these goals moderately well: * Third-party context types would have to force the compiler to take context-type values that weren't among the known enum values --- although they could ask us to reserve a value by adding an otherwise- unreferenced-by-core-code enum entry, and I don't really see why we shouldn't accept such requests. 
* Frontend and backend would have to share the enum, but the list is short enough that that shouldn't be a killer maintenance problem. (Also, the enum field would be pretty much write-only from the code's standpoint, so even if two programs were out of sync on it, there would be at worst a debugging hazard.) * The enum does what I want for debuggability, especially if we back-stop it with a name string in the methods struct as you suggested. * The magic value does what Andres wants for sanity checking. regards, tom lane
On Wed, Feb 5, 2020 at 10:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Good summary. I think that the combination of a magic number to identify > "this is a memory context struct" and an enum to identify the specific > type of context should meet all these goals moderately well: > > * Third-party context types would have to force the compiler to take > context-type values that weren't among the known enum values --- > although they could ask us to reserve a value by adding an otherwise- > unreferenced-by-core-code enum entry, and I don't really see why > we shouldn't accept such requests. > > * Frontend and backend would have to share the enum, but the list > is short enough that that shouldn't be a killer maintenance problem. > (Also, the enum field would be pretty much write-only from the > code's standpoint, so even if two programs were out of sync on it, > there would be at worst a debugging hazard.) > > * The enum does what I want for debuggability, especially if we > back-stop it with a name string in the methods struct as you suggested. > > * The magic value does what Andres wants for sanity checking. I'm pretty unimpressed with the enum proposal - I think it's pretty nasty for an extension author to have to make up a value that's not in the enum. One, how are they supposed to know that they should do that? Two, how are they supposed to know that the code doesn't actually depend on that enum value for anything important? And three, how do they know that the compiler isn't going to hose them by assuming that isn't a can't-happen scenario? I mean, I'd rather get a patch committed here than not, but I have a hard time understanding why this is a good way to go. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Feb 5, 2020 at 10:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > an enum field right in the context. You'll have to do an extra step to > discover the context's type, and if you jump to the wrong conclusion > and do, say, > p *(AllocSetContext *) ptr_value > when it's really some other context type, there won't be anything > as obvious as "type = T_GenerationContext" in what is printed to > tell you you were wrong. Doesn't the proposed magic number address this concern? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2/5/20 10:20 AM, Tom Lane wrote: > * Third-party context types would have to force the compiler to take > context-type values that weren't among the known enum values --- Doesn't that seem like a long run for a short slide? An extension author gets expected to do something awkward-bordering-on-smelly so that debugging can rely on an enum saying "this is a Foo" rather than a string saying "this is a Foo"? Granted, it's possible the extension-authoring situation is rare, and debugging often happens under time pressure and dire stakes, so perhaps that would be the right balance for this case. I have certainly seen emails from Tom in this space with the analysis of some reported bug completed preternaturally fast, so if he judges that losing the enum would make that harder, that's something. Regards, -Chap
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Feb 5, 2020 at 10:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> an enum field right in the context. You'll have to do an extra step to >> discover the context's type, and if you jump to the wrong conclusion >> and do, say, >> p *(AllocSetContext *) ptr_value >> when it's really some other context type, there won't be anything >> as obvious as "type = T_GenerationContext" in what is printed to >> tell you you were wrong. > Doesn't the proposed magic number address this concern? No, because (a) it will be a random magic number that nobody will remember, and gdb won't print in any helpful form; (b) at least as I understood the proposal, there'd be just one magic number for all types of memory context. Another issue with relying on only a magic number is that if you get confused and do "p *(AllocSetContext *) ptr_value" on something that doesn't point at any sort of memory context at all, there will not be *anything* except confusing field values to help you realize that. One of the great advantages of the Node system, IME, is that when you try to print out a Node-subtype struct, the first field in what is printed is either the Node type you were expecting, or some recognizable other Node code, or obvious garbage. If we don't have an enum field in MemoryContexts then there's not going to be any similarly-obvious clue that what you're looking at isn't a memory context in the first place. I'm okay with the enum not being the first field, but much less okay with not having one at all. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > I'm pretty unimpressed with the enum proposal - I think it's pretty > nasty for an extension author to have to make up a value that's not in > the enum. One, how are they supposed to know that they should do that? > Two, how are they supposed to know that the code doesn't actually > depend on that enum value for anything important? And three, how do > they know that the compiler isn't going to hose them by assuming that > isn't a can't-happen scenario? Well, as I mentioned, the enum field will be pretty much write-only from the code's standpoint, so that last point is not as killer as you might think. The rest of this is just documentation, and any of the proposals on the table require documentation if you expect people to be able to write extensions to use them. regards, tom lane
Chapman Flack <chap@anastigmatix.net> writes: > On 2/5/20 10:20 AM, Tom Lane wrote: >> * Third-party context types would have to force the compiler to take >> context-type values that weren't among the known enum values --- > Doesn't that seem like a long run for a short slide? Well, one thing we could do is assign an "other" or "custom" code, and people who were just doing one-off things could use that. If they were going to publish their code, we could encourage them to ask for a publicly-known enum entry. We have precedent for that approach, eg in pg_statistic stakind codes. regards, tom lane
On Wed, Feb 5, 2020 at 10:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Doesn't the proposed magic number address this concern? > > No, because (a) it will be a random magic number that nobody will > remember, and gdb won't print in any helpful form; (b) at least > as I understood the proposal, there'd be just one magic number for > all types of memory context. I don't disagree with the factual statements that you are making but I don't understand why any of them represent real problems. - It's true that magic numbers are generally not chosen for easy memorization, but I think that most experienced hackers don't have much trouble looking them up with 'git grep' on those (generally rare) occasions when they are needed. - It's true that gdb's default format is decimal and you might want hex, but it has a 'printf' command that can be used to do that, which I at least have found to be pretty darn convenient for this sort of thing. - And it's true that I was proposing - with your agreement, or so I had understood - one magic number for all context types, but that was specifically so you could tell whether you had a memory context or some other thing. Once you know it's really a memory context, you could print cxt->methods->name. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I don't disagree with the factual statements that you are making but I > don't understand why any of them represent real problems. The core problem that I'm unhappy about is that what you want to do adds cognitive burden (including an increased risk of mistakes) and extra debugging commands in common cases around inspecting memory contexts in a debugger. Sure, it's not a fatal problem, but it's still a burden. I judge that putting an enum into the struct would greatly reduce that burden at very small cost. The point of the magic number, AIUI, is so that we can still have an equivalent of Assert(IsA(MemoryContext)) in the code in appropriate places. That need doesn't require readability in a debugger, which is why I'm okay with it being a random magic number. But there's still a need for debugger-friendliness, and a constant string that you need to use extra commands to see doesn't solve that end of the problem. IMO anyway. regards, tom lane
On Wed, Feb 5, 2020 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > The core problem that I'm unhappy about is that what you want to do > adds cognitive burden (including an increased risk of mistakes) and > extra debugging commands in common cases around inspecting memory > contexts in a debugger. Sure, it's not a fatal problem, but it's > still a burden. I judge that putting an enum into the struct would > greatly reduce that burden at very small cost. I respect that, but I disagree. I think the annoyance and strangeness factor of the enum is pretty high for third-party code that wants to use this, because deliberately concocting an out-of-range value for an enum is really odd. And I think the gain in debuggability is pretty low, because even if the enum seems to have the expected value, you still don't really know that things are OK unless you check the magic number and the methods field, too. On the other hand, I don't inspect memory contexts in a debugger all that often, and it sounds like you do, or you presumably wouldn't feel so strongly about this. We might have to see if anybody else has an opinion. I'd rather do it your way than no way, but I feel like it's such a strange design that I'd be afraid to commit it if anyone other than you had suggested it, for fear of having you demand an immediate revert and, maybe, the removal of some of my less important limbs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello. At Wed, 5 Feb 2020 13:09:48 -0500, Robert Haas <robertmhaas@gmail.com> wrote in > On Wed, Feb 5, 2020 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > The core problem that I'm unhappy about is that what you want to do > > adds cognitive burden (including an increased risk of mistakes) and > > extra debugging commands in common cases around inspecting memory > > contexts in a debugger. Sure, it's not a fatal problem, but it's > > still a burden. I judge that putting an enum into the struct would > > greatly reduce that burden at very small cost. > > I respect that, but I disagree. I think the annoyance and strangeness > factor of the enum is pretty high for third-party code that wants to > use this, because deliberately concocting an out-of-range value for an > enum is really odd. And I think the gain in debuggability is pretty > low, because even if the enum seems to have the expected value, you > still don't really know that things are OK unless you check the magic > number and the methods field, too. On the other hand, I don't inspect > memory contexts in a debugger all that often, and it sounds like you > do, or you presumably wouldn't feel so strongly about this. I agree that "deliberately concocting an out-of-(enum-)range value" is odd. Whether or not the context type is in the enum, there is no arbitrator of type numbers for custom allocators. If there were, it would be something like the one for LWLock tranches; but unlike tranches, an allocator fills the methods field with the *correct* values and works correctly even with a bogus type number. That is, such an arbitration system, or such type ids, would be merely a useless burden on custom allocators, and such concocted (or pseudo-random) type numbers contribute nothing to debuggability, since they cannot be distinguished from genuinely bogus numbers on their face. Since we rarely encounter custom allocators, I think it's enough to have enum values for the in-core allocators plus one "CUSTOM" entry.
For "CUSTOM" allocators, we would need to look at cxt->methods->name to identify them, but I don't think that's too much bother as long as we only do it when the need arises. > We might have to see if anybody else has an opinion. I'd rather do it > your way than no way, but I feel like it's such a strange design that > I'd be afraid to commit it if anyone other than you had suggested it, > for fear of having you demand an immediate revert and, maybe, the > removal of some of my less important limbs. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hi, On 2020-02-05 09:28:08 -0500, Robert Haas wrote: > On Mon, Feb 3, 2020 at 7:26 PM Andres Freund <andres@anarazel.de> wrote: > > Hm. I kinda like the idea of still having one NodeTag identifying memory > > contexts, and then some additional field identifying the actual > > type. Being able to continue to rely on IsA() etc imo is nice. I think > > nodes.h itself only would be a problem for frontend code because we put > > a lot of other stuff too. We should just separate the actually generic > > stuff out. I think it's going to be like 2 seconds once we have memory > > contexts until we're e.g. going to want to also have pg_list.h - which > > is harder without knowing the tags. > > The problem with IsA() is that it assumes that you've got all the node > tags that can ever exist in one big enum. I don't see how to make that > work once you extend the system to work with more than one program. I > think it will be really confusing if frontend code starts reusing > random backend data structures. Like, fundamental things like List, > sure, that should be exposed. But if people start creating Plan or FDW > objects in the frontend, it's just going to be chaos. And I don't > think we want new objects that people may add for frontend code to be > visible to backend code, either. I wasn't advocating for making plannodes.h etc frontend usable. I think that's a fairly different discussion than making enum NodeTag, pg_list.h, memutils.h available. I don't see them having access to the numerical value of node tag for backend structs as something actually problematic (I'm pretty sure you can do that today already if you really want to - but why would you?). I don't buy that having a separate magic number for various types that we may want to use both frontend and backend is better than largely just having one set of such magic type identifiers. > > It seems like a good idea to still have an additional identifier for > > each node type, for some cross checking. 
How about just frobbing the > > pointer to the MemoryContextMethod slightly, and storing that in an > > additional field? That'd be something fairly unlikely to ever be a false > > positive, and it doesn't require dereferencing any additional memory. > > That would be fine as an internal sanity check, but if Tom is unhappy > with the idea of having to try to make sense of a function pointer, > he's probably going to be even less happy about trying to make sense > of a frobbed pointer. And I would actually agree with him on that > point. I feel the concern about identifying nodes can pretty readily be addressed by adding a name to the context methods - something that's useful independently too. It'd e.g. be nice to have some generic print routines for memory context stats. > I think we're all pursuing slightly different goals here. KaiGai's > main goal is to make it possible for third-party code to add new kinds > of memory contexts. My main goal is to make memory contexts not depend > on backend-only infrastructure. Tom is concerned about debuggability. > Your concern here is about sanity checking. There's some overlap > between those goals but the absolute best thing for any given one of > them might be really bad for one of the other ones; hopefully we can > find some compromise that gets everybody the things they care about > most. FWIW, I care most about the frontend usable bit too. I was bringing up the cross check idea solely as an idea of how to not lose, but if anything improve, our error checking, while making things more extensible. Greetings, Andres Freund
On 2020-02-05 10:40:59 -0500, Robert Haas wrote: > I'm pretty unimpressed with the enum proposal - I think it's pretty > nasty for an extension author to have to make up a value that's not in > the enum. One, how are they supposed to know that they should do that? > Two, how are they supposed to know that the code doesn't actually > depend on that enum value for anything important? And three, how do > they know that the compiler isn't going to hose them by assuming that > isn't a can't-happen scenario? > > I mean, I'd rather get a patch committed here than not, but I have a > hard time understanding why this is a good way to go. +1
Hi, On 2020-02-05 10:56:42 -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Wed, Feb 5, 2020 at 10:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> an enum field right in the context. You'll have to do an extra step to > >> discover the context's type, and if you jump to the wrong conclusion > >> and do, say, > >> p *(AllocSetContext *) ptr_value > >> when it's really some other context type, there won't be anything > >> as obvious as "type = T_GenerationContext" in what is printed to > >> tell you you were wrong. > > > Doesn't the proposed magic number address this concern? > > No, because (a) it will be a random magic number that nobody will > remember, and gdb won't print in any helpful form; (b) at least > as I understood the proposal, there'd be just one magic number for > all types of memory context. I still don't get what reason there is to not use T_MemoryContext as the magic number, instead of something randomly new. It's really not problematic to expose those numerical values. And it means that the first bit of visual inspection is going to be the same as it always has been, and the same as it works for most other types one regularly inspects in postgres. What about using T_MemoryContext as the identifier that's the same for all types of memory contexts and additionally have a new 'const char *contexttype' in MemoryContextData, that points to MemoryContextMethods.contexttype (which is a char[32] or such). As it's a char * debuggers will display the value, making it easy to identify the specific type. And sure, it's 8 additional bytes instead of 4 - but I don't see that being a problem. 
And because contexttype points at a specific offset inside the MemoryContextMethods struct, we can use that as a crosscheck, by Asserting that (char *) MemoryContext->methods + offsetof(MemoryContextMethods, contexttype) == MemoryContext->contexttype > Another issue with relying on only a magic number is that if you > get confused and do "p *(AllocSetContext *) ptr_value" on something > that doesn't point at any sort of memory context at all, there will > not be *anything* except confusing field values to help you realize > that. One of the great advantages of the Node system, IME, is that > when you try to print out a Node-subtype struct, the first field > in what is printed is either the Node type you were expecting, or > some recognizable other Node code, or obvious garbage. I agree that that's good - which is why I think we should simply not give it up here. Greetings, Andres Freund
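A sketch of that crosscheck, again with hypothetical simplified structs rather than the real PostgreSQL ones: the context's contexttype pointer must point exactly at the contexttype field inside its own methods table, so a bogus pointer (or a string that merely has the right characters) fails the check.

```c
#include <assert.h>
#include <stddef.h>

typedef struct MemoryContextMethods
{
    void      (*reset)(void);       /* placeholder callback, so contexttype
                                     * sits at a nonzero offset */
    char        contexttype[32];    /* e.g. "AllocSetContext" */
} MemoryContextMethods;

typedef struct MemoryContextData
{
    const MemoryContextMethods *methods;
    const char *contexttype;        /* must point into methods->contexttype */
} MemoryContextData;

/* The proposed Assert, spelled out as a boolean check. */
static int
ContexttypeCrosscheck(const MemoryContextData *cxt)
{
    return cxt->contexttype ==
        (const char *) cxt->methods + offsetof(MemoryContextMethods, contexttype);
}

static const MemoryContextMethods aset_methods = { NULL, "AllocSetContext" };

/* Correctly wired context... */
static const MemoryContextData good_cxt = { &aset_methods, aset_methods.contexttype };

/* ...and one whose pointer holds the right characters but does not point
 * into its methods table, so the crosscheck catches it. */
static const MemoryContextData bad_cxt = { &aset_methods, "AllocSetContext" };
```

Because the valid pointer value is derived from the methods pointer itself, a debugger still prints a readable type name, while the runtime check costs one pointer computation and comparison.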
On Wed, Feb 5, 2020 at 10:09 PM Andres Freund <andres@anarazel.de> wrote: > I wasn't advocating for making plannodes.h etc frontend usable. I think > that's a fairly different discussion than making enum NodeTag, > pg_list.h, memutils.h available. I don't see them having access to the > numerical value of node tag for backend structs as something actually > problematic (I'm pretty sure you can do that today already if you really > want to - but why would you?). > > I don't buy that having a separate magic number for various types that > we may want to use both frontend and backend is better than largely just > having one set of such magic type identifiers. To be honest, and I realize this is probably going to blow your mind and/or make you think that I'm completely insane, one concern that I have here is that I have seen multiple people fail to understand that the frontend and backend are, ah, not the same process. And they try to write code in frontend environments that makes no sense whatsoever there. The fact that nodes.h could hypothetically be included in frontend code doesn't really contribute to confusion in this area, but I'm concerned that including it in every file might, because it means that a whole lot of backend-only stuff suddenly becomes visible in any code that anyone writes anywhere. And as things stand that would have the effect of adding #include "utils/palloc.h" to "postgres_fe.h". Perhaps I'm worrying too much. On a broader level, I am not convinced that having one "enum" to rule them all is a good design. If we go that direction, then it means that frontend code that wants to add its own node types (and why shouldn't it want to do that?) would have to have them be visible to the backend and to all other frontend processes. That doesn't seem like a disaster, but I don't think it's great. 
I also don't really like the fact that we have one central registry of node types that has to be edited to add more node types, because it means that extensions are not free to do so. I know we're some distance from allowing any real extensibility around new node types and perhaps we never will, but on principle a centralized registry sucks, and I'd prefer a more decentralized solution if we could find one that would be acceptable. I don't know what that would be, though. Even though I'm not as trenchant about debuggability as you and Tom, having a magic number at the beginning of every type of node in lieu of an enum would drive me nuts. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Feb 5, 2020 at 10:23 PM Andres Freund <andres@anarazel.de> wrote: > I still don't get what reason there is to not use T_MemoryContext as the > magic number, instead of something randomly new. It's really not > problematic to expose those numerical values. And it means that the > first bit of visual inspection is going to be the same as it always has > been, and the same as it works for most other types one regularly > inspects in postgres. Without trying to say that my thought process is necessarily correct, I can explain my thought process. Many years ago, I had Tom slap down a patch of mine for making something participate in the Node system unnecessarily. He pointed out, then and at other times, that there was no reason for everything to be part of the giant enum just because many things need to be. At the time, I was a bit perplexed by his feedback, but over time I've come to see the point. We've got lots of "enum" fields all over the backend whose purpose it is to decide whether a particular object is of one sub-type or another. We've also got NodeTag, which is that same thing at a very large scale. I used to think that the reason why we had jammed everything into NodeTag was just programming laziness or some kind of desire for consistency, but now I think that the real point is that making something a node allows it to participate in readfuncs.c, outfuncs.c, copyfuncs.c, equalfuncs.c; so that a complex data structure made entirely of nodes can be easily copied, serialized, etc. The MemoryContext stuff participates in none of that machinery, and it would be difficult to make it do so, and therefore does not really need to be part of the Node system at all. The fact that it *is* a part of the node system is just a historical accident, or so I think. 
Sure, it's not an inconvenient thing to see a NodeTag on random stuff that you're inspecting with a debugger, but if we took that argument to its logical conclusion we would, I think, end up needing to add node tags to a lot of stuff that doesn't have them now - like TupleTableSlots, for example. Also, as a general rule, every Node of a given type is expected to have the same structure, which wouldn't be true here, because there are multiple types of memory contexts that can exist, and T_MemoryContext would identify them all. It's true that there are some other weird exceptions, but it doesn't seem like a great idea to create more. Between those concerns, and those I wrote about in my last post, it seemed to me that it made more sense to try to break the dependency between palloc.h and nodes.h rather than to embrace it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, 6 Feb 2020 at 11:09, Andres Freund <andres@anarazel.de> wrote: > I wasn't advocating for making plannodes.h etc frontend usable. I think > that's a fairly different discussion than making enum NodeTag, > pg_list.h, memutils.h available. I don't see them having access to the > numerical value of node tag for backend structs as something actually > problematic (I'm pretty sure you can do that today already if you really > want to - but why would you?). > > I don't buy that having a separate magic number for various types that > we may want to use both frontend and backend is better than largely just > having one set of such magic type identifiers. Simply using MemoryContext as the NodeTag seems very sensible based on the above discussion. But rather than adding a const char * name to point to some constant for the implementation name as was proposed earlier, I think the existing pointer MemoryContextData->methods is sufficient to identify the context type. We could add a NameData field to MemoryContextMethods that the initializer sets to the implementation name for convenience. It's trivial to see when debugging with a p ctx->methods->name . We keep the MemoryContextData size down and we lose nothing. Though gdb is smart enough to annotate a pointer to the symbol AllocSetMethods as such when it sees it in a debuginfo build there's no harm in having a single static string in the const-data segment per memory context type. I'd also like to add a bool (*instanceof)(MemoryContext context, MemoryContextMethods context_type); to MemoryContextMethods . Then replace all use of IsA(ctx, AllocSetContext) etc with a test like: #define Instanceof_AllocSetContext(ctx) \ (ctx->methods == AllocSetMethods || ctx->is_a(AllocSetMethods)) In other words, we ask the target object what it is. This would make it possible to create wrapper implementations of existing contexts that do extra memory accounting or other limited sorts of extensions. 
The short-circuit testing for the known concrete AllocSetMethods should make it pretty much identical in performance terms, which is of course rather important. The OO-alike naming is not a coincidence. I can't help but notice that we're facing some of the same issues faced by early OO patterns. Not too surprising given that Pg uses a lot of pseudo-OO - some level of structural inheritance and behavioural inheritance, but no encapsulation, no messaging model, no code-to-data binding. I'm no OO purist, I don't care much so long as it works and is consistent. In OO terms what we seem to be facing is difficulty with extending existing object types into new subtypes without modifying all the code that knows how to work with the parent types. MemoryContext is one example of this, Node is another. The underlying issue is similar. Being able to do this is something I'm much more interested in being able to do for plan and parse nodes etc than for MemoryContext tbh, but the same principles apply. -- Craig Ringer http://www.2ndQuadrant.com/ 2ndQuadrant - PostgreSQL Solutions for the Enterprise
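A toy sketch of the "ask the object" pattern described above, using hypothetical simplified types (the names ending in "Sketch" are made up for illustration and are not PostgreSQL symbols). The wrapper context delegates the type question to the context it wraps, while the macro keeps the cheap pointer comparison as a short-circuit:

```c
#include <assert.h>

typedef struct MemoryContextMethods MemoryContextMethods;
typedef struct MemoryContextData MemoryContextData;
typedef MemoryContextData *MemoryContext;

struct MemoryContextMethods
{
    /* the proposed callback: "are you (or do you wrap) this context type?" */
    int       (*is_a)(MemoryContext cxt, const MemoryContextMethods *type);
    /* alloc/free/reset callbacks elided */
};

struct MemoryContextData
{
    const MemoryContextMethods *methods;
};

static int allocset_is_a(MemoryContext cxt, const MemoryContextMethods *type);
static int wrapper_is_a(MemoryContext cxt, const MemoryContextMethods *type);

static const MemoryContextMethods AllocSetMethodsSketch = { allocset_is_a };
static const MemoryContextMethods WrapperMethodsSketch = { wrapper_is_a };

/* A plain AllocSet-style context is only itself. */
static int
allocset_is_a(MemoryContext cxt, const MemoryContextMethods *type)
{
    (void) cxt;
    return type == &AllocSetMethodsSketch;
}

/* A wrapper context (say, one adding memory accounting) delegates the
 * question to the context it wraps. */
typedef struct WrapperContextData
{
    MemoryContextData base;
    MemoryContext   wrapped;
} WrapperContextData;

static int
wrapper_is_a(MemoryContext cxt, const MemoryContextMethods *type)
{
    WrapperContextData *w = (WrapperContextData *) cxt;
    return type == cxt->methods || w->wrapped->methods->is_a(w->wrapped, type);
}

/* The macro from the mail, with the cheap pointer short-circuit first. */
#define InstanceofAllocSetSketch(ctx) \
    ((ctx)->methods == &AllocSetMethodsSketch || \
     (ctx)->methods->is_a((ctx), &AllocSetMethodsSketch))

static MemoryContextData aset_cxt = { &AllocSetMethodsSketch };
static WrapperContextData wrapped_cxt = { { &WrapperMethodsSketch }, &aset_cxt };
```

For the common case the pointer comparison succeeds immediately; the virtual call is only paid when the methods pointer does not match, which is exactly the wrapper case the pattern exists to support.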
2020年2月10日(月) 13:53 Craig Ringer <craig@2ndquadrant.com>: > > On Thu, 6 Feb 2020 at 11:09, Andres Freund <andres@anarazel.de> wrote: > > > I wasn't advocating for making plannodes.h etc frontend usable. I think > > that's a fairly different discussion than making enum NodeTag, > > pg_list.h, memutils.h available. I don't see them having access to the > > numerical value of node tag for backend structs as something actually > > problematic (I'm pretty sure you can do that today already if you really > > want to - but why would you?). > > > > I don't buy that having a separate magic number for various types that > > we may want to use both frontend and backend is better than largely just > > having one set of such magic type identifiers. > > Simply using MemoryContext as the NodeTag seems very sensible based on > the above discussion. > > But rather than adding a const char * name to point to some constant > for the implementation name as was proposed earlier, I think the > existing pointer MemoryContextData->methods is sufficient to identify > the context type. We could add a NameData field to > MemoryContextMethods that the initializer sets to the implementation > name for convenience. > > It's trivial to see when debugging with a p ctx->methods->name . > We keep the MemoryContextData size down and we lose nothing. Though > gdb is smart enough to annotate a pointer to the symbol > AllocSetMethods as such when it sees it in a debuginfo build there's > no harm in having a single static string in the const-data segment per > memory context type. > > I'd also like to add a > > bool (*instanceof)(MemoryContext context, MemoryContextMethods context_type); > > to MemoryContextMethods . 
Then replace all use of IsA(ctx, > AllocSetContext) etc with a test like: > > #define Instanceof_AllocSetContext(ctx) \ > (ctx->methods == AllocSetMethods || ctx->is_a(AllocSetMethods)) > AllocSetMethods is statically defined in utils/mmgr/aset.c, so this macro shall be available only in this source file. Isn't it sufficient to have the macro below? #define Instanceof_AllocSetContext(ctx) \ (IsA(ctx, MemoryContext) && \ strcmp(((MemoryContext)(ctx))->methods->name, "AllocSetMethods") == 0) As long as an insane extension does not define a different memory context with the same name, it will work. > In other words, we ask the target object what it is. > > This would make it possible to create wrapper implementations of > existing contexts that do extra memory accounting or other limited > sorts of extensions. The short-circuit testing for the known concrete > AllocSetMethods should make it pretty much identical in performance > terms, which is of course rather important. > > The OO-alike naming is not a coincidence. > > I can't help but notice that we're facing some of the same issues > faced by early OO patterns. Not too surprising given that Pg uses a > lot of pseudo-OO - some level of structural inheritance and > behavioural inheritance, but no encapsulation, no messaging model, no > code-to-data binding. I'm no OO purist, I don't care much so long as > it works and is consistent. > > In OO terms what we seem to be facing is difficulty with extending > existing object types into new subtypes without modifying all the code > that knows how to work with the parent types. MemoryContext is one > example of this, Node is another. The underlying issue is similar. > > Being able to do this is something I'm much more interested in being > able to do for plan and parse nodes etc than for MemoryContext tbh, > but the same principles apply. 
> > > -- > Craig Ringer http://www.2ndQuadrant.com/ > 2ndQuadrant - PostgreSQL Solutions for the Enterprise -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei <kaigai@heterodb.com>
On Mon, 10 Feb 2020 at 21:19, Kohei KaiGai <kaigai@heterodb.com> wrote: > > 2020年2月10日(月) 13:53 Craig Ringer <craig@2ndquadrant.com>: > > > > On Thu, 6 Feb 2020 at 11:09, Andres Freund <andres@anarazel.de> wrote: > > > > > I wasn't advocating for making plannodes.h etc frontend usable. I think > > > that's a fairly different discussion than making enum NodeTag, > > > pg_list.h, memutils.h available. I don't see them having access to the > > > numerical value of node tag for backend structs as something actually > > > problematic (I'm pretty sure you can do that today already if you really > > > want to - but why would you?). > > > > > > I don't buy that having a separate magic number for various types that > > > we may want to use both frontend and backend is better than largely just > > > having one set of such magic type identifiers. > > > > Simply using MemoryContext as the NodeTag seems very sensible based on > > the above discussion. > > > > But rather than adding a const char * name to point to some constant > > for the implementation name as was proposed earlier, I think the > > existing pointer MemoryContextData->methods is sufficient to identify > > the context type. We could add a NameData field to > > MemoryContextMethods that the initializer sets to the implementation > > name for convenience. > > > > It's trivial to see when debugging with a p ctx->methods->name . > > We keep the MemoryContextData size down and we lose nothing. Though > > gdb is smart enough to annotate a pointer to the symbol > > AllocSetMethods as such when it sees it in a debuginfo build there's > > no harm in having a single static string in the const-data segment per > > memory context type. > > > > I'd also like to add a > > > > bool (*instanceof)(MemoryContext context, MemoryContextMethods context_type); > > > > to MemoryContextMethods . 
Then replace all use of IsA(ctx, > > AllocSetContext) etc with a test like: > > > > #define Instanceof_AllocSetContext(ctx) \ > > (ctx->methods == AllocSetMethods || ctx->is_a(AllocSetMethods)) > > > AllocSetMethods is statically defined in utils/mmgr/aset.c, so this macro > shall be available only in this source file. > > Isn't it sufficient to have the macro below? > > #define Instanceof_AllocSetContext(ctx) \ > (IsA(ctx, MemoryContext) && \ > strcmp(((MemoryContext)(ctx))->methods->name, "AllocSetMethods") == 0) > > As long as an insane extension does not define a different memory context > with the same name, it will work. That wouldn't allow for the sort of extensibility I suggested for wrapping objects, which is why I thought we might as well ask the object itself. It's not exactly a new, weird or unusual pattern. A pointer to AllocSetMethods would need to be made non-static if we wanted to allow a macro or static inline to avoid the function call and test it for equality, but that's not IMO a big problem. Or if it is, well, there's always whole-program optimisation... Also, isn't strcmp() kinda expensive compared to a simple pointer value compare anyway? I admit I'm terribly clueless about modern microarchitectures, so I may be very wrong. All I'm saying is that if we're changing this, let's learn from what others have done when writing interfaces and inheritance-type patterns. -- Craig Ringer http://www.2ndQuadrant.com/ 2ndQuadrant - PostgreSQL Solutions for the Enterprise
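The trade-off between the two identity checks can be made concrete with a toy sketch (hypothetical types, not PostgreSQL code): pointer identity is a single comparison but only recognizes the one table it was compiled against, while a name comparison walks the string but also matches an independently defined table that reuses the same name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical minimal methods table carrying only a name. */
typedef struct MethodsSketch { const char *name; } MethodsSketch;

static const MethodsSketch aset_methods = { "AllocSetContext" };
static const MethodsSketch slab_methods = { "SlabContext" };

/* An extension's own table that deliberately reuses the name. */
static const MethodsSketch ext_methods = { "AllocSetContext" };

/* Pointer identity: one comparison, but misses ext_methods. */
static int is_aset_by_ptr(const MethodsSketch *m)
{
    return m == &aset_methods;
}

/* Name comparison: more work per call, but matches ext_methods too. */
static int is_aset_by_name(const MethodsSketch *m)
{
    return strcmp(m->name, "AllocSetContext") == 0;
}
```

This is why the earlier proposals combine both: a pointer short-circuit for the overwhelmingly common built-in case, with a fallback (callback or name check) only when the pointer test fails.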