Thread: Expand palloc/pg_malloc API

Expand palloc/pg_malloc API

From
Peter Eisentraut
Date:
This adds additional variants of palloc, pg_malloc, etc. that 
encapsulate common usage patterns and provide more type safety.

Examples:

-   result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult));
+   result = palloc_obj(IndexBuildResult);

-   collector->tuples = (IndexTuple *) palloc(sizeof(IndexTuple) *
                                               collector->lentuples);
+   collector->tuples = palloc_array(IndexTuple, collector->lentuples);

One common point is that the new interfaces all have a return type that 
automatically matches what they are allocating, so you don't need any 
casts nor have to manually make sure the size matches the expected 
result.  Besides the additional safety, the notation is also more 
compact, as you can see above.

Inspired by the talloc library.

The interesting changes are in fe_memutils.h and palloc.h.  The rest of 
the patch is just randomly sprinkled examples to test/validate the new 
additions.
Attachment

Re: Expand palloc/pg_malloc API

From
Bharath Rupireddy
Date:
On Tue, May 17, 2022 at 5:11 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> This adds additional variants of palloc, pg_malloc, etc. that
> encapsulate common usage patterns and provide more type safety.
>
> Examples:
>
> -   result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult));
> +   result = palloc_obj(IndexBuildResult);
>
> -   collector->tuples = (IndexTuple *) palloc(sizeof(IndexTuple) *
>                                                collector->lentuples);
> +   collector->tuples = palloc_array(IndexTuple, collector->lentuples);
>
> One common point is that the new interfaces all have a return type that
> automatically matches what they are allocating, so you don't need any
> casts nor have to manually make sure the size matches the expected
> result.  Besides the additional safety, the notation is also more
> compact, as you can see above.
>
> Inspired by the talloc library.
>
> The interesting changes are in fe_memutils.h and palloc.h.  The rest of
> the patch is just randomly sprinkled examples to test/validate the new
> additions.

It seems interesting. Are we always type-casting explicitly the output
of palloc/palloc0? Does this mean the compiler takes care of
type-casting the returned void * to the target type?

I see lots of instances where there's no explicit type-casting to the
target variable type -
        retval = palloc(sizeof(GISTENTRY));
        Interval   *p = palloc(sizeof(Interval));
        macaddr    *v = palloc0(sizeof(macaddr)); and so on.

Regards,
Bharath Rupireddy.



Re: Expand palloc/pg_malloc API

From
Tom Lane
Date:
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> On Tue, May 17, 2022 at 5:11 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>> This adds additional variants of palloc, pg_malloc, etc. that
>> encapsulate common usage patterns and provide more type safety.

> I see lots of instances where there's no explicit type-casting to the
> target variable type -
>         retval = palloc(sizeof(GISTENTRY));
>         Interval   *p = palloc(sizeof(Interval));
>         macaddr    *v = palloc0(sizeof(macaddr)); and so on.

Yeah.  IMO the first of those is very poor style, because there's
basically nothing enforcing that you wrote the right thing in sizeof().
The others are a bit safer, in that at least a human can note that
the two types mentioned on the same line match --- but I doubt any
compiler would detect it if they don't.  Our current preferred style

         Interval   *p = (Interval *) palloc(sizeof(Interval));

is really barely an improvement, because only two of the three types
mentioned are going to be checked against each other.

So I think Peter's got a good idea here (I might quibble with the details
of some of these macros).  But it's not really going to move the
safety goalposts very far unless we make a concerted effort to make
these be the style everywhere.  Are we willing to do that?  What
will it do to back-patching difficulty?  Dare I suggest back-patching
these changes?

            regards, tom lane



Re: Expand palloc/pg_malloc API

From
Peter Eisentraut
Date:
On 17.05.22 20:43, Tom Lane wrote:
> So I think Peter's got a good idea here (I might quibble with the details
> of some of these macros).  But it's not really going to move the
> safety goalposts very far unless we make a concerted effort to make
> these be the style everywhere.  Are we willing to do that?  What
> will it do to back-patching difficulty?  Dare I suggest back-patching
> these changes?

I think it could go like the castNode() introduction: first we adopt it 
sporadically for new code, then we change over some larger pieces of 
code, then we backpatch the API, then someone sends in a big patch to 
change the rest.



Re: Expand palloc/pg_malloc API

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 17.05.22 20:43, Tom Lane wrote:
>> So I think Peter's got a good idea here (I might quibble with the details
>> of some of these macros).  But it's not really going to move the
>> safety goalposts very far unless we make a concerted effort to make
>> these be the style everywhere.  Are we willing to do that?  What
>> will it do to back-patching difficulty?  Dare I suggest back-patching
>> these changes?

> I think it could go like the castNode() introduction: first we adopt it 
> sporadically for new code, then we change over some larger pieces of 
> code, then we backpatch the API, then someone sends in a big patch to 
> change the rest.

OK, that seems like a reasonable plan.

I've now studied this a little more closely, and I think the
functionality is fine, but I have naming quibbles.

1. Do we really want distinct names for the frontend and backend
versions of the macros?  Particularly since you're layering the
frontend ones over pg_malloc, which has exit-on-OOM behavior?
I think we've found that notational discrepancies between frontend
and backend code are generally more a hindrance than a help,
so I'm inclined to drop the pg_malloc_xxx macros and just use
"palloc"-based names across the board.

2. I don't like the "palloc_ptrtype" name at all.  I see that you
borrowed that name from talloc, but I doubt that's a precedent that
very many people are familiar with.  To me it sounds like it might
allocate something that's the size of a pointer, not the size of the
pointed-to object.  I have to confess though that I don't have an
obviously better name to suggest.  "palloc_pointed_to" would be
clear perhaps, but it's kind of long.

3. Likewise, "palloc_obj" is perhaps less clear than it could be.
I find palloc_array just fine though.  Maybe palloc_object or
palloc_struct?  (If "_obj" can be traced to talloc, I'm not
seeing where.)


One thought that comes to mind is that palloc_ptrtype is almost
surely going to be used in the style

    myvariable = palloc_ptrtype(myvariable);

and if it's not that it's very likely wrong.  So maybe we should cut
out the middleman and write something like

#define palloc_instantiate(ptr) ((ptr) = (typeof(ptr)) palloc(sizeof(*(ptr))))
...
    palloc_instantiate(myvariable);

I'm not wedded to "instantiate" here, there's probably better names.

            regards, tom lane



Re: Expand palloc/pg_malloc API

From
"David G. Johnston"
Date:
On Tue, Jul 26, 2022 at 2:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

2. I don't like the "palloc_ptrtype" name at all.  I see that you
borrowed that name from talloc, but I doubt that's a precedent that
very many people are familiar with. 
 
To me it sounds like it might
allocate something that's the size of a pointer, not the size of the
pointed-to object.  I have to confess though that I don't have an
obviously better name to suggest.  "palloc_pointed_to" would be
clear perhaps, but it's kind of long.

I agree that ptrtype reads "the type of a pointer".

This may not be a C-idiom but the pointed-to thing is a "reference" (hence pass by value vs pass by reference).  So:

palloc_ref(myvariablepointer)

will allocate using the type of the referenced object.  Just like _array and _obj, which name the thing being used as a size template as opposed to instantiate which seems more like another word for "allocate/palloc".

David J.
P.S.
 
Admittedly I'm still getting my head around reading pointer-using code (I get the general concept but haven't had to code them)....

- lockrelid = palloc(sizeof(*lockrelid));
+ lockrelid = palloc_ptrtype(lockrelid);

// This definitely seems like an odd idiom until I remembered about short-lived memory contexts and the lost pointers are soon destroyed there.

So lockrelid (no star) is a pointer that has an underlying reference that the macro (and the orignal code) resolves via the *

I cannot reason out whether the following would be equivalent to the above:

lockrelid = palloc_obj(*lockrelid);

I assume not because:  typeof(lockrelid) != (*lockrelid *)

Re: Expand palloc/pg_malloc API

From
Peter Eisentraut
Date:
On 26.07.22 23:32, Tom Lane wrote:
> 1. Do we really want distinct names for the frontend and backend
> versions of the macros?  Particularly since you're layering the
> frontend ones over pg_malloc, which has exit-on-OOM behavior?
> I think we've found that notational discrepancies between frontend
> and backend code are generally more a hindrance than a help,
> so I'm inclined to drop the pg_malloc_xxx macros and just use
> "palloc"-based names across the board.

This seems like a question that is independent of this patch.  Given 
that both pg_malloc() and palloc() do exist in fe_memutils, I think it 
would be confusing to only extend one part of that and not the other. 
The amount of code is ultimately not a lot.

If we wanted to get rid of pg_malloc() altogether, maybe we could talk 
about that.

(Personally, I have always been a bit suspicious about using the name 
palloc() without memory context semantics in frontend code, but I guess 
this is wide-spread now.)

> 3. Likewise, "palloc_obj" is perhaps less clear than it could be.
> I find palloc_array just fine though.  Maybe palloc_object or
> palloc_struct?  (If "_obj" can be traced to talloc, I'm not
> seeing where.)

In talloc, the talloc() function itself allocates an object of a given 
type.  To allocate something of a specified size, you'd use 
talloc_size().  So those names won't map exactly.  I'm fine with 
palloc_object() if that is clearer.

> One thought that comes to mind is that palloc_ptrtype is almost
> surely going to be used in the style
> 
>     myvariable = palloc_ptrtype(myvariable);
> 
> and if it's not that it's very likely wrong.  So maybe we should cut
> out the middleman and write something like
> 
> #define palloc_instantiate(ptr) ((ptr) = (typeof(ptr)) palloc(sizeof(*(ptr))))
> ...
>     palloc_instantiate(myvariable);

Right, this is sort of what you'd want, really.  But it looks like 
strange C code, since you are modifying the variable even though you are 
passing it by value.

I think the _ptrtype variant isn't that useful anyway, so if it's 
confusing we can leave it out.



Re: Expand palloc/pg_malloc API

From
Peter Eisentraut
Date:
On 27.07.22 01:58, David G. Johnston wrote:
> Admittedly I'm still getting my head around reading pointer-using code 
> (I get the general concept but haven't had to code them)....
> 
> - lockrelid = palloc(sizeof(*lockrelid));
> + lockrelid = palloc_ptrtype(lockrelid);
> 
> // This definitely seems like an odd idiom until I remembered about 
> short-lived memory contexts and the lost pointers are soon destroyed there.
> 
> So lockrelid (no star) is a pointer that has an underlying reference 
> that the macro (and the orignal code) resolves via the *
> 
> I cannot reason out whether the following would be equivalent to the above:
> 
> lockrelid = palloc_obj(*lockrelid);

I think that would also work.

Ultimately, it would be more idiomatic (in Postgres), to write this as

lockrelid = palloc(sizeof(LockRelId));

and thus

lockrelid = palloc_obj(LockRelId);



Re: Expand palloc/pg_malloc API

From
Peter Eisentraut
Date:
On 12.08.22 09:31, Peter Eisentraut wrote:
> In talloc, the talloc() function itself allocates an object of a given 
> type.  To allocate something of a specified size, you'd use 
> talloc_size().  So those names won't map exactly.  I'm fine with 
> palloc_object() if that is clearer.

> I think the _ptrtype variant isn't that useful anyway, so if it's 
> confusing we can leave it out.

I have updated this patch set to rename the _obj() functions to 
_object(), and I have dropped the _ptrtype() variants.

I have also split the patch to put the new API and the example uses into 
separate patches.
Attachment

Re: Expand palloc/pg_malloc API

From
Robert Haas
Date:
On Fri, Aug 12, 2022 at 3:31 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> (Personally, I have always been a bit suspicious about using the name
> palloc() without memory context semantics in frontend code, but I guess
> this is wide-spread now.)

I think it would be a good thing to add memory context support to the
frontend. We could just put everything in a single context for
starters, and then frontend utilities that wanted to create other
contexts could do so.

There are difficulties, though. For instance, memory contexts are
nodes, and have a NodeTag. And I'm pretty sure we don't want frontend
code to know about all the backend node types. My suspicion is that
memory context types really shouldn't be node types, but right now,
they are. Whether that's the correct view or not, this kind of problem
means it's not a simple lift-and-shift to move the memory context code
into src/common. Someone would need to spend some time thinking about
how to engineer it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Expand palloc/pg_malloc API

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> I have updated this patch set to rename the _obj() functions to 
> _object(), and I have dropped the _ptrtype() variants.

> I have also split the patch to put the new API and the example uses into 
> separate patches.

This patch set seems fine to me, so I've marked it Ready for Committer.

I think serious consideration should be given to back-patching the
0001 part (that is, addition of the macros).  Otherwise we'll have
to remember not to use these macros in code intended for back-patch,
and that'll be mighty annoying once we are used to them.

            regards, tom lane



Re: Expand palloc/pg_malloc API

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Aug 12, 2022 at 3:31 AM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>> (Personally, I have always been a bit suspicious about using the name
>> palloc() without memory context semantics in frontend code, but I guess
>> this is wide-spread now.)

> I think it would be a good thing to add memory context support to the
> frontend. We could just put everything in a single context for
> starters, and then frontend utilities that wanted to create other
> contexts could do so.

Perhaps, but I think we should have at least one immediate use-case
for multiple contexts in frontend.  Otherwise it's just useless extra
code.  The whole point of memory contexts in the backend is that we
have well-defined lifespans for certain types of allocations (executor
state, function results, etc); but it's not very clear to me that the
same concept will be helpful in any of our frontend programs.

> There are difficulties, though. For instance, memory contexts are
> nodes, and have a NodeTag. And I'm pretty sure we don't want frontend
> code to know about all the backend node types. My suspicion is that
> memory context types really shouldn't be node types, but right now,
> they are. Whether that's the correct view or not, this kind of problem
> means it's not a simple lift-and-shift to move the memory context code
> into src/common. Someone would need to spend some time thinking about
> how to engineer it.

I don't really think that's much of an issue.  We could replace the
nodetag fields with some sort of magic number and have just as much
wrong-pointer safety as in the backend.  What I do take issue with
is moving the code into src/common.  I think we'd be better off
just writing a distinct implementation for frontend.  For one thing,
it's not apparent to me that aset.c is a good allocator for frontend
(and the other two surely are not).

This is all pretty off-topic for Peter's patch, though.

            regards, tom lane



Re: Expand palloc/pg_malloc API

From
Peter Eisentraut
Date:
On 09.09.22 22:13, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> I have updated this patch set to rename the _obj() functions to
>> _object(), and I have dropped the _ptrtype() variants.
> 
>> I have also split the patch to put the new API and the example uses into
>> separate patches.
> 
> This patch set seems fine to me, so I've marked it Ready for Committer.

committed

> I think serious consideration should be given to back-patching the
> 0001 part (that is, addition of the macros).  Otherwise we'll have
> to remember not to use these macros in code intended for back-patch,
> and that'll be mighty annoying once we are used to them.

Yes, the 0001 patch is kept separate so that we can do that when we feel 
the time is right.




Re: Expand palloc/pg_malloc API

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 09.09.22 22:13, Tom Lane wrote:
>> I think serious consideration should be given to back-patching the
>> 0001 part (that is, addition of the macros).  Otherwise we'll have
>> to remember not to use these macros in code intended for back-patch,
>> and that'll be mighty annoying once we are used to them.

> Yes, the 0001 patch is kept separate so that we can do that when we feel 
> the time is right.

I think the right time is now, or at least as soon as you're
satisfied that the buildfarm is happy.

            regards, tom lane



Re: Expand palloc/pg_malloc API

From
Peter Eisentraut
Date:
On 12.09.22 15:49, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> On 09.09.22 22:13, Tom Lane wrote:
>>> I think serious consideration should be given to back-patching the
>>> 0001 part (that is, addition of the macros).  Otherwise we'll have
>>> to remember not to use these macros in code intended for back-patch,
>>> and that'll be mighty annoying once we are used to them.
> 
>> Yes, the 0001 patch is kept separate so that we can do that when we feel
>> the time is right.
> 
> I think the right time is now, or at least as soon as you're
> satisfied that the buildfarm is happy.

This has been done.




Re: Expand palloc/pg_malloc API

From
Peter Eisentraut
Date:
I have another little idea that fits well here: repalloc0 and 
repalloc0_array.  These zero out the space added by repalloc.  This is a 
common pattern in the backend code that is quite hairy to code by hand. 
See attached patch.

Attachment

Re: Expand palloc/pg_malloc API

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> I have another little idea that fits well here: repalloc0 and 
> repalloc0_array.  These zero out the space added by repalloc.  This is a 
> common pattern in the backend code that is quite hairy to code by hand. 
> See attached patch.

+1 in general --- you've put your finger on something I felt was
missing, but couldn't quite identify.

However, I'm a bit bothered by the proposed API:

+extern pg_nodiscard void *repalloc0(void *pointer, Size size, Size oldsize);

It kind of feels that the argument order should be pointer, oldsize, size.
It feels even more strongly that people will get the ordering wrong,
whichever we choose.  Is there a way to make that more bulletproof?

The only thought that comes to mind offhand is that the only plausible
use-case is with size >= oldsize, so maybe an assertion or even a
runtime check would help to catch getting it backwards.  (I notice
that your proposed coding will fail rather catastrophically if the
caller gets it backwards.  An assertion failure would be better.)

            regards, tom lane



Re: Expand palloc/pg_malloc API

From
Tom Lane
Date:
I wrote:
> It kind of feels that the argument order should be pointer, oldsize, size.
> It feels even more strongly that people will get the ordering wrong,
> whichever we choose.  Is there a way to make that more bulletproof?

Actually ... an even-more-terrifyingly-plausible misuse is that the
supplied oldsize is different from the actual previous allocation.
We should try to check that.  In MEMORY_CONTEXT_CHECKING builds
it should be possible to assert that oldsize == requested_size.
We don't have that data if !MEMORY_CONTEXT_CHECKING, but we could
at least assert that oldsize <= allocated chunk size.

            regards, tom lane



Re: Expand palloc/pg_malloc API

From
Peter Eisentraut
Date:
On 14.09.22 06:53, Tom Lane wrote:
> I wrote:
>> It kind of feels that the argument order should be pointer, oldsize, size.
>> It feels even more strongly that people will get the ordering wrong,
>> whichever we choose.  Is there a way to make that more bulletproof?
> 
> Actually ... an even-more-terrifyingly-plausible misuse is that the
> supplied oldsize is different from the actual previous allocation.
> We should try to check that.  In MEMORY_CONTEXT_CHECKING builds
> it should be possible to assert that oldsize == requested_size.
> We don't have that data if !MEMORY_CONTEXT_CHECKING, but we could
> at least assert that oldsize <= allocated chunk size.

I'm not very familiar with MEMORY_CONTEXT_CHECKING.  Where would one get 
these values?




Re: Expand palloc/pg_malloc API

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 14.09.22 06:53, Tom Lane wrote:
>> Actually ... an even-more-terrifyingly-plausible misuse is that the
>> supplied oldsize is different from the actual previous allocation.
>> We should try to check that.  In MEMORY_CONTEXT_CHECKING builds
>> it should be possible to assert that oldsize == requested_size.
>> We don't have that data if !MEMORY_CONTEXT_CHECKING, but we could
>> at least assert that oldsize <= allocated chunk size.

> I'm not very familiar with MEMORY_CONTEXT_CHECKING.  Where would one get 
> these values?

Hmm ... the individual allocators have that info, but mcxt.c doesn't
have access to it.  I guess we could invent an additional "method"
to return the requested size of a chunk, which is only available in
MEMORY_CONTEXT_CHECKING builds, or maybe in !MEMORY_CONTEXT_CHECKING
it returns the allocated size instead.

            regards, tom lane



Re: Expand palloc/pg_malloc API

From
Peter Eisentraut
Date:
On 11.10.22 18:04, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> On 14.09.22 06:53, Tom Lane wrote:
>>> Actually ... an even-more-terrifyingly-plausible misuse is that the
>>> supplied oldsize is different from the actual previous allocation.
>>> We should try to check that.  In MEMORY_CONTEXT_CHECKING builds
>>> it should be possible to assert that oldsize == requested_size.
>>> We don't have that data if !MEMORY_CONTEXT_CHECKING, but we could
>>> at least assert that oldsize <= allocated chunk size.
> 
>> I'm not very familiar with MEMORY_CONTEXT_CHECKING.  Where would one get
>> these values?
> 
> Hmm ... the individual allocators have that info, but mcxt.c doesn't
> have access to it.  I guess we could invent an additional "method"
> to return the requested size of a chunk, which is only available in
> MEMORY_CONTEXT_CHECKING builds, or maybe in !MEMORY_CONTEXT_CHECKING
> it returns the allocated size instead.

I'm not sure whether that amount of additional work would be useful 
relative to the size of this patch.  Is the patch as it stands now 
making the code less robust than what the code is doing now?

In the meantime, here is an updated patch with the argument order 
swapped and an additional assertion, as previously discussed.

Attachment

Re: Expand palloc/pg_malloc API

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 11.10.22 18:04, Tom Lane wrote:
>> Hmm ... the individual allocators have that info, but mcxt.c doesn't
>> have access to it.  I guess we could invent an additional "method"
>> to return the requested size of a chunk, which is only available in
>> MEMORY_CONTEXT_CHECKING builds, or maybe in !MEMORY_CONTEXT_CHECKING
>> it returns the allocated size instead.

> I'm not sure whether that amount of additional work would be useful 
> relative to the size of this patch.  Is the patch as it stands now 
> making the code less robust than what the code is doing now?

No.  It's slightly annoying that the call sites still have to track
the old size of the allocation, but I guess that they have to have
that information in order to know that they need to repalloc in the
first place.  I agree that this patch does make things easier to
read and a bit less error-prone.

Also, I realized that what I proposed above doesn't really work
anyway for this purpose.  Consider

    ptr = palloc(size);
    ... fill all "size" bytes ...
    ptr = repalloc0(ptr, size, newsize);

where the initial size request isn't a power of 2.  If production builds
rely on the initial allocated size not requested size to decide where to
start zeroing, this would work (no uninitialized holes) in a debug build,
but leave some uninitialized bytes in a production build, which is
absolutely horrible.  So I guess we have to rely on the callers to
track their requests.

> In the meantime, here is an updated patch with the argument order 
> swapped and an additional assertion, as previously discussed.

I think it'd be worth expending an actual runtime test in repalloc0,
that is

    if (unlikely(oldsize > size))
        elog(ERROR, "invalid repalloc0 call: oldsize %zu, new size %zu",
             oldsize, size);

This seems cheap compared to the cost of the repalloc+memset, and the
consequences of not detecting the error seem pretty catastrophic ---
the memset would try to zero almost your whole address space.

No objections beyond that.  I've marked this RFC.

            regards, tom lane