Thread: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
David Rowley
Date:
Part of the work that Thomas mentions in [1], regarding Direct I/O,
has certain requirements that pointers must be page-aligned.

I've attached a patch which implements palloc_aligned() and
MemoryContextAllocAligned() which accept an 'alignto' parameter which
must be a power-of-2 value. The memory addresses returned by these
functions will be aligned by the requested alignment.

Primarily, this work is by Andres. I took what he had and cleaned it
up, fixed a few minor bugs then implemented repalloc() and
GetMemoryChunkSpace() functionality.

The way this works is that palloc_aligned() can be called for any of
the existing MemoryContext types.  What we do is perform a normal
allocation request, but we add additional bytes to the size request to
allow the proper alignment of the pointer that we return.  Since we
have no control over the alignment of the return value from the
allocation requests, we must adjust the pointer returned by the
allocation function to align it to the required alignment.

When an operation such as pfree() or repalloc() is performed on a
pointer retuned by palloc_aligned(), we can't go trying to pfree the
aligned pointer as this is not the pointer that was returned by the
allocation function.  To make all this work, another MemoryChunk
struct exists directly before the aligned pointer which has the
MemoryContextMethodID set to MCTX_ALIGNED_REDIRECT_ID.  These
"redirection" MemoryChunks have the "block offset" set to allow the
actual MemoryChunk of the original allocation to be found. We just
subtract the number of bytes stored in the block offset, which is just
the same as how we now find the owning AllocBlock from the MemoryChunk
in aset.c.  Once we do that offset calculation, we can just pfree()
the original chunk.

The 'alignto' is stored in this "redirection" MemoryChunk so that
repalloc() knows what the original alignment request was so that it
the repalloc'd chunk can be aligned by that amount too.

In the attached patch, there are not yet any users of these new 2
functions.  As mentioned above, Thomas is proposing some patches to
implement Direct I/O in [1] which will use these functions. Because I
touched memory contexts last, it likely makes the most sense for me to
work on this portion of the patch.

Comments welcome.

Patch attached.

(I did rip out all the I/O specific portions from Andres' patch which
Thomas proposes as his 0002 patch. Thomas will need to rebase
(sorry)).

David

[1] https://www.postgresql.org/message-id/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com

Attachment

Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
Maxim Orlov
Date:
Hi!

Part of the work that Thomas mentions in [1], regarding Direct I/O,
has certain requirements that pointers must be page-aligned.

I've attached a patch which implements palloc_aligned() and
MemoryContextAllocAligned() ...
I've done a quick look and the patch is looks good to me.
Let's add tests for these functions, should we? If you think this is an overkill, feel free to trim tests for your taste.

--
Best regards,
Maxim Orlov.
Attachment

Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
Andres Freund
Date:
Hi,

On 2022-11-02 00:28:46 +1300, David Rowley wrote:
> Part of the work that Thomas mentions in [1], regarding Direct I/O,
> diff --git a/src/backend/utils/mmgr/alignedalloc.c b/src/backend/utils/mmgr/alignedalloc.c
> new file mode 100644
> index 0000000000..e581772758
> --- /dev/null
> +++ b/src/backend/utils/mmgr/alignedalloc.c
> @@ -0,0 +1,93 @@
> +/*-------------------------------------------------------------------------
> + *
> + * alignedalloc.c
> + *      Allocator functions to implement palloc_aligned

FWIW, to me this is really more part of mcxt.c than its own
allocator... Particularly because MemoryContextAllocAligned() et al are
implemented there.


> +void *
> +AlignedAllocRealloc(void *pointer, Size size)

I doubtthere's ever a need to realloc such a pointer? Perhaps we could just
elog(ERROR)?


> +/*
> + * MemoryContextAllocAligned
> + *        Allocate 'size' bytes of memory in 'context' aligned to 'alignto'
> + *        bytes.
> + *
> + * 'flags' may be 0 or set the same as MemoryContextAllocExtended().
> + * 'alignto' must be a power of 2.
> + */
> +void *
> +MemoryContextAllocAligned(MemoryContext context,
> +                          Size size, Size alignto, int flags)
> +{
> +    Size        alloc_size;
> +    void       *unaligned;
> +    void       *aligned;
> +
> +    /* wouldn't make much sense to waste that much space */
> +    Assert(alignto < (128 * 1024 * 1024));
> +
> +    /* ensure alignto is a power of 2 */
> +    Assert((alignto & (alignto - 1)) == 0);

Hm, not that I can see a case for ever not using a power of two
alignment... There's not really a "need" for the restriction, right? Perhaps
we should note that?


> +    /*
> +     * We implement aligned pointers by simply allocating enough memory for
> +     * the requested size plus the alignment and an additional MemoryChunk.
> +     * This additional MemoryChunk is required for operations such as pfree
> +     * when used on the pointer returned by this function.  We use this
> +     * "redirection" MemoryChunk in order to find the pointer to the memory
> +     * that was returned by the MemoryContextAllocExtended call below. We do
> +     * that by "borrowing" the block offset field and instead of using that to
> +     * find the offset into the owning block, we use it to find the original
> +     * allocated address.
> +     *
> +     * Here we must allocate enough extra memory so that we can still align
> +     * the pointer returned by MemoryContextAllocExtended and also have enough
> +     * space for the redirection MemoryChunk.
> +     */
> +    alloc_size = size + alignto + sizeof(MemoryChunk);
> +
> +    /* perform the actual allocation */
> +    unaligned = MemoryContextAllocExtended(context, alloc_size, flags);

Should we handle the case where we get a suitably aligned pointer from
MemoryContextAllocExtended() differently?


> +    /* XXX: should we adjust valgrind state here? */

Probably still need to do this... Kinda hard to get right without the code
getting exercised. Wonder if there's some minimal case we could actually use
it for?

Thanks,

Andres



Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
David Rowley
Date:
Thanks for having a look.

On Tue, 8 Nov 2022 at 05:24, Andres Freund <andres@anarazel.de> wrote:
> FWIW, to me this is really more part of mcxt.c than its own
> allocator... Particularly because MemoryContextAllocAligned() et al are
> implemented there.

I'm on the fence about this one. I thought it was nice that we have a
file per consumed MemoryContextMethodID. The thing that caused me to
add alignedalloc.c was the various other comments in the declaration
of mcxt_methods[] that mention where to find each of the methods being
assigned in that array (e.g /* aset.c */).  I can change it back if
you feel strongly. I just don't.

> I doubtthere's ever a need to realloc such a pointer? Perhaps we could just
> elog(ERROR)?

Are you maybe locked into just thinking about your current planned use
case that we want to allocate BLCKSZ bytes in each case? It does not
seem impossible to me that someone will want something more than an
8-byte alignment and also might want to enlarge the allocation at some
point. I thought it might be more dangerous not to implement repalloc.
It might not be clear to someone using palloc_aligned() that there's
no code path that can call repalloc on the returned pointer.

> Hm, not that I can see a case for ever not using a power of two
> alignment... There's not really a "need" for the restriction, right? Perhaps
> we should note that?

TYPEALIGN() will not work correctly unless the alignment is a power of
2. We could modify it to, but that would require doing some modular
maths instead of bitmasking. That seems pretty horrible when the macro
is given a value that's not constant at compile time as we'd end up
with a (slow) divide in the code path.  I think the restriction is a
good idea. I imagine there will never be any need to align to anything
that's not a power of 2.

> Should we handle the case where we get a suitably aligned pointer from
> MemoryContextAllocExtended() differently?

Maybe it would be worth the extra check. I'm trying to imagine future
use cases.  Maybe if someone wanted to ensure that we're aligned to
CPU cache line boundaries then the chances of the pointer already
being aligned to 64 bytes is decent enough.  The problem is it that
it's too late to save any memory, it just saves a bit of boxing and
unboxing of the redirect headers.

> > +     /* XXX: should we adjust valgrind state here? */
>
> Probably still need to do this... Kinda hard to get right without the code
> getting exercised.

Yeah, that comment kept catching my eye. I agree. That should be
handled correctly. I'll work on that.

> Wonder if there's some minimal case we could actually use
> it for?

Is there anything we could align to CPU cacheline size that would
speed something up?

David



Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
David Rowley
Date:
On Fri, 4 Nov 2022 at 04:20, Maxim Orlov <orlovmg@gmail.com> wrote:
> I've done a quick look and the patch is looks good to me.
> Let's add tests for these functions, should we? If you think this is an overkill, feel free to trim tests for your
taste.

Thanks for doing that.  I'm keen to wait a bit and see if we can come
up with a core user of this before adding a test module.  However, if
we keep the repalloc() implementation, then I wonder if it might be
worth having a test module for that. I see you're not testing it in
the one you've written.  Andres has suggested I remove the repalloc
stuff I added but see my reply to that. I think we should keep it
based on the fact that someone using palloc_aligned might have no idea
if some other code path can call repalloc() on the returned pointer.

David



Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
John Naylor
Date:

On Tue, Nov 8, 2022 at 8:57 AM David Rowley <dgrowleyml@gmail.com> wrote:
> Is there anything we could align to CPU cacheline size that would
> speed something up?

InitCatCache() already has this, which could benefit from simpler notation.

/*
 * Allocate a new cache structure, aligning to a cacheline boundary
 *
 * Note: we rely on zeroing to initialize all the dlist headers correctly
 */
sz = sizeof(CatCache) + PG_CACHE_LINE_SIZE;
cp = (CatCache *) CACHELINEALIGN(palloc0(sz));

--
John Naylor
EDB: http://www.enterprisedb.com

Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
Andres Freund
Date:
Hi,

On 2022-11-08 15:01:18 +1300, David Rowley wrote:
> Andres has suggested I remove the repalloc stuff I added but see my reply to
> that.

I'm fine with keeping it, I just couldn't really think of cases that have
strict alignment requirements but also requires resizing.

Greetings,

Andres Freund



Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
David Rowley
Date:
On Tue, 8 Nov 2022 at 14:57, David Rowley <dgrowleyml@gmail.com> wrote:
> On Tue, 8 Nov 2022 at 05:24, Andres Freund <andres@anarazel.de> wrote:
> > Should we handle the case where we get a suitably aligned pointer from
> > MemoryContextAllocExtended() differently?
>
> Maybe it would be worth the extra check. I'm trying to imagine future
> use cases.  Maybe if someone wanted to ensure that we're aligned to
> CPU cache line boundaries then the chances of the pointer already
> being aligned to 64 bytes is decent enough.  The problem is it that
> it's too late to save any memory, it just saves a bit of boxing and
> unboxing of the redirect headers.

Thinking about that a bit more, if we keep the repalloc support then
we can't do this as if we happen to get the right alignment by chance
during the palloc_aligned, then if we don't have the redirection
MemoryChunk, then we've no way to ensure we keep the alignment after a
repalloc.

David



Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
David Rowley
Date:
On Tue, 8 Nov 2022 at 15:17, John Naylor <john.naylor@enterprisedb.com> wrote:
>
>
> On Tue, Nov 8, 2022 at 8:57 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > Is there anything we could align to CPU cacheline size that would
> > speed something up?
>
> InitCatCache() already has this, which could benefit from simpler notation.

Thanks. I wasn't aware. I'll convert that to use palloc_aligned in the patch.

David



Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
John Naylor
Date:

On Tue, Nov 8, 2022 at 8:57 AM David Rowley <dgrowleyml@gmail.com> wrote:

> On Tue, 8 Nov 2022 at 05:24, Andres Freund <andres@anarazel.de> wrote:
> > I doubtthere's ever a need to realloc such a pointer? Perhaps we could just
> > elog(ERROR)?
>
> Are you maybe locked into just thinking about your current planned use
> case that we want to allocate BLCKSZ bytes in each case? It does not
> seem impossible to me that someone will want something more than an
> 8-byte alignment and also might want to enlarge the allocation at some
> point. I thought it might be more dangerous not to implement repalloc.
> It might not be clear to someone using palloc_aligned() that there's
> no code path that can call repalloc on the returned pointer.

I can imagine a use case for arrays of cacheline-sized objects.

> TYPEALIGN() will not work correctly unless the alignment is a power of
> 2. We could modify it to, but that would require doing some modular
> maths instead of bitmasking. That seems pretty horrible when the macro
> is given a value that's not constant at compile time as we'd end up
> with a (slow) divide in the code path.  I think the restriction is a
> good idea. I imagine there will never be any need to align to anything
> that's not a power of 2.

+1

> > Should we handle the case where we get a suitably aligned pointer from
> > MemoryContextAllocExtended() differently?
>
> Maybe it would be worth the extra check. I'm trying to imagine future
> use cases.  Maybe if someone wanted to ensure that we're aligned to
> CPU cache line boundaries then the chances of the pointer already
> being aligned to 64 bytes is decent enough.  The problem is it that
> it's too late to save any memory, it just saves a bit of boxing and
> unboxing of the redirect headers.

To my mind the main point of detecting this case is to save memory, so if that's not possible/convenient, special-casing doesn't seem worth it. 

- Assert((char *) chunk > (char *) block);
+ Assert((char *) chunk >= (char *) block);

Is this related or independent?

--
John Naylor
EDB: http://www.enterprisedb.com

Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
Andres Freund
Date:
Hi,

On 2022-11-08 14:57:35 +1300, David Rowley wrote:
> On Tue, 8 Nov 2022 at 05:24, Andres Freund <andres@anarazel.de> wrote:
> > Should we handle the case where we get a suitably aligned pointer from
> > MemoryContextAllocExtended() differently?
> 
> Maybe it would be worth the extra check. I'm trying to imagine future
> use cases.  Maybe if someone wanted to ensure that we're aligned to
> CPU cache line boundaries then the chances of the pointer already
> being aligned to 64 bytes is decent enough.  The problem is it that
> it's too late to save any memory, it just saves a bit of boxing and
> unboxing of the redirect headers.

Couldn't we reduce the amount of over-allocation by a small amount by special
casing the already-aligned case? That's not going to be relevant for page size
aligne allocations, but for smaller alignment values it could matter.

Greetings,

Andres Freund



Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
David Rowley
Date:
On Tue, 15 Nov 2022 at 11:11, Andres Freund <andres@anarazel.de> wrote:
> Couldn't we reduce the amount of over-allocation by a small amount by special
> casing the already-aligned case? That's not going to be relevant for page size
> aligne allocations, but for smaller alignment values it could matter.

I don't quite follow this. How can we know the allocation is already
aligned without performing the allocation? To perform the allocation
we must tell palloc what size to allocate. So, we've already wasted
the space by the time we can tell if the allocation is aligned to what
we need.

Aside from that, there's already a special case for alignto <=
MAXIMUM_ALIGNOF.  But we know no palloc will ever return anything
aligned less than that in all cases, which is why that can work.

David



Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
David Rowley
Date:
On Mon, 14 Nov 2022 at 15:25, John Naylor <john.naylor@enterprisedb.com> wrote:
> - Assert((char *) chunk > (char *) block);
> + Assert((char *) chunk >= (char *) block);
>
> Is this related or independent?

It's related.  Because the code is doing:

MemoryChunkSetHdrMask(alignedchunk, unaligned, alignto,
  MCTX_ALIGNED_REDIRECT_ID);

Here the blockoffset gets set to the difference between alignedchunk
and unaligned. Typically when we call MemoryChunkSetHdrMask, the
blockoffset is always the difference between the block and
MemoryChunk, which is never 0 due to the block header fields.  Here it
can be the same pointer when the redirection MemoryChunk is stored on
the first byte of the palloc'd address.  This can happen if the
address returned by palloc + sizeof(MemoryChunk) is aligned to what we
need already.

David



Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
Andres Freund
Date:
Hi,

On 2022-11-15 23:36:53 +1300, David Rowley wrote:
> On Tue, 15 Nov 2022 at 11:11, Andres Freund <andres@anarazel.de> wrote:
> > Couldn't we reduce the amount of over-allocation by a small amount by special
> > casing the already-aligned case? That's not going to be relevant for page size
> > aligne allocations, but for smaller alignment values it could matter.
> 
> I don't quite follow this. How can we know the allocation is already
> aligned without performing the allocation? To perform the allocation
> we must tell palloc what size to allocate. So, we've already wasted
> the space by the time we can tell if the allocation is aligned to what
> we need.

What I mean is that we perhaps could over-allocate by a bit less than
  alignto + sizeof(MemoryChunk)
If the value returned by the underlying memory context is already aligned to
the correct value, we can just return it as-is.

We already rely on memory context returning MAXIMUM_ALIGNOF aligned
allocations. Adding the special case, I think, means that the we could safely
over-allocate by "only"
  alignto + sizeof(MemoryChunk) - MAXIMUM_ALIGNOF

Which would be a reasonable win for small allocations with a small >
MAXIMUM_ALIGNOF alignment. But I don't think that'll be a very common case?


> Aside from that, there's already a special case for alignto <=
> MAXIMUM_ALIGNOF.  But we know no palloc will ever return anything
> aligned less than that in all cases, which is why that can work.

Yep.

Greetings,

Andres Freund



Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
Greg Stark
Date:
So I think it's kind of cute that you've implemented these as agnostic
wrappers that work with any allocator ... but why?

I would have expected the functionality to just be added directly to
the allocator to explicitly request whole aligned pages which IIRC
it's already capable of doing but just doesn't have any way to
explicitly request.

DirectIO doesn't really need a wide variety of allocation sizes or
alignments, it's always going to be the physical block size which
apparently can be as low as 512 bytes but I'm guessing we're always
going to be using 4kB alignment and multiples of 8kB allocations.
Wouldn't just having a pool of 8kB pages all aligned on 4kB or 8kB
alignment be simpler and more efficient than working around misaligned
pointers and having all these branches and arithmetic happening?



Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
Andres Freund
Date:
Hi,

On 2022-11-15 16:58:10 -0500, Greg Stark wrote:
> So I think it's kind of cute that you've implemented these as agnostic
> wrappers that work with any allocator ... but why?
> 
> I would have expected the functionality to just be added directly to
> the allocator to explicitly request whole aligned pages which IIRC
> it's already capable of doing but just doesn't have any way to
> explicitly request.

We'd need to support it in multiple allocators, and they'd code quite similar
to this because for allocations that go directly to malloc.

It's possible that we'd want to add optional support for aligned allocations
to e.g. aset.c but not other allocators - this patch would allow to add
support for that transparently.


> DirectIO doesn't really need a wide variety of allocation sizes or
> alignments, it's always going to be the physical block size which
> apparently can be as low as 512 bytes but I'm guessing we're always
> going to be using 4kB alignment and multiples of 8kB allocations.

Yep - I posted numbers in some other thread showing that using a larger
alignment is a good idea.


> Wouldn't just having a pool of 8kB pages all aligned on 4kB or 8kB
> alignment be simpler and more efficient than working around misaligned
> pointers and having all these branches and arithmetic happening?

I'm quite certain it'd increase memory usage, rather than reduce it - there's
not actually a whole lot of places that need aligned pages outside of bufmgr,
so the pool would just be unused most of the time. And you'd need special code
to return those pages to the pool when the operation using the aligned buffer
fails - whereas integrating with memory contexts already takes care of
that. Lastly, there's other places where we can benefit from aligned
allocations far smaller than 4kB (most typically cacheline aligned, I'd
guess).

Greetings,

Andres Freund



Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
David Rowley
Date:
On Wed, 16 Nov 2022 at 08:19, Andres Freund <andres@anarazel.de> wrote:
> We already rely on memory context returning MAXIMUM_ALIGNOF aligned
> allocations. Adding the special case, I think, means that the we could safely
> over-allocate by "only"
>   alignto + sizeof(MemoryChunk) - MAXIMUM_ALIGNOF
>
> Which would be a reasonable win for small allocations with a small >
> MAXIMUM_ALIGNOF alignment. But I don't think that'll be a very common case?

Seems reasonable.  Subtracting MAXIMUM_ALIGNOF doesn't add any
additional run-time cost since it will be constant folded with the
sizeof(MemoryChunk).

I've attached an updated patch.  The 0002 is just intended to exercise
these allocations a little bit, it's not intended for commit. I was
using that to ensure valgrind does not complain about anything.  It
seems happy now.

David

Attachment

Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

From
David Rowley
Date:
On Wed, 16 Nov 2022 at 23:56, David Rowley <dgrowleyml@gmail.com> wrote:
> I've attached an updated patch.  The 0002 is just intended to exercise
> these allocations a little bit, it's not intended for commit. I was
> using that to ensure valgrind does not complain about anything.  It
> seems happy now.

After making some comment adjustments and having adjusted the
calculation of how to get the old chunk size when doing repalloc() on
an aligned chunk, I've now pushed this.

Thank you for the reviews.

David