Thread: Add palloc_aligned() to allow arbitrary power of 2 memory alignment
Part of the work that Thomas mentions in [1], regarding Direct I/O, has certain requirements that pointers must be page-aligned. I've attached a patch which implements palloc_aligned() and MemoryContextAllocAligned() which accept an 'alignto' parameter which must be a power-of-2 value. The memory addresses returned by these functions will be aligned by the requested alignment. Primarily, this work is by Andres. I took what he had and cleaned it up, fixed a few minor bugs then implemented repalloc() and GetMemoryChunkSpace() functionality. The way this works is that palloc_aligned() can be called for any of the existing MemoryContext types. What we do is perform a normal allocation request, but we add additional bytes to the size request to allow the proper alignment of the pointer that we return. Since we have no control over the alignment of the return value from the allocation requests, we must adjust the pointer returned by the allocation function to align it to the required alignment. When an operation such as pfree() or repalloc() is performed on a pointer retuned by palloc_aligned(), we can't go trying to pfree the aligned pointer as this is not the pointer that was returned by the allocation function. To make all this work, another MemoryChunk struct exists directly before the aligned pointer which has the MemoryContextMethodID set to MCTX_ALIGNED_REDIRECT_ID. These "redirection" MemoryChunks have the "block offset" set to allow the actual MemoryChunk of the original allocation to be found. We just subtract the number of bytes stored in the block offset, which is just the same as how we now find the owning AllocBlock from the MemoryChunk in aset.c. Once we do that offset calculation, we can just pfree() the original chunk. The 'alignto' is stored in this "redirection" MemoryChunk so that repalloc() knows what the original alignment request was so that it the repalloc'd chunk can be aligned by that amount too. In the attached patch, there are not yet any users of these new 2 functions. As mentioned above, Thomas is proposing some patches to implement Direct I/O in [1] which will use these functions. Because I touched memory contexts last, it likely makes the most sense for me to work on this portion of the patch. Comments welcome. Patch attached. (I did rip out all the I/O specific portions from Andres' patch which Thomas proposes as his 0002 patch. Thomas will need to rebase (sorry)). David [1] https://www.postgresql.org/message-id/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com
Attachment
Hi!
Part of the work that Thomas mentions in [1], regarding Direct I/O,
has certain requirements that pointers must be page-aligned.
I've attached a patch which implements palloc_aligned() and
MemoryContextAllocAligned() ...
I've done a quick look and the patch is looks good to me.
Let's add tests for these functions, should we? If you think this is an overkill, feel free to trim tests for your taste.
--
Best regards,
Maxim Orlov.
Attachment
Hi, On 2022-11-02 00:28:46 +1300, David Rowley wrote: > Part of the work that Thomas mentions in [1], regarding Direct I/O, > diff --git a/src/backend/utils/mmgr/alignedalloc.c b/src/backend/utils/mmgr/alignedalloc.c > new file mode 100644 > index 0000000000..e581772758 > --- /dev/null > +++ b/src/backend/utils/mmgr/alignedalloc.c > @@ -0,0 +1,93 @@ > +/*------------------------------------------------------------------------- > + * > + * alignedalloc.c > + * Allocator functions to implement palloc_aligned FWIW, to me this is really more part of mcxt.c than its own allocator... Particularly because MemoryContextAllocAligned() et al are implemented there. > +void * > +AlignedAllocRealloc(void *pointer, Size size) I doubtthere's ever a need to realloc such a pointer? Perhaps we could just elog(ERROR)? > +/* > + * MemoryContextAllocAligned > + * Allocate 'size' bytes of memory in 'context' aligned to 'alignto' > + * bytes. > + * > + * 'flags' may be 0 or set the same as MemoryContextAllocExtended(). > + * 'alignto' must be a power of 2. > + */ > +void * > +MemoryContextAllocAligned(MemoryContext context, > + Size size, Size alignto, int flags) > +{ > + Size alloc_size; > + void *unaligned; > + void *aligned; > + > + /* wouldn't make much sense to waste that much space */ > + Assert(alignto < (128 * 1024 * 1024)); > + > + /* ensure alignto is a power of 2 */ > + Assert((alignto & (alignto - 1)) == 0); Hm, not that I can see a case for ever not using a power of two alignment... There's not really a "need" for the restriction, right? Perhaps we should note that? > + /* > + * We implement aligned pointers by simply allocating enough memory for > + * the requested size plus the alignment and an additional MemoryChunk. > + * This additional MemoryChunk is required for operations such as pfree > + * when used on the pointer returned by this function. We use this > + * "redirection" MemoryChunk in order to find the pointer to the memory > + * that was returned by the MemoryContextAllocExtended call below. We do > + * that by "borrowing" the block offset field and instead of using that to > + * find the offset into the owning block, we use it to find the original > + * allocated address. > + * > + * Here we must allocate enough extra memory so that we can still align > + * the pointer returned by MemoryContextAllocExtended and also have enough > + * space for the redirection MemoryChunk. > + */ > + alloc_size = size + alignto + sizeof(MemoryChunk); > + > + /* perform the actual allocation */ > + unaligned = MemoryContextAllocExtended(context, alloc_size, flags); Should we handle the case where we get a suitably aligned pointer from MemoryContextAllocExtended() differently? > + /* XXX: should we adjust valgrind state here? */ Probably still need to do this... Kinda hard to get right without the code getting exercised. Wonder if there's some minimal case we could actually use it for? Thanks, Andres
Thanks for having a look. On Tue, 8 Nov 2022 at 05:24, Andres Freund <andres@anarazel.de> wrote: > FWIW, to me this is really more part of mcxt.c than its own > allocator... Particularly because MemoryContextAllocAligned() et al are > implemented there. I'm on the fence about this one. I thought it was nice that we have a file per consumed MemoryContextMethodID. The thing that caused me to add alignedalloc.c was the various other comments in the declaration of mcxt_methods[] that mention where to find each of the methods being assigned in that array (e.g /* aset.c */). I can change it back if you feel strongly. I just don't. > I doubtthere's ever a need to realloc such a pointer? Perhaps we could just > elog(ERROR)? Are you maybe locked into just thinking about your current planned use case that we want to allocate BLCKSZ bytes in each case? It does not seem impossible to me that someone will want something more than an 8-byte alignment and also might want to enlarge the allocation at some point. I thought it might be more dangerous not to implement repalloc. It might not be clear to someone using palloc_aligned() that there's no code path that can call repalloc on the returned pointer. > Hm, not that I can see a case for ever not using a power of two > alignment... There's not really a "need" for the restriction, right? Perhaps > we should note that? TYPEALIGN() will not work correctly unless the alignment is a power of 2. We could modify it to, but that would require doing some modular maths instead of bitmasking. That seems pretty horrible when the macro is given a value that's not constant at compile time as we'd end up with a (slow) divide in the code path. I think the restriction is a good idea. I imagine there will never be any need to align to anything that's not a power of 2. > Should we handle the case where we get a suitably aligned pointer from > MemoryContextAllocExtended() differently? Maybe it would be worth the extra check. I'm trying to imagine future use cases. Maybe if someone wanted to ensure that we're aligned to CPU cache line boundaries then the chances of the pointer already being aligned to 64 bytes is decent enough. The problem is it that it's too late to save any memory, it just saves a bit of boxing and unboxing of the redirect headers. > > + /* XXX: should we adjust valgrind state here? */ > > Probably still need to do this... Kinda hard to get right without the code > getting exercised. Yeah, that comment kept catching my eye. I agree. That should be handled correctly. I'll work on that. > Wonder if there's some minimal case we could actually use > it for? Is there anything we could align to CPU cacheline size that would speed something up? David
On Fri, 4 Nov 2022 at 04:20, Maxim Orlov <orlovmg@gmail.com> wrote: > I've done a quick look and the patch is looks good to me. > Let's add tests for these functions, should we? If you think this is an overkill, feel free to trim tests for your taste. Thanks for doing that. I'm keen to wait a bit and see if we can come up with a core user of this before adding a test module. However, if we keep the repalloc() implementation, then I wonder if it might be worth having a test module for that. I see you're not testing it in the one you've written. Andres has suggested I remove the repalloc stuff I added but see my reply to that. I think we should keep it based on the fact that someone using palloc_aligned might have no idea if some other code path can call repalloc() on the returned pointer. David
On Tue, Nov 8, 2022 at 8:57 AM David Rowley <dgrowleyml@gmail.com> wrote:
> Is there anything we could align to CPU cacheline size that would
> speed something up?
InitCatCache() already has this, which could benefit from simpler notation.
/*
* Allocate a new cache structure, aligning to a cacheline boundary
*
* Note: we rely on zeroing to initialize all the dlist headers correctly
*/
sz = sizeof(CatCache) + PG_CACHE_LINE_SIZE;
cp = (CatCache *) CACHELINEALIGN(palloc0(sz));
--
John Naylor
EDB: http://www.enterprisedb.com
* Allocate a new cache structure, aligning to a cacheline boundary
*
* Note: we rely on zeroing to initialize all the dlist headers correctly
*/
sz = sizeof(CatCache) + PG_CACHE_LINE_SIZE;
cp = (CatCache *) CACHELINEALIGN(palloc0(sz));
--
John Naylor
EDB: http://www.enterprisedb.com
Hi, On 2022-11-08 15:01:18 +1300, David Rowley wrote: > Andres has suggested I remove the repalloc stuff I added but see my reply to > that. I'm fine with keeping it, I just couldn't really think of cases that have strict alignment requirements but also requires resizing. Greetings, Andres Freund
On Tue, 8 Nov 2022 at 14:57, David Rowley <dgrowleyml@gmail.com> wrote: > On Tue, 8 Nov 2022 at 05:24, Andres Freund <andres@anarazel.de> wrote: > > Should we handle the case where we get a suitably aligned pointer from > > MemoryContextAllocExtended() differently? > > Maybe it would be worth the extra check. I'm trying to imagine future > use cases. Maybe if someone wanted to ensure that we're aligned to > CPU cache line boundaries then the chances of the pointer already > being aligned to 64 bytes is decent enough. The problem is it that > it's too late to save any memory, it just saves a bit of boxing and > unboxing of the redirect headers. Thinking about that a bit more, if we keep the repalloc support then we can't do this as if we happen to get the right alignment by chance during the palloc_aligned, then if we don't have the redirection MemoryChunk, then we've no way to ensure we keep the alignment after a repalloc. David
On Tue, 8 Nov 2022 at 15:17, John Naylor <john.naylor@enterprisedb.com> wrote: > > > On Tue, Nov 8, 2022 at 8:57 AM David Rowley <dgrowleyml@gmail.com> wrote: > > Is there anything we could align to CPU cacheline size that would > > speed something up? > > InitCatCache() already has this, which could benefit from simpler notation. Thanks. I wasn't aware. I'll convert that to use palloc_aligned in the patch. David
On Tue, Nov 8, 2022 at 8:57 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Tue, 8 Nov 2022 at 05:24, Andres Freund <andres@anarazel.de> wrote:
> > I doubtthere's ever a need to realloc such a pointer? Perhaps we could just
> > elog(ERROR)?
>
> Are you maybe locked into just thinking about your current planned use
> case that we want to allocate BLCKSZ bytes in each case? It does not
> seem impossible to me that someone will want something more than an
> 8-byte alignment and also might want to enlarge the allocation at some
> point. I thought it might be more dangerous not to implement repalloc.
> It might not be clear to someone using palloc_aligned() that there's
> no code path that can call repalloc on the returned pointer.
I can imagine a use case for arrays of cacheline-sized objects.
> TYPEALIGN() will not work correctly unless the alignment is a power of
> 2. We could modify it to, but that would require doing some modular
> maths instead of bitmasking. That seems pretty horrible when the macro
> is given a value that's not constant at compile time as we'd end up
> with a (slow) divide in the code path. I think the restriction is a
> good idea. I imagine there will never be any need to align to anything
> that's not a power of 2.
+1
> > Should we handle the case where we get a suitably aligned pointer from
> > MemoryContextAllocExtended() differently?
>
> Maybe it would be worth the extra check. I'm trying to imagine future
> use cases. Maybe if someone wanted to ensure that we're aligned to
> CPU cache line boundaries then the chances of the pointer already
> being aligned to 64 bytes is decent enough. The problem is it that
> it's too late to save any memory, it just saves a bit of boxing and
> unboxing of the redirect headers.
To my mind the main point of detecting this case is to save memory, so if that's not possible/convenient, special-casing doesn't seem worth it.
- Assert((char *) chunk > (char *) block);
+ Assert((char *) chunk >= (char *) block);
Is this related or independent?
--
John Naylor
EDB: http://www.enterprisedb.com
Hi, On 2022-11-08 14:57:35 +1300, David Rowley wrote: > On Tue, 8 Nov 2022 at 05:24, Andres Freund <andres@anarazel.de> wrote: > > Should we handle the case where we get a suitably aligned pointer from > > MemoryContextAllocExtended() differently? > > Maybe it would be worth the extra check. I'm trying to imagine future > use cases. Maybe if someone wanted to ensure that we're aligned to > CPU cache line boundaries then the chances of the pointer already > being aligned to 64 bytes is decent enough. The problem is it that > it's too late to save any memory, it just saves a bit of boxing and > unboxing of the redirect headers. Couldn't we reduce the amount of over-allocation by a small amount by special casing the already-aligned case? That's not going to be relevant for page size aligne allocations, but for smaller alignment values it could matter. Greetings, Andres Freund
On Tue, 15 Nov 2022 at 11:11, Andres Freund <andres@anarazel.de> wrote: > Couldn't we reduce the amount of over-allocation by a small amount by special > casing the already-aligned case? That's not going to be relevant for page size > aligne allocations, but for smaller alignment values it could matter. I don't quite follow this. How can we know the allocation is already aligned without performing the allocation? To perform the allocation we must tell palloc what size to allocate. So, we've already wasted the space by the time we can tell if the allocation is aligned to what we need. Aside from that, there's already a special case for alignto <= MAXIMUM_ALIGNOF. But we know no palloc will ever return anything aligned less than that in all cases, which is why that can work. David
On Mon, 14 Nov 2022 at 15:25, John Naylor <john.naylor@enterprisedb.com> wrote: > - Assert((char *) chunk > (char *) block); > + Assert((char *) chunk >= (char *) block); > > Is this related or independent? It's related. Because the code is doing: MemoryChunkSetHdrMask(alignedchunk, unaligned, alignto, MCTX_ALIGNED_REDIRECT_ID); Here the blockoffset gets set to the difference between alignedchunk and unaligned. Typically when we call MemoryChunkSetHdrMask, the blockoffset is always the difference between the block and MemoryChunk, which is never 0 due to the block header fields. Here it can be the same pointer when the redirection MemoryChunk is stored on the first byte of the palloc'd address. This can happen if the address returned by palloc + sizeof(MemoryChunk) is aligned to what we need already. David
Hi, On 2022-11-15 23:36:53 +1300, David Rowley wrote: > On Tue, 15 Nov 2022 at 11:11, Andres Freund <andres@anarazel.de> wrote: > > Couldn't we reduce the amount of over-allocation by a small amount by special > > casing the already-aligned case? That's not going to be relevant for page size > > aligne allocations, but for smaller alignment values it could matter. > > I don't quite follow this. How can we know the allocation is already > aligned without performing the allocation? To perform the allocation > we must tell palloc what size to allocate. So, we've already wasted > the space by the time we can tell if the allocation is aligned to what > we need. What I mean is that we perhaps could over-allocate by a bit less than alignto + sizeof(MemoryChunk) If the value returned by the underlying memory context is already aligned to the correct value, we can just return it as-is. We already rely on memory context returning MAXIMUM_ALIGNOF aligned allocations. Adding the special case, I think, means that the we could safely over-allocate by "only" alignto + sizeof(MemoryChunk) - MAXIMUM_ALIGNOF Which would be a reasonable win for small allocations with a small > MAXIMUM_ALIGNOF alignment. But I don't think that'll be a very common case? > Aside from that, there's already a special case for alignto <= > MAXIMUM_ALIGNOF. But we know no palloc will ever return anything > aligned less than that in all cases, which is why that can work. Yep. Greetings, Andres Freund
So I think it's kind of cute that you've implemented these as agnostic wrappers that work with any allocator ... but why? I would have expected the functionality to just be added directly to the allocator to explicitly request whole aligned pages which IIRC it's already capable of doing but just doesn't have any way to explicitly request. DirectIO doesn't really need a wide variety of allocation sizes or alignments, it's always going to be the physical block size which apparently can be as low as 512 bytes but I'm guessing we're always going to be using 4kB alignment and multiples of 8kB allocations. Wouldn't just having a pool of 8kB pages all aligned on 4kB or 8kB alignment be simpler and more efficient than working around misaligned pointers and having all these branches and arithmetic happening?
Hi, On 2022-11-15 16:58:10 -0500, Greg Stark wrote: > So I think it's kind of cute that you've implemented these as agnostic > wrappers that work with any allocator ... but why? > > I would have expected the functionality to just be added directly to > the allocator to explicitly request whole aligned pages which IIRC > it's already capable of doing but just doesn't have any way to > explicitly request. We'd need to support it in multiple allocators, and they'd code quite similar to this because for allocations that go directly to malloc. It's possible that we'd want to add optional support for aligned allocations to e.g. aset.c but not other allocators - this patch would allow to add support for that transparently. > DirectIO doesn't really need a wide variety of allocation sizes or > alignments, it's always going to be the physical block size which > apparently can be as low as 512 bytes but I'm guessing we're always > going to be using 4kB alignment and multiples of 8kB allocations. Yep - I posted numbers in some other thread showing that using a larger alignment is a good idea. > Wouldn't just having a pool of 8kB pages all aligned on 4kB or 8kB > alignment be simpler and more efficient than working around misaligned > pointers and having all these branches and arithmetic happening? I'm quite certain it'd increase memory usage, rather than reduce it - there's not actually a whole lot of places that need aligned pages outside of bufmgr, so the pool would just be unused most of the time. And you'd need special code to return those pages to the pool when the operation using the aligned buffer fails - whereas integrating with memory contexts already takes care of that. Lastly, there's other places where we can benefit from aligned allocations far smaller than 4kB (most typically cacheline aligned, I'd guess). Greetings, Andres Freund
On Wed, 16 Nov 2022 at 08:19, Andres Freund <andres@anarazel.de> wrote: > We already rely on memory context returning MAXIMUM_ALIGNOF aligned > allocations. Adding the special case, I think, means that the we could safely > over-allocate by "only" > alignto + sizeof(MemoryChunk) - MAXIMUM_ALIGNOF > > Which would be a reasonable win for small allocations with a small > > MAXIMUM_ALIGNOF alignment. But I don't think that'll be a very common case? Seems reasonable. Subtracting MAXIMUM_ALIGNOF doesn't add any additional run-time cost since it will be constant folded with the sizeof(MemoryChunk). I've attached an updated patch. The 0002 is just intended to exercise these allocations a little bit, it's not intended for commit. I was using that to ensure valgrind does not complain about anything. It seems happy now. David
Attachment
On Wed, 16 Nov 2022 at 23:56, David Rowley <dgrowleyml@gmail.com> wrote: > I've attached an updated patch. The 0002 is just intended to exercise > these allocations a little bit, it's not intended for commit. I was > using that to ensure valgrind does not complain about anything. It > seems happy now. After making some comment adjustments and having adjusted the calculation of how to get the old chunk size when doing repalloc() on an aligned chunk, I've now pushed this. Thank you for the reviews. David