Thread: memory allocation and powers of two

memory allocation and powers of two

From
David Schultz
Date:
While looking into a block size mismatch problem between
Postgresql and FreeBSD's FFS, I noticed that postgresql is making
some rather odd-sized requests to malloc(3): 0x2034, 0x2020,
0x4018, 0x8018, etc.  Most malloc(3) implementations round large
allocations up to a multiple of a large power of 2---often the
hardware page size, so this is a pathological case for those
allocators.  For example, on a machine with 4K pages, the database
may use ~50% more memory for the 0x2034 byte allocations.

Browsing through the source, it looks like the allocation set
implementation and the buffered file implementation both use
inline tags.  The latter, at least, could be easily fixed by
allocating the buffer separately, but that would only be
worthwhile if the former were also modified.

Thoughts on this particular design decision?


Re: memory allocation and powers of two

From
Tom Lane
Date:
David Schultz <dschultz@uclink.Berkeley.EDU> writes:
> While looking into a block size mismatch problem between
> Postgresql and FreeBSD's FFS, I noticed that postgresql is making
> some rather odd-sized requests to malloc(3): 0x2034, 0x2020,
> 0x4018, 0x8018, etc.

AFAICT the operative word there is "some" --- the heavily used paths
should make power-of-two requests to malloc, because aset block sizes
will normally be powers of two.  Can you put your finger on paths that
generate a significant number of non-power-of-two requests?
        regards, tom lane


Re: memory allocation and powers of two

From
Tom Lane
Date:
David Schultz <dschultz@uclink.Berkeley.EDU> writes:
> While looking into a block size mismatch problem between
> Postgresql and FreeBSD's FFS, I noticed that postgresql is making
> some rather odd-sized requests to malloc(3): 0x2034, 0x2020,
> 0x4018, 0x8018, etc.  Most malloc(3) implementations round large
> allocations up to a multiple of a large power of 2---often the
> hardware page size, so this is a pathological case for those
> allocators.

After looking more closely I saw that there were some corner cases where
aset.c would unnecessarily switch from the intended power-of-two block
sizes to non-power-of-two sizes.  I've applied the attached fix.
        regards, tom lane

*** src/backend/utils/mmgr/aset.c.orig    Sun Aug  3 23:01:13 2003
--- src/backend/utils/mmgr/aset.c    Sat Sep 13 18:20:48 2003
***************
*** 650,681 ****         }         else         {
-             /* Get size of prior block */
-             blksize = set->blocks->endptr - ((char *) set->blocks);
-              /*
!              * Special case: if very first allocation was for a large
!              * chunk (or we have a small "keeper" block), could have an
!              * undersized top block.  Do something reasonable.              */
!             if (blksize < set->initBlockSize)
!                 blksize = set->initBlockSize;
!             else
!             {
!                 /* Crank it up, but not past max */                 blksize <<= 1;
!                 if (blksize > set->maxBlockSize)
!                     blksize = set->maxBlockSize;
!             }         }          /*          * If initBlockSize is less than ALLOC_CHUNK_LIMIT, we could need
!          * more space...          */         required_size = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
!         if (blksize < required_size)
!             blksize = required_size;          /* Try to allocate it */         block = (AllocBlock) malloc(blksize);
--- 650,678 ----         }         else         {             /*
!              * Use first power of 2 that is larger than previous block,
!              * but not more than the allowed limit.  (We don't simply double
!              * the prior block size, because in some cases this could be a
!              * funny size, eg if very first allocation was for an odd-sized
!              * large chunk.)              */
!             Size    pblksize = set->blocks->endptr - ((char *) set->blocks);
! 
!             blksize = set->initBlockSize;
!             while (blksize <= pblksize)                 blksize <<= 1;
!             if (blksize > set->maxBlockSize)
!                 blksize = set->maxBlockSize;         }          /*          * If initBlockSize is less than
ALLOC_CHUNK_LIMIT,we could need
 
!          * more space... but try to keep it a power of 2.          */         required_size = chunk_size +
ALLOC_BLOCKHDRSZ+ ALLOC_CHUNKHDRSZ;
 
!         while (blksize < required_size)
!             blksize <<= 1;          /* Try to allocate it */         block = (AllocBlock) malloc(blksize);


Re: memory allocation and powers of two

From
Tom Lane
Date:
David Schultz <dschultz@uclink.Berkeley.EDU> writes:
> ... I assume that pgsql will be able to use the
> slack allocated in each chunk.

That's the theory, anyway.  Undoubtedly the extra allocation will go to
waste in some scenarios, but we're no worse off than we were before.
        regards, tom lane


Re: memory allocation and powers of two

From
David Schultz
Date:
On Sat, Sep 13, 2003, Tom Lane wrote:
> David Schultz <dschultz@uclink.Berkeley.EDU> writes:
> > While looking into a block size mismatch problem between
> > Postgresql and FreeBSD's FFS, I noticed that postgresql is making
> > some rather odd-sized requests to malloc(3): 0x2034, 0x2020,
> > 0x4018, 0x8018, etc.  Most malloc(3) implementations round large
> > allocations up to a multiple of a large power of 2---often the
> > hardware page size, so this is a pathological case for those
> > allocators.
> 
> After looking more closely I saw that there were some corner cases where
> aset.c would unnecessarily switch from the intended power-of-two block
> sizes to non-power-of-two sizes.  I've applied the attached fix.

Thanks for looking into this.  I had meant to follow up sooner
myself, but I've been under time pressure lately.  Your patch
looks good to me.  I assume that pgsql will be able to use the
slack allocated in each chunk.  I'll try my original test again
next weekend and let you know if this change clears up most of the
power-of-two-plus-epsilon issues I was seeing.