Thread: copy.c allocation constant

copy.c allocation constant

From: Andrew Dunstan

While reading copy.c I noticed this line:


#define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */


Doesn't that seem rather odd? If we're adding 1 wouldn't it be better as
65535 so we palloc a power of 2?


I have no idea if this affects performance, but it did strike me as strange.
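
For context, the surrounding code looks roughly like this (paraphrased,
not an exact quote of copy.c):

    /* in BeginCopyFrom(), more or less */
    cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
    cstate->raw_buf_index = cstate->raw_buf_len = 0;

    /* ... and after each read the buffer is terminated, which is
     * presumably what the extra byte is for */
    cstate->raw_buf[cstate->raw_buf_len] = '\0';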


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: copy.c allocation constant

From: Bruce Momjian

On Tue, Nov 28, 2017 at 11:51:28AM -0500, Andrew Dunstan wrote:
> 
> 
> While reading copy.c I noticed this line:
> 
> 
> #define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
> 
> 
> Doesn't that seem rather odd? If we're adding 1 wouldn't it be better as
> 65535 so we palloc a power of 2?
> 
> 
> I have no idea if this affects performance, but it did strike me as strange.

Coming in late here, but it does seem very odd.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: copy.c allocation constant

From: Tomas Vondra

On 01/24/2018 04:14 AM, Bruce Momjian wrote:
> On Tue, Nov 28, 2017 at 11:51:28AM -0500, Andrew Dunstan wrote:
>>
>>
>> While reading copy.c I noticed this line:
>>
>>
>> #define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
>>
>>
>> Doesn't that seem rather odd? If we're adding 1 wouldn't it be better as
>> 65535 so we palloc a power of 2?
>>
>>
>> I have no idea if this affects performance, but it did strike me
>> as strange.
> 
> Coming in late here, but it does seem very odd.
> 

I think there are two things to consider - aset.c and glibc.

In AllocSet this is handled as an over-sized chunk, that is, a chunk
greater than ALLOC_CHUNK_LIMIT (which ends up being 8kB). That means it's
allocated as a separate block and does not go through the freelists. So
the size does not follow the usual "power of 2" pattern and is only
maxalign-ed, and the block is freed immediately on pfree.

So for AllocSet we're fine, and this should not result in any memory
inefficiency, assuming there are not many such requests (which would
result in many malloc/free calls, which may be somewhat expensive).
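
Roughly, the over-sized path in AllocSetAlloc looks like this (heavily
simplified, so treat it as a sketch rather than the exact aset.c code):

    if (size > set->allocChunkLimit)    /* ends up being 8kB here */
    {
        /* only maxalign-ed, no power-of-2 rounding */
        Size        chunk_size = MAXALIGN(size);
        Size        blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
        AllocBlock  block = (AllocBlock) malloc(blksize);

        /* the block is handed straight back, never put on a freelist,
         * and free()d as soon as the chunk is pfree()d */
    }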

At the glibc level ... I'm not so sure. AFAIK glibc uses an allocator
with similar ideas (freelists, ...) so hopefully it's fine too.

And then there are the systems without glibc, or with other libc
implementations. No idea about those.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: copy.c allocation constant

From: Bruce Momjian

On Wed, Jan 24, 2018 at 06:25:01PM +0100, Tomas Vondra wrote:
> > 
> 
> I think there are two things to consider - aset.c and glibc.
>
> In AllocSet this is handled as an over-sized chunk, that is, a chunk
> greater than ALLOC_CHUNK_LIMIT (which ends up being 8kB). That means it's
> allocated as a separate block and does not go through the freelists. So
> the size does not follow the usual "power of 2" pattern and is only
> maxalign-ed, and the block is freed immediately on pfree.
> 
> So for AllocSet we're fine, and this should not result in any memory
> inefficiency, assuming there are not many such requests (which would
> result in many malloc/free calls, which may be somewhat expensive).
> 
> At the glibc level ... I'm not so sure. AFAIK glibc uses an allocator
> with similar ideas (freelists, ...) so hopefully it's fine too.
> 
> And then there are the systems without glibc, or with other libc
> implementations. No idea about those.

Thanks for the analysis.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: copy.c allocation constant

From: Robert Haas

On Wed, Jan 24, 2018 at 12:25 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> At the glibc level ... I'm not so sure. AFAIK glibc uses an allocator
> with similar ideas (freelists, ...) so hopefully it's fine too.
>
> And then there are the systems without glibc, or with other libc
> implementations. No idea about those.

My guess is that a fairly common pattern for larger chunks will be to
round the size up to a multiple of 4kB, the usual memory page size.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: copy.c allocation constant

From: Andres Freund

On 2018-01-24 13:19:19 -0500, Robert Haas wrote:
> On Wed, Jan 24, 2018 at 12:25 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> > At the glibc level ... I'm not so sure. AFAIK glibc uses an allocator
> > with similar ideas (freelists, ...) so hopefully it's fine too.
> >
> > And then there are the systems without glibc, or with other libc
> > implementations. No idea about those.
> 
> My guess is that a fairly common pattern for larger chunks will be to
> round the size up to a multiple of 4kB, the usual memory page size.

Indeed. Don't think RAW_BUF_SIZE is quite big enough for that on most
platforms though. From man mallopt:
 Balancing  these  factors  leads  to a default setting of 128*1024 for the M_MMAP_THRESHOLD parameter.
Additionally, even when malloc() chooses to use mmap() to back an
allocation, it still needs a header to know the size of the
allocation and such. So using a size that is exactly a multiple of 4KB
will still leave you with wasted space.  Due to the latter I can't see it
mattering whether or not we add +1 to a power-of-two size.

There's malloc_usable_size(), which returns the actual size of an
allocation. It'd probably be worthwhile to use that in a few of our own
allocator routines. If it's not available on a platform, we can just
have a stub that returns the requested size...
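
Something along these lines, say (the helper name is made up, and
HAVE_MALLOC_USABLE_SIZE is assumed to come from configure):

    #ifdef HAVE_MALLOC_USABLE_SIZE
    #include <malloc.h>                   /* for malloc_usable_size() */
    #endif

    static size_t
    actual_alloc_size(void *ptr, size_t request)
    {
    #ifdef HAVE_MALLOC_USABLE_SIZE
        return malloc_usable_size(ptr);   /* what the allocator really gave us */
    #else
        (void) ptr;
        return request;                   /* stub: assume we got exactly what we asked for */
    #endif
    }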

Greetings,

Andres Freund


Re: copy.c allocation constant

From: Robert Haas

On Wed, Jan 24, 2018 at 1:43 PM, Andres Freund <andres@anarazel.de> wrote:
> Indeed. Don't think RAW_BUF_SIZE is quite big enough for that on most
> platforms though. From man mallopt:
>  Balancing  these  factors  leads  to a default setting of 128*1024 for the M_MMAP_THRESHOLD parameter.
> Additionally, even when malloc() chooses to use mmap() to back an
> allocation, it still needs a header to know the size of the
> allocation and such. So using a size that is exactly a multiple of 4KB
> will still leave you with wasted space.  Due to the latter I can't see it
> mattering whether or not we add +1 to a power-of-two size.

Well, it depends on how it works.  dsa_allocate, for example, never
adds a header to the size of the allocation.  Allocations < 8kB are
bucketed by size class and stored in superblocks carved up into
equal-sized chunks.  Allocations > 8kB are rounded to a multiple of
the 4kB page size and we grab that many consecutive free pages.  I
didn't make those behaviors up; I copied them from elsewhere.  Some
other allocator I read about did small-medium-large allocations: large
with mmap(), medium with multiples of the page size, small with
closely-spaced size classes.
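
In pseudo-C, the rounding I'm describing is just this (hypothetical
names, not the actual dsa.c code):

    #define DSA_PAGE_SIZE    4096
    #define DSA_LARGE_LIMIT  8192

    static size_t
    round_request(size_t request)
    {
        if (request <= DSA_LARGE_LIMIT)
        {
            /* small: bucket into a size class within a superblock
             * (details omitted) */
            return request;
        }
        /* large: round up to whole 4kB pages and grab that many pages */
        return (request + DSA_PAGE_SIZE - 1) & ~((size_t) DSA_PAGE_SIZE - 1);
    }

Under that scheme a 64kB + 1 byte request comes out as 68kB.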

It doesn't seem like a particularly good idea to take a 64kB+1 byte
allocation, stick a header on it, and pack it tightly up against other
allocations on both sides.  Seems like that could lead to
fragmentation problems.  Is that really what it does?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: copy.c allocation constant

From: Andres Freund

On 2018-01-24 14:25:37 -0500, Robert Haas wrote:
> On Wed, Jan 24, 2018 at 1:43 PM, Andres Freund <andres@anarazel.de> wrote:
> > Indeed. Don't think RAW_BUF_SIZE is quite big enough for that on most
> > platforms though. From man mallopt:
> >  Balancing  these  factors  leads  to a default setting of 128*1024 for the M_MMAP_THRESHOLD parameter.
> > Additionally, even when malloc() chooses to use mmap() to back an
> > allocation, it still needs a header to know the size of the
> > allocation and such. So using a size that is exactly a multiple of 4KB
> > will still leave you with wasted space.  Due to the latter I can't see it
> > mattering whether or not we add +1 to a power-of-two size.
> 
> Well, it depends on how it works.  dsa_allocate, for example, never
> adds a header to the size of the allocation.

glibc's malloc does add a header. My half-informed suspicion is that
most newer malloc backing allocators will have a header, because a
shared lookup-by-address table is pretty expensive to maintain. A bit
of metadata indicating the size and/or source of the allocation makes
using thread-local information a lot easier.


> Allocations < 8kB are
> bucketed by size class and stored in superblocks carved up into
> equal-sized chunks.  Allocations > 8kB are rounded to a multiple of
> the 4kB page size and we grab that many consecutive free pages.  I
> didn't make those behaviors up; I copied them from elsewhere.  Some
> other allocator I read about did small-medium-large allocations: large
> with mmap(), medium with multiples of the page size, small with
> closely-spaced size classes.

Sure - all I'm trying to say is that it likely won't matter whether we
use power-of-two or power-of-two + 1, because it seems likely that due
to overhead considerations we won't quite fit into a size class anyway.


> It doesn't seem like a particularly good idea to take a 64kB+1 byte
> allocation, stick a header on it, and pack it tightly up against other
> allocations on both sides.  Seems like that could lead to
> fragmentation problems.  Is that really what it does?

No, I'm fairly sure it's not.

Greetings,

Andres Freund


Re: copy.c allocation constant

From: Alvaro Herrera

Andres Freund wrote:
> On 2018-01-24 14:25:37 -0500, Robert Haas wrote:
> > On Wed, Jan 24, 2018 at 1:43 PM, Andres Freund <andres@anarazel.de> wrote:
> > > Indeed. Don't think RAW_BUF_SIZE is quite big enough for that on most
> > > platforms though. From man mallopt:
> > >  Balancing  these  factors  leads  to a default setting of 128*1024 for the M_MMAP_THRESHOLD parameter.
> > > Additionally, even when malloc() chooses to use mmap() to back an
> > > allocation, it still needs a header to know the size of the
> > > allocation and such. So using a size that is exactly a multiple of 4KB
> > > will still leave you with wasted space.  Due to the latter I can't see it
> > > mattering whether or not we add +1 to a power-of-two size.
> > 
> > Well, it depends on how it works.  dsa_allocate, for example, never
> > adds a header to the size of the allocation.
> 
> glibc's malloc does add a header. My half-informed suspicion is that
> most newer malloc backing allocators will have a header, because a
> shared lookup-by-address table is pretty expensive to maintain. A bit
> of metadata indicating the size and/or source of the allocation makes
> using thread-local information a lot easier.

Sounds like it'd be smart to allocate something closer to
M_MMAP_THRESHOLD (which with typical values would be about double the
current RAW_BUF_SIZE value), minus a few dozen bytes to allow for
palloc's and malloc's respective headers.  That way we bet on an
mmap()ed allocation with minimum space wastage across all layers.  Not
sure whether we want to try to minimize wastage through clever use of
malloc_usable_size() on each backend's first allocation -- that'd
probably just be overengineering.
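
Back-of-the-envelope, something like this (the slop constant is a guess
at the combined palloc and malloc overhead, not a measured number):

    #define MMAP_THRESHOLD_DEFAULT  (128 * 1024)  /* glibc default, per mallopt(3) */
    #define HEADER_SLOP             64            /* guessed palloc + malloc headers */

    /* we'd still palloc RAW_BUF_SIZE + 1 bytes, as today */
    #define RAW_BUF_SIZE  (MMAP_THRESHOLD_DEFAULT - HEADER_SLOP - 1)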

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: copy.c allocation constant

From: Andres Freund

Hi,

On 2018-01-24 17:07:01 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > glibc's malloc does add a header. My half-informed suspicion is that
> > most newer malloc backing allocators will have a header, because a
> > shared lookup-by-address table is pretty expensive to maintain. A bit
> > of metadata indicating the size and/or source of the allocation makes
> > using thread-local information a lot easier.
>
> Sounds like it'd be smart to allocate something closer to
> M_MMAP_THRESHOLD (which with typical values would be about double the
> current RAW_BUF_SIZE value), minus a few dozen bytes to allow for
> palloc's and malloc's respective headers.  That way we bet on an
> mmap()ed allocation with minimum space wastage across all layers.

In general there might be cases where that's worthwhile, although
M_MMAP_THRESHOLD IIRC isn't a constant anymore, which complicates
things. The specific case this thread is discussing doesn't seem worthy
of such attention, though; AFAICT there's no actual practical problem
here.


> Not sure whether we want to try to minimize wastage through
> clever use of malloc_usable_size() on each backend's first allocation --
> that'd probably just be overengineering.

Likely also dangerous; I don't think this is a constant.

Greetings,

Andres Freund


Re: copy.c allocation constant

From: Thomas Munro

On Thu, Jan 25, 2018 at 7:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 24, 2018 at 12:25 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> At the glibc level ... I'm not so sure. AFAIK glibc uses an allocator
>> with similar ideas (freelists, ...) so hopefully it's fine too.
>>
>> And then there are the systems without glibc, or with other libc
>> implementations. No idea about those.
>
> My guess is that a fairly common pattern for larger chunks will be to
> round the size up to a multiple of 4kB, the usual memory page size.

See also this discussion:


https://www.postgresql.org/message-id/flat/CAEepm%3D1bRyd%2B_W9eW-QmP1RGP03ti48zgd%3DK11Q6o4edQLgkcg%40mail.gmail.com#CAEepm=1bRyd+_W9eW-QmP1RGP03ti48zgd=K11Q6o4edQLgkcg@mail.gmail.com

TL;DR glibc doesn't actually round up like that below 128kB, but many
others, including FreeBSD, macOS etc, round up to various page sizes or
size classes, including 8kB (!) and 512 bytes.  I find this a bit
frustrating: because the most popular libc implementation doesn't have
the problem, this kind of thing probably isn't a high priority, but on
most other Unices (and I have no clue for Windows), including my current
favourite, we probably waste a bunch of memory.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: copy.c allocation constant

From: Bruce Momjian

On Thu, Jan 25, 2018 at 09:30:54AM +1300, Thomas Munro wrote:
> On Thu, Jan 25, 2018 at 7:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Wed, Jan 24, 2018 at 12:25 PM, Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> >> At the glibc level ... I'm not so sure. AFAIK glibc uses an allocator
> >> with similar ideas (freelists, ...) so hopefully it's fine too.
> >>
> >> And then there are the systems without glibc, or with other libc
> >> implementations. No idea about those.
> >
> > My guess is that a fairly common pattern for larger chunks will be to
> > round the size up to a multiple of 4kB, the usual memory page size.
> 
> See also this discussion:
> 
>
https://www.postgresql.org/message-id/flat/CAEepm%3D1bRyd%2B_W9eW-QmP1RGP03ti48zgd%3DK11Q6o4edQLgkcg%40mail.gmail.com#CAEepm=1bRyd+_W9eW-QmP1RGP03ti48zgd=K11Q6o4edQLgkcg@mail.gmail.com
> 
> TL;DR glibc doesn't actually round up like that below 128kB, but many
> others including FreeBSD, macOS etc round up to various page sizes or
> size classes including 8kB (!), 512 bytes.  I find this a bit
> frustrating because it means that the most popular libc implementation
> doesn't have the problem so this kind of thing probably isn't a high
> priority, but probably on most other Unices (and I have no clue for
> Windows) including my current favourite we waste a bunch of memory.

The BSD memory allocator used to allocate in powers of two, and keep the
header in a separate location.  They did this so they could combine two
free, identically-sized memory blocks into a single one that was double
the size.  I have no idea how it works now.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: copy.c allocation constant

From: Thomas Munro

On Thu, Jan 25, 2018 at 9:35 AM, Bruce Momjian <bruce@momjian.us> wrote:
> The BSD memory allocator used to allocate in powers of two, and keep the
> header in a separate location.  They did this so they could combine two
> free, identically-sized memory blocks into a single one that was double
> the size.  I have no idea how it works now.

According to "man malloc" on my FreeBSD 11.1 box (which uses jemalloc;
I think NetBSD and OpenBSD use something else), the code Andrew showed
at the top of this thread will waste ~16kB (because it'll round up to
80kB) and nodeHash.c will waste ~8kB for every ~32kB chunk of tuples
as described in that other thread.

       +----------------------------------------------------------+
       |Category   Spacing   Size                                 |
       |Small           lg   [8]                                  |
       |                16   [16, 32, 48, 64, 80, 96, 112, 128]   |
       |                32   [160, 192, 224, 256]                 |
       |                64   [320, 384, 448, 512]                 |
       |               128   [640, 768, 896, 1024]                |
       |               256   [1280, 1536, 1792, 2048]             |
       |               512   [2560, 3072, 3584, 4096]             |
       |             1 KiB   [5 KiB, 6 KiB, 7 KiB, 8 KiB]         |
       |             2 KiB   [10 KiB, 12 KiB, 14 KiB]             |
       |Large        2 KiB   [16 KiB]                             |
       |             4 KiB   [20 KiB, 24 KiB, 28 KiB, 32 KiB]     |
       |             8 KiB   [40 KiB, 48 KiB, 56 KiB, 64 KiB]     |
       |            16 KiB   [80 KiB, 96 KiB, 112 KiB, 128 KiB]   |
       |            32 KiB   [160 KiB, 192 KiB, 224 KiB, 256 KiB] |
       |            64 KiB   [320 KiB, 384 KiB, 448 KiB, 512 KiB] |
       |           128 KiB   [640 KiB, 768 KiB, 896 KiB, 1 MiB]   |
       |           256 KiB   [1280 KiB, 1536 KiB, 1792 KiB]       |
       |Huge       256 KiB   [2 MiB]                              |
       |           512 KiB   [2560 KiB, 3 MiB, 3584 KiB, 4 MiB]   |
       |             1 MiB   [5 MiB, 6 MiB, 7 MiB, 8 MiB]         |
       |             2 MiB   [10 MiB, 12 MiB, 14 MiB, 16 MiB]     |
       |             4 MiB   [20 MiB, 24 MiB, 28 MiB, 32 MiB]     |
       |             8 MiB   [40 MiB, 48 MiB, 56 MiB, 64 MiB]     |
       |               ...   ...                                  |
       |           512 PiB   [2560 PiB, 3 EiB, 3584 PiB, 4 EiB]   |
       |             1 EiB   [5 EiB, 6 EiB, 7 EiB]                |
       +----------------------------------------------------------+
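
Spelling out the ~16kB figure above:

    request:  65536 + 1 bytes (plus whatever header palloc adds)
    class:    80 KiB, the next jemalloc size class above 64 KiB
    wasted:   80 KiB - ~64 KiB  =  ~16 KiB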

-- 
Thomas Munro
http://www.enterprisedb.com


Re: copy.c allocation constant

From: Tom Lane

Bruce Momjian <bruce@momjian.us> writes:
> On Thu, Jan 25, 2018 at 09:30:54AM +1300, Thomas Munro wrote:
>> On Thu, Jan 25, 2018 at 7:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> My guess is that a fairly common pattern for larger chunks will be to
>>> round the size up to a multiple of 4kB, the usual memory page size.
>>
>> See also this discussion:
>>
https://www.postgresql.org/message-id/flat/CAEepm%3D1bRyd%2B_W9eW-QmP1RGP03ti48zgd%3DK11Q6o4edQLgkcg%40mail.gmail.com#CAEepm=1bRyd+_W9eW-QmP1RGP03ti48zgd=K11Q6o4edQLgkcg@mail.gmail.com
>> TL;DR glibc doesn't actually round up like that below 128kB, but many
>> others including FreeBSD, macOS etc round up to various page sizes or
>> size classes including 8kB (!), 512 bytes.  I find this a bit
>> frustrating because it means that the most popular libc implementation
>> doesn't have the problem so this kind of thing probably isn't a high
>> priority, but probably on most other Unices (and I have no clue for
>> Windows) including my current favourite we waste a bunch of memory.

> The BSD memory allocator used to allocate in powers of two, and keep the
> header in a separate location.  They did this so they could combine two
> free, identically-sized memory blocks into a single one that was double
> the size.  I have no idea how it works now.

It seems like there's fairly good reason to suppose that most versions
of malloc are efficient for power-of-2 request sizes.  Our big problem
is the overhead that we add ourselves.  That, however, we could compensate
for by adjusting request sizes in cases where the request is flexible.
I'd definitely support some sort of memory context API to allow such
adjustment, as we speculated about in the thread Thomas points to above.
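
For illustration only, the sort of adjustment being speculated about
might look like this (the name, the overhead constant, and the rounding
strategy are all made up):

    #define ASSUMED_CHUNK_OVERHEAD  32    /* guess at palloc + malloc headers */

    /*
     * For a caller whose buffer size is negotiable: round the request up
     * so that request + overhead comes out as a power of 2.
     */
    static size_t
    flexible_alloc_size(size_t minimum)
    {
        size_t      budget = 4096;

        while (budget < minimum + ASSUMED_CHUNK_OVERHEAD)
            budget *= 2;
        return budget - ASSUMED_CHUNK_OVERHEAD;
    }

Rounding down instead would work just as well for a caller that can live
with a slightly smaller buffer; the point is only that the caller, not
the allocator, knows which sizes are negotiable.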

            regards, tom lane