Re: MemoryContextAllocHuge(): selectively bypassing MaxAllocSize - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: MemoryContextAllocHuge(): selectively bypassing MaxAllocSize
Date
Msg-id CA+U5nMK7F2MzbJ2jyNhGX=VxNcxwkHYKLZ0WdiU4Eqpp4=BXhg@mail.gmail.com
In response to MemoryContextAllocHuge(): selectively bypassing MaxAllocSize  (Noah Misch <noah@leadboat.com>)
Responses Re: MemoryContextAllocHuge(): selectively bypassing MaxAllocSize
List pgsql-hackers
On 13 May 2013 15:26, Noah Misch <noah@leadboat.com> wrote:
> A memory chunk allocated through the existing palloc.h interfaces is limited
> to MaxAllocSize (~1 GiB).  This is best for most callers; SET_VARSIZE() need
> not check its own 1 GiB limit, and algorithms that grow a buffer by doubling
> need not check for overflow.  However, a handful of callers are quite happy to
> navigate those hazards in exchange for the ability to allocate a larger chunk.
>
> This patch introduces MemoryContextAllocHuge() and repalloc_huge() that check
> a higher MaxAllocHugeSize limit of SIZE_MAX/2.  Chunks don't bother recording
> whether they were allocated as huge; one can start with palloc() and then
> repalloc_huge() to grow the value.

I like the design and think it's workable.
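
For anyone following along, I'd expect callers to look roughly like
this (my sketch against the interface as described above, untested;
variable names invented):

    /*
     * Sketch only: start with an ordinary palloc() and later grow the
     * same chunk past the 1 GiB MaxAllocSize via repalloc_huge(), which
     * checks against MaxAllocHugeSize (SIZE_MAX/2) instead.
     */
    Size    nbytes = 1024 * 1024;           /* ordinary-sized start */
    char   *buf = (char *) palloc(nbytes);

    /* ... later, when more space is needed ... */
    nbytes = (Size) 2 * 1024 * 1024 * 1024; /* beyond MaxAllocSize */
    buf = (char *) repalloc_huge(buf, nbytes);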

I'm concerned that people will accidentally use MaxAllocSize. Can we
put in a runtime warning if someone tests AllocSizeIsValid() with a
larger value?
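
Something like this, perhaps (just a sketch, name invented, not
tested):

    /*
     * Hypothetical helper: warn when a size check would only pass
     * through the huge allocation interfaces.
     */
    static inline bool
    AllocSizeIsValidWarn(Size size)
    {
        if (size > MaxAllocSize && size <= MaxAllocHugeSize)
            elog(WARNING, "size " UINT64_FORMAT " exceeds MaxAllocSize; "
                 "huge allocation interface intended?", (uint64) size);
        return AllocSizeIsValid(size);
    }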

>  To demonstrate, I put this to use in
> tuplesort.c; the patch also updates tuplestore.c to keep them similar.  Here's
> the trace_sort from building the pgbench_accounts primary key at scale factor
> 7500, maintenance_work_mem = '56GB'; memtuples itself consumed 17.2 GiB:
>
> LOG:  internal sort ended, 48603324 KB used: CPU 75.65s/305.46u sec elapsed 391.21 sec
>
> Compare:
>
> LOG:  external sort ended, 1832846 disk blocks used: CPU 77.45s/988.11u sec elapsed 1146.05 sec

Cool.

I'd like to put in an explicit test for this somewhere. Obviously not
as part of the normal regression tests, but somewhere, at least, so we
have automated testing that we all agree on. (Yes, I know we don't have
that for replication/recovery yet, but that's exactly why I don't want
to repeat that mistake.)

> This was made easier by tuplesort growth algorithm improvements in commit
> 8ae35e91807508872cabd3b0e8db35fc78e194ac.  The problem has come up before
> (TODO item "Allow sorts to use more available memory"), and Tom floated the
> idea[1] behind the approach I've used.  The next limit faced by sorts is
> INT_MAX concurrent tuples in memory, which limits helpful work_mem to about
> 150 GiB when sorting int4.
>
> I have not added variants like palloc_huge() and palloc0_huge(), and I have
> not added to the frontend palloc.h interface.  There's no particular barrier
> to doing any of that.  I don't expect more than a dozen or so callers, so most
> of the variations might go unused.
>
> The comment at MaxAllocSize said that aset.c expects doubling the size of an
> arbitrary allocation to never overflow, but I couldn't find the code in
> question.  AllocSetAlloc() does double sizes of blocks used to aggregate small
> allocations, so maxBlockSize had better stay under SIZE_MAX/2.  Nonetheless,
> that expectation does apply to dozens of repalloc() users outside aset.c, and
> I preserved it for repalloc_huge().  64-bit builds will never notice, and I
> won't cry for the resulting 2 GiB limit on 32-bit.

Agreed. Can we document this for the relevant parameters?
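
For the archives, the doubling idiom in question is roughly this
(sketch only):

    /*
     * Safe without an explicit overflow check only because chunk sizes
     * are capped at SIZE_MAX/2, so doubling a valid size cannot wrap.
     */
    if (used + needed > allocated)
    {
        while (allocated < used + needed)
            allocated *= 2;
        buf = repalloc_huge(buf, allocated);
    }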

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


