MemoryContextAllocHuge(): selectively bypassing MaxAllocSize - Mailing list pgsql-hackers

A memory chunk allocated through the existing palloc.h interfaces is limited
to MaxAllocSize (~1 GiB).  This is best for most callers; SET_VARSIZE() need
not check its own 1 GiB limit, and algorithms that grow a buffer by doubling
need not check for overflow.  However, a handful of callers are quite happy to
navigate those hazards in exchange for the ability to allocate a larger chunk.
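To make the SET_VARSIZE() point concrete, here is a minimal standalone sketch, not backend code. It assumes a 4-byte header that keeps the length in 30 bits and shifts it past two flag bits; pack_len30(), unpack_len30() and roundtrips() are hypothetical names used only for illustration, and MAX_ALLOC_SIZE mirrors MaxAllocSize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors MaxAllocSize: 1 GiB - 1. */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/* Hypothetical 4-byte header: 30 bits of length above 2 flag bits. */
static uint32_t
pack_len30(size_t len)
{
    /* Unsigned shift; any bits above bit 29 of len are silently lost. */
    return ((uint32_t) len) << 2;
}

static size_t
unpack_len30(uint32_t hdr)
{
    return (size_t) (hdr >> 2);
}

/* True when len survives a pack/unpack round trip unchanged. */
static int
roundtrips(size_t len)
{
    return unpack_len30(pack_len30(len)) == len;
}
```

Every length up to MAX_ALLOC_SIZE fits in the 30 bits, so a caller that got its buffer from a capped allocator never needs a separate range check. One byte past the cap already fails the round trip.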

This patch introduces MemoryContextAllocHuge() and repalloc_huge(), which check
against a higher limit, MaxAllocHugeSize = SIZE_MAX/2.  Chunks don't bother recording
whether they were allocated as huge; one can start with palloc() and then
repalloc_huge() to grow the value.  To demonstrate, I put this to use in
tuplesort.c; the patch also updates tuplestore.c to keep them similar.  Here's
the trace_sort from building the pgbench_accounts primary key at scale factor
7500, maintenance_work_mem = '56GB'; memtuples itself consumed 17.2 GiB:

LOG:  internal sort ended, 48603324 KB used: CPU 75.65s/305.46u sec elapsed 391.21 sec

Compare:

LOG:  external sort ended, 1832846 disk blocks used: CPU 77.45s/988.11u sec elapsed 1146.05 sec
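The palloc()-then-repalloc_huge() pattern described above can be modeled outside the backend roughly as follows. This is a sketch under assumptions, not the patch: the model_* functions are hypothetical stand-ins built on malloc()/realloc(), and they return NULL where the real functions would raise an error. Only the two limits are taken from the description.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ALLOC_SIZE      ((size_t) 0x3fffffff)  /* mirrors MaxAllocSize */
#define MAX_ALLOC_HUGE_SIZE (SIZE_MAX / 2)         /* mirrors MaxAllocHugeSize */

static int
alloc_size_is_valid(size_t size)
{
    return size <= MAX_ALLOC_SIZE;
}

static int
alloc_huge_size_is_valid(size_t size)
{
    return size <= MAX_ALLOC_HUGE_SIZE;
}

/* Hypothetical stand-in for palloc(): ordinary limit applies. */
static void *
model_palloc(size_t size)
{
    if (!alloc_size_is_valid(size))
        return NULL;
    return malloc(size ? size : 1);
}

/* Hypothetical stand-in for repalloc(): ordinary limit applies. */
static void *
model_repalloc(void *chunk, size_t size)
{
    if (!alloc_size_is_valid(size))
        return NULL;            /* chunk is left untouched */
    return realloc(chunk, size ? size : 1);
}

/*
 * Hypothetical stand-in for repalloc_huge(): same chunk, higher limit.
 * Nothing in the chunk says how it was first allocated.
 */
static void *
model_repalloc_huge(void *chunk, size_t size)
{
    if (!alloc_huge_size_is_valid(size))
        return NULL;            /* chunk is left untouched */
    return realloc(chunk, size ? size : 1);
}
```

A caller would start with model_palloc() for a small array and switch to model_repalloc_huge() only at the call site that is prepared to handle sizes above 1 GiB; every other caller keeps the ordinary limit and its guarantees.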

This was made easier by tuplesort growth algorithm improvements in commit
8ae35e91807508872cabd3b0e8db35fc78e194ac.  The problem has come up before
(TODO item "Allow sorts to use more available memory"), and Tom floated the
idea[1] behind the approach I've used.  The next limit faced by sorts is
INT_MAX concurrent tuples in memory, which limits helpful work_mem to about
150 GiB when sorting int4.

I have not added variants like palloc_huge() and palloc0_huge(), and I have
not added to the frontend palloc.h interface.  There's no particular barrier
to doing any of that.  I don't expect more than a dozen or so callers, so most
of the variations might go unused.

The comment at MaxAllocSize said that aset.c expects doubling the size of an
arbitrary allocation to never overflow, but I couldn't find the code in
question.  AllocSetAlloc() does double sizes of blocks used to aggregate small
allocations, so maxBlockSize had better stay under SIZE_MAX/2.  Nonetheless,
that expectation does apply to dozens of repalloc() users outside aset.c, and
I preserved it for repalloc_huge().  64-bit builds will never notice, and I
won't cry for the resulting 2 GiB limit on 32-bit.
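The reason a SIZE_MAX/2 cap keeps doubling safe can be shown with a small standalone sketch. next_capacity() is a hypothetical helper, not something from the patch, and clamping to the limit is just one possible policy for the last growth step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ALLOC_HUGE_SIZE (SIZE_MAX / 2)  /* mirrors MaxAllocHugeSize */

/*
 * Hypothetical growth helper: double the current size, clamped to the
 * huge limit.  Returns 0 when the buffer is already at the limit.
 */
static size_t
next_capacity(size_t cur)
{
    assert(cur > 0 && cur <= MAX_ALLOC_HUGE_SIZE);

    /*
     * SIZE_MAX is odd, so cur <= SIZE_MAX/2 implies cur * 2 <= SIZE_MAX - 1.
     * The multiplication therefore cannot wrap around.
     */
    size_t      doubled = cur * 2;

    if (cur == MAX_ALLOC_HUGE_SIZE)
        return 0;               /* no room left to grow */
    if (doubled > MAX_ALLOC_HUGE_SIZE)
        return MAX_ALLOC_HUGE_SIZE;
    return doubled;
}
```

With a cap any higher than SIZE_MAX/2, the unchecked `cur * 2` could wrap to a small value, which is the hazard existing repalloc() callers are allowed to ignore. On a 32-bit build the same cap works out to roughly 2 GiB.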

Thanks,
nm

[1] http://www.postgresql.org/message-id/19908.1297696263@sss.pgh.pa.us

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com
