MemoryContextAllocHuge(): selectively bypassing MaxAllocSize - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | MemoryContextAllocHuge(): selectively bypassing MaxAllocSize |
Date | |
Msg-id | 20130513142653.GB171500@tornado.leadboat.com |
Responses |
Re: MemoryContextAllocHuge(): selectively bypassing MaxAllocSize (Pavel Stehule <pavel.stehule@gmail.com>)
Re: MemoryContextAllocHuge(): selectively bypassing MaxAllocSize (Stephen Frost <sfrost@snowman.net>)
Re: MemoryContextAllocHuge(): selectively bypassing MaxAllocSize (Simon Riggs <simon@2ndQuadrant.com>)
Re: MemoryContextAllocHuge(): selectively bypassing MaxAllocSize (Jeff Janes <jeff.janes@gmail.com>) |
List | pgsql-hackers |
A memory chunk allocated through the existing palloc.h interfaces is limited to MaxAllocSize (~1 GiB). This is best for most callers; SET_VARSIZE() need not check its own 1 GiB limit, and algorithms that grow a buffer by doubling need not check for overflow. However, a handful of callers are quite happy to navigate those hazards in exchange for the ability to allocate a larger chunk.

This patch introduces MemoryContextAllocHuge() and repalloc_huge() that check a higher MaxAllocHugeSize limit of SIZE_MAX/2. Chunks don't bother recording whether they were allocated as huge; one can start with palloc() and then repalloc_huge() to grow the value. To demonstrate, I put this to use in tuplesort.c; the patch also updates tuplestore.c to keep them similar. Here's the trace_sort from building the pgbench_accounts primary key at scale factor 7500, maintenance_work_mem = '56GB'; memtuples itself consumed 17.2 GiB:

LOG:  internal sort ended, 48603324 KB used: CPU 75.65s/305.46u sec elapsed 391.21 sec

Compare:

LOG:  external sort ended, 1832846 disk blocks used: CPU 77.45s/988.11u sec elapsed 1146.05 sec

This was made easier by tuplesort growth algorithm improvements in commit 8ae35e91807508872cabd3b0e8db35fc78e194ac. The problem has come up before (TODO item "Allow sorts to use more available memory"), and Tom floated the idea[1] behind the approach I've used. The next limit faced by sorts is INT_MAX concurrent tuples in memory, which limits helpful work_mem to about 150 GiB when sorting int4.

I have not added variants like palloc_huge() and palloc0_huge(), and I have not added to the frontend palloc.h interface. There's no particular barrier to doing any of that. I don't expect more than a dozen or so callers, so most of the variations might go unused.

The comment at MaxAllocSize said that aset.c expects doubling the size of an arbitrary allocation to never overflow, but I couldn't find the code in question. AllocSetAlloc() does double sizes of blocks used to aggregate small allocations, so maxBlockSize had better stay under SIZE_MAX/2. Nonetheless, that expectation does apply to dozens of repalloc() users outside aset.c, and I preserved it for repalloc_huge(). 64-bit builds will never notice, and I won't cry for the resulting 2 GiB limit on 32-bit.

Thanks,
nm

[1] http://www.postgresql.org/message-id/19908.1297696263@sss.pgh.pa.us

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com
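
The two limits and the doubling argument in the message above can be illustrated with a small standalone sketch. This is not code from the patch: every `sketch_`/`SKETCH_` name is hypothetical, and the value 0x3fffffff (1 GiB minus 1) is an assumption chosen to match the "~1 GiB" figure quoted for MaxAllocSize. It only shows why capping a request at SIZE_MAX/2 lets a caller double a size without an overflow check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the limits discussed in the message.
 * The regular limit is assumed to be 1 GiB - 1; the huge limit is
 * SIZE_MAX/2, as the message states.
 */
#define SKETCH_MAX_ALLOC_SIZE       ((size_t) 0x3fffffff)
#define SKETCH_MAX_ALLOC_HUGE_SIZE  (SIZE_MAX / 2)

/* Would a regular (palloc-style) request of this size be accepted? */
static bool
sketch_alloc_size_is_valid(size_t size)
{
    return size <= SKETCH_MAX_ALLOC_SIZE;
}

/* Would a huge request of this size be accepted? */
static bool
sketch_alloc_huge_size_is_valid(size_t size)
{
    return size <= SKETCH_MAX_ALLOC_HUGE_SIZE;
}

/*
 * Grow a buffer size by doubling, clamped to "limit".
 * Returns the new size, or 0 if the buffer is already at the limit.
 *
 * Because limit <= SIZE_MAX/2 and cur <= limit, cur * 2 is at most
 * SIZE_MAX - 1, so the multiplication cannot wrap around.
 */
static size_t
sketch_grow_by_doubling(size_t cur, size_t limit)
{
    size_t      next;

    assert(limit <= SIZE_MAX / 2);
    assert(cur > 0 && cur <= limit);

    if (cur == limit)
        return 0;               /* cannot grow any further */

    next = cur * 2;             /* safe: see the comment above */
    return (next > limit) ? limit : next;
}
```

A caller in the style described above would pass the regular limit when growing an ordinary buffer and the huge limit when it is prepared to handle chunks above 1 GiB; the same doubling routine stays overflow-free in both cases.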