Re: MemoryContextAllocHuge(): selectively bypassing MaxAllocSize

From: Jeff Janes
Subject: Re: MemoryContextAllocHuge(): selectively bypassing MaxAllocSize
Msg-id: CAMkU=1zVD82voXw1vBG1kWcz5c2G=SupGohPKM0ThwmpRK1Ddw@mail.gmail.com
In response to: MemoryContextAllocHuge(): selectively bypassing MaxAllocSize (Noah Misch <noah@leadboat.com>)
Responses: Re: MemoryContextAllocHuge(): selectively bypassing MaxAllocSize (Noah Misch <noah@leadboat.com>)
List: pgsql-hackers
On Mon, May 13, 2013 at 7:26 AM, Noah Misch <noah@leadboat.com> wrote:
> A memory chunk allocated through the existing palloc.h interfaces is limited
> to MaxAllocSize (~1 GiB).  This is best for most callers; SET_VARSIZE() need
> not check its own 1 GiB limit, and algorithms that grow a buffer by doubling
> need not check for overflow.  However, a handful of callers are quite happy
> to navigate those hazards in exchange for the ability to allocate a larger
> chunk.

> This patch introduces MemoryContextAllocHuge() and repalloc_huge() that check
> a higher MaxAllocHugeSize limit of SIZE_MAX/2.  Chunks don't bother recording
> whether they were allocated as huge; one can start with palloc() and then
> repalloc_huge() to grow the value.
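
For concreteness, here is a minimal sketch of that growth pattern as I
understand it; need_more() is just a placeholder for whatever tells the
caller the buffer is full:

    #include "postgres.h"

    /* Start with an ordinary palloc() chunk and grow it with
     * repalloc_huge(), which permits sizes past MaxAllocSize.
     * need_more() is hypothetical, not part of the patch. */
    static char *
    grow_buffer(void)
    {
        Size    cap = 1024;
        char   *buf = palloc(cap);

        while (need_more(buf, cap))
        {
            cap *= 2;                       /* may exceed 1 GiB ...         */
            buf = repalloc_huge(buf, cap);  /* ... so grow via the huge API */
        }
        return buf;
    }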

Since a chunk doesn't record whether it was allocated as huge, I assume that never using it as a varlena is enforced only by coder discipline and not by the system?

> !  * represented in a varlena header.  Callers that never use the allocation as
> !  * a varlena can access the higher limit with MemoryContextAllocHuge().  Both
> !  * limits permit code to assume that it may compute (in size_t math) twice an
> !  * allocation's size without overflow.
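
My reading of that comment (my own illustration, not text from the patch):
because both limits are at most SIZE_MAX/2, the usual doubling step can stay
unchecked:

    /* A chunk size never exceeds MaxAllocHugeSize, so doubling it in
     * size_t arithmetic cannot wrap around.  (The patch may spell the
     * definition differently.) */
    #define MaxAllocHugeSize    ((Size) -1 >> 1)    /* SIZE_MAX / 2 */

    Size newsize = oldsize * 2;     /* safe: oldsize <= MaxAllocHugeSize */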

What is likely to happen if I accidentally let a pointer to huge memory escape to someone who then passes it to a varlena constructor without my knowing it?  (I tried sabotaging the code to make this happen, but I could not figure out how to.)  Is there a place we can put an Assert to catch this mistake under enable-cassert builds?
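
For instance, something along these lines might do it; this is purely
hypothetical, not anything in the patch, and SET_VARSIZE_SAFE is a made-up
name:

    /* Hypothetical cassert-build guard: refuse to stamp a varlena header
     * with a length that cannot be represented in one. */
    #define SET_VARSIZE_SAFE(ptr, len) \
        do { \
            Assert((Size) (len) <= MaxAllocSize); \
            SET_VARSIZE(ptr, len); \
        } while (0)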

I have not yet done a detailed code review, but the patch applies and builds cleanly, passes make check with and without enable-cassert, does what it says (and gives performance improvements when it does kick in), and we want this.  No doc changes should be needed; we probably don't want to run an automatic regression test of the size needed to usefully exercise this, and as far as I know there is no infrastructure for "big memory only" tests.

The only danger I can think of is that it could sometimes make some sorts slower, as using more memory than necessary can sometimes slow down an "external" sort (because the heap is then too big for the fastest CPU cache).  If the extra memory buys you more tapes, but not enough of them to reduce the number of merge passes, you can get a slowdown.
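
To put rough numbers on the tapes-versus-passes trade-off (my own
back-of-the-envelope model, not anything from the patch):

    #include <math.h>

    /* An external merge sort with "runs" initial runs and a merge fan-in
     * of "fanin" tapes needs about ceil(log_fanin(runs)) merge passes. */
    static int
    merge_passes(double runs, double fanin)
    {
        return (runs <= 1.0) ? 0 : (int) ceil(log(runs) / log(fanin));
    }

With 5 initial runs, for example, raising the fan-in from 6 to 8 still yields
merge_passes() == 1, so the extra memory saves no I/O while enlarging the heap.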

I can't imagine that it would make things worse on average, though, as the benefit of doing more sorts as quicksorts rather than merge sorts, or doing a merge sort with fewer passes, would outweigh sometimes doing a slower merge sort.  If someone has a pathological use pattern for which the averages don't work out favorably, they can probably play with work_mem to correct the problem.  Without the patch, people who want more memory have no options at all.

People have mentioned additional things that could be done in this area, but I don't think that applying this patch will make those things harder, or back us into a corner.  Taking an incremental approach seems suitable.

Cheers,

Jeff
