Re: Proposal: Limitations of palloc inside checkpointer - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Proposal: Limitations of palloc inside checkpointer
Date
Msg-id c1993b75-a5bc-42fd-bbf1-6f06a1b37107@iki.fi
Whole thread Raw
In response to Re: Proposal: Limitations of palloc inside checkpointer  (Maxim Orlov <orlovmg@gmail.com>)
List pgsql-hackers
On 12/03/2025 13:00, Maxim Orlov wrote:
> On Wed, 12 Mar 2025 at 10:27, Xuneng Zhou <xunengzhou@gmail.com 
> <mailto:xunengzhou@gmail.com>> wrote:
> 
>> The patch itself looks ok to me. I'm curious about the trade-offs
>> between this incremental approach and the alternative of
>> using palloc_extended() with the MCXT_ALLOC_HUGE flag. The approach
>> of splitting the requests into fixed-size slices  avoids OOM
>> failures or process termination by the OOM killer, which is good.

+1

>> However, it does add some overhead with additional lock acquisition/
>> release cycles and memory movement operations via memmove(). The
>> natural question is whether the security justify the cost.

Hmm, if you turned the array into a ring buffer, you could absorb part 
of the queue without the memmove().

With that, I'd actually suggest using a much smaller allocation, maybe 
10000 entries or even less. That should be enough to keep the lock 
acquire/release overhead acceptable.

>> Regarding the slice size of 1 GB,  is this derived from
>> MaxAllocSize limit, or was it chosen for other performance
>> reasons? whether a different size might offer better performance
>> under typical workloads?
> 
> I think 1 GB is derived purely from MaxAllocSize. This "palloc" is a 
> relatively old one, and no one expected the number of requests to exceed 
> 1 GB. Now we have the ability to set the shared_buffers to a huge number 
> (without discussing now whether this makes any real sense), thus this 
> limit for palloc becomes a problem.

Yeah. Frankly even 1 GB seems excessive for this. We set max_requests = 
NBuffers, which makes some sense so that you can fit the worst case 
scenario of quickly evicting all pages from the buffer cache. But even 
then I'd expect the checkpointer to be able to absorb the changes before 
the queue fills up. And we have the compaction logic now too.

So I wonder if we should cap max_requests at, say, 10 million requests now?

CompactCheckpointerRequestQueue() makes a pretty large allocation too, 
BTW, although much smaller than the one in AbsorbSyncRequests(). The 
hash table it builds could grow quite large though.

-- 
Heikki Linnakangas
Neon (https://neon.tech)



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: AIO v2.5
Next
From: Thom Brown
Date:
Subject: Re: In-placre persistance change of a relation