Re: PG18 GIN parallel index build crash - invalid memory alloc request size - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PG18 GIN parallel index build crash - invalid memory alloc request size
Msg-id a502d863-382b-41bb-8d61-ad2ee9cd5a50@vondra.me
In response to Re: PG18 GIN parallel index build crash - invalid memory alloc request size  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
On 10/29/25 01:05, Tomas Vondra wrote:
> ...
> Yeah, I definitely want to protect against this. I believe similar
> failures can happen even with much lower m_w_m values (possibly ~2-3GB),
> although only with weird/skewed data sets. AFAICS a constant
> single-element array would trigger this, but I haven't tested that.
> 
> Serial builds can fail with large maintenance_work_mem too, like this:
> 
>   ERROR: posting list is too long
>   HINT: Reduce "maintenance_work_mem".
> 
> but it's deterministic, and it's actually a proper error message, not
> just some weird "invalid alloc size".
> 
> Attached is a v3 of the patch series. 0001 and 0002 were already posted,
> and I believe either of those would address the issue. 0003 is more of
> an optimization, further reducing the memory usage.
> 
> I'm putting this through additional testing, which takes time. But it
> seems there's still some loose end in 0001, as I just got the "invalid
> alloc request" failure with it applied ... I'll take a look tomorrow.
> 

Unsurprisingly, there were a couple more palloc/repalloc calls (in
ginPostingListDecodeAllSegments) that could fail with long TID lists
produced when merging worker data. The attached v4 fixes this.
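To put rough numbers on "long TID lists": a regular palloc/repalloc is capped at MaxAllocSize (1GB - 1), and a heap TID (ItemPointerData) is 6 bytes. The sketch below is standalone C, not code from the patch. The SKETCH_* macros and both helper functions are hypothetical names that only mirror those two constants, to show how many uncompressed TIDs fit in one regular allocation before the "invalid memory alloc request size" error appears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors PostgreSQL's MaxAllocSize (0x3fffffff, 1GB - 1). */
#define SKETCH_MAX_ALLOC_SIZE ((uint64_t) 0x3fffffff)

/* Assumption: mirrors sizeof(ItemPointerData), 4-byte block + 2-byte offset. */
#define SKETCH_TID_SIZE ((uint64_t) 6)

/* Bytes needed for an uncompressed array of ntids TIDs. */
static uint64_t
tid_array_bytes(uint64_t ntids)
{
    /* Guard against overflow in the multiplication below. */
    assert(ntids <= UINT64_MAX / SKETCH_TID_SIZE);
    return ntids * SKETCH_TID_SIZE;
}

/* Would such an array fit in a single regular (non-huge) allocation? */
static bool
tid_array_fits(uint64_t ntids)
{
    return tid_array_bytes(ntids) <= SKETCH_MAX_ALLOC_SIZE;
}

/* Largest number of TIDs a single regular allocation can hold. */
static uint64_t
max_tids_per_alloc(void)
{
    return SKETCH_MAX_ALLOC_SIZE / SKETCH_TID_SIZE;
}
```

So a single key whose merged TID list grows past roughly 179 million entries cannot be decoded into one regular allocation, which is consistent with the failure showing up only with large m_w_m or heavily skewed data.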

However, I see this as a sign that allowing huge allocations is not the
right way to fix this. The GIN code generally assumes allocations stay
under the regular palloc limit (MaxAllocSize), and reworking that in a
bugfix seems too invasive. And I'm not really certain this is the last
place that could hit this.

Another argument against 0001 is that using more memory does not really help
anything. It's not any faster or simpler. It's more like "let's use the
memory we have" rather than "let's use the memory we need".

So I'm planning to get rid of 0001, and fix this with 0002 or 0002+0003.
That seems like a better and (unexpectedly) less invasive fix.


regards

-- 
Tomas Vondra