
From: Thomas Munro
Subject: Re: BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker
Date:
Msg-id: CA+hUKGJvi2V294+r3LEDAV+DU_YD2J_79ZVGzKMkAKb3W_mw6w@mail.gmail.com
In response to: Re: BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-bugs
On Mon, Mar 4, 2024 at 12:49 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> On 3/3/24 23:12, Thomas Munro wrote:
> > But the real long-term question is what strategy
> > we're going to use to actually deal with this situation properly
> > *without* giving up our memory usage policies and hoping for the best,
> > and that remains an open question.  To summarise the two main ideas
> > put forward so far: (1) allow a very high number of batches, but process
> > at most N of M batches at a time, using a temporary "all-the-rest"
> > batch to be re-partitioned to feed the next N batches + the rest in a
> > later cycle, (2) fall back to looping over batches multiple times in
> > order to keep nbatches <= a small limit while also not exceeding a
> > hash table size limit.
>
> I'm not sure about (2), but (1) sounds very much like a recursive hash
> join, where we'd allow only a limited fan out at each stage. It's also a
> bit like the "spill" files in my other proposal - that worked fine, and
> it enforced the memory limit better than the memory balancing (which is
> more like best-effort heuristics).

Yeah, I was trying to describe your spill idea in a few words.  It's a good idea.
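
For concreteness, here is a toy sketch of how the routing in idea (1)
might look: only a small window of batch files is open at once, and
everything outside the window goes to a single "all-the-rest" file
that a later pass re-reads and splits the same way.  This is not
PostgreSQL code; all names in it are invented for illustration.

#include <stdint.h>
#include <stdio.h>

#define MAX_OPEN 4              /* batch files processed per pass */

/*
 * Route a tuple during the pass that owns batch numbers
 * [pass_base, pass_base + MAX_OPEN).  nbatch is a power of two, so
 * masking with (nbatch - 1) selects the low-order hash bits, the
 * conventional way of assigning hash join batches.  Returns an index
 * into the open batch files, or -1 for the "rest" file.
 */
static int
route_tuple(uint32_t hashvalue, int pass_base, int nbatch)
{
    int         batchno = (int) (hashvalue & (uint32_t) (nbatch - 1));

    if (batchno >= pass_base && batchno < pass_base + MAX_OPEN)
        return batchno - pass_base;
    return -1;
}

int
main(void)
{
    printf("%d\n", route_tuple(0x25, 0, 16));   /* batch 5: rest file, -1 */
    printf("%d\n", route_tuple(0x22, 0, 16));   /* batch 2: open file 2 */
    return 0;
}

Idea (2), as described above, would instead cap nbatch and process an
oversized batch in multiple passes: load as much of its inner side as
fits in the hash table, scan that batch's outer side, then repeat
with the next chunk, paying with repeated scans rather than more
files.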

> > Both have some tricky edge cases, especially
> > with parallelism in the picture but probably even without it.  I'm
> > willing to work more on exploring this some time after the 17 cycle.
>
> I haven't thought very much about parallel hash joins, but what would be
> the corner cases for non-parallel cases? Sure, it may increase the
> amount of I/O, but only for cases with unexpectedly many batches (and
> then it's just a natural trade-off I/O vs. enforcing memory limit).

For parallel-specific problems I'd have to swap that back into my
brain... but I do remember one general problem with idea #1: if you
have too many duplicate keys, then anything based only on partitioning by
hash bits won't help, which was one of my objections to that idea and
the reason why I was talking about idea #2, which doesn't suffer from
that problem (but has other problems).  This morning I've been
wondering about a highly targeted solution to that: could we recognise
this condition cheaply, and then activate a special case that spills a
*bucket* to disk?  That is, a problem bucket that is indivisible by
hashing because all the hash values are the same.  Or something like
that.
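
A minimal sketch of what that recognition could test, assuming a
conventional bucket of chained tuples that each store their full hash
value (all names invented, not PostgreSQL code): doubling nbatch
splits tuples by consuming one more hash bit, so a bucket whose
tuples all carry the same hash value can never be split that way.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct BucketTuple
{
    struct BucketTuple *next;   /* chain within the bucket */
    uint32_t    hash;           /* full hash of the join key */
} BucketTuple;

/*
 * True if every tuple in the chain carries the same hash value,
 * i.e. no number of extra hash bits will ever divide this bucket.
 * O(n) in the chain length, so a caller would presumably run it only
 * once a bucket looks suspiciously long under memory pressure.
 */
static bool
bucket_is_indivisible(const BucketTuple *head)
{
    const BucketTuple *t;

    for (t = head; t != NULL; t = t->next)
        if (t->hash != head->hash)
            return false;
    return true;
}

int
main(void)
{
    BucketTuple b = {NULL, 0xdeadbeef};
    BucketTuple a = {&b, 0xdeadbeef};

    printf("%s\n", bucket_is_indivisible(&a) ? "indivisible" : "splittable");
    return 0;
}

A bucket that passes this test could then have its overflow written
to a dedicated per-bucket spill file instead of triggering another
round of nbatch doubling that cannot help.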


