
From Thomas Munro
Subject Re: BUG #15225: [XX000] ERROR: invalid DSA memory alloc request size 1073741824 / Where: parallel worker
Date
Msg-id CAEepm=1nemj1wGLsnNrVqBLmqV9N6+6WToo=prGxAXafPYiTrA@mail.gmail.com
In response to Re: BUG #15225: [XX000] ERROR: invalid DSA memory alloc request size 1073741824 / Where: parallel worker  (Frits Jalvingh <jal@etc.to>)
Responses Re: BUG #15225: [XX000] ERROR: invalid DSA memory alloc request size 1073741824 / Where: parallel worker
List pgsql-bugs
On Tue, Jun 5, 2018 at 11:08 PM, Frits Jalvingh <jal@etc.to> wrote:
> Hi Thomas, thanks for your help.
>
> The problem seems to be not very consistent. Without any changes to either
> database or query I now get:
> org.postgresql.util.PSQLException: ERROR: could not resize shared memory
> segment "/PostgreSQL.347733883" to 2147483648 bytes: No space left on device

That's what it looks like when the OS just can't allocate any more
virtual memory: posix_fallocate() on a shared memory segment failed
with ENOSPC ("No space left on device").
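For what it's worth, here's a minimal standalone sketch of that
failure mode (illustration only, not PostgreSQL source; the segment
name "/demo.segment" is hypothetical):

    #include <errno.h>
    #include <fcntl.h>      /* posix_fallocate() */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>   /* shm_open(), shm_unlink() */
    #include <unistd.h>

    int main(void)
    {
        int fd = shm_open("/demo.segment", O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("shm_open"); return 1; }

        /* Ask the kernel to back 2GB, as in the reported error.  On a
           machine without enough shared memory this returns ENOSPC. */
        int rc = posix_fallocate(fd, 0, 2147483648LL);
        if (rc != 0)
            fprintf(stderr, "could not resize shared memory segment: %s\n",
                    strerror(rc));  /* ENOSPC => "No space left on device" */

        shm_unlink("/demo.segment");
        close(fd);
        return 0;
    }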
The solution to that problem is to turn down work_mem.  Under our
current memory management model, every backend may use up to work_mem
for every sort or hash join node in its plan.  So when you set
work_mem = 2GB and max_parallel_workers = 2 (+ 1 leader process = 3
backends) and joined N tables together (a very hand-wavy approximation
of the number of memory-hungry nodes), you said it was OK to use a
total of 2GB * (N - 1) * 3.  That's apparently too much for this rig.
Does it work if you turn work_mem down?
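To make that arithmetic concrete, here is a toy calculation under
assumed values (5 joined tables; the 2GB / 2-worker settings from this
thread; not PostgreSQL source):

    #include <stdio.h>

    int main(void)
    {
        const long long work_mem = 2LL * 1024 * 1024 * 1024; /* 2GB per node per backend */
        const int backends = 3;   /* 2 parallel workers + 1 leader */
        const int n_tables = 5;   /* hypothetical N */

        /* ~one memory-hungry node (sort/hash) per join: N - 1 of them */
        long long worst_case = work_mem * (n_tables - 1) * backends;
        printf("worst case: %lld bytes = %.0f GB\n",
               worst_case, worst_case / (1024.0 * 1024 * 1024));
        return 0;
    }

Even a modest five-table join can, in the worst case, ask this
machine for about 24GB under those settings.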

> Just executing the statement WITHOUT the explain does work properly, so it
> is the explain part that seems to trigger the issue.

That is weird.  I don't have a theory to explain that yet.

> I set enable_parallel_hash =  off in the code and now the explain part works
> properly. The plan XML looks like this:

>           <Node-Type>Sort</Node-Type>
>           <Plan-Rows>1462</Plan-Rows>
>           <Actual-Rows>184162324</Actual-Rows>

>               <Node-Type>Hash Join</Node-Type>
>               <Plan-Rows>1462</Plan-Rows>
>               <Actual-Rows>184162324</Actual-Rows>

>                               <Node-Type>Hash</Node-Type>
>                               <Plan-Rows>47373</Plan-Rows>
>                               <Actual-Rows>731320569</Actual-Rows>

Thanks.  Hmm.  I didn't try to understand this plan in detail, but it
is clearly underestimating cardinality dramatically (1462 rows planned
vs. 184 million actual at the join, 47 thousand planned vs. 731
million actual under the Hash node), and actual row counts that large
would produce an nbuckets of about the right order to explain the
failing allocation.

If the first error you reported results from an attempt to allocate a
very large bucket array (rather than, say, a monster tuple >= 1GB), as
I speculated, then there would need to be a flaw in the logic that
clamps nbuckets to fit in MaxAllocSize.  I will do some digging today.
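To illustrate the kind of flaw I mean, here is a speculative sketch
(simplified; not the actual PostgreSQL code): MaxAllocSize is 1GB - 1
byte, so at 8 bytes per bucket pointer at most 134217727 buckets fit
in a single allocation.  If the clamp is applied first and the result
is then rounded up to the next power of two, we get 2^27 = 134217728
buckets, whose array is exactly 1073741824 bytes, i.e. one byte over
the limit and exactly the size in the first error message:

    #include <stddef.h>
    #include <stdio.h>

    #define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)  /* 1GB - 1, as in PostgreSQL */

    /* round up to the next power of two */
    static size_t
    next_pow2(size_t n)
    {
        size_t p = 1;
        while (p < n)
            p <<= 1;
        return p;
    }

    int main(void)
    {
        size_t ntuples = 731320569;     /* actual rows under the Hash node */
        size_t max_pointers = MAX_ALLOC_SIZE / sizeof(void *);  /* 134217727 */
        size_t nbuckets = ntuples < max_pointers ? ntuples : max_pointers;

        /* suspect order of operations: clamp, then round up -> overshoot */
        nbuckets = next_pow2(nbuckets);                       /* 2^27 */
        printf("requested: %zu bytes\n", nbuckets * sizeof(void *));

        /* one safe alternative: shift back down until the array fits */
        while (nbuckets * sizeof(void *) > MAX_ALLOC_SIZE)
            nbuckets >>= 1;
        printf("clamped:   %zu bytes\n", nbuckets * sizeof(void *));
        return 0;
    }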

-- 
Thomas Munro
http://www.enterprisedb.com

