Thread: BUG #18514: Encountering an error invalid DSA memory alloc request size 1811939328 when executing script

The following bug has been logged on the website:

Bug reference:      18514
Logged by:          iwebcas
Email address:      iwebcas@gmail.com
PostgreSQL version: 16.3
Operating system:   Debian 8.3.0-6 (PostgreSQL 16.3)
Description:

Issue Report: Invalid DSA Memory Allocation Request Size

Summary:
Encountering an error invalid DSA memory alloc request size 1811939328 when
executing a prepared statement after multiple runs. The error occurs due to
a different execution plan being chosen on the 6th attempt.

Description:
I have a prepared query that executes successfully using a specific
execution plan for the first five attempts. However, on the 6th execution, a
different plan is chosen, resulting in an error and the query failing.

Steps to Reproduce:

Prepare the query:

sql
PREPARE testQuery (timestamp) AS
SELECT
    "SuDocumentOneC"."Id" AS "Id",
    "SuDocumentBase"."Id" AS "SuDocumentBase.Id"
FROM
    "public"."SuDocumentOneC" "SuDocumentOneC"
LEFT OUTER JOIN "public"."SuDocumentOneC" "SuDocumentBase"
    ON ("SuDocumentBase"."Id" = "SuDocumentOneC"."SuDocumentBaseId")
WHERE
    ("SuDocumentOneC"."ModifiedOn" < $1)
ORDER BY
    "Id" ASC NULLS FIRST;
Execute the query multiple times:

sql
Копировать код
EXPLAIN (ANALYZE, BUFFERS) EXECUTE testQuery ('2024-06-05 10:24:39');
EXPLAIN (BUFFERS) EXECUTE testQuery ('2024-06-05 10:24:39');

The query runs correctly the first five times, using the following plan:
sql
Gather Merge  (cost=25656.39..47885.28 rows=190520 width=32)
  Workers Planned: 2
  ->  Sort  (cost=24656.37..24894.52 rows=95260 width=32)
        Sort Key: "SuDocumentOneC"."Id" NULLS FIRST
        ->  Parallel Hash Left Join  (cost=4811.69..15441.97 rows=95260
width=32)
              Hash Cond: ("SuDocumentOneC"."SuDocumentBaseId" =
"SuDocumentBase"."Id")
              ->  Parallel Seq Scan on "SuDocumentOneC"  (cost=0.00..8608.22
rows=95260 width=32)
                    Filter: ("ModifiedOn" < '2024-06-05 10:24:39'::timestamp
without time zone)
              ->  Parallel Hash  (cost=3147.46..3147.46 rows=95698
width=16)
                    ->  Parallel Index Only Scan using
"PKfh5fLg2Fg2F9m8kIjOIQ2e4Rc" on "SuDocumentOneC" "SuDocumentBase"
(cost=0.42..3147.46 rows=95698 width=16)
Planning:
  Buffers: shared hit=581
  
On the 6th attempt, a different execution plan is used, resulting in the
error:
sql
Gather Merge  (cost=12352.45..19796.07 rows=63798 width=32)
  Workers Planned: 2
  ->  Sort  (cost=11352.43..11432.18 rows=31899 width=32)
        Sort Key: "SuDocumentOneC"."Id" NULLS FIRST
        ->  Parallel Hash Right Join  (cost=4945.70..8517.24 rows=31899
width=32)
              Hash Cond: ("SuDocumentBase"."Id" =
"SuDocumentOneC"."SuDocumentBaseId")
              ->  Parallel Index Only Scan using
"PKfh5fLg2Fg2F9m8kIjOIQ2e4Rc" on "SuDocumentOneC" "SuDocumentBase"
(cost=0.42..3147.46 rows=95698 width=16)
              ->  Parallel Hash  (cost=4546.54..4546.54 rows=31899
width=32)
                    ->  Parallel Index Scan using
"IX_SuDocumentOneC_ModifiedOn_V2" on "SuDocumentOneC"  (cost=0.42..4546.54
rows=31899 width=32)
                          Index Cond: ("ModifiedOn" < $1)
Planning:
  Buffers: shared hit=34


ERROR:  invalid DSA memory alloc request size 1811939328
CONTEXT:  parallel worker
SQL state: XX000
System Details:

PostgreSQL version: "PostgreSQL 16.3 (Debian 16.3-1.pgdg100+1) on
x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit"
Version on which the error is not reproduced: "PostgreSQL 16.2 (Debian
16.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6)
8.3.0, 64-bit"
Request for Help:
Could you please assist in understanding why the execution plan changes and
how to resolve the error invalid DSA memory alloc request size 1811939328?
Any guidance or recommendations would be greatly appreciated.


On Mon, Jun 17, 2024 at 1:14 PM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      18514
Logged by:          iwebcas
Email address:      iwebcas@gmail.com
PostgreSQL version: 16.3
Operating system:   Debian 8.3.0-6 (PostgreSQL 16.3)
Description:       

Issue Report: Invalid DSA Memory Allocation Request Size

Summary:
Encountering an error invalid DSA memory alloc request size 1811939328 when
executing a prepared statement after multiple runs. The error occurs due to
a different execution plan being chosen on the 6th attempt.
[...]
Version on which the error is not reproduced: "PostgreSQL 16.2 (Debian
16.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6)
8.3.0, 64-bit"

If this is indeed a regression from 16.2 odds are good you will need to work around the bug.  Namely by ensuring that a generic plan for that query is never chosen.  See: plan_cache_mode


David J.

PG Bug reporting form <noreply@postgresql.org> writes:
> Encountering an error invalid DSA memory alloc request size 1811939328 when
> executing a prepared statement after multiple runs. The error occurs due to
> a different execution plan being chosen on the 6th attempt.

Could you submit a self-contained reproducer script for this?  The
query alone is not enough information; we'd have to guess at your
table definition and contents.  (Note that dummy data might be
enough to provoke it, so you can probably anonymize or remove
anything sensitive.)

> Could you please assist in understanding why the execution plan changes

Probably as a result of switching from custom to generic plan.
There's not anything exciting about that, but we would like to
get to the bottom of the DSA allocation failure.

            regards, tom lane



On Tue, Jun 18, 2024 at 8:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Probably as a result of switching from custom to generic plan.
> There's not anything exciting about that, but we would like to
> get to the bottom of the DSA allocation failure.

Parallel Hash Right Join is new in 16 (commit 11c2d6fd, whose message
unfortunately focuses on JOIN_FULL but also covered JOIN_RIGHT).  I
suspect that the allocation failure might be coming from trying to
allocate a huge array of batches in
ExecParallelHashJoinSetUpBatches(), so large that it exceeds the 1GB
allocation cap.  I don't think it's the bucket array, because we avoid
exceeding the cap there.  The only other thing it could be is a
massive tuple, which seems unlikely.  If that's right, then the
question is: what is it about JOIN_RIGHT that is producing very large
nbatch?

There was a similar report[1] on -hackers.  One of the links to
depesz's EXPLAIN viewer shows Parallel Hash Right Join, and the report
includes very high numbers of batches.  Hmm.

[1] https://www.postgresql.org/message-id/flat/CAG4TxrizOVnkYx1v1a7rv6G3t4fMoZP6vbZn3yPLgjHrg5ETbw%40mail.gmail.com



On Tue, Jun 18, 2024 at 8:23 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> If this is indeed a regression from 16.2 odds are good you will need to work around the bug.  Namely by ensuring that
ageneric plan for that query is never chosen.  See: plan_cache_mode 

Or enable_parallel_hash=off.



On Tue, Jun 18, 2024 at 2:20 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Or enable_parallel_hash=off.

And if that allows the query to run, and it still uses the same basic
query plan with a Hash Right Join there, I wonder if it also uses a
huge number of batches, and if it does but manages to succeed, I
wonder if that could be just due to a difference in size of the
per-batch objects?  I vaguely recall that there was a proposal to cap
nbatch appropriately for that allocation, but the cap would be
different for serial and parallel cases due to the element sizes.