Thread: Logical Replication Memory Allocation Error - "invalid memory alloc request size"

Hello,

I'm encountering a consistent issue with PostgreSQL 15 logical replication and would appreciate any guidance on debugging or resolving this problem.

Setup:
- Source: PostgreSQL 15.x
- Target: PostgreSQL 15.x
- Replication: Logical replication using publication/subscription (pgoutput)
- Tables: 3 tables (details below)

Table Details:
- Table 1: ~1,300 records, 7 columns, no large objects 
- Table 2: ~100,000 records, 7 columns, no large objects
- Table 3: ~100,000 records, 17 columns, no large objects

Problem:

The initial snapshot and data copy complete successfully for all tables. However, anywhere from 5 minutes to 2 hours after the initial sync, the subscription consistently fails with memory allocation errors like:

```
2025-06-10 14:14:56.800 UTC [299] ERROR: could not receive data from WAL stream: ERROR: invalid memory alloc request size 1238451248
2025-06-10 14:14:56.805 UTC [1] LOG: background worker "logical replication worker" (PID 299) exited with exit code 1
```

This occurs whether I replicate all 3 tables together or individually.

My initial hypothesis is that large transactions are creating WAL segments that exceed memory limits when sent to the subscriber. However, I haven't been able to confirm this or find the actual cause.

Questions:

1. What's the best approach to debug this memory allocation issue?
2. Are there specific PostgreSQL settings I should check?
3. How can I identify if large transactions are indeed the root cause?

Additional Context:
- This happens consistently across multiple replication attempts
- The error size varies but is always requesting > 1GB
- No custom logical replication settings currently applied
- Subscriber machine has 256 GB of RAM and Ubuntu 20.04
- Can recreate it on different machines
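
For reference, the "always > 1GB" observation can be checked against PostgreSQL's cap on a single ordinary memory allocation (`MaxAllocSize`, 1 GB minus 1 byte), which is what produces the "invalid memory alloc request size" message. The following is a minimal sketch; the helper name `alloc_request_size` and the regular expression are assumptions made for illustration and are not part of PostgreSQL.

```python
import re

# PostgreSQL's MaxAllocSize (memutils.h): 1 GB minus 1 byte.
MAX_ALLOC_SIZE = 0x3FFFFFFF

# Hypothetical pattern for this sketch; \s* tolerates a missing space
# before the number, as can happen in line-wrapped mail quotes.
ALLOC_RE = re.compile(r"invalid memory alloc request size\s*(\d+)")


def alloc_request_size(log_line):
    """Return the requested allocation size in bytes, or None if absent."""
    match = ALLOC_RE.search(log_line)
    return int(match.group(1)) if match else None


line = (
    "2025-06-10 14:14:56.800 UTC [299] ERROR: could not receive data from "
    "WAL stream: ERROR: invalid memory alloc request size 1238451248"
)
size = alloc_request_size(line)
over_limit = size is not None and size > MAX_ALLOC_SIZE
print(size, over_limit)
```

Because the limit applies to a single allocation request, the amount of RAM on the machine does not affect whether the request is rejected.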

I should also mention that we're operating in a managed environment on DigitalOcean, which means we don't have direct access to the WAL logs on the publisher node. This is why the log information above is limited. I understand this constraint makes it more difficult to provide help, but I would really appreciate any insights or suggestions you might have.

Thanks,
 
Max

RE: Logical Replication Memory Allocation Error - "invalid memory alloc request size"

From: "Hayato Kuroda (Fujitsu)"
Dear Max,

Thanks for the report.

> The initial snapshot and data copy complete successfully for all tables. However, anywhere from 5
> minutes to 2 hours after the initial sync, the subscription consistently fails with memory allocation errors like:
>
> ```
> 2025-06-10 14:14:56.800 UTC [299] ERROR: could not receive data from WAL stream: ERROR: invalid memory alloc request size 1238451248
> 2025-06-10 14:14:56.805 UTC [1] LOG: background worker "logical replication worker" (PID 299) exited with exit code 1
> ```

I think this is a known PostgreSQL bug, which has also been reported at [1]. We are discussing
how to fix it. Typically it can happen when there are many concurrent transactions
that contain DDL. IIUC there is no good workaround for now: no parameter setting can
avoid the failure. All you can do is reduce the number of such transactions.

I would be happy if you could apply the patch posted at [1] and confirm that it solves the issue, but
that seems difficult because you are in a managed environment.

[1]: https://www.postgresql.org/message-id/CALDaNm0TaTPuza7Fa%2BDRMzL%2BmqK3%2B7RVEvFiRoDJbU2vkJESwg%40mail.gmail.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Hi Hayato, 

Thank you for your reply.

We have rewritten as many of our transactions as possible to avoid using temporary tables, and so far, that seems to have resolved the problem. 
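
As a rough illustration of how such transactions might be located, the sketch below scans SQL text for temporary-table DDL. It is only an assumption about how an audit could be done, not the method actually used here; the helper `uses_temp_table` and the sample statements (including the `staging` and `orders` table names) are hypothetical.

```python
import re

# Matches CREATE [GLOBAL | LOCAL] TEMP[ORARY] TABLE, case-insensitively.
# This sketch does not catch other forms such as SELECT ... INTO TEMP.
TEMP_TABLE_RE = re.compile(
    r"\bCREATE\s+(?:(?:GLOBAL|LOCAL)\s+)?TEMP(?:ORARY)?\s+TABLE\b",
    re.IGNORECASE,
)


def uses_temp_table(sql):
    """Return True if the SQL text creates a temporary table."""
    return TEMP_TABLE_RE.search(sql) is not None


# Hypothetical statements standing in for application SQL.
statements = [
    "CREATE TEMP TABLE staging AS SELECT * FROM orders",
    "INSERT INTO orders SELECT * FROM staging",
    "create temporary table t (id int) on commit drop",
]
flagged = [s for s in statements if uses_temp_table(s)]
print(len(flagged))
```

A text scan like this only finds statements that appear literally in the scanned sources; SQL built dynamically at runtime would need a different approach.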

Thank you for your help. 

Many thanks,
 
Max

On Wed, Jun 11, 2025 at 7:36 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Max,
>
> > We have rewritten as many of our transactions as possible to avoid using
> > temporary tables, and so far, that seems to have resolved the problem.
>
> Good to know. We will try to fix it as soon as possible.
>

I pushed the fix for this issue[1].

Regards,

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d87d07b7ad3b782cb74566cd771ecdb2823adf6a


--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com