Thread: Re: BUG #18938: Logical replication failure in 16.9: "invalid memory alloc request size 1372786672"
Re: BUG #18938: Logical replication failure in 16.9: "invalid memory alloc request size 1372786672"
From
Masahiko Sawada
Date:
Hi,

On Wed, May 28, 2025 at 5:33 AM PG Bug reporting form <noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 18938
> Logged by: John Hutchins
> Email address: john.hutchins@wicourts.gov
> PostgreSQL version: 16.9
> Operating system: Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-60-generic x8
> Description:
>
> We encountered a critical logical replication failure after upgrading to
> PostgreSQL 16.9 (Ubuntu package 16.9-0ubuntu0.24.04.1) that resolved upon
> downgrading to 16.2. I believe this is very similar to the bug reported in
> this current thread:
> https://www.postgresql.org/message-id/CAD21AoBCn7RR0EYbK%2B1n5UTksc3CVn5AKvxBRSr7zR2eWqTTOw%40mail.gmail.com

Yes, I think the same issue happened. We're discussing the best way to resolve this issue on that thread.

> Environment:
> OS: Ubuntu 24.04.2 LTS (Linux 6.8.0-60-generic)
> Hardware: x86_64
> Compiler: gcc 13.3.0
> Upgraded production instance from 16.8 → 16.9
> Subscribers began failing with error:
> [2025-05-27 13:56:07.860 CDT] ERROR: invalid memory alloc request size
> 1372786672
> Downgrade to 16.2 restored normal replication from point of failure.
> Observations:
> Currently affects 1/72 production instances using logical replication
> Error occurs consistently across all subscribers to affected database,
> regardless of publication (appears to be related to logical decoding)
> No schema changes or unusual load during incident
> Other 16.9 instances remain operational (for now)
> Logged error:
> [2025-05-27 13:56:07.860 CDT] 649724 <logicalrep@xxx.xxx.xx.xxx(42454)
> publishingdb subscribername>
> Versions:
> -- 16.9 (failing):
> PostgreSQL 16.9 (Ubuntu 16.9-0ubuntu0.24.04.1)
> -- 16.2 (working):
> PostgreSQL 16.2 (Ubuntu 16.2-1ubuntu4)
> Suspected area:
> Recent commits to logical decoding in 16.9 (specific changesets may be
> relevant).
> We urgently need assistance diagnosing this, as it blocks patch updates.
> Full server logs and configuration details are available upon request.
> Thank you for your time,

The configuration details would help the investigation.

IIUC this issue could happen especially if there are many concurrent transactions that perform DDLs. Does this match the workload on your server?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
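For readers unfamiliar with this error text: PostgreSQL rejects ordinary memory allocations larger than MaxAllocSize (0x3fffffff bytes, i.e. 1 GiB minus one byte), and the reported request exceeds that limit. The Python sketch below only illustrates the arithmetic; the function name is made up for this example and it says nothing about which code path issued the request.

```python
# Sketch: compare the reported request size against PostgreSQL's
# MaxAllocSize limit for ordinary (non-"huge") allocations.
MAX_ALLOC_SIZE = 0x3FFFFFFF  # 1 GiB - 1 byte


def exceeds_max_alloc(request_size: int) -> bool:
    """Return True if a request of this size is above MaxAllocSize."""
    return request_size > MAX_ALLOC_SIZE


reported = 1372786672  # size taken from the error message in this report

print(exceeds_max_alloc(reported))               # True
print(f"{reported / 2**30:.2f} GiB requested")   # 1.28 GiB requested
```

The request is roughly 1.28 GiB, so it fails the size check regardless of how much memory the host actually has.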
Re: [E] Re: BUG #18938: Logical replication failure in 16.9: "invalid memory alloc request size 1372786672"
From
John Hutchins
Date:
Hi,
Thanks very much for your reply.
> The configuration details would help the investigation.
Here are all configs which may be relevant:
wal_level = logical
max_replication_slots = 20
max_logical_replication_workers = 14
max_wal_senders = 20
wal_keep_size = 4096
max_wal_size = 4GB
min_wal_size = 1GB
wal_sender_timeout = 300s
archive_mode = on
archive_command = 'true'
archive_timeout = 3600
max_worker_processes = 18
max_connections = 1000
shared_buffers = 10GB
maintenance_work_mem = 2GB
work_mem = 26214kB
effective_cache_size = 30GB
effective_io_concurrency = 200
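To compare the size-related settings above on a common scale, they can be converted to bytes. The following Python sketch is illustrative only: the helper name size_to_bytes is hypothetical, and the assumption that a unit-less wal_keep_size is interpreted in megabytes is passed in explicitly by the caller rather than looked up from the server.

```python
import re
from typing import Optional

# PostgreSQL memory units are multiples of 1024.
UNIT_BYTES = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def size_to_bytes(value: str, default_unit: Optional[str] = None) -> int:
    """Convert a setting such as '4GB' or '26214kB' to bytes.

    default_unit is used only when the value has no unit suffix; the caller
    must supply the parameter's base unit (this sketch does not know it).
    """
    match = re.fullmatch(r"\s*(\d+)\s*(B|kB|MB|GB|TB)?\s*", value)
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    unit = unit or default_unit
    if unit is None:
        raise ValueError(f"no unit given for {value!r}")
    return int(number) * UNIT_BYTES[unit]


# Values copied from the configuration listed in this message.
settings = {
    "wal_keep_size": ("4096", "MB"),  # unit-less; MB base unit assumed
    "max_wal_size": ("4GB", None),
    "min_wal_size": ("1GB", None),
    "shared_buffers": ("10GB", None),
    "maintenance_work_mem": ("2GB", None),
    "work_mem": ("26214kB", None),
}

for name, (value, base) in settings.items():
    print(f"{name:22s} {size_to_bytes(value, base):>14,d} bytes")
```

Under that assumption, wal_keep_size and max_wal_size both come out at 4 GiB.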
> IIUC this issue could happen especially if there are many concurrent
> transactions that perform DDLs. Does this match the workload on your
> server?
I'm not sure if our workload would contain "many concurrent transactions that perform DDLs." Our workload includes DDLs to create temporary tables and also "ALTER TABLE ENABLE/DISABLE TRIGGER" statements that are single-threaded, but may run concurrently with other DML operations. So, in our workloads, there is a possibility of overlap between DDL (trigger management and temporary table creation) and DML from other sessions.
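One rough way to gauge how much of a workload is DDL is to classify logged statements by their leading keyword. The Python sketch below is a crude illustration under stated assumptions: the function name is_ddl, the table names, and the sample statements are all hypothetical, and only CREATE, ALTER, and DROP are counted as DDL. It does not measure concurrency, which would need transaction timing from the real server log.

```python
# Sketch: count DDL statements in a list of SQL strings by first keyword.
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP"}


def is_ddl(statement: str) -> bool:
    """Crude check: DDL if the statement's first word is a DDL verb."""
    words = statement.strip().split(None, 1)
    return bool(words) and words[0].upper() in DDL_KEYWORDS


# Hypothetical statements resembling the workload described above.
sample = [
    "CREATE TEMPORARY TABLE scratch (id int)",
    "ALTER TABLE some_table DISABLE TRIGGER ALL",
    "INSERT INTO some_table VALUES (1)",
    "ALTER TABLE some_table ENABLE TRIGGER ALL",
]

ddl_count = sum(is_ddl(s) for s in sample)
print(f"{ddl_count} of {len(sample)} statements are DDL")  # 3 of 4 statements are DDL
```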
Please let me know if you need any additional details.
John Hutchins
Wisconsin Court System DBA