Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 - Mailing list pgsql-bugs
From | Amit Kapila |
---|---|
Subject | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date | |
Msg-id | CAA4eK1JwJw6JOnfDxtGtSRF7kM0LbEVPRmNxWeJa5+wyoG05Xg@mail.gmail.com |
Responses |
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
List | pgsql-bugs |
On Mon, May 19, 2025 at 8:08 PM Duncan Sands <duncan.sands@deepbluecap.com> wrote:
>
> PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
> 6.8.0); x86-64
>
> Good morning from DeepBlueCapital. Soon after upgrading to 17.5 from 17.4, we
> started seeing logical replication failures with publisher errors like this:
>
> ERROR: invalid memory alloc request size 1196493216
>
> (the exact size varies). Here is a typical log extract from the publisher:
>
> 2025-05-19 10:30:14 CEST [1348336-465] remote_production_user@blue DEBUG:
> 00000: write FB03/349DEF90 flush FB03/349DEF90 apply FB03/349DEF90 reply_time
> 2025-05-19 10:30:07.467048+02
> 2025-05-19 10:30:14 CEST [1348336-466] remote_production_user@blue LOCATION:
> ProcessStandbyReplyMessage, walsender.c:2431
> 2025-05-19 10:30:14 CEST [1348336-467] remote_production_user@blue DEBUG:
> 00000: skipped replication of an empty transaction with XID: 207637565
> 2025-05-19 10:30:14 CEST [1348336-468] remote_production_user@blue CONTEXT:
> slot "jnb_production", output plugin "pgoutput", in the commit callback,
> associated LSN FB03/349FF938
> 2025-05-19 10:30:14 CEST [1348336-469] remote_production_user@blue LOCATION:
> pgoutput_commit_txn, pgoutput.c:629
> 2025-05-19 10:30:14 CEST [1348336-470] remote_production_user@blue DEBUG:
> 00000: UpdateDecodingStats: updating stats 0x5ae1616c17a8 0 0 0 0 1 0 1 191
> 2025-05-19 10:30:14 CEST [1348336-471] remote_production_user@blue LOCATION:
> UpdateDecodingStats, logical.c:1943
> 2025-05-19 10:30:14 CEST [1348336-472] remote_production_user@blue DEBUG:
> 00000: found top level transaction 207637519, with catalog changes
> 2025-05-19 10:30:14 CEST [1348336-473] remote_production_user@blue LOCATION:
> SnapBuildCommitTxn, snapbuild.c:1150
> 2025-05-19 10:30:14 CEST [1348336-474] remote_production_user@blue DEBUG:
> 00000: adding a new snapshot and invalidations to 207616976 at FB03/34A1AAE0
> 2025-05-19 10:30:14 CEST [1348336-475] remote_production_user@blue LOCATION:
> SnapBuildDistributeSnapshotAndInval, snapbuild.c:915
> 2025-05-19 10:30:14 CEST [1348336-476] remote_production_user@blue ERROR:
> XX000: invalid memory alloc request size 1196493216
>
> If I'm reading it right, things go wrong on the publisher while preparing the
> message, i.e. it's not a subscriber problem.
>

Right, I also think so.

> This particular instance was triggered by a large number of catalog
> invalidations: I dumped what I think is the relevant WAL with "pg_waldump -s
> FB03/34A1AAE0 -p 17/main/ --xid=207637519" and the output was a single long line:
>
> ... ...
>
> While it is long, it doesn't seem to merit allocating anything like 1GB of
> memory. So I'm guessing that postgres is miscalculating the required size somehow.
>

We fixed a bug in commit 4909b38af0 to distribute invalidations at
transaction end to avoid data loss in certain cases, and that change could
cause such a problem. What puzzles me is that even prior to that commit, we
would eventually end up allocating the memory required for all of a
transaction's invalidations because of the repalloc in
ReorderBufferAddInvalidations, so why does it matter with this commit? One
possibility is that we now need such allocations for multiple in-progress
transactions. I'll think more about this.

It would be helpful if you could share more details about the workload or,
if possible, a test case or script with which we can reproduce this problem.

--
With Regards,
Amit Kapila.
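To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It is not PostgreSQL code. It assumes that the allocator rejects any single request above MaxAllocSize (0x3fffffff, i.e. 1 GB minus one byte) and that one SharedInvalidationMessage occupies 16 bytes on x86-64; both values are my reading of the sources and should be treated as assumptions. The function commits_until_failure is a hypothetical toy model of one in-progress transaction accumulating distributed invalidations linearly, and is not a description of what ReorderBufferAddInvalidations actually does.

```python
# Constants as I understand them from the PostgreSQL sources; both are assumptions.
MAX_ALLOC_SIZE = 0x3FFFFFFF   # MaxAllocSize: 1 GB - 1 byte
INVAL_MSG_SIZE = 16           # assumed sizeof(SharedInvalidationMessage) on x86-64


def alloc_size_is_valid(size: int) -> bool:
    """Approximate the size check applied to ordinary palloc/repalloc requests."""
    return 0 <= size <= MAX_ALLOC_SIZE


def messages_for_request(size: int) -> tuple[int, int]:
    """Return (whole messages, leftover bytes) for a request of `size` bytes."""
    return divmod(size, INVAL_MSG_SIZE)


def commits_until_failure(msgs_per_commit: int) -> int:
    """Toy model (hypothetical): a single in-progress transaction receives
    `msgs_per_commit` invalidation messages from every concurrent
    catalog-changing commit, and its message array is grown each time.
    Return the number of such commits at which the requested array size
    first exceeds MAX_ALLOC_SIZE."""
    bytes_per_commit = msgs_per_commit * INVAL_MSG_SIZE
    return MAX_ALLOC_SIZE // bytes_per_commit + 1


if __name__ == "__main__":
    # The two request sizes reported in this thread.
    for size in (1196493216, 1585837200):
        count, leftover = messages_for_request(size)
        print(size, alloc_size_is_valid(size), count, leftover)
    print(commits_until_failure(1000))
```

Under these assumptions both reported sizes are above the limit and are exact multiples of 16 bytes, which is consistent with (though not proof of) an array of invalidation messages that grew to tens of millions of entries, rather than a miscalculated size for a single message.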