Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 - Mailing list pgsql-bugs

From Shlok Kyal
Subject Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date
Msg-id CANhcyEWp_T7tX-yKbdbxdUR144UAZ7oxNM_AORfCvWHZg0ja5w@mail.gmail.com
In response to Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5  (Duncan Sands <duncan.sands@deepbluecap.com>)
List pgsql-bugs
On Mon, 19 May 2025 at 20:08, Duncan Sands <duncan.sands@deepbluecap.com> wrote:
>
> PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
> 6.8.0); x86-64
>
> Good morning from DeepBlueCapital.  Soon after upgrading to 17.5 from 17.4, we
> started seeing logical replication failures with publisher errors like this:
>
>    ERROR:  invalid memory alloc request size 1196493216
>
> (the exact size varies).  Here is a typical log extract from the publisher:
>
> 2025-05-19 10:30:14 CEST [1348336-465] remote_production_user@blue DEBUG:
> 00000: write FB03/349DEF90 flush FB03/349DEF90 apply FB03/349DEF90 reply_time
> 2025-05-19 10:30:07.467048+02
> 2025-05-19 10:30:14 CEST [1348336-466] remote_production_user@blue LOCATION:
>   ProcessStandbyReplyMessage, walsender.c:2431
> 2025-05-19 10:30:14 CEST [1348336-467] remote_production_user@blue DEBUG:
> 00000: skipped replication of an empty transaction with XID: 207637565
> 2025-05-19 10:30:14 CEST [1348336-468] remote_production_user@blue CONTEXT:
> slot "jnb_production", output plugin "pgoutput", in the commit callback,
> associated LSN FB03/349FF938
> 2025-05-19 10:30:14 CEST [1348336-469] remote_production_user@blue LOCATION:
>   pgoutput_commit_txn, pgoutput.c:629
> 2025-05-19 10:30:14 CEST [1348336-470] remote_production_user@blue DEBUG:
> 00000: UpdateDecodingStats: updating stats 0x5ae1616c17a8 0 0 0 0 1 0 1 191
> 2025-05-19 10:30:14 CEST [1348336-471] remote_production_user@blue LOCATION:
>   UpdateDecodingStats, logical.c:1943
> 2025-05-19 10:30:14 CEST [1348336-472] remote_production_user@blue DEBUG:
> 00000: found top level transaction 207637519, with catalog changes
> 2025-05-19 10:30:14 CEST [1348336-473] remote_production_user@blue LOCATION:
>   SnapBuildCommitTxn, snapbuild.c:1150
> 2025-05-19 10:30:14 CEST [1348336-474] remote_production_user@blue DEBUG:
> 00000: adding a new snapshot and invalidations to 207616976 at FB03/34A1AAE0
> 2025-05-19 10:30:14 CEST [1348336-475] remote_production_user@blue LOCATION:
>   SnapBuildDistributeSnapshotAndInval, snapbuild.c:915
> 2025-05-19 10:30:14 CEST [1348336-476] remote_production_user@blue ERROR:
> XX000: invalid memory alloc request size 1196493216
>
> If I'm reading it right, things go wrong on the publisher while preparing the
> message, i.e. it's not a subscriber problem.
>
> This particular instance was triggered by a large number of catalog
> invalidations: I dumped what I think is the relevant WAL with "pg_waldump -s
> FB03/34A1AAE0 -p 17/main/ --xid=207637519" and the output was a single long line:
>
> rmgr: Transaction len (rec/tot):  10665/ 10665, tx:  207637519, lsn:
> FB03/34A1AAE0, prev FB03/34A1A8C8, desc: COMMIT 2025-05-19 08:10:12.880599 CEST;
> dropped stats: 2/17426/661557718 2/17426/661557717 2/17426/661557714
> 2/17426/661557678 2/17426/661557677 2/17426/661557674 2/17426/661557673
> 2/17426/661557672 2/17426/661557669 2/17426/661557618 2/17426/661557617
> 2/17426/661557614; inval msgs: catcache 80 catcache 79 catcache 80 catcache 79
> catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55
> catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache 55
> catcache 54 catcache 55 catcache 54 catcache 55 catcache 54 catcache 80 catcache
> 79 catcache 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 32 catcache 55 catcache 54 catcache 55 catcache 54 catcache
> 55 catcache 54 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
> catcache 63 catcache 63 catcache 55 catcache 54 catcache 80 catcache 79 catcache
> 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache
> 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 32 catcache 55 catcache 54 catcache 55 catcache 54 catcache 55 catcache
> 54 catcache 80 catcache 79 catcache 80 catcache 79 catcache 55 catcache 54
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55
> catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache 55 catcache 54
> catcache 55 catcache 54 catcache 55 catcache 54 catcache 63 catcache 63 catcache
> 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
> catcache 63 catcache 63 catcache 55 catcache 54 catcache 32 catcache 7 catcache
> 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 catcache 80 catcache 79 catcache 80 catcache
> 79 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
> catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 7 catcache
> 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 32
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 80 catcache 79
> catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 catcache 32 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 55 catcache 54 catcache 80 catcache 79 catcache 80 catcache 79 catcache
> 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
> catcache 32 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 80
> catcache 79 catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 snapshot 2608 relcache 661557614 snapshot
> 1214 relcache 661557617 relcache 661557618 relcache 661557617 snapshot 2608
> relcache 661557617 relcache 661557618 relcache 661557614 snapshot 2608 snapshot
> 2608 relcache 661557669 snapshot 1214 relcache 661557672 relcache 661557673
> relcache 661557672 snapshot 2608 relcache 661557672 relcache 661557673 relcache
> 661557669 snapshot 2608 relcache 661557669 snapshot 2608 relcache 661557674
> snapshot 1214 relcache 661557677 relcache 661557678 relcache 661557677 snapshot
> 2608 relcache 661557677 relcache 661557678 relcache 661557674 snapshot 2608
> snapshot 2608 relcache 661557714 snapshot 1214 relcache 661557717 relcache
> 661557718 relcache 661557717 snapshot 2608 relcache 661557717 relcache 661557718
> relcache 661557714 snapshot 2608 relcache 661557714 relcache 661557718 relcache
> 661557717 snapshot 2608 relcache 661557717 snapshot 2608 snapshot 2608 snapshot
> 2608 relcache 661557714 snapshot 2608 snapshot 1214 relcache 661557678 relcache
> 661557677 snapshot 2608 relcache 661557677 snapshot 2608 snapshot 2608 snapshot
> 2608 relcache 661557674 snapshot 2608 snapshot 1214 relcache 661557673 relcache
> 661557672 snapshot 2608 relcache 661557672 snapshot 2608 snapshot 2608 snapshot
> 2608 relcache 661557669 snapshot 2608 snapshot 1214 relcache 661557618 relcache
> 661557617 snapshot 2608 relcache 661557617 snapshot 2608 snapshot 2608 snapshot
> 2608 relcache 661557614 snapshot 2608 snapshot 1214
>
> While it is long, it doesn't seem to merit allocating anything like 1GB of
> memory.  So I'm guessing that postgres is miscalculating the required size somehow.
>
> If I skip over this LSN, for example by dropping the subscription and recreating
> it anew, then things go fine for a while before hitting another "invalid memory
> alloc request", i.e. it wasn't just a one-off.  On the other hand, after
> downgrading to 17.4, subscribers spontaneously recovered and the issue has gone
> away.  Since I didn't skip over the last LSN of this kind, presumably 17.4
> successfully serialized a message for the same problematic bit of WAL that
> caused 17.5 to blow up, which suggests a regression between 17.4 and 17.5.
>
Hi Duncan,

Thanks for reporting this.
I tried adding roughly 80000 invalidations but could not reproduce the issue.
Can you share the steps to reproduce the above scenario?

Thanks and Regards,
Shlok Kyal
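
For reference, the numbers in the report can be sanity-checked with a small sketch. It is not part of the original exchange and makes some assumptions: the limit used is PostgreSQL's MaxAllocSize (0x3fffffff, i.e. 1 GB - 1), above which an ordinary palloc fails with "invalid memory alloc request size"; the 16 bytes per invalidation message is an assumed size used only for a rough estimate; and the helper names exceeds_alloc_limit and count_inval_msgs are hypothetical. The counter only recognises the three message kinds that appear in the pg_waldump output quoted above.

```python
import re

# PostgreSQL's MaxAllocSize: ordinary palloc requests above this are rejected.
MAX_ALLOC_SIZE = 0x3FFFFFFF  # 1 GB - 1

# Assumption for a rough estimate only: bytes per invalidation message.
ASSUMED_MSG_BYTES = 16


def exceeds_alloc_limit(request_size: int) -> bool:
    """Return True if a request of this size would be rejected as too large."""
    return request_size > MAX_ALLOC_SIZE


def count_inval_msgs(desc: str) -> int:
    """Count catcache/relcache/snapshot entries after 'inval msgs:' in a
    pg_waldump COMMIT description. Line wrapping and '> ' quote prefixes do
    not matter, because only the kind keywords are counted."""
    _, found, tail = desc.partition("inval msgs:")
    if not found:
        return 0
    return len(re.findall(r"\b(?:catcache|relcache|snapshot)\b", tail))


if __name__ == "__main__":
    reported = 1196493216  # size from the publisher log above
    print("rejected by the size check:", exceeds_alloc_limit(reported))
    print("messages implied at the assumed size:", reported // ASSUMED_MSG_BYTES)
```

Feeding the quoted COMMIT description to count_inval_msgs gives the number of messages actually present in that one WAL record, which can then be compared with the count implied by the failed request size.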


