Thread: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
From: Amit Kapila
On Mon, May 19, 2025 at 8:08 PM Duncan Sands <duncan.sands@deepbluecap.com> wrote:
>
> PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
> 6.8.0); x86-64
>
> Good morning from DeepBlueCapital. Soon after upgrading to 17.5 from 17.4, we
> started seeing logical replication failures with publisher errors like this:
>
> ERROR: invalid memory alloc request size 1196493216
>
> (the exact size varies). Here is a typical log extract from the publisher:
>
> 2025-05-19 10:30:14 CEST [1348336-465] remote_production_user@blue DEBUG:
> 00000: write FB03/349DEF90 flush FB03/349DEF90 apply FB03/349DEF90 reply_time
> 2025-05-19 10:30:07.467048+02
> 2025-05-19 10:30:14 CEST [1348336-466] remote_production_user@blue LOCATION:
> ProcessStandbyReplyMessage, walsender.c:2431
> 2025-05-19 10:30:14 CEST [1348336-467] remote_production_user@blue DEBUG:
> 00000: skipped replication of an empty transaction with XID: 207637565
> 2025-05-19 10:30:14 CEST [1348336-468] remote_production_user@blue CONTEXT:
> slot "jnb_production", output plugin "pgoutput", in the commit callback,
> associated LSN FB03/349FF938
> 2025-05-19 10:30:14 CEST [1348336-469] remote_production_user@blue LOCATION:
> pgoutput_commit_txn, pgoutput.c:629
> 2025-05-19 10:30:14 CEST [1348336-470] remote_production_user@blue DEBUG:
> 00000: UpdateDecodingStats: updating stats 0x5ae1616c17a8 0 0 0 0 1 0 1 191
> 2025-05-19 10:30:14 CEST [1348336-471] remote_production_user@blue LOCATION:
> UpdateDecodingStats, logical.c:1943
> 2025-05-19 10:30:14 CEST [1348336-472] remote_production_user@blue DEBUG:
> 00000: found top level transaction 207637519, with catalog changes
> 2025-05-19 10:30:14 CEST [1348336-473] remote_production_user@blue LOCATION:
> SnapBuildCommitTxn, snapbuild.c:1150
> 2025-05-19 10:30:14 CEST [1348336-474] remote_production_user@blue DEBUG:
> 00000: adding a new snapshot and invalidations to 207616976 at FB03/34A1AAE0
> 2025-05-19 10:30:14 CEST [1348336-475] remote_production_user@blue LOCATION:
> SnapBuildDistributeSnapshotAndInval, snapbuild.c:915
> 2025-05-19 10:30:14 CEST [1348336-476] remote_production_user@blue ERROR:
> XX000: invalid memory alloc request size 1196493216
>
> If I'm reading it right, things go wrong on the publisher while preparing the
> message, i.e. it's not a subscriber problem.
>

Right, I also think so.

> This particular instance was triggered by a large number of catalog
> invalidations: I dumped what I think is the relevant WAL with "pg_waldump -s
> FB03/34A1AAE0 -p 17/main/ --xid=207637519" and the output was a single long line:
>
...
...
>
> While it is long, it doesn't seem to merit allocating anything like 1GB of
> memory. So I'm guessing that postgres is miscalculating the required size somehow.
>

We fixed a bug in commit 4909b38af0 to distribute invalidations at the
transaction end to avoid data loss in certain cases, which could cause such a
problem. I am wondering, though: even prior to that commit, we would eventually
end up allocating the required memory for a transaction for all the
invalidations because of the repalloc in ReorderBufferAddInvalidations, so why
does it matter with this commit? One possibility is that we now need such
allocations for multiple in-progress transactions. I'll think more about this.
It would be helpful if you could share more details about the workload, or if
possible, a test case or script with which we can reproduce this problem.

--
With Regards,
Amit Kapila.
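As a rough back-of-envelope check (a sketch only; it assumes about 16 bytes per
SharedInvalidationMessage, which is not an exact figure), the reported request
size corresponds to tens of millions of invalidation messages, far more than
the dumped WAL record itself could account for:

    -- Rough sanity check: how many ~16-byte invalidation messages would an
    -- allocation request of 1196493216 bytes hold?  (16 bytes per
    -- SharedInvalidationMessage is an assumption, not an exact figure.)
    SELECT pg_size_pretty(1196493216::bigint) AS request_size,
           1196493216 / 16                    AS approx_messages;

That gap is consistent with the hypothesis above that the messages are being
multiplied across transactions rather than produced by a single one.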
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
From: Amit Kapila
On Wed, May 21, 2025 at 11:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 19, 2025 at 8:08 PM Duncan Sands
> <duncan.sands@deepbluecap.com> wrote:
> >
> > While it is long, it doesn't seem to merit allocating anything like 1GB of
> > memory. So I'm guessing that postgres is miscalculating the required size somehow.
> >
>
> We fixed a bug in commit 4909b38af0 to distribute invalidations at the
> transaction end to avoid data loss in certain cases, which could cause
> such a problem. I am wondering, though: even prior to that commit, we
> would eventually end up allocating the required memory for a
> transaction for all the invalidations because of the repalloc in
> ReorderBufferAddInvalidations, so why does it matter with this commit?
> One possibility is that we now need such allocations for multiple
> in-progress transactions.
>

I think the problem here is that when we are distributing invalidations to a
concurrent transaction, in addition to queuing the invalidations as a change,
we also copy the distributed invalidations along with the original
transaction's invalidations via repalloc in ReorderBufferAddInvalidations. So,
when there are many in-progress transactions, each would try to copy all its
accumulated invalidations to the remaining in-progress transactions. This
could lead to such an increase in allocation request size.

However, after queuing the change, we don't need to copy it along with the
original transaction's invalidations. This is because the copy is only
required when we don't process any changes, in cases like
ReorderBufferForget(). I have analyzed all such cases, and my analysis is as
follows:

ReorderBufferForget()
------------------------------
It is okay not to perform the invalidations that we got from other concurrent
transactions during ReorderBufferForget(). This is because ReorderBufferForget()
executes invalidations when we skip the transaction being decoded, as it is
not from a database of interest. So, we execute them only to invalidate shared
catalogs (see the comment at the caller of ReorderBufferForget()). It is
sufficient to execute such invalidations only in the source transaction,
because the transaction being skipped wouldn't have loaded anything into the
shared catalogs.

ReorderBufferAbort()
-----------------------------
ReorderBufferAbort() processes invalidations when it has already streamed some
changes. Whenever it streamed a change, it would also have processed the
concurrent transactions' invalidation messages that happened before the
statement that led to streaming. That should be sufficient for us. Consider
the following variant of the original case that required the distribution of
invalidations:

1) S1: CREATE TABLE d(data text not null);
2) S1: INSERT INTO d VALUES('d1');
3) S2: BEGIN; INSERT INTO d VALUES('d2');
4) S1: ALTER PUBLICATION pb ADD TABLE d;
5) S2: INSERT INTO unrelated_tab VALUES(1);
6) S2: ROLLBACK;
7) S2: INSERT INTO d VALUES('d3');
8) S1: INSERT INTO d VALUES('d4');

The problem with the sequence is that the insert from 3) could be decoded
*after* 4) in step 5) due to streaming, and that to decode the insert (which
happened before the ALTER) the catalog snapshot and cache state is from
*before* the ALTER PUBLICATION. Because the transaction started in 3) doesn't
modify any catalogs, no invalidations are executed after decoding it. The
result could be that the cache looks like it did at 3), not like after 4).
However, this won't create a problem, because while streaming at 5) we would
execute the invalidations from S1 due to the change added via the
REORDER_BUFFER_CHANGE_INVALIDATION message in ReorderBufferAddInvalidations.

ReorderBufferInvalidate()
--------------------------------
The reasoning is the same as for ReorderBufferForget(), as it executes
invalidations for the same purpose, but with a different function to avoid the
cleanup of the buffer at the end.

XLOG_XACT_INVALIDATIONS
-------------------------------------------
While processing XLOG_XACT_INVALIDATIONS, we don't need the invalidations
accumulated from other xacts, because this is a special case for executing the
invalidations of a particular command (DDL) in a transaction. It won't build
any cache, so it can't create any invalid state.

--
With Regards,
Amit Kapila.
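For anyone wanting to try the eight-step ReorderBufferAbort() sequence above, a
minimal publisher-side setup sketch follows (the names pb, d and unrelated_tab
come from the steps; the subscription name, connection string and row count are
placeholders). Streaming at step 5) only happens once the open S2 transaction
exceeds logical_decoding_work_mem, so keep that at its minimum and make the
"unrelated" insert large:

    -- Publisher-side setup (sketch): keep logical_decoding_work_mem at its
    -- minimum so the open S2 transaction gets streamed at step 5).
    ALTER SYSTEM SET logical_decoding_work_mem = '64kB';
    SELECT pg_reload_conf();

    CREATE TABLE unrelated_tab(id int);
    CREATE PUBLICATION pb;               -- table d is only added at step 4)

    -- Subscriber-side (sketch): streaming must be enabled for the scenario.
    -- CREATE SUBSCRIPTION sub CONNECTION 'host=... dbname=...' PUBLICATION pb
    --     WITH (streaming = on);

    -- Step 5), made large enough to exceed logical_decoding_work_mem:
    -- S2: INSERT INTO unrelated_tab SELECT generate_series(1, 100000);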
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
From: Duncan Sands
Hi Amit and Shlok, thanks for thinking about this issue. We are working on
reproducing it in our test environment. Since it seems likely to be related to
our primary database being very busy, with lots of concurrency and large
transactions, we are starting by creating a streaming replication copy of our
primary server (the copy running 17.5, with the primary on 17.4), with the
idea of then doing logical replication from the standby to see if we hit the
same issue. If so, that gives us something to poke at, and we can move on to
something better from there.

Best wishes, Duncan.

On 21/05/2025 07:48, Amit Kapila wrote:
> On Mon, May 19, 2025 at 8:08 PM Duncan Sands
> <duncan.sands@deepbluecap.com> wrote:
>>
>> PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
>> 6.8.0); x86-64
>>
>> Good morning from DeepBlueCapital. Soon after upgrading to 17.5 from 17.4, we
>> started seeing logical replication failures with publisher errors like this:
>>
>> ERROR: invalid memory alloc request size 1196493216
>>
>> (the exact size varies). Here is a typical log extract from the publisher:
>>
>> 2025-05-19 10:30:14 CEST [1348336-465] remote_production_user@blue DEBUG:
>> 00000: write FB03/349DEF90 flush FB03/349DEF90 apply FB03/349DEF90 reply_time
>> 2025-05-19 10:30:07.467048+02
>> 2025-05-19 10:30:14 CEST [1348336-466] remote_production_user@blue LOCATION:
>> ProcessStandbyReplyMessage, walsender.c:2431
>> 2025-05-19 10:30:14 CEST [1348336-467] remote_production_user@blue DEBUG:
>> 00000: skipped replication of an empty transaction with XID: 207637565
>> 2025-05-19 10:30:14 CEST [1348336-468] remote_production_user@blue CONTEXT:
>> slot "jnb_production", output plugin "pgoutput", in the commit callback,
>> associated LSN FB03/349FF938
>> 2025-05-19 10:30:14 CEST [1348336-469] remote_production_user@blue LOCATION:
>> pgoutput_commit_txn, pgoutput.c:629
>> 2025-05-19 10:30:14 CEST [1348336-470] remote_production_user@blue DEBUG:
>> 00000: UpdateDecodingStats: updating stats 0x5ae1616c17a8 0 0 0 0 1 0 1 191
>> 2025-05-19 10:30:14 CEST [1348336-471] remote_production_user@blue LOCATION:
>> UpdateDecodingStats, logical.c:1943
>> 2025-05-19 10:30:14 CEST [1348336-472] remote_production_user@blue DEBUG:
>> 00000: found top level transaction 207637519, with catalog changes
>> 2025-05-19 10:30:14 CEST [1348336-473] remote_production_user@blue LOCATION:
>> SnapBuildCommitTxn, snapbuild.c:1150
>> 2025-05-19 10:30:14 CEST [1348336-474] remote_production_user@blue DEBUG:
>> 00000: adding a new snapshot and invalidations to 207616976 at FB03/34A1AAE0
>> 2025-05-19 10:30:14 CEST [1348336-475] remote_production_user@blue LOCATION:
>> SnapBuildDistributeSnapshotAndInval, snapbuild.c:915
>> 2025-05-19 10:30:14 CEST [1348336-476] remote_production_user@blue ERROR:
>> XX000: invalid memory alloc request size 1196493216
>>
>> If I'm reading it right, things go wrong on the publisher while preparing the
>> message, i.e. it's not a subscriber problem.
>>
>
> Right, I also think so.
>
>> This particular instance was triggered by a large number of catalog
>> invalidations: I dumped what I think is the relevant WAL with "pg_waldump -s
>> FB03/34A1AAE0 -p 17/main/ --xid=207637519" and the output was a single long line:
>>
> ...
> ...
>>
>> While it is long, it doesn't seem to merit allocating anything like 1GB of
>> memory. So I'm guessing that postgres is miscalculating the required size somehow.
>>
>
> We fixed a bug in commit 4909b38af0 to distribute invalidations at the
> transaction end to avoid data loss in certain cases, which could cause
> such a problem. I am wondering, though: even prior to that commit, we
> would eventually end up allocating the required memory for a
> transaction for all the invalidations because of the repalloc in
> ReorderBufferAddInvalidations, so why does it matter with this commit?
> One possibility is that we now need such allocations for multiple
> in-progress transactions. I'll think more about this. It would be helpful if
> you could share more details about the workload, or if possible, a
> test case or script with which we can reproduce this problem.
>
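For reference, a sketch of the settings such a standby-based test typically
needs (PostgreSQL 16 and later allow logical decoding on a standby; the
subscription name, host and publication below are placeholders):

    -- On the primary (already required for the existing publication):
    -- wal_level = logical

    -- On the standby, so logical slots there are not invalidated by
    -- catalog-row removal on the primary:
    ALTER SYSTEM SET hot_standby_feedback = on;
    SELECT pg_reload_conf();

    -- Then point a test subscription at the standby instead of the primary
    -- (names and connection string are placeholders):
    -- CREATE SUBSCRIPTION jnb_test
    --     CONNECTION 'host=standby-host dbname=blue user=remote_production_user'
    --     PUBLICATION pb;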
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
From: Masahiko Sawada
On Wed, May 21, 2025 at 4:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, May 21, 2025 at 11:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, May 19, 2025 at 8:08 PM Duncan Sands
> > <duncan.sands@deepbluecap.com> wrote:
> > >
> > > While it is long, it doesn't seem to merit allocating anything like 1GB of
> > > memory. So I'm guessing that postgres is miscalculating the required size somehow.
> > >
> >
> > We fixed a bug in commit 4909b38af0 to distribute invalidations at the
> > transaction end to avoid data loss in certain cases, which could cause
> > such a problem. I am wondering, though: even prior to that commit, we
> > would eventually end up allocating the required memory for a
> > transaction for all the invalidations because of the repalloc in
> > ReorderBufferAddInvalidations, so why does it matter with this commit?
> > One possibility is that we now need such allocations for multiple
> > in-progress transactions.
> >
>
> I think the problem here is that when we are distributing
> invalidations to a concurrent transaction, in addition to queuing the
> invalidations as a change, we also copy the distributed invalidations
> along with the original transaction's invalidations via repalloc in
> ReorderBufferAddInvalidations. So, when there are many in-progress
> transactions, each would try to copy all its accumulated invalidations
> to the remaining in-progress transactions. This could lead to such an
> increase in allocation request size.

I agree with this analysis.

> However, after queuing the
> change, we don't need to copy it along with the original transaction's
> invalidations. This is because the copy is only required when we don't
> process any changes, in cases like ReorderBufferForget().

It seems that we use the accumulated invalidation messages also after
replaying or concurrently aborting a transaction via
ReorderBufferExecuteInvalidations(). Do we need to consider such cases too?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
From: Shlok Kyal
On Wed, 21 May 2025 at 17:18, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > I think the problem here is that when we are distributing
> > invalidations to a concurrent transaction, in addition to queuing the
> > invalidations as a change, we also copy the distributed invalidations
> > along with the original transaction's invalidations via repalloc in
> > ReorderBufferAddInvalidations. So, when there are many in-progress
> > transactions, each would try to copy all its accumulated invalidations
> > to the remaining in-progress transactions. This could lead to such an
> > increase in allocation request size. However, after queuing the
> > change, we don't need to copy it along with the original transaction's
> > invalidations. This is because the copy is only required when we don't
> > process any changes, in cases like ReorderBufferForget(). I have
> > analyzed all such cases, and my analysis is as follows:
>
> Based on the analysis, I created a PoC which avoids the repalloc().
> Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval() are
> not added to the list, just queued, so the repalloc can be skipped. Also, the
> function distributes only the messages in the list, so received messages won't
> be sent again.
>
> For now, a patch for PG17 is created for testing purposes. Duncan, can you
> apply it and confirm whether it solves the issue?
>

Hi,

I was able to reproduce the issue with the following test:

1. First, begin 9 concurrent txns (BEGIN; INSERT INTO t1 VALUES(11);).
2. In a 10th concurrent txn, perform 1000 DDLs (ALTER PUBLICATION ADD/DROP TABLE).
3. For each of the 9 concurrent txns, perform:
   i. 1000 DDLs
   ii. COMMIT;
   iii. BEGIN; INSERT INTO t1 VALUES(11);
4. Perform steps 2 and 3 in a loop.

These steps reproduced the error:

2025-05-22 19:03:35.111 JST [63150] sub1 ERROR: invalid memory alloc request size 1555752832
2025-05-22 19:03:35.111 JST [63150] sub1 STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', streaming 'parallel', origin 'any', publication_names '"pub1"')

I have also attached the test script. I also ran the test with Kuroda-san's
patch applied, and it did not reproduce the issue.

Thanks and Regards,
Shlok Kyal
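A sketch of step 2 of the reproducer above in plain SQL (pub1 and t1 follow the
log above; t2 is a hypothetical extra table created beforehand, and the loop
count is arbitrary), showing how a single transaction can accumulate a large
pile of catalog invalidation messages before committing:

    -- One "DDL-heavy" transaction: each ALTER PUBLICATION generates catalog
    -- invalidations, and they all accumulate in this transaction until COMMIT.
    BEGIN;
    DO $$
    BEGIN
        FOR i IN 1..500 LOOP
            EXECUTE 'ALTER PUBLICATION pub1 ADD TABLE t2';
            EXECUTE 'ALTER PUBLICATION pub1 DROP TABLE t2';
        END LOOP;
    END
    $$;
    COMMIT;

    -- Meanwhile, each of the nine other sessions keeps a transaction open:
    -- BEGIN; INSERT INTO t1 VALUES (11);   -- later COMMIT and reopen, per step 3

With ten such sessions interleaving commits while the others stay open, each
commit's accumulated messages get distributed to every still-open transaction,
which matches the growth described earlier in the thread.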
Attachment
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
From: Amit Kapila
On Thu, May 22, 2025 at 6:29 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Amit, Sawada-san,
>
> > Good point. After replaying the transaction, it doesn't matter because
> > we would have already relayed the required invalidations while
> > processing REORDER_BUFFER_CHANGE_INVALIDATION messages. However, for
> > the concurrent abort case it could matter. See my analysis for the same
> > below:
> >
> > Simulation of concurrent abort
> > ------------------------------------------
> > 1) S1: CREATE TABLE d(data text not null);
> > 2) S1: INSERT INTO d VALUES('d1');
> > 3) S2: BEGIN; INSERT INTO d VALUES('d2');
> > 4) S2: INSERT INTO unrelated_tab VALUES(1);
> > 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> > 6) S2: INSERT INTO unrelated_tab VALUES(2);
> > 7) S2: ROLLBACK;
> > 8) S2: INSERT INTO d VALUES('d3');
> > 9) S1: INSERT INTO d VALUES('d4');
> >
> > The problem with the sequence is that the insert from 3) could be
> > decoded *after* 5) in step 6) due to streaming, and that to decode the
> > insert (which happened before the ALTER) the catalog snapshot and
> > cache state is from *before* the ALTER PUBLICATION. Because the
> > transaction started in 3) doesn't actually modify any catalogs, no
> > invalidations are executed after decoding it. Now, assume that while
> > decoding the insert from 4) we detect a concurrent abort; then the
> > distributed invalidations won't be executed, and if we don't have the
> > accumulated messages in txn->invalidations, the invalidation from
> > step 5) won't be performed. The data loss can occur in steps 8 and 9.
> > This is just a theory, so I could be missing something.
>
> I verified whether this is real or not, and succeeded in reproducing it. See
> the appendix for the detailed steps.
>
> > If the above turns out to be a problem, one idea for fixing it is that
> > for the concurrent abort case (both during streaming and for prepared
> > transactions' processing), we still check all the remaining changes
> > and process only the changes related to invalidations. This has to be
> > done before the current txn's changes are freed via
> > ReorderBufferResetTXN->ReorderBufferTruncateTXN.
>
> I roughly implemented that part; PSA the updated version. One concern is
> whether we should consider the case where invalidations cause
> ereport(ERROR). If that happens, the walsender will exit at that time.
>

But in the catch part, we are already executing invalidations:
...
        /* make sure there's no cache pollution */
        ReorderBufferExecuteInvalidations(txn->ninvalidations, txn->invalidations);
...

So, the behaviour should be the same.

--
With Regards,
Amit Kapila.
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
From: Duncan Sands
Dear Hayato Kuroda, thank you so much for working on this problem. Your patch
PG17-0001-Avoid-distributing-invalidation-messages-several-tim.patch solves
the issue for me. Without it I get an invalid memory alloc request error
within about twenty minutes. With your patch, 24 hours have passed with no
errors.

Best wishes, Duncan.

On 21/05/2025 13:48, Hayato Kuroda (Fujitsu) wrote:
> Dear hackers,
>
>> I think the problem here is that when we are distributing
>> invalidations to a concurrent transaction, in addition to queuing the
>> invalidations as a change, we also copy the distributed invalidations
>> along with the original transaction's invalidations via repalloc in
>> ReorderBufferAddInvalidations. So, when there are many in-progress
>> transactions, each would try to copy all its accumulated invalidations
>> to the remaining in-progress transactions. This could lead to such an
>> increase in allocation request size. However, after queuing the
>> change, we don't need to copy it along with the original transaction's
>> invalidations. This is because the copy is only required when we don't
>> process any changes, in cases like ReorderBufferForget(). I have
>> analyzed all such cases, and my analysis is as follows:
>
> Based on the analysis, I created a PoC which avoids the repalloc().
> Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval() are
> not added to the list, just queued, so the repalloc can be skipped. Also, the
> function distributes only the messages in the list, so received messages won't
> be sent again.
>
> For now, a patch for PG17 is created for testing purposes. Duncan, can you
> apply it and confirm whether it solves the issue?
>
> Best regards,
> Hayato Kuroda
> FUJITSU LIMITED
>