Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date
Msg-id CAA4eK1LqTcH5vYrCUFn=NUHv_9eYrZS6piLA3UUKD4=39xB=fQ@mail.gmail.com
Whole thread Raw
In response to Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
List pgsql-bugs
On Thu, May 29, 2025 at 12:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, May 26, 2025 at 4:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, May 26, 2025 at 2:52 PM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > > If the above hypothesis is true, we need to consider another idea so
> > > > that we can execute invalidation messages in both cases.
> > >
> > > The straightforward fix is to check the change queue as well when the transaction
> > > has invalidation messages. 0003 implemented that. One downside is that traversing
> > > changes can affect performance. Currently we iterates all of changes even a
> > > single REORDER_BUFFER_CHANGE_INVALIDATION. I cannot find better solutions for now.
> > >
> >
> > It can impact the performance for large transactions with fewer
> > invalidations, especially the ones which has spilled changes because
> > it needs to traverse the entire list of changes again at the end.
>
> What if we remember all executed REORDER_BUFFER_CHANGE_INVALIDATION in
> a queue while replaying the transaction so that we can execute them at
> the end in a non-error path, instead of re-traversing the entire list
> of changes to execute the inval messages?
>

The current proposed patch (v4) is also traversing only the required
inval messages, as it has maintained a separate queue for that. So,
what will be the advantage of forming such a queue during the
processing of changes? Are you imagining a local instead of a queue at
ReorderBufferTXN level? I feel we still need at ReorderBufferTXN level
to ensure that we can execute those changes across streaming blocks,
otherwise, the cleanup of such a local queue would be tricky and add
to maintenance effort.

One disadvantage of the approach you suggest is that the changes in
the new queue won't be accounted for in logical_decoding_work_mem
computation, which can be done in the proposed approach, although the
patch hasn't implemented it as of now.

A few comments on v4:
===================
1.
+static void
+ReorderBufferExecuteInvalidationsInQueue(ReorderBuffer *rb,
ReorderBufferTXN *txn)
+{
...
...
+ /* Skip other changes because the transaction was aborted */
+ case REORDER_BUFFER_CHANGE_INSERT:
+ case REORDER_BUFFER_CHANGE_UPDATE:
+ case REORDER_BUFFER_CHANGE_DELETE:
+ case REORDER_BUFFER_CHANGE_MESSAGE:
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT:
+ case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+ case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+ case REORDER_BUFFER_CHANGE_TRUNCATE:
+ break;

This cases should never happen, so we should have either an Assert or
an elog here.

2.
+void
+ReorderBufferAddInvalidationsExtended(ReorderBuffer *rb, TransactionId xid,
+   XLogRecPtr lsn, Size nmsgs,
+   SharedInvalidationMessage *msgs,
+   bool needs_distribute)
{
...
+ * XXX: IIUC This must be done only to the toptxns, but is it right?
+ */
+ if (!needs_distribute && !TransactionIdIsValid(txn->toplevel_xid))
+ {
+ ReorderBufferChange *inval_change;
+
+ /* Duplicate the inval change to queue it */
+ inval_change = ReorderBufferAllocChange(rb);
+ inval_change->action = REORDER_BUFFER_CHANGE_INVALIDATION;

The name needs_distribute seems confusing because when this function
is invoked from SnapBuildDistributeSnapshotAndInval(), the parameter
is passed as false; it should be true in that case, and the meaning of
this parameter in the function should be reversed.

--
With Regards,
Amit Kapila.



pgsql-bugs by date:

Previous
From: John Hutchins
Date:
Subject: Re: [E] Re: BUG #18938: Logical replication failure in 16.9: "invalid memory alloc request size 1372786672"
Next
From: Masahiko Sawada
Date:
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5