
From: Masahiko Sawada
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date:
Msg-id: CAD21AoDKisp9pGb=8qos8Y4ddDLt62D5=P5usQMG3cm+A+vfOg@mail.gmail.com
In response to: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-bugs
On Thu, Jun 5, 2025 at 8:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jun 6, 2025 at 12:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Jun 5, 2025 at 4:07 AM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > Dear Amit,
> > >
> > > > > ---
> > > > > I'd like to clarify once more in which cases we need to execute
> > > > > txn->invalidations as well as txn->invalidations_distributed (as in
> > > > > ReorderBufferProcessTXN()) and in which cases we need to execute only
> > > > > txn->invalidations (as in ReorderBufferForget() and
> > > > > ReorderBufferAbort()). I think it might be worth adding some comments
> > > > > about the overall strategy somewhere.
> > > > >
> > > > > ---
> > > > > BTW, for back branches, a simple fix without ABI breakage would be to
> > > > > introduce an RBTXN_INVAL_OVERFLOWED flag to limit the size of
> > > > > txn->invalidations. That is, we would accumulate inval messages both
> > > > > coming from the current transaction and distributed by other
> > > > > transactions, but once the size reaches a threshold we would simply
> > > > > invalidate all caches. Is it worth considering for back branches?
> > > > >
> > > >
> > > > It should work and is worth considering. The main concern is that,
> > > > given the recent reports, the overflow threshold could be hit in the
> > > > field sooner than we expect, so such a change has the potential to
> > > > degrade performance. I feel that the number of people impacted by
> > > > the performance hit would be greater than the number impacted by
> > > > such an ABI change (adding the new members at the end of
> > > > ReorderBufferTXN). However, if we want to play it safe w.r.t.
> > > > extensions that may rely on sizeof(ReorderBufferTXN), then your
> > > > proposal makes sense.
> > >
> > > While considering this approach, I found a case I'm not sure about.
> > > Consider the workload below:
> > >
> > > 0. S1: CREATE TABLE d(data text not null);
> > > 1. S1: BEGIN;
> > > 2. S1: INSERT INTO d VALUES ('d1')
> > > 3.                                              S2: BEGIN;
> > > 4.                                              S2: INSERT INTO d VALUES ('d2')
> > > 5. S1: ALTER PUBLICATION pb ADD TABLE d;
> > > 6. S1: ... lots of DDLs so overflow happens
> > > 7. S1: COMMIT;
> > > 8.                                              S2: INSERT INTO d VALUES ('d3');
> > > 9.                                              S2: COMMIT;
> > > 10.                                             S2: INSERT INTO d VALUES ('d4');
> > >
> > > In this case, the inval message generated at step 5 is discarded at step
> > > 6. No invalidation messages are distributed by
> > > SnapBuildDistributeSnapshotAndInval(). While decoding S2's transactions,
> > > the relcache entry is never invalidated, so tuples d3 and d4 won't be
> > > replicated. Do you think this can happen?
> >
> > I think that once S1's inval messages have overflowed, we should
> > mark other transactions as overflowed as well, instead of distributing
> > inval messages to them.
> >
>
> Yeah, this should work, but are you still advocating that we go with
> this approach (marking txn->invalidations as overflowed as well) for
> back branches? In the previous email, you seemed to acknowledge the
> performance impact due to DDLs, so it is not clear which approach you
> prefer.

No, I just wanted to make it clear that this idea is possible. But I
agree with going with the approach of having both invalidations_distributed
and ninvalidations_distributed in ReorderBufferTXN in all branches.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


