Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 - Mailing list pgsql-bugs
From | Amit Kapila |
---|---|
Subject | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date | |
Msg-id | CAA4eK1JQ2UQEbKdWUa0ozh3NdJFMJXfAKRW-vFrC3JqKxmx1FQ@mail.gmail.com Whole thread Raw |
In response to | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-bugs |
On Fri, May 30, 2025 at 11:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Thank you for updating the patch. Here are some review comments: > > --- > + /* > + * Make sure there's no cache pollution. Unlike the PG_TRY part, > + * this must be done unconditionally because the processing might > + * fail before we reach invalidation messages. > + */ > + if (rbtxn_inval_all_cache(txn)) > + InvalidateSystemCaches(); > + else > + ReorderBufferExecuteInvalidations(txn->ninvalidations_distr, > + > txn->distributed_invalidations); > + > > If we don't need to execute the distributed inval message in an error > path other than detecting concurrent abort, we should describe the > reason. > The concurrent abort handling is done for streaming and prepared transactions, where we send the transaction changes to the client before we read COMMIT/COMMIT PREPARED/ROLLBACK/ROLLBACK PREPARED. Now, among these COMMIT/ROLLBACK PREPARED cases are handled as part of a prepared transaction case. For ROLLBACK, we will never perform any changes from the current transaction, so we don't need distributed invalidations to be executed. For COMMIT, if we encounter any errors while processing changes (this is when we reach the ERROR path, which is not a concurrent abort), then we will reprocess all changes and, at the end, execute both the current transaction and distributed invalidations. Now, one possibility is that if, after ERROR, the caller does slot_advance to skip the ERROR, then we will probably miss executing the distributed invalidations, leading to data loss afterwards. If the above theory is correct, then it is better to execute distributed invalidation even in non-concurrent-abort cases in the ERROR path. > --- > Given that we don't account the memory usage of both > txn->invalidations and txn->distributed_invalidations, probably we can > have a lower limit, say 8MB (or lower?), to avoid memory exhaustion. > Are you thinking if there are many waslenders in the system, and we kept the limit higher, say 256 MB, then all will start consuming that much memory, leading to memory exhaustion? If so, then I agree with your point to keep this limit as 8MB, we can probably explain the same in comments as well. > --- > + if ((for_inval && !AllocSizeIsValid(req_mem_size)) || > + rbtxn_inval_all_cache(txn)) > { > - txn->ninvalidations = nmsgs; > - txn->invalidations = (SharedInvalidationMessage *) > - palloc(sizeof(SharedInvalidationMessage) * nmsgs); > - memcpy(txn->invalidations, msgs, > - sizeof(SharedInvalidationMessage) * nmsgs); > + txn->txn_flags |= RBTXN_INVAL_ALL_CACHE; > + > + if (*invalidations) > + { > + pfree(*invalidations); > + *invalidations = NULL; > + *ninvalidations = 0; > + } > > RBTXN_INVAL_ALL_CACHE seems to have an effect only on the distributed > inval messages. One question is do we need to care about the overflow > of txn->invalidations as well? > I don't think so. This has been working for a long time, so unless we see a case where the overflow can happen, it is better not to change it. > If no, does it make sense to have a > separate function like ReorderBufferAddDistributedInvalidtions() > instead of having an extended version of > ReorderBufferAddInvalidations()? Some common routines can also be > declared as a static function if needed. > +1. -- With Regards, Amit Kapila.
pgsql-bugs by date: