Dear Hayato Kuroda,

Thank you so much for working on this problem. Your patch
PG17-0001-Avoid-distributing-invalidation-messages-several-tim.patch solves the
issue for me. Without it, I get an invalid memory alloc request error within
about twenty minutes. With your patch, 24 hours have passed with no errors.

Best wishes,
Duncan

On 21/05/2025 13:48, Hayato Kuroda (Fujitsu) wrote:
> Dear hackers,
>
>> I think the problem here is that when we are distributing
>> invalidations to a concurrent transaction, in addition to queuing the
>> invalidations as a change, we also copy the distributed invalidations
>> along with the original transaction's invalidations via repalloc in
>> ReorderBufferAddInvalidations. So, when there are many in-progress
>> transactions, each would try to copy all its accumulated invalidations
>> to the remaining in-progress transactions. This could lead to such an
>> increase in allocation request size. However, after queuing the
>> change, we don't need to copy it along with the original transaction's
>> invalidations. This is because the copy is only required when we don't
>> process any changes in cases like ReorderBufferForget(). I have
>> analyzed all such cases, and my analysis is as follows:
>
> Based on the analysis, I created a PoC which avoids the repalloc().
> Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval() are
> no longer added to the list; they are only queued, so the repalloc() can be
> skipped. Also, the function distributes only the messages in the list, so
> received messages won't be sent again.
>
> I have now created a patch for PG17 for testing purposes. Duncan, can you apply
> it and confirm whether it solves the issue?
>
> Best regards,
> Hayato Kuroda
> FUJITSU LIMITED
>
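
For readers following the thread, the growth described in the quoted analysis can be illustrated with a toy model. This is only a sketch under stated assumptions, not PostgreSQL code: the names ToyTxn and toy_max_list_len are hypothetical, messages are counted rather than stored, and transactions are assumed to commit one after another while all later ones are still in progress. Each committing transaction sends its whole list to every in-progress transaction; the flag selects whether receivers also append the received messages to their own list (the behaviour the analysis identifies as the problem) or only queue them (the behaviour the PoC describes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_TXNS 32		/* keeps every counter well inside uint64_t */

/* Hypothetical stand-in for a transaction; it only tracks message counts. */
typedef struct ToyTxn
{
	uint64_t	list_len;	/* entries in the txn's own invalidation list */
	uint64_t	queued;		/* entries received as queued changes */
} ToyTxn;

/*
 * Commit ntxns transactions in order. Every commit distributes the
 * committer's list to all transactions that have not committed yet.
 * Returns the largest list length seen at commit time.
 */
uint64_t
toy_max_list_len(int ntxns, uint32_t msgs_per_txn, bool append_received)
{
	ToyTxn		txns[TOY_MAX_TXNS];
	uint64_t	max_len = 0;

	assert(ntxns > 0 && ntxns <= TOY_MAX_TXNS);

	for (int i = 0; i < ntxns; i++)
	{
		txns[i].list_len = msgs_per_txn;
		txns[i].queued = 0;
	}

	for (int c = 0; c < ntxns; c++)
	{
		uint64_t	sent = txns[c].list_len;

		if (sent > max_len)
			max_len = sent;

		for (int r = c + 1; r < ntxns; r++)
		{
			/* Receivers always get the messages as a queued change. */
			txns[r].queued += sent;

			/* Appending as well means they get re-sent at the next commit. */
			if (append_received)
				txns[r].list_len += sent;
		}
	}

	return max_len;
}
```

In this model, ten transactions holding 100 messages each peak at a list of 51,200 entries when received messages are appended, because the list doubles at every commit, while the queue-only variant stays at 100. The real workload is less regular, but the doubling shows why the request size can grow past the allocation limit so quickly.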