Alex, do we see this problem with small tuples as well? I see from your earlier email that the tuple size is ~800 bytes in the production environment. It is possible that after commit 1b0d9aa4 such problems no longer occur with small tuple sizes, but that commit landed in PG15, whereas your production environment might be on a prior release.
Hi Amit,
Our prod environment is also on 15.1, which is where we first saw the issue, so I'm afraid the issue still seems to be present here.
Looping back on the earlier discussion, we applied the malloc patch from [1] ([2]) to a prod server, and it fixes the issue there as well. Attached is a graph of the last 48 hours of memory usage; the ~200GB spikes are instances of the LR walsender memory issue.
After the patch is applied (blue mark), baseline memory drops and we no longer see the spikes. Per-process memory stats corroborate that the LR walsender memory is now never more than a few MB RSS per process.
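
For anyone who wants to run a similar per-process check, here is a minimal sketch. It assumes Linux, that the walsender backends carry "walsender" in their process title as exposed in /proc/<pid>/cmdline, and that RSS is read from the VmRSS field of /proc/<pid>/status. The helper names parse_vmrss_kb and walsender_rss_kb are hypothetical, and this is not necessarily the tooling behind the numbers above.

```python
import os
import re


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def walsender_rss_kb(proc_root="/proc"):
    """Map pid -> RSS in kB for processes whose title mentions 'walsender'.

    Assumption: the process title is visible in /proc/<pid>/cmdline.
    """
    results = {}
    if not os.path.isdir(proc_root):
        return results
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
            if "walsender" not in cmdline:
                continue
            with open(os.path.join(proc_root, entry, "status")) as f:
                rss = parse_vmrss_kb(f.read())
        except OSError:
            continue  # process exited, or we lack permission to read it
        if rss is not None:
            results[int(entry)] = rss
    return results


if __name__ == "__main__":
    for pid, rss in sorted(walsender_rss_kb().items()):
        print(f"{pid}\t{rss / 1024:.1f} MB")
```

Running this periodically (for example from cron) and logging the output would give a per-walsender RSS time series to compare before and after the patch.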