I think I reproduced this problem as you suggested (updating the entire table in parallel), and I can reproduce it on both current HEAD and REL_15_1. The memory used in rb->tup_context can reach 350MB on HEAD and 600MB on REL_15_1.
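In case it helps with reproducing the measurement, this is roughly how the rb->tup_context usage can be observed from SQL (a sketch, assuming PG 14+ where pg_log_backend_memory_contexts() is available; the pid value is a placeholder you fill in from the first query, and I believe tup_context shows up under the context name "Tuples" in the logged output):

```sql
-- Find the walsender backend(s) for the test subscription.
SELECT pid, backend_type, application_name
FROM pg_stat_activity
WHERE backend_type = 'walsender';

-- Dump that backend's memory context statistics into the server log;
-- the reorder buffer's tuple context should appear among them.
SELECT pg_log_backend_memory_contexts(<walsender pid>);
```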
Great, thanks for your help in reproducing this.
But there's one more thing I'm not sure about. You mentioned in [2] that pg_stat_replication_slots shows 0 spilled or streamed bytes for all slots. I think this may be due to the timing of querying pg_stat_replication_slots. In ReorderBufferCheckMemoryLimit, after ReorderBufferSerializeTXN is invoked, I could see spill-related statistics in pg_stat_replication_slots even though no memory in rb->tup_context had actually been freed. Could you please help confirm this point if possible?
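For reference, this is roughly the query I used to check those counters (the column names below are from the pg_stat_replication_slots view as of PG 14+):

```sql
-- Spill counters increase when a transaction is serialized to disk;
-- stream counters increase when it is streamed to the subscriber.
SELECT slot_name,
       spill_txns, spill_count, spill_bytes,
       stream_txns, stream_count, stream_bytes
FROM pg_stat_replication_slots;
```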
So on the local reproduction using the test scripts from the last two emails, I do see some streamed bytes on the test slot. In production, however, I still see 0 streamed or spilled bytes, and the walsenders there regularly reach several gigabytes of RSS. I think it is the same root bug, but at a far greater scale in production (millions of tiny updates instead of 16 large ones). I should also note that in production we have ~40 subscriptions/walsenders rather than the 1 in this test reproduction, so there is a lot of extra CPU churning through the work.
Thanks for your continued analysis of the GenerationAlloc/Free behavior - I'm afraid I'm out of my depth there, but let me know if you need any more information on reproducing the issue, testing patches, etc.