Re: Logical Replica ReorderBuffer Size Accounting Issues - Mailing list pgsql-bugs

From Alex Richman
Subject Re: Logical Replica ReorderBuffer Size Accounting Issues
Date
Msg-id CAMnUB3o_=q4FYZ5yqCgy=TQUk=ce49RNr7CJD4EGP_=KuYrzFA@mail.gmail.com
In response to Re: Logical Replica ReorderBuffer Size Accounting Issues  (Alex Richman <alexrichman@onesignal.com>)
Responses Re: Logical Replica ReorderBuffer Size Accounting Issues  (Gilles Darold <gilles@darold.net>)
List pgsql-bugs
Hi all,

Looping back to say we updated to 15.2 and are still seeing this issue, though it is less prevalent.

Thanks,
- Alex.

On Wed, 18 Jan 2023 at 11:16, Alex Richman <alexrichman@onesignal.com> wrote:


On Wed, 18 Jan 2023 at 10:10, Amit Kapila <amit.kapila16@gmail.com> wrote:
Alex,
Do we see this problem with small tuples as well? I see from your
earlier email that tuple size is ~800 bytes in the production
environment. It is possible that after commit 1b0d9aa4 such
problems no longer occur with small tuple sizes, but that commit landed
in PG15, whereas your production environment might be on a prior
release.

Hi Amit,

Our prod environment is also on 15.1, which is where we first saw the issue, so I'm afraid the issue still seems to be present here.

Looping back on the earlier discussion, we applied the malloc patch from [1] ([2]) to a prod server, which also fixes the issue there.  Attached is a graph of the last 48 hours of memory usage; the ~200GB spikes are instances of the LR walsender memory issue.
After the patch is applied (blue mark), baseline memory drops and we no longer see the spikes.  Per-process memory stats corroborate that LR walsender memory is now never more than a few MB RSS per process.

Thanks,
- Alex.
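
For readers who want to check per-process RSS figures like those mentioned above, here is a minimal sketch. It assumes a Linux host where /proc/<pid>/status exposes a "VmRSS: <n> kB" line. It is illustrative only, not the tooling used to produce the stats in this message, and the function names parse_vm_rss_kb and read_rss_kb are hypothetical.

```python
import re
from typing import Optional


def parse_vm_rss_kb(status_text: str) -> Optional[int]:
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status.

    Returns None when no VmRSS line is present (e.g. kernel threads).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid: int) -> Optional[int]:
    """Read the resident set size of a live process (Linux only, assumption)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_kb(f.read())
```

The walsender PIDs to pass to read_rss_kb can be taken from the pid column of the pg_stat_replication view on the publisher.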
