Re: Logical replication - ERROR: could not send data to WAL stream: cannot allocate memory for input buffer - Mailing list pgsql-general

From Aleš Zelený
Subject Re: Logical replication - ERROR: could not send data to WAL stream: cannot allocate memory for input buffer
Date
Msg-id CAODqTUYzC5xgQQBscEcgNe_pH3bCjc9-VaNVJuMedm8+yhxdvw@mail.gmail.com
In response to Re: Logical replication - ERROR: could not send data to WAL stream: cannot allocate memory for input buffer (Michael Paquier <michael@paquier.xyz>)
List pgsql-general
Thanks for the comment.

From what I was able to monitor, memory usage was almost stable, and about 20 GB was allocated as cached memory. Memory overcommit is disabled on the database server. Might it be a memory issue, since it was synchronizing newly added tables with a total of 380 GB of data containing JSONB columns (60 bytes to 100 kB each)? The problem is that I was not able to reproduce it: in the dev environment it works like a charm, and, as usual, we faced this issue only on PROD.
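
Since overcommit is disabled here, one thing that may be worth sampling alongside plain memory usage is the kernel's commit accounting. The sketch below assumes a Linux host and that "overcommit disabled" means `vm.overcommit_memory=2`; in that mode an allocation can be refused once `Committed_AS` would exceed `CommitLimit`, even while a lot of RAM is still used as page cache. The function names are hypothetical helpers, not part of PostgreSQL or any library.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: numeric value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(fields):
    """Return CommitLimit - Committed_AS in kB.

    Assumption: with vm.overcommit_memory=2 this is roughly how much more
    memory processes can still commit before allocations start failing.
    """
    return fields["CommitLimit"] - fields["Committed_AS"]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("CommitLimit  kB:", info["CommitLimit"])
    print("Committed_AS kB:", info["Committed_AS"])
    print("Headroom     kB:", commit_headroom_kb(info))
```

Logging the headroom at the same 10-second interval as the existing monitoring would show whether the commit limit, rather than physical memory, was close to exhausted when the error appeared.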

It is clear that a test case would be appropriate for memory allocation issues, but I was not able to build a reproducible one.

Thanks, Ales

On Mon, Jun 8, 2020 at 8:41, Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Jun 05, 2020 at 10:57:46PM +0200, Aleš Zelený wrote:
> we have been using logical replication for more than 2 years and today I
> found a new, previously unseen error message from the WAL receiver. The replication was in
> catchup mode (on publisher side some new tables were created and added to
> publication, on subscriber side they were missing).

This comes from pqCheckInBufferSpace() in libpq when realloc() fails,
most probably because this host ran out of memory.
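
To illustrate why a failed buffer enlargement surfaces as this error, here is a simplified Python model of a grow-by-doubling input buffer. It is only a sketch of the general strategy, not the libpq source: the function name and the 16 kB starting size are assumptions made for illustration.

```python
def next_buffer_size(current_size, bytes_needed):
    """Return the size a doubling strategy would request.

    The buffer is doubled until it can hold bytes_needed. The real
    allocation (realloc() in C) may still fail for the requested size,
    which is the point where an out-of-memory error would be reported.
    """
    if current_size <= 0:
        raise ValueError("current_size must be positive")
    new_size = current_size
    while new_size < bytes_needed:
        new_size *= 2
    return new_size


if __name__ == "__main__":
    # Hypothetical 16 kB starting buffer and a 100 kB incoming message.
    print(next_buffer_size(16 * 1024, 100_000))
```

With doubling, the requested size can be close to twice what is actually needed, so a single large message can ask for noticeably more memory than its own length.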

> Repeated several times, finally it proceeded and switched into streaming
> state. The OS has 64GB RAM; OS + database instance usually use 20GB, the
> rest is used as OS buffers. I've checked monitoring (sampled every 10
> seconds) and no memory usage peak was visible, so unless it was a very
> short memory usage peak, I'd not expect the system running out of memory.
>
> Is there something I can do to diagnose and avoid this issue?

Does the memory usage increase slowly over time?  Perhaps it was not a
peak and the memory usage was not steady?  One thing that could always
be tried if you are able to get a rather reproducible case would be to
use valgrind and check if it is able to detect any leaks.  And I am
afraid that it is hard to act on this report without more information.
--
Michael
