Thread: Logical replication - ERROR: could not send data to WAL stream: cannot allocate memory for input buffer

Hello,

we have been using logical replication for more than two years, and today I found a new, previously unseen error message from the WAL receiver. The replication was in catchup mode (some new tables had been created on the publisher side and added to the publication; on the subscriber side they were missing).

RDBMS version: PostgreSQL 11.4 (Ubuntu 11.4-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit
OS: Ubuntu 18.04.2 LTS

RDBMS was installed from pgdg packages.

The error message:
2020-06-05 20:00:08 UTC 19753 5edaa378.4d29 2    0 540/1132087   [XX000]:ERROR:  could not send data to WAL stream: cannot allocate memory for input buffer
2020-06-05 20:00:08 UTC 867 5df8a0b4.363 28613    0    [00000]:LOG:  background worker "logical replication worker" (PID 19753) exited with exit code 1

The error repeated several times; eventually the worker proceeded and switched into the streaming state. The host has 64 GB RAM; the OS plus the database instance usually use about 20 GB, and the rest is used as OS buffers/cache. I checked monitoring (sampled every 10 seconds) and no memory usage peak was visible, so unless it was a very short spike, I would not expect the system to have run out of memory.

Is there something I can do to diagnose and avoid this issue?

Thanks Ales
On Fri, Jun 05, 2020 at 10:57:46PM +0200, Aleš Zelený wrote:
> we have been using logical replication for more than two years, and today
> I found a new, previously unseen error message from the WAL receiver. The
> replication was in catchup mode (some new tables had been created on the
> publisher side and added to the publication; on the subscriber side they
> were missing).

This comes from pqCheckInBufferSpace() in libpq when realloc() fails,
most probably because this host ran out of memory.
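
For reference, the growth logic follows roughly this pattern (a simplified, self-contained sketch of what pqCheckInBufferSpace() does in src/interfaces/libpq/fe-misc.c, with the PGconn internals replaced by a local struct; not the verbatim source):

/*
 * Simplified sketch of how libpq grows its input buffer; illustration only,
 * not the actual PostgreSQL source.
 */
#include <stdio.h>
#include <stdlib.h>

struct conn_buf
{
    char   *inBuffer;           /* receive buffer */
    size_t  inBufSize;          /* currently allocated size */
};

/* Returns 0 on success, EOF if realloc() cannot satisfy the request. */
static int
check_in_buffer_space(struct conn_buf *conn, size_t bytes_needed)
{
    size_t  newsize = conn->inBufSize;
    char   *newbuf;

    if (bytes_needed <= newsize)
        return 0;               /* already large enough */

    /* Double the buffer until the pending protocol message fits. */
    while (newsize < bytes_needed)
        newsize *= 2;

    newbuf = realloc(conn->inBuffer, newsize);
    if (newbuf == NULL)
    {
        /*
         * This failure is what gets reported as "cannot allocate memory for
         * input buffer"; the walreceiver then surfaces it as "could not send
         * data to WAL stream: ...".
         */
        return EOF;
    }
    conn->inBuffer = newbuf;
    conn->inBufSize = newsize;
    return 0;
}

int
main(void)
{
    struct conn_buf conn = { malloc(16 * 1024), 16 * 1024 };

    if (conn.inBuffer == NULL ||
        check_in_buffer_space(&conn, 1024 * 1024) == EOF)
        fprintf(stderr, "cannot allocate memory for input buffer\n");
    else
        printf("buffer grown to %zu bytes\n", conn.inBufSize);

    free(conn.inBuffer);
    return 0;
}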

> The error repeated several times; eventually the worker proceeded and
> switched into the streaming state. The host has 64 GB RAM; the OS plus the
> database instance usually use about 20 GB, and the rest is used as OS
> buffers/cache. I checked monitoring (sampled every 10 seconds) and no
> memory usage peak was visible, so unless it was a very short spike, I
> would not expect the system to have run out of memory.
>
> Is there something I can do to diagnose and avoid this issue?

Does the memory usage increase slowly over time?  Perhaps it was not a
short peak but a gradual climb.  One thing that could always be tried,
if you are able to get a reasonably reproducible case, would be to run
the process under valgrind and check whether it detects any leaks.  I am
afraid it is hard to act on this report without more information.
--
Michael

Thanks for the comment.

From what I was able to monitor, memory usage was almost stable, with about 20 GB allocated as cached memory. Memory overcommit is disabled on the database server. Might it be a memory issue after all, given that it was synchronizing the newly added tables with a total of 380 GB of data containing JSONB columns (60 bytes to 100 kB per value)? The problem is that I have not been able to reproduce it: in the dev environment it works like a charm, and as usual we were facing this issue only on PROD.

It is clear that a test case would be appropriate for a memory allocation issue, but I was not able to build a reproducible one.
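
If it helps, one direction I could imagine for a synthetic test case (just a sketch under the assumption that the failure is simply realloc() returning NULL under memory pressure; I have not actually tried this against the replication worker) would be to cap the address space of a small program and let the same buffer-doubling pattern fail:

/*
 * Hypothetical test-case sketch: cap the address space with
 * setrlimit(RLIMIT_AS) so that a doubling realloc() eventually returns NULL,
 * the same failure mode pqCheckInBufferSpace() reports.  Illustration only,
 * not derived from the actual walreceiver code.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

int
main(void)
{
    /* Limit the address space to ~64 MB (value chosen only for the demo). */
    struct rlimit   lim = { 64 * 1024 * 1024, 64 * 1024 * 1024 };
    size_t          size = 16 * 1024;
    char           *buf;

    if (setrlimit(RLIMIT_AS, &lim) != 0)
    {
        perror("setrlimit");
        return 1;
    }

    buf = malloc(size);
    if (buf == NULL)
    {
        fprintf(stderr, "initial malloc failed\n");
        return 1;
    }

    for (;;)
    {
        char   *newbuf;

        size *= 2;              /* same doubling as the libpq input buffer */
        newbuf = realloc(buf, size);
        if (newbuf == NULL)
        {
            fprintf(stderr,
                    "cannot allocate memory for input buffer (at %zu bytes)\n",
                    size);
            free(buf);
            return 1;
        }
        buf = newbuf;
    }
}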

Thanks Ales
