Thread: Out of Memory error triggering replica to transition into recovery mode

Hello Experts!

As the subject says, today our replica DB has very frequently been going into recovery mode, causing an outage on the application side.

Here are the server details:
Server type: Compute engine
OS: Ubuntu 20
Pgsql: 12.2
CPUs: 64
Memory: 128GB
shared_buffers: 32GB
work_mem: 256MB
maintenance_work_mem: 3GB
max_connections: 4000
Total size of the DBs: 3TB
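
One combination in these settings that may be worth ruling out for an OOM is the large work_mem together with a very high max_connections. work_mem is a per-operation limit: each sort or hash operation in a query may use up to that much, and a single query can run several of them. So multiplying it by max_connections gives only a rough, illustrative ceiling, not a prediction of real usage. A minimal sketch of that arithmetic, using the values listed above (PostgreSQL's MB/GB memory units are 1024-based); the function name and the `ops_per_backend` parameter are hypothetical:

```python
# Rough worst-case memory estimate for the settings listed above.
# Illustrative only: actual usage depends on how many sort/hash
# operations run concurrently, and each one may use up to work_mem.

MIB_PER_GIB = 1024  # PostgreSQL memory units (MB, GB) are 1024-based


def worst_case_work_mem_gib(work_mem_mib: int, max_connections: int,
                            ops_per_backend: int = 1) -> float:
    """Ceiling if every backend ran `ops_per_backend` operations at work_mem."""
    return work_mem_mib * max_connections * ops_per_backend / MIB_PER_GIB


if __name__ == "__main__":
    shared_buffers_gib = 32
    ram_gib = 128
    work_mem_total = worst_case_work_mem_gib(work_mem_mib=256, max_connections=4000)

    print(f"work_mem ceiling (1 op/backend): {work_mem_total:.0f} GiB")
    print(f"plus shared_buffers:             {work_mem_total + shared_buffers_gib:.0f} GiB")
    print(f"physical RAM:                    {ram_gib} GiB")
```

Even at one operation per backend the ceiling is far above the 128GB of RAM. That does not prove work_mem caused this crash, but it shows the server has essentially no protection if many connections run memory-hungry queries at once.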

The application is designed to read data primarily from the SECONDARY (the replica), and there are several applications like it. The postgres log is full of messages like this:
"IP, 2024-11-28 ,<db name>, <user>,1, FATAL: the database system is in recovery mode"

This indicates that the application services are constantly trying to connect to the DB, and there are a great many of them.
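
While the replica is in recovery, clients that reconnect in a tight loop only add load and flood the log. A minimal sketch of client-side capped exponential backoff with jitter, assuming a hypothetical zero-argument `connect` callable that raises on failure (with a real driver you would wrap its connect call and catch only its connection-error type). The function name and delay values are arbitrary choices for illustration:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_backoff(connect: Callable[[], T],
                         max_attempts: int = 8,
                         base_delay: float = 0.5,
                         max_delay: float = 30.0,
                         sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect` until it succeeds, sleeping between failed attempts.

    `connect` is a hypothetical callable; every exception it raises is
    treated as retryable here. A real client should catch only the
    driver's connection error type.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt up to max_delay; a random fraction
            # of it ("full jitter") spreads reconnects from many clients.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise RuntimeError("max_attempts must be at least 1")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_connect() -> str:
        # Hypothetical stand-in: fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("the database system is in recovery mode")
        return "connected"

    print(connect_with_backoff(flaky_connect, sleep=lambda s: None))
```

Jitter matters here because many application instances lose the replica at the same moment; without it they would all retry in lockstep.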

Any advice on how we can improve the situation?

Regards
Siraj
Siraj G <tosiraj.g@gmail.com> writes:
> As the subject says, today our replica DB has very frequently been going into
> recovery mode, causing an outage on the application side.

If you're not on this month's minor releases, perhaps you should be:

    Reduce memory consumption of logical decoding (Masahiko Sawada)

        Use a smaller default block size to store tuple data received
        during logical replication. This reduces memory wastage, which
        has been reported to be severe while processing long-running
        transactions, even leading to out-of-memory failures.

I recall past updates that fixed other memory leaks in logical
replication, too.

> Pgsql: 12.2

Egad.  Your version-updating strategy seriously needs a rethink.
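
To put a number on the gap: since PostgreSQL 10, versions are numbered major.minor, and `server_version_num` (as returned by `SHOW server_version_num;`) encodes them as major * 10000 + minor, so 12.2 is 120002. A minimal sketch that decodes it and counts missing minor releases. The function names are hypothetical, and the latest minor for the branch must be supplied by hand from the official PostgreSQL versioning page; the value below is an assumption to verify, not an authoritative number:

```python
from typing import Tuple


def decode_server_version_num(num: int) -> Tuple[int, int]:
    """Split server_version_num into (major, minor); valid for PostgreSQL 10+."""
    if num < 100000:
        raise ValueError("pre-10 versions use a three-part scheme (e.g. 9.6.x)")
    return num // 10000, num % 10000


def minor_releases_behind(num: int, latest_minor: int) -> int:
    """Number of minor releases on the same major branch not yet installed."""
    _, minor = decode_server_version_num(num)
    return max(0, latest_minor - minor)


if __name__ == "__main__":
    major, minor = decode_server_version_num(120002)  # the 12.2 from the post
    latest = 22  # assumption: latest 12.x minor; verify on postgresql.org
    behind = minor_releases_behind(120002, latest)
    print(f"{major}.{minor} is {behind} minor releases behind {major}.{latest}")
```

Minor releases within a major branch contain only bug and security fixes and do not require a dump/restore, which is why staying current on them is the usual recommendation.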

            regards, tom lane