Logical Replication ERROR reporting issue - Mailing list pgsql-general

From Ranjan Gajare
Subject Logical Replication ERROR reporting issue
Msg-id CACj5rkYQ_A6-h8TcVDuDRuuVUPzGdcTLS=1TOb-n=BB6ey2hMQ@mail.gmail.com
List pgsql-general
Hello Folks,

We are having an issue with logical replication in a PostgreSQL 10.11 production environment that we are unable to get around.

The production environment configuration is as follows:
PostgreSQL Version: 10.11
OS: Ubuntu 16.04.3 LTS (Xenial Xerus)


The following messages occur frequently in the logs of the subscription server:

LOG:  logical replication apply worker for subscription "<sub_name>" has started
ERROR:  terminating logical replication worker due to timeout
LOG:  background worker "logical replication worker" (PID <pid>) exited with exit code 1
LOG:  logical replication apply worker for subscription "<sub_name>" has started
ERROR:  could not start WAL streaming: ERROR:  replication slot "<slot_name>" is active for PID <pid>
LOG:  worker process: logical replication worker for subscription <sub_oid> (PID <pid>) exited with exit code 1


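This fills up disk space on the master, because the WAL that is still pending to be applied on the subscriber accumulates there. Two distinct ERROR messages are observed here: the worker timeout and the "replication slot is active" failure when the restarted worker tries to start WAL streaming.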
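
For reference, the slot state and the amount of retained WAL can be checked on the master with a query along the following lines. This is only a sketch: it assumes the PostgreSQL 10 names of the catalog view and functions (pg_replication_slots, pg_current_wal_lsn, pg_wal_lsn_diff), and the "retained_wal" column alias is made up for illustration.

```sql
-- Run on the master (publisher).
-- active_pid is the PID currently holding the slot, i.e. the PID
-- reported in the "replication slot ... is active for PID" error.
SELECT slot_name,
       active,
       active_pid,
       restart_lsn,
       -- WAL between the slot's restart point and the current write position
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE slot_type = 'logical';
```

A steadily growing retained_wal value for the subscription's slot would match the disk usage seen on the master.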

Looking at the timeout ERROR, we simply increased 'wal_receiver_timeout' to '2min' (the default is 1min); 'wal_sender_timeout' was already '2min'. This resolved the timeout ERROR and, surprisingly, the other error, 'replication slot is active for PID', also vanished after that.
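
The change amounts to something like the following. This is a sketch of one way to apply it; using ALTER SYSTEM with a reload is an assumption here, and editing postgresql.conf directly and reloading would be equivalent.

```sql
-- Run on the subscription server.
-- Raise the receiver timeout from the 1min default to 2min.
ALTER SYSTEM SET wal_receiver_timeout = '2min';
-- Reload the configuration so the new value takes effect without a restart.
SELECT pg_reload_conf();
-- Confirm the value now in effect.
SHOW wal_receiver_timeout;

-- Run on the master to confirm the sender-side timeout (already 2min in our case).
SHOW wal_sender_timeout;
```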

Does anyone have any idea how increasing wal_receiver_timeout relates to 'ERROR:  could not start WAL streaming: ERROR:  replication slot "<slot_name>" is active for PID <pid>'? Or is it just a flaw in error reporting?


Thanks for any help!

--
Regards,
Ranjan Gajare
