Re: how is the WAL receiver process stopped and restarted when the network connection is broken and then restored? - Mailing list pgsql-hackers

From	Rui Hai Jiang
Subject	Re: how is the WAL receiver process stopped and restarted when the network connection is broken and then restored?
Date	June 23, 2016 17:56:24
Msg-id	CAEri+mLJjVD301LvKNmaGnr_VdmsHLBks7vaC2bWvYGLJjjuRw@mail.gmail.com
In response to	Re: how is the WAL receiver process stopped and restarted when the network connection is broken and then restored? (Craig Ringer <craig@2ndquadrant.com>)
Responses	Re: how is the WAL receiver process stopped and restarted when the network connection is broken and then restored? (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

Thank you Craig for your suggestion.

I followed the clue and spent the whole day digging into the code.

Finally I figured out how the WAL receiver exits and restarts.

Question-1. How the WAL receiver process exits

===============================================

When the network connection is broken, the WAL receiver can no longer communicate with the WAL sender. If it receives nothing from the WAL sender for longer than wal_receiver_timeout, the WAL receiver process exits by calling "ereport(ERROR,...)".

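In the WAL receiver, calling ereport(ERROR,...) causes the current process to exit, because the process sets up no error-recovery block and the ERROR is therefore treated as FATAL. Calling ereport(LOG,...) only writes a log message and the process keeps running.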

WalReceiverMain(void)
{
    ...
    len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
    if (len != 0)
    {
        ...
    }
    else
    {
        if (wal_receiver_timeout > 0)
        {
            if (now >= timeout)
                ereport(ERROR,
                        (errmsg("terminating walreceiver due to timeout")));
        }
        ...
    }
    ...
}
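The timeout decision above can be sketched in isolation. This is only a minimal model, not PostgreSQL code: receiver_timed_out() and its millisecond parameters are hypothetical names, and it assumes the deadline is the time of the last received message plus wal_receiver_timeout, with a value of 0 disabling the check.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a PostgreSQL function).
 * All times are in milliseconds on the same clock.
 * Returns true when nothing has arrived for at least timeout_ms.
 */
bool
receiver_timed_out(int64_t last_recv_ms, int64_t now_ms, int timeout_ms)
{
    int64_t deadline;

    if (timeout_ms <= 0)
        return false;           /* assumption: 0 disables the timeout */

    deadline = last_recv_ms + timeout_ms;
    return now_ms >= deadline;  /* same comparison as "now >= timeout" */
}
```

With a 60000 ms timeout and the last message received at t = 1000 ms, the check first becomes true at t = 61000 ms, which is the point where the real code would call ereport(ERROR,...).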

Question-2. How WAL receiver process starts again

=====================================================

On the standby side, the startup process is responsible for recovery processing. If streaming replication is configured and the startup process finds that the WAL receiver process is not running, it notifies the postmaster to start the WAL receiver process. Note: This is also how the WAL receiver process starts for the first time!

(1) The startup process notifies the postmaster to start the WAL receiver by sending it a SIGUSR1.

RequestXLogStreaming()
{
    ...
    if (launch)
        SendPostmasterSignal(PMSIGNAL_START_WALRECEIVER);
    ...
}

SendPostmasterSignal(PMSignalReason reason)    /* reason = PMSIGNAL_START_WALRECEIVER */
{
    ...
    kill(PostmasterPid, SIGUSR1);
}

(2) The postmaster receives the SIGUSR1 and starts the WAL receiver process.

sigusr1_handler(SIGNAL_ARGS)
{
    ...
    WalReceiverPID = StartWalReceiver();
    ...
}

Please let me know if my understanding is incorrect.

thanks,

Rui Hai
