Thread: pgsql: Use the regular main processing loop also in walsenders.

From: Heikki Linnakangas
Use the regular main processing loop also in walsenders.

The regular backend's main loop handles signal handling and error recovery
better than the current WAL sender command loop does. For example, if the
client hangs and a SIGTERM is received before starting streaming, the
walsender will now terminate immediately, rather than hang until the
connection times out.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/fd5942c18f977a36fec66a8d1281092805f2a55e

Modified Files
--------------
src/backend/replication/basebackup.c |   16 +--
src/backend/replication/walsender.c  |  269 ++++++++--------------------------
src/backend/tcop/postgres.c          |   51 ++++++-
src/include/replication/walsender.h  |    5 +-
4 files changed, 109 insertions(+), 232 deletions(-)


Re: pgsql: Use the regular main processing loop also in walsenders.

From: Thom Brown
On 5 October 2012 15:26, Heikki Linnakangas <heikki.linnakangas@iki.fi> wrote:
> Use the regular main processing loop also in walsenders.
>
> The regular backend's main loop handles signal handling and error recovery
> better than the current WAL sender command loop does. For example, if the
> client hangs and a SIGTERM is received before starting streaming, the
> walsender will now terminate immediately, rather than hang until the
> connection times out.

This commit seems to have broken the WAL sender in at least one
scenario.  I have a primary and 2 standbys, standby 1 receiving WAL
stream from the primary, and standby 2 receiving WAL stream from
standby 1 (chain configuration).  If I attempt to restart standby 1,
it hangs and the WAL sender process on standby 1 uses 100% CPU.

The following error is logged too:
FATAL:  terminating walreceiver process due to administrator command

I can shut down standby 1 without issue only if I shut down standby 2 before it.
--
Thom


Re: pgsql: Use the regular main processing loop also in walsenders.

From: Thom Brown
On 6 October 2012 22:52, Thom Brown <thom@linux.com> wrote:
> On 5 October 2012 15:26, Heikki Linnakangas <heikki.linnakangas@iki.fi> wrote:
>> Use the regular main processing loop also in walsenders.
>>
>> The regular backend's main loop handles signal handling and error recovery
>> better than the current WAL sender command loop does. For example, if the
>> client hangs and a SIGTERM is received before starting streaming, the
>> walsender will now terminate immediately, rather than hang until the
>> connection times out.
>
> This commit seems to have broken the WAL sender in at least one
> scenario.  I have a primary and 2 standbys, standby 1 receiving WAL
> stream from the primary, and standby 2 receiving WAL stream from
> standby 1 (chain configuration).  If I attempt to restart standby 1,
> it hangs and the WAL sender process on standby 1 uses 100% CPU.
>
> The following error is logged too:
> FATAL:  terminating walreceiver process due to administrator command
>
> I can shut down standby 1 without issue only if I shut down standby 2 before it.

This was just a description of the scenario I was using.  The same
occurs with just 1 standby and attempting to shut down the primary.

--
Thom


Re: pgsql: Use the regular main processing loop also in walsenders.

From: Heikki Linnakangas
On 07.10.2012 12:24, Thom Brown wrote:
> On 6 October 2012 22:52, Thom Brown<thom@linux.com>  wrote:
>> On 5 October 2012 15:26, Heikki Linnakangas<heikki.linnakangas@iki.fi>  wrote:
>>> Use the regular main processing loop also in walsenders.
>>>
>>> The regular backend's main loop handles signal handling and error recovery
>>> better than the current WAL sender command loop does. For example, if the
>>> client hangs and a SIGTERM is received before starting streaming, the
>>> walsender will now terminate immediately, rather than hang until the
>>> connection times out.
>>
>> This commit seems to have broken the WAL sender in at least one
>> scenario.  I have a primary and 2 standbys, standby 1 receiving WAL
>> stream from the primary, and standby 2 receiving WAL stream from
>> standby 1 (chain configuration).  If I attempt to restart standby 1,
>> it hangs and the WAL sender process on standby 1 uses 100% CPU.
>>
>> The following error is logged too:
>> FATAL:  terminating walreceiver process due to administrator command
>>
>> I can shut down standby 1 without issue only if I shut down standby 2 before it.
>
> This was just a description of the scenario I was using.  The same
> occurs with just 1 standby and attempting to shut down the primary.

Fixed, thanks for the report. I set ProcDiePending = true to kill the
process, but that didn't do anything without also setting
InterruptPending = true. On second thoughts, it wasn't a very good way
to make walsender exit, anyway, so I changed it to use proc_exit(0),
like it used to.

- Heikki
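
The flag interaction described in the message above can be illustrated with
a small standalone C model. This is only a sketch, not the PostgreSQL
source: the two flag names mirror the globals mentioned in the message,
while check_for_interrupts_model(), process_interrupts_model() and
simulate_shutdown_request() are hypothetical names invented for this
illustration. It assumes the interrupt check looks at InterruptPending
first and only then inspects the specific "pending" flags, which is why
setting ProcDiePending alone has no effect.

```c
#include <stdbool.h>

/* Stand-ins for the PostgreSQL globals of the same names (simplified). */
static volatile bool InterruptPending = false;
static volatile bool ProcDiePending = false;

/*
 * Hypothetical model of the interrupt handler: reports whether the
 * process would exit. The real code does considerably more.
 */
static bool
process_interrupts_model(void)
{
    InterruptPending = false;
    if (ProcDiePending)
    {
        ProcDiePending = false;
        return true;            /* the process would terminate here */
    }
    return false;
}

/*
 * Hypothetical model of the interrupt check: the specific flags are
 * only examined when the general InterruptPending flag is set.
 */
static bool
check_for_interrupts_model(void)
{
    if (!InterruptPending)
        return false;           /* nothing is looked at, ProcDiePending included */
    return process_interrupts_model();
}

/*
 * Simulate a shutdown request made by setting flags directly.
 * Returns true if the next interrupt check would end the process.
 */
bool
simulate_shutdown_request(bool also_set_interrupt_pending)
{
    InterruptPending = false;
    ProcDiePending = true;
    if (also_set_interrupt_pending)
        InterruptPending = true;
    return check_for_interrupts_model();
}
```

In this model, simulate_shutdown_request(false) returns false, so a loop
that keeps polling never exits, which is consistent with the reported
100% CPU hang. simulate_shutdown_request(true) returns true. Calling an
exit routine directly, as the fix does with proc_exit(0), avoids relying
on the flag pair at all.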


Re: pgsql: Use the regular main processing loop also in walsenders.

From: Thom Brown
On 8 October 2012 11:34, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 07.10.2012 12:24, Thom Brown wrote:
>>
>> On 6 October 2012 22:52, Thom Brown<thom@linux.com>  wrote:
>>>
>>> On 5 October 2012 15:26, Heikki Linnakangas<heikki.linnakangas@iki.fi>
>>> wrote:
>>>>
>>>> Use the regular main processing loop also in walsenders.
>>>>
>>>> The regular backend's main loop handles signal handling and error
>>>> recovery
>>>> better than the current WAL sender command loop does. For example, if
>>>> the
>>>> client hangs and a SIGTERM is received before starting streaming, the
>>>> walsender will now terminate immediately, rather than hang until the
>>>> connection times out.
>>>
>>>
>>> This commit seems to have broken the WAL sender in at least one
>>> scenario.  I have a primary and 2 standbys, standby 1 receiving WAL
>>> stream from the primary, and standby 2 receiving WAL stream from
>>> standby 1 (chain configuration).  If I attempt to restart standby 1,
>>> it hangs and the WAL sender process on standby 1 uses 100% CPU.
>>>
>>> The following error is logged too:
>>> FATAL:  terminating walreceiver process due to administrator command
>>>
>>> I can shut down standby 1 without issue only if I shut down standby 2
>>> before it.
>>
>>
>> This was just a description of the scenario I was using.  The same
>> occurs with just 1 standby and attempting to shut down the primary.
>
>
> Fixed, thanks for the report. I set ProcDiePending = true to kill the
> process, but that didn't do anything without also setting InterruptPending =
> true. On second thoughts, it wasn't a very good way to make walsender exit,
> anyway, so I changed it to use proc_exit(0), like it used to.

Thanks.
--
Thom