Thread: pgsql: Use the regular main processing loop also in walsenders.
Use the regular main processing loop also in walsenders.

The regular backend's main loop handles signal handling and error recovery
better than the current WAL sender command loop does. For example, if the
client hangs and a SIGTERM is received before starting streaming, the
walsender will now terminate immediately, rather than hang until the
connection times out.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/fd5942c18f977a36fec66a8d1281092805f2a55e

Modified Files
--------------
src/backend/replication/basebackup.c |  16 +--
src/backend/replication/walsender.c  | 269 ++++++++--------------------------
src/backend/tcop/postgres.c          |  51 ++++++-
src/include/replication/walsender.h  |   5 +-
4 files changed, 109 insertions(+), 232 deletions(-)
On 5 October 2012 15:26, Heikki Linnakangas <heikki.linnakangas@iki.fi> wrote:
> Use the regular main processing loop also in walsenders.
>
> The regular backend's main loop handles signal handling and error recovery
> better than the current WAL sender command loop does. For example, if the
> client hangs and a SIGTERM is received before starting streaming, the
> walsender will now terminate immediately, rather than hang until the
> connection times out.

This commit seems to have broken the WAL sender in at least one
scenario. I have a primary and 2 standbys, standby 1 receiving WAL
stream from the primary, and standby 2 receiving WAL stream from
standby 1 (chain configuration). If I attempt to restart standby 1,
it hangs and the WAL sender process on standby 1 uses 100% CPU.

The following error is logged too:
FATAL: terminating walreceiver process due to administrator command

I can shut down standby 1 without issue only if I shut down standby 2 before it.

--
Thom
On 6 October 2012 22:52, Thom Brown <thom@linux.com> wrote:
> This commit seems to have broken the WAL sender in at least one
> scenario. I have a primary and 2 standbys, standby 1 receiving WAL
> stream from the primary, and standby 2 receiving WAL stream from
> standby 1 (chain configuration). If I attempt to restart standby 1,
> it hangs and the WAL sender process on standby 1 uses 100% CPU.
>
> The following error is logged too:
> FATAL: terminating walreceiver process due to administrator command
>
> I can shut down standby 1 without issue only if I shut down standby 2 before it.

This was just a description of the scenario I was using. The same
occurs with just 1 standby and attempting to shut down the primary.

--
Thom
On 07.10.2012 12:24, Thom Brown wrote:
> This was just a description of the scenario I was using. The same
> occurs with just 1 standby and attempting to shut down the primary.

Fixed, thanks for the report. I set ProcDiePending = true to kill the
process, but that didn't do anything without also setting
InterruptPending = true. On second thoughts, it wasn't a very good way
to make walsender exit, anyway, so I changed it to use proc_exit(0),
like it used to.

- Heikki
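[Editor's illustration] The interaction described above can be modelled in a few lines. This is only a toy sketch, not the walsender code: the two flag names mirror the backend's globals, but would_exit, process_interrupts(), check_for_interrupts() and dies_at_next_check() are hypothetical stand-ins. It assumes the usual shape of the backend's interrupt check, where the cheap test looks only at InterruptPending and the per-cause flags such as ProcDiePending are examined only after that test passes.

```c
#include <stdbool.h>

/* Toy stand-ins for the backend's interrupt flags (names mirror PostgreSQL's). */
static volatile bool InterruptPending = false;
static volatile bool ProcDiePending = false;

/* Hypothetical marker: set where a real backend would report FATAL and exit. */
static bool would_exit = false;

/* Slow path: reached only when InterruptPending was set. */
static void
process_interrupts(void)
{
    InterruptPending = false;
    if (ProcDiePending)
    {
        ProcDiePending = false;
        would_exit = true;
    }
}

/* Fast path, modelled on the CHECK_FOR_INTERRUPTS() macro. */
static void
check_for_interrupts(void)
{
    if (InterruptPending)
        process_interrupts();
}

/*
 * Request termination by setting ProcDiePending, optionally also setting
 * InterruptPending, then run one interrupt check. Returns true if the
 * simulated process would have exited at that check.
 */
static bool
dies_at_next_check(bool also_set_interrupt_pending)
{
    InterruptPending = false;
    ProcDiePending = false;
    would_exit = false;

    ProcDiePending = true;
    if (also_set_interrupt_pending)
        InterruptPending = true;

    check_for_interrupts();
    return would_exit;
}
```

In this model, dies_at_next_check(false) returns false because the die request is never looked at, while dies_at_next_check(true) returns true. A loop that keeps polling in the first case never exits, which is consistent with the 100% CPU hang reported earlier in the thread.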
On 8 October 2012 11:34, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> Fixed, thanks for the report. I set ProcDiePending = true to kill the
> process, but that didn't do anything without also setting InterruptPending =
> true. On second thoughts, it wasn't a very good way to make walsender exit,
> anyway, so I changed it to use proc_exit(0), like it used to.

Thanks.

--
Thom