Thread: [COMMITTERS] pgsql: Fix signal handling in logical replication workers
Fix signal handling in logical replication workers

The logical replication worker processes now use the normal die() handler for SIGTERM and CHECK_FOR_INTERRUPTS() instead of custom code. One problem before was that the apply worker would not exit promptly when a subscription was dropped, which could lead to deadlocks.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/9fcf670c2efdf31233d429f557ab77937f0f1e6a

Modified Files
--------------
src/backend/replication/logical/launcher.c  | 16 +++++++-------
src/backend/replication/logical/tablesync.c | 10 ++++-----
src/backend/replication/logical/worker.c    | 34 ++++++++++++++++++++++++++---
src/backend/tcop/postgres.c                 |  5 +++++
src/include/replication/logicalworker.h     |  2 ++
src/include/replication/worker_internal.h   |  4 ----
6 files changed, 50 insertions(+), 21 deletions(-)
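For readers unfamiliar with the pattern the commit message refers to, here is a minimal standalone sketch of deferred signal handling: the SIGTERM handler only records the request, and the main loop polls for it at safe points. This is not the PostgreSQL code. Every name below (shutdown_requested, handle_sigterm, install_sigterm_handler, interrupt_pending, worker_loop) is hypothetical, and the real die() and CHECK_FOR_INTERRUPTS() do considerably more than this.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag, set asynchronously by the signal handler. */
static volatile sig_atomic_t shutdown_requested = 0;

/* The handler does only async-signal-safe work: it records the request. */
static void
handle_sigterm(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/*
 * Install the handler. Returns 0 on success, -1 on failure.
 * signal() keeps the sketch in standard C; production code would
 * more likely use sigaction().
 */
static int
install_sigterm_handler(void)
{
    return (signal(SIGTERM, handle_sigterm) == SIG_ERR) ? -1 : 0;
}

/* Polled at safe points; stands in for an interrupt check. */
static bool
interrupt_pending(void)
{
    return shutdown_requested != 0;
}

/*
 * Process up to max_items units of work, checking for a pending
 * interrupt before each one. For demonstration only, the loop sends
 * itself SIGTERM while working on item number signal_at (pass -1 to
 * never do so). Returns the number of items completed.
 */
static int
worker_loop(int max_items, int signal_at)
{
    int done = 0;

    assert(max_items >= 0);

    for (int i = 0; i < max_items; i++)
    {
        if (interrupt_pending())
            break;              /* exit promptly at a safe point */

        if (i == signal_at)
            raise(SIGTERM);     /* simulate an external termination request */

        done++;                 /* the unit of work itself */
    }
    return done;
}
```

With the handler installed, worker_loop(10, 3) finishes the item that was in progress when the signal arrived and then stops at the next check, so it completes 4 items rather than 10. The point of the design is that the loop never has to be torn down in the middle of a unit of work, yet it still exits soon after the request.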
Peter Eisentraut <peter_e@gmx.net> writes:
> Fix signal handling in logical replication workers

It looks like this broke buildfarm member nightjar. Not clear why - I don't see anything especially platform-specific in the patch.

regards, tom lane
I wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Fix signal handling in logical replication workers

> It looks like this broke buildfarm member nightjar.
> Not clear why - I don't see anything especially platform-specific
> in the patch.

To muddy the waters further, I tried to duplicate the failure on FreeBSD 11.0/x86_64, and it seems to pass just fine. Maybe Andrew can look into why nightjar is failing.

regards, tom lane
On 03/06/17 02:59, Tom Lane wrote:
> I wrote:
>> Peter Eisentraut <peter_e@gmx.net> writes:
>>> Fix signal handling in logical replication workers
>
>> It looks like this broke buildfarm member nightjar.
>> Not clear why - I don't see anything especially platform-specific
>> in the patch.
>
> To muddy the waters further, I tried to duplicate the failure on
> FreeBSD 11.0/x86_64, and it seems to pass just fine. Maybe Andrew
> can look into why nightjar is failing.
>

There is still one locking patch pending (well, pending to be written). I would not be surprised if there is a race condition in shutdown somewhere before that's done.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/02/2017 09:13 PM, Petr Jelinek wrote:
> On 03/06/17 02:59, Tom Lane wrote:
>> I wrote:
>>> Peter Eisentraut <peter_e@gmx.net> writes:
>>>> Fix signal handling in logical replication workers
>>> It looks like this broke buildfarm member nightjar.
>>> Not clear why - I don't see anything especially platform-specific
>>> in the patch.
>> To muddy the waters further, I tried to duplicate the failure on
>> FreeBSD 11.0/x86_64, and it seems to pass just fine. Maybe Andrew
>> can look into why nightjar is failing.
>>
> There is still one locking patch pending (well, pending to be written). I
> would not be surprised if there is a race condition in shutdown somewhere
> before that's done.
>

nightjar has been having intermittent failures on the subscription tests for some time. See <https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=nightjar&br=HEAD>. It has only been running those tests for about 53 days.

I'm prepared to give any help needed, including access to nightjar if required.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services