Thread: [COMMITTERS] pgsql: Fix signal handling in logical replication workers

[COMMITTERS] pgsql: Fix signal handling in logical replication workers

From: Peter Eisentraut

Fix signal handling in logical replication workers

The logical replication worker processes now use the normal die()
handler for SIGTERM and CHECK_FOR_INTERRUPTS() instead of custom code.
One problem before was that the apply worker would not exit promptly
when a subscription was dropped, which could lead to deadlocks.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/9fcf670c2efdf31233d429f557ab77937f0f1e6a

Modified Files
--------------
src/backend/replication/logical/launcher.c  | 16 +++++++-------
src/backend/replication/logical/tablesync.c | 10 ++++-----
src/backend/replication/logical/worker.c    | 34 ++++++++++++++++++++++++++---
src/backend/tcop/postgres.c                 |  5 +++++
src/include/replication/logicalworker.h     |  2 ++
src/include/replication/worker_internal.h   |  4 ----
6 files changed, 50 insertions(+), 21 deletions(-)
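
For readers unfamiliar with the pattern the commit message refers to, here is a
minimal standalone sketch of it: the signal handler only records that a shutdown
was requested, and the worker loop polls for that request at a safe point. This
is not code from the patch. The names sketch_die, sketch_check_for_interrupts
and sketch_worker_loop are hypothetical stand-ins, and the real die() and
CHECK_FOR_INTERRUPTS() in the backend do considerably more than this.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Written by the handler, read by the loop; sig_atomic_t is safe to set from a handler. */
static volatile sig_atomic_t shutdown_requested = 0;

/* Hypothetical stand-in for die(): record the request and return immediately. */
static void
sketch_die(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/* Hypothetical stand-in for CHECK_FOR_INTERRUPTS(): true means "stop now". */
static bool
sketch_check_for_interrupts(void)
{
    return shutdown_requested != 0;
}

/*
 * Hypothetical worker loop (assumption, not the apply worker's real loop).
 * Runs up to max_iterations units of work and returns how many completed
 * before a shutdown request was noticed.  raise_at simulates SIGTERM
 * arriving during that iteration; pass -1 for "no signal".
 */
static int
sketch_worker_loop(int max_iterations, int raise_at)
{
    int done = 0;

    assert(max_iterations >= 0);
    shutdown_requested = 0;
    signal(SIGTERM, sketch_die);

    for (int i = 0; i < max_iterations; i++)
    {
        /* Poll at the top of each iteration, where exiting is safe. */
        if (sketch_check_for_interrupts())
            break;

        /* raise() returns only after the handler has run. */
        if (i == raise_at)
            raise(SIGTERM);

        done++;                 /* one unit of "work" */
    }
    return done;
}
```

With a simulated SIGTERM during the third iteration, the loop finishes that
iteration and exits at the next check, which is the "exit promptly" behaviour
the commit message is after.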


Peter Eisentraut <peter_e@gmx.net> writes:
> Fix signal handling in logical replication workers

It looks like this broke buildfarm member nightjar.
Not clear why - I don't see anything especially platform-specific
in the patch.

            regards, tom lane


I wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Fix signal handling in logical replication workers

> It looks like this broke buildfarm member nightjar.
> Not clear why - I don't see anything especially platform-specific
> in the patch.

To muddy the waters further, I tried to duplicate the failure on
FreeBSD 11.0/x86_64, and it seems to pass just fine.  Maybe Andrew
can look into why nightjar is failing.

            regards, tom lane


Re: [COMMITTERS] pgsql: Fix signal handling in logical replication workers

From: Petr Jelinek

On 03/06/17 02:59, Tom Lane wrote:
> I wrote:
>> Peter Eisentraut <peter_e@gmx.net> writes:
>>> Fix signal handling in logical replication workers
>
>> It looks like this broke buildfarm member nightjar.
>> Not clear why - I don't see anything especially platform-specific
>> in the patch.
>
> To muddy the waters further, I tried to duplicate the failure on
> FreeBSD 11.0/x86_64, and it seems to pass just fine.  Maybe Andrew
> can look into why nightjar is failing.
>

There is still one locking patch pending (well, pending to be written); I
would not be surprised if there is a race condition in shutdown somewhere
before that's done.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: [COMMITTERS] pgsql: Fix signal handling in logical replication workers

From: Andrew Dunstan

On 06/02/2017 09:13 PM, Petr Jelinek wrote:
> On 03/06/17 02:59, Tom Lane wrote:
>> I wrote:
>>> Peter Eisentraut <peter_e@gmx.net> writes:
>>>> Fix signal handling in logical replication workers
>>> It looks like this broke buildfarm member nightjar.
>>> Not clear why - I don't see anything especially platform-specific
>>> in the patch.
>> To muddy the waters further, I tried to duplicate the failure on
>> FreeBSD 11.0/x86_64, and it seems to pass just fine.  Maybe Andrew
>> can look into why nightjar is failing.
>>
> There is still one locking patch pending (well, pending to be written); I
> would not be surprised if there is a race condition in shutdown somewhere
> before that's done.
>



nightjar has been having intermittent failures on the subscription tests
for some time. See
<https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=nightjar&br=HEAD>.
It's only been running the tests for about 53 days.

I'm prepared to give any help needed, including access to nightjar if
required.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services