signal handling in plpython - Mailing list pgsql-hackers

From Mario De Frutos Dieguez
Subject signal handling in plpython
Date
Msg-id CAFYwGJ3+Xg7EcL2nU-MxX6p+O6c895Pm3mYZ-b+9n9DffEh5MQ@mail.gmail.com
Responses Re: signal handling in plpython  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Hello everyone :).

First of all, I want to introduce myself to this list. My name is Mario de Frutos and I work at CARTO :)

I'm here to ask for some advice/help because we're facing some unexpected behavior when we try to interrupt functions doing CPU-intensive operations in plpython.

Our problem is that we're not able to interrupt them while they're performing CPU-intensive operations. For example, when calculating Moran using PySAL, the Postgres SIGINT handler is not able to cancel it.

I want to show you some possible solutions that I've tried without success:

- If we don't add a custom signal handler, we're not able to interrupt the function while it's performing CPU-intensive operations. When the `SIGINT` signal is sent, the function is not interrupted until it ends.
- If we add a custom signal handler for `SIGINT`, we are able to interrupt the CPU-intensive function, but we're no longer able to interrupt data-fetching operations like `plpy.execute(query)` because we have overridden the Postgres handler for that signal.
- As a third option, I've added a Python context manager to wrap, for testing purposes, the CPU-intensive part (the Moran function from PySAL):
```
import signal
from contextlib import contextmanager


def _signal_handler(signal_code, frame):
    # Raise a Postgres error from Python when SIGINT arrives
    plpy.error('INTERRUPTED BY USER!!')


@contextmanager
def interruptible():
    try:
        signal.signal(signal.SIGINT, _signal_handler)
        yield
    finally:
        # Restore the default behaviour for the signal
        signal.signal(signal.SIGINT, signal.SIG_DFL)
```
  This doesn't work as expected because in the `finally` clause we try to reset to the default behavior, but in Postgres the behavior for the SIGINT signal is defined by a [custom handler](https://github.com/postgres/postgres/blob/master/src/include/tcop/tcopprot.h#L66).
  If we try to retrieve the old handler using `signal.getsignal`, we get a `None` object, since that handler was not installed from Python.

So, after going back and forth, I came up with two possible solutions:
- [custom code] in `plpython` that lets us reset the default signal handler after the CPU-intensive functions finish. It seems to work, but I'm still doing some tests. This option lets us call it explicitly and add it to the `finally` part of a decorator/context manager.
- Reset the signal handler at the beginning of `plpy.execute` or similar functions, like [here].

As a bonus, we also want to implement the SIGALRM part to mimic the "statement timeout" behavior.
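
As a rough illustration of the SIGALRM idea, the sketch below arms a timer around a block of Python code and raises when it fires. It is Unix-only and runs outside PostgreSQL. The names `TimedOut` and `time_limit` are hypothetical, and the exception stands in for `plpy.error()`. PostgreSQL uses SIGALRM for its own timeouts, so replacing the handler inside a backend would likely run into the same restore problem as with SIGINT.

```python
import signal
import time
from contextlib import contextmanager


class TimedOut(Exception):
    """Hypothetical stand-in for plpy.error() so the sketch runs outside PostgreSQL."""


def _alarm_handler(signal_code, frame):
    raise TimedOut("canceled: time limit exceeded")


@contextmanager
def time_limit(seconds):
    # Install the handler first, then arm a one-shot real-time timer.
    previous = signal.signal(signal.SIGALRM, _alarm_handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Disarm the timer before touching the handler again.
        signal.setitimer(signal.ITIMER_REAL, 0)
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)
```

Usage would look like `with time_limit(2.5): run_moran()`, where `run_moran` is a placeholder for the CPU-intensive call.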

I don't know if there is a better way to implement this. I know we're pushing things beyond the scope of plpython, but any advice is welcome :)
