Thread: pgsql: Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_re

Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_restore.

Formerly, Unix builds of pg_dump/pg_restore would trap SIGINT and similar
signals and set a flag that was tested in various data-transfer loops.
This was prone to errors of omission (cf commit 3c8aa6654); and even if
the client-side response was prompt, we did nothing that would cause
long-running SQL commands (e.g. CREATE INDEX) to terminate early.
Also, the master process would effectively do nothing at all upon receipt
of SIGINT; the only reason it seemed to work was that in typical scenarios
the signal would also be delivered to the child processes.  We should
support termination when a signal is delivered only to the master process,
though.

Windows builds had no console interrupt handler, so they would just fall
over immediately at control-C, again leaving long-running SQL commands to
finish unmolested.

To fix, remove the flag-checking approach altogether.  Instead, allow the
Unix signal handler to send a cancel request directly and then exit(1).
In the master process, also have it forward the signal to the children.
On Windows, add a console interrupt handler that behaves approximately
the same.  The main difference is that a single execution of the Windows
handler can send all the cancel requests since all the info is available
in one process, whereas on Unix each process sends a cancel only for its
own database connection.

In passing, fix an old problem that DisconnectDatabase tends to send a
cancel request before exiting a parallel worker, even if nothing went
wrong.  This is at least a waste of cycles, and could lead to unexpected
log messages, or maybe even data loss if it happened in pg_restore (though
in the current code the problem seems to affect only pg_dump).  The cause
was that after a COPY step, pg_dump was leaving libpq in PGASYNC_BUSY
state, causing PQtransactionStatus() to report PQTRANS_ACTIVE.  That's
normally harmless because the next PQexec() will silently clear the
PGASYNC_BUSY state; but in a parallel worker we might exit without any
additional SQL commands after a COPY step.  So add an extra PQgetResult()
call after a COPY to allow libpq to return to PGASYNC_IDLE state.

This is a bug fix, IMO, so back-patch to 9.3 where parallel dump/restore
were introduced.

Thanks to Kyotaro Horiguchi for Windows testing and code suggestions.

Original-Patch: <7005.1464657274@sss.pgh.pa.us>
Discussion: <20160602.174941.256342236.horiguchi.kyotaro@lab.ntt.co.jp>

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/e652273e073566b67c2fd36a5754b3fad2bc0291

Modified Files
--------------
src/bin/pg_dump/compress_io.c         |  10 -
src/bin/pg_dump/parallel.c            | 606 ++++++++++++++++++++++++++--------
src/bin/pg_dump/parallel.h            |   2 +-
src/bin/pg_dump/pg_backup_archiver.c  |   4 +
src/bin/pg_dump/pg_backup_archiver.h  |   3 +
src/bin/pg_dump/pg_backup_db.c        |  32 +-
src/bin/pg_dump/pg_backup_directory.c |  12 -
src/bin/pg_dump/pg_dump.c             |   5 +
8 files changed, 505 insertions(+), 169 deletions(-)


Re: pgsql: Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_re

From
Peter Eisentraut
Date:
On 6/2/16 1:28 PM, Tom Lane wrote:
> Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_restore.

These changes introduced several new compiler warnings under fortify rules:

parallel.c: In function ‘sigTermHandler’:
parallel.c:556:9: warning: ignoring return value of ‘write’, declared
with attribute warn_unused_result [-Wunused-result]
parallel.c:557:9: warning: ignoring return value of ‘write’, declared
with attribute warn_unused_result [-Wunused-result]
parallel.c:559:8: warning: ignoring return value of ‘write’, declared
with attribute warn_unused_result [-Wunused-result]

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 6/2/16 1:28 PM, Tom Lane wrote:
>> Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_restore.

> These changes introduced several new compiler warnings under fortify rules:

> parallel.c: In function ‘sigTermHandler’:
> parallel.c:556:9: warning: ignoring return value of ‘write’, declared
> with attribute warn_unused_result [-Wunused-result]

Hm, interesting --- I copied that write_stderr() macro from psql/common.c
and figured it was good.  But now that I look, only the uses of it in the
Windows code path there are straightforward; on the Unix side we have

            rc = write_stderr("Cancel request sent\n");
            (void) rc;            /* ignore errors, nothing we can do here */

which evidently was done to shut up exactly this type of overly-nannyish
warning.  I'm thinking we need to hide that dead-chicken-waving in the
macro itself.  Will deal with it tomorrow.

            regards, tom lane