Thread: pgsql: Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_re
Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_restore. Formerly, Unix builds of pg_dump/pg_restore would trap SIGINT and similar signals and set a flag that was tested in various data-transfer loops. This was prone to errors of omission (cf commit 3c8aa6654); and even if the client-side response was prompt, we did nothing that would cause long-running SQL commands (e.g. CREATE INDEX) to terminate early. Also, the master process would effectively do nothing at all upon receipt of SIGINT; the only reason it seemed to work was that in typical scenarios the signal would also be delivered to the child processes. We should support termination when a signal is delivered only to the master process, though. Windows builds had no console interrupt handler, so they would just fall over immediately at control-C, again leaving long-running SQL commands to finish unmolested. To fix, remove the flag-checking approach altogether. Instead, allow the Unix signal handler to send a cancel request directly and then exit(1). In the master process, also have it forward the signal to the children. On Windows, add a console interrupt handler that behaves approximately the same. The main difference is that a single execution of the Windows handler can send all the cancel requests since all the info is available in one process, whereas on Unix each process sends a cancel only for its own database connection. In passing, fix an old problem that DisconnectDatabase tends to send a cancel request before exiting a parallel worker, even if nothing went wrong. This is at least a waste of cycles, and could lead to unexpected log messages, or maybe even data loss if it happened in pg_restore (though in the current code the problem seems to affect only pg_dump). The cause was that after a COPY step, pg_dump was leaving libpq in PGASYNC_BUSY state, causing PQtransactionStatus() to report PQTRANS_ACTIVE. That's normally harmless because the next PQexec() will silently clear the PGASYNC_BUSY state; but in a parallel worker we might exit without any additional SQL commands after a COPY step. So add an extra PQgetResult() call after a COPY to allow libpq to return to PGASYNC_IDLE state. This is a bug fix, IMO, so back-patch to 9.3 where parallel dump/restore were introduced. Thanks to Kyotaro Horiguchi for Windows testing and code suggestions. Original-Patch: <7005.1464657274@sss.pgh.pa.us> Discussion: <20160602.174941.256342236.horiguchi.kyotaro@lab.ntt.co.jp> Branch ------ master Details ------- http://git.postgresql.org/pg/commitdiff/e652273e073566b67c2fd36a5754b3fad2bc0291 Modified Files -------------- src/bin/pg_dump/compress_io.c | 10 - src/bin/pg_dump/parallel.c | 606 ++++++++++++++++++++++++++-------- src/bin/pg_dump/parallel.h | 2 +- src/bin/pg_dump/pg_backup_archiver.c | 4 + src/bin/pg_dump/pg_backup_archiver.h | 3 + src/bin/pg_dump/pg_backup_db.c | 32 +- src/bin/pg_dump/pg_backup_directory.c | 12 - src/bin/pg_dump/pg_dump.c | 5 + 8 files changed, 505 insertions(+), 169 deletions(-)
Re: pgsql: Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_re
From
Peter Eisentraut
Date:
On 6/2/16 1:28 PM, Tom Lane wrote: > Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_restore. These changes introduced several new compiler warnings under fortify rules: parallel.c: In function ‘sigTermHandler’: parallel.c:556:9: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result] parallel.c:557:9: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result] parallel.c:559:8: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result] -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 6/2/16 1:28 PM, Tom Lane wrote: >> Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_restore. > These changes introduced several new compiler warnings under fortify rules: > parallel.c: In function ‘sigTermHandler’: > parallel.c:556:9: warning: ignoring return value of ‘write’, declared > with attribute warn_unused_result [-Wunused-result] Hm, interesting --- I copied that write_stderr() macro from psql/common.c and figured it was good. But now that I look, only the uses of it in the Windows code path there are straightforward; on the Unix side we have rc = write_stderr("Cancel request sent\n"); (void) rc; /* ignore errors, nothing we can do here */ which evidently was done to shut up exactly this type of overly-nannyish warning. I'm thinking we need to hide that dead-chicken-waving in the macro itself. Will deal with it tomorrow. regards, tom lane