On 2022-Sep-30, Michael Paquier wrote:
> On Thu, Sep 29, 2022 at 09:07:34PM -0700, Andres Freund wrote:
> > ISTM we should at least install a SIGINT/TERM handler in Cluster.pm that does
> > the stuff we already do in END.
>
> Hmm, indeed. And here I thought that END was actually taking care of
> that on an interrupt..

Me too. But the perlmod manpage says

    An "END" code block is executed as late as possible, that is, after perl has
    finished running the program and just before the interpreter is being exited,
    even if it is exiting as a result of a die() function. (But not if it's
    morphing into another program via "exec", or being blown out of the water by a
    signal--you have to trap that yourself (if you can).)

So clearly we need to fix it. I thought it should be as simple as the
attached, since exit() calls END. (Would it be better to die() instead
of exit()?)
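
To illustrate the idea outside the test suite, here is a minimal standalone sketch. It is not the attached patch: $marker, $owner_pid and install_signal_handlers() are made-up names, and the marker file merely stands in for whatever cleanup END is supposed to do. A child process installs the handlers, the parent sends it SIGTERM, and the child's END block still runs because the handler calls exit().

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the cleanup done in END: write a marker file.
my ($fh, $marker) = tempfile(UNLINK => 0);
close $fh;
my $owner_pid = 0;    # only the process that sets this runs the cleanup

END
{
    if ($owner_pid == $$)
    {
        my $save_rc = $?;    # keep the cleanup from clobbering the exit code
        open my $out, '>', $marker or die "open: $!";
        print $out "cleaned up\n";
        close $out;
        $? = $save_rc;
    }
}

# With the default disposition, SIGINT/SIGTERM kill perl outright and END
# never runs.  Calling exit() from a handler makes END run.
sub install_signal_handlers
{
    $SIG{INT} = $SIG{TERM} = sub { exit(1); };
    return;
}

pipe(my $rd, my $wr) or die "pipe: $!";
my $child = fork();
die "fork: $!" unless defined $child;
if ($child == 0)
{
    close $rd;
    $owner_pid = $$;
    install_signal_handlers();
    syswrite($wr, "x");    # tell the parent the handlers are in place
    close $wr;
    sleep 60;              # pretend to be a long-running test
    exit(0);
}

close $wr;
sysread($rd, my $buf, 1);    # wait until the child is ready
kill 'TERM', $child;
waitpid($child, 0);
my $child_status = $? >> 8;

open my $in, '<', $marker or die "open: $!";
my $marker_text = <$in> // '';
close $in;
unlink $marker;
print "child exit status: $child_status, marker: $marker_text";
```

The pipe is only there so the parent does not signal the child before the handlers exist. On the exit()-vs-die() question, one difference to keep in mind is that a die() raised from a handler can be trapped by an enclosing eval, whereas exit() cannot.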

But on testing, some nodes linger after being sent a shutdown signal.
I'm not sure why -- my guess is that we send the signal just as the node
is starting up, so the signal doesn't reach the process. (I added the
0002 patch --not for commit-- to see which Clusters were being shut down,
and the trace file shows that the nodes that linger did go through
->teardown_node.)

Another funny thing: after hitting Ctrl-C during one run, I got this lingering process:

alvherre 800868 98.2 0.0 12144 5052 pts/9 R 11:03 0:26 /pgsql/install/master/bin/psql -X -c BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32); -c SELECT pg_backup_stop() -d port=54380 host=/tmp/O_2PPNj9Fg dbname='postgres' replication=database

This is probably a bug in psql. Backtrace is:

#0 PQclear (res=<optimized out>) at /pgsql/source/master/src/interfaces/libpq/fe-exec.c:748
#1 PQclear (res=res@entry=0x55ad308c6190) at /pgsql/source/master/src/interfaces/libpq/fe-exec.c:718
#2 0x000055ad2f303323 in ClearOrSaveResult (result=0x55ad308c6190) at /pgsql/source/master/src/bin/psql/common.c:472
#3 ClearOrSaveAllResults () at /pgsql/source/master/src/bin/psql/common.c:488
#4 ExecQueryAndProcessResults (query=query@entry=0x55ad308bc7a0 "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);",
   elapsed_msec=elapsed_msec@entry=0x7fff9c9941d8, svpt_gone_p=svpt_gone_p@entry=0x7fff9c9941d7,
   is_watch=is_watch@entry=false,
   opt=opt@entry=0x0, printQueryFout=printQueryFout@entry=0x0) at /pgsql/source/master/src/bin/psql/common.c:1608
#5 0x000055ad2f301b9d in SendQuery (query=0x55ad308bc7a0 "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);")
   at /pgsql/source/master/src/bin/psql/common.c:1172
#6 0x000055ad2f2f7bd9 in main (argc=<optimized out>, argv=<optimized out>)
   at /pgsql/source/master/src/bin/psql/startup.c:384

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep." (Robert Davidson)
http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php