Re: interrupted tap tests leave postgres instances around - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: interrupted tap tests leave postgres instances around
Msg-id 20220930091700.uxdconisg64ioqpo@alvherre.pgsql
In response to Re: interrupted tap tests leave postgres instances around  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On 2022-Sep-30, Michael Paquier wrote:

> On Thu, Sep 29, 2022 at 09:07:34PM -0700, Andres Freund wrote:
> > ISTM we should at least install a SIGINT/TERM handler in Cluster.pm that does
> > the stuff we already do in END.
> 
> Hmm, indeed.  And here I thought that END was actually taking care of
> that on an interrupt..

Me too.  But the perlmod manpage says

       An "END" code block is executed as late as possible, that is, after perl has
       finished running the program and just before the interpreter is being exited,
       even if it is exiting as a result of a die() function.  (But not if it's
       morphing into another program via "exec", or being blown out of the water by a
       signal--you have to trap that yourself (if you can).)

So clearly we need to fix it.  I thought it should be as simple as the
attached, since exit() calls END.  (Would it be better to die() instead
of exit()?)
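To illustrate the behaviour described in the perlmod excerpt, here is a minimal, self-contained sketch. It is not the attached patch and does not use Cluster.pm; the names run_child and $cleanup_fh, the "cleanup ran" marker and the exit code 130 are all made up for the demonstration. It forks a child that sleeps, sends it SIGINT, and reports whether the child's END block ran, once with a handler that calls exit() and once with the default signal disposition.

```perl
use strict;
use warnings;
use IO::Handle;

# Set only in the forked child; the parent leaves it undefined so that
# its own END block does nothing.  (Hypothetical name.)
my $cleanup_fh;

END
{
    if (defined $cleanup_fh)
    {
        my $status = $?;    # keep the exit code chosen by exit()
        print $cleanup_fh "cleanup ran\n";
        close $cleanup_fh;
        $? = $status;
    }
}

# Fork a child, interrupt it with SIGINT, and report what happened.
# (Hypothetical helper, for illustration only.)
sub run_child
{
    my ($install_handler) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0)
    {
        close $reader;
        $writer->autoflush(1);
        $cleanup_fh = $writer;

        # With the handler, SIGINT becomes an ordinary exit(), which runs
        # END.  Without it, the default action kills the process outright.
        $SIG{INT} = $install_handler ? sub { exit 130; } : 'DEFAULT';

        print $writer "ready\n";    # tell the parent the handler is in place
        sleep 60;
        exit 0;
    }

    close $writer;
    my $ready = <$reader>;          # wait until the child is ready
    kill 'INT', $pid;
    waitpid($pid, 0);
    my $wait_status = $?;
    my @rest = <$reader>;           # whatever the child's END block wrote
    close $reader;

    return (
        cleanup_ran => scalar(grep { /cleanup ran/ } @rest) ? 1 : 0,
        exit_code   => $wait_status >> 8,
        signal      => $wait_status & 127,
    );
}

my %with    = run_child(1);
my %without = run_child(0);
printf "with handler:    cleanup=%d exit=%d signal=%d\n",
  @with{qw(cleanup_ran exit_code signal)};
printf "without handler: cleanup=%d exit=%d signal=%d\n",
  @without{qw(cleanup_ran exit_code signal)};
```

The sketch uses exit() rather than die() in the handler. One consideration for that choice: a die() raised from a signal handler is an ordinary exception, so an enclosing eval in the test script could catch it and carry on, whereas exit() proceeds to the END blocks regardless.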

But on testing, some nodes linger after being sent a shutdown signal.
I'm not clear why this is -- I think it's due to the fact that we send
the signal just as the node is starting up, which means the signal
doesn't reach the process.  (I added the 0002 patch --not for commit--
to see which Clusters were being shut down and in the trace file I can
clearly see that the nodes that linger were definitely subject to
->teardown_node).


Another funny thing: after hitting Ctrl-C during one run, I got this lingering process:

alvherre  800868 98.2  0.0  12144  5052 pts/9    R    11:03   0:26 /pgsql/install/master/bin/psql -X -c BASE_BACKUP (CHECKPOINT'fast', MAX_RATE 32); -c SELECT pg_backup_stop() -d port=54380 host=/tmp/O_2PPNj9Fg dbname='postgres' replication=database

This is probably a bug in psql.  Backtrace is:

#0  PQclear (res=<optimized out>) at /pgsql/source/master/src/interfaces/libpq/fe-exec.c:748
#1  PQclear (res=res@entry=0x55ad308c6190) at /pgsql/source/master/src/interfaces/libpq/fe-exec.c:718
#2  0x000055ad2f303323 in ClearOrSaveResult (result=0x55ad308c6190) at /pgsql/source/master/src/bin/psql/common.c:472
#3  ClearOrSaveAllResults () at /pgsql/source/master/src/bin/psql/common.c:488
#4  ExecQueryAndProcessResults (query=query@entry=0x55ad308bc7a0 "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);",
    elapsed_msec=elapsed_msec@entry=0x7fff9c9941d8, svpt_gone_p=svpt_gone_p@entry=0x7fff9c9941d7, is_watch=is_watch@entry=false,
    opt=opt@entry=0x0, printQueryFout=printQueryFout@entry=0x0) at /pgsql/source/master/src/bin/psql/common.c:1608
#5  0x000055ad2f301b9d in SendQuery (query=0x55ad308bc7a0 "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);")
    at /pgsql/source/master/src/bin/psql/common.c:1172
#6  0x000055ad2f2f7bd9 in main (argc=<optimized out>, argv=<optimized out>) at /pgsql/source/master/src/bin/psql/startup.c:384


-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep."                (Robert Davidson)
               http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php

