Re: Bogus cleanup code in PostgresNode.pm - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Bogus cleanup code in PostgresNode.pm
Date
Msg-id CAB7nPqSo-GpjEHqKi94U-sbsnPrJw4NkMfkS9NphXC+0JCapHQ@mail.gmail.com
Whole thread Raw
In response to Bogus cleanup code in PostgresNode.pm  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Bogus cleanup code in PostgresNode.pm
List pgsql-hackers
On Mon, Apr 25, 2016 at 11:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I noticed that even when they are successful, buildfarm members bowerbird
> and jacana tend to spew a lot of messages like this in their bin-check
> steps:
>
> Can't remove directory
/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/global:Directory not empty at
/usr/lib/perl5/5.8/File/Temp.pmline 898
 
> Can't remove directory
/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/pg_xlog:Directory not empty at
/usr/lib/perl5/5.8/File/Temp.pmline 898
 
> Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata:
Permissiondenied at /usr/lib/perl5/5.8/File/Temp.pm line 898
 
> Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf: Directory
notempty at /usr/lib/perl5/5.8/File/Temp.pm line 898
 
> ### Signalling QUIT to 9156 for node "main"
> # Running: pg_ctl kill QUIT 9156
>
> What is happening here is that the test script is not bothering to do an
> explicit $node->stop operation, and if it doesn't, the automatic cleanup
> steps happen in the wrong order: the File::Temp destructor for the temp
> data directory runs before PostgresNode.pm's DESTROY function, which is
> what's issuing the "pg_ctl kill" command.  On Unix that's just messy,
> but on Windows it fails because you can't delete a process's working
> directory.  I am not sure whether this is guaranteed wrong or just
> sometimes wrong; the Perl docs I can find say that destructors are run in
> unspecified order once interpreter shutdown begins.  But by adding some
> debug printout I was able to verify on my own machine that the data
> directory was already gone when DESTROY runs.

The docs say regarding File::Temp that he object is removed once the
object goes out of scope in the parent:
http://search.cpan.org/~dagolden/File-Temp-0.2304/lib/File/Temp.pm
So basically it means that when we enter in PostgresNode's DESTROY the
temporary folder just "went out of scope" and has been removed?

DESTROY is run once per object, END is a global destructor, and END is
called really at the end of the execution. And actually one reason why
a DESTROY block instead of END is given by Alvaro here:
http://www.postgresql.org/message-id/20151201231121.GI2763@alvherre.pgsql
"
- I changed start/stop/restart so that they keep track of the postmaster PID; also added a DESTROY sub to PostgresNode
thatsends SIGQUIT. This means that when the test finishes, the server gets an immediate stop signal.  We were getting a
lotof errors in the server log about failing to write to the stats file otherwise, until the node noticed that the
datadirwas gone.
 
"

> I believe we can fix this by forcing postmaster shutdown in an END
> routine instead of a DESTROY routine, and hence propose the attached
> patch, which does things in the right order for me.  I'm a pretty
> poor Perl programmer, so I'd appreciate somebody vetting this.

Another, perhaps more solid approach, would be put the DESTROY method
in charge of removing PGDATA and extend TestLib::tempdir with an
argument to be able to switch to CLEANUP => 0 at will. Then we use
this argument for PGDATA after sending SIGQUIT.
-- 
Michael



pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Verifying embedded oids in *recv is a bad idea
Next
From: Amit Kapila
Date:
Subject: Re: Support for N synchronous standby servers - take 2