Thread: Bogus cleanup code in PostgresNode.pm

Bogus cleanup code in PostgresNode.pm

From
Tom Lane
Date:
I noticed that even when they are successful, buildfarm members bowerbird
and jacana tend to spew a lot of messages like this in their bin-check
steps:

Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/global:
Directorynot empty at /usr/lib/perl5/5.8/File/Temp.pm line 898 
Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/pg_xlog:
Directorynot empty at /usr/lib/perl5/5.8/File/Temp.pm line 898 
Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata:
Permissiondenied at /usr/lib/perl5/5.8/File/Temp.pm line 898 
Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf: Directory not
emptyat /usr/lib/perl5/5.8/File/Temp.pm line 898 
### Signalling QUIT to 9156 for node "main"
# Running: pg_ctl kill QUIT 9156

What is happening here is that the test script is not bothering to do an
explicit $node->stop operation, and if it doesn't, the automatic cleanup
steps happen in the wrong order: the File::Temp destructor for the temp
data directory runs before PostgresNode.pm's DESTROY function, which is
what's issuing the "pg_ctl kill" command.  On Unix that's just messy,
but on Windows it fails because you can't delete a process's working
directory.  I am not sure whether this is guaranteed wrong or just
sometimes wrong; the Perl docs I can find say that destructors are run in
unspecified order once interpreter shutdown begins.  But by adding some
debug printout I was able to verify on my own machine that the data
directory was already gone when DESTROY runs.

I believe we can fix this by forcing postmaster shutdown in an END
routine instead of a DESTROY routine, and hence propose the attached
patch, which does things in the right order for me.  I'm a pretty
poor Perl programmer, so I'd appreciate somebody vetting this.

            regards, tom lane

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index cd2e974..120f3f2 100644
*** a/src/test/perl/PostgresNode.pm
--- b/src/test/perl/PostgresNode.pm
*************** sub stop
*** 662,667 ****
--- 662,668 ----
      my $pgdata = $self->data_dir;
      my $name   = $self->name;
      $mode = 'fast' unless defined $mode;
+     return unless defined $self->{_pid};
      print "### Stopping node \"$name\" using mode $mode\n";
      TestLib::system_log('pg_ctl', '-D', $pgdata, '-m', $mode, 'stop');
      $self->{_pid} = undef;
*************** sub get_new_node
*** 883,896 ****
      return $node;
  }

! # Attempt automatic cleanup
! sub DESTROY
  {
!     my $self = shift;
!     my $name = $self->name;
!     return unless defined $self->{_pid};
!     print "### Signalling QUIT to $self->{_pid} for node \"$name\"\n";
!     TestLib::system_log('pg_ctl', 'kill', 'QUIT', $self->{_pid});
  }

  =pod
--- 884,896 ----
      return $node;
  }

! # Attempt automatic cleanup of all created nodes
! sub END
  {
!     foreach my $node (@all_nodes)
!     {
!         $node->teardown_node;
!     }
  }

  =pod

Re: Bogus cleanup code in PostgresNode.pm

From
Michael Paquier
Date:
On Mon, Apr 25, 2016 at 11:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I noticed that even when they are successful, buildfarm members bowerbird
> and jacana tend to spew a lot of messages like this in their bin-check
> steps:
>
> Can't remove directory
/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/global:Directory not empty at
/usr/lib/perl5/5.8/File/Temp.pmline 898
 
> Can't remove directory
/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/pg_xlog:Directory not empty at
/usr/lib/perl5/5.8/File/Temp.pmline 898
 
> Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata:
Permissiondenied at /usr/lib/perl5/5.8/File/Temp.pm line 898
 
> Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf: Directory
notempty at /usr/lib/perl5/5.8/File/Temp.pm line 898
 
> ### Signalling QUIT to 9156 for node "main"
> # Running: pg_ctl kill QUIT 9156
>
> What is happening here is that the test script is not bothering to do an
> explicit $node->stop operation, and if it doesn't, the automatic cleanup
> steps happen in the wrong order: the File::Temp destructor for the temp
> data directory runs before PostgresNode.pm's DESTROY function, which is
> what's issuing the "pg_ctl kill" command.  On Unix that's just messy,
> but on Windows it fails because you can't delete a process's working
> directory.  I am not sure whether this is guaranteed wrong or just
> sometimes wrong; the Perl docs I can find say that destructors are run in
> unspecified order once interpreter shutdown begins.  But by adding some
> debug printout I was able to verify on my own machine that the data
> directory was already gone when DESTROY runs.

The docs say regarding File::Temp that he object is removed once the
object goes out of scope in the parent:
http://search.cpan.org/~dagolden/File-Temp-0.2304/lib/File/Temp.pm
So basically it means that when we enter in PostgresNode's DESTROY the
temporary folder just "went out of scope" and has been removed?

DESTROY is run once per object, END is a global destructor, and END is
called really at the end of the execution. And actually one reason why
a DESTROY block instead of END is given by Alvaro here:
http://www.postgresql.org/message-id/20151201231121.GI2763@alvherre.pgsql
"
- I changed start/stop/restart so that they keep track of the postmaster PID; also added a DESTROY sub to PostgresNode
thatsends SIGQUIT. This means that when the test finishes, the server gets an immediate stop signal.  We were getting a
lotof errors in the server log about failing to write to the stats file otherwise, until the node noticed that the
datadirwas gone.
 
"

> I believe we can fix this by forcing postmaster shutdown in an END
> routine instead of a DESTROY routine, and hence propose the attached
> patch, which does things in the right order for me.  I'm a pretty
> poor Perl programmer, so I'd appreciate somebody vetting this.

Another, perhaps more solid approach, would be put the DESTROY method
in charge of removing PGDATA and extend TestLib::tempdir with an
argument to be able to switch to CLEANUP => 0 at will. Then we use
this argument for PGDATA after sending SIGQUIT.
-- 
Michael



Re: Bogus cleanup code in PostgresNode.pm

From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Mon, Apr 25, 2016 at 11:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I believe we can fix this by forcing postmaster shutdown in an END
>> routine instead of a DESTROY routine, and hence propose the attached
>> patch, which does things in the right order for me.  I'm a pretty
>> poor Perl programmer, so I'd appreciate somebody vetting this.

> Another, perhaps more solid approach, would be put the DESTROY method
> in charge of removing PGDATA and extend TestLib::tempdir with an
> argument to be able to switch to CLEANUP => 0 at will. Then we use
> this argument for PGDATA after sending SIGQUIT.

Bearing in mind that I'm not a Perl expert --- this bothers me because
of what I read about the order of global destructor calls being
unspecified.  See http://perldoc.perl.org/perlobj.html#Destructors
specifically:
 When the last reference to an object goes away, the object is destroyed. If you only have one reference to an object
storedin a lexical scalar, the object is destroyed when that scalar goes out of scope. If you store the object in a
packageglobal, that object may not go out of scope until the program exits.
 

(the last sentence being the one that applies here) and
 The order in which objects are destroyed during the global destruction before the program exits is unpredictable.

I do not think it's a good idea to have destructors with
externally-visible effects happening in an undefined order.  The present
bug is exactly because those things are happening in the "wrong" order.
Doubling down on using the DESTROY method won't make that better.

Now, whether using END is really an improvement is a separate question.
I have the impression that END calls happen in a better-defined order,
but I'm not a perl monk.
        regards, tom lane



Re: Bogus cleanup code in PostgresNode.pm

From
Michael Paquier
Date:
On Tue, Apr 26, 2016 at 2:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now, whether using END is really an improvement is a separate question.
> I have the impression that END calls happen in a better-defined order,
> but I'm not a perl monk.

For the archive's sake, 08af9219 is the result commit.
-- 
Michael



Re: Bogus cleanup code in PostgresNode.pm

From
Alvaro Herrera
Date:
Michael Paquier wrote:
> On Tue, Apr 26, 2016 at 2:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Now, whether using END is really an improvement is a separate question.
> > I have the impression that END calls happen in a better-defined order,
> > but I'm not a perl monk.
> 
> For the archive's sake, 08af9219 is the result commit.

Thanks!

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services