Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc. - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
Date
Msg-id CAB7nPqQN6RK=qjzHZ2na0Zd7q6q4YQ9mEdfveRXRKv8B+Ms_Ww@mail.gmail.com
In response to Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.  (Amir Rohan <amir.rohan@zoho.com>)
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Fri, Oct 9, 2015 at 8:53 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Oct 9, 2015 at 8:47 PM, Amir Rohan wrote:
>> Ok, I've put myself down as reviewer in cfapp. I don't think I can
>> provide any more useful feedback that would actually result in changes
>> at this point, but I'll read through the entire discussion once last
>> time and write down final comments/notes. After that I have no problem
>> marking this for a committer to look at.
>
> OK. If you have any comments or remarks, please do not hesitate at all!

So, to let everybody know about the issue: Amir has reported to me
off-list a bug in one of the tests that is easier to reproduce on a
slow machine:

> Amir wrote:
> Before posting the summary, I ran the latest v8 patch on today's git
> master (9c42727) and got some errors:
> t/004_timeline_switch.pl ...
> 1..1
> # ERROR:  invalid input syntax for type pg_lsn: ""
> # LINE 1: SELECT ''::pg_lsn <= pg_last_xlog_replay_location()
> #                ^
> # No tests run!

And here is my reply:
This is a timing issue. It can happen when standby1, the promoted
standby that standby2 reconnects to in order to check that recovery
works across a timeline jump, is still in recovery after being
restarted. There is a small window where this is possible, and it is
easier to reproduce on slow machines (I did so on a VM). So the issue
was in test 004. I have updated the script to check pg_is_in_recovery()
to be sure that the node has exited recovery before querying it with
pg_current_xlog_location().
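
For readers who want to see the shape of that check, here is a minimal
sketch of the polling idea. It is written in Python purely for
illustration (the actual tests are Perl), and it is not the code from the
attached patch. poll_until, FakeNode and the sample LSN value are all
hypothetical names and data; FakeNode only imitates a node that stays in
recovery for a few polls, assuming that a failed
pg_current_xlog_location() call surfaces as empty output.

```python
import time
from typing import Callable


def poll_until(run_query: Callable[[str], str], sql: str, expected: str,
               attempts: int = 90, delay: float = 1.0) -> bool:
    """Re-run sql until its trimmed output equals expected, or give up."""
    for _ in range(attempts):
        if run_query(sql).strip() == expected:
            return True
        time.sleep(delay)
    return False


class FakeNode:
    """Hypothetical stand-in for a promoted standby still in recovery."""

    def __init__(self, polls_in_recovery: int) -> None:
        self.remaining = polls_in_recovery

    def query(self, sql: str) -> str:
        if sql == "SELECT pg_is_in_recovery()":
            if self.remaining > 0:
                self.remaining -= 1
                return "t"
            return "f"
        if sql == "SELECT pg_current_xlog_location()":
            # During recovery the real function fails, so the caller
            # sees empty output instead of an LSN (made-up value below).
            return "" if self.remaining > 0 else "0/3000060"
        raise ValueError(f"unexpected query: {sql}")


node = FakeNode(polls_in_recovery=3)
# Wait for the node to leave recovery before asking for its location.
left_recovery = poll_until(node.query, "SELECT pg_is_in_recovery()", "f",
                           delay=0.0)
lsn = node.query("SELECT pg_current_xlog_location()") if left_recovery else ""
```

Without the wait, the location query can return an empty string, which
is what then gets interpolated into the ''::pg_lsn comparison shown in
the error above.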

It is worth noting that the following change has saved me a lot of pain:
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -259,6 +259,7 @@ sub psql
        my ($stdout, $stderr);
        print("# Running SQL command: $sql\n");
        run [ 'psql', '-X', '-A', '-t', '-q', '-d', $dbname, '-f',
'-'], '<', \$sql, '>', \$stdout, '2>', \$stderr or die;
+       print "# Error output: $stderr\n" if $stderr ne "";
Perhaps we should consider backpatching it; it helped me track down the
issue I faced.
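
The same idea, sketched outside of TestLib.pm: capture the child
process's stderr and echo it whenever it is non-empty, so that a failure
inside the command is visible in the test log. This is a Python
illustration only, not the Perl helper itself; run_and_report is a
hypothetical name, and the child command below is a stand-in that simply
writes to stderr instead of invoking psql.

```python
import subprocess
import sys
from typing import Sequence, Tuple


def run_and_report(cmd: Sequence[str], stdin_text: str = "") -> Tuple[str, str]:
    """Run cmd with stdin_text on stdin and echo stderr if there is any."""
    result = subprocess.run(list(cmd), input=stdin_text,
                            capture_output=True, text=True)
    if result.stderr != "":
        # Mirrors the added line in the diff: surface the error output.
        print(f"# Error output: {result.stderr}")
    return result.stdout.strip(), result.stderr


# Stand-in child: prints nothing useful on stdout, complains on stderr.
child = [sys.executable, "-c",
         "import sys; sys.stderr.write('ERROR: boom\\n'); print('')"]
out, err = run_and_report(child)
```

Here out is empty, which on its own looks like a valid but blank result;
only the echoed stderr reveals that the command actually failed.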

Attached is an updated patch fixing 004.
Regards,
--
Michael
