Thread: [PATCH] big test separation POC

[PATCH] big test separation POC

From
Fabien COELHO
Date:
Dear hackers,

Per various discussion about the potential impact of Robins non regression 
tests, here is a poc patch to separate big tests from others.

"paralle_schedule" holds usual tests, "big_schedule" holds big tests.

The makefile derives serial_schedule, parallel_big_schedule and 
serial_big_schedule from the above descriptions.

Then: - make check in regress does usual tests - make bigcheck in regress does both usual and big tests

I'm not sure about what happens under vpath.

I have no opinion about what should be considered a big test.

-- 
Fabien.

Re: [PATCH] big test separation POC

From
Fabien COELHO
Date:
Note about the POC patch limitations/questions:
 - is deriving a schedule with a piece of shell okay?   or should perl/python/whatever scripting be better?
 - the big_schedule is assumed "sequential", i.e. one test per line.   maybe it could/should be parallel?
 - I'm not sure of the "parallel_schedule" and "big_schedule"   file names are the best possible choices.
 - I'm really not sure about VPATH stuff.
 - I do not understand why the makefile specifies $(srcdir) before   local files in some places.
 - should the "bigcheck" target be accessible from the project root?   that is should "make bigcheck" from ../../..
work?
 - the documentation is not updated, I guess something should be done   somewhere.

-- 
Fabien.



Re: [PATCH] big test separation POC

From
Andrew Dunstan
Date:
On 06/30/2013 02:54 PM, Fabien COELHO wrote:
>
> Note about the POC patch limitations/questions:
>
>  - is deriving a schedule with a piece of shell okay?
>    or should perl/python/whatever scripting be better?


I would think all we need are the results, i.e. the schedule files, plus 
some Makefile entries for them.


>
>  - the big_schedule is assumed "sequential", i.e. one test per line.
>    maybe it could/should be parallel?
>
>  - I'm not sure of the "parallel_schedule" and "big_schedule"
>    file names are the best possible choices.
>
>  - I'm really not sure about VPATH stuff.


This should be totally transparent to VPATH builds.


>
>  - I do not understand why the makefile specifies $(srcdir) before
>    local files in some places.
>

For VPATH builds :-)

>  - should the "bigcheck" target be accessible from the project root?
>    that is should "make bigcheck" from ../../.. work?
>


Yes, possibly, but it's not terribly important (for example, the 
buildfarm does "cd src/test/regress && make <testname>")

cheers

andrew






Re: [PATCH] big test separation POC

From
Fabien COELHO
Date:
>> Note about the POC patch limitations/questions:
>>
>>  - is deriving a schedule with a piece of shell okay?
>>    or should perl/python/whatever scripting be better?
>
> I would think all we need are the results, i.e. the schedule files, plus 
> some Makefile entries for them.

You can replicate data, but maintaining a set of files consistently looks 
like a bad idea to me, because it means that you have to update all 
replicated data for all changes. The current status is that there are two 
files, parallel & sequential, so it is not too bad. With big tests that 
could be 4, so it seems reasonnable to have at least some automatic 
derivation.

>>  - I'm really not sure about VPATH stuff.
>
> This should be totally transparent to VPATH builds.

Sure:-) It means that I have not tested that, so it may or may not work.

>>  - I do not understand why the makefile specifies $(srcdir) before
>>    local files in some places.
>
> For VPATH builds :-)

Hmmm. That is not what I call "transparent":-) So I understand that 
derived files should not have them, because they would be put in the build 
tree instead of the source tree.

-- 
Fabien.



Re: [PATCH] big test separation POC

From
Fabien COELHO
Date:
>>  - I do not understand why the makefile specifies $(srcdir) before
>>    local files in some places.
>
> For VPATH builds :-)

Here is a v2 which is more likely to work under VPATH.

-- 
Fabien.

Re: [PATCH] big test separation POC

From
Samrat Revagade
Date:
Hi Fabien,

On Mon, Jul 1, 2013 at 10:42 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

 - I do not understand why the makefile specifies $(srcdir) before
   local files in some places.

For VPATH builds :-)

Here is a v2 which is more likely to work under VPATH.

I really appreciate your efforts. I am reviewing your patch.

While testing patch, I found that make installcheck breaks with your patch and gives following error:

============== running regression test queries        ==============
pg_regress: could not open file "./serial_schedule" for reading: No such file or directory

looks like you forgot to add entry for serial_schedule.
 
Following sequence of commands will reproduces this error:
1)  ./configure
2) make
3) make install
4) initialize the database cluster (initdb)
5) start the server 
6) make installcheck

I will post more review comments if there are any.


-- 
Regards,

Samrat Revgade



Re: [PATCH] big test separation POC

From
Fabien COELHO
Date:
> While testing patch, I found that make installcheck breaks with your patch
> and gives following error:

Indeed, I did not put the dependency for that target, I really tested 
"check" & "bigcheck". The attached patch adds the needed dependency for 
installcheck, and I could run it. I checked that no other target seems to 
be missing a dependency...

-- 
Fabien.

Re: [PATCH] big test separation POC

From
Fabien COELHO
Date:
>> Here is a v2 which is more likely to work under VPATH.

Here is a POC v4 which relies on multiple --schedule instead of creating 
concatenated schedule files.

-- 
Fabien.

Re: [PATCH] big test separation POC

From
Andres Freund
Date:
On 2013-07-03 21:07:03 +0200, Fabien COELHO wrote:
> 
> >>Here is a v2 which is more likely to work under VPATH.
> 
> Here is a POC v4 which relies on multiple --schedule instead of creating
> concatenated schedule files.
> 
> -- 
> Fabien.

> diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
> index d5935b6..8a39f7d 100644
> --- a/src/test/regress/GNUmakefile
> +++ b/src/test/regress/GNUmakefile
> @@ -86,7 +86,7 @@ regress_data_files = \
>      $(wildcard $(srcdir)/output/*.source) \
>      $(filter-out $(addprefix $(srcdir)/,$(input_files)),$(wildcard $(srcdir)/sql/*.sql)) \
>      $(wildcard $(srcdir)/data/*.data) \
> -    $(srcdir)/parallel_schedule $(srcdir)/serial_schedule $(srcdir)/resultmap
> +    $(srcdir)/parallel_schedule $(srcdir)/parallel_big_schedule $(srcdir)/resultmap
>  
>  install-tests: all install install-lib installdirs-tests
>      $(MAKE) -C $(top_builddir)/contrib/spi install
> @@ -137,19 +137,43 @@ tablespace-setup:
>  ## Run tests
>  ##
>  
> +# installcheck vs check:
> +# - whether test is run against installed or compiled version
> +# test schedules: parallel, parallel_big, standby
> +# serial schedules can be derived from parallel schedules
> +
> +derived_schedules = serial_schedule serial_big_schedule
> +
> +serial_%: parallel_%
> +    echo "# this file is generated automatically, do not edit!" > $@
> +    egrep '^(test|ignore):' $< | \
> +    while read op list ; do \
> +      for test in $$list ; do \
> +        echo "$$op $$test" ; \
> +      done ; \
> +    done >> $@
> +

This won't work on windows all that easily. Maybe we should instead add
a "--run-serially" parameter to pg_regress?

> -installcheck: all tablespace-setup
> -    $(pg_regress_installcheck) $(REGRESS_OPTS) --schedule=$(srcdir)/serial_schedule $(EXTRA_TESTS)
> +# after installation, serial
> +installcheck: all tablespace-setup serial_schedule
> +    $(pg_regress_installcheck) $(REGRESS_OPTS) \
> +      --schedule=serial_schedule $(EXTRA_TESTS)

Why do we use the serial schedule for installcheck anyway? Just because
of max_connections?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PATCH] big test separation POC

From
Fabien COELHO
Date:
>> +serial_%: parallel_%
>> +    echo "# this file is generated automatically, do not edit!" > $@
>> +    egrep '^(test|ignore):' $< | \
>> +    while read op list ; do \
>> +      for test in $$list ; do \
>> +        echo "$$op $$test" ; \
>> +      done ; \
>> +    done >> $@
>> +
>
> This won't work on windows all that easily.

Hmmm. I made the assumption that a system with "gnu make" would also have 
a shell and basic unix commands available, but it is possibly quite naive.

ISTM that there are perl scripts used elsewhere for building postgresql. 
Would assuming that perl is available be admissible?

> Maybe we should instead add a "--run-serially" parameter to pg_regress?

I guess that it is quite possible and easy to do. I'm not sure whether it 
is desirable.

>> -installcheck: all tablespace-setup
>> -    $(pg_regress_installcheck) $(REGRESS_OPTS) --schedule=$(srcdir)/serial_schedule $(EXTRA_TESTS)
>> +# after installation, serial
>> +installcheck: all tablespace-setup serial_schedule
>> +    $(pg_regress_installcheck) $(REGRESS_OPTS) \
>> +      --schedule=serial_schedule $(EXTRA_TESTS)
>
> Why do we use the serial schedule for installcheck anyway? Just because
> of max_connections?

I asked myself the same question, and found no obvious answer. Maybe if 
a check is run against an installed version, it is assumed that postgresql 
is being used so the database should not be put under heavy load?

-- 
Fabien.



Re: [PATCH] big test separation POC

From
Samrat Revagade
Date:
Hi Fabien,

While applying latest version of the patch  (regress-big-v4.patch) on
latest PostgreSQL version i encountered following errors:

a) Using git:
$git apply --index regress-big-v4.patch

regress-big-v4.patch:10: trailing whitespace.
$(srcdir)/parallel_schedule $(srcdir)/parallel_big_schedule $(srcdir)/resultmap
regress-big-v4.patch:18: trailing whitespace.
# installcheck vs check:
regress-big-v4.patch:19: trailing whitespace.
# - whether test is run against installed or compiled version
regress-big-v4.patch:20: trailing whitespace.
# test schedules: parallel, parallel_big, standby
regress-big-v4.patch:21: trailing whitespace.
# serial schedules can be derived from parallel schedules
fatal: git apply: bad git-diff - expected /dev/null on line 97


b) Using patch:

$patch -d. -p1 < regress-big-v4.patch

(Stripping trailing CRs from patch.)
patching file src/test/regress/GNUmakefile
(Stripping trailing CRs from patch.)
patching file src/test/regress/parallel_big_schedule
(Stripping trailing CRs from patch.)
patching file src/test/regress/serial_schedule
Reversed (or previously applied) patch detected!  Assume -R? [n]

Is that a problem ?



Re: [PATCH] big test separation POC

From
Fabien COELHO
Date:
> While applying latest version of the patch  (regress-big-v4.patch) on
> latest PostgreSQL version i encountered following errors: [...]

> Is that a problem ?

Yes and no:-)

My understanding is that there is a conflict because of commits between 
this patch and head: a file that this patch deletes (it is derived by make 
rules) has been updated. It seems that git is not too good at detecting 
this and providing a meaningful message.

Please find attached an updated version which seemed to work for me.

Note that this is really a POC. How to derive a file is under discussion: 
it has been suggested that the unix shell approach would not work on 
Windows. I've suggested perl or python (which version?) but I'm not sure 
that it is okay either.

-- 
Fabien.

Re: [PATCH] big test separation POC

From
Alvaro Herrera
Date:
Fabien COELHO escribió:

> Note that this is really a POC. How to derive a file is under
> discussion: it has been suggested that the unix shell approach would
> not work on Windows. I've suggested perl or python (which version?)
> but I'm not sure that it is okay either.

The other option, suggested by Andres somewhere, is to have a new
parameter to pg_regress, something like --run-serially.  So you would
use the same parallel schedule file, but serially instead of following
the parallel specification.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] big test separation POC

From
Josh Berkus
Date:
On 07/11/2013 09:19 AM, Alvaro Herrera wrote:
> Fabien COELHO escribió:
>
>> Note that this is really a POC. How to derive a file is under
>> discussion: it has been suggested that the unix shell approach would
>> not work on Windows. I've suggested perl or python (which version?)
>> but I'm not sure that it is okay either.
>
> The other option, suggested by Andres somewhere, is to have a new
> parameter to pg_regress, something like --run-serially.  So you would
> use the same parallel schedule file, but serially instead of following
> the parallel specification.

Ok, this sounds like it needs a *lot* of discussion before it's a patch.Marking "returned with feedback", and we'll
discussit until September 
(or beyond).

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [PATCH] big test separation POC

From
Fabien COELHO
Date:
> The other option, suggested by Andres somewhere, is to have a new
> parameter to pg_regress, something like --run-serially.

After looking at the source, ISTM that this option already exists under a 
different signature:
        --max-connections 1

> So you would use the same parallel schedule file, but serially instead 
> of following the parallel specification.

Yep. And there is nothing to do, which is even better:-)

-- 
Fabien.