Thread: [HACKERS] WIP patch for avoiding duplicate initdb runs during "make check"

Yesterday I spent a bit of time on an idea that we've talked about
before, which is to not run initdb over and over again in contexts like
"make check-world", or even just during "make check" in contrib or in the
recovery or subscription tests.  The idea would be to do it once and
then copy the created data directory tree into place for each test
run.  Thus we might trade this:

$ time initdb -D $PGDATA -N
...
real    0m1.235s
user    0m0.896s
sys     0m0.341s

for this:

$ time cp -a $PGDATA junk
real    0m0.146s
user    0m0.008s
sys     0m0.136s

The attached patch is far from being committable, but it's complete enough
to get an idea of what sort of overall performance gain we might expect.
On my workstation, I've lately been doing check-world like this:
    time make -s check-world -j4 PROVE_FLAGS='-j4'
and what I see is that HEAD takes about this long:
    real    2m1.514s
    user    2m8.113s
    sys     1m15.993s
and with this patch I get
    real    1m42.549s
    user    0m42.888s
    sys     0m46.982s
(These are median-of-3-runs; the real time bounces around a fair bit,
the CPU times not very much.)

On my laptop, a simple non-parallelized "time make check-world" takes
    real    9m12.813s
    user    2m15.113s
    sys     1m25.631s
and with patch
    real    8m3.785s
    user    1m19.024s
    sys     1m13.707s

So those are respectable gains, but not earth-shattering.  I'm not
sure if it's worth putting more work into it right now.  The main
thing that's needed to make it committable is to get rid of the
system("cp -a ...") call in favor of something more portable.
I looked longingly at our existing copydir() function but it seems
pretty tied to the backend environment.  It wouldn't be hard to
make a version for frontend, but I'm not going to expend that work
right now unless people are excited enough to want to commit this
now rather than later (or not at all).

(Other unfinished work: teaching the MSVC scripts to use this,
and teaching pg_upgrade's test script to use it.)

            regards, tom lane

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index dc8a89a..a28a03a 100644
*** a/src/Makefile.global.in
--- b/src/Makefile.global.in
*************** ifeq ($(MAKELEVEL),0)
*** 332,337 ****
--- 332,338 ----
      rm -rf '$(abs_top_builddir)'/tmp_install
      $(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
      $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install
>'$(abs_top_builddir)'/tmp_install/log/install.log2>&1 
+     '$(abs_top_builddir)/tmp_install$(bindir)/initdb' -D '$(abs_top_builddir)/tmp_install/proto_pgdata' --no-clean
--no-sync-A trust >'$(abs_top_builddir)'/tmp_install/log/initdb.log 2>&1 
  endif
      $(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra
DESTDIR='$(abs_top_builddir)'/tmp_installinstall >>'$(abs_top_builddir)'/tmp_install/log/install.log || exit; done) 
  endif
*************** $(if $(filter $(PORTNAME),darwin),DYLD_L
*** 355,361 ****
  endef

  define with_temp_install
! PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" $(call
add_to_path,$(ld_library_path_var),$(abs_top_builddir)/tmp_install$(libdir))
  endef

  ifeq ($(enable_tap_tests),yes)
--- 356,362 ----
  endef

  define with_temp_install
! PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" $(call
add_to_path,$(ld_library_path_var),$(abs_top_builddir)/tmp_install$(libdir))
PG_PROTO_DATADIR='$(abs_top_builddir)/tmp_install/proto_pgdata'
  endef

  ifeq ($(enable_tap_tests),yes)
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index f4fa600..22c1a72 100644
*** a/src/test/perl/PostgresNode.pm
--- b/src/test/perl/PostgresNode.pm
*************** sub init
*** 404,411 ****
      mkdir $self->backup_dir;
      mkdir $self->archive_dir;

!     TestLib::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N',
!         @{ $params{extra} });
      TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata);

      open my $conf, '>>', "$pgdata/postgresql.conf";
--- 404,426 ----
      mkdir $self->backup_dir;
      mkdir $self->archive_dir;

!     # If we're using default initdb parameters, and the top-level "make check"
!     # created a prototype data directory for us, just copy that.
!     if (!defined($params{extra}) &&
!         defined($ENV{PG_PROTO_DATADIR}) &&
!         -d $ENV{PG_PROTO_DATADIR})
!     {
!         rmdir($pgdata);
!         RecursiveCopy::copypath($ENV{PG_PROTO_DATADIR}, $pgdata);
!         chmod(0700, $pgdata);
!     }
!     else
!     {
!         TestLib::system_or_bail('initdb', '-D', $pgdata,
!                                 '--no-clean', '--no-sync', '-A', 'trust',
!                                 @{ $params{extra} });
!     }
!
      TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata);

      open my $conf, '>>', "$pgdata/postgresql.conf";
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index abb742b..04d7fb9 100644
*** a/src/test/regress/pg_regress.c
--- b/src/test/regress/pg_regress.c
*************** regression_main(int argc, char *argv[],
*** 2214,2219 ****
--- 2214,2221 ----

      if (temp_instance)
      {
+         char       *pg_proto_datadir;
+         struct stat ppst;
          FILE       *pg_conf;
          const char *env_wait;
          int            wait_seconds;
*************** regression_main(int argc, char *argv[],
*** 2243,2262 ****
          if (!directory_exists(buf))
              make_directory(buf);

!         /* initdb */
!         header(_("initializing database system"));
!         snprintf(buf, sizeof(buf),
!                  "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1",
!                  bindir ? bindir : "",
!                  bindir ? "/" : "",
!                  temp_instance,
!                  debug ? " --debug" : "",
!                  nolocale ? " --no-locale" : "",
!                  outputdir);
!         if (system(buf))
          {
!             fprintf(stderr, _("\n%s: initdb failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"),
progname,outputdir, buf); 
!             exit(2);
          }

          /*
--- 2245,2287 ----
          if (!directory_exists(buf))
              make_directory(buf);

!         /* see if we should run initdb or just copy a prototype datadir */
!         pg_proto_datadir = getenv("PG_PROTO_DATADIR");
!         if (!nolocale &&
!             pg_proto_datadir &&
!             stat(pg_proto_datadir, &ppst) == 0 &&
!             S_ISDIR(ppst.st_mode))
          {
!             /* copy prototype */
!             header(_("copying prototype data directory"));
!             snprintf(buf, sizeof(buf),
!                      "cp -a \"%s\" \"%s/data\" > \"%s/log/initdb.log\" 2>&1",
!                      pg_proto_datadir,
!                      temp_instance,
!                      outputdir);
!             if (system(buf))
!             {
!                 fprintf(stderr, _("\n%s: cp failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"),
progname,outputdir, buf); 
!                 exit(2);
!             }
!         }
!         else
!         {
!             /* run initdb */
!             header(_("initializing database system"));
!             snprintf(buf, sizeof(buf),
!                      "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1",
!                      bindir ? bindir : "",
!                      bindir ? "/" : "",
!                      temp_instance,
!                      debug ? " --debug" : "",
!                      nolocale ? " --no-locale" : "",
!                      outputdir);
!             if (system(buf))
!             {
!                 fprintf(stderr, _("\n%s: initdb failed\nExamine %s/log/initdb.log for the reason.\nCommand was:
%s\n"),progname, outputdir, buf); 
!                 exit(2);
!             }
          }

          /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On 2 July 2017 at 18:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> system("cp -a ...") call in favor of something more portable.

If we're ok with using Perl there's File::Copy::Recursive::dircopy()
which does exactly that.

-- 
greg



Re: [HACKERS] WIP patch for avoiding duplicate initdb runs during"make check"

From
Michael Paquier
Date:
On Mon, Jul 3, 2017 at 9:25 PM, Greg Stark <stark@mit.edu> wrote:
> On 2 July 2017 at 18:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> system("cp -a ...") call in favor of something more portable.
>
> If we're ok with using Perl there's File::Copy::Recursive::dircopy()
> which does exactly that.

This stuff needs to support perl down to 5.8.0, and that's a reason
behind having src/test/perl/RecursiveCopy.pm. So I would suggest just
to use that. cp is not portable on Windows as well, that's a recipe
for non-portable code there.
-- 
Michael



Re: [HACKERS] WIP patch for avoiding duplicate initdb runs during"make check"

From
Alvaro Herrera
Date:
Tom Lane wrote:

> (Other unfinished work: teaching the MSVC scripts to use this,
> and teaching pg_upgrade's test script to use it.)

Maybe it'd be simpler to rewrite pg_upgrade tests using PostgresNode
instead?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP patch for avoiding duplicate initdb runs during"make check"

From
Michael Paquier
Date:
On Mon, Jul 3, 2017 at 10:02 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Tom Lane wrote:
>
>> (Other unfinished work: teaching the MSVC scripts to use this,
>> and teaching pg_upgrade's test script to use it.)
>
> Maybe it'd be simpler to rewrite pg_upgrade tests using PostgresNode
> instead?

You are looking for this patch:
https://commitfest.postgresql.org/14/1101/
And for this thread:
https://www.postgresql.org/message-id/flat/CAB7nPqRdaN1A1YNjxNL9T1jUEWct8ttqq29dNv8W_o37+e8wfA@mail.gmail.com
-- 
Michael



Michael Paquier <michael.paquier@gmail.com> writes:
> On Mon, Jul 3, 2017 at 9:25 PM, Greg Stark <stark@mit.edu> wrote:
>> On 2 July 2017 at 18:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> system("cp -a ...") call in favor of something more portable.

>> If we're ok with using Perl there's File::Copy::Recursive::dircopy()
>> which does exactly that.

> This stuff needs to support perl down to 5.8.0, and that's a reason
> behind having src/test/perl/RecursiveCopy.pm. So I would suggest just
> to use that. cp is not portable on Windows as well, that's a recipe
> for non-portable code there.

I can't see going this path in pg_regress, because then you would have
exactly zero test functionality in a non-Perl build.  What I had in
mind was a frontend-friendly version of backend/storage/file/copydir.c,
either just dropped into pg_regress.c or put in src/common/.
        regards, tom lane



Re: [HACKERS] WIP patch for avoiding duplicate initdb runs during"make check"

From
Michael Paquier
Date:
On Mon, Jul 3, 2017 at 11:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Mon, Jul 3, 2017 at 9:25 PM, Greg Stark <stark@mit.edu> wrote:
>>> On 2 July 2017 at 18:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> system("cp -a ...") call in favor of something more portable.
>
>>> If we're ok with using Perl there's File::Copy::Recursive::dircopy()
>>> which does exactly that.
>
>> This stuff needs to support perl down to 5.8.0, and that's a reason
>> behind having src/test/perl/RecursiveCopy.pm. So I would suggest just
>> to use that. cp is not portable on Windows as well, that's a recipe
>> for non-portable code there.

This was under the assumption of "if we use perl" :)

> I can't see going this path in pg_regress, because then you would have
> exactly zero test functionality in a non-Perl build.

Indeed, release tarballs don't need perl to work. So that's a no-go.

> What I had in
> mind was a frontend-friendly version of backend/storage/file/copydir.c,
> either just dropped into pg_regress.c or put in src/common/.

+1 for src/common/.
-- 
Michael