Thread: [HACKERS] WIP patch for avoiding duplicate initdb runs during "make check"
Yesterday I spent a bit of time on an idea that we've talked about before, which is to not run initdb over and over again in contexts like "make check-world", or even just during "make check" in contrib or in the recovery or subscription tests. The idea would be to do it once and then copy the created data directory tree into place for each test run. Thus we might trade this: $ time initdb -D $PGDATA -N ... real 0m1.235s user 0m0.896s sys 0m0.341s for this: $ time cp -a $PGDATA junk real 0m0.146s user 0m0.008s sys 0m0.136s The attached patch is far from being committable, but it's complete enough to get an idea of what sort of overall performance gain we might expect. On my workstation, I've lately been doing check-world like this: time make -s check-world -j4 PROVE_FLAGS='-j4' and what I see is that HEAD takes about this long: real 2m1.514s user 2m8.113s sys 1m15.993s and with this patch I get real 1m42.549s user 0m42.888s sys 0m46.982s (These are median-of-3-runs; the real time bounces around a fair bit, the CPU times not very much.) On my laptop, a simple non-parallelized "time make check-world" takes real 9m12.813s user 2m15.113s sys 1m25.631s and with patch real 8m3.785s user 1m19.024s sys 1m13.707s So those are respectable gains, but not earth-shattering. I'm not sure if it's worth putting more work into it right now. The main thing that's needed to make it committable is to get rid of the system("cp -a ...") call in favor of something more portable. I looked longingly at our existing copydir() function but it seems pretty tied to the backend environment. It wouldn't be hard to make a version for frontend, but I'm not going to expend that work right now unless people are excited enough to want to commit this now rather than later (or not at all). (Other unfinished work: teaching the MSVC scripts to use this, and teaching pg_upgrade's test script to use it.) regards, tom lane diff --git a/src/Makefile.global.in b/src/Makefile.global.in index dc8a89a..a28a03a 100644 *** a/src/Makefile.global.in --- b/src/Makefile.global.in *************** ifeq ($(MAKELEVEL),0) *** 332,337 **** --- 332,338 ---- rm -rf '$(abs_top_builddir)'/tmp_install $(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log2>&1 + '$(abs_top_builddir)/tmp_install$(bindir)/initdb' -D '$(abs_top_builddir)/tmp_install/proto_pgdata' --no-clean --no-sync-A trust >'$(abs_top_builddir)'/tmp_install/log/initdb.log 2>&1 endif $(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_installinstall >>'$(abs_top_builddir)'/tmp_install/log/install.log || exit; done) endif *************** $(if $(filter $(PORTNAME),darwin),DYLD_L *** 355,361 **** endef define with_temp_install ! PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" $(call add_to_path,$(ld_library_path_var),$(abs_top_builddir)/tmp_install$(libdir)) endef ifeq ($(enable_tap_tests),yes) --- 356,362 ---- endef define with_temp_install ! PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" $(call add_to_path,$(ld_library_path_var),$(abs_top_builddir)/tmp_install$(libdir)) PG_PROTO_DATADIR='$(abs_top_builddir)/tmp_install/proto_pgdata' endef ifeq ($(enable_tap_tests),yes) diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index f4fa600..22c1a72 100644 *** a/src/test/perl/PostgresNode.pm --- b/src/test/perl/PostgresNode.pm *************** sub init *** 404,411 **** mkdir $self->backup_dir; mkdir $self->archive_dir; ! TestLib::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N', ! @{ $params{extra} }); TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata); open my $conf, '>>', "$pgdata/postgresql.conf"; --- 404,426 ---- mkdir $self->backup_dir; mkdir $self->archive_dir; ! # If we're using default initdb parameters, and the top-level "make check" ! # created a prototype data directory for us, just copy that. ! if (!defined($params{extra}) && ! defined($ENV{PG_PROTO_DATADIR}) && ! -d $ENV{PG_PROTO_DATADIR}) ! { ! rmdir($pgdata); ! RecursiveCopy::copypath($ENV{PG_PROTO_DATADIR}, $pgdata); ! chmod(0700, $pgdata); ! } ! else ! { ! TestLib::system_or_bail('initdb', '-D', $pgdata, ! '--no-clean', '--no-sync', '-A', 'trust', ! @{ $params{extra} }); ! } ! TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata); open my $conf, '>>', "$pgdata/postgresql.conf"; diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index abb742b..04d7fb9 100644 *** a/src/test/regress/pg_regress.c --- b/src/test/regress/pg_regress.c *************** regression_main(int argc, char *argv[], *** 2214,2219 **** --- 2214,2221 ---- if (temp_instance) { + char *pg_proto_datadir; + struct stat ppst; FILE *pg_conf; const char *env_wait; int wait_seconds; *************** regression_main(int argc, char *argv[], *** 2243,2262 **** if (!directory_exists(buf)) make_directory(buf); ! /* initdb */ ! header(_("initializing database system")); ! snprintf(buf, sizeof(buf), ! "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1", ! bindir ? bindir : "", ! bindir ? "/" : "", ! temp_instance, ! debug ? " --debug" : "", ! nolocale ? " --no-locale" : "", ! outputdir); ! if (system(buf)) { ! fprintf(stderr, _("\n%s: initdb failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname,outputdir, buf); ! exit(2); } /* --- 2245,2287 ---- if (!directory_exists(buf)) make_directory(buf); ! /* see if we should run initdb or just copy a prototype datadir */ ! pg_proto_datadir = getenv("PG_PROTO_DATADIR"); ! if (!nolocale && ! pg_proto_datadir && ! stat(pg_proto_datadir, &ppst) == 0 && ! S_ISDIR(ppst.st_mode)) { ! /* copy prototype */ ! header(_("copying prototype data directory")); ! snprintf(buf, sizeof(buf), ! "cp -a \"%s\" \"%s/data\" > \"%s/log/initdb.log\" 2>&1", ! pg_proto_datadir, ! temp_instance, ! outputdir); ! if (system(buf)) ! { ! fprintf(stderr, _("\n%s: cp failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname,outputdir, buf); ! exit(2); ! } ! } ! else ! { ! /* run initdb */ ! header(_("initializing database system")); ! snprintf(buf, sizeof(buf), ! "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1", ! bindir ? bindir : "", ! bindir ? "/" : "", ! temp_instance, ! debug ? " --debug" : "", ! nolocale ? " --no-locale" : "", ! outputdir); ! if (system(buf)) ! { ! fprintf(stderr, _("\n%s: initdb failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"),progname, outputdir, buf); ! exit(2); ! } } /* -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for avoiding duplicate initdb runs during "make check"
From
Greg Stark
Date:
On 2 July 2017 at 18:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: > system("cp -a ...") call in favor of something more portable. If we're ok with using Perl there's File::Copy::Recursive::dircopy() which does exactly that. -- greg
Re: [HACKERS] WIP patch for avoiding duplicate initdb runs during"make check"
From
Michael Paquier
Date:
On Mon, Jul 3, 2017 at 9:25 PM, Greg Stark <stark@mit.edu> wrote: > On 2 July 2017 at 18:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> system("cp -a ...") call in favor of something more portable. > > If we're ok with using Perl there's File::Copy::Recursive::dircopy() > which does exactly that. This stuff needs to support perl down to 5.8.0, and that's a reason behind having src/test/perl/RecursiveCopy.pm. So I would suggest just to use that. cp is not portable on Windows as well, that's a recipe for non-portable code there. -- Michael
Re: [HACKERS] WIP patch for avoiding duplicate initdb runs during"make check"
From
Alvaro Herrera
Date:
Tom Lane wrote: > (Other unfinished work: teaching the MSVC scripts to use this, > and teaching pg_upgrade's test script to use it.) Maybe it'd be simpler to rewrite pg_upgrade tests using PostgresNode instead? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WIP patch for avoiding duplicate initdb runs during"make check"
From
Michael Paquier
Date:
On Mon, Jul 3, 2017 at 10:02 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Tom Lane wrote: > >> (Other unfinished work: teaching the MSVC scripts to use this, >> and teaching pg_upgrade's test script to use it.) > > Maybe it'd be simpler to rewrite pg_upgrade tests using PostgresNode > instead? You are looking for this patch: https://commitfest.postgresql.org/14/1101/ And for this thread: https://www.postgresql.org/message-id/flat/CAB7nPqRdaN1A1YNjxNL9T1jUEWct8ttqq29dNv8W_o37+e8wfA@mail.gmail.com -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > On Mon, Jul 3, 2017 at 9:25 PM, Greg Stark <stark@mit.edu> wrote: >> On 2 July 2017 at 18:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> system("cp -a ...") call in favor of something more portable. >> If we're ok with using Perl there's File::Copy::Recursive::dircopy() >> which does exactly that. > This stuff needs to support perl down to 5.8.0, and that's a reason > behind having src/test/perl/RecursiveCopy.pm. So I would suggest just > to use that. cp is not portable on Windows as well, that's a recipe > for non-portable code there. I can't see going this path in pg_regress, because then you would have exactly zero test functionality in a non-Perl build. What I had in mind was a frontend-friendly version of backend/storage/file/copydir.c, either just dropped into pg_regress.c or put in src/common/. regards, tom lane
Re: [HACKERS] WIP patch for avoiding duplicate initdb runs during"make check"
From
Michael Paquier
Date:
On Mon, Jul 3, 2017 at 11:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> On Mon, Jul 3, 2017 at 9:25 PM, Greg Stark <stark@mit.edu> wrote: >>> On 2 July 2017 at 18:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> system("cp -a ...") call in favor of something more portable. > >>> If we're ok with using Perl there's File::Copy::Recursive::dircopy() >>> which does exactly that. > >> This stuff needs to support perl down to 5.8.0, and that's a reason >> behind having src/test/perl/RecursiveCopy.pm. So I would suggest just >> to use that. cp is not portable on Windows as well, that's a recipe >> for non-portable code there. This was under the assumption of "if we use perl" :) > I can't see going this path in pg_regress, because then you would have > exactly zero test functionality in a non-Perl build. Indeed, release tarballs don't need perl to work. So that's a no-go. > What I had in > mind was a frontend-friendly version of backend/storage/file/copydir.c, > either just dropped into pg_regress.c or put in src/common/. +1 for src/common/. -- Michael