Re: pg_dump additional options for performance - Mailing list pgsql-patches

From Simon Riggs
Subject Re: pg_dump additional options for performance
Date
Msg-id 1216885031.3894.739.camel@ebony.2ndQuadrant
Whole thread Raw
In response to Re: pg_dump additional options for performance  (Stephen Frost <sfrost@snowman.net>)
Responses Re: pg_dump additional options for performance
List pgsql-patches
On Wed, 2008-07-23 at 23:20 -0400, Stephen Frost wrote:

> * Simon Riggs (simon@2ndquadrant.com) wrote:
> > ...and with command line help also.
>
> The documentation and whatnot looks good to me now.  There are a couple
> of other issues I found while looking through and testing the patch
> though-

Thanks for a good review.

> Index: src/bin/pg_dump/pg_dump.c
> ===================================================================
> RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
> retrieving revision 1.497
> diff -c -r1.497 pg_dump.c
> *** src/bin/pg_dump/pg_dump.c    20 Jul 2008 18:43:30 -0000    1.497
> --- src/bin/pg_dump/pg_dump.c    23 Jul 2008 17:04:24 -0000
> ***************
> *** 225,232 ****
>       RestoreOptions *ropt;
>
>       static int    disable_triggers = 0;
> !     static int  outputNoTablespaces = 0;
>       static int    use_setsessauth = 0;
>
>       static struct option long_options[] = {
>           {"data-only", no_argument, NULL, 'a'},
> --- 229,238 ----
>       RestoreOptions *ropt;
>
>       static int    disable_triggers = 0;
> !     static int outputNoTablespaces = 0;
>       static int    use_setsessauth = 0;
> +     static int    use_schemaBeforeData;
> +     static int    use_schemaAfterData;
>
>       static struct option long_options[] = {
>           {"data-only", no_argument, NULL, 'a'},
> ***************
>
> This hunk appears to have a bit of gratuitous whitespace change, not a
> big deal tho.

OK

> ***************
> *** 464,474 ****
> [...]
> +     if (dataOnly)
> +         dumpObjFlags = REQ_DATA;
> +
> +     if (use_schemaBeforeData == 1)
> +         dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
> +
> +     if (use_schemaAfterData == 1)
> +         dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
> +
> +     if (schemaOnly)
> +         dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
> ***************
>
> It wouldn't kill to be consistant between testing for '== 1' and just
> checking for non-zero.  Again, not really a big deal, and I wouldn't
> mention these if there weren't other issues.

OK

> ***************
> *** 646,652 ****
>        * Dumping blobs is now default unless we saw an inclusion switch or -s
>        * ... but even if we did see one of these, -b turns it back on.
>        */
> !     if (include_everything && !schemaOnly)
>           outputBlobs = true;
>
>       /*
> --- 689,695 ----
>        * Dumping blobs is now default unless we saw an inclusion switch or -s
>        * ... but even if we did see one of these, -b turns it back on.
>        */
> !     if (include_everything && WANT_PRE_SCHEMA(dumpObjFlags))
>           outputBlobs = true;
>
>       /*
> ***************
>
> Shouldn't this change be to "WANT_DATA(dumpObjFlags)"?  That's what most
> of the '!schemaOnly' get translated to.  Otherwise I think you would be
> getting blobs when you've asked for just schema-before-data, which
> doesn't seem like it'd make much sense.

Yes, fixed

> ***************
> *** 712,718 ****
>       dumpStdStrings(g_fout);
>
>       /* The database item is always next, unless we don't want it at all */
> !     if (include_everything && !dataOnly)
>           dumpDatabase(g_fout);
>
>       /* Now the rearrangeable objects. */
> --- 755,761 ----
>       dumpStdStrings(g_fout);
>
>       /* The database item is always next, unless we don't want it at all */
> !     if (include_everything && WANT_DATA(dumpObjFlags))
>           dumpDatabase(g_fout);
>
>       /* Now the rearrangeable objects. */
> ***************
>
> Shouldn't this be 'WANT_PRE_SCHEMA(dumpObjFlags)'?

Yes, fixed

> ***************
> *** 3414,3420 ****
>               continue;
>
>           /* Ignore indexes of tables not to be dumped */
> !         if (!tbinfo->dobj.dump)
>               continue;
>
>           if (g_verbose)
> --- 3459,3465 ----
>               continue;
>
>           /* Ignore indexes of tables not to be dumped */
> !         if (!tbinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
>               continue;
>
>           if (g_verbose)
> ***************
>
> I didn't test this, but it strikes me as an unnecessary addition?  If
> anything, wouldn't this check make more sense being done right after
> dropping into getIndexes()?  No sense going through the loop just for
> fun..  Technically, it's a behavioral change for --data-only since it
> used to gather index information anyway, but it's a good optimization if
> done in the right place.

Agreed. I've just removed this. Patch not about optimising logic.

> Also around here, there doesn't appear to be any checking in
> dumpEnumType(), which strikes me as odd.  Wouldn't that deserve a
>
> if (!WANT_PRE_SCHEMA(dumpObjFlags))
>     return;
>
> check?  If not even some kind of equivilant ->dobj.dump check..

Agreed. That appears to be a bug in pg_dump, since this would currently
dump enums if --data-only was specified, which in my understanding would
be wrong.

Have included this:

    /* Skip if not to be dumped */
    if (!tinfo->dobj.dump || !WANT_BEFORE_SCHEMA(dumpObjFlags))
        return;



> ***************
> *** 9803,9809 ****
>                       tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
>       }
>
> !     if (!schemaOnly)
>       {
>           resetPQExpBuffer(query);
>           appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
> --- 9848,9854 ----
>                       tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
>       }
>
> !     if (WANT_PRE_SCHEMA(dumpObjFlags))
>       {
>           resetPQExpBuffer(query);
>           appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
> ***************
>
> This is a mistaken logic invert, which results in setval's not being
> dumped at all when pulling out each piece seperately.  It should be:
>
> if (WANT_DATA(dumpObjFlags))
>
> so that setval's are correctly included on the --data-only piece.  As
> --data-only previously existed, this would be a regression too.

OK, fixed.


> Index: src/bin/pg_dump/pg_restore.c
> ===================================================================
> RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_restore.c,v
> retrieving revision 1.88
> diff -c -r1.88 pg_restore.c
> *** src/bin/pg_dump/pg_restore.c    13 Apr 2008 03:49:22 -0000    1.88
> --- src/bin/pg_dump/pg_restore.c    23 Jul 2008 17:06:59 -0000
> +     if (dataOnly)
> +         dumpObjFlags = REQ_DATA;
> +
> +     if (use_schemaBeforeData == 1)
> +         dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
> +
> +     if (use_schemaAfterData == 1)
> +         dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
> +
> +     if (schemaOnly)
> +         dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
> +
> ***************
>
> Ditto previous comment on this, but in pg_restore.c.

OK

>
> ***************
> *** 405,410 ****
> --- 455,462 ----
>                "                           do not restore data of tables that could not be\n"
>                "                           created\n"));
>       printf(_("  --no-tablespaces         do not dump tablespace assignments\n"));
> +     printf(_("  --schema-before-data     dump only the part of schema before table data\n"));
> +     printf(_("  --schema-after-data         dump only the part of schema after table data\n"));
>       printf(_("  --use-set-session-authorization\n"
>                "                           use SESSION AUTHORIZATION commands instead of\n"
>                "                           OWNER TO commands\n"));
> ***************
>
> Forgot to mention this on pg_dump.c, but in both pg_dump and pg_restore,
> and I hate to be the bearer of bad news, but your command-line
> documentation doesn't line up properly in the output.  You shouldn't be
> using tabs there but instead should use spaces as the other help text
> does, so everything lines up nicely.

OK

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Attachment

pgsql-patches by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: [HACKERS] WITH RECUSIVE patches 0723
Next
From: Tom Lane
Date:
Subject: Re: pg_dump additional options for performance