Re: pg_dump new feature: exporting functions only. Bad or good idea ? - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: pg_dump new feature: exporting functions only. Bad or good idea ?
Date
Msg-id CAKFQuwZA4ch0ajw4xpHFiTCaPFwk+G-bSn6bhcAS1sT_coPg8w@mail.gmail.com
Whole thread Raw
In response to Re: pg_dump new feature: exporting functions only. Bad or good idea ?  (Laetitia Avrot <laetitia.avrot@gmail.com>)
List pgsql-hackers
On Sat, Mar 26, 2022 at 1:53 AM Laetitia Avrot <laetitia.avrot@gmail.com> wrote:
Hello all,

Le sam. 26 mars 2022 à 01:13, Michael Paquier <michael@paquier.xyz> a écrit :
On Fri, Mar 25, 2022 at 10:09:33PM +0100, Daniel Gustafsson wrote:
> Agreed.  In this case it seems that adding --exclude-extension would make sense
> to keep conistency.  I took a quick stab at doing so with the attached while
> we're here.

src/test/modules/test_pg_dump would be the best place for the addition
of a couple of tests with this new switch.  Better to check as well
what happens when a command collides with --extension and
--exclude-extension.

    printf(_("  -e, --extension=PATTERN      dump the specified extension(s) only\n"));
+   printf(_("  --exclude-extension=PATTERN  do NOT dump the specified extension(s)\n"));
Shouldn't this be listed closer to --exclude-table-data in the --help
output?

I think it's time to sum up what we want to do:

- We'd like to use switches to export objects according to a pattern.
- For each object type we will have an --object=PATTERN flag and a --exclude-object=PATTERN
- Having a short flag for each of the long flags is not mandatory
- The object types that pg_dump can select so far are:
    - table (already written)
    - schema (already written)
    - extension (half-written, --exclude-extension not written)
    - routine (TBD ASAP). Routine flag operates on stored functions, stored procedures, aggregate functions, and window functions.
- By default, pg_dump does not export system objects but we found out that we could use --table='pg_catalog.*' to export them. This is a bug and will be fixed. pg_dump won't have the ability to export any system object anymore. Should the fix belong to that patch or do I need to create a separate patch? (Seems to me it should be separated)

If everyone is ok with the points above, I'll write both patches.


That looks correct.

I would say we should make the --table change and the --exclude-extension change as separate commits.

Michael's question brought up a point that we should address.  I do not think having these (now) 4 pairs of options presented strictly alphabetically in the documentation is a good choice and we should deviate from that convention here for something more user-friendly, and to reduce the repetitiveness that comes from having basically what could be one pair of options actually implemented as 3 pairs.  My initial approach would be to move them all to a subsection after the --help parameter and before the section header for -d.  That section would be presented something like:

"""
These options allow for fine-grained control of which user objects are produced by the dump (system objects are never dumped).  If no inclusion options are specified all objects are dumped except those that are explicitly excluded.  If even one inclusion option is specified then only those objects selected for inclusion, and not excluded, will appear in the dump.

These options can appear multiple times within a single pg_dump command line. For each of these there is a mandatory pattern value, so the actual option looks like, e.g., --table='public.*', which will select all relations in the public schema. See (Patterns). When using wildcards, be careful to quote the pattern if needed to prevent the shell from expanding the wildcards.

When using these options, dependencies of the selected objects are not automatically dumped, thus making such a dump potentially unsuitable for restoration into a clean database.

This subset of options control which schemas to select objects from for an otherwise normal dump.

--schema / -n
--exclude-schema / -N

The following subset specifies which non-schema objects to include.  These are added to the objects that end up selected due to their presence in a schema.  Specifically, the --exclude-schema option is ignored while processing these options.

--table / -t
  Considers all relations, not just tables.  i.e., views, materialized views, foreign tables, and sequences.
--routine
  Considers functions, procedures, aggregates, window functions
--extension / -e
  Considers extensions

When dumping data, only local table data is dumped by default.  Specific table data can be excluded using the --exclude-table-data option.

Specifying a foreign server using --include-foreign-data will cause related foreign table data to also be dumped.

The following subset specifies which objects to exclude.  An object that matches one of these patterns will never be dumped.

--exclude-table / -T
--exclude-routine
--exclude-extension

The following options control the dumping of large objects:

-b
--blobs
Include large objects in the dump. This is the default behavior except when --schema, --table, or --schema-only is specified. The -b switch is therefore only useful to add large objects to dumps where a specific schema or table has been requested. Note that blobs are considered data and therefore will be included when --data-only is used, but not when --schema-only is.

-B
--no-blobs
Exclude large objects in the dump.

When both -b and -B are given, the behavior is to output large objects, when data is being dumped, see the -b documentation.
"""

I've kept the blob wording as-is but moved it here since it seems to fit the criteria of the section.  I don't necessarily believe the wording is the best possible.  In particular the mention of --schema-only should simply be removed.  The comment that blobs are data implies they are not dumped during --schema-only.  Then, --schema and --table should be consolidated to refer to the "inclusion options" listed immediately above.  I don't know why we didn't just error if the user specified both -b and -B at the same time.  Is it too late to change that decision?

I'm tempted to move the three section-related options to the top of this as well.  Ordering --data-only and --schema-only alphabetically also doesn't make sense to me, nor does keeping them away from the --section option.

--no-privileges is separated from the other --no-[property] options (e.g., --no-comments) due to its having an -X short option and everything with a short option being listed before things without...I would move it, and probably place all of the "-no-[property] stuff, as-is, in a subsection with a descriptive leading paragraph.  The blob options probably belong here as well.

Yes, this is a fairly radical suggestion.  But combined with some xrefs from a table of contents at the top of the page I do believe it would make for a better user experience.  We already do this for the database connection parameters section of the documentation and the same rationale exists for taking the remaining large list of options and categorizing them into meaningful units.

David J.

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Pointer subtraction with a null pointer
Next
From: Andres Freund
Date:
Subject: Re: [PATCH] Expose port->authn_id to extensions and triggers