Thread: pg_dump --with-* options

pg_dump --with-* options

From
Peter Eisentraut
Date:
I'm looking at the pg_dump --with-* options that are new in PG18, and I'm
having trouble understanding them.  (I did not look into the source code or
the git or mailing list history for this, to try to understand it as a user.)

We have

   -a, --data-only      dump only the data, not the schema or statistics
   --no-data            do not dump data
   --with-data          dump the data  # this one is new

(and there is also --section=data), and then three analogous options for 
"schema" and "statistics".

What is the purpose of the --with-data option?  Dumping the data is the 
default.  Is this to override an earlier --no-data option?

The man page is only minimally more verbose: "Dump data. This is the 
default."  But why do you then need this option?

I think we should add some more detail to the documentation for these, but
right now I don't know what it would be.




Re: pg_dump --with-* options

From
Nathan Bossart
Date:
On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
> We have
> 
>   -a, --data-only      dump only the data, not the schema or statistics
>   --no-data            do not dump data
>   --with-data          dump the data  # this one is new
> 
> (and there is also --section=data), and then three analogous options for
> "schema" and "statistics".
> 
> What is the purpose of the --with-data option?  Dumping the data is the
> default.  Is this to override an earlier --no-data option?

I believe the idea is that these will allow folks to be explicit about what
they want instead of needing to understand the defaults for every
component.

-- 
nathan



Re: pg_dump --with-* options

From
Robert Haas
Date:
On Fri, Jun 6, 2025 at 11:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
> > We have
> >
> >   -a, --data-only      dump only the data, not the schema or statistics
> >   --no-data            do not dump data
> >   --with-data          dump the data  # this one is new
> >
> > (and there is also --section=data), and then three analogous options for
> > "schema" and "statistics".
> >
> > What is the purpose of the --with-data option?  Dumping the data is the
> > default.  Is this to override an earlier --no-data option?
>
> I believe the idea is that these will allow folks to be explicit about what
> they want instead of needing to understand the defaults for every
> component.

Am I too late to propose ripping this out?

I mean, if I look at pg_dump --help and there are options for
--with-broccoli and --without-mushrooms, I know that the defaults are
no broccoli, yes mushrooms, and I know which options I need to specify
to get the behavior that I want, whatever that happens to be. If all
options exist in both forms, it's a lot more confusing. Maybe there's
some issue of cross-version compatibility here that justifies this
complexity, but I don't see what it would be. I would think
--with-data has always been the default and always will be, so we just
don't need --with-data for anything. But maybe I'm confused.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_dump --with-* options

From
Fujii Masao
Date:

On 2025/06/12 22:47, Peter Eisentraut wrote:
> On 06.06.25 17:39, Nathan Bossart wrote:
>> On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
>>> We have
>>>
>>>    -a, --data-only      dump only the data, not the schema or statistics
>>>    --no-data            do not dump data
>>>    --with-data          dump the data  # this one is new
>>>
>>> (and there is also --section=data), and then three analogous options for
>>> "schema" and "statistics".
>>>
>>> What is the purpose of the --with-data option?  Dumping the data is the
>>> default.  Is this to override an earlier --no-data option?
>>
>> I believe the idea is that these will allow folks to be explicit about what
>> they want instead of needing to understand the defaults for every
>> component.
>
> I get that idea, but we really need some more documentation for this, I
> think.  So far I could only guess how this is supposed to be used, and I
> also happened to guess wrong.
>
> My initial guess was that --with-data can override --no-data.  That would
> have been pretty standard "last option wins" behavior.  But pg_dump rejects
> that.  Personally, I think that is kind of wrong.
>
> But you can use --with-data to override, say, --schema-only.  That also
> seems kind of wrong to me, but anyway.

While testing pg_dump --with-* in relation to bug #18952 [1],
I also ran into this behavior. It was surprising,
as I expected pg_dump to reject that combination of options.
The current behavior seems confusing.

Regards,

[1] https://postgr.es/m/18952-be40a620f8b1e755@postgresql.org

--
Fujii Masao
NTT DATA Japan Corporation
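
The behavior Peter and Fujii report can be pictured with a small model. The sketch below is not pg_dump's code: `resolve` and its rules are assumptions reconstructed only from what this thread says about the PG18 beta, namely that `--with-X` together with `--no-X` is rejected, while `--with-X` adds X back on top of an `--*-only` option. The default (schema and data on, statistics off) is likewise taken from the thread.

```python
COMPONENTS = ("schema", "data", "statistics")


def resolve(flags):
    """Return the set of components dumped for a list of long options.

    Hypothetical model of the behavior reported in the thread,
    not pg_dump's actual implementation.
    """
    only = [c for c in COMPONENTS if f"--{c}-only" in flags]
    with_ = {c for c in COMPONENTS if f"--with-{c}" in flags}
    no = {c for c in COMPONENTS if f"--no-{c}" in flags}

    if len(only) > 1:
        raise ValueError("two --*-only options cannot be used together")
    clash = with_ & no
    if clash:
        raise ValueError(f"--with-/--no- both given for: {sorted(clash)}")

    # Assumed default: schema and data on, statistics off.
    selected = set(only) if only else {"schema", "data"}
    # --with-X adds a component back; --no-X removes it.
    return (selected | with_) - no


if __name__ == "__main__":
    print(sorted(resolve(["--schema-only", "--with-data"])))
```

In this model `--schema-only --with-data` ends up selecting the same components as the default, which is the surprise described above: an "only" option that no longer means "only".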




Re: pg_dump --with-* options

From
Nathan Bossart
Date:
On Thu, Jun 12, 2025 at 10:18:56AM -0400, Robert Haas wrote:
> On Fri, Jun 6, 2025 at 11:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
>> > What is the purpose of the --with-data option?  Dumping the data is the
>> > default.  Is this to override an earlier --no-data option?
>>
>> I believe the idea is that these will allow folks to be explicit about what
>> they want instead of needing to understand the defaults for every
>> component.
> 
> Am I too late to propose ripping this out?
> 
> I mean, if I look at pg_dump --help and there are options for
> --with-broccoli and --without-mushrooms, I know that the defaults are
> no broccoli, yes mushrooms, and I know which options I need to specify
> to get the behavior that I want, whatever that happens to be. If all
> options exist in both forms, it's a lot more confusing. Maybe there's
> some issue of cross-version compatibility here that justifies this
> complexity, but I don't see what it would be. I would think
> --with-data has always been the default and always will be, so we just
> don't need --with-data for anything. But maybe I'm confused.

If the idea is to remove all options for default behavior, we'd be removing
--no-statistics, --with-data, and --with-schema at this point.  Maybe we
could go a step further and even rip out --statistics-only (in favor of
--no-schema --no-data --with-statistics).  In general, I do think the list
of pg_dump options is pretty unwieldy at this point.

-- 
nathan



Re: pg_dump --with-* options

From
Jeff Davis
Date:
On Thu, 2025-06-12 at 15:47 +0200, Peter Eisentraut wrote:
> My initial guess was that --with-data can override --no-data.  That
> would have been pretty standard "last option wins" behavior.  But
> pg_dump rejects that.  Personally, I think that is kind of wrong.

Do we have other options that are order-sensitive?

> But in any case, if you want that level of precision, wouldn't it make
> more sense to use the --section option?

That's not possible with statistics, because some appear in
SECTION_DATA and some in SECTION_POST_DATA (e.g. stats on indexes,
which are in SECTION_POST_DATA).


Regards,
    Jeff Davis
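
Jeff's point about sections can be illustrated with a toy archive. The entries below are invented for illustration; only the placement of statistics (table stats in the data section, index stats in the post-data section) follows his description. The sketch checks whether any single section filter selects exactly the statistics entries.

```python
# Toy archive entries: (description, kind, section). The entries are
# hypothetical; the section placement of statistics follows the
# description in the message above.
ENTRIES = [
    ("TABLE orders", "schema", "pre-data"),
    ("TABLE DATA orders", "data", "data"),
    ("STATISTICS DATA orders", "statistics", "data"),
    ("INDEX orders_pkey", "schema", "post-data"),
    ("STATISTICS DATA orders_pkey", "statistics", "post-data"),
]

SECTIONS = ("pre-data", "data", "post-data")


def by_section(section):
    """Entries a section filter would select."""
    return [e for e in ENTRIES if e[2] == section]


def by_kind(kind):
    """Entries of one kind, regardless of section."""
    return [e for e in ENTRIES if e[1] == kind]


all_stats = by_kind("statistics")

# Sections whose contents are exactly the statistics entries.
exact_match = [s for s in SECTIONS if by_section(s) == all_stats]

# Sections that contain at least one statistics entry.
sections_with_stats = sorted({e[2] for e in all_stats})

if __name__ == "__main__":
    print(exact_match, sections_with_stats)
```

No section matches exactly, and the statistics are spread over two sections that also hold non-statistics entries, so a section filter alone cannot express "statistics only" in this toy model.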




Re: pg_dump --with-* options

From
Jeff Davis
Date:
On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
> If the idea is to remove all options for default behavior, we'd be
> removing
> --no-statistics, --with-data, and --with-schema at this point.

That's OK with me.

>   Maybe we
> could go a step further and even rip out --statistics-only (in favor
> of
> --no-schema --no-data --with-statistics).

I'd probably keep --statistics-only.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

From
Nathan Bossart
Date:
On Thu, Jun 12, 2025 at 08:58:15AM -0700, Jeff Davis wrote:
> On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
>> If the idea is to remove all options for default behavior, we'd be
>> removing
>> --no-statistics, --with-data, and --with-schema at this point.
> 
> That's OK with me.
> 
>>   Maybe we
>> could go a step further and even rip out --statistics-only (in favor
>> of
>> --no-schema --no-data --with-statistics).
> 
> I'd probably keep --statistics-only.

WFM

-- 
nathan



Re: pg_dump --with-* options

From
Robert Haas
Date:
On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
> > If the idea is to remove all options for default behavior, we'd be
> > removing
> > --no-statistics, --with-data, and --with-schema at this point.
>
> That's OK with me.

Same.

> >   Maybe we
> > could go a step further and even rip out --statistics-only (in favor
> > of
> > --no-schema --no-data --with-statistics).
>
> I'd probably keep --statistics-only.

I'm going to vote for removing it. pg_dump has a lot of options, and
it doesn't seem like a good bet to me to have options that are
equivalent to various combinations of other options. I don't see any
particular reason to believe that --statistics-only is even a
particularly likely combination of options for someone to want. I'd
rather keep it simple.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_dump --with-* options

From
Laurenz Albe
Date:
On Thu, 2025-06-12 at 13:36 -0400, Robert Haas wrote:
> On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote:
> > On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
> > > If the idea is to remove all options for default behavior, we'd be
> > > removing
> > > --no-statistics, --with-data, and --with-schema at this point.
> >
> > That's OK with me.
>
> Same.

I must be missing something, but I think --no-statistics is sorely needed.
How else can I get the effect of

  pg_dump --no-statistics mydb

Yours,
Laurenz Albe



Re: pg_dump --with-* options

From
Corey Huinker
Date:

On Thu, Jun 12, 2025 at 1:36 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote:
> > On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
> > > If the idea is to remove all options for default behavior, we'd be
> > > removing
> > > --no-statistics, --with-data, and --with-schema at this point.
> >
> > That's OK with me.
>
> Same.
>
> > >   Maybe we
> > > could go a step further and even rip out --statistics-only (in favor
> > > of
> > > --no-schema --no-data --with-statistics).
> >
> > I'd probably keep --statistics-only.
>
> I'm going to vote for removing it. pg_dump has a lot of options, and
> it doesn't seem like a good bet to me to have options that are
> equivalent to various combinations of other options. I don't see any
> particular reason to believe that --statistics-only is even a
> particularly likely combination of options for someone to want. I'd
> rather keep it simple.

The use case for --statistics-only is to extract the existing statistics
for the tables and indexes that are involved in a given query that is
giving you problems, allowing you to apply those statistics to an existing
QA/dev database and tweak them without further impacting operations on the
production database. I think this will prove to be very useful, and having
a --statistics-only flag conveys the clear intent of "I want the stats, and
only the stats".

If we're hot to remove options, how about we remove the sections flags?
Their utility is reliant upon the user understanding exactly which things
go in which section, and further assumes that everything deterministically
goes in exactly one section, which is no longer the case as Jeff pointed
out recently. They have outlived their usefulness.

If we have the full complement of --no-something flags and the three
--*-only flags, we wouldn't need the --with-something flags. That would
mean making statistics export the default on dumps, which I think it should
be anyway, because there's nothing else that we don't dump by default, and
while it might seem strange to have them by default now, NOT having them by
default will feel very strange a few years down the road.
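
The --statistics-only use case Corey describes could be scripted roughly as follows. This is a sketch under assumptions: the helper name `stats_copy_commands`, the database names, the table name, and the output file are all hypothetical, and it assumes the QA/dev database already contains the same tables. It only builds the two command lines and does not run them.

```python
import shlex


def stats_copy_commands(source_db, target_db, tables, outfile="stats.sql"):
    """Build (not run) the commands to copy statistics between databases.

    Hypothetical helper; assumes target_db already has the same tables.
    """
    dump = ["pg_dump", "--statistics-only", f"--file={outfile}"]
    dump += [f"--table={t}" for t in tables]
    dump.append(source_db)

    # The plain-text dump is replayed into the target with psql.
    restore = ["psql", f"--dbname={target_db}", f"--file={outfile}"]
    return dump, restore


if __name__ == "__main__":
    dump_cmd, restore_cmd = stats_copy_commands("proddb", "qadb", ["orders"])
    print(shlex.join(dump_cmd))
    print(shlex.join(restore_cmd))
```

A single flag keeps the intent readable in a script like this; spelling the same thing as a combination of --no-* options would work too but says less at a glance.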
 

Re: pg_dump --with-* options

From
Nathan Bossart
Date:
On Thu, Jun 12, 2025 at 04:12:35PM -0400, Corey Huinker wrote:
> The use case for --statistics-only is to extract the existing statistics
> for the tables and indexes that are involved in a given query that is
> giving you problems, allowing you to apply those statistics to an existing
> QA/dev database and tweak them without further impacting operations on the
> production database. I think this will prove to be very useful, and having
> a --statistics-only flag conveys the clear intent of "I want the stats, and
> only the stats",

I do think this is useful functionality, I only suggested removing it
because AFAICT it is redundant, i.e., you can accomplish the same thing
with --with-statistics --no-schema --no-data.  It seems like we're trying
to avoid having multiple ways to do the same thing.

> If we're hot to remove options, how about we remove the sections flags?
> Their utility is reliant upon the user understanding exactly which things
> go in which section, and further assumes that everything deterministically
> goes in exactly one section, which is no longer the case as Jeff
> pointed out recently. They have outlived their usefulness.

I almost brought this up earlier as something else we could potentially
trim.  That's v19 material at this point, though.

-- 
nathan



Re: pg_dump --with-* options

From
Nathan Bossart
Date:
On Thu, Jun 12, 2025 at 04:39:00PM -0400, Corey Huinker wrote:
> On Thu, Jun 12, 2025 at 4:22 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> I do think this is useful functionality, I only suggested removing it
>> because AFAICT it is redundant, i.e., you can accomplish the same thing
>> with --with-statistics --no-schema --no-data.  It seems like we're trying
>> to avoid having multiple ways to do the same thing.
> 
> By that same argument, we should remove --schema-only and --data-only as
> well. I think we shouldn't because those two options have proved very
> convenient for users and they convey clear intent to the person reading the
> script, and I believe that --statistics-only will prove the same over time.

Those predate v18, so while we might be able to mark them deprecated, I
doubt we'd remove them anytime soon.

FWIW I don't have a tremendously strong opinion about --statistics-only.
I'd probably vote to remove it because 1) it's redundant, 2) once you add
an option, it's hard to remove it, and 3) pg_dump already has so many
options.  But I won't cry too hard if we keep it around.

-- 
nathan



Re: pg_dump --with-* options

From
Jeff Davis
Date:
On Thu, 2025-06-12 at 15:57 -0500, Nathan Bossart wrote:
> FWIW I don't have a tremendously strong opinion about
> --statistics-only.

Same here. I won't cast a vote on this particular issue, as long as the
functionality is available.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

From
Jeff Davis
Date:
On Thu, 2025-06-12 at 21:16 +0200, Peter Eisentraut wrote:
> > Do we have other options that are order-sensitive?
>
> I think most of them are.  For example:
>
> psql -p 5432 -p 5433
> initdb --data-checksums --no-data-checksums
> postgres --shared-buffers=1GB --shared-buffers=2GB

Interesting. I don't think the "last option wins" model applies to
other pg_dump options, though. For instance, in PG17:

  pg_dump --data-only --schema-only
  pg_dump: error: options -s/--schema-only and -a/--data-only cannot be used together

I don't think it's simple to start using "last option wins" behavior
now. There are probably some combinations of options where it's not
clear whether a later option is an extra constraint or will override a
previous option.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

From
Peter Eisentraut
Date:
On 12.06.25 23:20, Jeff Davis wrote:
> On Thu, 2025-06-12 at 21:16 +0200, Peter Eisentraut wrote:
>>> Do we have other options that are order-sensitive?
>>
>> I think most of them are.  For example:
>>
>> psql -p 5432 -p 5433
>> initdb --data-checksums --no-data-checksums
>> postgres --shared-buffers=1GB --shared-buffers=2GB
> 
> Interesting. I don't think the "last option wins" model applies to
> other pg_dump options, though. For instance, in PG17:
> 
>    pg_dump --data-only --schema-only
>    pg_dump: error: options -s/--schema-only and -a/--data-only cannot be used together
> 
> I don't think it's simple to start using "last option wins" behavior
> now. There are probably some combinations of options where it's not
> clear whether a later option is an extra constraint or will override a
> previous option.

It makes sense to raise an error if the specified options cannot be 
consolidated in an obvious way.  I'd expect

pg_recvlogical --create-slot --drop-slot

to fail, but I'd expect

pg_recvlogical --create-slot --slot=foo --slot=bar

to work.

One of the challenges in the current case is that it is not obvious how 
--with-data, --no-data, --data-only etc. are connected.  If that were 
clearer, then the way these options should combine or conflict would 
hopefully follow somewhat naturally.
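
The two behaviors contrasted here (a repeated option where the last value wins, and conflicting options that raise an error) can be shown with Python's `argparse`. This is a toy parser that borrows the option names from the pg_recvlogical example; it says nothing about how pg_recvlogical or pg_dump actually parse their arguments.

```python
import argparse


class Parser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting on a usage error."""

    def error(self, message):
        raise ValueError(message)


def build_parser():
    parser = Parser(prog="toy-recvlogical")
    # Default "store" action: a repeated option keeps the last value.
    parser.add_argument("--slot")
    # Options in this group cannot be combined with each other.
    action = parser.add_mutually_exclusive_group()
    action.add_argument("--create-slot", action="store_true")
    action.add_argument("--drop-slot", action="store_true")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    args = parser.parse_args(["--create-slot", "--slot=foo", "--slot=bar"])
    print(args.slot)
    try:
        parser.parse_args(["--create-slot", "--drop-slot"])
    except ValueError as exc:
        print("rejected:", exc)
```

The design question for pg_dump is which of these two categories each pair of --with-*, --no-*, and --*-only options belongs to; the parser can only enforce the grouping once that relationship is decided.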




Re: pg_dump --with-* options

From
Jeff Davis
Date:
On Fri, 2025-06-13 at 07:22 +0200, Peter Eisentraut wrote:
> > I don't think it's simple to start using "last option wins"
> > behavior
> > now ...
> It makes sense to raise an error if the specified options cannot be
> consolidated in an obvious way.

To me, "last option wins" means that you don't raise an error; the
latter option simply overrides the earlier one.

Given that the pg_dump options are not order-sensitive now (unless I'm
missing something), I'm worried about the consequences of trying to
make them so now.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

From
Jeff Davis
Date:
On Fri, 2025-06-13 at 09:39 +0900, Fujii Masao wrote:

> By the way, if we keep --with-statistics in pg_dump, are we planning
> to
> continue using the --with-xxx naming pattern for new options that
> specify extra data to dump?

Good point. Now that we are getting rid of some of the other options,
we don't need to worry about consistency with them, and I think we
should just use "--statistics".

Regards,
    Jeff Davis




Re: pg_dump --with-* options

From
Corey Huinker
Date:
> One of the challenges in the current case is that it is not obvious how
> --with-data, --no-data, --data-only etc. are connected.  If that were
> clearer, then the way these options should combine or conflict would
> hopefully follow somewhat naturally.

They all should be mutually exclusive, and usage of any two of them should raise an error, hence order not mattering.

Re: pg_dump --with-* options

From
Fujii Masao
Date:

On 2025/06/14 5:32, Nathan Bossart wrote:
> On Fri, Jun 13, 2025 at 08:58:04AM -0700, Jeff Davis wrote:
>> On Fri, 2025-06-13 at 09:39 +0900, Fujii Masao wrote:
>>> By the way, if we keep --with-statistics in pg_dump, are we planning
>>> to
>>> continue using the --with-xxx naming pattern for new options that
>>> specify extra data to dump?
>>
>> Good point. Now that we are getting rid of some of the other options,
>> we don't need to worry about consistency with them, and I think we
>> should just use "--statistics".
> 
> +1

+1

I noticed that --statistics (i.e., the current --with-statistics) causes
statistics to be dumped even when used with --data-only or --schema-only.
So, as far as I understand, here are the possible combinations of dump
targets and options:

  schema, data, stats:       --statistics
  schema, data:              (default)
  schema, stats:             --schema-only --statistics
  data, stats:               --data-only --statistics
  schema only:               --schema-only
  data only:                 --data-only
  stats only:                --statistics-only

This makes me wonder if --no-data and --no-schema are still necessary.
They were also introduced in v18, but might now be redundant. If so,
should we consider removing them?

If we do keep them, we could also use --no-schema --statistics to
dump data and statistics, but I find --data-only --statistics more intuitive.

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation
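
The combination table in this message can be written down as a small function. This is a sketch of the proposal as described, not of pg_dump itself: `dump_targets` is a hypothetical name, statistics are assumed off by default, and `--statistics` is assumed to add statistics on top of whatever `--schema-only` or `--data-only` selected. Conflicting "only" options are not validated here.

```python
def dump_targets(flags):
    """Components dumped under the proposal sketched in the table above.

    Hypothetical model; conflicts between --*-only options are ignored.
    """
    flags = set(flags)
    if "--statistics-only" in flags:
        return {"statistics"}
    if "--schema-only" in flags:
        targets = {"schema"}
    elif "--data-only" in flags:
        targets = {"data"}
    else:
        targets = {"schema", "data"}
    # --statistics is additive, even next to an "only" option.
    if "--statistics" in flags:
        targets.add("statistics")
    return targets


# The seven rows of the table, as (flags, expected components).
TABLE = [
    (["--statistics"], {"schema", "data", "statistics"}),
    ([], {"schema", "data"}),
    (["--schema-only", "--statistics"], {"schema", "statistics"}),
    (["--data-only", "--statistics"], {"data", "statistics"}),
    (["--schema-only"], {"schema"}),
    (["--data-only"], {"data"}),
    (["--statistics-only"], {"statistics"}),
]

if __name__ == "__main__":
    for flags, expected in TABLE:
        print(" ".join(flags) or "(default)", "->", sorted(dump_targets(flags)))
```

All seven non-empty combinations of the three components are reachable without --no-schema or --no-data, which is the redundancy the message points out.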




Re: pg_dump --with-* options

From
Fujii Masao
Date:

On 2025/06/17 9:58, Nathan Bossart wrote:
> On Mon, Jun 16, 2025 at 07:09:17PM -0400, Tom Lane wrote:
>> I find myself increasingly persuaded by Corey's point of view ...
> 
> +1

Can you clarify how using on-by-default would simplify things?
I'm not sure it actually makes the options simpler.

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation




Re: pg_dump --with-* options

From
Jeff Davis
Date:
On Mon, 2025-06-16 at 15:35 -0400, Corey Huinker wrote:
>
> I think this is the exact sort of confusion caused by having two of
> the three types default to on in all circumstances, and one default
> to off in one special circumstance.

That's certainly a part of the confusion, but the "--x-only" options
also put us in a tough spot.

If --data-only had always been spelled "--no-schema" (or "--without-data"
or whatever), and --schema-only had always been spelled "--no-data", then
I think it would be a lot easier to add statistics into the mix.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

From
Jeff Davis
Date:
On Thu, 2025-06-12 at 08:58 -0700, Jeff Davis wrote:
> On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
> > If the idea is to remove all options for default behavior, we'd be
> > removing
> > --no-statistics, --with-data, and --with-schema at this point.
>
> That's OK with me.

Actually, I take that back, we can't just remove --no-statistics.
Remember that statistics currently default to "on" for pg_restore even
though they default "off" for pg_dump.

So pg_restore still needs a way to turn stats off.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

From
Nathan Bossart
Date:
On Wed, Jun 18, 2025 at 08:29:16AM -0700, Jeff Davis wrote:
> Actually, I take that back, we can't just remove --no-statistics.
> Remember that statistics currently default to "on" for pg_restore even
> though they default "off" for pg_dump.
> 
> So pg_restore still needs a way to turn stats off.

IIUC the current proposal is to:

* Dump/restore stats by default.
* Keep the --no-statistics, --no-schema, and --no-data options.
* Keep the --statistics-only, --schema-only, and --data-only options.
* Remove the --with-statistics, --with-schema, and --with-data options.

How does that sound?

-- 
nathan



Re: pg_dump --with-* options

From
Jeff Davis
Date:
On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote:
> IIUC the current proposal is to:
>
> * Dump/restore stats by default.

IIUC some people still object to this. Turning stats off by default was
on the Open Items list. At this point I think we need a pretty strong
consensus to override that and I'm not sure we have one right now.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

From
Nathan Bossart
Date:
On Wed, Jun 18, 2025 at 09:53:01AM -0700, Jeff Davis wrote:
> On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote:
>> IIUC the current proposal is to:
>> 
>> * Dump/restore stats by default.
> 
> IIUC some people still object to this. Turning stats off by default was
> on the Open Items list. At this point I think we need a pretty strong
> consensus to override that and I'm not sure we have one right now.

Okay, so I see two main choices on the table:

1) Turn on stats by default in pg_dump.  Keep --no-* flags and --*-only
flags, and remove the --with-* flags.

2) Keep stats off by default in pg_dump.  Keep --no-{schema,data} flags and
--*-only flags, remove --no-statistics and --with-{schema,data}, and rename
--with-statistics to --statistics.

Is that an accurate summary?

-- 
nathan



Re: pg_dump --with-* options

From
Nathan Bossart
Date:
On Mon, Jun 23, 2025 at 01:38:10PM -0400, Robert Haas wrote:
> I had thought we had a consensus that pg_upgrade should preserve stats
> but regular pg_dump shouldn't include them; perhaps I misunderstood
> or that changed.

I think it's a bit of both.  I skimmed through the past discussions and
found that not only was there a rough consensus in 2024 that stats _should_
be on by default [0], but also that an updated vote tally didn't show much
of a consensus at all [1].  Like you, I thought we had pretty much closed
that door, but the aforementioned analysis along with further discussion
has me convinced that we might want to reconsider [2].

[0] https://postgr.es/m/e16cd9caf4f5229a152d318d70b4d323a03e3539.camel@j-davis.com
[1] https://postgr.es/m/aFCIB1AwXuNzxHXX%40nathan
[2] https://postgr.es/m/aFC9rWSeFz7c07uI%40nathan

-- 
nathan



Re: pg_dump --with-* options

From
Fujii Masao
Date:

On 2025/06/25 5:07, Robert Haas wrote:
> On Tue, Jun 24, 2025 at 12:48 PM Nathan Bossart
> <nathandbossart@gmail.com> wrote:
>> On Mon, Jun 23, 2025 at 01:38:10PM -0400, Robert Haas wrote:
>>> I had thought we had a consensus that pg_upgrade should preserve stats
>>> but regular pg_dump shouldn't include them; perhaps I misunderstood
>>> or that changed.
>>
>> I think it's a bit of both.  I skimmed through the past discussions and
>> found that not only was there a rough consensus in 2024 that stats _should_
>> be on by default [0], but also that an updated vote tally didn't show much
>> of a consensus at all [1].  Like you, I thought we had pretty much closed
>> that door, but the aforementioned analysis along with further discussion
>> has me convinced that we might want to reconsider [2].
> 
> Well, I don't know: I still think that's the right answer, so I don't
> really want to reconsider, but I understand that I'm not in charge
> here.

For the record, my vote is: default "off" for pg_dump and pg_dumpall,
and "on" for pg_restore.

For pg_dump and pg_dumpall, I agree with Jeff's idea in [1],
but if statistics are skipped by default, I don't think
we need a --no-statistics option. So, here's how I think
the options should work:

     * Keep: --schema-only, --data-only, --statistics-only, --no-schema, --no-data, and --statistics
     * Remove: --no-statistics, --with-schema, and --with-data
     * Combinations:
         Schema + Data + Stats     : --statistics
         Schema + Data             : (default)
         Schema + Stats            : --no-data --statistics
         Data + Stats              : --no-schema --statistics
         Schema only               : --schema-only   (or --no-data)
         Data only                 : --data-only     (or --no-schema)
         Stats only                : --statistics-only (or --no-schema --no-data --statistics)

As I mentioned in [2], if we treat --statistics in a similar way to
--sequence-data, i.e., allow --statistics to be used with --schema-only
or --data-only, we could simplify further:

     * Keep: --schema-only, --data-only, --statistics-only, and --statistics
     * Remove: --no-schema, --no-data, --no-statistics, --with-schema, and --with-data
     * Combinations:
         Schema + Data + Stats     : --statistics
         Schema + Data             : (default)
         Schema + Stats            : --schema-only --statistics
         Data + Stats              : --data-only --statistics
         Schema only               : --schema-only
         Data only                 : --data-only
         Stats only                : --statistics-only

Some may find this confusing due to mixing --statistics with --schema-only
or --data-only, so I understand if there's hesitation.

For pg_restore, I believe there's agreement to restore statistics
by default if they exist in the archive. So:

     * Keep: --schema-only, --data-only, --statistics-only, --no-schema, --no-data, and --no-statistics
     * Remove: --with-schema, --with-data, and --statistics
     * Combinations:
         Schema + Data + Stats     : (default)
         Schema + Data             : --no-statistics
         Schema + Stats            : --no-data
         Data + Stats              : --no-schema
         Schema only               : --schema-only   (or --no-data --no-statistics)
         Data only                 : --data-only     (or --no-schema --no-statistics)
         Stats only                : --statistics-only (or --no-schema --no-data)

Thoughts?

Regards,

[1] https://postgr.es/m/031558c60e84362898922caa6a90587e7fdf2a57.camel@j-davis.com
[2] https://postgr.es/m/94f89b0a-5d83-4a67-9092-50ba3913441c@oss.nttdata.com

-- 
Fujii Masao
NTT DATA Japan Corporation
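
The pg_restore table in this message follows a purely subtractive model: everything in the archive is restored by default, and options only take components away. A minimal sketch of that model, with the hypothetical name `restore_targets` and no claim about pg_restore's real implementation, shows why each "only" option has a `--no-*` equivalent.

```python
ALL = frozenset({"schema", "data", "statistics"})


def restore_targets(flags):
    """Components restored when everything is on by default.

    Hypothetical model of the pg_restore table above.
    """
    targets = set(ALL)
    for component in ALL:
        if f"--{component}-only" in flags:
            # An "only" option keeps just that component.
            targets &= {component}
        if f"--no-{component}" in flags:
            # A "no" option removes one component.
            targets.discard(component)
    return targets


if __name__ == "__main__":
    print(sorted(restore_targets(["--no-statistics"])))
    print(sorted(restore_targets(["--statistics-only"])))
```

With every component on by default, no additive `--with-*` or `--statistics` option is needed on the restore side, which matches the "Remove" line of the table.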




Re: pg_dump --with-* options

From
Nathan Bossart
Date:
On Wed, Jun 25, 2025 at 08:18:28AM +0900, Fujii Masao wrote:
> For pg_dump and pg_dumpall, I agree with Jeff's idea in [1],
> but if statistics are skipped by default, I don't think
> we need a --no-statistics option. So, here's how I think
> the options should work:
> 
>     * Keep: --schema-only, --data-only, --statistics-only, --no-schema, --no-data, and --statistics
>     * Remove: --no-statistics, --with-schema, and --with-data
>     * Combinations:
>         Schema + Data + Stats     : --statistics
>         Schema + Data             : (default)
>         Schema + Stats            : --no-data --statistics
>         Data + Stats              : --no-schema --statistics
>         Schema only               : --schema-only   (or --no-data)
>         Data only                 : --data-only     (or --no-schema)
>         Stats only                : --statistics-only (or --no-schema --no-data --statistics)

I believe this is equivalent to the second option I proposed upthread [0].
Jeff proposed a variation of this option that keeps --no-statistics around
so that we could more easily change the default for stats down the road
[1].

> As I mentioned in [2], if we treat --statistics in a similar way to
> --sequence-data, i.e., allow --statistics to be used with --schema-only
> or --data-only, we could simplify further:
> 
>     * Keep: --schema-only, --data-only, --statistics-only, and --statistics
>     * Remove: --no-schema, --no-data, --no-statistics, --with-schema, and --with-data
>     * Combinations:
>         Schema + Data + Stats     : --statistics
>         Schema + Data             : (default)
>         Schema + Stats            : --schema-only --statistics
>         Data + Stats              : --data-only --statistics
>         Schema only               : --schema-only
>         Data only                 : --data-only
>         Stats only                : --statistics-only
> 
> Some may find this confusing due to mixing --statistics with --schema-only
> or --data-only, so I understand if there's hesitation.

Hm.  I didn't really intend for --sequence-data to set a precedent here.
That's mostly intended as a submode for --binary-upgrade.  Perhaps we
should consider removing it as a documented option and instead convert it
to --binary-upgrade=sequence-data or something.  In any case, allowing
"only" options to be used in conjunction with --statistics seems a little
confusing to me.  But I'm not strongly opposed to the idea.

> For pg_restore, I believe there's agreement to restore statistics
> by default if they exist in the archive. So:
> 
>     * Keep: --schema-only, --data-only, --statistics-only, --no-schema, --no-data, and --no-statistics
>     * Remove: --with-schema, --with-data, and --statistics
>     * Combinations:
>         Schema + Data + Stats     : (default)
>         Schema + Data             : --no-statistics
>         Schema + Stats            : --no-data
>         Data + Stats              : --no-schema
>         Schema only               : --schema-only   (or --no-data --no-statistics)
>         Data only                 : --data-only     (or --no-schema --no-statistics)
>         Stats only                : --statistics-only (or --no-schema --no-data)

+1

[0] https://postgr.es/m/aFLxvrh71VWqdL9A%40nathan
[1] https://postgr.es/m/031558c60e84362898922caa6a90587e7fdf2a57.camel%40j-davis.com

-- 
nathan



Re: pg_dump --with-* options

From
Greg Sabino Mullane
Date:
On Wed, Jun 25, 2025 at 10:36 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> > This is so close to ideal. It's just that the first bullet should be
> > "off by default" :)
>
> If we did that, the only way to dump statistics would be
> --statistics-only, right?  You wouldn't be able to include statistics
> along with other things.

Oh, right, would also need a --statistics

Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support