Thread: Extensions not dumped when --schema is used

Extensions not dumped when --schema is used

From
Guillaume Lelarge
Date:
Hello,

I've discovered something today that I didn't really expect. When a user dumps a database with the --schema flag of pg_dump, extensions in this schema aren't dumped. As far as I can tell, the documentation isn't clear about this ("Dump only schemas matching pattern; this selects both the schema itself, and all its contained objects."), though the source code definitely is ("We dump all user-added extensions by default, or none of them if include_everything is false (i.e., a --schema or --table switch was given).", in pg_dump.c).

I was wondering the reason behind this choice. If anyone knows, I'd be happy to hear about it.

I see two things:
* it's been overlooked, and we should dump all the extensions available in a schema if this schema has been selected through the --schema flag.
* it's kind of like the large objects handling, and I'd pretty interested in adding a --extensions (as the same way there is a --blobs flag).

Thanks.

Regards.


--
Guillaume.

Re: Extensions not dumped when --schema is used

From
Laurenz Albe
Date:
On Wed, 2020-05-20 at 10:06 +0200, Guillaume Lelarge wrote:
> I've discovered something today that I didn't really expect.
> When a user dumps a database with the --schema flag of pg_dump,
> extensions in this schema aren't dumped. As far as I can tell,
> the documentation isn't clear about this ("Dump only schemas
> matching pattern; this selects both the schema itself, and all
> its contained objects."), though the source code definitely is
> ("We dump all user-added extensions by default, or none of them
> if include_everything is false (i.e., a --schema or --table
> switch was given).", in pg_dump.c).
> 
> I was wondering the reason behind this choice. If anyone knows,
> I'd be happy to hear about it.
> 
> I see two things:
> * it's been overlooked, and we should dump all the extensions
>   available in a schema if this schema has been selected through
>   the --schema flag.
> * it's kind of like the large objects handling, and I'd pretty
>   interested in adding a --extensions (as the same way there is a
>   --blobs flag).

I am not sure if there is a good reason for the current behavior,
but I'd favor the second solution.  I think as extensions as belonging
to the database rather than the schema; the schema is just where the
objects are housed.

Yours,
Laurenz Albe




Re: Extensions not dumped when --schema is used

From
Daniel Gustafsson
Date:
> On 20 May 2020, at 10:06, Guillaume Lelarge <guillaume@lelarge.info> wrote:

> I was wondering the reason behind this choice. If anyone knows, I'd be happy to hear about it.

Extensions were dumped unconditionally in the beginning, but it was changed to
match how procedural language definitions were dumped.

> * it's been overlooked, and we should dump all the extensions available in a schema if this schema has been selected
throughthe --schema flag. 

For reference, --schema-only will include all the extensions, but not
--schema=foo and not "--schema-only --schema=foo".

Extensions don't belong to a schema per se, the namespace oid in pg_extension
marks where most of the objects are contained but not necessarily all of them.
Given that, it makes sense to not include extensions for --schema.  However,
that can be considered sort of an implementation detail which may not be
entirely obvious to users (especially since you yourself is a power-user).

> * it's kind of like the large objects handling, and I'd pretty interested in adding a --extensions (as the same way
thereis a --blobs flag). 

An object in a schema might have attributes that depend on an extension in
order to restore, so it makes sense to provide a way to include them for a
--schema dump.  The question is what --extensions should do: only dump any
extensions that objects in the schema depend on; require a pattern and only
dump matching extensions; dump all extensions (probably not) or something else?

cheers ./daniel


Re: Extensions not dumped when --schema is used

From
Guillaume Lelarge
Date:
Le mer. 20 mai 2020 à 11:26, Daniel Gustafsson <daniel@yesql.se> a écrit :
> On 20 May 2020, at 10:06, Guillaume Lelarge <guillaume@lelarge.info> wrote:

> I was wondering the reason behind this choice. If anyone knows, I'd be happy to hear about it.

Extensions were dumped unconditionally in the beginning, but it was changed to
match how procedural language definitions were dumped.


That makes sense. Thank you.

> * it's been overlooked, and we should dump all the extensions available in a schema if this schema has been selected through the --schema flag.

For reference, --schema-only will include all the extensions, but not
--schema=foo and not "--schema-only --schema=foo".


Yes.

Extensions don't belong to a schema per se, the namespace oid in pg_extension
marks where most of the objects are contained but not necessarily all of them.
Given that, it makes sense to not include extensions for --schema.  However,
that can be considered sort of an implementation detail which may not be
entirely obvious to users (especially since you yourself is a power-user).


I agree.

> * it's kind of like the large objects handling, and I'd pretty interested in adding a --extensions (as the same way there is a --blobs flag).

An object in a schema might have attributes that depend on an extension in
order to restore, so it makes sense to provide a way to include them for a
--schema dump.

That's actually how I figured this out. A customer can't restore his dump because of a missing extension (pg_trgm to be precise).

  The question is what --extensions should do: only dump any
extensions that objects in the schema depend on; require a pattern and only
dump matching extensions; dump all extensions (probably not) or something else?


Actually, "dump all extensions" (#3) would make sense to me, and has my vote. Otherwise, and though it would imply much more work, "only dump any extension that objects in the schema depend on" (#1) comes second in my opinion. Using the pattern means something more manual for users, and I really think it would be a bad idea. People dump databases, schemas, and tables. Theu usually don't know which extensions those objects depend on. But, anyway, I would work on any of these solutions, depending on what most people think is best.

Thanks.


--
Guillaume.

Re: Extensions not dumped when --schema is used

From
Daniel Gustafsson
Date:
> On 20 May 2020, at 11:38, Guillaume Lelarge <guillaume@lelarge.info> wrote:

> Actually, "dump all extensions" (#3) would make sense to me, and has my vote.

Wouldn't that open for another set of problems when running with --schema=bar
and getting errors on restoring for relocatable extensions like these:

    CREATE EXTENSION IF NOT EXISTS pg_trgm WITH SCHEMA foo;

Only dumping extensions depended on has the same problem of course.

> People dump databases, schemas, and tables. Theu usually don't know which extensions those objects depend on.

That I totally agree with, question is how we best can help users here.

cheers ./daniel


Re: Extensions not dumped when --schema is used

From
Tom Lane
Date:
Guillaume Lelarge <guillaume@lelarge.info> writes:
> Le mer. 20 mai 2020 à 11:26, Daniel Gustafsson <daniel@yesql.se> a écrit :
>>  The question is what --extensions should do: only dump any
>> extensions that objects in the schema depend on; require a pattern and only
>> dump matching extensions; dump all extensions (probably not) or something
>> else?

> Actually, "dump all extensions" (#3) would make sense to me, and has my
> vote.

I think that makes no sense at all.  By definition, a dump that's been
restricted with --schema, --table, or any similar switch is incomplete
and may not restore on its own.  Typical examples include foreign key
references to tables in other schemas, views using functions in other
schemas, etc etc.  I see no reason for extension dependencies to be
treated differently from those cases.

In any use of selective dump, it's the user's job to select a set of
objects that she wants dumped (or restored).  Trying to second-guess that
is mostly going to make the feature less usable for power-user cases.

As a counterexample, what if you want the dump to be restorable on a
system that doesn't have all of the extensions available on the source?
You carefully pick out the tables that you need, which don't require the
unavailable extensions ... and then pg_dump decides you don't know what
you're doing and includes all the problematic extensions anyway.

I could get behind an "--extensions=PATTERN" switch to allow selective
addition of extensions to a selective dump, but I don't want to see us
overruling the user's choices about what to dump.

            regards, tom lane



Re: Extensions not dumped when --schema is used

From
Guillaume Lelarge
Date:
Le mer. 20 mai 2020 à 16:39, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Guillaume Lelarge <guillaume@lelarge.info> writes:
> Le mer. 20 mai 2020 à 11:26, Daniel Gustafsson <daniel@yesql.se> a écrit :
>>  The question is what --extensions should do: only dump any
>> extensions that objects in the schema depend on; require a pattern and only
>> dump matching extensions; dump all extensions (probably not) or something
>> else?

> Actually, "dump all extensions" (#3) would make sense to me, and has my
> vote.

I think that makes no sense at all.  By definition, a dump that's been
restricted with --schema, --table, or any similar switch is incomplete
and may not restore on its own.  Typical examples include foreign key
references to tables in other schemas, views using functions in other
schemas, etc etc.  I see no reason for extension dependencies to be
treated differently from those cases.


Agreed.

In any use of selective dump, it's the user's job to select a set of
objects that she wants dumped (or restored).  Trying to second-guess that
is mostly going to make the feature less usable for power-user cases.


Agreed, though right now he has no way to do this for extensions.

As a counterexample, what if you want the dump to be restorable on a
system that doesn't have all of the extensions available on the source?
You carefully pick out the tables that you need, which don't require the
unavailable extensions ... and then pg_dump decides you don't know what
you're doing and includes all the problematic extensions anyway.


That's true.

I could get behind an "--extensions=PATTERN" switch to allow selective
addition of extensions to a selective dump, but I don't want to see us
overruling the user's choices about what to dump.


With all your comments, I can only agree to your views. I'll try to work on this anytime soon.


--
Guillaume.

Re: Extensions not dumped when --schema is used

From
Guillaume Lelarge
Date:
Hi,

Le sam. 23 mai 2020 à 14:53, Guillaume Lelarge <guillaume@lelarge.info> a écrit :
Le mer. 20 mai 2020 à 16:39, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Guillaume Lelarge <guillaume@lelarge.info> writes:
> Le mer. 20 mai 2020 à 11:26, Daniel Gustafsson <daniel@yesql.se> a écrit :
>>  The question is what --extensions should do: only dump any
>> extensions that objects in the schema depend on; require a pattern and only
>> dump matching extensions; dump all extensions (probably not) or something
>> else?

> Actually, "dump all extensions" (#3) would make sense to me, and has my
> vote.

I think that makes no sense at all.  By definition, a dump that's been
restricted with --schema, --table, or any similar switch is incomplete
and may not restore on its own.  Typical examples include foreign key
references to tables in other schemas, views using functions in other
schemas, etc etc.  I see no reason for extension dependencies to be
treated differently from those cases.


Agreed.

In any use of selective dump, it's the user's job to select a set of
objects that she wants dumped (or restored).  Trying to second-guess that
is mostly going to make the feature less usable for power-user cases.


Agreed, though right now he has no way to do this for extensions.

As a counterexample, what if you want the dump to be restorable on a
system that doesn't have all of the extensions available on the source?
You carefully pick out the tables that you need, which don't require the
unavailable extensions ... and then pg_dump decides you don't know what
you're doing and includes all the problematic extensions anyway.


That's true.

I could get behind an "--extensions=PATTERN" switch to allow selective
addition of extensions to a selective dump, but I don't want to see us
overruling the user's choices about what to dump.


With all your comments, I can only agree to your views. I'll try to work on this anytime soon.


"Anytime soon" was a long long time ago, and I eventually completely forgot this, sorry. As nobody worked on it yet, I took a shot at it. See attached patch.

I don't know if I should add this right away in the commit fest app. If yes, I guess it should go on the next commit fest (2021-03), right?


--
Guillaume.
Attachment

Re: Extensions not dumped when --schema is used

From
Julien Rouhaud
Date:
On Mon, Jan 25, 2021 at 9:34 PM Guillaume Lelarge
<guillaume@lelarge.info> wrote:
>
> "Anytime soon" was a long long time ago, and I eventually completely forgot this, sorry. As nobody worked on it yet,
Itook a shot at it. See attached patch.
 

Great!

I didn't reviewed it thoroughly yet, but after a quick look it sounds
sensible.  I'd prefer to see some tests added, and it looks like a
test for plpgsql could be added quite easily.

> I don't know if I should add this right away in the commit fest app. If yes, I guess it should go on the next commit
fest(2021-03), right?
 

Correct, and please add it on the commit fest!



Re: Extensions not dumped when --schema is used

From
Guillaume Lelarge
Date:
Le mar. 26 janv. 2021 à 05:10, Julien Rouhaud <rjuju123@gmail.com> a écrit :
On Mon, Jan 25, 2021 at 9:34 PM Guillaume Lelarge
<guillaume@lelarge.info> wrote:
>
> "Anytime soon" was a long long time ago, and I eventually completely forgot this, sorry. As nobody worked on it yet, I took a shot at it. See attached patch.

Great!

I didn't reviewed it thoroughly yet, but after a quick look it sounds
sensible.  I'd prefer to see some tests added, and it looks like a
test for plpgsql could be added quite easily.


I tried that all afternoon yesterday, but failed to do so. My had still hurts, but I'll try again though it may take some time.

> I don't know if I should add this right away in the commit fest app. If yes, I guess it should go on the next commit fest (2021-03), right?

Correct, and please add it on the commit fest!



--
Guillaume.

Re: Extensions not dumped when --schema is used

From
Guillaume Lelarge
Date:
Le mar. 26 janv. 2021 à 13:41, Guillaume Lelarge <guillaume@lelarge.info> a écrit :
Le mar. 26 janv. 2021 à 05:10, Julien Rouhaud <rjuju123@gmail.com> a écrit :
On Mon, Jan 25, 2021 at 9:34 PM Guillaume Lelarge
<guillaume@lelarge.info> wrote:
>
> "Anytime soon" was a long long time ago, and I eventually completely forgot this, sorry. As nobody worked on it yet, I took a shot at it. See attached patch.

Great!

I didn't reviewed it thoroughly yet, but after a quick look it sounds
sensible.  I'd prefer to see some tests added, and it looks like a
test for plpgsql could be added quite easily.


I tried that all afternoon yesterday, but failed to do so. My had still hurts, but I'll try again though it may take some time.


s/My had/My head/ ..

> I don't know if I should add this right away in the commit fest app. If yes, I guess it should go on the next commit fest (2021-03), right?

Correct, and please add it on the commit fest!



 
--
Guillaume.

Re: Extensions not dumped when --schema is used

From
Asif Rehman
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

The patch applies cleanly and looks fine to me. However consider this scenario.

- CREATE SCHEMA foo;
- CREATE EXTENSION file_fdw WITH SCHEMA foo;
- pg_dump  --file=/tmp/test.sql --exclude-schema=foo postgres

This will still include the extension 'file_fdw' in the backup script. Shouldn't it be excluded as well?

The new status of this patch is: Waiting on Author

Re: Extensions not dumped when --schema is used

From
Guillaume Lelarge
Date:
Hi,

Thanks for the review.

Le mer. 3 févr. 2021 à 18:33, Asif Rehman <asifr.rehman@gmail.com> a écrit :
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

The patch applies cleanly and looks fine to me. However consider this scenario.

- CREATE SCHEMA foo;
- CREATE EXTENSION file_fdw WITH SCHEMA foo;
- pg_dump  --file=/tmp/test.sql --exclude-schema=foo postgres

This will still include the extension 'file_fdw' in the backup script. Shouldn't it be excluded as well?

The new status of this patch is: Waiting on Author

This behaviour is already there without my patch, and I think it's a valid behaviour. An extension doesn't belong to a schema. Its objects do, but the extension doesn't.


--
Guillaume.

Re: Extensions not dumped when --schema is used

From
Guillaume Lelarge
Date:
Le mar. 26 janv. 2021 à 13:42, Guillaume Lelarge <guillaume@lelarge.info> a écrit :
Le mar. 26 janv. 2021 à 13:41, Guillaume Lelarge <guillaume@lelarge.info> a écrit :
Le mar. 26 janv. 2021 à 05:10, Julien Rouhaud <rjuju123@gmail.com> a écrit :
On Mon, Jan 25, 2021 at 9:34 PM Guillaume Lelarge
<guillaume@lelarge.info> wrote:
>
> "Anytime soon" was a long long time ago, and I eventually completely forgot this, sorry. As nobody worked on it yet, I took a shot at it. See attached patch.

Great!

I didn't reviewed it thoroughly yet, but after a quick look it sounds
sensible.  I'd prefer to see some tests added, and it looks like a
test for plpgsql could be added quite easily.


I tried that all afternoon yesterday, but failed to do so. My had still hurts, but I'll try again though it may take some time.


s/My had/My head/ ..


I finally managed to get a working TAP test for my patch. I have no idea if it's good, and if it's enough. Anyway, new version of the patch attached.


--
Guillaume.
Attachment

Re: Extensions not dumped when --schema is used

From
Michael Paquier
Date:
On Mon, Mar 15, 2021 at 11:37:02AM +0100, Guillaume Lelarge wrote:
> Anyways. Yeah, I know we're near feature freeze. This feature would be nice
> to have, but I don't feel strongly about it. I think this feature is
> currently lacking in PostgreSQL but I don't much care if it makes it to 14
> or any future release. If you have time to work on the pg_dump test suite
> and are interested, then sure, go ahead, I'm fine with this. Otherwise I'll
> do it in a few weeks and if it means it'll land in v15, then be it. That's
> not an issue in itself.

Okay.  So I have looked at that stuff in details, and after fixing
all the issues reported upthread in the code, docs and tests I am
finishing with the attached.  The tests have been moved out of
src/bin/pg_dump/ to src/test/modules/test_pg_dump/, and include both
positive and negative tests (used the trick with plpgsql for the
latter to avoid the dump of the extension test_pg_dump or any data
related to it).
--
Michael

Attachment

Re: Extensions not dumped when --schema is used

From
Michael Paquier
Date:
On Tue, Mar 30, 2021 at 12:02:45PM +0900, Michael Paquier wrote:
> Okay.  So I have looked at that stuff in details, and after fixing
> all the issues reported upthread in the code, docs and tests I am
> finishing with the attached.  The tests have been moved out of
> src/bin/pg_dump/ to src/test/modules/test_pg_dump/, and include both
> positive and negative tests (used the trick with plpgsql for the
> latter to avoid the dump of the extension test_pg_dump or any data
> related to it).

I have double-checked this stuff this morning, and did not notice any
issues.  So, applied.
--
Michael

Attachment

Re: Extensions not dumped when --schema is used

From
Guillaume Lelarge
Date:
Le mer. 31 mars 2021 à 02:37, Michael Paquier <michael@paquier.xyz> a écrit :
On Tue, Mar 30, 2021 at 12:02:45PM +0900, Michael Paquier wrote:
> Okay.  So I have looked at that stuff in details, and after fixing
> all the issues reported upthread in the code, docs and tests I am
> finishing with the attached.  The tests have been moved out of
> src/bin/pg_dump/ to src/test/modules/test_pg_dump/, and include both
> positive and negative tests (used the trick with plpgsql for the
> latter to avoid the dump of the extension test_pg_dump or any data
> related to it).

I have double-checked this stuff this morning, and did not notice any
issues.  So, applied.

Thanks a lot. I've seen your email yesterday but had too much work going on to find the time to test your patch. Anyway, I'll take a look at how you coded the TAP test to better understand that part and to be able to do it myself next time.

Thanks again.

Re: Extensions not dumped when --schema is used

From
Noah Misch
Date:
On Wed, Mar 31, 2021 at 09:37:44AM +0900, Michael Paquier wrote:
> On Tue, Mar 30, 2021 at 12:02:45PM +0900, Michael Paquier wrote:
> > Okay.  So I have looked at that stuff in details, and after fixing
> > all the issues reported upthread in the code, docs and tests I am
> > finishing with the attached.  The tests have been moved out of
> > src/bin/pg_dump/ to src/test/modules/test_pg_dump/, and include both
> > positive and negative tests (used the trick with plpgsql for the
> > latter to avoid the dump of the extension test_pg_dump or any data
> > related to it).
> 
> I have double-checked this stuff this morning, and did not notice any
> issues.  So, applied.

I noticed the patch's behavior for relations that are members of non-dumped
extensions and are also registered using pg_extension_config_dump().  It
depends on the schema:

- If extschema='public', "pg_dump -e plpgsql" makes no mention of the
  relations.
- If extschema='public', "pg_dump -e plpgsql --schema=public" includes
  commands to dump the relation data.  This surprised me.  (The
  --schema=public argument causes selectDumpableNamespace() to set
  nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.)
- If extschema is not any sort of built-in schema, "pg_dump -e plpgsql"
  includes commands to dump the relation data.  This surprised me.

I'm attaching a test case patch that demonstrates this.  Is this behavior
intentional?

Attachment

Re: Extensions not dumped when --schema is used

From
Noah Misch
Date:
On Sun, Apr 04, 2021 at 03:08:02PM -0700, Noah Misch wrote:
> On Wed, Mar 31, 2021 at 09:37:44AM +0900, Michael Paquier wrote:
> > On Tue, Mar 30, 2021 at 12:02:45PM +0900, Michael Paquier wrote:
> > > Okay.  So I have looked at that stuff in details, and after fixing
> > > all the issues reported upthread in the code, docs and tests I am
> > > finishing with the attached.  The tests have been moved out of
> > > src/bin/pg_dump/ to src/test/modules/test_pg_dump/, and include both
> > > positive and negative tests (used the trick with plpgsql for the
> > > latter to avoid the dump of the extension test_pg_dump or any data
> > > related to it).
> > 
> > I have double-checked this stuff this morning, and did not notice any
> > issues.  So, applied.
> 
> I noticed the patch's behavior for relations that are members of non-dumped
> extensions and are also registered using pg_extension_config_dump().  It
> depends on the schema:
> 
> - If extschema='public', "pg_dump -e plpgsql" makes no mention of the
>   relations.
> - If extschema='public', "pg_dump -e plpgsql --schema=public" includes
>   commands to dump the relation data.  This surprised me.  (The
>   --schema=public argument causes selectDumpableNamespace() to set
>   nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.)
> - If extschema is not any sort of built-in schema, "pg_dump -e plpgsql"
>   includes commands to dump the relation data.  This surprised me.
> 
> I'm attaching a test case patch that demonstrates this.  Is this behavior
> intentional?

I think this is a bug in $SUBJECT.



Re: Extensions not dumped when --schema is used

From
Michael Paquier
Date:
On Wed, Apr 07, 2021 at 07:42:11PM -0700, Noah Misch wrote:
> I think this is a bug in $SUBJECT.

Sorry for the late reply.  I intend to answer to that and this is
registered as an open item, but I got busy with some other things.
--
Michael

Attachment

Re: Extensions not dumped when --schema is used

From
Michael Paquier
Date:
On Sun, Apr 04, 2021 at 03:08:02PM -0700, Noah Misch wrote:
> I noticed the patch's behavior for relations that are members of non-dumped
> extensions and are also registered using pg_extension_config_dump().  It
> depends on the schema:
>
> - If extschema='public', "pg_dump -e plpgsql" makes no mention of the
>   relations.

This one is expected to me.  The caller of pg_dump is not specifying
the extension that should be dumped, hence it looks logic to me to not
dump the tables marked as pg_extension_config_dump() part as an
extension not listed.

> - If extschema='public', "pg_dump -e plpgsql --schema=public" includes
>   commands to dump the relation data.  This surprised me.  (The
>   --schema=public argument causes selectDumpableNamespace() to set
>   nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.)

This one would be expected to me.  Per the discussion of upthread, we
want --schema and --extension to be two separate and exclusive
switches.  So, once the caller specifies --schema we should dump the
contents of the schema, even if its extension is not listed with
--extension.  Anyway, the behavior to select if a schema can be dumped
or not is not really related to this new code, right?  And "public" is
a mixed beast, being treated as a system object and a user object by
selectDumpableNamespace().

> - If extschema is not any sort of built-in schema, "pg_dump -e plpgsql"
>   includes commands to dump the relation data.  This surprised me.

Hmm.  But you are right that this one is inconsistent with the first
case where the extension is not listed.  I would have said that as the
extension is not directly specified and that the schema is not passed
down either then we should not dump it at all, but this combination
actually does so.  Maybe we should add an extra logic into
selectDumpableNamespace(), close to the end of it, to decide if,
depending on the contents of the extensions to include, we should dump
its associated schema or not?  Another solution would be to make use
of schema_include_oids to filter out the schemas we don't want, but
that would mean that --extension gets priority over --schema or
--table but we did ot want that per the original discussion.
--
Michael

Attachment

Re: Extensions not dumped when --schema is used

From
Noah Misch
Date:
On Tue, Apr 13, 2021 at 02:43:11PM +0900, Michael Paquier wrote:
> On Sun, Apr 04, 2021 at 03:08:02PM -0700, Noah Misch wrote:
> > I noticed the patch's behavior for relations that are members of non-dumped
> > extensions and are also registered using pg_extension_config_dump().  It
> > depends on the schema:
> > 
> > - If extschema='public', "pg_dump -e plpgsql" makes no mention of the
> >   relations.
> 
> This one is expected to me.  The caller of pg_dump is not specifying
> the extension that should be dumped, hence it looks logic to me to not
> dump the tables marked as pg_extension_config_dump() part as an
> extension not listed.

Agreed.

> > - If extschema='public', "pg_dump -e plpgsql --schema=public" includes
> >   commands to dump the relation data.  This surprised me.  (The
> >   --schema=public argument causes selectDumpableNamespace() to set
> >   nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.)
> 
> This one would be expected to me.  Per the discussion of upthread, we
> want --schema and --extension to be two separate and exclusive
> switches.  So, once the caller specifies --schema we should dump the
> contents of the schema, even if its extension is not listed with
> --extension.

I may disagree with this later, but I'm setting it aside for the moment.

> Anyway, the behavior to select if a schema can be dumped
> or not is not really related to this new code, right?  And "public" is
> a mixed beast, being treated as a system object and a user object by
> selectDumpableNamespace().

Correct.

> > - If extschema is not any sort of built-in schema, "pg_dump -e plpgsql"
> >   includes commands to dump the relation data.  This surprised me.
> 
> Hmm.  But you are right that this one is inconsistent with the first
> case where the extension is not listed.  I would have said that as the
> extension is not directly specified and that the schema is not passed
> down either then we should not dump it at all, but this combination
> actually does so.  Maybe we should add an extra logic into
> selectDumpableNamespace(), close to the end of it, to decide if,
> depending on the contents of the extensions to include, we should dump
> its associated schema or not?  Another solution would be to make use
> of schema_include_oids to filter out the schemas we don't want, but
> that would mean that --extension gets priority over --schema or
> --table but we did ot want that per the original discussion.

No, neither of those solutions apply.  "pg_dump -e plpgsql" selects all
schemas.  That is consistent with its documentation; plain "pg_dump" has long
selected all schemas, and the documentation for "-e" does not claim that "-e"
changes the selection of non-extension objects.  We're not going to solve the
problem by making selectDumpableNamespace() select some additional aspect of
schema foo, because it's already selecting every available aspect.  Like you
say, we're also not going to solve the problem by removing some existing
aspect of schema foo from selection, because that would remove dump material
unrelated to any extension.

This isn't a problem of selecting schemas for inclusion in the dump.  This is
a problem of associating extensions with their pg_extension_config_dump()
relations and omitting those extension-member relations when "-e" causes
omission of the extension.



Re: Extensions not dumped when --schema is used

From
Michael Paquier
Date:
On Tue, Apr 13, 2021 at 08:00:34AM -0700, Noah Misch wrote:
> On Tue, Apr 13, 2021 at 02:43:11PM +0900, Michael Paquier wrote:
>>> - If extschema='public', "pg_dump -e plpgsql --schema=public" includes
>>>   commands to dump the relation data.  This surprised me.  (The
>>>   --schema=public argument causes selectDumpableNamespace() to set
>>>   nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.)
>>
>> This one would be expected to me.  Per the discussion of upthread, we
>> want --schema and --extension to be two separate and exclusive
>> switches.  So, once the caller specifies --schema we should dump the
>> contents of the schema, even if its extension is not listed with
>> --extension.
>
> I may disagree with this later, but I'm setting it aside for the moment.
>
> This isn't a problem of selecting schemas for inclusion in the dump.  This is
> a problem of associating extensions with their pg_extension_config_dump()
> relations and omitting those extension-member relations when "-e" causes
> omission of the extension.

At code level, the decision to dump the data of any extension's
dumpable table is done in processExtensionTables().  I have to admit
that your feeling here keeps the code simpler than what I have been
thinking if we apply an extra filtering based on the list of
extensions in this code path.  So I can see the value in your argument
to not dump at all the data of an extension's dumpable table as long
as its extension is not listed, and this, even if its schema is
explicitly listed.

So I got down to make the behavior more consistent with the patch
attached.  This passes your case.  It is worth noting that if a
table is part of a schema created by an extension, but that the table
is not dependent on the extension, we would still dump its data if
using --schema with this table's schema while the extension is not
part of the list from --extension.  In the attached, that's just the
extra test with without_extension_implicit_schema.

(By the way, good catch with the duplicated --no-sync in the new
tests.)

What do you think?
--
Michael

Attachment

Re: Extensions not dumped when --schema is used

From
Noah Misch
Date:
On Wed, Apr 14, 2021 at 10:38:17AM +0900, Michael Paquier wrote:
> On Tue, Apr 13, 2021 at 08:00:34AM -0700, Noah Misch wrote:
> > On Tue, Apr 13, 2021 at 02:43:11PM +0900, Michael Paquier wrote:
> >>> - If extschema='public', "pg_dump -e plpgsql --schema=public" includes
> >>>   commands to dump the relation data.  This surprised me.  (The
> >>>   --schema=public argument causes selectDumpableNamespace() to set
> >>>   nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.)

> > This isn't a problem of selecting schemas for inclusion in the dump.  This is
> > a problem of associating extensions with their pg_extension_config_dump()
> > relations and omitting those extension-member relations when "-e" causes
> > omission of the extension.
> 
> At code level, the decision to dump the data of any extension's
> dumpable table is done in processExtensionTables().  I have to admit
> that your feeling here keeps the code simpler than what I have been
> thinking if we apply an extra filtering based on the list of
> extensions in this code path.  So I can see the value in your argument
> to not dump at all the data of an extension's dumpable table as long
> as its extension is not listed, and this, even if its schema is
> explicitly listed.
> 
> So I got down to make the behavior more consistent with the patch
> attached.  This passes your case.

Yes.

> It is worth noting that if a
> table is part of a schema created by an extension, but that the table
> is not dependent on the extension, we would still dump its data if
> using --schema with this table's schema while the extension is not
> part of the list from --extension.  In the attached, that's just the
> extra test with without_extension_implicit_schema.

That's consistent with v13, and it seems fine.  I've not used a non-test
extension that creates a schema.

> --- a/src/test/modules/test_pg_dump/t/001_base.pl
> +++ b/src/test/modules/test_pg_dump/t/001_base.pl
> @@ -208,6 +208,30 @@ my %pgdump_runs = (
>              'pg_dump', '--no-sync', "--file=$tempdir/without_extension.sql",
>              '--extension=plpgsql', 'postgres',
>          ],
> +    },
> +
> +    # plgsql in the list of extensions blocks the dump of extension
> +    # test_pg_dump.
> +    without_extension_explicit_schema => {
> +        dump_cmd => [
> +            'pg_dump',
> +            '--no-sync',
> +            "--file=$tempdir/without_extension_explicit_schema.sql",
> +            '--extension=plpgsql',
> +            '--schema=public',
> +            'postgres',
> +        ],
> +    },
> +
> +    without_extension_implicit_schema => {
> +        dump_cmd => [
> +            'pg_dump',
> +            '--no-sync',
> +            "--file=$tempdir/without_extension_implicit_schema.sql",
> +            '--extension=plpgsql',
> +            '--schema=regress_pg_dump_schema',
> +            'postgres',
> +        ],
>      },);

The name "without_extension_explicit_schema" arose because that test differs
from the "without_extension" test by adding --schema=public.  The test named
"without_extension_implicit_schema" differs from "without_extension" by adding
--schema=regress_pg_dump_schema, so the word "implicit" feels not-descriptive
of the test.  I recommend picking a different name.  Other than that, the
change looks good.



Re: Extensions not dumped when --schema is used

From
Michael Paquier
Date:
On Wed, Apr 14, 2021 at 05:31:15AM -0700, Noah Misch wrote:
> The name "without_extension_explicit_schema" arose because that test differs
> from the "without_extension" test by adding --schema=public.  The test named
> "without_extension_implicit_schema" differs from "without_extension" by adding
> --schema=regress_pg_dump_schema, so the word "implicit" feels not-descriptive
> of the test.  I recommend picking a different name.  Other than that, the
> change looks good.

Thanks for the review.  I have picked up "internal" instead, as
that's the schema created within the extension itself, and applied the
patch.
--
Michael

Attachment

Re: Extensions not dumped when --schema is used

From
Guillaume Lelarge
Date:
Le jeu. 15 avr. 2021 à 09:58, Michael Paquier <michael@paquier.xyz> a écrit :
On Wed, Apr 14, 2021 at 05:31:15AM -0700, Noah Misch wrote:
> The name "without_extension_explicit_schema" arose because that test differs
> from the "without_extension" test by adding --schema=public.  The test named
> "without_extension_implicit_schema" differs from "without_extension" by adding
> --schema=regress_pg_dump_schema, so the word "implicit" feels not-descriptive
> of the test.  I recommend picking a different name.  Other than that, the
> change looks good.

Thanks for the review.  I have picked up "internal" instead, as
that's the schema created within the extension itself, and applied the
patch.

Thanks for the work on this. I didn't understand everything on the issue, which is why I didn't say a thing, but I followed the thread, and very much appreciated the fix.


--
Guillaume.