Thread: Extensions not dumped when --schema is used
Hello,
I've discovered something today that I didn't really expect. When a user dumps a database with the --schema flag of pg_dump, extensions in this schema aren't dumped. As far as I can tell, the documentation isn't clear about this ("Dump only schemas matching pattern; this selects both the schema itself, and all its contained objects."), though the source code definitely is ("We dump all user-added extensions by default, or none of them if include_everything is false (i.e., a --schema or --table switch was given).", in pg_dump.c).
I was wondering the reason behind this choice. If anyone knows, I'd be happy to hear about it.
I see two things:
* it's been overlooked, and we should dump all the extensions available in a schema if this schema has been selected through the --schema flag.
* it's kind of like the large objects handling, and I'd pretty interested in adding a --extensions (as the same way there is a --blobs flag).
Thanks.
Regards.
--
Guillaume.
On Wed, 2020-05-20 at 10:06 +0200, Guillaume Lelarge wrote: > I've discovered something today that I didn't really expect. > When a user dumps a database with the --schema flag of pg_dump, > extensions in this schema aren't dumped. As far as I can tell, > the documentation isn't clear about this ("Dump only schemas > matching pattern; this selects both the schema itself, and all > its contained objects."), though the source code definitely is > ("We dump all user-added extensions by default, or none of them > if include_everything is false (i.e., a --schema or --table > switch was given).", in pg_dump.c). > > I was wondering the reason behind this choice. If anyone knows, > I'd be happy to hear about it. > > I see two things: > * it's been overlooked, and we should dump all the extensions > available in a schema if this schema has been selected through > the --schema flag. > * it's kind of like the large objects handling, and I'd pretty > interested in adding a --extensions (as the same way there is a > --blobs flag). I am not sure if there is a good reason for the current behavior, but I'd favor the second solution. I think as extensions as belonging to the database rather than the schema; the schema is just where the objects are housed. Yours, Laurenz Albe
> On 20 May 2020, at 10:06, Guillaume Lelarge <guillaume@lelarge.info> wrote: > I was wondering the reason behind this choice. If anyone knows, I'd be happy to hear about it. Extensions were dumped unconditionally in the beginning, but it was changed to match how procedural language definitions were dumped. > * it's been overlooked, and we should dump all the extensions available in a schema if this schema has been selected throughthe --schema flag. For reference, --schema-only will include all the extensions, but not --schema=foo and not "--schema-only --schema=foo". Extensions don't belong to a schema per se, the namespace oid in pg_extension marks where most of the objects are contained but not necessarily all of them. Given that, it makes sense to not include extensions for --schema. However, that can be considered sort of an implementation detail which may not be entirely obvious to users (especially since you yourself is a power-user). > * it's kind of like the large objects handling, and I'd pretty interested in adding a --extensions (as the same way thereis a --blobs flag). An object in a schema might have attributes that depend on an extension in order to restore, so it makes sense to provide a way to include them for a --schema dump. The question is what --extensions should do: only dump any extensions that objects in the schema depend on; require a pattern and only dump matching extensions; dump all extensions (probably not) or something else? cheers ./daniel
Le mer. 20 mai 2020 à 11:26, Daniel Gustafsson <daniel@yesql.se> a écrit :
> On 20 May 2020, at 10:06, Guillaume Lelarge <guillaume@lelarge.info> wrote:
> I was wondering the reason behind this choice. If anyone knows, I'd be happy to hear about it.
Extensions were dumped unconditionally in the beginning, but it was changed to
match how procedural language definitions were dumped.
That makes sense. Thank you.
> * it's been overlooked, and we should dump all the extensions available in a schema if this schema has been selected through the --schema flag.
For reference, --schema-only will include all the extensions, but not
--schema=foo and not "--schema-only --schema=foo".
Yes.
Extensions don't belong to a schema per se, the namespace oid in pg_extension
marks where most of the objects are contained but not necessarily all of them.
Given that, it makes sense to not include extensions for --schema. However,
that can be considered sort of an implementation detail which may not be
entirely obvious to users (especially since you yourself is a power-user).
I agree.
> * it's kind of like the large objects handling, and I'd pretty interested in adding a --extensions (as the same way there is a --blobs flag).
An object in a schema might have attributes that depend on an extension in
order to restore, so it makes sense to provide a way to include them for a
--schema dump.
That's actually how I figured this out. A customer can't restore his dump because of a missing extension (pg_trgm to be precise).
The question is what --extensions should do: only dump any
extensions that objects in the schema depend on; require a pattern and only
dump matching extensions; dump all extensions (probably not) or something else?
Actually, "dump all extensions" (#3) would make sense to me, and has my vote. Otherwise, and though it would imply much more work, "only dump any extension that objects in the schema depend on" (#1) comes second in my opinion. Using the pattern means something more manual for users, and I really think it would be a bad idea. People dump databases, schemas, and tables. Theu usually don't know which extensions those objects depend on. But, anyway, I would work on any of these solutions, depending on what most people think is best.
Thanks.
--
Guillaume.
> On 20 May 2020, at 11:38, Guillaume Lelarge <guillaume@lelarge.info> wrote: > Actually, "dump all extensions" (#3) would make sense to me, and has my vote. Wouldn't that open for another set of problems when running with --schema=bar and getting errors on restoring for relocatable extensions like these: CREATE EXTENSION IF NOT EXISTS pg_trgm WITH SCHEMA foo; Only dumping extensions depended on has the same problem of course. > People dump databases, schemas, and tables. Theu usually don't know which extensions those objects depend on. That I totally agree with, question is how we best can help users here. cheers ./daniel
Guillaume Lelarge <guillaume@lelarge.info> writes: > Le mer. 20 mai 2020 à 11:26, Daniel Gustafsson <daniel@yesql.se> a écrit : >> The question is what --extensions should do: only dump any >> extensions that objects in the schema depend on; require a pattern and only >> dump matching extensions; dump all extensions (probably not) or something >> else? > Actually, "dump all extensions" (#3) would make sense to me, and has my > vote. I think that makes no sense at all. By definition, a dump that's been restricted with --schema, --table, or any similar switch is incomplete and may not restore on its own. Typical examples include foreign key references to tables in other schemas, views using functions in other schemas, etc etc. I see no reason for extension dependencies to be treated differently from those cases. In any use of selective dump, it's the user's job to select a set of objects that she wants dumped (or restored). Trying to second-guess that is mostly going to make the feature less usable for power-user cases. As a counterexample, what if you want the dump to be restorable on a system that doesn't have all of the extensions available on the source? You carefully pick out the tables that you need, which don't require the unavailable extensions ... and then pg_dump decides you don't know what you're doing and includes all the problematic extensions anyway. I could get behind an "--extensions=PATTERN" switch to allow selective addition of extensions to a selective dump, but I don't want to see us overruling the user's choices about what to dump. regards, tom lane
Le mer. 20 mai 2020 à 16:39, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Guillaume Lelarge <guillaume@lelarge.info> writes:
> Le mer. 20 mai 2020 à 11:26, Daniel Gustafsson <daniel@yesql.se> a écrit :
>> The question is what --extensions should do: only dump any
>> extensions that objects in the schema depend on; require a pattern and only
>> dump matching extensions; dump all extensions (probably not) or something
>> else?
> Actually, "dump all extensions" (#3) would make sense to me, and has my
> vote.
I think that makes no sense at all. By definition, a dump that's been
restricted with --schema, --table, or any similar switch is incomplete
and may not restore on its own. Typical examples include foreign key
references to tables in other schemas, views using functions in other
schemas, etc etc. I see no reason for extension dependencies to be
treated differently from those cases.
Agreed.
In any use of selective dump, it's the user's job to select a set of
objects that she wants dumped (or restored). Trying to second-guess that
is mostly going to make the feature less usable for power-user cases.
Agreed, though right now he has no way to do this for extensions.
As a counterexample, what if you want the dump to be restorable on a
system that doesn't have all of the extensions available on the source?
You carefully pick out the tables that you need, which don't require the
unavailable extensions ... and then pg_dump decides you don't know what
you're doing and includes all the problematic extensions anyway.
That's true.
I could get behind an "--extensions=PATTERN" switch to allow selective
addition of extensions to a selective dump, but I don't want to see us
overruling the user's choices about what to dump.
With all your comments, I can only agree to your views. I'll try to work on this anytime soon.
--
Guillaume.
Hi,
Le sam. 23 mai 2020 à 14:53, Guillaume Lelarge <guillaume@lelarge.info> a écrit :
Le mer. 20 mai 2020 à 16:39, Tom Lane <tgl@sss.pgh.pa.us> a écrit :Guillaume Lelarge <guillaume@lelarge.info> writes:
> Le mer. 20 mai 2020 à 11:26, Daniel Gustafsson <daniel@yesql.se> a écrit :
>> The question is what --extensions should do: only dump any
>> extensions that objects in the schema depend on; require a pattern and only
>> dump matching extensions; dump all extensions (probably not) or something
>> else?
> Actually, "dump all extensions" (#3) would make sense to me, and has my
> vote.
I think that makes no sense at all. By definition, a dump that's been
restricted with --schema, --table, or any similar switch is incomplete
and may not restore on its own. Typical examples include foreign key
references to tables in other schemas, views using functions in other
schemas, etc etc. I see no reason for extension dependencies to be
treated differently from those cases.Agreed.In any use of selective dump, it's the user's job to select a set of
objects that she wants dumped (or restored). Trying to second-guess that
is mostly going to make the feature less usable for power-user cases.Agreed, though right now he has no way to do this for extensions.As a counterexample, what if you want the dump to be restorable on a
system that doesn't have all of the extensions available on the source?
You carefully pick out the tables that you need, which don't require the
unavailable extensions ... and then pg_dump decides you don't know what
you're doing and includes all the problematic extensions anyway.That's true.I could get behind an "--extensions=PATTERN" switch to allow selective
addition of extensions to a selective dump, but I don't want to see us
overruling the user's choices about what to dump.With all your comments, I can only agree to your views. I'll try to work on this anytime soon.
"Anytime soon" was a long long time ago, and I eventually completely forgot this, sorry. As nobody worked on it yet, I took a shot at it. See attached patch.
I don't know if I should add this right away in the commit fest app. If yes, I guess it should go on the next commit fest (2021-03), right?
Guillaume.
Attachment
On Mon, Jan 25, 2021 at 9:34 PM Guillaume Lelarge <guillaume@lelarge.info> wrote: > > "Anytime soon" was a long long time ago, and I eventually completely forgot this, sorry. As nobody worked on it yet, Itook a shot at it. See attached patch. Great! I didn't reviewed it thoroughly yet, but after a quick look it sounds sensible. I'd prefer to see some tests added, and it looks like a test for plpgsql could be added quite easily. > I don't know if I should add this right away in the commit fest app. If yes, I guess it should go on the next commit fest(2021-03), right? Correct, and please add it on the commit fest!
Le mar. 26 janv. 2021 à 05:10, Julien Rouhaud <rjuju123@gmail.com> a écrit :
On Mon, Jan 25, 2021 at 9:34 PM Guillaume Lelarge
<guillaume@lelarge.info> wrote:
>
> "Anytime soon" was a long long time ago, and I eventually completely forgot this, sorry. As nobody worked on it yet, I took a shot at it. See attached patch.
Great!
I didn't reviewed it thoroughly yet, but after a quick look it sounds
sensible. I'd prefer to see some tests added, and it looks like a
test for plpgsql could be added quite easily.
I tried that all afternoon yesterday, but failed to do so. My had still hurts, but I'll try again though it may take some time.
> I don't know if I should add this right away in the commit fest app. If yes, I guess it should go on the next commit fest (2021-03), right?
Correct, and please add it on the commit fest!
Done, see https://commitfest.postgresql.org/32/2956/.
Guillaume.
Le mar. 26 janv. 2021 à 13:41, Guillaume Lelarge <guillaume@lelarge.info> a écrit :
Le mar. 26 janv. 2021 à 05:10, Julien Rouhaud <rjuju123@gmail.com> a écrit :On Mon, Jan 25, 2021 at 9:34 PM Guillaume Lelarge
<guillaume@lelarge.info> wrote:
>
> "Anytime soon" was a long long time ago, and I eventually completely forgot this, sorry. As nobody worked on it yet, I took a shot at it. See attached patch.
Great!
I didn't reviewed it thoroughly yet, but after a quick look it sounds
sensible. I'd prefer to see some tests added, and it looks like a
test for plpgsql could be added quite easily.I tried that all afternoon yesterday, but failed to do so. My had still hurts, but I'll try again though it may take some time.
s/My had/My head/ ..
> I don't know if I should add this right away in the commit fest app. If yes, I guess it should go on the next commit fest (2021-03), right?
Correct, and please add it on the commit fest!Done, see https://commitfest.postgresql.org/32/2956/.
Guillaume.
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: not tested The patch applies cleanly and looks fine to me. However consider this scenario. - CREATE SCHEMA foo; - CREATE EXTENSION file_fdw WITH SCHEMA foo; - pg_dump --file=/tmp/test.sql --exclude-schema=foo postgres This will still include the extension 'file_fdw' in the backup script. Shouldn't it be excluded as well? The new status of this patch is: Waiting on Author
Hi,
Thanks for the review.
Le mer. 3 févr. 2021 à 18:33, Asif Rehman <asifr.rehman@gmail.com> a écrit :
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested
The patch applies cleanly and looks fine to me. However consider this scenario.
- CREATE SCHEMA foo;
- CREATE EXTENSION file_fdw WITH SCHEMA foo;
- pg_dump --file=/tmp/test.sql --exclude-schema=foo postgres
This will still include the extension 'file_fdw' in the backup script. Shouldn't it be excluded as well?
The new status of this patch is: Waiting on Author
This behaviour is already there without my patch, and I think it's a valid behaviour. An extension doesn't belong to a schema. Its objects do, but the extension doesn't.
Guillaume.
Le mar. 26 janv. 2021 à 13:42, Guillaume Lelarge <guillaume@lelarge.info> a écrit :
Le mar. 26 janv. 2021 à 13:41, Guillaume Lelarge <guillaume@lelarge.info> a écrit :Le mar. 26 janv. 2021 à 05:10, Julien Rouhaud <rjuju123@gmail.com> a écrit :On Mon, Jan 25, 2021 at 9:34 PM Guillaume Lelarge
<guillaume@lelarge.info> wrote:
>
> "Anytime soon" was a long long time ago, and I eventually completely forgot this, sorry. As nobody worked on it yet, I took a shot at it. See attached patch.
Great!
I didn't reviewed it thoroughly yet, but after a quick look it sounds
sensible. I'd prefer to see some tests added, and it looks like a
test for plpgsql could be added quite easily.I tried that all afternoon yesterday, but failed to do so. My had still hurts, but I'll try again though it may take some time.s/My had/My head/ ..
I finally managed to get a working TAP test for my patch. I have no idea if it's good, and if it's enough. Anyway, new version of the patch attached.
Guillaume.
Attachment
On Mon, Mar 15, 2021 at 11:37:02AM +0100, Guillaume Lelarge wrote: > Anyways. Yeah, I know we're near feature freeze. This feature would be nice > to have, but I don't feel strongly about it. I think this feature is > currently lacking in PostgreSQL but I don't much care if it makes it to 14 > or any future release. If you have time to work on the pg_dump test suite > and are interested, then sure, go ahead, I'm fine with this. Otherwise I'll > do it in a few weeks and if it means it'll land in v15, then be it. That's > not an issue in itself. Okay. So I have looked at that stuff in details, and after fixing all the issues reported upthread in the code, docs and tests I am finishing with the attached. The tests have been moved out of src/bin/pg_dump/ to src/test/modules/test_pg_dump/, and include both positive and negative tests (used the trick with plpgsql for the latter to avoid the dump of the extension test_pg_dump or any data related to it). -- Michael
Attachment
On Tue, Mar 30, 2021 at 12:02:45PM +0900, Michael Paquier wrote: > Okay. So I have looked at that stuff in details, and after fixing > all the issues reported upthread in the code, docs and tests I am > finishing with the attached. The tests have been moved out of > src/bin/pg_dump/ to src/test/modules/test_pg_dump/, and include both > positive and negative tests (used the trick with plpgsql for the > latter to avoid the dump of the extension test_pg_dump or any data > related to it). I have double-checked this stuff this morning, and did not notice any issues. So, applied. -- Michael
Attachment
Le mer. 31 mars 2021 à 02:37, Michael Paquier <michael@paquier.xyz> a écrit :
On Tue, Mar 30, 2021 at 12:02:45PM +0900, Michael Paquier wrote:
> Okay. So I have looked at that stuff in details, and after fixing
> all the issues reported upthread in the code, docs and tests I am
> finishing with the attached. The tests have been moved out of
> src/bin/pg_dump/ to src/test/modules/test_pg_dump/, and include both
> positive and negative tests (used the trick with plpgsql for the
> latter to avoid the dump of the extension test_pg_dump or any data
> related to it).
I have double-checked this stuff this morning, and did not notice any
issues. So, applied.
Thanks a lot. I've seen your email yesterday but had too much work going on to find the time to test your patch. Anyway, I'll take a look at how you coded the TAP test to better understand that part and to be able to do it myself next time.
Thanks again.
On Wed, Mar 31, 2021 at 09:37:44AM +0900, Michael Paquier wrote: > On Tue, Mar 30, 2021 at 12:02:45PM +0900, Michael Paquier wrote: > > Okay. So I have looked at that stuff in details, and after fixing > > all the issues reported upthread in the code, docs and tests I am > > finishing with the attached. The tests have been moved out of > > src/bin/pg_dump/ to src/test/modules/test_pg_dump/, and include both > > positive and negative tests (used the trick with plpgsql for the > > latter to avoid the dump of the extension test_pg_dump or any data > > related to it). > > I have double-checked this stuff this morning, and did not notice any > issues. So, applied. I noticed the patch's behavior for relations that are members of non-dumped extensions and are also registered using pg_extension_config_dump(). It depends on the schema: - If extschema='public', "pg_dump -e plpgsql" makes no mention of the relations. - If extschema='public', "pg_dump -e plpgsql --schema=public" includes commands to dump the relation data. This surprised me. (The --schema=public argument causes selectDumpableNamespace() to set nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.) - If extschema is not any sort of built-in schema, "pg_dump -e plpgsql" includes commands to dump the relation data. This surprised me. I'm attaching a test case patch that demonstrates this. Is this behavior intentional?
Attachment
On Sun, Apr 04, 2021 at 03:08:02PM -0700, Noah Misch wrote: > On Wed, Mar 31, 2021 at 09:37:44AM +0900, Michael Paquier wrote: > > On Tue, Mar 30, 2021 at 12:02:45PM +0900, Michael Paquier wrote: > > > Okay. So I have looked at that stuff in details, and after fixing > > > all the issues reported upthread in the code, docs and tests I am > > > finishing with the attached. The tests have been moved out of > > > src/bin/pg_dump/ to src/test/modules/test_pg_dump/, and include both > > > positive and negative tests (used the trick with plpgsql for the > > > latter to avoid the dump of the extension test_pg_dump or any data > > > related to it). > > > > I have double-checked this stuff this morning, and did not notice any > > issues. So, applied. > > I noticed the patch's behavior for relations that are members of non-dumped > extensions and are also registered using pg_extension_config_dump(). It > depends on the schema: > > - If extschema='public', "pg_dump -e plpgsql" makes no mention of the > relations. > - If extschema='public', "pg_dump -e plpgsql --schema=public" includes > commands to dump the relation data. This surprised me. (The > --schema=public argument causes selectDumpableNamespace() to set > nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.) > - If extschema is not any sort of built-in schema, "pg_dump -e plpgsql" > includes commands to dump the relation data. This surprised me. > > I'm attaching a test case patch that demonstrates this. Is this behavior > intentional? I think this is a bug in $SUBJECT.
On Wed, Apr 07, 2021 at 07:42:11PM -0700, Noah Misch wrote: > I think this is a bug in $SUBJECT. Sorry for the late reply. I intend to answer to that and this is registered as an open item, but I got busy with some other things. -- Michael
Attachment
On Sun, Apr 04, 2021 at 03:08:02PM -0700, Noah Misch wrote: > I noticed the patch's behavior for relations that are members of non-dumped > extensions and are also registered using pg_extension_config_dump(). It > depends on the schema: > > - If extschema='public', "pg_dump -e plpgsql" makes no mention of the > relations. This one is expected to me. The caller of pg_dump is not specifying the extension that should be dumped, hence it looks logic to me to not dump the tables marked as pg_extension_config_dump() part as an extension not listed. > - If extschema='public', "pg_dump -e plpgsql --schema=public" includes > commands to dump the relation data. This surprised me. (The > --schema=public argument causes selectDumpableNamespace() to set > nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.) This one would be expected to me. Per the discussion of upthread, we want --schema and --extension to be two separate and exclusive switches. So, once the caller specifies --schema we should dump the contents of the schema, even if its extension is not listed with --extension. Anyway, the behavior to select if a schema can be dumped or not is not really related to this new code, right? And "public" is a mixed beast, being treated as a system object and a user object by selectDumpableNamespace(). > - If extschema is not any sort of built-in schema, "pg_dump -e plpgsql" > includes commands to dump the relation data. This surprised me. Hmm. But you are right that this one is inconsistent with the first case where the extension is not listed. I would have said that as the extension is not directly specified and that the schema is not passed down either then we should not dump it at all, but this combination actually does so. Maybe we should add an extra logic into selectDumpableNamespace(), close to the end of it, to decide if, depending on the contents of the extensions to include, we should dump its associated schema or not? Another solution would be to make use of schema_include_oids to filter out the schemas we don't want, but that would mean that --extension gets priority over --schema or --table but we did ot want that per the original discussion. -- Michael
Attachment
On Tue, Apr 13, 2021 at 02:43:11PM +0900, Michael Paquier wrote: > On Sun, Apr 04, 2021 at 03:08:02PM -0700, Noah Misch wrote: > > I noticed the patch's behavior for relations that are members of non-dumped > > extensions and are also registered using pg_extension_config_dump(). It > > depends on the schema: > > > > - If extschema='public', "pg_dump -e plpgsql" makes no mention of the > > relations. > > This one is expected to me. The caller of pg_dump is not specifying > the extension that should be dumped, hence it looks logic to me to not > dump the tables marked as pg_extension_config_dump() part as an > extension not listed. Agreed. > > - If extschema='public', "pg_dump -e plpgsql --schema=public" includes > > commands to dump the relation data. This surprised me. (The > > --schema=public argument causes selectDumpableNamespace() to set > > nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.) > > This one would be expected to me. Per the discussion of upthread, we > want --schema and --extension to be two separate and exclusive > switches. So, once the caller specifies --schema we should dump the > contents of the schema, even if its extension is not listed with > --extension. I may disagree with this later, but I'm setting it aside for the moment. > Anyway, the behavior to select if a schema can be dumped > or not is not really related to this new code, right? And "public" is > a mixed beast, being treated as a system object and a user object by > selectDumpableNamespace(). Correct. > > - If extschema is not any sort of built-in schema, "pg_dump -e plpgsql" > > includes commands to dump the relation data. This surprised me. > > Hmm. But you are right that this one is inconsistent with the first > case where the extension is not listed. I would have said that as the > extension is not directly specified and that the schema is not passed > down either then we should not dump it at all, but this combination > actually does so. Maybe we should add an extra logic into > selectDumpableNamespace(), close to the end of it, to decide if, > depending on the contents of the extensions to include, we should dump > its associated schema or not? Another solution would be to make use > of schema_include_oids to filter out the schemas we don't want, but > that would mean that --extension gets priority over --schema or > --table but we did ot want that per the original discussion. No, neither of those solutions apply. "pg_dump -e plpgsql" selects all schemas. That is consistent with its documentation; plain "pg_dump" has long selected all schemas, and the documentation for "-e" does not claim that "-e" changes the selection of non-extension objects. We're not going to solve the problem by making selectDumpableNamespace() select some additional aspect of schema foo, because it's already selecting every available aspect. Like you say, we're also not going to solve the problem by removing some existing aspect of schema foo from selection, because that would remove dump material unrelated to any extension. This isn't a problem of selecting schemas for inclusion in the dump. This is a problem of associating extensions with their pg_extension_config_dump() relations and omitting those extension-member relations when "-e" causes omission of the extension.
On Tue, Apr 13, 2021 at 08:00:34AM -0700, Noah Misch wrote: > On Tue, Apr 13, 2021 at 02:43:11PM +0900, Michael Paquier wrote: >>> - If extschema='public', "pg_dump -e plpgsql --schema=public" includes >>> commands to dump the relation data. This surprised me. (The >>> --schema=public argument causes selectDumpableNamespace() to set >>> nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.) >> >> This one would be expected to me. Per the discussion of upthread, we >> want --schema and --extension to be two separate and exclusive >> switches. So, once the caller specifies --schema we should dump the >> contents of the schema, even if its extension is not listed with >> --extension. > > I may disagree with this later, but I'm setting it aside for the moment. > > This isn't a problem of selecting schemas for inclusion in the dump. This is > a problem of associating extensions with their pg_extension_config_dump() > relations and omitting those extension-member relations when "-e" causes > omission of the extension. At code level, the decision to dump the data of any extension's dumpable table is done in processExtensionTables(). I have to admit that your feeling here keeps the code simpler than what I have been thinking if we apply an extra filtering based on the list of extensions in this code path. So I can see the value in your argument to not dump at all the data of an extension's dumpable table as long as its extension is not listed, and this, even if its schema is explicitly listed. So I got down to make the behavior more consistent with the patch attached. This passes your case. It is worth noting that if a table is part of a schema created by an extension, but that the table is not dependent on the extension, we would still dump its data if using --schema with this table's schema while the extension is not part of the list from --extension. In the attached, that's just the extra test with without_extension_implicit_schema. (By the way, good catch with the duplicated --no-sync in the new tests.) What do you think? -- Michael
Attachment
On Wed, Apr 14, 2021 at 10:38:17AM +0900, Michael Paquier wrote: > On Tue, Apr 13, 2021 at 08:00:34AM -0700, Noah Misch wrote: > > On Tue, Apr 13, 2021 at 02:43:11PM +0900, Michael Paquier wrote: > >>> - If extschema='public', "pg_dump -e plpgsql --schema=public" includes > >>> commands to dump the relation data. This surprised me. (The > >>> --schema=public argument causes selectDumpableNamespace() to set > >>> nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.) > > This isn't a problem of selecting schemas for inclusion in the dump. This is > > a problem of associating extensions with their pg_extension_config_dump() > > relations and omitting those extension-member relations when "-e" causes > > omission of the extension. > > At code level, the decision to dump the data of any extension's > dumpable table is done in processExtensionTables(). I have to admit > that your feeling here keeps the code simpler than what I have been > thinking if we apply an extra filtering based on the list of > extensions in this code path. So I can see the value in your argument > to not dump at all the data of an extension's dumpable table as long > as its extension is not listed, and this, even if its schema is > explicitly listed. > > So I got down to make the behavior more consistent with the patch > attached. This passes your case. Yes. > It is worth noting that if a > table is part of a schema created by an extension, but that the table > is not dependent on the extension, we would still dump its data if > using --schema with this table's schema while the extension is not > part of the list from --extension. In the attached, that's just the > extra test with without_extension_implicit_schema. That's consistent with v13, and it seems fine. I've not used a non-test extension that creates a schema. > --- a/src/test/modules/test_pg_dump/t/001_base.pl > +++ b/src/test/modules/test_pg_dump/t/001_base.pl > @@ -208,6 +208,30 @@ my %pgdump_runs = ( > 'pg_dump', '--no-sync', "--file=$tempdir/without_extension.sql", > '--extension=plpgsql', 'postgres', > ], > + }, > + > + # plgsql in the list of extensions blocks the dump of extension > + # test_pg_dump. > + without_extension_explicit_schema => { > + dump_cmd => [ > + 'pg_dump', > + '--no-sync', > + "--file=$tempdir/without_extension_explicit_schema.sql", > + '--extension=plpgsql', > + '--schema=public', > + 'postgres', > + ], > + }, > + > + without_extension_implicit_schema => { > + dump_cmd => [ > + 'pg_dump', > + '--no-sync', > + "--file=$tempdir/without_extension_implicit_schema.sql", > + '--extension=plpgsql', > + '--schema=regress_pg_dump_schema', > + 'postgres', > + ], > },); The name "without_extension_explicit_schema" arose because that test differs from the "without_extension" test by adding --schema=public. The test named "without_extension_implicit_schema" differs from "without_extension" by adding --schema=regress_pg_dump_schema, so the word "implicit" feels not-descriptive of the test. I recommend picking a different name. Other than that, the change looks good.
On Wed, Apr 14, 2021 at 05:31:15AM -0700, Noah Misch wrote: > The name "without_extension_explicit_schema" arose because that test differs > from the "without_extension" test by adding --schema=public. The test named > "without_extension_implicit_schema" differs from "without_extension" by adding > --schema=regress_pg_dump_schema, so the word "implicit" feels not-descriptive > of the test. I recommend picking a different name. Other than that, the > change looks good. Thanks for the review. I have picked up "internal" instead, as that's the schema created within the extension itself, and applied the patch. -- Michael
Attachment
Le jeu. 15 avr. 2021 à 09:58, Michael Paquier <michael@paquier.xyz> a écrit :
On Wed, Apr 14, 2021 at 05:31:15AM -0700, Noah Misch wrote:
> The name "without_extension_explicit_schema" arose because that test differs
> from the "without_extension" test by adding --schema=public. The test named
> "without_extension_implicit_schema" differs from "without_extension" by adding
> --schema=regress_pg_dump_schema, so the word "implicit" feels not-descriptive
> of the test. I recommend picking a different name. Other than that, the
> change looks good.
Thanks for the review. I have picked up "internal" instead, as
that's the schema created within the extension itself, and applied the
patch.
Thanks for the work on this. I didn't understand everything on the issue, which is why I didn't say a thing, but I followed the thread, and very much appreciated the fix.
--
Guillaume.