Thread: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Hi,
Attached is a patch that adds a --no-comments argument to pg_dump to skip generation of COMMENT statements when generating a backup. This is crucial for non-superusers to restore a database backup in a single transaction. Currently, this requires one to remove COMMENTs via scripts, which is inelegant at best.
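The script-based workaround alluded to above might look like the following minimal sketch (the dump fragment and filenames are fabricated for illustration; real dumps can wrap COMMENT strings across multiple lines, which is part of why this approach is fragile):

```shell
# A fabricated fragment of pg_dump plain-text output:
cat > schema.sql <<'EOF'
CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;
COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';
CREATE TABLE public.t (id integer);
EOF

# The inelegant pre-restore filtering the patch would make unnecessary;
# note this silently breaks if a comment string spans several lines.
grep -v '^COMMENT ON ' schema.sql > schema-filtered.sql
cat schema-filtered.sql
```

The filtered file can then be restored by a non-superuser in a single transaction, since the failing COMMENT ON EXTENSION statement is gone.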
A similar discussion took place earlier [1]; however, it stopped (with a TODO entry) since it required more work in the backend.
This patch provides a stop-gap solution until then. If the feedback is to tackle this by filtering comments for specific DB objects, an argument name (e.g. --no-extension-comments or something similar) would help, and I could submit a revised patch.
Alternatively, if there are no objections, I could submit this to the Commitfest.
References:
-
robins
Greetings,

* Robins Tharakan (tharakan@gmail.com) wrote:
> Attached is a patch adds a --no-comments argument to pg_dump to skip
> generation of COMMENT statements when generating a backup. This is crucial
> for non-superusers to restore a database backup in a Single Transaction.
> Currently, this requires one to remove COMMENTs via scripts, which is
> inelegant at best.

Having --no-comments seems generally useful to me, in any case.

> A similar discussion had taken place earlier [1], however that stopped
> (with a Todo entry) since it required more work at the backend.

Well, that was one possible solution (COMMENT IF NOT EXISTS). That does
seem like it'd be nice too, though what I think we really want here is
something like:

CREATE EXTENSION IF NOT EXISTS plpgsql ... COMMENT blah;

That general capability has been asked for and discussed before, but
it's complicated because we support comments on lots of objects, and it
doesn't address other possible issues with this approach (comments
aren't the only things that could exist on plpgsql, and I can't see us
having a way to get every component included in a single command...).

That leads to an interesting thought which we could consider, which
would be represented in some crazy syntax such as:

CREATE EXTENSION IF NOT EXISTS plpgsql ...
(
  COMMENT xyz;
  GRANT USAGE ON EXTENSION plpgsql whatever;
);

> This patch provides a stop-gap solution until then. If the feedback is to
> tackle this is by filtering comments for specific DB objects, an argument
> name (for e.g. --no-extension-comments or something) would help and I could
> submit a revised patch.

That seems like it might be a bit too specific.

> Alternatively, if there are no objections, I could submit this to the
> Commitfest.

Yes, please add it to the commitfest.

Thanks!

Stephen
On Fri, May 26, 2017 at 7:47 AM, Stephen Frost <sfrost@snowman.net> wrote:
> Greetings,
>
> * Robins Tharakan (tharakan@gmail.com) wrote:
> > Attached is a patch adds a --no-comments argument to pg_dump to skip
> > generation of COMMENT statements when generating a backup. This is crucial
> > for non-superusers to restore a database backup in a Single Transaction.
> > Currently, this requires one to remove COMMENTs via scripts, which is
> > inelegant at best.
>
> Having --no-comments seems generally useful to me, in any case.

It smacks of being excessive to me.

> CREATE EXTENSION IF NOT EXISTS plpgsql ... COMMENT blah;
A less verbose way to add comments to objects would be nice but we have an immediate problem that we either need to solve or document a best practice for.
COMMENT IF NOT EXISTS has been brought up but it doesn't actually map to what seems to me is the underlying problem...that people don't want a non-functional (usually...) aspect preventing successful restoration.
COMMENT ON object TRY 'text' -- i.e., replace the word IS with TRY
If the attempt to comment fails for any reason, log a warning (maybe), but otherwise ignore the result and continue on without raising an error.
One suggestion I've seen is to simply "COMMENT ON EXTENSION plpgsql IS NULL;" If that works in the scenarios people are currently dealing with, then I'd say we should advise that such an action be taken by those who wish to generate dumps that can be loaded by non-superusers. If the affected users cannot make that work, then maybe we should just remove the comment from the extension. People can look up "plpgsql" in the docs easily enough, and "PL/pgSQL procedural language" doesn't do anything more than expand the acronym.
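The alternatives discussed above, side by side (note that the TRY form is only a proposal made in this thread, not implemented syntax):

```sql
-- What pg_dump emits today; fails during restore for non-superusers:
COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';

-- Suggested workaround: clear the comment once, so dumps stop emitting it:
COMMENT ON EXTENSION plpgsql IS NULL;

-- Proposed (hypothetical) error-tolerant variant from this message:
COMMENT ON EXTENSION plpgsql TRY 'PL/pgSQL procedural language';
```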
David J.
David,

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Fri, May 26, 2017 at 7:47 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Robins Tharakan (tharakan@gmail.com) wrote:
> > > Attached is a patch adds a --no-comments argument to pg_dump to skip
> > > generation of COMMENT statements when generating a backup. This is crucial
> > > for non-superusers to restore a database backup in a Single Transaction.
> > > Currently, this requires one to remove COMMENTs via scripts, which is
> > > inelegant at best.
> >
> > Having --no-comments seems generally useful to me, in any case.
>
> It smacks of being excessive to me.

I have a hard time having an issue with an option that exists to exclude
a particular type of object from being in the dump.  A user who never
uses large objects/blobs might feel the same way about "--no-blobs", or
a user who never uses ACLs wondering why we have a "--no-privileges".
In the end, these are also possible components that users may wish to
have for their own reasons.

What I, certainly, agree isn't ideal is requiring users to use such an
option to generate a database-level dump file (assuming they have access
to more-or-less all database objects) which can be restored as a
non-superuser; that's why I was a bit hesitant about this particular
solution to the overall problem.

I do agree that if there is simply no use-case, ever, for wishing to
strip comments from a database then it might be excessive to provide
such an option, but I tend to feel that there is a reasonable, general,
use-case for the option.

> > CREATE EXTENSION IF NOT EXISTS plpgsql ... COMMENT blah;
>
> A less verbose way to add comments to objects would be nice but we have an
> immediate problem that we either need to solve or document a best practice
> for.

The above would be a solution to the immediate problem in as much as
adding COMMENT IF NOT EXISTS would be.

> COMMENT IF NOT EXISTS has been brought up but it doesn't actually map to
> what seems to me is the underlying problem...that people don't want a
> non-functional (usually...) aspect preventing successful restoration.

I tend to disagree with this characterization- I'm of the opinion that
people are, rightly, confused as to why we bother to try and add a
COMMENT to an object which we didn't actually end up creating (as it
already existed), and then throw an error on it to boot.

Were pg_dump a bit more intelligent of an application, it would realize
that once the CREATE ... IF NOT EXISTS returned a notice saying "this
thing already existed" it shouldn't try to adjust the attributes of
that object, as it was already existing.  That, however, would preclude
the ability of pg_dump to produce a text file used for restore, unless
we started to write into that text file DO blocks, which I doubt would
go over very well.

> COMMENT ON object TRY 'text' -- i.e., replace the word IS with TRY

I'm not sure that I see why we should invent wholly new syntax for
something which we already have in IF NOT EXISTS, or why this should
really be viewed or considered any differently from the IF NOT EXISTS
syntax.  Perhaps you could elaborate as to how this is different from
IF NOT EXISTS?

> If the attempt to comment fails for any reason log a warning (maybe) but
> otherwise ignore the result and continue on without invoking an error.

In the IF NOT EXISTS case we log a NOTICE, which seems like it's what
would be appropriate here also, again begging the question of if this is
really different from IF NOT EXISTS in a meaningful way.

> One suggestion I've seen is to simply "COMMENT ON EXTENSION plpgsql IS
> NULL;"  If that works in the scenarios people are currently dealing with
> then I'd say we should advise that such an action be taken for those whom
> wish to generate dumps that can be loaded by non-super-users.  If the
> affected users cannot make that work then maybe we should just remove the
> comment from the extension.  People can lookup "plpgsql" in the docs easily
> enough and "PL/pgSQL procedural language" doesn't do anything more than
> expand the acronym.

Removing the comment as a way to deal with our deficiency in this area
strikes me as akin to adding planner hints.

Thanks!

Stephen
Stephen,

On Tue, May 30, 2017 at 8:41 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * David G. Johnston (david.g.johnston@gmail.com) wrote:
> > On Fri, May 26, 2017 at 7:47 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > > * Robins Tharakan (tharakan@gmail.com) wrote:
> > > > Attached is a patch adds a --no-comments argument to pg_dump to skip
> > > > generation of COMMENT statements when generating a backup. This is crucial
> > > > for non-superusers to restore a database backup in a Single Transaction.
> > > > Currently, this requires one to remove COMMENTs via scripts, which is
> > > > inelegant at best.
> > >
> > > Having --no-comments seems generally useful to me, in any case.
> >
> > It smacks of being excessive to me.
> I have a hard time having an issue with an option that exists to exclude
> a particular type of object from being in the dump.
Excessive with respect to the problem at hand. A single comment in the dump cannot be restored, and because of that we are going to require people to omit every comment in their database. The loss of all that information is in excess of what is required to solve the stated problem, which is how I was thinking of "excessive".
I agree that the general idea of allowing users to choose to include or exclude comments is worth discussion along the same lines of large objects and privileges.
> > CREATE EXTENSION IF NOT EXISTS plpgsql ... COMMENT blah;
>
> A less verbose way to add comments to objects would be nice but we have an
> immediate problem that we either need to solve or document a best practice
> for.
> The above would be a solution to the immediate problem in as much as
> adding COMMENT IF NOT EXISTS would be.
I believe it would take a lot longer, possibly even until 12, to get the linked comment feature committed compared to committing COMMENT IF NOT EXISTS or some variation (or putting in a hack for that matter).
> COMMENT IF NOT EXISTS has been brought up but it doesn't actually map to
> what seems to me is the underlying problem...that people don't want a
> non-functional (usually...) aspect preventing successful restoration.
> I tend to disagree with this characterization- I'm of the opinion that
> people are, rightly, confused as to why we bother to try and add a
> COMMENT to an object which we didn't actually end up creating (as it
> already existed), and then throw an error on it to boot.
My characterization does actually describe the current system though. While I won't doubt that people do hold your belief, it is an underlying misunderstanding of how PostgreSQL works, since comments aren't, as you say below, actual attributes but rather objects in their own right. I would love to have someone solve the systemic problem here. But the actual complaint can probably be addressed without it.
> Were pg_dump a
> bit more intelligent of an application, it would realize that once the
> CREATE ... IF NOT EXISTS returned a notice saying "this thing already
> existed" that it would realize that it shouldn't try to adjust the
> attributes of that object, as it was already existing.
pg_dump isn't in play during the restore which is where the error occurs.
During restore you have pg_restore+psql - and having cross-statement logic in those applications is likely a non-starter. That is ultimately the problem here, and it is indeed fixed by the outstanding proposal of embedding COMMENT within the CREATE/ALTER object commands. But today, comments are independent objects, and solving the problem within that domain will probably prove easier than changing how the system treats comments.
> COMMENT ON object TRY 'text' -- i.e., replace the word IS with TRY
> Perhaps you could elaborate as to how this is different from IF NOT
> EXISTS?
If the comment on plpgsql were removed for some reason the COMMENT ON IF NOT EXISTS would fire and then it would fail due to permissions. With "TRY" whether the comment (or object for that matter) exists or not the new comment would be attempted and if the permission failure kicked in it wouldn't care.
> > If the affected users cannot make that work then maybe we should just
> > remove the comment from the extension.
>
> Removing the comment as a way to deal with our deficiency in this area
> strikes me as akin to adding planner hints.
Maybe, but the proposal you are supporting has been around for years, with people generally in favor of it, and hasn't happened yet. At some point I'd rather hold my nose and fix the problem myself than wait for the professional to arrive and do it right. And, hey, we've had multiple planner hints since 8.4 ;)
David J.
David,

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Tue, May 30, 2017 at 8:41 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * David G. Johnston (david.g.johnston@gmail.com) wrote:
> > > On Fri, May 26, 2017 at 7:47 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > > > * Robins Tharakan (tharakan@gmail.com) wrote:
> > > > > Attached is a patch adds a --no-comments argument to pg_dump to skip
> > > > > generation of COMMENT statements when generating a backup. This is
> > > > > crucial for non-superusers to restore a database backup in a Single
> > > > > Transaction. Currently, this requires one to remove COMMENTs via
> > > > > scripts, which is inelegant at best.
> > > >
> > > > Having --no-comments seems generally useful to me, in any case.
> > >
> > > It smacks of being excessive to me.
> >
> > I have a hard time having an issue with an option that exists to exclude
> > a particular type of object from being in the dump.
>
> Excessive with respect to the problem at hand.

Fair enough.  I tend to agree with that sentiment.

> A single comment in the
> dump is unable to be restored.  Because of that we are going to require
> people to omit every comment in their database.  The loss of all that
> information is in excess of what is required to solve the stated problem
> which is how I was thinking of excessive.

That would be most unfortunate, I agree.

> I agree that the general idea of allowing users to choose to include or
> exclude comments is worth discussion along the same lines of large objects
> and privileges.

Good, then we can at least consider this particular feature as being one
we're generally interested in and move forward with it, even if we also,
perhaps, come up with a better solution to the specific issue at hand.

> > > > CREATE EXTENSION IF NOT EXISTS plpgsql ... COMMENT blah;
> > >
> > > A less verbose way to add comments to objects would be nice but we have
> > > an immediate problem that we either need to solve or document a best
> > > practice for.
> >
> > The above would be a solution to the immediate problem in as much as
> > adding COMMENT IF NOT EXISTS would be.
>
> I believe it would take a lot longer, possibly even until 12, to get the
> linked comment feature committed compared to committing COMMENT IF NOT
> EXISTS or some variation (or putting in a hack for that matter).

Perhaps, but I'm not convinced that such speculation really helps move
us forward.

> > > COMMENT IF NOT EXISTS has been brought up but it doesn't actually map to
> > > what seems to me is the underlying problem...that people don't want a
> > > non-functional (usually...) aspect preventing successful restoration.
> >
> > I tend to disagree with this characterization- I'm of the opinion that
> > people are, rightly, confused as to why we bother to try and add a
> > COMMENT to an object which we didn't actually end up creating (as it
> > already existed), and then throw an error on it to boot.
>
> My characterization does actually describe the current system though.

I'm not entirely sure what I was getting at above, to be honest, but I
believe we're generally on the same page here.  I certainly agree that
users don't expect a pg_dump to not be able to be successfully restored.
What I may have been getting at is simply that it's not about a lack of
COMMENT IF NOT EXISTS, in an ideal world, but perhaps that's what we
need to solve this particular issue, for now.

> While I won't doubt that people do hold your belief it is an underlying
> mis-understanding as to how PostgreSQL works since comments aren't, as you
> say below, actual attributes but rather objects in their own right.  I
> would love to have someone solve the systemic problem here.  But the actual
> complaint can probably be addressed without it.

Right, I tend to follow your line of thought here.

> > Were pg_dump a bit more intelligent of an application, it would realize
> > that once the CREATE ... IF NOT EXISTS returned a notice saying "this
> > thing already existed" that it would realize that it shouldn't try to
> > adjust the attributes of that object, as it was already existing.
>
> pg_dump isn't in play during the restore which is where the error occurs.

Ah, but pg_dump could potentially dump out more complicated logic than
it does today.  We currently presume that there is never any need to
reason about the results of a prior command before executing the next in
pg_dump's output.  In some ways, it's rather impressive that we've
gotten this far with that assumption, but ensuring that is the case
means that our users are also able to rely on that and write simple
scripts which can be rerun to reset the database to a specific state.

> During restore you have pg_restore+psql - and having cross-statement logic
> in those applications is likely a non-starter.

It would certainly be a very large shift from what we generate today. :)

> That is ultimately the
> problem here, and which is indeed fixed by the outstanding proposal of
> embedding COMMENT within the CREATE/ALTER object commands.  But today,
> comments are independent objects and solving the problem within that domain
> will probably prove easier than changing how the system treats comments.

I agree that adding COMMENT IF NOT EXISTS (or perhaps COMMENT ... TRY)
would be simpler than changing the syntax for every command to support a
COMMENT attribute being included.  The issue here, however, is what I
mentioned previously- COMMENTs aren't the only attributes of an object
and if we really want to support "CREATE OBJECT + ATTRIBUTES IF NOT
EXISTS, otherwise do nothing" then that approach simply doesn't work,
without heavily modifying our syntax to allow much greater flexibility
than we've ever had before.  That's why I was suggesting that we have a
mechanism for passing a set of sub-commands to be performed on an object
*if* we end up creating it, on the basis of CREATE ... IF NOT EXISTS.

> > > COMMENT ON object TRY 'text' -- i.e., replace the word IS with TRY
> >
> > Perhaps you could elaborate as to how this is different from IF NOT
> > EXISTS?
>
> If the comment on plpgsql were removed for some reason the COMMENT ON IF
> NOT EXISTS would fire and then it would fail due to permissions.  With
> "TRY" whether the comment (or object for that matter) exists or not the new
> comment would be attempted and if the permission failure kicked in it
> wouldn't care.

Ah, I see that distinction.  I'm wondering how it might relate to other
attributes which an object might have and if we need to have similar
options for each (or perhaps we do?  I've not looked, but I don't recall
anything similar offhand).

> > > If the affected users cannot make that work then maybe we should just
> > > remove the comment from the extension.
> >
> > Removing the comment as a way to deal with our deficiency in this area
> > strikes me as akin to adding planner hints.
>
> Maybe, but the proposal you are supporting has been around for years, with
> people generally in favor of it, and hasn't happened yet.  At some point
> I'd rather hold my nose and fix the problem myself than wait for the
> professional to arrive and do it right.  Any, hey, we've had multiple
> planner hints since 8.4 ;)

That's a fair point, and might allow a quick-fix (I seriously doubt
anyone would really miss the comment we have on plpgsql today), but I'm
going to continue to push back a bit on this as I'd really like a better
solution, even if it's a bit more work.  If such a solution doesn't
materialize before the last CF for PG 11, then I will support a motion
to simply remove the comment.

Thanks!

Stephen
On Tue, May 30, 2017 at 8:55 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
>> Having --no-comments seems generally useful to me, in any case.
>
> It smacks of being excessive to me.

It sounds perfectly sensible to me.  It's not exactly an elegant
solution to the original problem, but it's a reasonable switch on its
own merits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, May 30, 2017 at 8:55 PM, David G. Johnston
> <david.g.johnston@gmail.com> wrote:
>>> Having --no-comments seems generally useful to me, in any case.
>
>> It smacks of being excessive to me.
>
> It sounds perfectly sensible to me.  It's not exactly an elegant
> solution to the original problem, but it's a reasonable switch on its
> own merits.

I dunno.  What's the actual use-case, other than as a bad workaround
to a problem we should fix a different way?

			regards, tom lane
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Tue, May 30, 2017 at 8:55 PM, David G. Johnston
> > <david.g.johnston@gmail.com> wrote:
> >>> Having --no-comments seems generally useful to me, in any case.
> >> It smacks of being excessive to me.
> > It sounds perfectly sensible to me.  It's not exactly an elegant
> > solution to the original problem, but it's a reasonable switch on its
> > own merits.
>
> I dunno.  What's the actual use-case, other than as a bad workaround
> to a problem we should fix a different way?

Perhaps it's a bit of a stretch, I'll admit, but certainly
"minimization" and "obfuscation" come to mind, which are often done in
other fields and might well apply in very specific cases to PG schemas.
I can certainly also see a case being made that you'd like to extract a
schema-only dump which doesn't include any comments because the comments
have information that you'd rather not share publicly, while the schema
itself is fine to share.  Again, a bit of a stretch, but not
unreasonable.

Otherwise, well, for my 2c anyway, it feels like it's simply fleshing
out the options which correspond to the different components of an
object.  We provide similar for ACLs, security labels, and tablespace
association.  If there are other components of an object which we should
consider adding an option to exclude, I'm all ears, personally
(indexes?).

Also, with the changes that I've made to pg_dump, I'm hopeful that such
options will end up requiring a very minor amount of code to implement.
There's more work to be done in that area too, certainly, but I do feel
like it's better than it was.  I definitely would like to see more
flexibility in this area in general.

Thanks!

Stephen
On Thu, Jun 1, 2017 at 10:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I dunno.  What's the actual use-case, other than as a bad workaround
> to a problem we should fix a different way?

Well, that's a fair point.  I don't have a specific use case in mind.
However, I also don't think that options for controlling what gets
dumped should be subjected to extreme vetting.  On the strength mostly
of my own experiences trying to solve database problems in the real
world, I generally think that pg_dump could benefit from significantly
more options to control what gets dumped.  The existing options are
pretty good, but it's not that hard to run into a situation that they
don't cover.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
The following review has been posted through the commitfest application:

make installcheck-world: not tested
Implements feature:      tested, passed
Spec compliant:          not tested
Documentation:           tested, passed

It's a very simple change and I have nothing to complain about in the
source and documentation changes.  But I wonder about the lack of TAP
tests for this new pg_dump behavior.  Shouldn't we add TAP tests?

The new status of this patch is: Waiting on Author
On Thu, Jun 01, 2017 at 10:05:09PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Tue, May 30, 2017 at 8:55 PM, David G. Johnston
> > <david.g.johnston@gmail.com> wrote:
> >>> Having --no-comments seems generally useful to me, in any case.
> >> It smacks of being excessive to me.
> > It sounds perfectly sensible to me.  It's not exactly an elegant
> > solution to the original problem, but it's a reasonable switch on
> > its own merits.
>
> I dunno.  What's the actual use-case, other than as a bad workaround
> to a problem we should fix a different way?

The one I run into frequently is in a proprietary fork, RDS Postgres.
It'll happily dump out

    COMMENT ON EXTENSION plpgsql IS ...

which is great as far as it goes, but errors out when you try to
reload it.

While bending over backwards to support proprietary forks strikes me
as a terrible idea, I'd like to enable pg_dump to produce and consume
ToCs just as pg_restore does with its -l/-L options.  This would
provide the finest possible grain.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
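For archive-format dumps, the ToC-editing workflow mentioned here already exists on the pg_restore side; the discussion is about extending similar control to pg_dump's plain-text output. A hedged sketch of the existing pg_restore -l/-L workaround (the ToC lines below are fabricated approximations of what `pg_restore -l` prints, and the filenames are illustrative):

```shell
# Simulated table-of-contents lines, roughly as pg_restore -l emits them.
# In real use you would generate this with: pg_restore -l mydb.dump > mydb.toc
cat > mydb.toc <<'EOF'
2; 3079 16385 EXTENSION - plpgsql
3; 0 0 COMMENT - EXTENSION plpgsql
4; 1259 16386 TABLE public t postgres
EOF

# Drop the COMMENT entries from the list...
grep -v ' COMMENT ' mydb.toc > mydb-nocomments.toc
cat mydb-nocomments.toc

# ...then pg_restore replays only what's left in the edited list:
#   pg_restore -L mydb-nocomments.toc -d newdb mydb.dump
```

Each ToC line names its entry type, so filtering by type is a one-liner; the drawback is that this only works for custom/directory-format archives, not plain-text dumps.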
On Tue, Jul 18, 2017 at 3:45 AM, David Fetter <david@fetter.org> wrote:
> The one I run into frequently is in a proprietary fork, RDS Postgres.
> It'll happily dump out COMMENT ON EXTENSION plpgsql IS ...
> which is great as far as it goes, but errors out when you try to
> reload it.
>
> While bending over backwards to support proprietary forks strikes me
> as a terrible idea, I'd like to enable pg_dump to produce and consume
> ToCs just as pg_restore does with its -l/-L options.  This would
> provide the finest possible grain.

Let's add as well a similar switch to pg_dumpall that gets pushed down
to all the created pg_dump commands.  If this patch gets integrated
as-is this is going to be asked.  And tests would be welcome.

-- 
Michael
On Tue, Jul 18, 2017 at 08:38:25AM +0200, Michael Paquier wrote:
> Let's add as well a similar switch to pg_dumpall that gets pushed down
> to all the created pg_dump commands.  If this patch gets integrated
> as-is this is going to be asked.  And tests would be welcome.

Excellent point about pg_dumpall.  I'll see what I can draft up in the
next day or two and report back.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 18 July 2017 at 23:55, David Fetter <david@fetter.org> wrote:
> Excellent point about pg_dumpall.  I'll see what I can draft up in the
> next day or two and report back.
Hi David,
You may want to consider this patch (attached) which additionally has the pg_dumpall changes.
It would be great if you could help with the tests though; I am unsure how to go about them.
-
robins
From: Fabrízio de Royes Mello
On Wed, Jul 19, 2017 at 3:54 PM, Robins Tharakan <tharakan@gmail.com> wrote:
>
>
> On 18 July 2017 at 23:55, David Fetter <david@fetter.org> wrote:
>>
>> Excellent point about pg_dumpall. I'll see what I can draft up in the
>> next day or two and report back.
>
>
>
> Hi David,
>
> You may want to consider this patch (attached) which additionally has the pg_dumpall changes.
> It would be great if you could help with the tests though, am unsure how to go about them.
>
You should also add the proper SGML docs for this pg_dumpall change.
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
On Wed, Jul 19, 2017 at 8:59 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
> On Wed, Jul 19, 2017 at 3:54 PM, Robins Tharakan <tharakan@gmail.com> wrote:
> > You may want to consider this patch (attached) which additionally has the
> > pg_dumpall changes.
> > It would be great if you could help with the tests though, am unsure how
> > to go about them.
>
> You should add the properly sgml docs for this pg_dumpall change also.

Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in
extensions are in src/test/modules/test_pg_dump, but you just care
about the former with this patch.  And if you implement some new tests,
look at the other tests and base your work on that.

-- 
Michael
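A TAP test for this option might be sketched roughly as follows: a hypothetical entry modeled on the %tests hash used by src/bin/pg_dump/t/002_pg_dump.pl, where run labels such as `clean` and `no_comments` and the exact key layout are assumptions for illustration, not the committed test:

```perl
# Hypothetical %tests entry: assert that a COMMENT statement appears in an
# ordinary dump run but is absent from a run invoked with --no-comments.
'COMMENT ON EXTENSION plpgsql' => {
    regexp => qr/^COMMENT ON EXTENSION plpgsql IS .*;/m,
    like   => { clean => 1, },          # expected in a default dump
    unlike => { no_comments => 1, },    # must not appear with --no-comments
},
```

The framework runs pg_dump once per configured run and matches each entry's regexp against the output, so one entry covers both the positive and negative cases.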
On 20 July 2017 at 05:08, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Jul 19, 2017 at 8:59 PM, Fabrízio de Royes Mello wrote:
> > You should add the properly sgml docs for this pg_dumpall change also.
>
> Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in
> extensions are in src/test/modules/test_pg_dump, but you just care
> about the former with this patch.  And if you implement some new tests,
> look at the other tests and base your work on that.
Thanks Michael / Fabrízio.

Updated patch (attached) additionally adds SGML changes for pg_dumpall.
(I'll try to work on the tests, but sending this nonetheless).
-
robins
On 20 July 2017 at 05:14, Robins Tharakan <tharakan@gmail.com> wrote:
> On 20 July 2017 at 05:08, Michael Paquier <michael.paquier@gmail.com> wrote:
> > On Wed, Jul 19, 2017 at 8:59 PM, Fabrízio de Royes Mello wrote:
> > > You should add the properly sgml docs for this pg_dumpall change also.
> >
> > Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in
> > extensions are in src/test/modules/test_pg_dump, but you just care
> > about the former with this patch. And if you implement some new tests,
> > look at the other tests and base your work on that.
>
> Thanks Michael / Fabrízio.
>
> Updated patch (attached) additionally adds SGML changes for pg_dumpall.
> (I'll try to work on the tests, but sending this nonetheless).
Attached is an updated patch (v4) with basic tests for pg_dump / pg_dumpall.
(Have zipped it since patch size jumped to ~40kb).
-
robins
From: Fabrízio de Royes Mello
On Mon, Aug 7, 2017 at 10:43 AM, Robins Tharakan <tharakan@gmail.com> wrote:
> Attached is an updated patch (v4) with basic tests for pg_dump / pg_dumpall.
> (Have zipped it since patch size jumped to ~40kb.)

The patch applies cleanly to current master and all tests run without
failures.

I also tested against all currently supported versions (9.2 ... 9.6)
and didn't find any issue.

Changed status to "ready for committer".
Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog: http://fabriziomello.github.io
Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Github: http://github.com/fabriziomello
On 7 August 2017 at 16:14, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:
> The patch applies cleanly to current master and all tests run without
> failures.
>
> I also tested against all currently supported versions (9.2 ... 9.6)
> and didn't find any issue.
>
> Changed status to "ready for committer".

I get the problem, but not this solution. It's too specific and of
zero other value, yet not even exactly specific to the issue. We
definitely don't want --no-extension-comments, but --no-comments
removes ALL comments just to solve a weird problem. (Meta)Data loss,
surely?

Thinking ahead, are we going to add a new --no-objecttype switch every
time someone wants it?

It would make more sense to add something more general and extensible,
such as --exclude-objects=comment or a similar name.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Aug 21, 2017 at 5:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I get the problem, but not this solution. It's too specific and of
> zero other value, yet not even exactly specific to the issue. We
> definitely don't want --no-extension-comments, but --no-comments
> removes ALL comments just to solve a weird problem. (Meta)Data loss,
> surely?
It was argued, successfully IMO, to have this capability independent of any particular problem to be solved. In that context I haven't given the proposed implementation much thought.
I do agree that this is a very unappealing solution for the stated problem and am hoping someone will either have an ingenious idea to solve it the right way or is willing to hold their nose and put in a hack.
David J.
On Mon, Aug 21, 2017 at 5:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Thinking ahead, are we going to add a new --no-objecttype switch every
> time someone wants it?

I'd personally be fine with --no-whatever for any whatever that might
be a subsidiary property of database objects.  We've got
--no-security-labels, --no-tablespaces, --no-owner, and
--no-privileges already, so what's wrong with --no-comments?

(We've also got --no-publications; I think it's arguable whether that
is the same kind of thing.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Sep 2, 2017 at 1:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I'd personally be fine with --no-whatever for any whatever that might
> be a subsidiary property of database objects.  We've got
> --no-security-labels, --no-tablespaces, --no-owner, and
> --no-privileges already, so what's wrong with --no-comments?
>
> (We've also got --no-publications; I think it's arguable whether that
> is the same kind of thing.)

And --no-subscriptions in the same bucket.

--
Michael
On 1 September 2017 at 22:08, Michael Paquier <michael.paquier@gmail.com> wrote:
> And --no-subscriptions in the same bucket.

Yes, it is. I was suggesting that we remove those as well.

But back to the main point, which is that --no-comments discards ALL
comments simply to exclude one pointless and annoying comment. That
runs counter to our stance that we do not allow silent data loss. I
want to solve the problem too. I accept that not everyone uses
comments, but if they do, spilling them all on the floor is a user
visible slip up that we should not be encouraging. Sorry y'all.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Sep 6, 2017 at 12:26 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Yes, it is. I was suggesting that we remove those as well.

That seems like a non-starter to me.  I have used those options many
times to solve real problems, and I'm sure other people have as well.
We wouldn't have ended up with all of these options if users didn't
want to control such things.

> But back to the main point, which is that --no-comments discards ALL
> comments simply to exclude one pointless and annoying comment. That
> runs counter to our stance that we do not allow silent data loss. I
> want to solve the problem too. I accept that not everyone uses
> comments, but if they do, spilling them all on the floor is a user
> visible slip up that we should not be encouraging. Sorry y'all.

/me shrugs.  I agree that there could be better solutions to the
original problem, but I think there's no real argument that a user
can't exclude comments from a backup if they wish to do so.  As the OP
already pointed out, it can still be done without the switch; it's
just more annoying.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 7, 2017 at 1:43 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 6, 2017 at 12:26 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Yes, it is. I was suggesting that we remove those as well.

FWIW, I do too.  They are useful for given application code paths.

> That seems like a non-starter to me.  I have used those options many
> times to solve real problems, and I'm sure other people have as well.
> We wouldn't have ended up with all of these options if users didn't
> want to control such things.

As there begins to be many switches of this kind and much code
duplication, I think that some refactoring into a more generic switch
infrastructure would be nicer.

--
Michael
Michael,

* Michael Paquier (michael.paquier@gmail.com) wrote:
> As there begins to be many switches of this kind and much code
> duplication, I think that some refactoring into a more generic switch
> infrastructure would be nicer.

I have been thinking about this also and agree that it would be nicer,
but then these options would just become "shorthand" for the generic
switch.

Thanks!

Stephen
On Sun, Sep 10, 2017 at 6:25 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Michael Paquier (michael.paquier@gmail.com) wrote:
>> As there begins to be many switches of this kind and much code
>> duplication, I think that some refactoring into a more generic switch
>> infrastructure would be nicer.
>
> I have been thinking about this also and agree that it would be nicer,
> but then these options would just become "shorthand" for the generic
> switch.

I don't really like the "generic switch infrastructure" concept.  I
think it will make specifying behaviors harder without any
corresponding benefit.  There are quite a few --no-xxx switches now,
but the total number of them that we could conceivably end up with
doesn't seem to be a lot bigger than what we have already.

Now, if we want switches to exclude a certain kind of object (e.g.
table, function, text search configuration) from the backup
altogether, that should be done in some generic way, like
--exclude-object-type=table.  But that's not what this is about.
This is about excluding a certain kind of property (comments, in this
case) from being backed up.  And an individual kind of object doesn't
have many more properties than what we already handle.

So I think this is just an excuse for turning --no-security-labels
into --no-object-property=security-label.  To me, that's just plain
worse.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 11, 2017 at 11:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> So I think this is just an excuse for turning --no-security-labels
> into --no-object-property=security-label.  To me, that's just plain
> worse.

It does not seem that my thoughts here have been correctly transmitted
to your brain.  I do not mean to change the user-facing options, just
to refactor the code internally so that --no-foo switches can be more
easily generated, added and handled, as they are associated with an
object type.  A portion of the complaints is caused by the fact that a
lot of similar code is duplicated.

--
Michael
On Mon, Sep 11, 2017 at 6:41 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> It does not seem that my thoughts here have been correctly transmitted
> to your brain.

Dang, we really should have put more work into that inter-brain link.
PostgreSQL group mind FTW!

> I do not mean to change the user-facing options, just to refactor the
> code internally so that --no-foo switches can be more easily
> generated, added and handled, as they are associated with an object
> type.  A portion of the complaints is caused by the fact that a lot
> of similar code is duplicated.

Ah, well.  No objection to refactoring away duplicate code, of course.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Aug 7, 2017 at 11:14 AM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:
> I also tested against all currently supported versions (9.2 ... 9.6)
> and didn't find any issue.
>
> Changed status to "ready for committer".

On a very fast read this patch looks OK to me, but I'm a bit concerned
about whether we have consensus for it.  By my count the vote is 6-3
in favor of proceeding.

+1: Robins Tharakan, Stephen Frost, David Fetter, Fabrizio Mello,
Michael Paquier, Robert Haas
-1: David G. Johnston, Tom Lane, Simon Riggs

I guess that's probably sufficient to go forward, but does anyone wish
to dispute my characterization of their vote or the vote tally in
general?  Would anyone else like to cast a vote?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert, all,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On a very fast read this patch looks OK to me, but I'm a bit concerned
> about whether we have consensus for it.  By my count the vote is 6-3
> in favor of proceeding.
>
> +1: Robins Tharakan, Stephen Frost, David Fetter, Fabrizio Mello,
> Michael Paquier, Robert Haas
> -1: David G. Johnston, Tom Lane, Simon Riggs
>
> I guess that's probably sufficient to go forward, but does anyone wish
> to dispute my characterization of their vote or the vote tally in
> general?  Would anyone else like to cast a vote?

This patch continues to drag on, so I'll try to spur it a bit.

David G. Johnston, you are listed as a '-1' on this, but your last
comment on this thread was
CAKFQuwZuYhh+KUzp76cdB4r8Z616ZwEWcHEEDq6vEqgdGP9QZw@mail.gmail.com,
where you commented:

> It was argued, successfully IMO, to have this capability independent of
> any particular problem to be solved. In that context I haven't given the
> proposed implementation much thought.
>
> I do agree that this is a very unappealing solution for the stated problem
> and am hoping someone will either have an ingenious idea to solve it the
> right way or is willing to hold their nose and put in a hack.

Which didn't strike me as really being against this patch, but rather a
complaint that making use of this feature to "fix" the non-superuser
single-transaction restore was a hack, which is somewhat independent.
We have lots of features that people use in various very hacky ways,
but that doesn't mean those are bad features for us to have.

Simon, you are also a '-1' on this, but your last comment also makes me
think you're unhappy with this feature being used in a hacky way but
perhaps aren't against the feature itself:

> But back to the main point, which is that --no-comments discards ALL
> comments simply to exclude one pointless and annoying comment. That
> runs counter to our stance that we do not allow silent data loss. I
> want to solve the problem too. I accept that not everyone uses
> comments, but if they do, spilling them all on the floor is a user
> visible slip up that we should not be encouraging. Sorry y'all.

I really can't see accepting this as a data-loss issue when this
feature is used only when requested by a user (it would *not* be
enabled by default; I'd be against that too).  We have other options
that would also be considered data loss with this viewpoint, but I
can't imagine we would be willing to remove them (e.g.
--no-privileges, or perhaps more clearly --no-blobs, which was only
recently added).

Tom, you're also listed as a '-1' on this, but your last comment on
this thread was 16858.1496369109@sss.pgh.pa.us, where you asked:

> I dunno.  What's the actual use-case, other than as a bad workaround
> to a problem we should fix a different way?

which was answered by both Robert and me; Robert's reply being mainly
that it's likely to prove useful in the field even if we don't have a
specific use-case today, while I outlined at least feasible use-cases
(minimization, obfuscation).  For my 2c, I don't know that a large
use-case is needed for this particular feature to begin with.

In the end, I feel like this patch has actually been through the ringer
and back because it was brought up in the context of solving a problem
that we'd all like to fix in a better way.  Had it been simply dropped
on us as a "I'd like to not have comments in my pg_dump output and
here's a patch to allow me to do that", I suspect it would have been
committed without anywhere near this level of effort.

I'll just reiterate Robert's ask for people to clarify their positions.
If we're not moving forward, then the next step is to mark the patch as
Rejected.

Thanks!

Stephen
On Monday, January 22, 2018, Stephen Frost <sfrost@snowman.net> wrote:
> In the end, I feel like this patch has actually been through the ringer
> and back because it was brought up in the context of solving a problem
> that we'd all like to fix in a better way. Had it been simply dropped
> on us as a "I'd like to not have comments in my pg_dump output and
> here's a patch to allow me to do that" I suspect it would have been
> committed without anywhere near this level of effort.

+0; but recognizing our users are going to remain annoyed by the
existing problem, and that telling them that this is the answer will
not sit well.

David J.
> On Monday, January 22, 2018, Stephen Frost <sfrost@snowman.net> wrote:
>> In the end, I feel like this patch has actually been through the ringer
>> and back because it was brought up in the context of solving a problem
>> that we'd all like to fix in a better way.  Had it been simply dropped
>> on us as a "I'd like to not have comments in my pg_dump output and
>> here's a patch to allow me to do that" I suspect it would have been
>> committed without anywhere near this level of effort.

Indeed ... but it was *not* presented that way, and that's what makes
this somewhat of a difficult call.  You and Robert argued that there
were valid use-cases, but I feel like that was somewhat of an
after-the-fact justification, rather than an actual organic feature
request.

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> +0; but recognizing our users are going to remain annoyed by the
> existing problem and that telling them that this is the answer will
> not sit well.

FWIW, today's pg_dump refactoring got rid of one of the use-cases for
this, namely the COMMENT ON DATABASE aspect.  We got rid of another
aspect (creating/commenting on the public schema) some time ago, via a
method that was admittedly a bit of a hack but got the job done.  What
seems to be left is that given a default installation, pg_dump will
emit a "COMMENT ON EXTENSION plpgsql" that can't be restored by an
unprivileged user ... as well as a "CREATE EXTENSION IF NOT EXISTS
plpgsql", which is an utter kluge.  Maybe we can think of a rule that
will exclude plpgsql from dumps without causing too much havoc.

The most obvious hack is just to exclude PL objects with OIDs below
16384 from the dump.  An issue with that is that it'd lose any
nondefault changes to the ACL for plpgsql, which seems like a problem.
On the other hand, I think the rule we have in place for the public
schema is preventing dumping local adjustments to that, and there have
been no complaints.  (Maybe I'm wrong and the initacl mechanism
handles that case?  If so, seems like we could get it to handle local
changes to plpgsql's ACL as well.)

			regards, tom lane
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Indeed ... but it was *not* presented that way, and that's what makes
> this somewhat of a difficult call.  You and Robert argued that there
> were valid use-cases, but I feel like that was somewhat of an
> after-the-fact justification, rather than an actual organic feature
> request.

This was definitely the kind of follow-on work that I was anticipating
happening with the rework that was done to dump_contains in
a9f0e8e5a2e.  I would think that we'd support enable/disable of all of
the different component types that pg_dump recognizes, and we're most
of the way there at this point.  The comment above the #defines in
pg_dump.h even hints at that: component types of an object which can
be selected for dumping.

> FWIW, today's pg_dump refactoring got rid of one of the use-cases for
> this, namely the COMMENT ON DATABASE aspect.  We got rid of another
> aspect (creating/commenting on the public schema) some time ago, via a
> method that was admittedly a bit of a hack but got the job done.  What
> seems to be left is that given a default installation, pg_dump will
> emit a "COMMENT ON EXTENSION plpgsql" that can't be restored by an
> unprivileged user ... as well as a "CREATE EXTENSION IF NOT EXISTS
> plpgsql", which is an utter kluge.  Maybe we can think of a rule that
> will exclude plpgsql from dumps without causing too much havoc.

Having a generic --exclude-extension=plpgsql might be interesting.
I can certainly recall wishing for such an option when working with
postgis.

> The most obvious hack is just to exclude PL objects with OIDs below
> 16384 from the dump.  An issue with that is that it'd lose any
> nondefault changes to the ACL for plpgsql, which seems like a problem.
> On the other hand, I think the rule we have in place for the public
> schema is preventing dumping local adjustments to that, and there have
> been no complaints.  (Maybe I'm wrong and the initacl mechanism
> handles that case?  If so, seems like we could get it to handle local
> changes to plpgsql's ACL as well.)

Both the public schema and plpgsql's ACLs should be handled properly
(with local changes preserved) through the pg_init_privs system.  I'm
not sure what you're referring to regarding a rule preventing dumping
local adjustments to the public schema; as far as I can recall we've
basically always supported that.  Losing non-default ACLs for plpgsql
seems like a step backwards.

Frankly, the comment on plpgsql is probably one of the most worthless
comments we have anyway: all it does is re-state information you
already know (it's a procedural language called 'plpgsql').  I'd
suggest we could probably survive without it, though that is a long
route to "fixing" this; at least we could tell people it's been fixed
in newer versions and that there's a kludge available until then.

Thanks!

Stephen
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> The most obvious hack is just to exclude PL objects with OIDs below
>> 16384 from the dump.  An issue with that is that it'd lose any
>> nondefault changes to the ACL for plpgsql, which seems like a problem.
>> On the other hand, I think the rule we have in place for the public
>> schema is preventing dumping local adjustments to that, and there have
>> been no complaints.  (Maybe I'm wrong and the initacl mechanism
>> handles that case?  If so, seems like we could get it to handle local
>> changes to plpgsql's ACL as well.)

> Both the public schema and plpgsql's ACLs should be handled properly
> (with local changes preserved) through the pg_init_privs system.  I'm
> not sure what you're referring to regarding a rule preventing dumping
> local adjustments to the public schema, as far as I can recall we've
> basically always supported that.

I went looking and realized that actually what we're interested in
here is the plpgsql extension, not the plpgsql language ... and in
fact the behavior I was thinking of is already there, except for some
reason it's only applied during binary upgrade.  So I think we should
give serious consideration to the attached, which just removes the
binary_upgrade condition (and adjusts nearby comments).

With this, I get empty dumps for the default state of template1 and
postgres, which is what one really would expect.  And testing shows
that if you modify the ACL of language plpgsql, that does still come
through as expected.

(Note: this breaks the parts of the pg_dump regression tests that
expect to see "CREATE LANGUAGE plpgsql".  I've not bothered to update
those, pending the decision on whether to do this.)
			regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d65ea54..87b15a1 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*************** selectDumpableAccessMethod(AccessMethodI
*** 1617,1637 ****
  * selectDumpableExtension: policy-setting subroutine
  *	Mark an extension as to be dumped or not
  *
! * Normally, we dump all extensions, or none of them if include_everything
! * is false (i.e., a --schema or --table switch was given).  However, in
! * binary-upgrade mode it's necessary to skip built-in extensions, since we
  * assume those will already be installed in the target database. We identify
  * such extensions by their having OIDs in the range reserved for initdb.
  */
  static void
  selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt)
  {
  	/*
! 	 * Use DUMP_COMPONENT_ACL for from-initdb extensions, to allow users to
! 	 * change permissions on those objects, if they wish to, and have those
! 	 * changes preserved.
  	 */
! 	if (dopt->binary_upgrade && extinfo->dobj.catId.oid <= (Oid) g_last_builtin_oid)
  		extinfo->dobj.dump = extinfo->dobj.dump_contains = DUMP_COMPONENT_ACL;
  	else
  		extinfo->dobj.dump = extinfo->dobj.dump_contains =
--- 1617,1637 ----
  * selectDumpableExtension: policy-setting subroutine
  *	Mark an extension as to be dumped or not
  *
! * Built-in extensions should be skipped except for checking ACLs, since we
  * assume those will already be installed in the target database. We identify
  * such extensions by their having OIDs in the range reserved for initdb.
+ * We dump all user-added extensions by default, or none of them if
+ * include_everything is false (i.e., a --schema or --table switch was given).
  */
  static void
  selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt)
  {
  	/*
! 	 * Use DUMP_COMPONENT_ACL for built-in extensions, to allow users to
! 	 * change permissions on their member objects, if they wish to, and have
! 	 * those changes preserved.
  	 */
! 	if (extinfo->dobj.catId.oid <= (Oid) g_last_builtin_oid)
  		extinfo->dobj.dump = extinfo->dobj.dump_contains = DUMP_COMPONENT_ACL;
  	else
  		extinfo->dobj.dump = extinfo->dobj.dump_contains =
I wrote:
> I went looking and realized that actually what we're interested in here
> is the plpgsql extension, not the plpgsql language ... and in fact the
> behavior I was thinking of is already there, except for some reason it's
> only applied during binary upgrade.  So I think we should give serious
> consideration to the attached, which just removes the binary_upgrade
> condition (and adjusts nearby comments).

In further testing of that, I noticed that it made the behavior of our
other bugaboo, the public schema, rather inconsistent.  With this
builtin-extensions hack, the plpgsql extension doesn't get dumped,
whether or not you say "clean".  But the public schema does get dropped
and recreated if you say "clean".  That's not helpful for non-superuser
users of pg_dump, so I think we should try to fix it.  Hence, the
attached updated patch also makes selection of the public schema use
the DUMP_COMPONENT infrastructure rather than hardwired code.

I note btw that the removed bit in getNamespaces() is flat out wrong
independently of these other considerations, because it makes the SQL
put into an archive file vary depending on whether --clean was
specified at pg_dump time.  That's not supposed to happen.

I've also updated the regression test this time, as I think this is
committable.

			regards, tom lane

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index f55aa36..fb03793 100644
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
*************** _printTocEntry(ArchiveHandle *AH, TocEnt
*** 3453,3481 ****
  {
  	RestoreOptions *ropt = AH->public.ropt;
  
- 	/*
- 	 * Avoid dumping the public schema, as it will already be created ...
- 	 * unless we are using --clean mode (and *not* --create mode), in which
- 	 * case we've previously issued a DROP for it so we'd better recreate it.
- 	 *
- 	 * Likewise for its comment, if any.  (We could try issuing the COMMENT
- 	 * command anyway; but it'd fail if the restore is done as non-super-user,
- 	 * so let's not.)
- 	 *
- 	 * XXX it looks pretty ugly to hard-wire the public schema like this, but
- 	 * it sits in a sort of no-mans-land between being a system object and a
- 	 * user object, so it really is special in a way.
- 	 */
- 	if (!(ropt->dropSchema && !ropt->createDB))
- 	{
- 		if (strcmp(te->desc, "SCHEMA") == 0 &&
- 			strcmp(te->tag, "public") == 0)
- 			return;
- 		if (strcmp(te->desc, "COMMENT") == 0 &&
- 			strcmp(te->tag, "SCHEMA public") == 0)
- 			return;
- 	}
- 
  	/* Select owner, schema, and tablespace as necessary */
  	_becomeOwner(AH, te);
  	_selectOutputSchema(AH, te->namespace);
--- 3453,3458 ----

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d65ea54..50399c9 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*************** selectDumpableNamespace(NamespaceInfo *n
*** 1407,1412 ****
--- 1407,1425 ----
  		/* Other system schemas don't get dumped */
  		nsinfo->dobj.dump_contains = nsinfo->dobj.dump = DUMP_COMPONENT_NONE;
  	}
+ 	else if (strcmp(nsinfo->dobj.name, "public") == 0)
+ 	{
+ 		/*
+ 		 * The public schema is a strange beast that sits in a sort of
+ 		 * no-mans-land between being a system object and a user object.  We
+ 		 * don't want to dump creation or comment commands for it, because
+ 		 * that complicates matters for non-superuser use of pg_dump.  But we
+ 		 * should dump any ACL changes that have occurred for it, and of
+ 		 * course we should dump contained objects.
+ 		 */
+ 		nsinfo->dobj.dump = DUMP_COMPONENT_ACL;
+ 		nsinfo->dobj.dump_contains = DUMP_COMPONENT_ALL;
+ 	}
  	else
  		nsinfo->dobj.dump_contains = nsinfo->dobj.dump = DUMP_COMPONENT_ALL;
*************** selectDumpableAccessMethod(AccessMethodI
*** 1617,1637 ****
  * selectDumpableExtension: policy-setting subroutine
  *	Mark an extension as to be dumped or not
  *
! * Normally, we dump all extensions, or none of them if include_everything
! * is false (i.e., a --schema or --table switch was given).  However, in
! * binary-upgrade mode it's necessary to skip built-in extensions, since we
  * assume those will already be installed in the target database. We identify
  * such extensions by their having OIDs in the range reserved for initdb.
  */
  static void
  selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt)
  {
  	/*
! 	 * Use DUMP_COMPONENT_ACL for from-initdb extensions, to allow users to
! 	 * change permissions on those objects, if they wish to, and have those
! 	 * changes preserved.
  	 */
! 	if (dopt->binary_upgrade && extinfo->dobj.catId.oid <= (Oid) g_last_builtin_oid)
  		extinfo->dobj.dump = extinfo->dobj.dump_contains = DUMP_COMPONENT_ACL;
  	else
  		extinfo->dobj.dump = extinfo->dobj.dump_contains =
--- 1630,1650 ----
  * selectDumpableExtension: policy-setting subroutine
  *	Mark an extension as to be dumped or not
  *
! * Built-in extensions should be skipped except for checking ACLs, since we
  * assume those will already be installed in the target database. We identify
  * such extensions by their having OIDs in the range reserved for initdb.
+ * We dump all user-added extensions by default, or none of them if
+ * include_everything is false (i.e., a --schema or --table switch was given).
  */
  static void
  selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt)
  {
  	/*
! 	 * Use DUMP_COMPONENT_ACL for built-in extensions, to allow users to
! 	 * change permissions on their member objects, if they wish to, and have
! 	 * those changes preserved.
  	 */
! 	if (extinfo->dobj.catId.oid <= (Oid) g_last_builtin_oid)
  		extinfo->dobj.dump = extinfo->dobj.dump_contains = DUMP_COMPONENT_ACL;
  	else
  		extinfo->dobj.dump = extinfo->dobj.dump_contains =
*************** getNamespaces(Archive *fout, int *numNam
*** 4435,4463 ****
  					  init_acl_subquery->data,
  					  init_racl_subquery->data);
  
- 	/*
- 	 * When we are doing a 'clean' run, we will be dropping and recreating
- 	 * the 'public' schema (the only object which has that kind of
- 	 * treatment in the backend and which has an entry in pg_init_privs)
- 	 * and therefore we should not consider any initial privileges in
- 	 * pg_init_privs in that case.
- 	 *
- 	 * See pg_backup_archiver.c:_printTocEntry() for the details on why
- 	 * the public schema is special in this regard.
- 	 *
- 	 * Note that if the public schema is dropped and re-created, this is
- 	 * essentially a no-op because the new public schema won't have an
- 	 * entry in pg_init_privs anyway, as the entry will be removed when
- 	 * the public schema is dropped.
- 	 *
- 	 * Further, we have to handle the case where the public schema does
- 	 * not exist at all.
- 	 */
- 	if (dopt->outputClean)
- 		appendPQExpBuffer(query, " AND pip.objoid <> "
- 						  "coalesce((select oid from pg_namespace "
- 						  "where nspname = 'public'),0)");
- 
  	appendPQExpBuffer(query, ") ");
  
  	destroyPQExpBuffer(acl_subquery);
--- 4448,4453 ----

diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 74730bf..3e9b4d9 100644
*** a/src/bin/pg_dump/t/002_pg_dump.pl
--- b/src/bin/pg_dump/t/002_pg_dump.pl
*************** qr/^ALTER (?!EVENT TRIGGER|LARGE OBJECT|
*** 1548,1577 ****
  		all_runs  => 1,
  		catch_all => 'COMMENT commands',
  		regexp    => qr/^COMMENT ON EXTENSION plpgsql IS .*;/m,
! 		like => {
  			clean            => 1,
  			clean_if_exists  => 1,
  			createdb         => 1,
  			defaults         => 1,
  			exclude_dump_test_schema => 1,
  			exclude_test_table => 1,
  			exclude_test_table_data => 1,
  			no_blobs         => 1,
- 			no_privs         => 1,
  			no_owner         => 1,
  			pg_dumpall_dbprivs => 1,
  			schema_only      => 1,
  			section_pre_data => 1,
! 			with_oids        => 1, },
! 		unlike => {
! 			binary_upgrade => 1,
! 			column_inserts => 1,
! 			data_only => 1,
! 			only_dump_test_schema => 1,
! 			only_dump_test_table => 1,
! 			role => 1,
! 			section_post_data => 1,
! 			test_schema_plus_blobs => 1, }, },
  	'COMMENT ON TABLE dump_test.test_table' => {
  		all_runs => 1,
--- 1548,1578 ----
  		all_runs  => 1,
  		catch_all => 'COMMENT commands',
  		regexp    => qr/^COMMENT ON EXTENSION plpgsql IS .*;/m,
! 		# this shouldn't ever get emitted anymore
! 		like => {},
! 		unlike => {
! 			binary_upgrade => 1,
  			clean => 1,
  			clean_if_exists => 1,
+ 			column_inserts => 1,
  			createdb => 1,
+ 			data_only => 1,
  			defaults => 1,
  			exclude_dump_test_schema => 1,
  			exclude_test_table => 1,
  			exclude_test_table_data => 1,
  			no_blobs => 1,
  			no_owner => 1,
+ 			no_privs => 1,
+ 			only_dump_test_schema => 1,
+ 			only_dump_test_table => 1,
  			pg_dumpall_dbprivs => 1,
+ 			role => 1,
  			schema_only => 1,
+ 			section_post_data => 1,
  			section_pre_data => 1,
! 			test_schema_plus_blobs => 1,
! 			with_oids => 1, }, },
  	'COMMENT ON TABLE dump_test.test_table' => {
  		all_runs => 1,
*************** qr/CREATE CAST \(timestamp with time zon
*** 2751,2783 ****
  		regexp => qr/^
  			\QCREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;\E
  			/xm,
! 		like => {
  			clean => 1,
  			clean_if_exists => 1,
  			createdb => 1,
  			defaults => 1,
  			exclude_dump_test_schema => 1,
  			exclude_test_table => 1,
  			exclude_test_table_data => 1,
  			no_blobs => 1,
- 			no_privs => 1,
  			no_owner => 1,
! 			pg_dumpall_dbprivs => 1,
! 			schema_only => 1,
! 			section_pre_data => 1,
! 			with_oids => 1, },
! 		unlike => {
! 			binary_upgrade => 1,
! 			column_inserts => 1,
! 			data_only => 1,
  			only_dump_test_schema => 1,
  			only_dump_test_table => 1,
  			pg_dumpall_globals => 1,
  			pg_dumpall_globals_clean => 1,
  			role => 1,
  			section_data => 1,
  			section_post_data => 1,
! 			test_schema_plus_blobs => 1, }, },
  	'CREATE AGGREGATE dump_test.newavg' => {
  		all_runs => 1,
--- 2752,2785 ----
  		regexp => qr/^
  			\QCREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;\E
  			/xm,
! 		# this shouldn't ever get emitted anymore
! 		like => {},
! 		unlike => {
!
unlike => { ! binary_upgrade => 1, ! column_inserts => 1, ! data_only => 1, ! only_dump_test_schema => 1, ! only_dump_test_table => 1, ! role => 1, ! section_post_data => 1, ! test_schema_plus_blobs => 1, }, }, 'COMMENT ON TABLE dump_test.test_table' => { all_runs => 1, --- 1548,1578 ---- all_runs => 1, catch_all => 'COMMENT commands', regexp => qr/^COMMENT ON EXTENSION plpgsql IS .*;/m, ! # this shouldn't ever get emitted anymore ! like => {}, ! unlike => { ! binary_upgrade => 1, clean => 1, clean_if_exists => 1, + column_inserts => 1, createdb => 1, + data_only => 1, defaults => 1, exclude_dump_test_schema => 1, exclude_test_table => 1, exclude_test_table_data => 1, no_blobs => 1, no_owner => 1, + no_privs => 1, + only_dump_test_schema => 1, + only_dump_test_table => 1, pg_dumpall_dbprivs => 1, + role => 1, schema_only => 1, + section_post_data => 1, section_pre_data => 1, ! test_schema_plus_blobs => 1, ! with_oids => 1, }, }, 'COMMENT ON TABLE dump_test.test_table' => { all_runs => 1, *************** qr/CREATE CAST \(timestamp with time zon *** 2751,2783 **** regexp => qr/^ \QCREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;\E /xm, ! like => { clean => 1, clean_if_exists => 1, createdb => 1, defaults => 1, exclude_dump_test_schema => 1, exclude_test_table => 1, exclude_test_table_data => 1, no_blobs => 1, - no_privs => 1, no_owner => 1, ! pg_dumpall_dbprivs => 1, ! schema_only => 1, ! section_pre_data => 1, ! with_oids => 1, }, ! unlike => { ! binary_upgrade => 1, ! column_inserts => 1, ! data_only => 1, only_dump_test_schema => 1, only_dump_test_table => 1, pg_dumpall_globals => 1, pg_dumpall_globals_clean => 1, role => 1, section_data => 1, section_post_data => 1, ! test_schema_plus_blobs => 1, }, }, 'CREATE AGGREGATE dump_test.newavg' => { all_runs => 1, --- 2752,2785 ---- regexp => qr/^ \QCREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;\E /xm, ! # this shouldn't ever get emitted anymore ! like => {}, ! unlike => { ! 
binary_upgrade => 1, clean => 1, clean_if_exists => 1, + column_inserts => 1, createdb => 1, + data_only => 1, defaults => 1, exclude_dump_test_schema => 1, exclude_test_table => 1, exclude_test_table_data => 1, no_blobs => 1, no_owner => 1, ! no_privs => 1, only_dump_test_schema => 1, only_dump_test_table => 1, + pg_dumpall_dbprivs => 1, pg_dumpall_globals => 1, pg_dumpall_globals_clean => 1, role => 1, + schema_only => 1, section_data => 1, section_post_data => 1, ! section_pre_data => 1, ! test_schema_plus_blobs => 1, ! with_oids => 1, }, }, 'CREATE AGGREGATE dump_test.newavg' => { all_runs => 1, *************** qr/CREATE TRANSFORM FOR integer LANGUAGE *** 4565,4575 **** all_runs => 1, catch_all => 'CREATE ... commands', regexp => qr/^CREATE SCHEMA public;/m, ! like => { ! clean => 1, ! clean_if_exists => 1, }, unlike => { binary_upgrade => 1, createdb => 1, defaults => 1, exclude_test_table => 1, --- 4567,4578 ---- all_runs => 1, catch_all => 'CREATE ... commands', regexp => qr/^CREATE SCHEMA public;/m, ! # this shouldn't ever get emitted anymore ! like => {}, unlike => { binary_upgrade => 1, + clean => 1, + clean_if_exists => 1, createdb => 1, defaults => 1, exclude_test_table => 1, *************** qr/CREATE TRANSFORM FOR integer LANGUAGE *** 5395,5402 **** all_runs => 1, catch_all => 'DROP ... commands', regexp => qr/^DROP SCHEMA public;/m, ! like => { clean => 1 }, unlike => { clean_if_exists => 1, pg_dumpall_globals_clean => 1, }, }, --- 5398,5407 ---- all_runs => 1, catch_all => 'DROP ... commands', regexp => qr/^DROP SCHEMA public;/m, ! # this shouldn't ever get emitted anymore ! like => {}, unlike => { + clean => 1, clean_if_exists => 1, pg_dumpall_globals_clean => 1, }, }, *************** qr/CREATE TRANSFORM FOR integer LANGUAGE *** 5404,5420 **** all_runs => 1, catch_all => 'DROP ... commands', regexp => qr/^DROP SCHEMA IF EXISTS public;/m, ! 
like => { clean_if_exists => 1 }, unlike => { clean => 1, pg_dumpall_globals_clean => 1, }, }, 'DROP EXTENSION plpgsql' => { all_runs => 1, catch_all => 'DROP ... commands', regexp => qr/^DROP EXTENSION plpgsql;/m, ! like => { clean => 1, }, unlike => { clean_if_exists => 1, pg_dumpall_globals_clean => 1, }, }, --- 5409,5429 ---- all_runs => 1, catch_all => 'DROP ... commands', regexp => qr/^DROP SCHEMA IF EXISTS public;/m, ! # this shouldn't ever get emitted anymore ! like => {}, unlike => { clean => 1, + clean_if_exists => 1, pg_dumpall_globals_clean => 1, }, }, 'DROP EXTENSION plpgsql' => { all_runs => 1, catch_all => 'DROP ... commands', regexp => qr/^DROP EXTENSION plpgsql;/m, ! # this shouldn't ever get emitted anymore ! like => {}, unlike => { + clean => 1, clean_if_exists => 1, pg_dumpall_globals_clean => 1, }, }, *************** qr/CREATE TRANSFORM FOR integer LANGUAGE *** 5494,5502 **** all_runs => 1, catch_all => 'DROP ... commands', regexp => qr/^DROP EXTENSION IF EXISTS plpgsql;/m, ! like => { clean_if_exists => 1, }, unlike => { clean => 1, pg_dumpall_globals_clean => 1, }, }, 'DROP FUNCTION IF EXISTS dump_test.pltestlang_call_handler()' => { --- 5503,5513 ---- all_runs => 1, catch_all => 'DROP ... commands', regexp => qr/^DROP EXTENSION IF EXISTS plpgsql;/m, ! # this shouldn't ever get emitted anymore ! like => {}, unlike => { clean => 1, + clean_if_exists => 1, pg_dumpall_globals_clean => 1, }, }, 'DROP FUNCTION IF EXISTS dump_test.pltestlang_call_handler()' => { *************** qr/^GRANT SELECT ON TABLE measurement_y2 *** 6264,6274 **** \Q--\E\n\n \QGRANT USAGE ON SCHEMA public TO PUBLIC;\E /xm, ! like => { ! clean => 1, ! clean_if_exists => 1, }, unlike => { binary_upgrade => 1, createdb => 1, defaults => 1, exclude_dump_test_schema => 1, --- 6275,6286 ---- \Q--\E\n\n \QGRANT USAGE ON SCHEMA public TO PUBLIC;\E /xm, ! # this shouldn't ever get emitted anymore ! 
like => {}, unlike => { binary_upgrade => 1, + clean => 1, + clean_if_exists => 1, createdb => 1, defaults => 1, exclude_dump_test_schema => 1, *************** qr/^GRANT SELECT ON TABLE measurement_y2 *** 6537,6542 **** --- 6549,6556 ---- /xm, like => { binary_upgrade => 1, + clean => 1, + clean_if_exists => 1, createdb => 1, defaults => 1, exclude_dump_test_schema => 1, *************** qr/^GRANT SELECT ON TABLE measurement_y2 *** 6549,6556 **** section_pre_data => 1, with_oids => 1, }, unlike => { - clean => 1, - clean_if_exists => 1, only_dump_test_schema => 1, only_dump_test_table => 1, pg_dumpall_globals_clean => 1, --- 6563,6568 ---- *************** qr/^GRANT SELECT ON TABLE measurement_y2 *** 6576,6593 **** exclude_test_table_data => 1, no_blobs => 1, no_owner => 1, pg_dumpall_dbprivs => 1, schema_only => 1, section_pre_data => 1, with_oids => 1, }, unlike => { - only_dump_test_schema => 1, - only_dump_test_table => 1, pg_dumpall_globals_clean => 1, - role => 1, section_data => 1, ! section_post_data => 1, ! test_schema_plus_blobs => 1, }, }, 'REVOKE commands' => { # catch-all for REVOKE commands all_runs => 0, # catch-all --- 6588,6605 ---- exclude_test_table_data => 1, no_blobs => 1, no_owner => 1, + only_dump_test_schema => 1, + only_dump_test_table => 1, pg_dumpall_dbprivs => 1, + role => 1, schema_only => 1, section_pre_data => 1, + test_schema_plus_blobs => 1, with_oids => 1, }, unlike => { pg_dumpall_globals_clean => 1, section_data => 1, ! section_post_data => 1, }, }, 'REVOKE commands' => { # catch-all for REVOKE commands all_runs => 0, # catch-all
Tom, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > I wrote: > > I went looking and realized that actually what we're interested in here > > is the plpgsql extension, not the plpgsql language ... and in fact the > > behavior I was thinking of is already there, except for some reason it's > > only applied during binary upgrade. So I think we should give serious > > consideration to the attached, which just removes the binary_upgrade > > condition (and adjusts nearby comments). > > In further testing of that, I noticed that it made the behavior of our > other bugaboo, the public schema, rather inconsistent. With this > builtin-extensions hack, the plpgsql extension doesn't get dumped, > whether or not you say "clean". But the public schema does get > dropped and recreated if you say "clean". That's not helpful for > non-superuser users of pg_dump, so I think we should try to fix it. I'm not entirely sure about trying to also support --clean for non-superusers.. We've long had that the public schema is dropped and recreated with --clean and it seems likely that at least some users are depending on us doing that. In any case, it's certainly not a change that I think we could backpatch. Perhaps we could just change it moving forward (which would make me happier, really, since what I think we do with these initdb-time things currently is a bit bizarre). > Hence, the attached updated patch also makes selection of the public > schema use the DUMP_COMPONENT infrastructure rather than hardwired > code. > > I note btw that the removed bit in getNamespaces() is flat out wrong > independently of these other considerations, because it makes the SQL > put into an archive file vary depending on whether --clean was specified > at pg_dump time. That's not supposed to happen. Yes, having that in getNamespaces() isn't correct but we need to do something there, and I've been trying to figure out what. 
The reason it's there in the first place is that if you do --clean then the public schema is dropped and recreated *but* the initdb-time ACL isn't put back for it (and there's no ACL entries in the dump file, be it text or custom, if the user didn't change the ACL from the initdb-time one). What I was planning to do there was somehow inject the initdb-time ACL for public into the set of things to restore when we're asked to do a restore from a custom-format (or directory or other type which is handled by pg_restore) dump. We have to account for someone asking for pg_dump --clean --no-privileges also though. We would still need to do something for text-based output though.. Note that this fix really needs to be back-patched and that we had complaints about the --clean issue with the public schema on text-based pg_dump's (which is why the hack was added to getNamespaces() in the first place) and, more recently, for non-text based pg_dump's. Thanks! Stephen
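[Editor's note: the failure mode under discussion can be sketched as a session along these lines; database and role names are invented, and the exact error text varies by version.]

```console
$ pg_dump -U postgres -d proddb -f dump.sql      # dump taken as a superuser
$ psql -U appuser -d devdb -1 -f dump.sql        # restored by a non-superuser
psql:dump.sql:18: ERROR:  must be owner of extension plpgsql
```

With -1/--single-transaction, that one error rolls back the entire restore, which is what makes either skipping the COMMENT at dump time or handling these initdb-time objects specially in the archiver attractive.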
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> In further testing of that, I noticed that it made the behavior of our >> other bugaboo, the public schema, rather inconsistent. With this >> builtin-extensions hack, the plpgsql extension doesn't get dumped, >> whether or not you say "clean". But the public schema does get >> dropped and recreated if you say "clean". That's not helpful for >> non-superuser users of pg_dump, so I think we should try to fix it. > I'm not entirely sure about trying to also support --clean for > non-superusers.. We've long had that the public schema is dropped and > recreated with --clean and it seems likely that at least some users are > depending on us doing that. In any case, it's certainly not a change > that I think we could backpatch. Perhaps we could just change it moving > forward (which would make me happier, really, since what I think we do > with these initdb-time things currently is a bit bizarre). Sure, I was not proposing this for back-patch --- it depends on the other stuff we've committed recently, anyway. > Yes, having that in getNamespaces() isn't correct but we need to do > something there, and I've been trying to figure out what. I claim this is what ;-) regards, tom lane
Tom, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> In further testing of that, I noticed that it made the behavior of our > >> other bugaboo, the public schema, rather inconsistent. With this > >> builtin-extensions hack, the plpgsql extension doesn't get dumped, > >> whether or not you say "clean". But the public schema does get > >> dropped and recreated if you say "clean". That's not helpful for > >> non-superuser users of pg_dump, so I think we should try to fix it. > > > I'm not entirely sure about trying to also support --clean for > > non-superusers.. We've long had that the public schema is dropped and > > recreated with --clean and it seems likely that at least some users are > > depending on us doing that. In any case, it's certainly not a change > > that I think we could backpatch. Perhaps we could just change it moving > > forward (which would make me happier, really, since what I think we do > > with these initdb-time things currently is a bit bizarre). > > Sure, I was not proposing this for back-patch --- it depends on the > other stuff we've committed recently, anyway. > > > Yes, having that in getNamespaces() isn't correct but we need to do > > something there, and I've been trying to figure out what. > > I claim this is what ;-) Well, that would make things in v11 better, certainly. I suppose for back-patching, I can just go make the change to pg_restore that I outlined and that would deal with the complaints we've gotten there. I'm afraid we may still get some push-back from existing users of --clean since, with the change you're proposing, we wouldn't be cleaning up anything that's been done to the public schema when it comes to comment changes or ACL changes, right? 
A typical use-case of pg_dump with --clean is to 'reset' a system after, say, dumping out some subset of a production system and then loading it into the development environment that all of the devs have full access to, and where there might have been changes to the 'public' schema that you want to revert (to get it back to looking like how 'prod' was). In current versions this should result in at least the ACLs on public matching what they are on prod, along with the comment and any other changes done to it, but we would lose that if we stop doing drop/create of the public schema on --clean. Then again, the DROP SCHEMA will fail if any objects end up created in the public schema anyway, so it isn't like that's a complete solution regardless. I suppose it's a bit surprising that we haven't been asked for a --clean-cascade option. Thanks! Stephen
Stephen Frost <sfrost@snowman.net> writes: > I'm afraid we may still get some push-back from existing users of > --clean since, with the change you're proposing, we wouldn't be cleaning > up anything that's been done to the public schema when it comes to > comment changes or ACL changes, right? No, if you have a nondefault ACL, that will still get applied. This arrangement would drop comment changes, but I can't get excited about that; it's certainly far less of an inconvenience in that scenario than dumping the comment is in non-superuser-restore scenarios. regards, tom lane
Tom, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > I'm afraid we may still get some push-back from existing users of > > --clean since, with the change you're proposing, we wouldn't be cleaning > > up anything that's been done to the public schema when it comes to > > comment changes or ACL changes, right? > > No, if you have a nondefault ACL, that will still get applied. This > arrangement would drop comment changes, but I can't get excited about > that; it's certainly far less of an inconvenience in that scenario > than dumping the comment is in non-superuser-restore scenarios. That nondefault ACL from the system the pg_dump was run on will get applied *over-top* of whatever the current ACL on the system that the restore is being run on, which may or may not be what's expected. That's different from the case where the public schema is dropped and recreated because, in that case, the schema will start out with an empty ACL and the resulting ACL should end up matching what was on the source system (ignoring the current issue with a non-clean pg_dump into a custom format dump being used with a pg_restore --clean). Just to be clear, I'm not suggesting that's a huge deal, but it's certainly a change and something we should at least recognize and contemplate. I'm also not really worried about losing the comment. I definitely don't think it's worth putting in the kind of infrastructure that we put in for init ACLs to handle comments being changed on initdb-time objects. Thanks! Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> No, if you have a nondefault ACL, that will still get applied. This >> arrangement would drop comment changes, but I can't get excited about >> that; it's certainly far less of an inconvenience in that scenario >> than dumping the comment is in non-superuser-restore scenarios. > That nondefault ACL from the system the pg_dump was run on will get > applied *over-top* of whatever the current ACL on the system that the > restore is being run on, which may or may not be what's expected. Fair point, but doesn't it apply equally to non-default ACLs on any other system objects? If you tweaked the permissions on say pg_ls_dir(), then dump, then tweak them some more, you're going to get uncertain results if you load that dump back into this database ... with or without --clean, because we certainly aren't going to drop pinned objects. I think we could jigger things so that we dump the definition of these special quasi-system objects only if their ACLs are not default, but it's not clear to me that that's really an improvement in the long run. Seems like it's just making them even wartier. regards, tom lane
Tom, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> No, if you have a nondefault ACL, that will still get applied. This > >> arrangement would drop comment changes, but I can't get excited about > >> that; it's certainly far less of an inconvenience in that scenario > >> than dumping the comment is in non-superuser-restore scenarios. > > > That nondefault ACL from the system the pg_dump was run on will get > > applied *over-top* of whatever the current ACL on the system that the > > restore is being run on, which may or may not be what's expected. > > Fair point, but doesn't it apply equally to non-default ACLs on any > other system objects? If you tweaked the permissions on say pg_ls_dir(), > then dump, then tweak them some more, you're going to get uncertain > results if you load that dump back into this database ... with or without > --clean, because we certainly aren't going to drop pinned objects. Yes, that's certainly true, the public schema is the only "special" animal in this regard and making it less special (and more like pg_ls_dir()) would definitely be nice. > I think we could jigger things so that we dump the definition of these > special quasi-system objects only if their ACLs are not default, but > it's not clear to me that that's really an improvement in the long run. > Seems like it's just making them even wartier. Yeah, that would be worse, I agree. Thanks! Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> Fair point, but doesn't it apply equally to non-default ACLs on any >> other system objects? If you tweaked the permissions on say pg_ls_dir(), >> then dump, then tweak them some more, you're going to get uncertain >> results if you load that dump back into this database ... with or without >> --clean, because we certainly aren't going to drop pinned objects. > Yes, that's certainly true, the public schema is the only "special" > animal in this regard and making it less special (and more like > pg_ls_dir()) would definitely be nice. I wonder if it'd be worth the trouble to invent a variant of REVOKE that means "restore this object's permissions to default" --- that is, either the ACL recorded in pg_init_privs if there is one, or NULL if there's no pg_init_privs entry. Then you could imagine pg_dump emitting that command before trying to assign an ACL to any object it hadn't created earlier in the run, rather than guessing about the current state of the object's ACL. (I'm not volunteering.) >> I think we could jigger things so that we dump the definition of these >> special quasi-system objects only if their ACLs are not default, but >> it's not clear to me that that's really an improvement in the long run. >> Seems like it's just making them even wartier. > Yeah, that would be worse, I agree. So are we at a consensus yet? regards, tom lane
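[Editor's note: no such REVOKE variant exists; purely as a sketch of Tom's idea, invented syntax (including the made-up RESTORE DEFAULTS clause and monitoring_role name) might look like:]

```sql
-- Hypothetical syntax, not implemented anywhere: reset an object's ACL
-- to its pg_init_privs entry if one exists, else to NULL.
REVOKE ALL PRIVILEGES ON SCHEMA public RESTORE DEFAULTS;

-- pg_dump could then emit the reset before applying a dumped ACL,
-- instead of guessing at the target database's current state:
REVOKE ALL PRIVILEGES ON FUNCTION pg_ls_dir(text) RESTORE DEFAULTS;
GRANT EXECUTE ON FUNCTION pg_ls_dir(text) TO monitoring_role;  -- dumped ACL
```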
Tom, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> Fair point, but doesn't it apply equally to non-default ACLs on any > >> other system objects? If you tweaked the permissions on say pg_ls_dir(), > >> then dump, then tweak them some more, you're going to get uncertain > >> results if you load that dump back into this database ... with or without > >> --clean, because we certainly aren't going to drop pinned objects. > > > Yes, that's certainly true, the public schema is the only "special" > > animal in this regard and making it less special (and more like > > pg_ls_dir()) would definitely be nice. > > I wonder if it'd be worth the trouble to invent a variant of REVOKE > that means "restore this object's permissions to default" --- that is, > either the ACL recorded in pg_init_privs if there is one, or NULL if > there's no pg_init_privs entry. Then you could imagine pg_dump emitting > that command before trying to assign an ACL to any object it hadn't > created earlier in the run, rather than guessing about the current state > of the object's ACL. (I'm not volunteering.) I actually like that idea quite a bit.. Not really high on my priority list though. > >> I think we could jigger things so that we dump the definition of these > >> special quasi-system objects only if their ACLs are not default, but > >> it's not clear to me that that's really an improvement in the long run. > >> Seems like it's just making them even wartier. > > > Yeah, that would be worse, I agree. > > So are we at a consensus yet? You had me at "make public less special", I was just trying to make sure we all understand what that means. +1 from me for moving forward. Thanks! Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> So are we at a consensus yet? > You had me at "make public less special", I was just trying to make sure > we all understand what that means. > +1 from me for moving forward. Applying this patch will leave us with the original pg_dump misbehavior removed, AFAICS. So now we are hard up against the question of whether --no-comments can support its weight as a pure feature, rather than a workaround for a misbehavior/bug. I've been pretty much on the negative side of that question, but I just posted a patch here: https://postgr.es/m/32668.1516848577@sss.pgh.pa.us that will have the effect of causing pg_restore to print ACLs, comments, and seclabels in cases where it formerly didn't. If you want to get back to the old behavior in those cases, we have you covered with --no-acl and --no-security-labels ... but not so much for comments. Between that and Robert's point that we have --no-foo for every other subsidiary object property, so why not comments, I'm prepared to change my vote. This isn't a positive review of the contents of the patch, because I've not read it. But I'm on board with the goal now. regards, tom lane
Robins Tharakan <tharakan@gmail.com> writes: > Attached is an updated patch (v4) with basic tests for pg_dump / pg_dumpall. I've reviewed and pushed this, after making some changes so that the switch works in pg_restore as well, and minor cosmetic adjustments. The changes in t/002_pg_dump.pl largely failed to apply, which is partially due to the age of the patch but IMO speaks more to the unmaintainability of that TAP test. It still didn't run after I'd manually fixed the merge failures, so I gave up in disgust and did not push any of the test changes. If someone is excited enough about testing this, they can submit a followon patch for that, but I judge it not really worth either the human effort or the future testing cycles. (Am I right in thinking that 002_pg_dump.pl is, by design, roughly O(N^2) in the number of features tested? Ick.) regards, tom lane
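[Editor's note: as committed, the switch is accepted by pg_dump and pg_restore alike; for the single-transaction scenario that started the thread, an invocation would look roughly like this, with placeholder database names.]

```console
$ pg_dump --no-comments -Fc -d proddb -f proddb.dump
$ pg_restore --no-comments --single-transaction -d devdb proddb.dump
```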
Tom Lane wrote: > The changes in t/002_pg_dump.pl largely failed to apply, which is > partially due to the age of the patch but IMO speaks more to the > unmaintainability of that TAP test. It still didn't run after I'd > manually fixed the merge failures, so I gave up in disgust and > did not push any of the test changes. If someone is excited enough > about testing this, they can submit a followon patch for that, > but I judge it not really worth either the human effort or the > future testing cycles. > > (Am I right in thinking that 002_pg_dump.pl is, by design, roughly > O(N^2) in the number of features tested? Ick.) Yeah, that 002 test is pretty nasty stuff. I think we only put up with it because it's the only idea we've come up with; maybe there are better ideas. Crazy idea: maybe a large fraction of that test could be replaced with comparisons of the "pg_restore -l" output file rather than pg_dump's text output (i.e. the TOC entry for each object, rather than the object's textual representation.) Sounds easier in code than current implementation. Separately, verify that textual representation for each TOC entry type is what we expect. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro, Tom, * Alvaro Herrera (alvherre@alvh.no-ip.org) wrote: > Tom Lane wrote: > > > The changes in t/002_pg_dump.pl largely failed to apply, which is > > partially due to the age of the patch but IMO speaks more to the > > unmaintainability of that TAP test. It still didn't run after I'd > > manually fixed the merge failures, so I gave up in disgust and > > did not push any of the test changes. If someone is excited enough > > about testing this, they can submit a followon patch for that, > > but I judge it not really worth either the human effort or the > > future testing cycles. > > > > (Am I right in thinking that 002_pg_dump.pl is, by design, roughly > > O(N^2) in the number of features tested? Ick.) It certainly tries to cover a lot of combinations, but I did try to write it to not completely cross everything against everything but instead to try and cover the different code paths. > Yeah, that 002 test is pretty nasty stuff. I think we only put up with > it because it's the only idea we've come up with; maybe there are better > ideas. I've discussed the 'better idea' for it previously, which is to essentially break out the various parts of the big perl hash into multiple independent files that would be a lot easier to work with and manage, and perhaps make them not be perl hashes but something else. In other words, make the core perl code similar to pg_regress and the input/output files likewise similar to how pg_regress works. Maybe we should rework the whole thing to work with pg_regress directly but we'd need to teach it about regexp's, I'd think.. > Crazy idea: maybe a large fraction of that test could be replaced with > comparisons of the "pg_restore -l" output file rather than pg_dump's > text output (i.e. the TOC entry for each object, rather than the > object's textual representation.) Sounds easier in code than current > implementation. Separately, verify that textual representation for each > TOC entry type is what we expect.
I'm not sure how that's different..? We do check that the textual representation is what we expect, both directly from pg_dump and from pg_restore output, and using the exact same code to check both (which I generally think is a good thing since we want the results from both to more-or-less match up). What you're proposing here sounds like we're throwing that away in favor of keeping all the same code to test the textual representation but then only checking for listed contents from pg_restore instead of checking that the textual representation is correct. That doesn't actually reduce the amount of code though.. Thanks! Stephen
Stephen Frost wrote:
> Alvaro, Tom,
>
> * Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> > Crazy idea: maybe a large fraction of that test could be replaced with
> > comparisons of the "pg_restore -l" output file rather than pg_dump's
> > text output (i.e. the TOC entry for each object, rather than the
> > object's textual representation.)  Sounds easier in code than the
> > current implementation.  Separately, verify that the textual
> > representation for each TOC entry type is what we expect.
>
> I'm not sure how that's different..?  We do check that the textual
> representation is what we expect, both directly from pg_dump and from
> pg_restore output, and using the exact same code to check both (which I
> generally think is a good thing since we want the results from both to
> more-or-less match up).  What you're proposing here sounds like we're
> throwing that away in favor of keeping all the same code to test the
> textual representation but then only checking for listed contents from
> pg_restore instead of checking that the textual representation is
> correct.  That doesn't actually reduce the amount of code though..

Well, the current implementation compares a dozen pg_dump output text files, three hundred lines apiece, against a thousand regexes (give or take).  Whenever there is a mismatch, what you get is "this regexp failed to match <three hundred lines>" (or sometimes "matched when it should not have"), so looking for the mismatch is quite annoying.

My proposal is that instead of looking at three hundred lines, you'd look for 50 lines of `pg_restore -l` output -- is element XYZ in there or not.  Quite a bit simpler for the guy adding a new test.  This tests combinations of pg_dump switches: are we dumping the right set of objects.

*Separately*, test that each individual TOC entry type ("table data", "index", "tablespace") is dumped as this or that SQL command, where you create a separate dump file for each object type.  So you match a single TOC entry to a dozen lines of SQL, half of them comments -- pretty easy to see where a mismatch is.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
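The "is element XYZ in there or not" check against a `pg_restore -l` listing might look roughly like this.  This is an illustrative Python sketch rather than the actual Perl TAP code; the sample listing, object names, and `toc_has_entry` helper are all made up for the example:

```python
import re

# Hypothetical sample in the style of a `pg_restore -l` archive listing.
SAMPLE_TOC = """\
2; 3079 13276 EXTENSION - plpgsql
3; 0 0 COMMENT - EXTENSION plpgsql
201; 1259 16385 TABLE dump_test test_table regress_user
"""

def toc_has_entry(toc_text, kind, name):
    """Return True if some TOC line of the given entry type mentions `name`."""
    pattern = re.compile(r';\s+\d+\s+\d+\s+%s\b.*\b%s\b'
                         % (re.escape(kind), re.escape(name)))
    return any(pattern.search(line) for line in toc_text.splitlines())

# "like": the table should appear in the dump's TOC ...
assert toc_has_entry(SAMPLE_TOC, "TABLE", "test_table")
# ... and an "unlike" case would assert the entry's absence, e.g. that a
# hypothetical --no-comments run drops the COMMENT entry present here.
assert toc_has_entry(SAMPLE_TOC, "COMMENT", "plpgsql")
```

A failing check then reports one missing or unexpected TOC entry, rather than a regexp mismatch against hundreds of lines of SQL.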
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Well, the current implementation compares a dozen pg_dump output text
> files, three hundred lines apiece, against a thousand regexes (give
> or take).  Whenever there is a mismatch, what you get is "this regexp
> failed to match <three hundred lines>" (or sometimes "matched when it
> should not have"), so looking for the mismatch is quite annoying.

Yeah, I've been getting my nose rubbed in that over the past couple of days of pg_dump hacking: when you get a failure, the output is really unfriendly, even by the seriously low standards of the rest of our TAP tests.  I'm not sure if Alvaro's idea is the best way to improve that, but it's something to think about.

The thing that was annoying me, though, is that it seems like every test case interacts with every other test case, or at least way more of them than one could wish, due to the construction of that big hash constant.  That's what I find unmaintainable.

			regards, tom lane
Alvaro,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> Stephen Frost wrote:
> > Alvaro, Tom,
> >
> > * Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> > > Crazy idea: maybe a large fraction of that test could be replaced with
> > > comparisons of the "pg_restore -l" output file rather than pg_dump's
> > > text output (i.e. the TOC entry for each object, rather than the
> > > object's textual representation.)  Sounds easier in code than the
> > > current implementation.  Separately, verify that the textual
> > > representation for each TOC entry type is what we expect.
> >
> > I'm not sure how that's different..?  We do check that the textual
> > representation is what we expect, both directly from pg_dump and from
> > pg_restore output, and using the exact same code to check both (which I
> > generally think is a good thing since we want the results from both to
> > more-or-less match up).  What you're proposing here sounds like we're
> > throwing that away in favor of keeping all the same code to test the
> > textual representation but then only checking for listed contents from
> > pg_restore instead of checking that the textual representation is
> > correct.  That doesn't actually reduce the amount of code though..
>
> Well, the current implementation compares a dozen pg_dump output text
> files, three hundred lines apiece, against a thousand regexes (give
> or take).  Whenever there is a mismatch, what you get is "this regexp
> failed to match <three hundred lines>" (or sometimes "matched when it
> should not have"), so looking for the mismatch is quite annoying.

Yeah, the big output isn't fun, I agree with that.

> My proposal is that instead of looking at three hundred lines, you'd
> look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> or not.  Quite a bit simpler for the guy adding a new test.  This tests
> combinations of pg_dump switches: are we dumping the right set of
> objects.

That might be possible to do, though I'm not sure that it'd really be only 50 lines.

> *Separately*, test that each individual TOC entry type ("table data",
> "index", "tablespace") is dumped as this or that SQL command, where you
> create a separate dump file for each object type.  So you match a single
> TOC entry to a dozen lines of SQL, half of them comments -- pretty easy
> to see where a mismatch is.

Yikes.  If you're thinking that we would call pg_restore independently for each object, I'd be worried about the runtime increasing quite a bit.  There are also a few cases where we check dependencies between objects that we'd need to be mindful of, though those we might be able to take care of, if we can keep the runtime down.

Certainly, part of the way the existing code works was specifically to try to minimize the runtime.  Perhaps the approach taken of minimizing the invocations and building a bunch of regexps to run against the resulting output is more expensive than running pg_restore a bunch of times, but I'd want to see that to be sure, as some really simple timings locally on a 6.5k -Fc dump file seem to take 30-40ms just to produce the TOC.

This also doesn't really help with the big perl hash, but these two ideas don't seem to conflict, so perhaps we could still tackle the big perl hash with the approach I described up-thread and also find a way to implement this, if it doesn't cause the regression tests to be too much slower.

Thanks!

Stephen
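The runtime worry above is about invoking pg_restore once per object; a single `pg_restore -l` pass could instead be bucketed by TOC entry type afterwards.  A rough illustration in Python (not the actual test code; the listing format here is simplified and hypothetical, and multi-word entry types like "TABLE DATA" would need smarter parsing than this):

```python
import re
from collections import defaultdict

# Hypothetical, simplified `pg_restore -l` style listing.
SAMPLE_TOC = """\
2; 3079 13276 EXTENSION - plpgsql
201; 1259 16385 TABLE dump_test test_table regress_user
202; 1259 16390 INDEX dump_test test_index regress_user
"""

def group_by_entry_type(toc_text):
    """Bucket TOC lines by entry type; naive about multi-word types."""
    groups = defaultdict(list)
    line_re = re.compile(r'^\d+;\s+\d+\s+\d+\s+(\S+)\s+(.*)$')
    for line in toc_text.splitlines():
        m = line_re.match(line)
        if m:
            groups[m.group(1)].append(m.group(2))
    return dict(groups)

groups = group_by_entry_type(SAMPLE_TOC)
assert sorted(groups) == ['EXTENSION', 'INDEX', 'TABLE']
```

Per-type SQL checks could then run over each bucket without a separate pg_restore invocation for every object.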
On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> My proposal is that instead of looking at three hundred lines, you'd
> look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> or not.  Quite a bit simpler for the guy adding a new test.  This tests
> combinations of pg_dump switches: are we dumping the right set of
> objects.

My counter-proposal is that we remove the test entirely.  It looks like an unmaintainable and undocumented mess to me, and I doubt whether the testing value is sufficient to justify the effort of updating it every time anyone wants to change something in pg_dump.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > My proposal is that instead of looking at three hundred lines, you'd
> > look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> > or not.  Quite a bit simpler for the guy adding a new test.  This tests
> > combinations of pg_dump switches: are we dumping the right set of
> > objects.
>
> My counter-proposal is that we remove the test entirely.  It looks
> like an unmaintainable and undocumented mess to me, and I doubt
> whether the testing value is sufficient to justify the effort of
> updating it every time anyone wants to change something in pg_dump.

Considering it turned up multiple serious bugs, particularly in the binary upgrade path, I can't disagree more.  If you have a counter-proposal which actually results in better test coverage, that'd be fantastic, but I wholly reject the notion that we should be considering reducing our test coverage of pg_dump.

Thanks!

Stephen
On Fri, Jan 26, 2018 at 11:09 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> > My proposal is that instead of looking at three hundred lines, you'd
>> > look for 50 lines of `pg_restore -l` output -- is element XYZ in there
>> > or not.  Quite a bit simpler for the guy adding a new test.  This tests
>> > combinations of pg_dump switches: are we dumping the right set of
>> > objects.
>>
>> My counter-proposal is that we remove the test entirely.  It looks
>> like an unmaintainable and undocumented mess to me, and I doubt
>> whether the testing value is sufficient to justify the effort of
>> updating it every time anyone wants to change something in pg_dump.
>
> Considering it turned up multiple serious bugs, particularly in the
> binary upgrade path, I can't disagree more.  If you have a
> counter-proposal which actually results in better test coverage, that'd
> be fantastic, but I wholly reject the notion that we should be
> considering reducing our test coverage of pg_dump.

I figured you would, but it's still my opinion.  I guess my basic objection here is to the idea that we somehow know that the 6000+ line test case file actually contains only correct tests.  That vastly exceeds the ability of any normal human being to verify correctness, especially given what's already been said about the interdependencies between different parts of the file and the lack of adequate documentation.

For example, I notice that the file contains 166 copies of only_dump_test_schema => 1 (with 4 different variations as to how much whitespace is included).  Some of those are in the "like" clause and some are in the "unlike" clause.  If one of them were misplaced, and pg_dump were actually producing the wrong output, the two errors would cancel out and I suspect nobody would notice.

If somebody makes a change to pg_dump that changes which things get dumped when --schema is used, then a lot of those lines would need to be moved around.  That would be a huge nuisance.  If the author of such a patch just stuck those lines where they make the tests pass, they might fail to notice if one of them were actually in the wrong place, and a bug would go undiscovered.  Some people probably have the mental stamina to audit 166 separate cases and make sure that each one is properly positioned, but some people might not; and that's really just one test of many.  Really, every pg_dump change that alters behavior needs to individually reconsider the position of every one of ~6000 lines in this file, and nobody is really going to do that.

There's some rule that articulates what the effect of --schema is supposed to be.  I don't know the exact rule, but it's probably something like "global objects shouldn't get dumped and schema objects should only get dumped if they're in that schema".  That rule ought to be encoded into this file in some recognizable form.  It's encoded there, but only in the positioning of hundreds of separate lines of code that look identical but must be positioned differently based on a human programmer's interpretation of how that rule applies to each object type.  That's not a good way of implementing the rule; nobody can look at this and say "oh, well, I changed the rules for --schema, so here's which things need to be updated".  You're not going to be able to look at the diff somebody produces and have any idea whether it's right, or at least not without a lot of very time-consuming manual verification.

If you had just saved the output of pg_dump in a file (maybe with OIDs and other variable bits stripped out) and compared the new output against the old, it would give you just as much code coverage but probably be a lot easier to maintain.  If you had implemented the rules programmatically instead of encoding a giant manually precomputed data structure in the test case file, it would be a lot more clearly correct and again easier to maintain.

I think you've chosen a terrible design and ought to throw the whole thing away and start over.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
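To make the "implement the rules programmatically" idea concrete, here is a toy sketch -- plain Python rather than anything resembling the real Perl test structure, with entirely invented object descriptions -- of deriving the expected object set from a single predicate instead of hundreds of hand-placed like/unlike entries:

```python
# Toy version of the --schema rule: skip global objects, and keep schema
# objects only when they live in the selected schema.  The dict shapes and
# names below are hypothetical, purely for illustration.
def dumped_with_schema_filter(obj, schema):
    if obj['is_global']:
        return False
    return obj['schema'] == schema

objects = [
    {'name': 'test_table',   'schema': 'dump_test', 'is_global': False},
    {'name': 'other_table',  'schema': 'public',    'is_global': False},
    {'name': 'regress_role', 'schema': None,        'is_global': True},
]

# The expected object set falls out of the rule, rather than being
# maintained by hand per object.
expected = [o['name'] for o in objects
            if dumped_with_schema_filter(o, 'dump_test')]
assert expected == ['test_table']
```

A change to the --schema behavior would then mean editing one predicate and reviewing one diff, which is the maintainability argument being made above.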
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Jan 26, 2018 at 11:09 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Robert Haas (robertmhaas@gmail.com) wrote:
> >> My counter-proposal is that we remove the test entirely.  It looks
> >> like an unmaintainable and undocumented mess to me, and I doubt
> >> whether the testing value is sufficient to justify the effort of
> >> updating it every time anyone wants to change something in pg_dump.
> >
> > Considering it turned up multiple serious bugs, particularly in the
> > binary upgrade path, I can't disagree more.  If you have a
> > counter-proposal which actually results in better test coverage, that'd
> > be fantastic, but I wholly reject the notion that we should be
> > considering reducing our test coverage of pg_dump.
>
> I figured you would, but it's still my opinion.  I guess my basic
> objection here is to the idea that we somehow know that the 6000+ line
> test case file actually contains only correct tests.  That vastly
> exceeds the ability of any normal human being to verify correctness,
> especially given what's already been said about the interdependencies
> between different parts of the file and the lack of adequate
> documentation.

I don't mean to discount your opinion at all, but I'm quite concerned about the code coverage of pg_dump.  I certainly agree that the testing of pg_dump can and should be improved, and I'd like to find time to work on that, but simply throwing this out strikes me as a step backwards, not forwards, for all its difficulty.

> For example, I notice that the file contains 166 copies of
> only_dump_test_schema => 1 (with 4 different variations as to how
> much whitespace is included).  Some of those are in the "like" clause
> and some are in the "unlike" clause.  If one of them were misplaced,
> and pg_dump were actually producing the wrong output, the two errors
> would cancel out and I suspect nobody would notice.  If somebody makes
> a change to pg_dump that changes which things get dumped when --schema
> is used, then a lot of those lines would need to be moved around.  That
> would be a huge nuisance.  If the author of such a patch just stuck
> those lines where they make the tests pass, they might fail to notice
> if one of them were actually in the wrong place, and a bug would go
> undiscovered.  Some people probably have the mental stamina to audit
> 166 separate cases and make sure that each one is properly positioned,
> but some people might not; and that's really just one test of many.
> Really, every pg_dump change that alters behavior needs to
> individually reconsider the position of every one of ~6000 lines in
> this file, and nobody is really going to do that.
>
> There's some rule that articulates what the effect of --schema is
> supposed to be.  I don't know the exact rule, but it's probably
> something like "global objects shouldn't get dumped and schema objects
> should only get dumped if they're in that schema".  That rule ought to
> be encoded into this file in some recognizable form.  It's encoded
> there, but only in the positioning of hundreds of separate lines of
> code that look identical but must be positioned differently based on a
> human programmer's interpretation of how that rule applies to each
> object type.  That's not a good way of implementing the rule; nobody
> can look at this and say "oh, well, I changed the rules for --schema,
> so here's which things need to be updated".  You're not going to be
> able to look at the diff somebody produces and have any idea whether
> it's right, or at least not without a lot of very time-consuming
> manual verification.  If you had just saved the output of pg_dump in a
> file (maybe with OIDs and other variable bits stripped out) and
> compared the new output against the old, it would give you just as
> much code coverage but probably be a lot easier to maintain.  If you
> had implemented the rules programmatically instead of encoding a giant
> manually precomputed data structure in the test case file it would be
> a lot more clearly correct and again easier to maintain.

The existing code already has some of these types of rules encoded in it, specifically to reduce the number of like/unlike cases where it's possible to do so -- see the discussion of catch-all tests, described at around line 282.  While there may be other such rules which can be encoded, I don't believe they'd end up being nearly as clean as you're suggesting unless we go through and make changes to pg_dump itself to consistently behave according to those rules.  Perhaps that's a way to go; I'll admit that I was focusing more on making sure that the documented behavior is what pg_dump was actually doing, and so I hadn't considered how we could simplify the testing if pg_dump was stipulated to operate in a more specific manner.

The notion of using the TOC and then splitting up the output as suggested by Alvaro seems like a good approach to me.  Perhaps there's a way to do that and to also encode more rules about how pg_dump is expected to operate.

> I think you've chosen a terrible design and ought to throw the whole
> thing away and start over.

I'm all for throwing away the existing test once we've got something that covers at least what it does (ideally more, of course).

Thanks!

Stephen
Robert Haas <robertmhaas@gmail.com> writes:
> I figured you would, but it's still my opinion.  I guess my basic
> objection here is to the idea that we somehow know that the 6000+ line
> test case file actually contains only correct tests.  That vastly
> exceeds the ability of any normal human being to verify correctness,
> especially given what's already been said about the interdependencies
> between different parts of the file and the lack of adequate
> documentation.

Yeah, that's a problem.  The last two times I touched that file, I just moved things between "like" and "unlike" categories until the test passed.  If there were anything useful it had to tell me, it was a complete failure at doing so.  I frankly won't even think about adding new test cases to it, either.

I don't know how to make it better exactly, but I concur with Robert that that test needs fundamental redesign of some kind to be maintainable.

			regards, tom lane
On Fri, Jan 26, 2018 at 11:56 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> I think you've chosen a terrible design and ought to throw the whole
>> thing away and start over.
>
> I'm all for throwing away the existing test once we've got something
> that covers at least what it does (ideally more, of course).

I'm for throwing this away now.  It's a nuisance for other people to maintain, and as Tom's reply makes clear (and it matches my suspicions), they are maintaining it without really knowing whether the updates they are making are *correct*, just knowing that they *make the tests pass*.  It's nice to make things turn green on the code coverage report, but if we're not really verifying that the results are correct, we're just kidding ourselves.  We'd get the same amount of green on the code coverage report by running the pg_dump commands and sending the output to /dev/null, and it would be a lot less work to keep up to date.

I'm glad this helped you find some bugs.  It is only worth keeping if it prevents other hackers from introducing bugs in the future.  I doubt that it will have that effect.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Jan 26, 2018 at 11:56 AM, Stephen Frost <sfrost@snowman.net> wrote:
> >> I think you've chosen a terrible design and ought to throw the whole
> >> thing away and start over.
> >
> > I'm all for throwing away the existing test once we've got something
> > that covers at least what it does (ideally more, of course).
>
> I'm for throwing this away now.  It's a nuisance for other people to
> maintain, and as Tom's reply makes clear (and it matches my
> suspicions), they are maintaining it without really knowing whether
> the updates they are making are *correct*, just knowing that they
> *make the tests pass*.  It's nice to make things turn green on the
> code coverage report, but if we're not really verifying that the
> results are correct, we're just kidding ourselves.  We'd get the same
> amount of green on the code coverage report by running the pg_dump
> commands and sending the output to /dev/null, and it would be a lot
> less work to keep up to date.

There's not much to argue about if committers are simply hacking away at it without actually considering whether the tests are correct or not.  I suspect it's not quite as bad as being outright wrong -- if individuals are at least testing their changes thoroughly first and then making sure that the tests pass, then it's at least more likely that what's recorded in the test suite is actually correct, but that's not great.  Further, having contributors include updates to the test suite which are then willfully thrown away is outright bad.

Whether that's worse than not having this test coverage for pg_dump, I'm not sure.  What strikes me as likely is that removing this test from pg_dump will just result in bugs being introduced because, for as much as updating these tests is painful, actually testing all of the different variations of pg_dump by hand for even a simple change is quite a bit of work too.

I don't have any doubt that the reason bugs were found with this is that there were permutations of pg_dump that weren't being tested, either by individuals making the change or through the other regression tests we have.  Indeed, the tests included were specifically written to get code coverage of cases which weren't already being tested.

> I'm glad this helped you find some bugs.  It is only worth keeping if
> it prevents other hackers from introducing bugs in the future.  I
> doubt that it will have that effect.

I'm not convinced that it won't, but I agree that we want something that everyone will feel comfortable maintaining and improving moving forward.  I'll work on it and either submit a rework to make it easier to maintain, or a patch to remove it, before the next CF.

Thanks!

Stephen
I don't have any doubt that the reason bugs were found with this is because there were permutations of pg_dump that weren't being tested, either by individuals making the change or through the other regression tests we have. Indeed, the tests included were specifically to get code coverage of cases which weren't already being tested. > I'm glad this helped you find some bugs. It is only worth keeping if > it prevents other hackers from introducing bugs in the future. I > doubt that it will have that effect. I'm not convinced that it won't, but I agree that we want something that everyone will feel comfortable in maintaining and improving moving forward. I'll work on it and either submit a rework to make it easier to maintain or a patch to remove it before the next CF. Thanks! Stephen