Thread: use pg_get_functiondef() in pg_dump
Here is a patch to have pg_dump use pg_get_functiondef() instead of assembling the CREATE FUNCTION/PROCEDURE commands itself. This should save on maintenance effort in the future. It's also a prerequisite for being able to dump functions with SQL-standard function body discussed in [0]. pg_get_functiondef() was meant for psql's \ef command, so its defaults are slightly different from what pg_dump would like, so this adds a few optional parameters for tweaking the behavior. The naming of the parameters is up for discussion. [0]: https://www.postgresql.org/message-id/flat/1c11f1eb-f00c-43b7-799d-2d44132c02d7@2ndquadrant.com -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Wed, Aug 12, 2020 at 4:25 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > Here is a patch to have pg_dump use pg_get_functiondef() instead of > assembling the CREATE FUNCTION/PROCEDURE commands itself. This should > save on maintenance effort in the future. It's also a prerequisite for > being able to dump functions with SQL-standard function body discussed > in [0]. > > pg_get_functiondef() was meant for psql's \ef command, so its defaults > are slightly different from what pg_dump would like, so this adds a few > optional parameters for tweaking the behavior. The naming of the > parameters is up for discussion. One problem with this, which I think Tom pointed out before, is that it might make it to handle some forward-compatibility problems. In other words, if something that the server is generating needs to be modified for compatibility with a future release, it's not easy to do that. Like if we needed to quote something we weren't previously quoting, for example. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2020-08-12 21:54, Robert Haas wrote: > One problem with this, which I think Tom pointed out before, is that > it might make it to handle some forward-compatibility problems. In > other words, if something that the server is generating needs to be > modified for compatibility with a future release, it's not easy to do > that. Like if we needed to quote something we weren't previously > quoting, for example. We already use a lot of other pg_get_*def functions in pg_dump. Does this one introduce any fundamentally new problems? A hypothetical change where syntax that we accept now would no longer be accepted in a (near-)future version would create a lot of upsetness. I don't think we'd do it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 2020-08-12 21:54, Robert Haas wrote: >> One problem with this, which I think Tom pointed out before, is that >> it might make it to handle some forward-compatibility problems. In >> other words, if something that the server is generating needs to be >> modified for compatibility with a future release, it's not easy to do >> that. Like if we needed to quote something we weren't previously >> quoting, for example. > We already use a lot of other pg_get_*def functions in pg_dump. Does > this one introduce any fundamentally new problems? I wouldn't say that it's *fundamentally* new, but nonethless it disturbs me that this proposal has pg_dump assembling CREATE FUNCTION commands in very different ways depending on the server version. I'd rather see us continuing to build the bulk of the command the same as before, and introduce new behavior only for deparsing the function body. We've talked before about what a mess it is that some aspects of pg_dump's output are built on the basis of what pg_dump sees in its stable snapshot but others are built by ruleutils.c on the basis of up-to-the-minute catalog contents. While I don't insist that this patch fix that, I'm worried that it may be making things worse, or at least getting in the way of ever fixing that. Perhaps these concerns are unfounded, but I'd like to see some arguments why before we go down this path. regards, tom lane
I wrote: > I wouldn't say that it's *fundamentally* new, but nonethless it disturbs > me that this proposal has pg_dump assembling CREATE FUNCTION commands in > very different ways depending on the server version. I'd rather see us > continuing to build the bulk of the command the same as before, and > introduce new behavior only for deparsing the function body. BTW, a concrete argument for doing it that way is that if you make a backend function that does the whole CREATE-FUNCTION-building job in exactly the way pg_dump wants it, that function is nigh useless for any other client with slightly different requirements. A trivial example here is that I don't think we want to become locked into the proposition that psql's \ef and \sf must print functions exactly the same way that pg_dump would. regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > I wrote: > > I wouldn't say that it's *fundamentally* new, but nonethless it disturbs > > me that this proposal has pg_dump assembling CREATE FUNCTION commands in > > very different ways depending on the server version. I'd rather see us > > continuing to build the bulk of the command the same as before, and > > introduce new behavior only for deparsing the function body. > > BTW, a concrete argument for doing it that way is that if you make a > backend function that does the whole CREATE-FUNCTION-building job in > exactly the way pg_dump wants it, that function is nigh useless for > any other client with slightly different requirements. A trivial > example here is that I don't think we want to become locked into > the proposition that psql's \ef and \sf must print functions exactly > the same way that pg_dump would. The fact that the need that psql has and that which pg_dump has are at least somewhat similar really argues that we should have put this code into libpgcommon in the first place and not in the backend, and then had both psql and pg_dump use that. I'm sure there's a lot of folks who'd like to see more of the logic we have in pg_dump for building objects from the catalog available to more tools through libpgcommon- psql being one of the absolute first use-cases for exactly that (there's certainly no shortage of people who've asked how they can get a CREATE TABLE statement for a table by using psql...). Thanks, Stephen
Attachment
I'm sure there's a lot of folks who'd like to see more of the logic we
have in pg_dump for building objects from the catalog available to more
tools through libpgcommon- psql being one of the absolute first
use-cases for exactly that (there's certainly no shortage of people
who've asked how they can get a CREATE TABLE statement for a table by
using psql...).
I count myself among those folks (see https://www.postgresql.org/message-id/CADkLM%3DfxfsrHASKk_bY_A4uomJ1Te5MfGgD_rwwQfV8wP68ewg%40mail.gmail.com for discussion of doing DESCRIBE and SHOW CREATE-ish functions either on server side or client side).
I'm all for having this as "just" as set of pg_get_*def functions, because they allow for the results to be used in queries. Granted, the shape of the result set may not be stable, but that's the sort of thing we can warn for the same way we have warnings for changes to pg_stat_activity. At that point any DESCRIBE/SHOW CREATE server side functions essentially become just shells around the pg_get_*def(), with no particular requirement to make those new commands work inside a SELECT.
Would it be totally out of left field to have the functions have an optional "version" parameter, defaulted to null, that would be used to give backwards compatible results if and when we do make a breaking change?
Greetings, * Corey Huinker (corey.huinker@gmail.com) wrote: > > I'm sure there's a lot of folks who'd like to see more of the logic we > > have in pg_dump for building objects from the catalog available to more > > tools through libpgcommon- psql being one of the absolute first > > use-cases for exactly that (there's certainly no shortage of people > > who've asked how they can get a CREATE TABLE statement for a table by > > using psql...). > > I count myself among those folks (see > https://www.postgresql.org/message-id/CADkLM%3DfxfsrHASKk_bY_A4uomJ1Te5MfGgD_rwwQfV8wP68ewg%40mail.gmail.com > for > discussion of doing DESCRIBE and SHOW CREATE-ish functions either on server > side or client side). > > I'm all for having this as "just" as set of pg_get_*def functions, because > they allow for the results to be used in queries. Granted, the shape of the > result set may not be stable, but that's the sort of thing we can warn for > the same way we have warnings for changes to pg_stat_activity. At that > point any DESCRIBE/SHOW CREATE server side functions essentially become > just shells around the pg_get_*def(), with no particular requirement to > make those new commands work inside a SELECT. Another advantage of having this in libpgcommon is that the backend *and* the frontend could then use it. > Would it be totally out of left field to have the functions have an > optional "version" parameter, defaulted to null, that would be used to give > backwards compatible results if and when we do make a breaking change? So.. the code that's in pg_dump today works to go from "whatever the connected server's version is" to "whatever the version is of the pg_dump command itself". If we had the code in libpgcommon, and functions in the backend to get at it along with psql having that code, you could then, using the code we have today, go from a bunch of 'source' versions to 'target' version of either the version of the psql command, or that of the server. That is, consider a future where this is all done and all that crazy version-specific code in pg_dump has been moved to libpgcommon in v14, and then you have a v15 psql, so: psql v15 connected to PG v14: You do: \dct mytable -- psql internal command to get 'create table' Result: a CREATE TABLE that works for v15 You do: DESCRIBE mytable; -- PG backend function to get 'create table' Result: a CREATE TABLE that works for v14 Without having to add anything to what we're already doing (yes, yes, beyond the complications of moving this stuff into libpgcommon, but at least we're not having to create some kind of matrix of "source PG version 10, target PG version 12" into PG14). A bit crazy, sure, but would certainly be pretty useful. Thanks, Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > So.. the code that's in pg_dump today works to go from "whatever the > connected server's version is" to "whatever the version is of the > pg_dump command itself". If we had the code in libpgcommon, and > functions in the backend to get at it along with psql having that code, > you could then, using the code we have today, go from a bunch of > 'source' versions to 'target' version of either the version of the psql > command, or that of the server. At this point, I think I need a high-power telescope even to see the goalposts :-( If we actually want to do something like this, we need a plan not just some handwaving. Let's start by enumerating the concerns that would have to be solved. I can think of: * Execution context. Stephen seems to be envisioning code that could be compiled into the backend not just the frontend, but is that really worth the trouble? Could we share such code across FE/BE at all (it'd certainly be a far more ambitious exercise in common code than we've done to date)? What's the backend version actually doing, issuing queries over SPI? (I suppose if you were rigid about that, it could offer a guarantee that the results match your snapshot, which is pretty attractive.) * Global vs. per-object activity. pg_dump likes to query the entire state of the database to start with, and then follow up by grabbing additional details about objects it's going to dump. That's not an operating mode that most other clients would want, but if for no other reason than performance, I don't think we can walk away from it for pg_dump --- indeed, I think pg_dump probably needs to be fixed to do less per-object querying, not more. Meanwhile applications such as psql \d would only want to investigate one object at a time. What design can we create that will handle that? If there is persistent state involved, what in the world does that mean for the case of a backend-side library? * Context in which the output is valid. Target server version was already mentioned, but a quick examination of pg_dump output scripts will remind you that there's a bunch more assumptions: SET statement_timeout = 0; SET lock_timeout = 0; SET idle_in_transaction_session_timeout = 0; SET client_encoding = 'en_US.utf8'; SET standard_conforming_strings = on; SELECT pg_catalog.set_config('search_path', '', false); SET check_function_bodies = false; SET xmloption = content; SET client_min_messages = warning; SET row_security = off; not to mention special hackery for object ownership and tablespaces. Some of these things probably don't matter for other use-cases, but others definitely do. In particular, I really doubt that psql and other clients would find it acceptable to force search_path to a particular thing. Which brings us to * Security. How robust do the output commands need to be, and what will we have to do that pg_dump doesn't need to? * No doubt there are some other topics I didn't think of. This certainly would be attractive if we had it, but the task seems dauntingly large. It's not going to happen without some fairly serious investment of time. regards, tom lane
On 2020-08-15 16:36, Tom Lane wrote: > I wrote: >> I wouldn't say that it's *fundamentally* new, but nonethless it disturbs >> me that this proposal has pg_dump assembling CREATE FUNCTION commands in >> very different ways depending on the server version. I'd rather see us >> continuing to build the bulk of the command the same as before, and >> introduce new behavior only for deparsing the function body. > > BTW, a concrete argument for doing it that way is that if you make a > backend function that does the whole CREATE-FUNCTION-building job in > exactly the way pg_dump wants it, that function is nigh useless for > any other client with slightly different requirements. A trivial > example here is that I don't think we want to become locked into > the proposition that psql's \ef and \sf must print functions exactly > the same way that pg_dump would. That's why the patch adds optional arguments to the function to choose the behavior that is appropriate for the situation. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-08-15 16:23, Tom Lane wrote: > I wouldn't say that it's*fundamentally* new, but nonethless it disturbs > me that this proposal has pg_dump assembling CREATE FUNCTION commands in > very different ways depending on the server version. I'd rather see us > continuing to build the bulk of the command the same as before, and > introduce new behavior only for deparsing the function body. OK, I'll work on something like that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services