Thread: use pg_get_functiondef() in pg_dump

use pg_get_functiondef() in pg_dump

From

Peter Eisentraut

Date:

12 August 2020, 08:25:31

Here is a patch to have pg_dump use pg_get_functiondef() instead of 
assembling the CREATE FUNCTION/PROCEDURE commands itself.  This should 
save on maintenance effort in the future.  It's also a prerequisite for 
being able to dump functions with SQL-standard function body discussed 
in [0].

pg_get_functiondef() was meant for psql's \ef command, so its defaults 
are slightly different from what pg_dump would like, so this adds a few
optional parameters for tweaking the behavior.  The naming of the 
parameters is up for discussion.


[0]: 
https://www.postgresql.org/message-id/flat/1c11f1eb-f00c-43b7-799d-2d44132c02d7@2ndquadrant.com

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: use pg_get_functiondef() in pg_dump

From

Robert Haas

Date:

12 August 2020, 19:54:40

On Wed, Aug 12, 2020 at 4:25 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Here is a patch to have pg_dump use pg_get_functiondef() instead of
> assembling the CREATE FUNCTION/PROCEDURE commands itself.  This should
> save on maintenance effort in the future.  It's also a prerequisite for
> being able to dump functions with SQL-standard function body discussed
> in [0].
>
> pg_get_functiondef() was meant for psql's \ef command, so its defaults
> are slightly different from what pg_dump would like, so this adds a few
> optional parameters for tweaking the behavior.  The naming of the
> parameters is up for discussion.

One problem with this, which I think Tom pointed out before, is that
it might make it to handle some forward-compatibility problems. In
other words, if something that the server is generating needs to be
modified for compatibility with a future release, it's not easy to do
that. Like if we needed to quote something we weren't previously
quoting, for example.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: use pg_get_functiondef() in pg_dump

From

Peter Eisentraut

Date:

15 August 2020, 09:49:53

On 2020-08-12 21:54, Robert Haas wrote:
> One problem with this, which I think Tom pointed out before, is that
> it might make it to handle some forward-compatibility problems. In
> other words, if something that the server is generating needs to be
> modified for compatibility with a future release, it's not easy to do
> that. Like if we needed to quote something we weren't previously
> quoting, for example.

We already use a lot of other pg_get_*def functions in pg_dump.  Does 
this one introduce any fundamentally new problems?

A hypothetical change where syntax that we accept now would no longer be 
accepted in a (near-)future version would create a lot of upsetness.  I 
don't think we'd do it.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: use pg_get_functiondef() in pg_dump

From

Tom Lane

Date:

15 August 2020, 14:23:55

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2020-08-12 21:54, Robert Haas wrote:
>> One problem with this, which I think Tom pointed out before, is that
>> it might make it to handle some forward-compatibility problems. In
>> other words, if something that the server is generating needs to be
>> modified for compatibility with a future release, it's not easy to do
>> that. Like if we needed to quote something we weren't previously
>> quoting, for example.

> We already use a lot of other pg_get_*def functions in pg_dump.  Does 
> this one introduce any fundamentally new problems?

I wouldn't say that it's *fundamentally* new, but nonethless it disturbs
me that this proposal has pg_dump assembling CREATE FUNCTION commands in
very different ways depending on the server version.  I'd rather see us
continuing to build the bulk of the command the same as before, and
introduce new behavior only for deparsing the function body.

We've talked before about what a mess it is that some aspects of pg_dump's
output are built on the basis of what pg_dump sees in its stable snapshot
but others are built by ruleutils.c on the basis of up-to-the-minute
catalog contents.  While I don't insist that this patch fix that, I'm
worried that it may be making things worse, or at least getting in the
way of ever fixing that.

Perhaps these concerns are unfounded, but I'd like to see some arguments
why before we go down this path.

            regards, tom lane

Re: use pg_get_functiondef() in pg_dump

From

Tom Lane

Date:

15 August 2020, 14:36:51

I wrote:
> I wouldn't say that it's *fundamentally* new, but nonethless it disturbs
> me that this proposal has pg_dump assembling CREATE FUNCTION commands in
> very different ways depending on the server version.  I'd rather see us
> continuing to build the bulk of the command the same as before, and
> introduce new behavior only for deparsing the function body.

BTW, a concrete argument for doing it that way is that if you make a
backend function that does the whole CREATE-FUNCTION-building job in
exactly the way pg_dump wants it, that function is nigh useless for
any other client with slightly different requirements.  A trivial
example here is that I don't think we want to become locked into
the proposition that psql's \ef and \sf must print functions exactly
the same way that pg_dump would.

            regards, tom lane

Re: use pg_get_functiondef() in pg_dump

From

Stephen Frost

Date:

15 August 2020, 17:39:40

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I wrote:
> > I wouldn't say that it's *fundamentally* new, but nonethless it disturbs
> > me that this proposal has pg_dump assembling CREATE FUNCTION commands in
> > very different ways depending on the server version.  I'd rather see us
> > continuing to build the bulk of the command the same as before, and
> > introduce new behavior only for deparsing the function body.
>
> BTW, a concrete argument for doing it that way is that if you make a
> backend function that does the whole CREATE-FUNCTION-building job in
> exactly the way pg_dump wants it, that function is nigh useless for
> any other client with slightly different requirements.  A trivial
> example here is that I don't think we want to become locked into
> the proposition that psql's \ef and \sf must print functions exactly
> the same way that pg_dump would.

The fact that the need that psql has and that which pg_dump has are at
least somewhat similar really argues that we should have put this code
into libpgcommon in the first place and not in the backend, and then had
both psql and pg_dump use that.

I'm sure there's a lot of folks who'd like to see more of the logic we
have in pg_dump for building objects from the catalog available to more
tools through libpgcommon- psql being one of the absolute first
use-cases for exactly that (there's certainly no shortage of people
who've asked how they can get a CREATE TABLE statement for a table by
using psql...).

Thanks,

Stephen

Attachment

signature.asc

Re: use pg_get_functiondef() in pg_dump

From

Corey Huinker

Date:

17 August 2020, 23:54:20

I'm sure there's a lot of folks who'd like to see more of the logic we
have in pg_dump for building objects from the catalog available to more
tools through libpgcommon- psql being one of the absolute first
use-cases for exactly that (there's certainly no shortage of people
who've asked how they can get a CREATE TABLE statement for a table by
using psql...).

I count myself among those folks (see https://www.postgresql.org/message-id/CADkLM%3DfxfsrHASKk_bY_A4uomJ1Te5MfGgD_rwwQfV8wP68ewg%40mail.gmail.com for discussion of doing DESCRIBE and SHOW CREATE-ish functions either on server side or client side).

I'm all for having this as "just" as set of pg_get_*def functions, because they allow for the results to be used in queries. Granted, the shape of the result set may not be stable, but that's the sort of thing we can warn for the same way we have warnings for changes to pg_stat_activity. At that point any DESCRIBE/SHOW CREATE server side functions essentially become just shells around the pg_get_*def(), with no particular requirement to make those new commands work inside a SELECT.

Would it be totally out of left field to have the functions have an optional "version" parameter, defaulted to null, that would be used to give backwards compatible results if and when we do make a breaking change?

Re: use pg_get_functiondef() in pg_dump

From

Stephen Frost

Date:

18 August 2020, 13:18:03

Greetings,

* Corey Huinker (corey.huinker@gmail.com) wrote:
> > I'm sure there's a lot of folks who'd like to see more of the logic we
> > have in pg_dump for building objects from the catalog available to more
> > tools through libpgcommon- psql being one of the absolute first
> > use-cases for exactly that (there's certainly no shortage of people
> > who've asked how they can get a CREATE TABLE statement for a table by
> > using psql...).
>
> I count myself among those folks (see
> https://www.postgresql.org/message-id/CADkLM%3DfxfsrHASKk_bY_A4uomJ1Te5MfGgD_rwwQfV8wP68ewg%40mail.gmail.com
> for
> discussion of doing DESCRIBE and SHOW CREATE-ish functions either on server
> side or client side).
>
> I'm all for having this as "just" as set of pg_get_*def functions, because
> they allow for the results to be used in queries. Granted, the shape of the
> result set may not be stable, but that's the sort of thing we can warn for
> the same way we have warnings for changes to pg_stat_activity. At that
> point any DESCRIBE/SHOW CREATE server side functions essentially become
> just shells around the pg_get_*def(), with no particular requirement to
> make those new commands work inside a SELECT.

Another advantage of having this in libpgcommon is that the backend
*and* the frontend could then use it.

> Would it be totally out of left field to have the functions have an
> optional "version" parameter, defaulted to null, that would be used to give
> backwards compatible results if and when we do make a breaking change?

So..  the code that's in pg_dump today works to go from "whatever the
connected server's version is" to "whatever the version is of the
pg_dump command itself".  If we had the code in libpgcommon, and
functions in the backend to get at it along with psql having that code,
you could then, using the code we have today, go from a bunch of
'source' versions to 'target' version of either the version of the psql
command, or that of the server.

That is, consider a future where this is all done and all that crazy
version-specific code in pg_dump has been moved to libpgcommon in v14,
and then you have a v15 psql, so:

psql v15 connected to PG v14:

You do: \dct mytable -- psql internal command to get 'create table'
Result: a CREATE TABLE that works for v15

You do: DESCRIBE mytable; -- PG backend function to get 'create table'
Result: a CREATE TABLE that works for v14

Without having to add anything to what we're already doing (yes, yes,
beyond the complications of moving this stuff into libpgcommon, but at
least we're not having to create some kind of matrix of "source PG
version 10, target PG version 12" into PG14).

A bit crazy, sure, but would certainly be pretty useful.

Thanks,

Stephen

Attachment

signature.asc

Re: use pg_get_functiondef() in pg_dump

From

Tom Lane

Date:

18 August 2020, 17:03:03

Stephen Frost <sfrost@snowman.net> writes:
> So..  the code that's in pg_dump today works to go from "whatever the
> connected server's version is" to "whatever the version is of the
> pg_dump command itself".  If we had the code in libpgcommon, and
> functions in the backend to get at it along with psql having that code,
> you could then, using the code we have today, go from a bunch of
> 'source' versions to 'target' version of either the version of the psql
> command, or that of the server.

At this point, I think I need a high-power telescope even to see the
goalposts :-(

If we actually want to do something like this, we need a plan not just
some handwaving.  Let's start by enumerating the concerns that would
have to be solved.  I can think of:

* Execution context.  Stephen seems to be envisioning code that could be
compiled into the backend not just the frontend, but is that really worth
the trouble?  Could we share such code across FE/BE at all (it'd certainly
be a far more ambitious exercise in common code than we've done to date)?
What's the backend version actually doing, issuing queries over SPI?
(I suppose if you were rigid about that, it could offer a guarantee
that the results match your snapshot, which is pretty attractive.)

* Global vs. per-object activity.  pg_dump likes to query the entire state
of the database to start with, and then follow up by grabbing additional
details about objects it's going to dump.  That's not an operating mode
that most other clients would want, but if for no other reason than
performance, I don't think we can walk away from it for pg_dump ---
indeed, I think pg_dump probably needs to be fixed to do less per-object
querying, not more.  Meanwhile applications such as psql \d would only
want to investigate one object at a time.  What design can we create that
will handle that?  If there is persistent state involved, what in the
world does that mean for the case of a backend-side library?

* Context in which the output is valid.  Target server version was already
mentioned, but a quick examination of pg_dump output scripts will remind
you that there's a bunch more assumptions:

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'en_US.utf8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

not to mention special hackery for object ownership and tablespaces.
Some of these things probably don't matter for other use-cases, but
others definitely do.  In particular, I really doubt that psql and
other clients would find it acceptable to force search_path to a
particular thing.  Which brings us to

* Security.  How robust do the output commands need to be, and
what will we have to do that pg_dump doesn't need to?

* No doubt there are some other topics I didn't think of.

This certainly would be attractive if we had it, but the task
seems dauntingly large.  It's not going to happen without some
fairly serious investment of time.

            regards, tom lane

Re: use pg_get_functiondef() in pg_dump

From

Peter Eisentraut

Date:

26 August 2020, 11:13:22

On 2020-08-15 16:36, Tom Lane wrote:
> I wrote:
>> I wouldn't say that it's *fundamentally* new, but nonethless it disturbs
>> me that this proposal has pg_dump assembling CREATE FUNCTION commands in
>> very different ways depending on the server version.  I'd rather see us
>> continuing to build the bulk of the command the same as before, and
>> introduce new behavior only for deparsing the function body.
> 
> BTW, a concrete argument for doing it that way is that if you make a
> backend function that does the whole CREATE-FUNCTION-building job in
> exactly the way pg_dump wants it, that function is nigh useless for
> any other client with slightly different requirements.  A trivial
> example here is that I don't think we want to become locked into
> the proposition that psql's \ef and \sf must print functions exactly
> the same way that pg_dump would.

That's why the patch adds optional arguments to the function to choose 
the behavior that is appropriate for the situation.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: use pg_get_functiondef() in pg_dump

From

Peter Eisentraut

Date:

26 August 2020, 11:14:41

On 2020-08-15 16:23, Tom Lane wrote:
> I wouldn't say that it's*fundamentally*  new, but nonethless it disturbs
> me that this proposal has pg_dump assembling CREATE FUNCTION commands in
> very different ways depending on the server version.  I'd rather see us
> continuing to build the bulk of the command the same as before, and
> introduce new behavior only for deparsing the function body.

OK, I'll work on something like that.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services