Re: deparsing utility commands - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: deparsing utility commands
Date
Msg-id 20150224004843.GE29780@tamriel.snowman.net
Whole thread Raw
In response to Re: deparsing utility commands  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: deparsing utility commands  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
Andres,

* Andres Freund (andres@2ndquadrant.com) wrote:
> On 2015-02-21 14:51:32 -0500, Stephen Frost wrote:
> > It'd be *really* nice to be able to pass an object identifier to some
> > function and get back the CREATE (in particular, though perhaps DROP, or
> > whatever) command for it.  This gets us *awful* close to that without
> > actually giving it to us and that's bugging me.  The issue is the
> > parsetree, which I understand may be required in some cases (ALTER,
> > primairly, it seems?), but isn't always.
>
> > The USING and WITH CHECK quals can be extracted from the polForm also,
> > of course, it's what psql and pg_dump are doing, after all.
> >
> > So, why depend on the parsetree?  What about have another argument which
> > a user could use without the parsetree, for things that it's absolutely
> > required for today (eg: ALTER sub-command differentiation)?
>
> I'm really not wild about pretty massively expanding the scope at the
> eleventh hour.  There's a fair number of commands where this the
> deparsing command will give you a bunch of commands - look at CREATE
> TABLE and CREATE SCHEMA ... for the most extreme examples. For those
> there's no chance to do that without the parse tree available.

For my 2c, I don't agree with either of your assumptions above.  I don't
see this as a massive expansion of the scope, nor that we're in the 11th
hour at this point.  Agreed, it's the last commitfest, but we're near
the beginning of it and Alvaro has already made modifications to the
patch set, as is generally expected to happen for any patch in a
commitfest, without much trouble.  The changes which I'm suggesting are
nearly trivial for the larger body of work and only slightly more than
trivial for the more complicated pieces.

> Yes, it might be possible to use the same code for a bunch of minor
> commands, but not for the interesting/complex stuff.

We can clearly rebuild at least CREATE commands for all objects without
access to the parse tree, obviously pg_dump manages somehow.  I didn't
specify that a single command had to be used.  Further, the actual
deparse_CreateStmt doesn't look like it'd have a terribly hard time
producing something without access to the parsetree.

The whole premise of this approach is that we're using the results of
the catalog changes as the basis for what's needed to recreate the
command.  The places where we're referring to the parsetree look more
like a crutch of convenience than for any particular good reason to.
Consider this part of 0011 which includes deparse_CreateStmt:

/** Process table elements: column definitions and constraints.  Only* the column definitions are obtained from the
parsenode itself.  To* get constraints we rely on pg_constraint, because the parse node* might be missing some things
suchas the name of the constraints.*/ 

and then looking down through to deparse_ColumnDef, we see that the
ColumnDef passed in is used almost exclusivly for the *column name*
(which is used to look up the entry in pg_attribute..).  Looping over
the pg_attribute entries for the table in the attnum order would
certainly return the same result, or something has gone wrong.

Beyond the inheiritance consideration, the only other reference actually
leads to what you argued (correctly, in my view) was a mistake in the
code- where a NOT NULL is omitted for the primary key case.  Weren't you
also complaining about the 'OF type' form being a potential issue?
That'd also go away with the approach I'm advocating.  In short, I
suspect that if this approach had been taken originally, at least some
of the concerns and issued levied against the current implementation
wouldn't exist.  Thankfully, the way the code has been developed, the
majority of the code is general infrastructure and the changes I'm
suggesting are all in code which is simpler, thanks to that
infrastructure already being there.

All I'm suggesting is that we focus on collecting the information from
the catalog and avoid using the parsetree.  For at least the CREATE and
DROP cases, that should be entirely possible.  This does end up being a
bit different from the original goal, which was closer to "reproduce
exactly the command that was specified," but as shown above, that
probably wasn't what we ever really wanted.  The original command is
ambiguous on a number of levels and even where it isn't we can get the
canonical information we need straight from the catalog.

> > Maybe that's possible and maybe it isn't, but at least for the CREATE
> > cases we should be able to avoid forcing a user to provide a built
> > parsetree, and that'd be *really* nice and open up this feature to
> > quite a few other use-cases that I can think of.  I'd further suggest
> > that we provide these command to the SQL level, and then have a
> > wrapper which will take the name of an object, resolve it to Oid, and
> > then pass back the CREATE command for it.
>
> I think this is a different feature that might end up sharing some of
> the infrastructure, but not more.

I know you've looked through this code also and I really don't get the
comment that only "some" of this infrastructure would be shared.  As I
tried to point out, for the 'CREATE POLICY' case, a few minor
modifications would have it provide exactly what I'm suggesting and I'm
sure that most of the cases would be similar.  Simply looking through
the code with an eye towards avoiding the parsetree in favor of pulling
information from the catalog would illustrate the point I'm making, I
believe.

> > For as much commentary as there has been about event triggers and
> > replication and BDR, and about how this is going to end up being a
> > little-used side-feature, I'm amazed that there has been very little
> > discussion about how this would finally put into the backend the ability
> > to build up CREATE commands for objects and remove the need to use
> > pg_dump for that (something which isn't exactly fun for GUI applications
> > and other use-cases).
>
> I don't think it's all that related, so I'm not surprised. If you
> execute a CREATE TABLE(id serial primary key); you'll get a bunch of
> commands with this facility - it'd be much less clear what exactly shall
> be dumped in the case of some hypothetical GUI when you want to dump the
> table.

I really don't think it's all that strange or complicated to consider
and we've got a rather nice example of what a good approach would be.

This may only apply to the CREATE and DROP cases, but that's no small
thing.

Apologies for taking so long to reply, it wasn't intentional.
Unfortunately, I'm not going to have a lot of time tomorrow either but I
hope to find time later in the week to resume review and perhaps to
provide code to back the approach I'm advocating.
Thanks!
    Stephen

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Abbreviated keys for text cost model fix
Next
From: Peter Geoghegan
Date:
Subject: Re: INSERT ... ON CONFLICT UPDATE and logical decoding