Thread: machine-parseable object descriptions

machine-parseable object descriptions

From
Alvaro Herrera
Date:
For Dimitri's patch to add support for dropped objects in event
triggers, there's an open question about how to report objects that are
being dropped in a tabular format.  What I proposed last had three
columns: (type, schema, identity).  The "type" is a description of the
object class; I propose the following list:

table
view
materialized view
foreign table
sequence
...
function
aggregate
type
cast
collation
table constraint
domain constraint
default value
...
operator of access method
function of access method
...
table column
view column
materialized view column
...
default acl for relations
default acl for sequences
default acl for functions
default acl for types

Ellipses hide uninteresting examples.  Opinions on this please?


After the object type we have the schema, which would be NULL for
objects that don't belong to schemas (extensions, access methods, casts,
languages, etc).

Then we have the identity, which is a texualt representation of the
other stuff needed to uniquely identify the object being referred to.
For most stuff it's just the name, but for other things it is a bit more
complex.  For example, for a function it'd be the function signature;
for a table (and for most other object classes) it's the table name.
For columns, it's the qualified column name i.e.
<tablename>.<columnname>.  Note the schema name never appears here, even
if the object is not in path; the whole point of this is to be
machine-parseable and we have the schema separately anyway.  So some
tweaks to functions such as format_type_internal are necessary, to
supress schema-qualifying when not wanted.

So the main thing here is about the particular format for each specific
object class.  For example, it seems to me we need to schema-qualify
type names in function signatures.  For casts I propose "(%s as %s)",
where each %s is a schema-qualified type name.  For constraints we can
use "%s on %s" where there's no schema qualification (the schema column
carries the table's schema, for the constraint must always belong to the
same schema).  For operators, we use "%s(%s, %s)" where the first %s is
the operator name and the other two are schema-qualified type names.
For default values (pg_attrdef tuples) we use "for %s" where the %s is
the column's identity (i.e. <tabname>.<colname>; again not
schema-qualified because the attrdef "is" in the same schema as the
table)).  For operator classes and families we use "%s for %s" where the
first one is the opclass/opfam name, and the second is the AM name.  For
pg_amop and pg_amproc objects we use "%d (%s, %s) of %s": strategy
number, qualified type names, and AM name.  For rules and triggers, "%s
on %s": rule name and relation name (unqualified).  For all the rest, we
simply use the unqualified object name.

pg_default_acl objects are the funniest of all, and they end up as
"belonging to role %s in schema %s".  I am especially open to
suggestions in this one.

I think this kind of machine-parseable object representation would be
useful for cases other than event triggers, so I am thinking in
committing this part before the pg_dropped_objects patch.  I haven't
split it out yet from the event triggers patch; I will be sending a
patch later, in the meantime I welcome comments on the above.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: machine-parseable object descriptions

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> For Dimitri's patch to add support for dropped objects in event
> triggers, there's an open question about how to report objects that are
> being dropped in a tabular format.  What I proposed last had three
> columns: (type, schema, identity).  The "type" is a description of the
> object class; I propose the following list:

I'm okay with the proposed type names.

> After the object type we have the schema, which would be NULL for
> objects that don't belong to schemas (extensions, access methods, casts,
> languages, etc).

> Then we have the identity, which is a texualt representation of the
> other stuff needed to uniquely identify the object being referred to.

I think you should seriously consider dropping the separate "schema"
column and instead plugging the schema in at the appropriate place
in the identity string (which, evidently, is not always going to be
at the front).  Otherwise, how will client code know how to assemble the
schema onto the identity to make a usable name?  It's also pretty weird
that some of the names appearing in an identity will be fully qualified
but the most important one isn't.

I could also live with keeping the schema column as proposed, if people
think it has a use, but letting it be redundant with a schema name
included in the identity string.  But it seems like a bad idea to try to
shear schema off of identity.
        regards, tom lane



Re: machine-parseable object descriptions

From
Dimitri Fontaine
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> For Dimitri's patch to add support for dropped objects in event
> triggers, there's an open question about how to report objects that are
> being dropped in a tabular format.  What I proposed last had three
> columns: (type, schema, identity).  The "type" is a description of the
> object class; I propose the following list:
>
> ...
>
> Ellipses hide uninteresting examples.  Opinions on this please?

In my view, it needs to be the same thing as the second part of the
command tag used for commands on the given type of objects.

> Then we have the identity, which is a texualt representation of the
> other stuff needed to uniquely identify the object being referred to.
> For most stuff it's just the name, but for other things it is a bit more
> complex.  For example, for a function it'd be the function signature;
> for a table (and for most other object classes) it's the table name.
> For columns, it's the qualified column name i.e.
> <tablename>.<columnname>.  Note the schema name never appears here, even
> if the object is not in path; the whole point of this is to be
> machine-parseable and we have the schema separately anyway.  So some
> tweaks to functions such as format_type_internal are necessary, to
> supress schema-qualifying when not wanted.

As long as we keep also the name in a separate field, I'm in. Also, that
needs to be properly quoted by the backend (MixedCase, extra dots, etc)
as we just said on an external communication channel.

> I think this kind of machine-parseable object representation would be
> useful for cases other than event triggers, so I am thinking in
> committing this part before the pg_dropped_objects patch.  I haven't
> split it out yet from the event triggers patch; I will be sending a
> patch later, in the meantime I welcome comments on the above.

+1

Maybe that can also set the debate for which information to pass down in
Event Trigger User Functions and how. Still don't want to make that a
datatype so that we expose a single variable per object?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: machine-parseable object descriptions

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> I could also live with keeping the schema column as proposed, if people
> think it has a use, but letting it be redundant with a schema name
> included in the identity string.  But it seems like a bad idea to try to
> shear schema off of identity.

+1

Use case for keeping the extra column: replication to a federated
database where you want to tweak the target schema.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: machine-parseable object descriptions

From
Darren Duncan
Date:
On 2013.03.18 1:03 PM, Alvaro Herrera wrote:
> For Dimitri's patch to add support for dropped objects in event
> triggers, there's an open question about how to report objects that are
> being dropped in a tabular format.

Now that JSON is a built-in type with 9.2+, could we not perhaps use that to 
represent some things in a more natural manner than a tabular format?  JSON is 
designed to be machine-parseable, and some objects such as routine definitions 
are naturally trees of arbitrary depth. -- Darren Duncan





Re: machine-parseable object descriptions

From
Alvaro Herrera
Date:
Dimitri Fontaine wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> > I could also live with keeping the schema column as proposed, if people
> > think it has a use, but letting it be redundant with a schema name
> > included in the identity string.  But it seems like a bad idea to try to
> > shear schema off of identity.
>
> +1
>
> Use case for keeping the extra column: replication to a federated
> database where you want to tweak the target schema.

If I understood our IM discussion correctly, you were saying that for
federated database replication you wanted a separate "name" column, from
which you could extract a table name easily; not that you wanted a
separate schema column.  Anyway the schema column is also present.  We
can easily remove columns before commit, if we decide we don't want
them.

In the attached patch, we have these three columns: a "name" column,
which is the object's bare name; a "schema" column, which is the schema;
and an "identity" column, which is the whole thing, with all the schema
qualifications that apply.  There's also the type, of course.

I added the name column because it seems to me that it is genuinely
useful for some use cases.  However, there are two problems: first, the
original implementation had a slight bug in the case of column objects
(it displayed the name of the relation, not the name of the column), and
two I was afraid it'd be easily misused.  One way to attack both things
at once would to be make it NULL except in the cases where it's a truly
unique identifier (tables, schemas, etc).  To avoid this being a
standalone "whitelist" of catalogs (which would get outdated quickly, I
fear), I propose to add a new boolean option in ObjectProperty, which
specifies whether the name can be used as an identifier.  I have
implemented it that way in the attached patch.


The new identity column is amazingly verbose on things like pg_amproc entries:
10650 | 1 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist:
pg_catalog.gist_point_consistent(pg_catalog.internal,pg_catalog.point,integer,pg_catalog.oid,pg_catalog.internal)10651|
2(pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist:
pg_catalog.gist_box_union(pg_catalog.internal,pg_catalog.internal)10652| 3 (pg_catalog.point, pg_catalog.point) of
pg_catalog.point_opsfor gist: pg_catalog.gist_point_compress(pg_catalog.internal)10653 | 4 (pg_catalog.point,
pg_catalog.point)of pg_catalog.point_ops for gist: pg_catalog.gist_box_decompress(pg_catalog.internal)10654 | 5
(pg_catalog.point,pg_catalog.point) of pg_catalog.point_ops for gist:
pg_catalog.gist_box_penalty(pg_catalog.internal,pg_catalog.internal,pg_catalog.internal)10655| 6 (pg_catalog.point,
pg_catalog.point)of pg_catalog.point_ops for gist:
pg_catalog.gist_box_picksplit(pg_catalog.internal,pg_catalog.internal)10656| 7 (pg_catalog.point, pg_catalog.point) of
pg_catalog.point_opsfor gist: pg_catalog.gist_box_same(pg_catalog.box,pg_catalog.box,pg_catalog.internal)10657 | 8
(pg_catalog.point,pg_catalog.point) of pg_catalog.point_ops for gist:
pg_catalog.gist_point_distance(pg_catalog.internal,pg_catalog.point,integer,pg_catalog.oid)

Also, note how types with standard-specified names ("integer") are not
qualified (which is the right thing, AFAICT).   Here's another interesting
example:

alvherre=# create type public.integer as enum ('uno', 'dos');
CREATE TYPE

alvherre=# select * from pg_identify_object('pg_type'::regclass, 'integer'::regtype, 0);type |   schema   | name |
identity 
------+------------+------+----------type | pg_catalog | int4 | integer
(1 fila)

alvherre=# select * from pg_identify_object('pg_type'::regclass, 'public.integer'::regtype, 0);type | schema |   name
|     identity      
------+--------+-----------+------------------type | public | "integer" | public."integer"
(1 fila)

If you create a public.int4 type, there's no confusion either, so it's
all consistent.

Here's another bit of sample output, from pg_depend contents (at the
left there's the referencing object, at the right the referenced
object):

alvherre=# select deptype, refd.*, ref.* from pg_depend, lateral (select * from pg_identify_object(classid, objid,
objsubid)) refd, lateral (select * from pg_identify_object(refclassid, refobjid, refobjsubid)) ref where classid <> 0
andrefd.schema <> 'pg_catalog' and ref.schema <> 'information_schema' and refd.schema <> 'pg_toast';deptype |
type       |    schema    |     name     |                           identity                           |     type
|   schema    |    name     |                 identity                   

---------+-------------------+--------------+--------------+--------------------------------------------------------------+--------------+--------------+-------------+-------------------------------------------a
     | domain constraint | public       |              | "my constr" on public.mydom                                  |
type        | public       | mydom       | public.mydomi       | type              | "the schema" | "the table"  | "the
schema"."thetable"                                     | table        | "the schema" | "the table" | "the schema"."the
table"i      | type              | "the schema" | "_the table" | "the schema"."the table"[]
     | type         | "the schema" | "the table" | "the schema"."the table"i       | type              | public       |
qx          | public.qx                                                    | table        | public       | qx
|public.qxi       | type              | public       | _qx          | public.qx[]
          | type         | public       | qx          | public.qxa       | table constraint  | "the schema" |
  | "the table_another column_check" on "the schema"."the table" | table column | "the schema" | "the table" | "the
schema"."thetable"."another column"n       | table constraint  | "the schema" |              | "the table_another
column_check"on "the schema"."the table" | table column | "the schema" | "the table" | "the schema"."the
table"."anothercolumn"i       | type              | public       | _integer     | public."integer"[]
                      | type         | public       | "integer"   | public."integer"i       | type              |
public      | _int4        | public.int4[]                                                | type         | public
|int4        | public.int4 
(9 filas)


alvherre=# \d "the schema"."the table"       Tabla «the schema.the table»   Columna     |  Tipo   | Modificadores
----------------+---------+---------------the column     | integer | another column | integer |
Restricciones CHECK:   "the table_another column_check" CHECK ("another column" > 0)

All in all, I'm happy with this and I'm considering committing it as
soon as we agree on the set of columns.  I'm mildly on the side of
removing the separate schema column and keeping name, so we'd have
type/name/identity.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: machine-parseable object descriptions

From
Alvaro Herrera
Date:
.. and here's the patch.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: machine-parseable object descriptions

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:
> .. and here's the patch.

I forgot an example involving the funniest of all object classes:
default ACLs.  Here it is.

alvherre=# create role rol1;
CREATE ROLE
alvherre=# create role rol2;
CREATE ROLE
alvherre=# create schema rol1 authorization rol1;
CREATE SCHEMA
alvherre=# alter default privileges for role rol1, rol2 grant insert on tables to alvherre;
ALTER DEFAULT PRIVILEGES
alvherre=# alter default privileges for role rol1 in schema rol1 grant insert on tables to alvherre;
ALTER DEFAULT PRIVILEGES
alvherre=# select oid,foo.* from pg_default_acl, lateral (select * from pg_identify_object(tableoid, oid, 0)) foo ; oid
|    type     | schema | name |              identity                
-------+-------------+--------+------+-------------------------------------48839 | default acl |        |      | for
rolerol1 on tables48840 | default acl |        |      | for role rol2 on tables48844 | default acl |        |      |
forrole rol1 in schema rol1 on tables 
(4 filas)

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: machine-parseable object descriptions

From
Dimitri Fontaine
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> All in all, I'm happy with this and I'm considering committing it as
> soon as we agree on the set of columns.  I'm mildly on the side of
> removing the separate schema column and keeping name, so we'd have
> type/name/identity.

I would prefer that we keep the schema column, for easier per-schema
processing or filtering. It's way eaiser to provide it in a separate
column than to ask people to parse it back from the identity column.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: machine-parseable object descriptions

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> The new identity column is amazingly verbose on things like pg_amproc entries:

>  10650 | 1 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist:
pg_catalog.gist_point_consistent(pg_catalog.internal,pg_catalog.point,integer,pg_catalog.oid,pg_catalog.internal)

Uh ... isn't that confusing the *identity* of the pg_amproc entry with
its *content*?  I would say that the function reference doesn't belong
there.  You do need the rest.  I would also suggest that you prepend
the word "function" (or "operator" for pg_amop), so that it reads like
"function 1 (typename, typename) of opfamilyname for amname".
        regards, tom lane



Re: machine-parseable object descriptions

From
Alvaro Herrera
Date:
Dimitri Fontaine wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > All in all, I'm happy with this and I'm considering committing it as
> > soon as we agree on the set of columns.  I'm mildly on the side of
> > removing the separate schema column and keeping name, so we'd have
> > type/name/identity.
>
> I would prefer that we keep the schema column, for easier per-schema
> processing or filtering. It's way eaiser to provide it in a separate
> column than to ask people to parse it back from the identity column.

Okay.

Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > The new identity column is amazingly verbose on things like pg_amproc entries:
>
> >  10650 | 1 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist:
pg_catalog.gist_point_consistent(pg_catalog.internal,pg_catalog.point,integer,pg_catalog.oid,pg_catalog.internal)
>
> Uh ... isn't that confusing the *identity* of the pg_amproc entry with
> its *content*?  I would say that the function reference doesn't belong
> there.  You do need the rest.  I would also suggest that you prepend
> the word "function" (or "operator" for pg_amop), so that it reads like
> "function 1 (typename, typename) of opfamilyname for amname".

Uhm, yeah, fixed.

So here's a final version of the patch.  With docs too.

One change I made was to move all the new code from dependency.c into
objectaddress.c.  The only reason it was in dependency.c was that
getObjectDescription was there in the first place; but it doesn't seem
to me that it really belongs there.  (Back when it was first created,
there was no objectaddress.c at all, and dependency.c was the only user
of it.)  If there were no backpatching considerations, I would suggest
we move getObjectDescription() to objectaddress.c as well, but I'm not
sure it's worth the trouble, but I'm not wedded to that if somebody
thinks both things should be kept together.

Finally: it'd be nice to be able to get pg_am identities with these
functions too.  Then you could use a simple query to get object
identities + descriptions from pg_description (right now you have to
exclude that catalog specifically, otherwise the query bombs out).  But
it'd be a lot of trouble, and since these objects are not really
pluggable, I'm not bothering.  We can always add it later if there's
more interesting use for it.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: machine-parseable object descriptions

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> One change I made was to move all the new code from dependency.c into
> objectaddress.c.  The only reason it was in dependency.c was that
> getObjectDescription was there in the first place; but it doesn't seem
> to me that it really belongs there.  (Back when it was first created,
> there was no objectaddress.c at all, and dependency.c was the only user
> of it.)  If there were no backpatching considerations, I would suggest
> we move getObjectDescription() to objectaddress.c as well, but I'm not
> sure it's worth the trouble, but I'm not wedded to that if somebody
> thinks both things should be kept together.

+1 for moving getObjectDescription to objectaddress.c.  As you say,
that's probably where it would've been if that file had existed at
the time.  I don't recall that we've had to back-patch many changes
in that function, so I don't think that concern is major.

> Finally: it'd be nice to be able to get pg_am identities with these
> functions too.  Then you could use a simple query to get object
> identities + descriptions from pg_description (right now you have to
> exclude that catalog specifically, otherwise the query bombs out).  But
> it'd be a lot of trouble, and since these objects are not really
> pluggable, I'm not bothering.  We can always add it later if there's
> more interesting use for it.

I think that would be a good thing to add, but no objection to leaving
it for a follow-on patch.
        regards, tom lane



Re: machine-parseable object descriptions

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > One change I made was to move all the new code from dependency.c into
> > objectaddress.c.  The only reason it was in dependency.c was that
> > getObjectDescription was there in the first place; but it doesn't seem
> > to me that it really belongs there.  (Back when it was first created,
> > there was no objectaddress.c at all, and dependency.c was the only user
> > of it.)  If there were no backpatching considerations, I would suggest
> > we move getObjectDescription() to objectaddress.c as well, but I'm not
> > sure it's worth the trouble, but I'm not wedded to that if somebody
> > thinks both things should be kept together.
>
> +1 for moving getObjectDescription to objectaddress.c.  As you say,
> that's probably where it would've been if that file had existed at
> the time.  I don't recall that we've had to back-patch many changes
> in that function, so I don't think that concern is major.

Okay, I have pushed it with that change.  Thanks for the quick feedback.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services