Thread: machine-parseable object descriptions
For Dimitri's patch to add support for dropped objects in event triggers, there's an open question about how to report objects that are being dropped in a tabular format. What I proposed last had three columns: (type, schema, identity). The "type" is a description of the object class; I propose the following list: table view materialized view foreign table sequence ... function aggregate type cast collation table constraint domain constraint default value ... operator of access method function of access method ... table column view column materialized view column ... default acl for relations default acl for sequences default acl for functions default acl for types Ellipses hide uninteresting examples. Opinions on this please? After the object type we have the schema, which would be NULL for objects that don't belong to schemas (extensions, access methods, casts, languages, etc). Then we have the identity, which is a texualt representation of the other stuff needed to uniquely identify the object being referred to. For most stuff it's just the name, but for other things it is a bit more complex. For example, for a function it'd be the function signature; for a table (and for most other object classes) it's the table name. For columns, it's the qualified column name i.e. <tablename>.<columnname>. Note the schema name never appears here, even if the object is not in path; the whole point of this is to be machine-parseable and we have the schema separately anyway. So some tweaks to functions such as format_type_internal are necessary, to supress schema-qualifying when not wanted. So the main thing here is about the particular format for each specific object class. For example, it seems to me we need to schema-qualify type names in function signatures. For casts I propose "(%s as %s)", where each %s is a schema-qualified type name. For constraints we can use "%s on %s" where there's no schema qualification (the schema column carries the table's schema, for the constraint must always belong to the same schema). For operators, we use "%s(%s, %s)" where the first %s is the operator name and the other two are schema-qualified type names. For default values (pg_attrdef tuples) we use "for %s" where the %s is the column's identity (i.e. <tabname>.<colname>; again not schema-qualified because the attrdef "is" in the same schema as the table)). For operator classes and families we use "%s for %s" where the first one is the opclass/opfam name, and the second is the AM name. For pg_amop and pg_amproc objects we use "%d (%s, %s) of %s": strategy number, qualified type names, and AM name. For rules and triggers, "%s on %s": rule name and relation name (unqualified). For all the rest, we simply use the unqualified object name. pg_default_acl objects are the funniest of all, and they end up as "belonging to role %s in schema %s". I am especially open to suggestions in this one. I think this kind of machine-parseable object representation would be useful for cases other than event triggers, so I am thinking in committing this part before the pg_dropped_objects patch. I haven't split it out yet from the event triggers patch; I will be sending a patch later, in the meantime I welcome comments on the above. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > For Dimitri's patch to add support for dropped objects in event > triggers, there's an open question about how to report objects that are > being dropped in a tabular format. What I proposed last had three > columns: (type, schema, identity). The "type" is a description of the > object class; I propose the following list: I'm okay with the proposed type names. > After the object type we have the schema, which would be NULL for > objects that don't belong to schemas (extensions, access methods, casts, > languages, etc). > Then we have the identity, which is a texualt representation of the > other stuff needed to uniquely identify the object being referred to. I think you should seriously consider dropping the separate "schema" column and instead plugging the schema in at the appropriate place in the identity string (which, evidently, is not always going to be at the front). Otherwise, how will client code know how to assemble the schema onto the identity to make a usable name? It's also pretty weird that some of the names appearing in an identity will be fully qualified but the most important one isn't. I could also live with keeping the schema column as proposed, if people think it has a use, but letting it be redundant with a schema name included in the identity string. But it seems like a bad idea to try to shear schema off of identity. regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > For Dimitri's patch to add support for dropped objects in event > triggers, there's an open question about how to report objects that are > being dropped in a tabular format. What I proposed last had three > columns: (type, schema, identity). The "type" is a description of the > object class; I propose the following list: > > ... > > Ellipses hide uninteresting examples. Opinions on this please? In my view, it needs to be the same thing as the second part of the command tag used for commands on the given type of objects. > Then we have the identity, which is a texualt representation of the > other stuff needed to uniquely identify the object being referred to. > For most stuff it's just the name, but for other things it is a bit more > complex. For example, for a function it'd be the function signature; > for a table (and for most other object classes) it's the table name. > For columns, it's the qualified column name i.e. > <tablename>.<columnname>. Note the schema name never appears here, even > if the object is not in path; the whole point of this is to be > machine-parseable and we have the schema separately anyway. So some > tweaks to functions such as format_type_internal are necessary, to > supress schema-qualifying when not wanted. As long as we keep also the name in a separate field, I'm in. Also, that needs to be properly quoted by the backend (MixedCase, extra dots, etc) as we just said on an external communication channel. > I think this kind of machine-parseable object representation would be > useful for cases other than event triggers, so I am thinking in > committing this part before the pg_dropped_objects patch. I haven't > split it out yet from the event triggers patch; I will be sending a > patch later, in the meantime I welcome comments on the above. +1 Maybe that can also set the debate for which information to pass down in Event Trigger User Functions and how. Still don't want to make that a datatype so that we expose a single variable per object? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Tom Lane <tgl@sss.pgh.pa.us> writes: > I could also live with keeping the schema column as proposed, if people > think it has a use, but letting it be redundant with a schema name > included in the identity string. But it seems like a bad idea to try to > shear schema off of identity. +1 Use case for keeping the extra column: replication to a federated database where you want to tweak the target schema. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 2013.03.18 1:03 PM, Alvaro Herrera wrote: > For Dimitri's patch to add support for dropped objects in event > triggers, there's an open question about how to report objects that are > being dropped in a tabular format. Now that JSON is a built-in type with 9.2+, could we not perhaps use that to represent some things in a more natural manner than a tabular format? JSON is designed to be machine-parseable, and some objects such as routine definitions are naturally trees of arbitrary depth. -- Darren Duncan
Dimitri Fontaine wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > I could also live with keeping the schema column as proposed, if people > > think it has a use, but letting it be redundant with a schema name > > included in the identity string. But it seems like a bad idea to try to > > shear schema off of identity. > > +1 > > Use case for keeping the extra column: replication to a federated > database where you want to tweak the target schema. If I understood our IM discussion correctly, you were saying that for federated database replication you wanted a separate "name" column, from which you could extract a table name easily; not that you wanted a separate schema column. Anyway the schema column is also present. We can easily remove columns before commit, if we decide we don't want them. In the attached patch, we have these three columns: a "name" column, which is the object's bare name; a "schema" column, which is the schema; and an "identity" column, which is the whole thing, with all the schema qualifications that apply. There's also the type, of course. I added the name column because it seems to me that it is genuinely useful for some use cases. However, there are two problems: first, the original implementation had a slight bug in the case of column objects (it displayed the name of the relation, not the name of the column), and two I was afraid it'd be easily misused. One way to attack both things at once would to be make it NULL except in the cases where it's a truly unique identifier (tables, schemas, etc). To avoid this being a standalone "whitelist" of catalogs (which would get outdated quickly, I fear), I propose to add a new boolean option in ObjectProperty, which specifies whether the name can be used as an identifier. I have implemented it that way in the attached patch. The new identity column is amazingly verbose on things like pg_amproc entries: 10650 | 1 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_point_consistent(pg_catalog.internal,pg_catalog.point,integer,pg_catalog.oid,pg_catalog.internal)10651| 2(pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_box_union(pg_catalog.internal,pg_catalog.internal)10652| 3 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_opsfor gist: pg_catalog.gist_point_compress(pg_catalog.internal)10653 | 4 (pg_catalog.point, pg_catalog.point)of pg_catalog.point_ops for gist: pg_catalog.gist_box_decompress(pg_catalog.internal)10654 | 5 (pg_catalog.point,pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_box_penalty(pg_catalog.internal,pg_catalog.internal,pg_catalog.internal)10655| 6 (pg_catalog.point, pg_catalog.point)of pg_catalog.point_ops for gist: pg_catalog.gist_box_picksplit(pg_catalog.internal,pg_catalog.internal)10656| 7 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_opsfor gist: pg_catalog.gist_box_same(pg_catalog.box,pg_catalog.box,pg_catalog.internal)10657 | 8 (pg_catalog.point,pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_point_distance(pg_catalog.internal,pg_catalog.point,integer,pg_catalog.oid) Also, note how types with standard-specified names ("integer") are not qualified (which is the right thing, AFAICT). Here's another interesting example: alvherre=# create type public.integer as enum ('uno', 'dos'); CREATE TYPE alvherre=# select * from pg_identify_object('pg_type'::regclass, 'integer'::regtype, 0);type | schema | name | identity ------+------------+------+----------type | pg_catalog | int4 | integer (1 fila) alvherre=# select * from pg_identify_object('pg_type'::regclass, 'public.integer'::regtype, 0);type | schema | name | identity ------+--------+-----------+------------------type | public | "integer" | public."integer" (1 fila) If you create a public.int4 type, there's no confusion either, so it's all consistent. Here's another bit of sample output, from pg_depend contents (at the left there's the referencing object, at the right the referenced object): alvherre=# select deptype, refd.*, ref.* from pg_depend, lateral (select * from pg_identify_object(classid, objid, objsubid)) refd, lateral (select * from pg_identify_object(refclassid, refobjid, refobjsubid)) ref where classid <> 0 andrefd.schema <> 'pg_catalog' and ref.schema <> 'information_schema' and refd.schema <> 'pg_toast';deptype | type | schema | name | identity | type | schema | name | identity ---------+-------------------+--------------+--------------+--------------------------------------------------------------+--------------+--------------+-------------+-------------------------------------------a | domain constraint | public | | "my constr" on public.mydom | type | public | mydom | public.mydomi | type | "the schema" | "the table" | "the schema"."thetable" | table | "the schema" | "the table" | "the schema"."the table"i | type | "the schema" | "_the table" | "the schema"."the table"[] | type | "the schema" | "the table" | "the schema"."the table"i | type | public | qx | public.qx | table | public | qx |public.qxi | type | public | _qx | public.qx[] | type | public | qx | public.qxa | table constraint | "the schema" | | "the table_another column_check" on "the schema"."the table" | table column | "the schema" | "the table" | "the schema"."thetable"."another column"n | table constraint | "the schema" | | "the table_another column_check"on "the schema"."the table" | table column | "the schema" | "the table" | "the schema"."the table"."anothercolumn"i | type | public | _integer | public."integer"[] | type | public | "integer" | public."integer"i | type | public | _int4 | public.int4[] | type | public |int4 | public.int4 (9 filas) alvherre=# \d "the schema"."the table" Tabla «the schema.the table» Columna | Tipo | Modificadores ----------------+---------+---------------the column | integer | another column | integer | Restricciones CHECK: "the table_another column_check" CHECK ("another column" > 0) All in all, I'm happy with this and I'm considering committing it as soon as we agree on the set of columns. I'm mildly on the side of removing the separate schema column and keeping name, so we'd have type/name/identity. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
.. and here's the patch. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Alvaro Herrera wrote: > .. and here's the patch. I forgot an example involving the funniest of all object classes: default ACLs. Here it is. alvherre=# create role rol1; CREATE ROLE alvherre=# create role rol2; CREATE ROLE alvherre=# create schema rol1 authorization rol1; CREATE SCHEMA alvherre=# alter default privileges for role rol1, rol2 grant insert on tables to alvherre; ALTER DEFAULT PRIVILEGES alvherre=# alter default privileges for role rol1 in schema rol1 grant insert on tables to alvherre; ALTER DEFAULT PRIVILEGES alvherre=# select oid,foo.* from pg_default_acl, lateral (select * from pg_identify_object(tableoid, oid, 0)) foo ; oid | type | schema | name | identity -------+-------------+--------+------+-------------------------------------48839 | default acl | | | for rolerol1 on tables48840 | default acl | | | for role rol2 on tables48844 | default acl | | | forrole rol1 in schema rol1 on tables (4 filas) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > All in all, I'm happy with this and I'm considering committing it as > soon as we agree on the set of columns. I'm mildly on the side of > removing the separate schema column and keeping name, so we'd have > type/name/identity. I would prefer that we keep the schema column, for easier per-schema processing or filtering. It's way eaiser to provide it in a separate column than to ask people to parse it back from the identity column. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > The new identity column is amazingly verbose on things like pg_amproc entries: > 10650 | 1 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_point_consistent(pg_catalog.internal,pg_catalog.point,integer,pg_catalog.oid,pg_catalog.internal) Uh ... isn't that confusing the *identity* of the pg_amproc entry with its *content*? I would say that the function reference doesn't belong there. You do need the rest. I would also suggest that you prepend the word "function" (or "operator" for pg_amop), so that it reads like "function 1 (typename, typename) of opfamilyname for amname". regards, tom lane
Dimitri Fontaine wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > All in all, I'm happy with this and I'm considering committing it as > > soon as we agree on the set of columns. I'm mildly on the side of > > removing the separate schema column and keeping name, so we'd have > > type/name/identity. > > I would prefer that we keep the schema column, for easier per-schema > processing or filtering. It's way eaiser to provide it in a separate > column than to ask people to parse it back from the identity column. Okay. Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > The new identity column is amazingly verbose on things like pg_amproc entries: > > > 10650 | 1 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_point_consistent(pg_catalog.internal,pg_catalog.point,integer,pg_catalog.oid,pg_catalog.internal) > > Uh ... isn't that confusing the *identity* of the pg_amproc entry with > its *content*? I would say that the function reference doesn't belong > there. You do need the rest. I would also suggest that you prepend > the word "function" (or "operator" for pg_amop), so that it reads like > "function 1 (typename, typename) of opfamilyname for amname". Uhm, yeah, fixed. So here's a final version of the patch. With docs too. One change I made was to move all the new code from dependency.c into objectaddress.c. The only reason it was in dependency.c was that getObjectDescription was there in the first place; but it doesn't seem to me that it really belongs there. (Back when it was first created, there was no objectaddress.c at all, and dependency.c was the only user of it.) If there were no backpatching considerations, I would suggest we move getObjectDescription() to objectaddress.c as well, but I'm not sure it's worth the trouble, but I'm not wedded to that if somebody thinks both things should be kept together. Finally: it'd be nice to be able to get pg_am identities with these functions too. Then you could use a simple query to get object identities + descriptions from pg_description (right now you have to exclude that catalog specifically, otherwise the query bombs out). But it'd be a lot of trouble, and since these objects are not really pluggable, I'm not bothering. We can always add it later if there's more interesting use for it. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > One change I made was to move all the new code from dependency.c into > objectaddress.c. The only reason it was in dependency.c was that > getObjectDescription was there in the first place; but it doesn't seem > to me that it really belongs there. (Back when it was first created, > there was no objectaddress.c at all, and dependency.c was the only user > of it.) If there were no backpatching considerations, I would suggest > we move getObjectDescription() to objectaddress.c as well, but I'm not > sure it's worth the trouble, but I'm not wedded to that if somebody > thinks both things should be kept together. +1 for moving getObjectDescription to objectaddress.c. As you say, that's probably where it would've been if that file had existed at the time. I don't recall that we've had to back-patch many changes in that function, so I don't think that concern is major. > Finally: it'd be nice to be able to get pg_am identities with these > functions too. Then you could use a simple query to get object > identities + descriptions from pg_description (right now you have to > exclude that catalog specifically, otherwise the query bombs out). But > it'd be a lot of trouble, and since these objects are not really > pluggable, I'm not bothering. We can always add it later if there's > more interesting use for it. I think that would be a good thing to add, but no objection to leaving it for a follow-on patch. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > One change I made was to move all the new code from dependency.c into > > objectaddress.c. The only reason it was in dependency.c was that > > getObjectDescription was there in the first place; but it doesn't seem > > to me that it really belongs there. (Back when it was first created, > > there was no objectaddress.c at all, and dependency.c was the only user > > of it.) If there were no backpatching considerations, I would suggest > > we move getObjectDescription() to objectaddress.c as well, but I'm not > > sure it's worth the trouble, but I'm not wedded to that if somebody > > thinks both things should be kept together. > > +1 for moving getObjectDescription to objectaddress.c. As you say, > that's probably where it would've been if that file had existed at > the time. I don't recall that we've had to back-patch many changes > in that function, so I don't think that concern is major. Okay, I have pushed it with that change. Thanks for the quick feedback. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services