Thread: Should PG backend know how to represent metadata?
This question comes out of my work on pg_dump. AFAICT, the only way of showing, eg, the SQL for a procedure definition (other than 'select prosrc from pg_procs, or whatever'), is to use pg_dump. This seems strange to me, since I often want to look at a procedure within psql, and running 'select' on system tables is not my first thought. I would have thought that the database itself should be the tool used to display SQL, and if not the database, then one of the interface libraries. If it were separated from pg_dump, then psql could more easily have a new "\D table table-name" and "\D rule rule-name" to dump object definitions, or "\D rules", to dump the names of all rules etc. The separation would have the further advantage that when a new language feature is added the person adding it does not have to remember to update pg_dump, psql etc. And the task might be a little easier, since I would hope that the code to dump the definition would be close to the code to parse it. Does this sound resonable/sensible/worth doing? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Something like... # \D table foo create table foo ( bar text, baz integer ); ? Sounds pretty good. Philip Warner wrote: > > This question comes out of my work on pg_dump. AFAICT, the only way of > showing, eg, the SQL for a procedure definition (other than 'select prosrc > from pg_procs, or whatever'), is to use pg_dump. > > This seems strange to me, since I often want to look at a procedure within > psql, and running 'select' on system tables is not my first thought. > > I would have thought that the database itself should be the tool used to > display SQL, and if not the database, then one of the interface libraries. > > If it were separated from pg_dump, then psql could more easily have a new > "\D table table-name" and "\D rule rule-name" to dump object definitions, > or "\D rules", to dump the names of all rules etc. > > The separation would have the further advantage that when a new language > feature is added the person adding it does not have to remember to update > pg_dump, psql etc. And the task might be a little easier, since I would > hope that the code to dump the definition would be close to the code to > parse it. > > Does this sound resonable/sensible/worth doing? > > ---------------------------------------------------------------- > Philip Warner | __---_____ > Albatross Consulting Pty. Ltd. |----/ - \ > (A.C.N. 008 659 498) | /(@) ______---_ > Tel: (+61) 0500 83 82 81 | _________ \ > Fax: (+61) 0500 83 82 82 | ___________ | > Http://www.rhyme.com.au | / \| > | --________-- > PGP key available upon request, | / > and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > I would have thought that the database itself should be the tool used to > display SQL, and if not the database, then one of the interface libraries. You might be on to something. We have bits and pieces of that, such as the rule reverse-lister and the code Peter just added to create a readable version of a type name, but maybe some more consolidation is in order. > The separation would have the further advantage that when a new language > feature is added the person adding it does not have to remember to update > pg_dump, psql etc. And the task might be a little easier, since I would > hope that the code to dump the definition would be close to the code to > parse it. No, not really. The only advantage would be in centralizing the display capability and having just one copy instead of several. That is a substantial advantage, but you only get it if you make sure the backend display capability is defined in a way that lets all these apps use it. That might take some careful thought. For example, does the definition of a table include associated constraints and indexes? pg_dump would want them separate, other apps perhaps not. Also, psql's \d command doesn't display the schema of a table in the form of a CREATE command to recreate it, and I don't think it should. Certainly you don't want to condemn every app that wants to know "what are the columns of this table" to have to include a full SQL parser to make sense of the answer. So I think some thought is needed to figure out what a general-purpose representation would be like. regards, tom lane
At 16:30 17/07/00 +1000, Chris Bitmead wrote: > >Something like... ># \D table foo > >create table foo ( > bar text, > baz integer >); Exactly. Also, "\D table" would list all tables. The idea is to have [a replacement for] the 'dump' code from pg_dump in the backend or in a library (libpq? libdump?). It might also be worth allowing: \D ALL to do something close to what pg_dump does at the moment (ie. raw text dump of schema), although I would probably be inclined to sort the output in logical order (all tables together etc). As has already been commented, the changes I have made to pg_dump are non-trivial, so now might be a good time to make these extra changes. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 02:43 17/07/00 -0400, Tom Lane wrote: >That is a >substantial advantage, but you only get it if you make sure the backend >display capability is defined in a way that lets all these apps use it. >That might take some careful thought. For example, does the definition >of a table include associated constraints and indexes? You need to separate the API from what is displayed (eg. in psql). Suggestion: I would envisage the API consisting of a custom dump routine for each object type. In the case of the table dumper API, it would return a table definition with no indexes or constraints and a list of related entities consisting of (object-type, object-oid) pairs suitable for passing back to the dumper API. psql could display as little or as much as it desired, pg_dump could ferret the extra items away for later use etc. For those items that can not be separated out, then they obviously have to go into the main definition. >Also, psql's \d command >doesn't display the schema of a table in the form of a CREATE command >to recreate it, and I don't think it should. I agree. \D is not to replace \d or \df etc. >Certainly you don't want >to condemn every app that wants to know "what are the columns of this >table" This is where we need to decide what the dumper code is for. I don't know much about the other things you have mentioned, so perhaps you could expand. But, in my original plan, this suggestion was intended for human readable dumps from pg_dump and psql. It would be great if it could be made to work elsewhere. > to have to include a full SQL parser to make sense of the >answer. So I think some thought is needed to figure out what a >general-purpose representation would be like. And where it would go... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 17:06 17/07/00 +1000, Philip Warner wrote: > >Suggestion: > >I would envisage the API consisting of a custom dump routine for each >object type. In the case of the table dumper API, it would return a table >definition with no indexes or constraints and a list of related entities >consisting of (object-type, object-oid) pairs suitable for passing back to >the dumper API. psql could display as little or as much as it desired, >pg_dump could ferret the extra items away for later use etc. For those >items that can not be separated out, then they obviously have to go into >the main definition. > Just took the dog for a walk, and had another thought. If we really want this to have the maximum usability, then we should make it available from SQL. ie. select pg_dump('table', 'foo') where pg_dump returns (possibly) multiple rows, the first being the most basic definition, and subsequent rows being additional items & their name/id, eg: 'Create Table Foo(Bar int);" NULL 'index' 'foo_ix1' 'constraint' 'foo_pk' etc. I don't think we have functions that return multiple rows, and a 'select' without a 'from' is not strictly legal, but aside from that, an SQL-based solution mught be a nice idea. Which brings me to my next idea: select defn from pg_dump where type='table and name = 'foo' or select related_items from pg_dump where type='table and name = 'foo' where pg_dump can be implemented via a rewrite rule....maybe. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
On Mon, 17 Jul 2000, Philip Warner wrote: > I would have thought that the database itself should be the tool used to > display SQL, and if not the database, then one of the interface libraries. SQL is only one of the many formats that people might want meta data in. psql and pgaccess, for example, have somewhat different requirements. The SQL standard defines a large set of information schema views which provide the database metadata in a portable fashion, from there it should be a relatively short distance to the format of your choice, and the maintainance problem across releases is decreased. Of course without schema support these views would intolerably clutter the user name space, but I could think of a couple of ways to work around that for the moment. Btw., in your wheeling and dealing in pg_dump you might want to look at the format_type function I added, which is a step in this direction. (examples in psql/describe.c) -- Peter Eisentraut Sernanders vaeg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Philip Warner <pjw@rhyme.com.au> writes: > I don't think we have functions that return multiple rows, We do, although they're a bit unwieldy to use; might be better to avoid that feature. I'd be inclined to avoid the issue, and just have the function return one result (which might contain newlines for readability of course). > and a 'select' > without a 'from' is not strictly legal, It is in postgres, and this is hardly SQL-standard-based anyway... > Which brings me to my next idea: > select defn from pg_dump where type='table and name = 'foo' > or > select related_items from pg_dump where type='table and name = 'foo' > where pg_dump can be implemented via a rewrite rule....maybe. The rewrite rule couldn't do any of the heavy lifting; it'd still end up calling a function. A view like pg_rules might not be a bad idea, but you should plan on exposing the underlying function for flexibility. regards, tom lane
At 10:46 17/07/00 -0400, Tom Lane wrote: >Philip Warner <pjw@rhyme.com.au> writes: >> I don't think we have functions that return multiple rows, > >We do, although they're a bit unwieldy to use; might be better to avoid >that feature. I'd be inclined to avoid the issue, and just have the >function return one result (which might contain newlines for readability >of course). Not sure that this has the flexibility needed for tables; I'd like the calling application to be able to get just the base table definition with no constraints, and also request the related items (constraints, indexes etc). I also want to avoid the caller from having to parse the output in any way. Perhaps it is best left as an API-only feature, but now I have thought of an SQL interface, I do like the idea. > >> Which brings me to my next idea: > >> select defn from pg_dump where type='table and name = 'foo' >> or >> select related_items from pg_dump where type='table and name = 'foo' > >> where pg_dump can be implemented via a rewrite rule....maybe. > >The rewrite rule couldn't do any of the heavy lifting; it'd still end >up calling a function. A view like pg_rules might not be a bad idea, >but you should plan on exposing the underlying function for >flexibility. Sounds fine. Only I'm not sure that a rule can do it - AFAICT I still need some underlying table to select from when I use a rule...unless I can fake a result set for a 'special' table? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 10:45 17/07/00 -0400, eisentrp@csis.gvsu.edu wrote: >On Mon, 17 Jul 2000, Philip Warner wrote: > >> I would have thought that the database itself should be the tool used to >> display SQL, and if not the database, then one of the interface libraries. > >SQL is only one of the many formats that people might want meta data in. >psql and pgaccess, for example, have somewhat different requirements. I would have thought that pgaccess would still need to display table definitions in SQL, but I have not looked at it closely enough. At the lowest level I suspect pgaccess will always have to use direct access to pg_* tables. >The SQL standard defines a large set of information schema views which >provide the database metadata in a portable fashion, from there it should >be a relatively short distance to the format of your choice, and the >maintainance problem across releases is decreased. This sounds good; where are they defined in the spec? >Of course without schema support these views would intolerably clutter the >user name space, but I could think of a couple of ways to work around that >for the moment. Presumably they could be called pg_*... >Btw., in your wheeling and dealing in pg_dump you might want to look at >the format_type function I added, which is a step in this direction. >(examples in psql/describe.c) This is the sort of thing I'd like to see, but on a more general level: format_object('table', <oid>) would return the base definition of the table. But I'd also like some kind of interface that allowed related items (& their type) to be returned, which is where I came from with a 'select' expression returning multiple rows. The functional interface could be written as (ignoring names!): typedef {int relationship;char* objType;Oid oid } objRef; formatObject(const char* objType, Oid oid, char* objDefn, int *defnLen, objRef *refs[], int *numRefs) where formatObject is passed a type and an Oid, and returns a definition and an array of references to other objects. Note that the fields of the objRef structure match the input args of formatObjects. One could also call formatObject with a null oid to get a list of objects of the given type, and call it with a null objTye and oid to get a list of available types to dump...perhaps I am overloading the function just a little, but does this sound reasonable? If desired, the 'relationship' field could be used to indicate te parent table for an index, or the 'child' indexes for a table, but it might be better to have a separate list for parent (one only?) and children? Any suggestions would be appreciated.... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner writes: > I would have thought that pgaccess would still need to display table > definitions in SQL, but I have not looked at it closely enough. At the > lowest level I suspect pgaccess will always have to use direct access to > pg_* tables. I thought it was your intention to get rid of this fact. We should certainly be thinking in terms of all client applications. [Information Schema] > This sounds good; where are they defined in the spec? Part 2, chapter 20, if that helps you. It's not really possible to implement all of these at this point because many are quite complex and depend on outer joins and other fancy features, or contain meta-information on features that don't exist yet. Actually, we probably need the full-blown TOAST before some of these will fit at all. > Presumably they could be called pg_*... We could name them pg_IS_* for the moment and add simplistic parser support for schemas that wiil pick up these tables if the information_schema is referenced. > This is the sort of thing I'd like to see, but on a more general level: > > format_object('table', <oid>) > > would return the base definition of the table. I'm not sure if we want to move the entire pg_dump functionality into the backend. For example, if someone wants to move SQL dumps to a not-quite-SQL or a much-more-SQL database and the format is slightly wrong, then there's no way to amend that short of patching the backend. Then we could as well have the backend returns pre-formatted output for psql. A human-oriented layer over the system catalogs (which are implementation oriented) could go a long way toward maximum flexibility. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Philip Warner wrote: > This is the sort of thing I'd like to see, but on a more general level: > > format_object('table', <oid>) > > would return the base definition of the table. format_object(<oid>) should be sufficient.
At 09:09 18/07/00 +1000, Chris Bitmead wrote: >Philip Warner wrote: > >> This is the sort of thing I'd like to see, but on a more general level: >> >> format_object('table', <oid>) >> >> would return the base definition of the table. > >format_object(<oid>) should be sufficient. > Technically yes, but my belief is that identifying what the oid points to is actually a matter of searching everywhere, which is probably to be avoided. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 00:29 18/07/00 +0200, Peter Eisentraut wrote: >Philip Warner writes: > >> I would have thought that pgaccess would still need to display table >> definitions in SQL, but I have not looked at it closely enough. At the >> lowest level I suspect pgaccess will always have to use direct access to >> pg_* tables. > >I thought it was your intention to get rid of this fact. We should >certainly be thinking in terms of all client applications. I agree, but it seems we have a gain if we can get psql-compliant sql out of a single library. I'm quite open to making a more general implementation, but I'd need to know what pgaccess needs over and above a psql-compliant SQL output. The reason I think pgaccess will probably have to continue with internal knowledge is that it is a low level manager for the database; at the simplest level, getting tables and their columns would be great, but it probably also needs to know what the primary key is, and even understand constraints (at least non-NULL ones). This is a very different problem, and definitely related to the SQL information schemas. Perhaps what I do here can be structured to be useful to whoever implements information schemas when they come along. >[Information Schema] >> This sounds good; where are they defined in the spec? > >Part 2, chapter 20, if that helps you. It's not really possible to >implement all of these at this point because many are quite complex and >depend on outer joins and other fancy features, or contain >meta-information on features that don't exist yet. Actually, we probably >need the full-blown TOAST before some of these will fit at all. I agree. At best we could implement things like COLUMNS, and even then the various 'schema' columns would be meaningless (until schemas come along). > >> This is the sort of thing I'd like to see, but on a more general level: >> >> format_object('table', <oid>) >> >> would return the base definition of the table. > >I'm not sure if we want to move the entire pg_dump functionality into the >backend. For example, if someone wants to move SQL dumps to a >not-quite-SQL or a much-more-SQL database and the format is slightly >wrong, then there's no way to amend that short of patching the backend. >Then we could as well have the backend returns pre-formatted output for >psql. > >A human-oriented layer over the system catalogs (which are implementation >oriented) could go a long way toward maximum flexibility. You may be right, but being able to 'select' a table or field definition is very appealing. Can it be made a little cleaner by being implemented as a dynamically linked function (as per user defined functions). That would seem to reduce the problem you have with releasing a new backend, at least. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/