Thread: Should PG backend know how to represent metadata?

Should PG backend know how to represent metadata?

From
Philip Warner
Date:
This question comes out of my work on pg_dump. AFAICT, the only way of
showing, eg, the SQL for a procedure definition (other than 'select prosrc
from pg_procs, or whatever'), is to use pg_dump.

This seems strange to me, since I often want to look at a procedure within
psql, and running 'select' on system tables is not my first thought.

I would have thought that the database itself should be the tool used to
display SQL, and if not the database, then one of the interface libraries. 

If it were separated from pg_dump, then psql could more easily have a new
"\D table table-name" and "\D rule rule-name" to dump object definitions,
or "\D rules", to dump the names of all rules etc.

The separation would have the further advantage that when a new language
feature is added the person adding it does not have to remember to update
pg_dump, psql etc. And the task might be a little easier, since I would
hope that the code to dump the definition would be close to the code to
parse it.

Does this sound resonable/sensible/worth doing?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Should PG backend know how to represent metadata?

From
Chris Bitmead
Date:
Something like...
# \D table foo

create table foo ( bar text, baz integer
);

?

Sounds pretty good.

Philip Warner wrote:
> 
> This question comes out of my work on pg_dump. AFAICT, the only way of
> showing, eg, the SQL for a procedure definition (other than 'select prosrc
> from pg_procs, or whatever'), is to use pg_dump.
> 
> This seems strange to me, since I often want to look at a procedure within
> psql, and running 'select' on system tables is not my first thought.
> 
> I would have thought that the database itself should be the tool used to
> display SQL, and if not the database, then one of the interface libraries.
> 
> If it were separated from pg_dump, then psql could more easily have a new
> "\D table table-name" and "\D rule rule-name" to dump object definitions,
> or "\D rules", to dump the names of all rules etc.
> 
> The separation would have the further advantage that when a new language
> feature is added the person adding it does not have to remember to update
> pg_dump, psql etc. And the task might be a little easier, since I would
> hope that the code to dump the definition would be close to the code to
> parse it.
> 
> Does this sound resonable/sensible/worth doing?
> 
> ----------------------------------------------------------------
> Philip Warner                    |     __---_____
> Albatross Consulting Pty. Ltd.   |----/       -  \
> (A.C.N. 008 659 498)             |          /(@)   ______---_
> Tel: (+61) 0500 83 82 81         |                 _________  \
> Fax: (+61) 0500 83 82 82         |                 ___________ |
> Http://www.rhyme.com.au          |                /           \|
>                                  |    --________--
> PGP key available upon request,  |  /
> and from pgp5.ai.mit.edu:11371   |/


Re: Should PG backend know how to represent metadata?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> I would have thought that the database itself should be the tool used to
> display SQL, and if not the database, then one of the interface libraries. 

You might be on to something.  We have bits and pieces of that, such as
the rule reverse-lister and the code Peter just added to create a
readable version of a type name, but maybe some more consolidation is
in order.

> The separation would have the further advantage that when a new language
> feature is added the person adding it does not have to remember to update
> pg_dump, psql etc. And the task might be a little easier, since I would
> hope that the code to dump the definition would be close to the code to
> parse it.

No, not really.  The only advantage would be in centralizing the display
capability and having just one copy instead of several.  That is a
substantial advantage, but you only get it if you make sure the backend
display capability is defined in a way that lets all these apps use it.
That might take some careful thought.  For example, does the definition
of a table include associated constraints and indexes?  pg_dump would
want them separate, other apps perhaps not.  Also, psql's \d command
doesn't display the schema of a table in the form of a CREATE command
to recreate it, and I don't think it should.  Certainly you don't want
to condemn every app that wants to know "what are the columns of this
table" to have to include a full SQL parser to make sense of the
answer.   So I think some thought is needed to figure out what a
general-purpose representation would be like.
        regards, tom lane


Re: Should PG backend know how to represent metadata?

From
Philip Warner
Date:
At 16:30 17/07/00 +1000, Chris Bitmead wrote:
>
>Something like...
># \D table foo
>
>create table foo (
>  bar text,
>  baz integer
>);

Exactly. Also, "\D table" would list all tables.

The idea is to have [a replacement for] the 'dump' code from pg_dump in the
backend or in a library (libpq? libdump?). It might also be worth allowing: 
  \D ALL

to do something close to what pg_dump does at the moment (ie. raw text dump
of schema), although I would probably be inclined to sort the output in
logical order (all tables together etc).

As has already been commented, the changes I have made to pg_dump are
non-trivial, so now might be a good time to make these extra changes.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Should PG backend know how to represent metadata?

From
Philip Warner
Date:
At 02:43 17/07/00 -0400, Tom Lane wrote:
>That is a
>substantial advantage, but you only get it if you make sure the backend
>display capability is defined in a way that lets all these apps use it.
>That might take some careful thought.  For example, does the definition
>of a table include associated constraints and indexes?

You need to separate the API from what is displayed (eg. in psql). 

Suggestion:

I would envisage the API consisting of a custom dump routine for each
object type. In the case of the table dumper API, it would return a table
definition with no indexes or constraints and a list of related entities
consisting of (object-type, object-oid) pairs suitable for passing back to
the dumper API. psql could display as little or as much as it  desired,
pg_dump could ferret the extra items away for later use etc. For those
items that can not be separated out, then they obviously have to go into
the main definition.


>Also, psql's \d command
>doesn't display the schema of a table in the form of a CREATE command
>to recreate it, and I don't think it should.  

I agree. \D is not to replace \d or \df etc.


>Certainly you don't want
>to condemn every app that wants to know "what are the columns of this
>table"

This is where we need to decide what the dumper code is for. I don't know
much about the other things you have mentioned, so perhaps you could
expand. But, in my original plan, this suggestion was intended for human
readable dumps from pg_dump and psql. It would be great if it could be made
to work elsewhere.

> to have to include a full SQL parser to make sense of the
>answer.   So I think some thought is needed to figure out what a
>general-purpose representation would be like.

And where it would go...





----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Should PG backend know how to represent metadata?

From
Philip Warner
Date:
At 17:06 17/07/00 +1000, Philip Warner wrote:
>
>Suggestion:
>
>I would envisage the API consisting of a custom dump routine for each
>object type. In the case of the table dumper API, it would return a table
>definition with no indexes or constraints and a list of related entities
>consisting of (object-type, object-oid) pairs suitable for passing back to
>the dumper API. psql could display as little or as much as it  desired,
>pg_dump could ferret the extra items away for later use etc. For those
>items that can not be separated out, then they obviously have to go into
>the main definition.
>

Just took the dog for a walk, and had another thought. If we really want
this to have the maximum usability, then we should make it available from SQL.

ie.
select pg_dump('table', 'foo')

where pg_dump returns (possibly) multiple rows, the first being the most
basic definition, and subsequent rows being additional items & their
name/id, eg:

'Create Table Foo(Bar int);" NULL
'index' 'foo_ix1'
'constraint' 'foo_pk'

etc.

I don't think we have functions that return multiple rows, and a 'select'
without a 'from' is not strictly legal, but aside from that, an SQL-based
solution mught be a nice idea. Which brings me to my next idea:
   select defn from pg_dump where type='table and name = 'foo'
or   select related_items from pg_dump where type='table and name = 'foo'

where pg_dump can be implemented via a rewrite rule....maybe.





----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Should PG backend know how to represent metadata?

From
eisentrp@csis.gvsu.edu
Date:
On Mon, 17 Jul 2000, Philip Warner wrote:

> I would have thought that the database itself should be the tool used to
> display SQL, and if not the database, then one of the interface libraries. 

SQL is only one of the many formats that people might want meta data in.
psql and pgaccess, for example, have somewhat different requirements.

The SQL standard defines a large set of information schema views which
provide the database metadata in a portable fashion, from there it should
be a relatively short distance to the format of your choice, and the
maintainance problem across releases is decreased.

Of course without schema support these views would intolerably clutter the
user name space, but I could think of a couple of ways to work around that
for the moment.

Btw., in your wheeling and dealing in pg_dump you might want to look at
the format_type function I added, which is a step in this direction.
(examples in psql/describe.c)

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: Should PG backend know how to represent metadata?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> I don't think we have functions that return multiple rows,

We do, although they're a bit unwieldy to use; might be better to avoid
that feature.  I'd be inclined to avoid the issue, and just have the
function return one result (which might contain newlines for readability
of course).

> and a 'select'
> without a 'from' is not strictly legal,

It is in postgres, and this is hardly SQL-standard-based anyway...

> Which brings me to my next idea:

>     select defn from pg_dump where type='table and name = 'foo'
> or
>     select related_items from pg_dump where type='table and name = 'foo'

> where pg_dump can be implemented via a rewrite rule....maybe.

The rewrite rule couldn't do any of the heavy lifting; it'd still end
up calling a function.  A view like pg_rules might not be a bad idea,
but you should plan on exposing the underlying function for
flexibility.
        regards, tom lane


Re: Should PG backend know how to represent metadata?

From
Philip Warner
Date:
At 10:46 17/07/00 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> I don't think we have functions that return multiple rows,
>
>We do, although they're a bit unwieldy to use; might be better to avoid
>that feature.  I'd be inclined to avoid the issue, and just have the
>function return one result (which might contain newlines for readability
>of course).

Not sure that this has the flexibility needed for tables; I'd like the
calling application to be able to get just the base table definition with
no constraints, and also request the related items (constraints, indexes
etc). I also want to avoid the caller from having to parse the output in
any way.

Perhaps it is best left as an API-only feature, but now I have thought of
an SQL interface, I do like the idea.


>
>> Which brings me to my next idea:
>
>>     select defn from pg_dump where type='table and name = 'foo'
>> or
>>     select related_items from pg_dump where type='table and name = 'foo'
>
>> where pg_dump can be implemented via a rewrite rule....maybe.
>
>The rewrite rule couldn't do any of the heavy lifting; it'd still end
>up calling a function.  A view like pg_rules might not be a bad idea,
>but you should plan on exposing the underlying function for
>flexibility.

Sounds fine. Only I'm not sure that a rule can do it - AFAICT I still need
some underlying table to select from when I use a rule...unless I can fake
a result set for a 'special' table?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Should PG backend know how to represent metadata?

From
Philip Warner
Date:
At 10:45 17/07/00 -0400, eisentrp@csis.gvsu.edu wrote:
>On Mon, 17 Jul 2000, Philip Warner wrote:
>
>> I would have thought that the database itself should be the tool used to
>> display SQL, and if not the database, then one of the interface libraries. 
>
>SQL is only one of the many formats that people might want meta data in.
>psql and pgaccess, for example, have somewhat different requirements.

I would have thought that pgaccess would still need to display table
definitions in SQL, but I have not looked at it closely enough. At the
lowest level I suspect pgaccess will always have to use direct access to
pg_* tables.


>The SQL standard defines a large set of information schema views which
>provide the database metadata in a portable fashion, from there it should
>be a relatively short distance to the format of your choice, and the
>maintainance problem across releases is decreased.

This sounds good; where are they defined in the spec?


>Of course without schema support these views would intolerably clutter the
>user name space, but I could think of a couple of ways to work around that
>for the moment.

Presumably they could be called pg_*...


>Btw., in your wheeling and dealing in pg_dump you might want to look at
>the format_type function I added, which is a step in this direction.
>(examples in psql/describe.c)

This is the sort of thing I'd like to see, but on a more general level:
   format_object('table', <oid>)

would return the base definition of the table.

But I'd also like some kind of interface that allowed related items (&
their type) to be returned, which is where I came from with a 'select'
expression returning multiple rows. The functional interface could be
written as (ignoring names!):
   typedef {int    relationship;char*    objType;Oid    oid   } objRef;
   formatObject(const char* objType, Oid oid,         char* objDefn, int *defnLen, objRef *refs[], int *numRefs)

where formatObject is passed a type and an Oid, and returns a definition
and an array of references to other objects. Note that the fields of the
objRef structure match the input args of formatObjects.

One could also call formatObject with a null oid to get a list of objects
of the given type, and call it with a null objTye and oid to get a list of
available types to dump...perhaps I am overloading the function just a
little, but does this sound reasonable?

If desired, the 'relationship' field could be used to indicate te parent
table for an index, or the 'child' indexes for a table, but it might be
better to have a separate list for parent (one only?) and children?

Any suggestions would be appreciated....


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Should PG backend know how to represent metadata?

From
Peter Eisentraut
Date:
Philip Warner writes:

> I would have thought that pgaccess would still need to display table
> definitions in SQL, but I have not looked at it closely enough. At the
> lowest level I suspect pgaccess will always have to use direct access to
> pg_* tables.

I thought it was your intention to get rid of this fact. We should
certainly be thinking in terms of all client applications.

[Information Schema]
> This sounds good; where are they defined in the spec?

Part 2, chapter 20, if that helps you. It's not really possible to
implement all of these at this point because many are quite complex and
depend on outer joins and other fancy features, or contain
meta-information on features that don't exist yet. Actually, we probably
need the full-blown TOAST before some of these will fit at all.

> Presumably they could be called pg_*...

We could name them pg_IS_* for the moment and add simplistic parser
support for schemas that wiil pick up these tables if the
information_schema is referenced.


> This is the sort of thing I'd like to see, but on a more general level:
> 
>     format_object('table', <oid>)
> 
> would return the base definition of the table.

I'm not sure if we want to move the entire pg_dump functionality into the
backend. For example, if someone wants to move SQL dumps to a
not-quite-SQL or a much-more-SQL database and the format is slightly
wrong, then there's no way to amend that short of patching the backend.
Then we could as well have the backend returns pre-formatted output for
psql.

A human-oriented layer over the system catalogs (which are implementation
oriented) could go a long way toward maximum flexibility.


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: Should PG backend know how to representmetadata?

From
Chris Bitmead
Date:
Philip Warner wrote:

> This is the sort of thing I'd like to see, but on a more general level:
> 
>     format_object('table', <oid>)
> 
> would return the base definition of the table.

format_object(<oid>) should be sufficient.


Re: Should PG backend know how to representmetadata?

From
Philip Warner
Date:
At 09:09 18/07/00 +1000, Chris Bitmead wrote:
>Philip Warner wrote:
>
>> This is the sort of thing I'd like to see, but on a more general level:
>> 
>>     format_object('table', <oid>)
>> 
>> would return the base definition of the table.
>
>format_object(<oid>) should be sufficient.
>

Technically yes, but my belief is that identifying what the oid points to
is actually a matter of searching everywhere, which is probably to be avoided.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Should PG backend know how to represent metadata?

From
Philip Warner
Date:
At 00:29 18/07/00 +0200, Peter Eisentraut wrote:
>Philip Warner writes:
>
>> I would have thought that pgaccess would still need to display table
>> definitions in SQL, but I have not looked at it closely enough. At the
>> lowest level I suspect pgaccess will always have to use direct access to
>> pg_* tables.
>
>I thought it was your intention to get rid of this fact. We should
>certainly be thinking in terms of all client applications.

I agree, but it seems we have a gain if we can get psql-compliant sql out
of a single library. I'm quite open to making a more general
implementation, but I'd need to know what pgaccess needs over and above a
psql-compliant SQL output.

The reason I think pgaccess will probably have to continue with internal
knowledge is that it is a low level manager for the database; at the
simplest level, getting tables and their columns would be great, but it
probably also needs to know what the primary key is, and even understand
constraints (at least non-NULL ones). This is a very different problem, and
definitely related to the SQL information schemas.

Perhaps what I do here can be structured to be useful to whoever implements
information schemas when they come along.


>[Information Schema]
>> This sounds good; where are they defined in the spec?
>
>Part 2, chapter 20, if that helps you. It's not really possible to
>implement all of these at this point because many are quite complex and
>depend on outer joins and other fancy features, or contain
>meta-information on features that don't exist yet. Actually, we probably
>need the full-blown TOAST before some of these will fit at all.

I agree. At best we could implement things like COLUMNS, and even then the
various 'schema' columns would be meaningless (until schemas come along).


>
>> This is the sort of thing I'd like to see, but on a more general level:
>> 
>>     format_object('table', <oid>)
>> 
>> would return the base definition of the table.
>
>I'm not sure if we want to move the entire pg_dump functionality into the
>backend. For example, if someone wants to move SQL dumps to a
>not-quite-SQL or a much-more-SQL database and the format is slightly
>wrong, then there's no way to amend that short of patching the backend.
>Then we could as well have the backend returns pre-formatted output for
>psql.
>
>A human-oriented layer over the system catalogs (which are implementation
>oriented) could go a long way toward maximum flexibility.

You may be right, but being able to 'select' a table or field definition is
very appealing. Can it be made a little cleaner by being implemented as a
dynamically linked function (as per user defined functions). That would
seem to reduce the problem you have with releasing a new backend, at least.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/