Thread: pg_dump, libdump, dump API, & backend again

pg_dump, libdump, dump API, & backend again

From: Philip Warner
Executive summary: 

I have discarded the idea of (partially) implementing Info Schemas as part
of a pg_dump/psql enhancement, and returned to my original plan. You can
now skip to 'Questions', below.

----------- Discussion --------
Having now waded through the SQL standards rather more than is good for my
sanity, I have concluded that even if I could implement the entire info
schema system in PG, I would still have to write SQL dumping code that
would look quite similar to the SQL dumping code in pg_dump. What is more,
it would not support rules or other non-standard features of PG (at least I
think that is true of the non-standard features).

Being therefore forced to look at the intersection of the info schemas and
PG, I would probably end up just supporting COLUMNS, VIEWS, & TABLES. And
even with this work complete, I would have to revise pg_dump to use it, by
which time I would have successfully implemented the current functionality
of pg_dump.

As a result I am inclined back to the more modest goal of making the code
to dump individual database items more generally available. Perhaps then at
some later stage we can modify the internals to use info schemas, when they
are implemented. The way I see it at the moment, what I plan to implement
is still a layer above either info schemas or the pg_* tables.

As to pgaccess, if an API layer is needed for ease of update & portability,
I would be interested to know if anyone had considered the practicality of
using the ODBC interface? ODBC allows interrogation of metadata, execution
of DDL statements, execution of queries etc. It seems to me that if
pgaccess were implemented on top of ODBC, the authors would have a very
useful general purpose tool. But I have not looked at what special features
it has, so I have no idea how practical the suggestion is.

----------- Questions --------

1. Can anyone think of an existing utility or library in PG that would
profit from a more convenient interface for querying metadata than is
available from ODBC or the pg_* tables? If so, can you give some details &
pointers. [This will be useful in modifying any resulting API].

2. Can anybody suggest the most appropriate (polite) location for any SQL
extraction code (library? backend? external function?). Some people have
commented on it being a bad idea that it goes into the backend, and I have
observed that it would be nice to extract valid SQL definitions via SQL,
thus making the code available to most users with minimal effort. Currently
I would expect to implement an API and (ideally) an SQL interface. The SQL
side of things would probably just call the API, but the details need to be
worked out.

3. My current plan/idea/suggestion is to implement an API that, given an
object type (eg. 'table'), will return the most basic definition for the
object and a list of related objects. This API could be used by both psql
and pg_dump to create SQL for display & backup purposes.

4. Even if I achieve nothing more than breaking 'libdump' out of pg_dump,
and adding a \D command to psql, does anybody object to this as an objective? 
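To make the shape of item 3 concrete, here is a minimal sketch of what such a "definition plus dependencies" call might look like. Everything here (function name, catalogue layout, object names) is invented for illustration; the real thing would presumably be a C library ("libdump") reading the pg_* tables. Python is used purely as pseudocode:

```python
# Hypothetical sketch of the proposed dump API: given an object type and
# name, return its most basic SQL definition plus a list of related
# objects. All names and data are invented for illustration.

def dump_object(obj_type, obj_name, catalog):
    """Return (basic_sql, dependencies) for one database object."""
    obj = catalog[(obj_type, obj_name)]
    # The basic definition deliberately omits constraints, indexes, etc.;
    # those come back as dependencies for the caller to order and emit.
    return obj["basic_sql"], list(obj["depends_on"])

# Toy stand-in for the system catalogues.
catalog = {
    ("table", "emp"): {
        "basic_sql": "CREATE TABLE emp (id int4, name text);",
        "depends_on": [("constraint", "emp_pkey"), ("index", "emp_name_i")],
    },
}

sql, deps = dump_object("table", "emp", catalog)
```

Both psql (for display) and pg_dump (for backup) could drive the same call, differing only in what they do with the returned definition and dependency list.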


Any comments and suggestions would be greatly appreciated...


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \ |
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: pg_dump, libdump, dump API, & backend again

From: Peter Eisentraut
Philip Warner writes:

> 1. Can anyone think of an existing utility or library in PG that would
> profit from a more convenient interface for querying metadata than is
> available from ODBC or the pg_* tables?

People do ask all the time how they can reproduce psql's, say, "\d" output
for themselves. Notice that they don't ask how to reproduce pg_dump's
output, pg_dump does that just fine, but people want to get at the
metadata and process it for themselves. I imagine people have web tools
for modelling their databases, etc. or change tables on the fly. I'd say
there is a big market for this sort of thing, and the current state of
affairs that every system catalog change is in some way liable to break
all these applications is a non-negligible problem.

It is therefore my opinion that making SQL dumps widely available is,
while certainly useful, not the real core of the problem. Getting to the
data that underlies the dump is. Anything that does that would be a step
ahead.

Some things to consider:

* A backend-side implementation: we already have a couple of views to that
effect (pg_rules, pg_views). You can always add more.

* A C API: How many people do use C? How to propagate this to other
interfaces? Are we really up to creating and maintaining another
hand-crafted API?


> 4. Even if I achieve nothing more than breaking 'libdump' out of pg_dump,
> and adding a \D command to psql, does anybody object to this as an objective? 

If that's your concern, then you can just link a couple of pg_dump's object
files into psql. You could perhaps create a de facto API within pg_dump,
which would also go a long way in cleaning up pg_dump's source code.


(When I was wheeling and dealing in psql last fall I was faced with the
same questions regarding the table printing facilities. People are not
really interested in getting pretty printed ASCII versions of their query
results; their web browser or widget set can do that much better. And even
those that do wouldn't like the particular formatting choices anyway. So I
withdrew psql's printing functions from the public. You can still link in
print.o into your program if you want to, but please don't complain if I
change the spacing and your application breaks.)


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: pg_dump, libdump, dump API, & backend again

From: Philip Warner
At 03:47 22/07/00 +0200, Peter Eisentraut wrote:
>Philip Warner writes:
>
>> 1. Can anyone think of an existing utility or library in PG that would
>> profit from a more convenient interface for querying metadata than is
>> available from ODBC or the pg_* tables?
>
>People do ask all the time how they can reproduce psql's, say, "\d" output
>for themselves. Notice that they don't ask how to reproduce pg_dump's
>output, pg_dump does that just fine, but people want to get at the
>metadata and process it for themselves. I imagine people have web tools
>for modelling their databases, etc. or change tables on the fly. I'd say
>there is a big market for this sort of thing, and the current state of
>affairs that every system catalog change is in some way liable to break
>all these applications is a non-negligible problem.

This is very true; and to satisfy this issue, I think you are right in
suggesting using (a subset of) the Info schemas. It's possibly even
worthwhile creating something like:
   Create View TABLES(TABLE_NAME) as Select ...whatever
   (this can be identical to pg_tables, for example)

and

   Create View COLUMNS(TABLE_NAME, COLUMN_NAME, DATA_TYPE, ...) as
   Select ...whatever

to satisfy some of these needs; ie. we don't even need to provide all the
standard columns, so long as we only provide ones that are in the standard,
and we also obey the visibility rules from the standard.

Just these two would, I think, provide the \d function in psql. Not that
I'm volunteering; pg_dump and 'insert/update...returning' are going to take
up enough time right now...


>It is therefore my opinion that making SQL dumps widely available is,
>while certainly useful, not the real core of the problem. Getting to the
>data that underlies the dump is. Anything that does that would be a step
>ahead.

Yes & no. There are two quite different problems, and the solution to one
does not provide a solution to the other, it just provides a layer on which
the other can be built:

1. How can we provide a stable interface for examining DB definitions that
works across releases?

You have answered this (to some extent) by suggesting Info schemas. Note
that these will still not work for non-standard features, but we could use
pg_* views for pg-specific structures. Perhaps whoever works on pgaccess
might have a good definition of the minimum requirements from an initial
implementation of info schemas?

It *might* be worth naming the guaranteed public views differently from the
normal pg_* tables; maybe pg_std_rules, then we can tell users that any
table/view that starts with pg_std_* will exist across versions, and always
at least have the columns present in the current release.


2. How can we provide a function in psql that displays definitions of all
things that a user can create with SQL?

This is the problem I set out to address, which is why I looked at using
pg_dump code. Which then led me to the philosophical question of what
should be responsible for 'decompiling' the internal representation of the
metadata. My feeling is that the backend 'compiles' SQL into metadata, so
it should also decompile it.


>Some things to consider:
>
>* A backend-side implementation: we already have a couple of views to that
>effect (pg_rules, pg_views). You can always add more.

I'll have a look at implementing something using views & functions as per
pg_views.


>* A C API: How many people do use C? How to propagate this to other
>interfaces? Are we really up to creating and maintaining another
>hand-crafted API?

I don't really want to be left maintaining such a thing and I'm sure nobody
else does, but as someone (Tom?) suggested, I'll have to develop a private
API to do the work, so making it public might be a good idea...maybe.


>
>> 4. Even if I achieve nothing more than breaking 'libdump' out of pg_dump,
> and adding a \D command to psql, does anybody object to this as an objective?
>
>If that's your concern that you can just link a couple of pg_dump's object
>files into psql. You could perhaps create a de facto API within pg_dump,
>which would also go a long way in cleaning up pg_dump's source code.

This is what I want to do as a minimum. I don't really want to create
pg_get_tablesql, pg_get_constraintsql, pg_get_typesql, pg_get_viewsql etc,
but maybe that's the best option after all. Maybe to avoid conflicts with
pg_get_ruledef etc, I should use names formatted as pg_dump_sql_*, but the
idea is the same.


>(When I was wheeling and dealing in psql last fall I was faced with the
>same questions regarding the table printing facilities. People are not
>really interested in getting pretty printed ASCII versions of their query
>results; their web browser or widget set can do that much better. And even
>those that do wouldn't like the particular formatting choices anyway. So I
>withdrew psql's printing functions from the public. You can still link in
>print.o into your program if you want to, but please don't complain if I
>change the spacing and your application breaks.)

It's a bit different with definitions; AFAICT there is no way of (simply)
getting the definition of a constraint while in psql. *I* want to be able
to do this, and personal need is a pretty good motivator...

So, perhaps I need to implement the metadata dumping code from pg_dump as a
set of functions that can be added to the backend and which are called in
one or more view definitions which pg_dump can use.






Re: pg_dump, libdump, dump API, & backend again

From: Karel Zak
On Sat, 22 Jul 2000, Philip Warner wrote:
> 
> 3. My current plan/idea/suggestion is to implement an API that, given an
> object type (eg. 'table'), will return the most basic definition for the
> object and a list of related objects. This API could be used by both psql
> and pg_dump to create SQL for display & backup purposes.
It sounds good to me. If your API is good, it will be usable by more
programs than just psql and pg_dump, and a "C" API can be wrapped from
other languages (Python, PHP, ...).

A question: what will this API return, full SQL statements as strings
only, or some structs too? If there is a common mechanism for obtaining
the definitions of tables (etc.), it is no problem to start development
of advanced client programs like Oracle Designer, or of programs for
SQL-DB-to-XML conversion and DB schema visualization (for example in the
nice program dia)... (don't worry, I know about pgaccess).

IMHO all open-source SQL databases are behind in the client part.
                    Karel



Re: pg_dump, libdump, dump API, & backend again

From: Peter Eisentraut
Philip Warner writes:

> 2. How can we provide a function in psql that displays definitions of all
> things that a user can create with SQL?
> 
> This is the problem I set out to address, which is why I looked at using
> pg_dump code. Which then led me to the philosophical question of what
> should be responsible for 'decompiling' the internal representation of the
> metadata. My feeling is that the backend 'compiles' SQL into metadata, so
> it should also decompile it.

There is certainly an intellectual merit in making the backend spit out
the SQL it took in. But, just for kicks, there is also an intellectual
merit in *not* doing it: Whatever you dump represents the state of a
database, but SQL only provides a method for modifying the state, not for
representing it. The backend has, by and large, no idea how it got to the
current state. Which is the philosophical counter-argument to the notion
that the backend should also provide the inverse operation to its query
parser -- there is no inverse.

Getting off that high horse, the practical problem I see is that there are
about a thousand parameters which would affect how you dump the SQL:

* indentation
* capitalization
* quotes
* standards compliance (to migrate to a different RDBMS)
* order
* view as view, view as rule?
* with/without psql specific stuff (\connect ...)
* How to resolve circular dependencies?

These kinds of decisions are better off with a client application, ISTM.





Re: pg_dump, libdump, dump API, & backend again

From: Philip Warner
At 18:06 24/07/00 +0200, Peter Eisentraut wrote:
>Philip Warner writes:
>
>> 2. How can we provide a function in psql that displays definitions of all
>> things that a user can create with SQL?
>> 
>> This is the problem I set out to address, which is why I looked at using
>> pg_dump code. Which then led me to the philosophical question of what
>> should be responsible for 'decompiling' the internal representation of the
>> metadata. My feeling is that the backend 'compiles' SQL into metadata, so
>> it should also decompile it.
>
>There is certainly an intellectual merit in making the backend spit out
>the SQL it took in. But, just for kicks, there is also an intellectual
>merit in *not* doing it: Whatever you dump represents the state of a
>database, but SQL only provides a method for modifying the state, not for
>representing it. The backend has, by and large, no idea how it got to the
>current state. Which is the philosophical counter-argument to the notion
>that the backend should also provide the inverse operation to its query
>parser -- there is no inverse.

No, but any DB state must be representable, and the DB may be the best
arbiter of how to represent it. If psql is used to transform the database
from state 'A' to state 'B', then the database should be able to describe
state 'B', ISTM.

I don't know if PG allows table columns to depend on  columns in other
tables, but if they can then dumping a valid schema becomes almost
impossible. [Dec/Rdb allows 'computed by' columns which can contain
arithmetic operations on columns in the current table, or even 'select..'
statements. This means that defining a pair of cross-referenced tables can
be a multi-step process]. 

However, the fact the problem is hard/impossible, is not really an argument
for placing it in a FE or BE, AFAICT...


>Getting off that high horse, the practical problem I see is that there are
>about a thousand parameters which would affect how you dump the SQL:
>
>* indentation

This would be picked arbitrarily.


>* capitalization
>* quotes

These are related. pg_dump already has to handle this, and the client
should be able to specify how quotes are handled (needs to be in the
API/SQL/View).


>* standards compliance (to migrate to a different RDBMS)

This is currently a problem in pg_dump, and could be left out, I think. To
dump valid SQL-nn format output, we probably want to wait for info
schemas. Also, we *can't* produce standard SQL for some PG features, I think.


>* order

This is an issue for the client. I am not proposing a
'Dump_The_Entire_Database' call; just a
'Dump_One_Thing_And_A_List_Of_Dependencies' call.


>* view as view, view as rule?

Also an issue with pg_dump currently; I'd prefer to see it as a view.


>* with/without psql specific stuff (\connect ...)

Another issue for the client.


>* How to resolve circular dependencies?

Issue for the client.

>These kinds of decisions are better off with a client application, ISTM.

Most of them would be; the only things handled at the backend would be
'formatting', quotes (based on client request), and SQL generation. The
client has to put the SQL together in a meaningful way including \connect
statements and handling dependencies.
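That client-side ordering work amounts to a topological sort over the dependency lists the dump call returns. A rough sketch, with invented data, in Python as pseudocode (cycles, the hard case in this thread, are deliberately not handled here):

```python
# Client-side ordering sketch: emit each object only after everything it
# depends on, i.e. a depth-first topological sort over the dependency
# lists returned by the dump call. All object names are invented.

def dump_order(deps):
    """deps maps an object to the list of objects it depends on."""
    ordered, seen = [], set()

    def visit(obj):
        if obj in seen:
            return
        seen.add(obj)
        for d in deps.get(obj, ()):
            visit(d)            # dependencies first
        ordered.append(obj)

    for obj in deps:
        visit(obj)
    return ordered

deps = {"view_v": ["table_t"], "index_i": ["table_t"], "table_t": []}
order = dump_order(deps)
```

The backend only has to report the edges; which order to walk them in, and what to do with \connect and the like, stays with the client.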

I agree that this does not sit well with the "DB must be able to dump its
state" argument; perhaps I need to just clean up pg_dump a little, link it
into psql, then get on with something I actually want to do... (ie.
insert/update...returning).





Re: pg_dump, libdump, dump API, & backend again

From: Peter Eisentraut
Philip Warner writes:

> >* order
> 
> This is an issue for the client. I am not proposing a
> 'Dump_The_Entire_Database' call; just a
> 'Dump_One_Thing_And_A_List_Of_Dependencies' call.

How? Functions can only return one value reliably.

> >* How to resolve circular dependencies?
> 
> Issue for the client.

Not really. If you have a circular dependency `table --(column default)-->
function --(reads)--> table' then you need to dump the table differently
(perhaps leave out the constraint and add it after the function is
created).





Re: pg_dump, libdump, dump API, & backend again

From: Philip Warner
At 22:41 25/07/00 +0200, Peter Eisentraut wrote:
>Philip Warner writes:
>
>> >* order
>> 
>> This is an issue for the client. I am not proposing a
>> 'Dump_The_Entire_Database' call; just a
>> 'Dump_One_Thing_And_A_List_Of_Dependencies' call.
>
>How? Functions can only return one value reliably.

The API is easy, and in the case of the database, it would take the form of
multiple rows in a result set, or possibly two queries - one returning a
single value (the definition), and the other returning the dependants in
multiple rows. I guess another option would be to return an array.
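From a client's point of view the two-query shape might look like the following sketch. Every function name here (the pg_dump_sql_* spellings echo the naming idea earlier in the thread) is hypothetical, and fake_execute stands in for a real connection; Python again serves as pseudocode:

```python
# Invented illustration of the two-query SQL interface: one query returns
# the definition as a single value, a second returns the dependants as a
# multi-row result set. The pg_dump_sql_* names are hypothetical.

def fake_execute(query, params):
    # Stub standing in for a real backend connection.
    if "pg_dump_sql_def" in query:
        return [("CREATE TABLE emp (id int4);",)]
    return [("constraint", "emp_pkey"), ("index", "emp_name_i")]

def fetch_definition(execute, obj_type, obj_name):
    rows = execute("SELECT pg_dump_sql_def(%s, %s)", (obj_type, obj_name))
    return rows[0][0]                       # single value

def fetch_dependants(execute, obj_type, obj_name):
    return execute("SELECT * FROM pg_dump_sql_deps(%s, %s)",
                   (obj_type, obj_name))    # many rows

definition = fetch_definition(fake_execute, "table", "emp")
dependants = fetch_dependants(fake_execute, "table", "emp")
```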


>> >* How to resolve circular dependencies?
>> 
>> Issue for the client.
>
>Not really. If you have a circular dependency `table --(column default)-->
>function --(reads)--> table' then you need to dump the table differently
>(perhaps leave out the constraint and add it after the function is
>created).

I would expect ALL constraints would be left off the table definition - it
would dump the most basic definition possible, with references to the
constraints appearing in the dependency list. That is why I say it's a
problem for the client. For the purpose of displaying the definition in
psql, circularity does not matter, and in the case of pg_dump, it actually
wants to separate the constraints from the table definition.
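For Peter's circular example (table -> function -> table), that split is exactly what yields a workable order: emit the bare table, then the function, then attach the constraint. A sketch with invented SQL text, in Python as pseudocode:

```python
# Sketch: because constraints come back as separate dependency objects,
# the client can break a table -> function -> table cycle by emitting
# the bare table first and attaching the constraint last. The SQL
# strings are invented for illustration.

basic_table = "CREATE TABLE t (x int4);"
function_def = ("CREATE FUNCTION f() RETURNS int4 AS "
                "'SELECT max(x) FROM t' LANGUAGE 'sql';")
constraint = "ALTER TABLE t ADD CHECK (x <= f());"

# A valid emission order: bare table, then function, then constraint.
script = [basic_table, function_def, constraint]
```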

