Thread: pg_dump, libdump, dump API, & backend again
Executive summary: I have discarded the idea of (partially) implementing the
Info Schemas as part of a pg_dump/psql enhancement, and returned to my
original plan. You can now skip to 'Questions', below.

----------- Discussion --------

Having now waded through the SQL standards rather more than is good for my
sanity, I have concluded that even if I could implement the entire info
schema system in PG, I would still have to write SQL dumping code that would
look quite similar to the SQL dumping code in pg_dump. What is more, it
would not support rules, or any non-standard features of PG (at least I
think the latter are true).

Being therefore forced to look at the intersection of the info schemas and
PG, I would probably end up just supporting COLUMNS, VIEWS, & TABLES. And
even with this work complete, I would have to revise pg_dump to use it, by
which time I would have successfully reimplemented the current functionality
of pg_dump.

As a result I am inclined back to the more modest goal of making the code to
dump individual database items more generally available. Perhaps at some
later stage we can modify the internals to use info schemas, when they are
implemented. The way I see it at the moment, what I plan to implement is
still a layer above either the info schemas or the pg_* tables.

As to pgaccess: if an API layer is needed for ease of update & portability,
I would be interested to know whether anyone has considered the practicality
of using the ODBC interface. ODBC allows interrogation of metadata,
execution of DDL statements, execution of queries, etc. It seems to me that
if pgaccess were implemented on top of ODBC, the authors would have a very
useful general-purpose tool. But I have not looked at what special features
it has, so I have no idea how practical the suggestion is.

----------- Questions --------

1. Can anyone think of an existing utility or library in PG that would
profit from a more convenient interface for querying metadata than is
available from ODBC or the pg_* tables? If so, can you give some details &
pointers? [This will be useful in modifying any resulting API.]

2. Can anybody suggest the most appropriate (polite) location for any SQL
extraction code (library? backend? external function?)? Some people have
commented that it would be a bad idea for it to go into the backend, and I
have observed that it would be nice to extract valid SQL definitions via
SQL, thus making the code available to most users with minimal effort.
Currently I would expect to implement an API and (ideally) an SQL interface.
The SQL side of things would probably just call the API, but the details
need to be worked out.

3. My current plan/idea/suggestion is to implement an API that, given an
object type (eg. 'table'), will return the most basic definition for the
object and a list of related objects. This API could be used by both psql
and pg_dump to create SQL for display & backup purposes.

4. Even if I achieve nothing more than breaking 'libdump' out of pg_dump,
and adding a \D command to psql, does anybody object to this as an
objective?

Any comments and suggestions would be greatly appreciated...

----------------------------------------------------------------
Philip Warner                    | Albatross Consulting Pty. Ltd.
                                 | (A.C.N. 008 659 498)
Tel: (+61) 0500 83 82 81         | Fax: (+61) 0500 83 82 82
http://www.rhyme.com.au          |
PGP key available upon request, and from pgp5.ai.mit.edu:11371
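[A minimal sketch of the shape the API in point 3 might take. Python is used
purely for illustration, and every name here (DumpedObject, dump_object, the
'orders' table and its dependencies) is hypothetical, not an existing PG
interface: each call returns the object's barest definition plus a list of
related objects for the caller to resolve.]

```python
from dataclasses import dataclass, field

@dataclass
class DumpedObject:
    """Hypothetical return value: the most basic SQL definition of one
    object, plus the related objects that must be handled separately."""
    object_type: str                 # e.g. 'table', 'view', 'constraint'
    name: str
    definition: str                  # minimal CREATE statement
    dependencies: list = field(default_factory=list)  # (type, name) pairs

def dump_object(object_type: str, name: str) -> DumpedObject:
    # A real implementation would query the pg_* catalogs; a canned
    # answer stands in for the catalog lookup here.
    if (object_type, name) == ("table", "orders"):
        return DumpedObject(
            object_type="table",
            name="orders",
            definition="CREATE TABLE orders (id int, cust int);",
            dependencies=[("constraint", "orders_cust_fk"),
                          ("index", "orders_pkey")],
        )
    raise KeyError((object_type, name))

obj = dump_object("table", "orders")
print(obj.definition)      # the bare table definition
print(obj.dependencies)    # constraints etc. left for the client
```

Both psql (for display) and pg_dump (for backup ordering) could consume the
same return value, which is the point of making the dependency list explicit.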
Philip Warner writes:

> 1. Can anyone think of an existing utility or library in PG that would
> profit from a more convenient interface for querying metadata than is
> available from ODBC or the pg_* tables?

People ask all the time how they can reproduce psql's, say, "\d" output for
themselves. Notice that they don't ask how to reproduce pg_dump's output;
pg_dump does that just fine. But people want to get at the metadata and
process it for themselves. I imagine people have web tools for modelling
their databases, changing tables on the fly, etc. I'd say there is a big
market for this sort of thing, and the current state of affairs, in which
every system catalog change is liable to break all these applications, is a
non-negligible problem.

It is therefore my opinion that making SQL dumps widely available is, while
certainly useful, not the real core of the problem. Getting to the data that
underlies the dump is. Anything that does that would be a step ahead.

Some things to consider:

* A backend-side implementation: we already have a couple of views to that
effect (pg_rules, pg_views). You can always add more.

* A C API: How many people use C? How do we propagate this to other
interfaces? Are we really up to creating and maintaining another
hand-crafted API?

> 4. Even if I achieve nothing more than breaking 'libdump' out of pg_dump,
> and adding a \D command to psql, does anybody object to this as an
> objective?

If that's your concern, then you can just link a couple of pg_dump's object
files into psql. You could perhaps create a de facto API within pg_dump,
which would also go a long way towards cleaning up pg_dump's source code.

(When I was wheeling and dealing in psql last fall I was faced with the same
questions regarding the table printing facilities. People are not really
interested in getting pretty-printed ASCII versions of their query results;
their web browser or widget set can do that much better. And even those that
are wouldn't like the particular formatting choices anyway. So I withdrew
psql's printing functions from the public interface. You can still link
print.o into your program if you want to, but please don't complain if I
change the spacing and your application breaks.)

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
At 03:47 22/07/00 +0200, Peter Eisentraut wrote:

>People ask all the time how they can reproduce psql's, say, "\d" output
>for themselves. Notice that they don't ask how to reproduce pg_dump's
>output; pg_dump does that just fine. But people want to get at the
>metadata and process it for themselves. I imagine people have web tools
>for modelling their databases, changing tables on the fly, etc. I'd say
>there is a big market for this sort of thing, and the current state of
>affairs, in which every system catalog change is liable to break all
>these applications, is a non-negligible problem.

This is very true; and to satisfy this issue, I think you are right in
suggesting the use of (a subset of) the Info schemas. It may even be
worthwhile creating something like:

    Create View TABLES(TABLE_NAME) as Select whatever...

(this can be identical to pg_tables, for example) and

    Create View COLUMNS(TABLE_NAME, COLUMN_NAME, DATA_TYPE, ...) as
    Select whatever...

to satisfy some of these needs; ie. we don't even need to provide all the
standard columns, so long as we only provide ones that are in the standard,
and we also obey the visibility rules from the standard. Just these two
would, I think, provide the \d function in psql. Not that I'm volunteering;
pg_dump and 'insert/update...returning' are going to take up enough time
right now...

>It is therefore my opinion that making SQL dumps widely available is,
>while certainly useful, not the real core of the problem. Getting to the
>data that underlies the dump is. Anything that does that would be a step
>ahead.

Yes & no. There are two quite different problems, and the solution to one
does not provide a solution to the other; it just provides a layer on which
the other can be built:

1. How can we provide a stable interface for examining DB definitions that
works across releases? You have answered this (to some extent) by suggesting
the Info schemas. Note that these will still not work for non-standard
features, but we could use pg_* views for pg-specific structures. Perhaps
whoever works on pgaccess might have a good definition of the minimum
requirements for an initial implementation of info schemas. It *might* be
worth naming the guaranteed-public views differently from the normal pg_*
tables; maybe pg_std_rules. Then we can tell users that any table/view that
starts with pg_std_* will exist across versions, and will always at least
have the columns present in the current release.

2. How can we provide a function in psql that displays definitions of all
things that a user can create with SQL? This is the problem I set out to
address, which is why I looked at using pg_dump code. Which then led me to
the philosophical question of what should be responsible for 'decompiling'
the internal representation of the metadata. My feeling is that the backend
'compiles' SQL into metadata, so it should also decompile it.

>Some things to consider:
>
>* A backend-side implementation: we already have a couple of views to that
>effect (pg_rules, pg_views). You can always add more.

I'll have a look at implementing something using views & functions, as per
pg_views.

>* A C API: How many people use C? How do we propagate this to other
>interfaces? Are we really up to creating and maintaining another
>hand-crafted API?

I don't really want to be left maintaining such a thing, and I'm sure nobody
else does, but as someone (Tom?) suggested, I'll have to develop a private
API to do the work anyway, so making it public might be a good idea...
maybe.

>> 4. Even if I achieve nothing more than breaking 'libdump' out of pg_dump,
>> and adding a \D command to psql, does anybody object to this as an
>> objective?
>
>If that's your concern, then you can just link a couple of pg_dump's object
>files into psql. You could perhaps create a de facto API within pg_dump,
>which would also go a long way towards cleaning up pg_dump's source code.

This is what I want to do as a minimum. I don't really want to create
pg_get_tablesql, pg_get_constraintsql, pg_get_typesql, pg_get_viewsql, etc.,
but maybe that's the best option after all. To avoid conflicts with
pg_get_ruledef etc., I should perhaps use names formatted as pg_dump_sql_*,
but the idea is the same.

>(When I was wheeling and dealing in psql last fall I was faced with the
>same questions regarding the table printing facilities. People are not
>really interested in getting pretty-printed ASCII versions of their query
>results; their web browser or widget set can do that much better. And even
>those that are wouldn't like the particular formatting choices anyway. So I
>withdrew psql's printing functions from the public interface. You can still
>link print.o into your program if you want to, but please don't complain if
>I change the spacing and your application breaks.)

It's a bit different with definitions; AFAICT there is no way of (simply)
getting the definition of a constraint while in psql. *I* want to be able to
do this, and personal need is a pretty good motivator...

So perhaps I need to implement the metadata-dumping code from pg_dump as a
set of functions that can be added to the backend, called from one or more
view definitions which pg_dump can use.

Philip Warner
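[A rough illustration of the TABLES/COLUMNS view idea above. SQLite's own
catalog is used as a stand-in so the example is self-contained and runnable;
a PG version would instead define views over pg_class and pg_attribute,
exposing only columns that exist in the standard's information schema. The
view and table names here are invented.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, cust TEXT)")

# A COLUMNS-style view built over the engine's own metadata catalog,
# analogous to 'Create View COLUMNS(...) as Select ...' over pg_attribute.
conn.execute("""
    CREATE VIEW columns_v AS
        SELECT m.name AS table_name,
               p.name AS column_name,
               p.type AS data_type
        FROM sqlite_master AS m, pragma_table_info(m.name) AS p
        WHERE m.type = 'table'
""")

rows = conn.execute(
    "SELECT table_name, column_name, data_type FROM columns_v"
).fetchall()
print(rows)
```

Clients then query columns_v like any ordinary table, which is exactly the
stability property wanted: the underlying catalogs can change between
releases as long as the view keeps its shape.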
On Sat, 22 Jul 2000, Philip Warner wrote:

> 3. My current plan/idea/suggestion is to implement an API that, given an
> object type (eg. 'table'), will return the most basic definition for the
> object and a list of related objects. This API could be used by both psql
> and pg_dump to create SQL for display & backup purposes.

It sounds good to me. If your API is good, it will be usable by more
programs than just psql and pg_dump, and a C API can be wrapped for other
languages (Python, PHP, ...). One question: what will this API return? Full
SQL statements as strings only, or some structs too? If there were a common
mechanism for obtaining the definitions of tables (etc.), it would be no
problem to start development on advanced client programs like Oracle
Designer, or to write programs for SQL-database-to-XML conversion and DB
schema visualization (for example in the nice program dia)... (don't worry,
I know pgaccess). IMHO all open-source SQL databases are behind in the
client department.

Karel
Philip Warner writes:

> 2. How can we provide a function in psql that displays definitions of all
> things that a user can create with SQL?
>
> This is the problem I set out to address, which is why I looked at using
> pg_dump code. Which then led me to the philosophical question of what
> should be responsible for 'decompiling' the internal representation of the
> metadata. My feeling is that the backend 'compiles' SQL into metadata, so
> it should also decompile it.

There is certainly an intellectual merit in making the backend spit out the
SQL it took in. But, just for kicks, there is also an intellectual merit in
*not* doing it: whatever you dump represents the state of a database, but
SQL only provides a method for modifying the state, not for representing it.
The backend has, by and large, no idea how it got to the current state.
Which is the philosophical counter-argument to the notion that the backend
should also provide the inverse operation to its query parser: there is no
inverse.

Getting off that high horse, the practical problem I see is that there are
about a thousand parameters which would affect how you dump the SQL:

* indentation
* capitalization
* quotes
* standards compliance (to migrate to a different RDBMS)
* order
* view as view, or view as rule?
* with/without psql-specific stuff (\connect ...)
* how to resolve circular dependencies?

These kinds of decisions are better off with a client application, ISTM.

-- Peter Eisentraut
At 18:06 24/07/00 +0200, Peter Eisentraut wrote:

>There is certainly an intellectual merit in making the backend spit out
>the SQL it took in. But, just for kicks, there is also an intellectual
>merit in *not* doing it: whatever you dump represents the state of a
>database, but SQL only provides a method for modifying the state, not for
>representing it. The backend has, by and large, no idea how it got to the
>current state. Which is the philosophical counter-argument to the notion
>that the backend should also provide the inverse operation to its query
>parser: there is no inverse.

No, but any DB state must be representable, and the DB may be the best
arbiter of how to represent it. If psql is used to transform the database
from state 'A' to state 'B', then the database should be able to describe
state 'B', ISTM.

I don't know if PG allows table columns to depend on columns in other
tables, but if it does then dumping a valid schema becomes almost
impossible. [Dec/Rdb allows 'computed by' columns which can contain
arithmetic operations on columns in the current table, or even 'select...'
statements. This means that defining a pair of cross-referenced tables can
be a multi-step process.] However, the fact that the problem is
hard/impossible is not really an argument for placing it in an FE or BE,
AFAICT...

>Getting off that high horse, the practical problem I see is that there are
>about a thousand parameters which would affect how you dump the SQL:
>
>* indentation

This would be picked arbitrarily.

>* capitalization
>* quotes

These are related. pg_dump already has to handle this, and the client should
be able to specify how quotes are handled (this needs to be in the
API/SQL/View).

>* standards compliance (to migrate to a different RDBMS)

This is currently a problem in pg_dump, and could be left out, I think. To
dump valid SQL-nn format output, we probably want to wait for info schemas.
Also, we *can't* produce standard SQL for some PG features, I think.

>* order

This is an issue for the client. I am not proposing a
'Dump_The_Entire_Database' call; just a
'Dump_One_Thing_And_A_List_Of_Dependencies' call.

>* view as view, or view as rule?

Also an issue with pg_dump currently; I'd prefer to see it as a view.

>* with/without psql-specific stuff (\connect ...)

Another issue for the client.

>* how to resolve circular dependencies?

An issue for the client.

>These kinds of decisions are better off with a client application, ISTM.

Most of them would be; the only things handled at the backend would be
'formatting', quoting (based on client request), and SQL generation. The
client has to put the SQL together in a meaningful way, including \connect
statements and handling dependencies. I agree that this does not sit well
with the "DB must be able to dump its state" argument; perhaps I need to
just clean up pg_dump a little, link it into psql, then get on with
something I actually want to do... (ie. insert/update...returning).

Philip Warner
Philip Warner writes:

> >* order
>
> This is an issue for the client. I am not proposing a
> 'Dump_The_Entire_Database' call; just a
> 'Dump_One_Thing_And_A_List_Of_Dependencies' call.

How? Functions can only return one value reliably.

> >* How to resolve circular dependencies?
>
> Issue for the client.

Not really. If you have a circular dependency `table --(column default)-->
function --(reads)--> table' then you need to dump the table differently
(perhaps leave out the constraint and add it after the function is created).

-- Peter Eisentraut
At 22:41 25/07/00 +0200, Peter Eisentraut wrote:

>> >* order
>>
>> This is an issue for the client. I am not proposing a
>> 'Dump_The_Entire_Database' call; just a
>> 'Dump_One_Thing_And_A_List_Of_Dependencies' call.
>
>How? Functions can only return one value reliably.

The API side is easy, and in the case of the database it would take the form
of multiple rows in a result set, or possibly two queries: one returning a
single value (the definition), and the other returning the dependents as
multiple rows. I guess another option would be to return an array.

>> >* How to resolve circular dependencies?
>>
>> Issue for the client.
>
>Not really. If you have a circular dependency `table --(column default)-->
>function --(reads)--> table' then you need to dump the table differently
>(perhaps leave out the constraint and add it after the function is
>created).

I would expect ALL constraints to be left off the table definition: it would
dump the most basic definition possible, with references to the constraints
appearing in the dependency list. That is why I say it's a problem for the
client. For the purpose of displaying the definition in psql, circularity
does not matter, and in the case of pg_dump, it actually wants to separate
constraints from the table definition.

Philip Warner
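[The client-side ordering described above, where tables are dumped bare and
constraints become separate objects in the dependency list, amounts to a
topological sort. A sketch, with an invented example catalog; splitting the
FK constraint out of the table definition is what breaks the cycle:]

```python
from collections import defaultdict, deque

# Invented example: each object mapped to the objects it depends on.
# Dumping the FK as its own object (depending on both tables) means
# no table has to reference the other directly.
depends_on = {
    "table:orders":         [],
    "table:customers":      [],
    "constraint:orders_fk": ["table:orders", "table:customers"],
    "function:order_total": ["table:orders"],
}

def dump_order(depends_on):
    """Kahn's algorithm: emit each object only after its dependencies."""
    dependents = defaultdict(list)
    pending = {}
    for obj, deps in depends_on.items():
        pending[obj] = len(deps)
        for d in deps:
            dependents[d].append(obj)
    ready = deque(sorted(o for o, n in pending.items() if n == 0))
    order = []
    while ready:
        obj = ready.popleft()
        order.append(obj)
        for dep in dependents[obj]:
            pending[dep] -= 1
            if pending[dep] == 0:
                ready.append(dep)
    if len(order) != len(depends_on):
        # A true cycle remains: some definition was not split finely enough.
        raise ValueError("circular dependency not broken by splitting")
    return order

print(dump_order(depends_on))
```

For psql's display purposes the client can simply show the bare definition
and list the dependencies; only a pg_dump-style client needs the full sort.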