Thread: pg_dump as a bunch of PostgreSQL functions
Hello folks, I have an idea, to break pg_dump into functions within PostgreSQL. We currently have some functions like pg_get_viewdef, but it would be great to have a function or functions to dump the SQL definition of any object. This would allow easy dumping of any object from admin interfaces, and allow totally customized consistent dumps from within a transaction. In combination with dblink, it would also allow easy replication of objects between databases. From there we could add diff functions, that take two similar objects and produces the SQL commands to transform one into the other. This would help greatly when it comes to development, migrating the schema, etc. Would this be practical, would it be possible to break the pg_dump code out into PostgreSQL functions? Anyone have any ideas/thoughts/suggestions on this? Cheers -- Mark Gibson <gibsonm |AT| cromwell |DOT| co |DOT| uk> Web Developer & Database Admin Cromwell Tools Ltd. Leicester, England.
> Would this be practical, would it be possible to break the pg_dump > code out into PostgreSQL functions? > > Anyone have any ideas/thoughts/suggestions on this? It's been my plan for sometime, I'm time-starved at the moment however. Chris
At 06:00 PM 14/09/2004, Mark Gibson wrote: >I have an idea, to break pg_dump into functions within PostgreSQL. This has been suggested before, and I think been generally accepted as the right broad approach (the basic idea that the pg backend should know how to describe itself). Recent versions of pg_dump have started using backend functions to dump database structures (eg. type definitions). As time goes by more functions will be written, but I don't think it's the highest priority on anybody's list. There are also the information schemas which are the ISO way of getting database definitions; these can/should be used where possible. However, there are some complications because pg_dump is also the upgrade tool; the backed can only know how to describe itself for the current dialect of SQL accepted by PG. As we upgrade and improve the SQL, and add features, pg_dump needs to talk to old backends and dump prior versions in a format compatible with current (new) versions. This means that for some purposes it will not be able to use backend functions, or at least will have to have it's own mutant version of them. There are other differences; for reasons of performance and atomicity, we try to keep the items dumped as simple as possible. eg. in 8.0, a table definition will ultimately be dumped as: 1. set default_tablespace=xxx 1. set search_path=xxx 2. create table (no constraints, tablespace or namespace clauses) 4. load table data 3. alter table add constraint... 5. set table acls A 'friendly' definition would at least contain the namespace & constraints, but pg_dump does not want that. So it's not a simple as it sounds. <random_idea> Perhaps it would be nice if, in each new version we created a library that could be built against old versions to provide the functions needed by pg_dump to upgrade, and a similar library would form part of the new version as well. Kind of a 'pg_dump translation plugin'. This may be way too expensive an option, when a few 'if' statements inside pg_dump will achieve almost the same result. It would remove/reduce bloat in pg_dump and make the functions available more generally, at the expense of duplicating lots of code for each supported version. </random_idea> ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
Mark Gibson <gibsonm@cromwell.co.uk> writes: > I have an idea, to break pg_dump into functions within PostgreSQL. This strikes me as largely useless, because the problems that are actually hard for pg_dump are not at the level of individual objects; they have to do with problems like determining a safe dump order and altering definitions to break circularities in the dump requirements. I don't think that extending the pg_get_xxx family of functions would make pg_dump's life easier by any measurable amount. There is also a fundamental problem with the current pg_get_xxx functions, which is that they rely on backend-internal catalog operations that generally use SnapshotNow semantics. This is not what we want to guarantee that pg_dump dumps a consistent snapshot --- we need to look at catalog rows that existed as of pg_dump's serializable snapshot, instead. We have gotten away with it so far because pg_dump starts by taking read locks on every table in sight, and that is sufficient to block schema changes on the tables. But extending the pg_get_xxx approach to non-table-related objects would be seriously dangerous. (I think pg_get_viewdef is already broken, actually, since you can't lock a view.) regards, tom lane
Yes, IMHO this looks like a god idea. I planed to do something like that, but to return some kind of description language (XML for example),so restore could decide at runtime what exactly to do with that metadata ( depending on what user wants - create database or not, create users or not, reset serializators or not, replace existing objects or just create objects that are not existing at database, even do a "pgpatch" on existing objects), but this needs more time than I currently have :-( I still hope I'll do it some day... Regards ! ----- Original Message ----- From: "Christopher Kings-Lynne" <chriskl@familyhealth.com.au> To: "Mark Gibson" <gibsonm@cromwell.co.uk> Cc: "Hackers (PostgreSQL)" <pgsql-hackers@postgresql.org> Sent: Tuesday, September 14, 2004 10:44 AM Subject: Re: [HACKERS] pg_dump as a bunch of PostgreSQL functions > > Would this be practical, would it be possible to break the pg_dump > > code out into PostgreSQL functions? > > > > Anyone have any ideas/thoughts/suggestions on this? > > It's been my plan for sometime, I'm time-starved at the moment however. > > Chris > > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend >
Mark Gibson <gibsonm@cromwell.co.uk> writes: >>I have an idea, to break pg_dump into functions within PostgreSQL. Philip Warner wrote: > However, there are some complications because pg_dump is also the > upgrade tool; the backed can only know how to describe itself for the > current dialect of SQL accepted by PG. As we upgrade and improve the > SQL, and add features, pg_dump needs to talk to old backends and dump > prior versions in a format compatible with current (new) versions. This > means that for some purposes it will not be able to use backend > functions, or at least will have to have it's own mutant version of them. Tom Lane wrote: > This strikes me as largely useless, because the problems that are > actually hard for pg_dump are not at the level of individual objects; > they have to do with problems like determining a safe dump order and > altering definitions to break circularities in the dump requirements. > I don't think that extending the pg_get_xxx family of functions would > make pg_dump's life easier by any measurable amount. I was thinking really of very low-level dumping functions, that just dump simple single statements. Its mainly for development purposes, we generally develop db apps in three or four stages: 1, In a developer only db, each developer has a private schema to experiment in (prototypes, proof-of-concepts, alpha stage, etc). 2, Then its moved into its own schema on the same developer server for further stabilization (alpha/beta stage). 3, Once it reaches beta, it moves onto a mirror of the production server, for further testing by users (beta/pre-release stage). 4, For release, it's moved onto a production server for general consumption. So there is lots of dumping and reloading of schemas, the only way to do it is by using pg_dump and then hand editing the dump (or with sed/awk scripts etc.), which is a bit of a pain. I was thinking that if all the dumping functionality was available as PostgreSQL functions, then a simple PL/pgSQL function could be written to dump exactly what is required, and using dblink, fed directly from one db to another. Nice :) Dependency functions could be developed to determine the ideal dump order for higher-level dumping, and would also be useful for displaying dependency graphs in admin interfaces (phpPgAdmin, pgAdmin III, etc.). > There is also a fundamental problem with the current pg_get_xxx > functions, which is that they rely on backend-internal catalog > operations that generally use SnapshotNow semantics. This is not what > we want to guarantee that pg_dump dumps a consistent snapshot --- we > need to look at catalog rows that existed as of pg_dump's serializable > snapshot, instead. We have gotten away with it so far because pg_dump > starts by taking read locks on every table in sight, and that is > sufficient to block schema changes on the tables. But extending the > pg_get_xxx approach to non-table-related objects would be seriously > dangerous. (I think pg_get_viewdef is already broken, actually, > since you can't lock a view.) So, are pg_catalog tables subject to MVCC in the same way as user tables? ie. If I BEGIN a transaction, will the pg_catalog data remain consistent thoughout the transaction regardless of any DDL commands in a parallel session? Philip Warner wrote:> Perhaps it would be nice if, in each new version we created a library> that could be built againstold versions to provide the functions needed> by pg_dump to upgrade, and a similar library would form part of thenew> version as well. Kind of a 'pg_dump translation plugin'. This may be way> too expensive an option, when a few 'if'statements inside pg_dump will> achieve almost the same result. It would remove/reduce bloat in pg_dump> and make thefunctions available more generally, at the expense of> duplicating lots of code for each supported version. I was thinking of sharing a library (or just code base) between pg_dump and built-in functions, so that pg_dump doesn't have to rely on old built-in functions of a db you are trying to upgrade. -- Mark Gibson <gibsonm |AT| cromwell |DOT| co |DOT| uk> Web Developer & Database Admin Cromwell Tools Ltd. Leicester, England.