Thread: pg_dump as a bunch of PostgreSQL functions

pg_dump as a bunch of PostgreSQL functions

From

Mark Gibson

Date:

14 September 2004, 09:40:12

Hello folks,

I have an idea, to break pg_dump into functions within PostgreSQL.

We currently have some functions like pg_get_viewdef, but it would
be great to have a function or functions to dump the SQL definition
of any object.

This would allow easy dumping of any object from admin interfaces,
and allow totally customized consistent dumps from within a
transaction.
In combination with dblink, it would also allow easy replication of
objects between databases.
From there we could add diff functions, that take two similar
objects and produces the SQL commands to transform one into the
other.
This would help greatly when it comes to development, migrating
the schema, etc.

Would this be practical, would it be possible to break the pg_dump
code out into PostgreSQL functions?

Anyone have any ideas/thoughts/suggestions on this?

Cheers
-- 
Mark Gibson <gibsonm |AT| cromwell |DOT| co |DOT| uk>
Web Developer & Database Admin
Cromwell Tools Ltd.
Leicester, England.

Re: pg_dump as a bunch of PostgreSQL functions

From

Christopher Kings-Lynne

Date:

14 September 2004, 09:43:15

> Would this be practical, would it be possible to break the pg_dump
> code out into PostgreSQL functions?
> 
> Anyone have any ideas/thoughts/suggestions on this?

It's been my plan for sometime, I'm time-starved at the moment however.

Chris

Re: pg_dump as a bunch of PostgreSQL functions

From

Philip Warner

Date:

14 September 2004, 14:12:55

At 06:00 PM 14/09/2004, Mark Gibson wrote:
>I have an idea, to break pg_dump into functions within PostgreSQL.

This has been suggested before, and I think been generally accepted as the
right broad approach (the basic idea that the pg backend should know how to
describe itself).

Recent versions of pg_dump have started using backend functions to dump
database structures (eg. type definitions). As time goes by more functions
will be written, but I don't think it's the highest priority on anybody's
list. There are also the information schemas which are the ISO way of
getting database definitions; these can/should be used where possible.

However, there are some complications because pg_dump is also the upgrade
tool; the backed can only know how to describe itself for the current
dialect of SQL accepted by PG. As we upgrade and improve the SQL, and add
features, pg_dump needs to talk to old backends and dump prior versions in
a format compatible with current (new) versions. This means that for some
purposes it will not be able to use backend functions, or at least will
have to have it's own mutant version of them.

There are other differences; for reasons of performance and atomicity, we
try to keep the items dumped as simple as possible. eg. in 8.0, a table
definition will ultimately be dumped as:
1. set default_tablespace=xxx 1. set search_path=xxx 2. create table (no constraints, tablespace or namespace
clauses) 4. load table data 3. alter table add constraint... 5. set table acls

A 'friendly' definition would at least contain the namespace & constraints,
but pg_dump does not want that.

So it's not a simple as it sounds.

<random_idea>
Perhaps it would be nice if, in each new version we created a library that
could be built against old versions to provide the functions needed by
pg_dump to upgrade, and a similar library would form part of the new
version as well. Kind of a 'pg_dump translation plugin'. This may be way
too expensive an option, when a few 'if' statements inside pg_dump will
achieve almost the same result. It would remove/reduce bloat in pg_dump and
make the functions available more generally, at the expense of duplicating
lots of code for each supported version.
</random_idea>

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \| | --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/

Re: pg_dump as a bunch of PostgreSQL functions

From

Tom Lane

Date:

14 September 2004, 15:19:41

Mark Gibson <gibsonm@cromwell.co.uk> writes:
> I have an idea, to break pg_dump into functions within PostgreSQL.

This strikes me as largely useless, because the problems that are 
actually hard for pg_dump are not at the level of individual objects;
they have to do with problems like determining a safe dump order and
altering definitions to break circularities in the dump requirements.
I don't think that extending the pg_get_xxx family of functions would
make pg_dump's life easier by any measurable amount.

There is also a fundamental problem with the current pg_get_xxx
functions, which is that they rely on backend-internal catalog
operations that generally use SnapshotNow semantics.  This is not what
we want to guarantee that pg_dump dumps a consistent snapshot --- we
need to look at catalog rows that existed as of pg_dump's serializable
snapshot, instead.  We have gotten away with it so far because pg_dump
starts by taking read locks on every table in sight, and that is
sufficient to block schema changes on the tables.  But extending the
pg_get_xxx approach to non-table-related objects would be seriously
dangerous.  (I think pg_get_viewdef is already broken, actually,
since you can't lock a view.)
        regards, tom lane

Re: pg_dump as a bunch of PostgreSQL functions

From

"Darko Prenosil"

Date:

14 September 2004, 15:45:30

Yes, IMHO this looks like a god idea. I planed to do something like that,
but to return some kind of description language (XML for example),so restore
could decide at runtime what exactly to do with that metadata ( depending on
what user wants - create database or not, create users or not, reset
serializators or not, replace existing objects or just create objects that
are not existing at database, even do a "pgpatch" on existing objects), but
this needs more time than I currently have :-(
I still hope I'll do it some day...

Regards !

----- Original Message -----
From: "Christopher Kings-Lynne" <chriskl@familyhealth.com.au>
To: "Mark Gibson" <gibsonm@cromwell.co.uk>
Cc: "Hackers (PostgreSQL)" <pgsql-hackers@postgresql.org>
Sent: Tuesday, September 14, 2004 10:44 AM
Subject: Re: [HACKERS] pg_dump as a bunch of PostgreSQL functions


> > Would this be practical, would it be possible to break the pg_dump
> > code out into PostgreSQL functions?
> >
> > Anyone have any ideas/thoughts/suggestions on this?
>
> It's been my plan for sometime, I'm time-starved at the moment however.
>
> Chris
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

Re: pg_dump as a bunch of PostgreSQL functions

From

Mark Gibson

Date:

16 September 2004, 14:54:47

Mark Gibson <gibsonm@cromwell.co.uk> writes:
>>I have an idea, to break pg_dump into functions within PostgreSQL.

Philip Warner wrote:
> However, there are some complications because pg_dump is also the 
> upgrade tool; the backed can only know how to describe itself for the 
> current dialect of SQL accepted by PG. As we upgrade and improve the 
> SQL, and add features, pg_dump needs to talk to old backends and dump 
> prior versions in a format compatible with current (new) versions. This 
> means that for some purposes it will not be able to use backend 
> functions, or at least will have to have it's own mutant version of them.

Tom Lane wrote:
> This strikes me as largely useless, because the problems that are 
> actually hard for pg_dump are not at the level of individual objects;
> they have to do with problems like determining a safe dump order and
> altering definitions to break circularities in the dump requirements.
> I don't think that extending the pg_get_xxx family of functions would
> make pg_dump's life easier by any measurable amount.

I was thinking really of very low-level dumping functions,
that just dump simple single statements.
Its mainly for development purposes, we generally develop db apps in
three or four stages:
1, In a developer only db, each developer has a private schema to 
experiment in (prototypes, proof-of-concepts, alpha stage, etc).
2, Then its moved into its own schema on the same developer server for 
further stabilization (alpha/beta stage).
3, Once it reaches beta, it moves onto a mirror of the production 
server, for further testing by users (beta/pre-release stage).
4, For release, it's moved onto a production server for general consumption.

So there is lots of dumping and reloading of schemas, the only way
to do it is by using pg_dump and then hand editing the dump (or with
sed/awk scripts etc.), which is a bit of a pain.

I was thinking that if all the dumping functionality was available
as PostgreSQL functions, then a simple PL/pgSQL function could be
written to dump exactly what is required, and using dblink, fed
directly from one db to another. Nice :)

Dependency functions could be developed to determine the ideal
dump order for higher-level dumping, and would also be useful
for displaying dependency graphs in admin interfaces
(phpPgAdmin, pgAdmin III, etc.).

> There is also a fundamental problem with the current pg_get_xxx
> functions, which is that they rely on backend-internal catalog
> operations that generally use SnapshotNow semantics.  This is not what
> we want to guarantee that pg_dump dumps a consistent snapshot --- we
> need to look at catalog rows that existed as of pg_dump's serializable
> snapshot, instead.  We have gotten away with it so far because pg_dump
> starts by taking read locks on every table in sight, and that is
> sufficient to block schema changes on the tables.  But extending the
> pg_get_xxx approach to non-table-related objects would be seriously
> dangerous.  (I think pg_get_viewdef is already broken, actually,
> since you can't lock a view.)

So, are pg_catalog tables subject to MVCC in the same way as user
tables? ie. If I BEGIN a transaction, will the pg_catalog data
remain consistent thoughout the transaction regardless of any DDL
commands in a parallel session?

Philip Warner wrote:> Perhaps it would be nice if, in each new version we created a library> that could be built
againstold versions to provide the functions needed> by pg_dump to upgrade, and a similar library would form part of
thenew> version as well. Kind of a 'pg_dump translation plugin'. This may be way> too expensive an option, when a few
'if'statements inside pg_dump will> achieve almost the same result. It would remove/reduce bloat in pg_dump> and make
thefunctions available more generally, at the expense of> duplicating lots of code for each supported version.

I was thinking of sharing a library (or just code base) between
pg_dump and built-in functions, so that pg_dump doesn't have to
rely on old built-in functions of a db you are trying to upgrade.

-- 
Mark Gibson <gibsonm |AT| cromwell |DOT| co |DOT| uk>
Web Developer & Database Admin
Cromwell Tools Ltd.
Leicester, England.