Re: pg_dump as a bunch of PostgreSQL functions - Mailing list pgsql-hackers

From Mark Gibson
Subject Re: pg_dump as a bunch of PostgreSQL functions
Msg-id a5c4cb4bff23489c818762a380dbda3341499af9@cromwell.co.uk
In response to Re: pg_dump as a bunch of PostgreSQL functions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Mark Gibson <gibsonm@cromwell.co.uk> writes:
>>I have an idea, to break pg_dump into functions within PostgreSQL.

Philip Warner wrote:
> However, there are some complications because pg_dump is also the 
> upgrade tool; the backend can only know how to describe itself for the 
> current dialect of SQL accepted by PG. As we upgrade and improve the 
> SQL, and add features, pg_dump needs to talk to old backends and dump 
> prior versions in a format compatible with current (new) versions. This 
> means that for some purposes it will not be able to use backend 
> functions, or at least will have to have its own mutant version of them.

Tom Lane wrote:
> This strikes me as largely useless, because the problems that are 
> actually hard for pg_dump are not at the level of individual objects;
> they have to do with problems like determining a safe dump order and
> altering definitions to break circularities in the dump requirements.
> I don't think that extending the pg_get_xxx family of functions would
> make pg_dump's life easier by any measurable amount.

I was really thinking of very low-level dumping functions
that just dump simple single statements.
It's mainly for development purposes; we generally develop db apps in
four stages:
1, In a developer only db, each developer has a private schema to 
experiment in (prototypes, proof-of-concepts, alpha stage, etc).
2, Then it's moved into its own schema on the same developer server for 
further stabilization (alpha/beta stage).
3, Once it reaches beta, it moves onto a mirror of the production 
server, for further testing by users (beta/pre-release stage).
4, For release, it's moved onto a production server for general consumption.

So there is a lot of dumping and reloading of schemas, and currently
the only way to do it is to use pg_dump and then hand-edit the dump
(or process it with sed/awk scripts, etc.), which is a bit of a pain.

I was thinking that if all the dumping functionality were available
as PostgreSQL functions, then a simple PL/pgSQL function could be
written to dump exactly what is required and, using dblink, feed it
directly from one db to another. Nice :)
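Something along these lines, as a rough sketch. dblink and pg_get_viewdef() exist today; the point is that a whole family of such pg_get_xxx functions would make this pattern general. (copy_view and the connection-string handling here are purely illustrative, not an existing API.)

```sql
-- Hypothetical sketch: copy a view definition straight into another
-- database using the real pg_get_viewdef() plus contrib/dblink.
CREATE OR REPLACE FUNCTION copy_view(viewname text, target_conn text)
RETURNS void AS $$
DECLARE
    def text;
BEGIN
    -- pg_get_viewdef() reconstructs the view's SELECT statement
    SELECT pg_get_viewdef(viewname::regclass, true) INTO def;

    -- feed the reconstructed definition directly to the target db
    PERFORM dblink_exec(target_conn,
        'CREATE VIEW ' || viewname || ' AS ' || def);
END;
$$ LANGUAGE plpgsql;
```

With equivalent functions for tables, indexes, constraints, and so on, moving a schema between the development stages above would be a single function call instead of a dump-edit-reload cycle.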

Dependency functions could be developed to determine the ideal
dump order for higher-level dumping, and would also be useful
for displaying dependency graphs in admin interfaces
(phpPgAdmin, pgAdmin III, etc.).
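The raw material for such dependency functions is already in the pg_depend catalog; a minimal starting point (assuming the object of interest is a relation, so that refobjid can be cast to regclass) might look like:

```sql
-- List the pg_class objects that "myview" depends on, with the
-- dependency type ('n' = normal, 'a' = auto, 'i' = internal).
-- Only covers relation-to-relation dependencies; a real dump-ordering
-- function would have to join against the other catalogs too.
SELECT refobjid::regclass AS depends_on,
       deptype
FROM pg_depend
WHERE objid = 'myview'::regclass
  AND refclassid = 'pg_class'::regclass;
```

A recursive walk over this catalog would yield a topological dump order, and the same data could drive dependency graphs in the admin tools.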

> There is also a fundamental problem with the current pg_get_xxx
> functions, which is that they rely on backend-internal catalog
> operations that generally use SnapshotNow semantics.  This is not what
> we want to guarantee that pg_dump dumps a consistent snapshot --- we
> need to look at catalog rows that existed as of pg_dump's serializable
> snapshot, instead.  We have gotten away with it so far because pg_dump
> starts by taking read locks on every table in sight, and that is
> sufficient to block schema changes on the tables.  But extending the
> pg_get_xxx approach to non-table-related objects would be seriously
> dangerous.  (I think pg_get_viewdef is already broken, actually,
> since you can't lock a view.)

So, are pg_catalog tables subject to MVCC in the same way as user
tables? ie. If I BEGIN a transaction, will the pg_catalog data
remain consistent throughout the transaction regardless of any DDL
commands in a parallel session?
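In other words, the experiment I have in mind is something like this (two parallel sessions; I'm not certain what the pg_get_xxx functions would do here, which is exactly the question):

```sql
-- Session 1: take a serializable snapshot
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM pg_catalog.pg_class;

-- Session 2, meanwhile: run some DDL
CREATE TABLE t (i int);

-- Session 1 again: does the new pg_class row show up in a plain
-- SELECT, and would a pg_get_xxx function (which, per Tom, uses
-- SnapshotNow internally) see it even if the SELECT does not?
SELECT count(*) FROM pg_catalog.pg_class;
COMMIT;
```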

Philip Warner wrote:
> Perhaps it would be nice if, in each new version we created a library
> that could be built against old versions to provide the functions needed
> by pg_dump to upgrade, and a similar library would form part of the new
> version as well. Kind of a 'pg_dump translation plugin'. This may be way
> too expensive an option, when a few 'if' statements inside pg_dump will
> achieve almost the same result. It would remove/reduce bloat in pg_dump
> and make the functions available more generally, at the expense of
> duplicating lots of code for each supported version.

I was thinking of sharing a library (or just code base) between
pg_dump and built-in functions, so that pg_dump doesn't have to
rely on old built-in functions of a db you are trying to upgrade.

-- 
Mark Gibson <gibsonm |AT| cromwell |DOT| co |DOT| uk>
Web Developer & Database Admin
Cromwell Tools Ltd.
Leicester, England.

