Re: pg_dump in 7.4 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pg_dump in 7.4
Date
Msg-id 20875.1037194423@sss.pgh.pa.us
Whole thread Raw
In response to pg_dump in 7.4  ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
Responses Re: pg_dump in 7.4  ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
Re: pg_dump in 7.4  (Neil Conway <neilc@samurai.com>)
List pgsql-hackers
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> Has anyone given much thought to improving pg_dump's object order algorithm
> for 7.4?  It seems that now we have dependencies, it should just be a matter
> of doing a breadth-first or depth-first search over the pg_depend table to
> generate a valid order of oids.

I've thought about this a little.  Using the pg_depend data would
obviously give pg_dump a big leg up on the problem, but it's by no means
a trivial task even so.  Some things to think about:

* We don't store dependencies for SQL functions to things mentioned in
the SQL function body.  (Maybe we should, but we don't.)  So there's
data missing in that case, and possibly other cases.

* pg_dump really ought to still work for pre-7.3 databases.  What will
its fallback strategy be like when it hasn't got pg_depend?

* With ALTER TABLE, CREATE OR REPLACE FUNCTION, etc, it's not hard at
all to create situations with circular dependencies.  Right now pg_dump
is guaranteed to produce an unusable dump in such cases, but our
ambition should be to make it work.  That means pg_dump needs a strategy
for breaking circular dependency paths, by itself using ALTER when
needed to postpone a reference.


The thought that I'd been toying with is to build a list of inter-object
dependencies (using pg_depend if available, else fall back on pg_dump's
native wit, ie, the rather limited set of dependencies it already
understands).  Then do a topological sort, preferring to maintain OID
order in cases where the object ordering is underspecified.  When the
sort fails (ie, there's a circular dependency) then modify the set of
dumpable objects by breaking some object into two parts (a base
declaration and an ALTER command); this changes the dependencies too.
Repeat the sort and adjustment steps until the sort succeeds.

If you're not familiar with topological sorts, look at Knuth or your
favorite algorithms text.  There are one or two instances of the method
in PG already (deadlock detection, for example).
        regards, tom lane


pgsql-hackers by date:

Previous
From: "Zeugswetter Andreas SB SD"
Date:
Subject: Re: RC1?
Next
From: Tom Lane
Date:
Subject: Re: pg_dump in 7.4