Thread: BUG #2308: pg_dump -a does not respect referential dependencies

BUG #2308: pg_dump -a does not respect referential dependencies

From

"Matthew George"

Date:

08 March 2006, 16:02:19

The following bug has been logged online:

Bug reference:      2308
Logged by:          Matthew George
Email address:      georgema@corp.earthlink.net
PostgreSQL version: 8.0.3
Operating system:   Darwin Kernel Version 8.5.0: Sun Jan 22 10:38:46 PST
2006; root:xnu-792.6.61.obj~1/RELEASE_PPC Power Macintosh powerpc
Description:        pg_dump -a does not respect referential dependencies
Details:

When using pg_dump to dump a database, the schema and data are arranged
within the dump so that it may be imported without violating referential
integrity.

When using the -a option to get a data-only dump, the data is ordered in the
dump alphabetically by table.

If a new schema is loaded into a fresh database, the output from the
data-only dump cannot be imported via \i in psql without manually editing
the dump file and reordering the inserts / copies such that dependent tables
have their data loaded before the tables that depend on them.  This is
inconvenient at best.

Since the logic obviously exists within pg_dump already to arrange the data
in the correct order of reference dependencies, can this be added to the
code path for `pg_dump -a` as well?  Or can another option be added that
specifies alphabetic vs. dependency order?

This would really cut down the time it takes to do schema upgrades on
projects that I work with.

Thanks

Re: BUG #2308: pg_dump -a does not respect referential dependencies

From

Tom Lane

Date:

08 March 2006, 17:01:34

"Matthew George" <georgema@corp.earthlink.net> writes:
> Since the logic obviously exists within pg_dump already to arrange the data
> in the correct order of reference dependencies, can this be added to the
> code path for `pg_dump -a` as well?

No.  In a data-only restore there may not be *any* ordering that works
--- consider circular dependencies.

The best practice is to do standard schema+data dumps, wherein the
ordering problem can be handled properly by not creating the FK
constraints until after the data is loaded.  If you really want to
do a data-only restore, I'd suggest dropping the FK constraints
and re-adding them afterwards (which will be a lot faster than
row-by-row retail checks would be, anyway).

Another possibility is the --disable-triggers option, but I can't
really recommend that, because if there are any referential problems
in the data you load, that way will fail to catch it.

            regards, tom lane

Re: BUG #2308: pg_dump -a does not respect referential dependencies

From

Jim Nasby

Date:

08 March 2006, 18:02:08

On Mar 8, 2006, at 3:01 PM, Tom Lane wrote:

> "Matthew George" <georgema@corp.earthlink.net> writes:
> The best practice is to do standard schema+data dumps, wherein the
> ordering problem can be handled properly by not creating the FK
> constraints until after the data is loaded.  If you really want to
> do a data-only restore, I'd suggest dropping the FK constraints
> and re-adding them afterwards (which will be a lot faster than
> row-by-row retail checks would be, anyway).

BTW, it would be really nice if we provided a better way to do this
than manually dropping all the FK constraints and adding them back in
later. Would it be difficult to allow deferring all constraints in
the database during a specified transaction? That would allow for
loading the data in a transaction and doing the constraint checking
later...
--
Jim C. Nasby, Database Architect                decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: BUG #2308: pg_dump -a does not respect referential dependencies

From

Tom Lane

Date:

08 March 2006, 18:37:16

Jim Nasby <decibel@decibel.org> writes:
> BTW, it would be really nice if we provided a better way to do this
> than manually dropping all the FK constraints and adding them back in
> later. Would it be difficult to allow deferring all constraints in
> the database during a specified transaction? That would allow for
> loading the data in a transaction and doing the constraint checking
> later...

You can try SET CONSTRAINTS ALL DEFERRED, but that only works for
constraints that are declared deferrable, which by default FK
constraints are not (stupid but that's what the spec requires).
In any case this would still have performance issues because the
behavior is tuned for transactions that update relatively small
numbers of rows.  Drop/add constraint is a lot better choice in
the context of a bulk load.

I was toying just now with the idea of a pg_dump mode that would issue
the drop and re-add constraint commands for you.  This would only help
for constraints that pg_dump knows of (ie were in the source database),
not any random new FK constraints that might be in the DB you are
loading into, but it'd sure beat doing it manually.

            regards, tom lane