Thread: What pg_restore does to a non-empty target database

What pg_restore does to a non-empty target database

From
"Ken Winter"
Date:
I need to understand, in as much detail as possible, the results that will
occur when pg_restore restores from an archive file into a target database
that already contains some database objects.  I can't find any reference
that spells this out.  (The PG manual isn't specific enough.)

Instead of just asking questions, I decided to make my best guess about the
answers (below), and ask you to confirm, refute, and correct my guesses
until this becomes a complete and accurate statement.  If I have left out
any conditions that would affect the results, please add them into the
account.  A definitive version of this story might be worth posting in some
more permanent and visible place than an e-mail thread.

In case it matters, I'm currently working with PostgreSQL 8.0.  I don't know
if the truth I'm seeking here is version-dependent.  Also, I'm assuming the
archive is in compressed format.  I don't know how different the story would
be if the archive were in a different format.

~ TIA
~ Ken



Given a pg_restore command (possibly filtered and reordered by a ToC file),
where:
 * A is the source archive file (as filtered and reordered by the ToC file,
   if any)
 * T is the target database
 * O is a database object (table, function, etc) that exists in A
   and/or in C

The following are the changes that the pg_restore will produce in T.

If object O exists in both A and T:
      If the command says "--clean":
        T's version of O is dropped
          A's version of O is created
    Else:
        T's version of O is left unchanged
If object O exists in T but not in A:
    T's version of O is left unchanged
If object O exists in A but not in T:
    A's version is created

Suppose in addition that O is a data object (a table or sequence) that is
defined by the database schema and contains data in both A and T.

If the command says "--data-only":
    T's schema definition of O is left unchanged
    T's O data are deleted
    A's O data are inserted
If the command says "--schema-only":
    T's schema definition of O is dropped
    T's O data are deleted (as a side-product of the drop)
    A's schema definition of O is created
    No O data are inserted
If the command says "--data-only" and "--schema-only":
    T's schema definition of O is left unchanged
    T's O data are left unchanged
    In other words, nothing changes
If the command says neither "--data-only" nor "--schema-only":
    T's schema definition of O is dropped
    T's O data are deleted (as a side-product of the drop)
    A's schema definition of O is created
    A's O data are inserted
    In other words, A's version of O entirely replaces T's version

Suppose in addition that the command says "--data-only", it doesn't say
"--exit-on-error", and T's schema definition of O is different from A's.

If T's schema includes a column O.C that does not exist in A's schema:
    A's O data are inserted, and O.C is Null in all rows
If A's schema includes a column O.C that does not exist in T's schema:
    A's O data are inserted, but A's values of O.C are lost
If T's schema includes a constraint K that does not exist in A's schema:
    A's O data are inserted, except for those that violate K
If A's schema includes a constraint K that does not exist in T's schema:
    A's O data are all inserted




Re: What pg_restore does to a non-empty target database

From
Tom Lane
Date:
"Ken Winter" <ken@sunward.org> writes:
> I need to understand, in as much detail as possible, the results that will
> occur when pg_restore restores from an archive file into a target database
> that already contains some database objects.  I can't find any reference
> that spells this out.  (The PG manual isn't specific enough.)

It's not nearly as clean as you are hoping, unless you use --clean.

When you don't, the restore will just try to issue the creation
commands.  Obviously the initial CREATE for a conflicting object will
fail due to a name collision, but in some cases the script involves a
CREATE followed by various ALTERs, which might or might not succeed
against the object that was there before.  And don't forget that any
data to be inserted into a table will be appended to what was there
before, assuming that the existing table is close enough to the schema
of the source table for this to succeed.

You could run the restore in exit-on-error mode, which would ensure
no damage to existing objects, but then any new objects following the
first error wouldn't get loaded.

We don't (and won't) specify either the order in which pg_dump dumps
unrelated objects or the exact breakdown of CREATE/ALTER operations,
so the details of what would happen are very subject to change across
releases.

Bottom line: best use --clean if you want at-all-predictable behavior
for this scenario.

> If the command says "--data-only":
>     T's schema definition of O is left unchanged
>     T's O data are deleted
>     A's O data are inserted

In no case (other than --clean) will the restore attempt to remove any
existing data.  None of the other fancy stuff you have invented out of
thin air will happen, either.

One control you have over what happens in a data-only restore is to dump
the data as COPY (one command per table) or as INSERTs (one command per
row).  If you use COPY, and there's any error in the incoming data for a
table, then none of it gets inserted; while with INSERTs each row
succeeds or fails independently.  I can see uses for each behavior.

            regards, tom lane

Re: What pg_restore does to a non-empty target database

From
"Ken Winter"
Date:
Based on Tom Lane's response, here is version 2 of my attempt to document
what pg_restore does to a target database that already contains objects.

Version 2 has been limited to the case where pg_dump was run with the
--column-inserts option and pg_restore is run with the --clean option.
Also, when there is a possible difference between what pg_restore "tries" to
do (i.e. the SQL actions it invokes) and what actually happens, Version 2
indicates the latter on a line labeled "Exception:".

Version 2 depends on the answer to the following question:  Is it true that
the sequence of SQL commands that is extracted into a file by:

    pg_restore --file=foo.sql [other arguments] foo.bak

is exactly the sequence of SQL commands that will be executed in database
foo by the command:

    pg_restore --dbname=foo [other arguments] foo.bak

(assuming the same [other arguments] in both cases)?

If so, having looked at the extracted SQL from several examples, I'm
thinking it's a safe bet that no pg_restore DROP, DELETE, or CREATE action
will ever fail (except when a DROP's target object does not exist), because
pg_restore sequences the "cleaning" activities so they get rid of dependent
objects first.  Right?

If so, that would seem to guarantee that every action that pg_restore tries
to do in the narrative below is guaranteed to succeed, except INSERTs, which
can fail only as described below.

If not, I have another version of the story below that is a lot more
qualified and complicated.

As before, I solicit your confirmations, corrections, and additions of this
document, hoping to get it to the point where my project team (and anyone
else who wants it) can use it with confidence.

~ Thanks
~ Ken

---------------------------

WHAT PG_RESTORE DOES
Version 2

Given a pg_restore command, where:
* A is the source archive file (as filtered and reordered by a ToC file,
  if any) produced by pg_dump with "--format=t" and "--column-inserts"
  options.
* T is the target database.
* O is a database object (table, function, etc) that exists in A
  and/or in C.
* The pg_restore command has the --clean option, and it does not have the
  --table= or --trigger= or "--exit-on-error or --disable-triggers options.

The following are the changes that the pg_restore command will produce in
database T.

EFFECTS ON SCHEMA OBJECTS

If object O exists in both A and T, pg_restore:
    * Drops T's version of O.
         * Creates A's version of O.
If object O exists in T but not in A, pg_restore:
    * Leaves T's version of O unchanged.
If object O exists in A but not in T, pg_restore:
      * Creates A's version of O.

EFFECTS ON TABLES AND THEIR DATA

Suppose in addition that:
* Database object O is a base table.
* O contains data in both archive A and target database T.

If the command says "--data-only", pg_restore:
    * Leaves T's schema definition of O unchanged.
    * Tries to delete all of T's O data.
        If this causes FK violations, the result depends row-by-row
on
        the ON DELETE action of the FK constraint.
    * Tries to insert all of A's O data.
        The INSERT of any row that causes constraint violations
        or other fatal errors (see below) fails.
If the command says "--schema-only", pg_restore:
    * Drops T's version of O, which deletes T's O data as a side-effect.
         * Create A's version of O.
    * Does not try to insert any of A's O data, so O ends up empty.
If the command says "--data-only" and "--schema-only", pg_restore:
    * Leaves O and its data unchanged.
If the command says neither "--data-only" nor "--schema-only", pg_restore:
    * Drops T's version of O.
    * Assumes that T's O data were deleted (as a side-product of the
DROP)
         * Creates A's version of O.
    * Inserts all of A's O data.

EFFECTS OF DIFFERENCES BETWEEN A AND T TABLE SCHEMAS

Suppose in addition that:
* The pg_restore command says "--data-only".
* T's schema definition of table O is different from A's.

If column O.C exists in T's schema but not in A's:
    * O.C is set to Null in all rows that pg_restore inserts.
If column O.C exists in A's schema but not in T's:
    * The O.C values are lost from all rows that pg_restore inserts.
If column O.C exists in both schemas with incompatible types:
    * All inserts of rows from A fail.
If constraint K exists in T's schema but not in A's:
    * Inserts of rows from A that would violate K fail.
If constraint K exists in A's schema but not in T's:
    * K has no effect on the insertion of rows from A.