Thread: Disabling and enabling constraints and triggers to make pg_restore work

Disabling and enabling constraints and triggers to make pg_restore work

From
"Ken Winter"
Date:
I’m trying to do a data-only pg_restore.  I’m running into a roadblock
whenever the restore tries to populate a table with a foreign key before it
has populated the primary key table that it refers to: This violates the FK
constraint, which aborts the restore.

Given the complexity of the database I’m trying to restore, the prospect of
having to manually order the restore process so that all PK tables are always
populated before all FK tables that reference them is daunting.  Even if I
did that, I don’t think that would handle recursive relationships, where a
FK refers to its own table’s PK.
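
For example, a recursive relationship of that kind might look like this
(table and column names are made up):

    CREATE TABLE employee (
        emp_id      integer PRIMARY KEY,
        manager_id  integer REFERENCES employee (emp_id)  -- FK back to this table's own PK
    );

Ordering whole tables can't help there; the rows within the table itself
would have to arrive parents-first.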

What I need is a way to disable FKs, so I can then restore all the data, and
then re-enable the FKs.  I first looked for something like “ALTER TABLE
mytab DISABLE CONSTRAINT mycon” and “ALTER TABLE mytab ENABLE CONSTRAINT
mycon” à la Oracle.  I finally found a French PostgreSQL forum
(http://www.postgresqlfr.org/?q=node/156#comment) that says there’s no such
thing in PostgreSQL.  Someone on that forum suggested “update
pg_catalog.pg_class set relchecks=0 where relname =’mytab’” to disable and
“update pg_catalog.pg_class set relchecks=1 where relname =’mytab’” to
re-enable.  But to write to pg_catalog you apparently need to be a
superuser, which alas I'm not.

I also have some triggers that I think I may need to be able to disable.
pg_restore does have an option to do that, but according to
http://www.postgresql.org/docs/8.1/static/app-pgrestore.html this option is
available to superusers only.

(Perhaps I could write a program that drops all my FKs and triggers, and a
second program that recreates them after the data restore is complete.  But
that seems a rather brutal and scary way to patch up a gap in the PostgreSQL
utilities.)
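
If it came to that, the drop and re-create statements could presumably be
generated from the system catalogs rather than written by hand.  A rough,
untested sketch:

    -- Generate (and save somewhere safe!) the re-create statements first:
    SELECT 'ALTER TABLE ' || conrelid::regclass::text || ' ADD CONSTRAINT '
           || quote_ident(conname) || ' ' || pg_get_constraintdef(oid) || ';'
      FROM pg_constraint
     WHERE contype = 'f';

    -- Then generate the drops:
    SELECT 'ALTER TABLE ' || conrelid::regclass::text || ' DROP CONSTRAINT '
           || quote_ident(conname) || ';'
      FROM pg_constraint
     WHERE contype = 'f';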

Any suggestions?

~ TIA
~ Ken Winter



"Ken Winter" <kwinter@umich.edu> writes:
> I’m trying to do a data-only pg_restore.  I’m running into a roadblock
> whenever the restore tries to populate a table with a foreign key before it
> has populated the primary key table that it refers to: This violates the FK
> constraint, which aborts the restore.

The simplest answer is "don't do that".  A full restore (schema+data)
will work correctly.  A data-only restore cannot hope to guarantee
referential integrity, other than by dropping and later recreating all
the FK constraints, and there's a fatal flaw in that plan: what if it
doesn't know about all the FK constraints that are in the database it's
trying to load into?  There could be permissions problems too, if you're
trying to do it as non-superuser.

> Someone on that forum suggested “update
> pg_catalog.pg_class set relchecks=0 where relname =’mytab’” to disable and
> “update pg_catalog.pg_class set relchecks=1 where relname =’mytab’” to
> re-enable.  But to write to pg_catalog you apparently need to be a
> superuser, which alas I'm not.

You should certainly not do anything as risky as messing with relchecks
by hand --- there was a case just a couple weeks ago where someone
destroyed his database by fat-fingering an update command like this :-(.
pg_dump has had a --disable-triggers option for years, and using that is
far safer.  There's also (as of 8.1) an ALTER TABLE DISABLE TRIGGERS
command, which is even safer.  However these still require superuser
privileges, because you can easily break referential consistency by
misusing them.
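
For reference, the 8.1 per-table form is roughly (placeholder table name):

    ALTER TABLE mytab DISABLE TRIGGER ALL;  -- includes the internal FK triggers
    -- ... load the data ...
    ALTER TABLE mytab ENABLE TRIGGER ALL;

It is disabling the internally generated constraint triggers that demands
superuser rights.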

            regards, tom lane

Re: Disabling and enabling constraints and triggers to make pg_restore work

From
"Ken Winter"
Date:
Tom et al ~

I understand that this is all a risky business.  So maybe someone can tell
me a safer way to accomplish what I was trying to accomplish with pg_dump
and data-only pg_restore.

It's a basic bit of database administration work.  I'm trying to establish
two (or if necessary more) instances of the same database - a production
instance and a development instance - and a change management process for
coordinating them.  As you can probably guess:

1. The production instance is the one the users are actually using.  Its
data are The Truth.

2. The development instance is where design changes (to tables, procedures,
and all other database objects) are developed, tested, and readied to go
into production.  Its data are of no value except for testing purposes.

3. The crucial purpose of the change management process is to put into
production each new release of the database design.  The required outcome is
that the new design (from the development instance) be up and running in the
production instance, managing the production data (from the production
instance), which have been preserved without loss.  (Of course, certain
changes in the new design - dropping a table, for example - will cause
certain data to be lost.  That's not the problem I'm wrestling with here.)

So, the process I have in mind goes like this:

1. At the start of a release cycle, drop everything from the development
instance, and copy the schema (with the production data if you want) from
the production instance into the development instance.

2. During the release cycle, the users use the production instance
(including modifying the data in it), and the developers do their work
(which is modifying the design) in the development instance.  The developers
can do whatever they want to the data in the development instance.

3. At the end of the release cycle, empty all the data from the development
instance, shut down the production instance (or at least write-lock its
data), and copy the production data (data only) into the development
instance.  Then shut down the production instance to users, drop everything
in the production instance, copy everything (data + schema) from development
into production, and reopen it to users.

Note that step 3 also accomplishes step 1 of the next release cycle, so as
soon as step 3 is done, the users can go on using the data and the
developers can begin developing the next release.

It seemed to me that the PostgreSQL utilities for accomplishing this process
would be pg_dump and pg_restore.  And I've been successful using them -
except for that point in step 3 where I need to do a data-only restore from
production into development.  Then I run into the FK violations (when a FK
table gets restored before its PK table) that led me to start this
discussion thread.

So, to say it again:  What I *have* to accomplish is to separate my
production environment from my development environment, and periodically to
move the new version of the design from development into production.  I've
spelled out my whole approach from top to bottom in hopes that someone can
suggest something at some level (different process? different utilities?
different something else?) that will make this possible with PostgreSQL.
There must be thousands of DBAs out there who have solved this problem, and
I hope one of you can tell me how to join your happy ranks.

~ TIA
~ Ken




Re: Disabling and enabling constraints and triggers to make pg_restore work

From
Kenneth Downs
Date:
Ken Winter wrote:

>Tom et al ~
>
>I understand that this is all a risky business.  So maybe someone can tell
>me a safer way to accomplish what I was trying to accomplish with pg_dump
>and data-only pg_restore.
>
>It's a basic bit of database administration work.  I'm trying to establish
>two (or if necessary more) instances of the same database - a production
>instance and a development instance - and a change management process for
>coordinating them.  As you can probably guess:
>
>1. The production instance is the one the users are actually using.  Its
>data are The Truth.
>
>2. The development instance is where design changes (to tables, procedures,
>and all other database objects) are developed, tested, and readied to go
>into production.  Its data are of no value except for testing purposes.
>
>3. The crucial purpose of the change management process is to put into
>production each new release of the database design.
>
Ken,

The approach that has worked very well for me over the years is to use a
data dictionary, maintained outside of the system, to build the tables.
A large part of the development process is making changes to this
dictionary.  You "post" a unit of work to the dev system by running a
diff builder that updates the dev system.  When the work is deemed
acceptable, you post the same alterations to the production server, or
you copy the spec in total from dev to production and run a build
there.  A good system lets you provide values for system tables as part
of the spec.
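
Purely as an illustration (the layout and names here are invented), such a
spec can be as simple as a pair of tables that the diff builder compares
against information_schema:

    -- Hypothetical external data dictionary.  A diff builder would compare
    -- these rows against information_schema.tables / information_schema.columns
    -- and emit the CREATE TABLE / ALTER TABLE statements needed to catch up.
    CREATE TABLE dd_table (
        table_name   text PRIMARY KEY
    );

    CREATE TABLE dd_column (
        table_name   text REFERENCES dd_table,
        column_name  text,
        data_type    text NOT NULL,
        PRIMARY KEY (table_name, column_name)
    );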

This approach is far more general-purpose than what you are attempting.
It may be too general-purpose, and your backup/restore system may be
just fine, but the backup/restore system will not scale well.





Re: Disabling and enabling constraints and triggers to make pg_restore work

From
Berend Tober
Date:
Ken Winter wrote:

>It's a basic bit of database administration work.  I'm trying to establish
>two (or if necessary more) instances of the same database - a production
>instance and a development instance - and a change management process for
>coordinating them.  As you can probably guess:
>
>
You need three: DEV, QAT, and PRD. Changes must be moved in a controlled
fashion from DEV through QAT before committing to PRD. Your developers
generally do not touch QAT or PRD.

>1. The production instance is the one the users are actually using.  Its
>data are The Truth.
>
>2. The development instance is where design changes (to tables, procedures,
>and all other database objects) are developed, tested, and readied to go
>into production.  Its data are of no value except for testing purposes.
>
>
Wrong. TESTING happens in QAT; DEVELOPMENT happens in DEV. Keep your
thinking, and that of your team, straight and disciplined in this
regard. I suppose developers do "test" in DEV as they develop, but
Testing with a capital T, meaning acceptance-for-production testing,
happens separately from the DEV environment: it is performed not by
developers but by the customer -- or a sufficiently close facsimile
thereof. That way you practice in QAT exactly what you will do to PRD,
so that you know it works before you do it for real.

>3. The crucial purpose of the change management process is to put into
>production each new release of the database design.  The required outcome is
>that the new design (from the development instance) be up and running in the
>production instance, managing the production data (from the production
>instance), which have been preserved without loss.  (Of course, certain
>changes in the new design - dropping a table, for example - will cause
>certain data to be lost.  That's not the problem I'm wrestling with here.)
>
>So, the process I have in mind goes like this:
>
>1. At the start of a release cycle, drop everything from the development
>instance, and copy the schema (with the production data if you want) from
>the production instance into the development instance.
>
>
This is where you use pg_dump, pg_restore, and psql for the first time --
to create QAT and DEV instances that are identical to PRD. Maybe you
have something more sophisticated that provides a means to limit the
amount and protect confidentiality of the PRD data by some
transformation on the way out to DEV, but QAT should, ideally, be
identical to PRD, and access to QAT should be restricted pretty much the
same as to PRD.

>2. During the release cycle, the users use the production instance
>(including modifying the data in it), and the developers do their work
>(which is modifying the design) in the development instance.  The developers
>can do whatever they want to the data in the development instance.
>
>
The last sentence suggests a lack of discipline (which is revealed more
fully in step 3). The "whatever they want" has to be formalized. Any
changes the developers propose making to the data base, be they DDL or
DML, need to happen via a script or set of scripts, which are fully
documented and commented so that the changes are understandable,
repeatable, auditable, and testable. In the perfect world of my
imagination, the scripts would be digitally signed with gnupg by
developers for release to QAT, and then signed by the customer for
release to PRD. And the scripts would be verified as authentic and
unchanged prior to application against QAT and PRD.
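
For example, a release script could look roughly like this (the file name,
table, and column are invented for illustration):

    -- release_042_add_phone.sql (hypothetical)
    BEGIN;
    ALTER TABLE person ADD COLUMN phone varchar(20);           -- DDL change
    UPDATE person SET phone = 'unknown' WHERE phone IS NULL;   -- DML backfill
    COMMIT;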

You might not be able to script your changes to the application, but you
certainly will maintain change control using Subversion and tag releases
corresponding to the DDL/DML script sets; in fact, the scripts will be
part of your SVN repository, and similarly follow a gnupg signing
policy. In fact, your entire revision should go in an RPM so that the data
base changes and application changes are released as a single unit that
is signed and tamper-resistant and can be applied in a simple and easily
repeated manner.

>3. At the end of the release cycle, empty all the data from the development
>instance, shut down the production instance (or at least write-lock its
>data), and copy the production data (data only) into the development
>instance.  Then shut down the production instance to users, drop everything
>in the production instance, copy everything (data + schema) from development
>into production, and reopen it to users.
>
>

This is a very screwy procedure. Don't do it like that.

Test the entire release in QAT by an automated application of all the
changes -- DDL/DML scripts for the data base, and merge of source code
diffs from the SVN repository for the application. Have the customer
test QAT until comfortable and then digitally sign the change package.
Then apply the change to PRD exactly like you did to QAT.



"Ken Winter" <ken@sunward.org> writes:
> ...
> So, the process I have in mind goes like this:
> ...
> 3. At the end of the release cycle, empty all the data from the development
> instance, shut down the production instance (or at least write-lock its
> data), and copy the production data (data only) into the development
> instance.  Then shut down the production instance to users, drop everything
> in the production instance, copy everything (data + schema) from development
> into production, and reopen it to users.

I'm not really sure why you'd expect a tool like pg_dump to be helpful
with this sort of scenario.  In almost every interesting case, there's
going to be something you have to *do* to the data in order to load it
into the new schema --- you'll have refactored the tables, changed
representations, or whatever.  So ISTM that foreign key constraints
are the least of your worries; and even more to the point, you almost
certainly *don't* want to bypass revalidating them, if you've made any
least little change in the data relationships.  So my advice would
be "drop the FK constraints, load data, re-add FK constraints".

The nearby comments about having a more formalized development process
sound good to me too, but in any case I think this application goes
beyond what pg_dump can be expected to do.

            regards, tom lane