Thread: Disabling and enabling constraints and triggers to make pg_restore work
I'm trying to do a data-only pg_restore. I'm running into a roadblock whenever the restore tries to populate a table with a foreign key before it has populated the primary key table that it refers to: this violates the FK constraint, which aborts the restore. Given the complexity of the database I'm trying to restore, the prospect of having to manually order the restore process so that all PK tables are always populated before all FK tables that reference them is daunting. Even if I did that, I don't think it would handle recursive relationships, where a FK refers to its own table's PK.

What I need is a way to disable FKs, so I can restore all the data and then re-enable the FKs. I first looked for something like ALTER TABLE mytab DISABLE CONSTRAINT mycon and ALTER TABLE mytab ENABLE CONSTRAINT mycon, a la Oracle. I finally found a French PostgreSQL forum (http://www.postgresqlfr.org/?q=node/156#comment) that says there's no such thing in PostgreSQL. Someone on that forum suggested "update pg_catalog.pg_class set relchecks=0 where relname='mytab'" to disable and "update pg_catalog.pg_class set relchecks=1 where relname='mytab'" to re-enable. But to write to pg_catalog you apparently need to be a superuser, which alas I'm not.

I also have some triggers that I think I may need to be able to disable. pg_restore does have an option to do that, but according to http://www.postgresql.org/docs/8.1/static/app-pgrestore.html this option is available to superusers only.

(Perhaps I could write a program that drops all my FKs and triggers, and a second program that recreates them after the data restore is complete. But that seems a rather brutal and scary way to patch up a gap in the PostgreSQL utilities.)

Any suggestions?

~ TIA
~ Ken Winter
"Ken Winter" <kwinter@umich.edu> writes: > I�m trying to do a data-only pg_restore. I�m running into a roadblock > whenever the restore tries to populate a table with a foreign key before it > has populated the primary key table that it refers to: This violates the FK > constraint, which aborts the restore. The simplest answer is "don't do that". A full restore (schema+data) will work correctly. A data-only restore cannot hope to guarantee referential integrity, other than by dropping and later recreating all the FK constraints, and there's a fatal flaw in that plan: what if it doesn't know about all the FK constraints that are in the database it's trying to load into? There could be permissions problems too, if you're trying to do it as non-superuser. > Someone on that forum suggested �update > pg_catalog.pg_class set relchecks=0 where relname =�mytab�� to disable and > �update pg_catalog.pg_class set relchecks=1 where relname =�mytab�� to > re-enable. But to write to pg_catalog you apparently need to be a > superuser, which alas I'm not. You should certainly not do anything as risky as messing with relchecks by hand --- there was a case just a couple weeks ago where someone destroyed his database by fat-fingering an update command like this :-(. pg_dump has had a --disable-triggers option for years, and using that is far safer. There's also (as of 8.1) an ALTER TABLE DISABLE TRIGGERS command, which is even safer. However these still require superuser privileges, because you can easily break referential consistency by misusing them. regards, tom lane
Tom et al ~

I understand that this is all a risky business. So maybe someone can tell me a safer way to accomplish what I was trying to accomplish with pg_dump and data-only pg_restore.

It's a basic bit of database administration work. I'm trying to establish two (or if necessary more) instances of the same database - a production instance and a development instance - and a change management process for coordinating them. As you can probably guess:

1. The production instance is the one the users are actually using. Its data are The Truth.

2. The development instance is where design changes (to tables, procedures, and all other database objects) are developed, tested, and readied to go into production. Its data are of no value except for testing purposes.

3. The crucial purpose of the change management process is to put into production each new release of the database design. The required outcome is that the new design (from the development instance) be up and running in the production instance, managing the production data (from the production instance), which have been preserved without loss. (Of course, certain changes in the new design - dropping a table, for example - will cause certain data to be lost. That's not the problem I'm wrestling with here.)

So, the process I have in mind goes like this:

1. At the start of a release cycle, drop everything from the development instance, and copy the schema (with the production data if you want) from the production instance into the development instance.

2. During the release cycle, the users use the production instance (including modifying the data in it), and the developers do their work (which is modifying the design) in the development instance. The developers can do whatever they want to the data in the development instance.

3. At the end of the release cycle, empty all the data from the development instance, shut down the production instance (or at least write-lock its data), and copy the production data (data only) into the development instance. Then shut down the production instance to users, drop everything in the production instance, copy everything (data + schema) from development into production, and reopen it to users.

Note that step 3 also accomplishes step 1 of the next release cycle, so as soon as step 3 is done, the users can go on using the data and the developers can begin developing the next release.

It seemed to me that the PostgreSQL utilities for accomplishing this process would be pg_dump and pg_restore. And I've been successful using them - except for that point in step 3 where I need to do a data-only restore from production into development. Then I run into the FK violations (when a FK table gets restored before its PK table) that led me to start this discussion thread.

So, to say it again: What I *have* to accomplish is to separate my production environment from my development environment, and periodically to move the new version of the design from development into production. I've spelled out my whole approach from top to bottom in hopes that someone can suggest something at some level (different process? different utilities? different something else?) that will make this possible with PostgreSQL. There must be thousands of DBAs out there who have solved this problem, and I hope one of you can tell me how to join your happy ranks.
~ TIA
~ Ken
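A rough sketch of the step-3 transfer described above, with invented database names prod and dev and invented file names, just to make the failure point concrete (the data-only restore into dev is where the FK violations abort the run):

# copy production data only into the (already emptied) development instance
pg_dump --format=custom --data-only prod > prod_data.dump
pg_restore --data-only --dbname=dev prod_data.dump    # <- FK violations appear here

# then replace production wholesale with the development schema + data
pg_dump --format=custom dev > dev_full.dump
dropdb prod
createdb prod
pg_restore --dbname=prod dev_full.dump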
Ken Winter wrote:
> Tom et al ~
>
> I understand that this is all a risky business. So maybe someone can tell
> me a safer way to accomplish what I was trying to accomplish with pg_dump
> and data-only pg_restore.
>
> It's a basic bit of database administration work. I'm trying to establish
> two (or if necessary more) instances of the same database - a production
> instance and a development instance - and a change management process for
> coordinating them. As you can probably guess:
>
> 1. The production instance is the one the users are actually using. Its
> data are The Truth.
>
> 2. The development instance is where design changes (to tables, procedures,
> and all other database objects) are developed, tested, and readied to go
> into production. Its data are of no value except for testing purposes.
>
> 3. The crucial purpose of the change management process is to put into
> production each new release of the database design.

Ken,

The approach that has worked very well for me over the years is to use a data dictionary outside of the system that is used to build the tables. A large part of the development process is making changes to this dictionary. You "post" a unit of work to the dev system by running a diff builder that updates the dev system. When the work is deemed acceptable, you post the same alterations to the production server, or you copy the spec in total from dev to production and run a build there. A good system lets you provide values for system tables as part of the spec.

This approach is far more general-purpose than what you are attempting. It may be too general-purpose, and your backup/restore system may be just fine, but the backup/restore system will not scale well.
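To make the "post a unit of work" idea concrete, here is an invented illustration of the kind of change script a spec-diff builder might generate and apply first to the dev system and later, unchanged, to production (database, table, and column names are all made up; the real tooling is not specified in the post above):

psql -d dev <<'SQL'
BEGIN;
-- structural changes derived from the spec diff
ALTER TABLE customer ADD COLUMN preferred_contact text;
ALTER TABLE invoice ALTER COLUMN due_date SET NOT NULL;
-- values for reference/system tables carried as part of the spec
INSERT INTO status_code (code, description) VALUES ('HOLD', 'On hold');
COMMIT;
SQL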
Ken Winter wrote:
> It's a basic bit of database administration work. I'm trying to establish
> two (or if necessary more) instances of the same database - a production
> instance and a development instance - and a change management process for
> coordinating them. As you can probably guess:

You need three: DEV, QAT, and PRD. Changes must be moved in a controlled fashion from DEV through QAT before committing to PRD. Your developers generally do not touch QAT or PRD.

> 1. The production instance is the one the users are actually using. Its
> data are The Truth.
>
> 2. The development instance is where design changes (to tables, procedures,
> and all other database objects) are developed, tested, and readied to go
> into production. Its data are of no value except for testing purposes.

Wrong. TESTING happens in QAT; DEVELOPMENT happens in DEV. Keep your thinking, and that of your team, straight and disciplined in this regard. I suppose developers do "test" in DEV as they develop, but Testing with a capital T, meaning acceptance-for-production testing, happens separately from the DEV environment: it is performed not by developers but by the customer -- or a sufficiently close facsimile thereof. That way you practice in QAT exactly what you will do to PRD, so that you know it works before you do it for real.

> 3. The crucial purpose of the change management process is to put into
> production each new release of the database design. The required outcome is
> that the new design (from the development instance) be up and running in the
> production instance, managing the production data (from the production
> instance), which have been preserved without loss. (Of course, certain
> changes in the new design - dropping a table, for example - will cause
> certain data to be lost. That's not the problem I'm wrestling with here.)
>
> So, the process I have in mind goes like this:
>
> 1. At the start of a release cycle, drop everything from the development
> instance, and copy the schema (with the production data if you want) from
> the production instance into the development instance.

This is where you use pg_dump, pg_restore, and psql for the first time -- to create QAT and DEV instances that are identical to PRD. Maybe you have something more sophisticated that provides a means to limit the amount and protect confidentiality of the PRD data by some transformation on the way out to DEV, but QAT should, ideally, be identical to PRD, and access to QAT should be restricted pretty much the same as to PRD.

> 2. During the release cycle, the users use the production instance
> (including modifying the data in it), and the developers do their work
> (which is modifying the design) in the development instance. The developers
> can do whatever they want to the data in the development instance.

The last sentence suggests a lack of discipline (which is revealed more fully in step 3). The "whatever they want" has to be formalized. Any changes the developers propose making to the database, be they DDL or DML, need to happen via a script or set of scripts, which are fully documented and commented so that the changes are understandable, repeatable, auditable and testable. In the perfect world of my imagination, the scripts would be digitally signed with gnupg by developers for release to QAT, and then signed by the customer for release to PRD. And the scripts would be verified as authentic and unchanged prior to application against QAT and PRD.
You might not be able to script your changes to the application, but you certainly will maintain change control using Subversion and tag releases corresponding to the DDL/DML script sets; in fact, the scripts will be part of your SVN repository, and similarly follow a gnupg signing policy. In fact, your entire revision should go in an RPM so that the database changes and application changes are released as a single unit that is signed and tamper resistant and can be applied in a simple and easily repeated manner.

> 3. At the end of the release cycle, empty all the data from the development
> instance, shut down the production instance (or at least write-lock its
> data), and copy the production data (data only) into the development
> instance. Then shut down the production instance to users, drop everything
> in the production instance, copy everything (data + schema) from development
> into production, and reopen it to users.

This is a very screwy procedure. Don't do it like that. Test the entire release in QAT by an automated application of all the changes -- DDL/DML scripts for the database, and merge of source code diffs from the SVN repository for the application. Have the customer test QAT until comfortable and then digitally sign the change package. Then apply the change to PRD exactly like you did to QAT.
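As an illustration of applying one tagged, signed release identically to QAT and then PRD, a hedged sketch follows; the repository URL, tag name, file names, and database names are all invented:

# export the tagged release and verify the detached signature on the script
svn export https://svn.example.org/repo/tags/release-1.4/db release-1.4-db
cd release-1.4-db
gpg --verify changes.sql.sig changes.sql

# rehearse against QAT, stopping on the first error
psql --set ON_ERROR_STOP=1 -d qat -f changes.sql

# after customer sign-off, run the identical command against PRD
psql --set ON_ERROR_STOP=1 -d prd -f changes.sql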
"Ken Winter" <ken@sunward.org> writes: > ... > So, the process I have in mind goes like this: > ... > 3. At the end of the release cycle, empty all the data from the development > instance, shut down the production instance (or at least write-lock up its > data), and copy the production data (data only) into the development > instance. Then shut down the production instance to users, drop everything > in the production instance, copy everything (data + schema) from development > into production, and reopen it to users. I'm not really sure why you'd expect a tool like pg_dump to be helpful with this sort of scenario. In almost every interesting case, there's going to be something you have to *do* to the data in order to load it into the new schema --- you'll have refactored the tables, changed representations, or whatever. So ISTM that foreign key constraints are the least of your worries; and even more to the point, you almost certainly *don't* want to bypass revalidating them, if you've made any least little change in the data relationships. So my advice would be "drop the FK constraints, load data, re-add FK constraints". The nearby comments about having a more formalized development process sound good to me too, but in any case I think this application goes beyond what pg_dump can be expected to do. regards, tom lane