Thread: The Right Way to manage schemas in SCM systems
I've been trying to figure out a good way to manage schema change control for a while now. Since I have a development background, I really want to find some way to check the schema into an SCM system like CVS (for example). Just using a pg_dump doesn't work very well because there's no guarantee of consistent ordering. One of the things I find most useful with SCM systems is the ability to quickly spot changes, and a re-ordering within a single file makes for a lot of noise in the deltas.

So far, the best idea I've come up with is the "file tree dump" approach: get a list of every object in the database, dump each one into a suitably named file, and finally create a master file which consists of nothing but include lines. There are a few problems I can see with this.
1) Calculating the dependency graph (or worse, dealing with RI loops) to determine the right order to load things in isn't stupidly obvious.
2) If I just script this in bash, without making any changes to pg_dump, it doesn't happen in a single transaction.
3) There's no clever solution for relative vs absolute filenames (unless all of this goes into a tar file, but that format already exists and doesn't solve my problem).

So my question is: does anyone have a better way of managing schemas in SCM systems? And if not, is there perceived value in the community for a pg_dump --format scm_friendly option? (In which case I'll take this thread over to hackers.)

Drew
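P.S. To make the "file tree dump" idea concrete, here's roughly the sort of script I have in mind. It's an untested sketch - the database name is invented, it only looks at tables in the public schema, and it still leaves problems 1 and 2 unsolved:

#!/bin/sh
# Dump each table's definition into its own file, then build a master
# file that is nothing but \i include lines for psql.
DB=mydb
OUTDIR=schema
mkdir -p "$OUTDIR"
: > "$OUTDIR/master.sql"
for tbl in $(psql -At -d "$DB" -c \
    "SELECT tablename FROM pg_tables WHERE schemaname = 'public'")
do
    pg_dump -s -t "$tbl" "$DB" > "$OUTDIR/$tbl.sql"
    echo "\\i $tbl.sql" >> "$OUTDIR/master.sql"
done
# Caveats: the include order is just whatever the catalog query returns
# (problem 1), and each pg_dump call runs separately (problem 2).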
On 11 Aug 2006 08:05:34 -0700, Andrew Hammond <andrew.george.hammond@gmail.com> wrote:
The approach you are taking here is difficult to implement but makes life easy for developers who like to make changes to the database whenever needed. It can be great for rapid development, but I have chosen to take a different approach - one that requires a little more discipline up front but is better for controlling your testing and production environments.
For every project, I first create DDL scripts for my database. I have a code builder that does this for me, but you could also just create the database and use pg_dump to get your initial DDL.
Next I save this DDL in CVS. I break each schema into a separate script. If you are worried about the order, do as pg_dump does - create the tables first, then add constraints such as foreign keys after the structure is there.
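As a minimal illustration (the table names are invented, just to show the shape), loading a structure script followed by a constraints script looks something like this:

# Structure first, constraints after - the same order pg_dump uses.
psql -d mydb <<'EOF'
CREATE TABLE customer (
    customer_id integer PRIMARY KEY,
    name        text NOT NULL
);

CREATE TABLE customer_order (
    order_id    integer PRIMARY KEY,
    customer_id integer NOT NULL
);

-- The foreign key goes in only after both tables exist, so the table
-- scripts never need to know about each other's load order.
ALTER TABLE customer_order
    ADD CONSTRAINT customer_order_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customer (customer_id);
EOF

Because the constraints come last, the order of the table definitions in CVS stops mattering.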
This is great for the initial setup. To deal with change, I have a manual process in place and use an auditing process to check that everything is done correctly. Here is how it works:
Development:
1. Copy the production database into the development database (we do this at least once a week).
2. Make the change in development - the nice thing here is that I use EMS Manager, which provides the SQL required for each change I make.
3. Copy the SQL for the change (from EMS Manager) and put it into a migration script (this will be executed in production when we deploy the application) - order is very important here. A stripped-down sketch of such a script follows this list.
4. Make the change to the initial DDL and check that change into CVS. Our code builder helps out a lot with this when the changes are large.
5. Update my application code and test with the new changes in development. Again, our code builder does a lot of this for us.
6. Drop the development database, refresh it from production, run the migration script and test the new code - if all goes well it is ready for production.
7. Deploy to production. We never allow developers to directly make changes to production. Only our administrators have that capability.
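To give a rough idea of what steps 3 and 6 look like, here is a stripped-down sketch. The database names and the statements inside the migration script are invented for illustration - in practice EMS Manager and our code builder generate most of this:

# Step 6: rebuild development from production, then apply the migration
# script that will later be run against production.
dropdb dev_db
createdb dev_db
pg_dump prod_db | psql -d dev_db

# The migration script itself is plain SQL wrapped in a transaction, in
# exactly the order the changes were captured during development (step 3).
psql -d dev_db <<'EOF'
BEGIN;
ALTER TABLE customer ADD COLUMN middle_name text;
UPDATE customer SET middle_name = '' WHERE middle_name IS NULL;
COMMIT;
EOF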
Audit:
1. Get a copy of production and put it into development.
2. Run the DDL from CVS and put it into an audit database.
3. Run a database diff - we use DB Comparer (same company as EMS Manager); a crude alternative using pg_dump and diff is sketched after this list.
4. Reconcile any differences and fold them back into the DDL in CVS.
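If you don't have DB Comparer, a crude approximation with nothing but the standard tools is to load the CVS DDL into a scratch database and diff schema-only dumps. The database and file names below are invented, and pg_dump's ordering can still add noise to the diff, as Andrew pointed out:

# Build the audit database from the DDL in CVS, then compare its
# schema-only dump against production's with plain diff.
createdb audit_db
psql -d audit_db -f ddl_from_cvs.sql
pg_dump -s prod_db  > /tmp/prod_schema.sql
pg_dump -s audit_db > /tmp/audit_schema.sql
diff -u /tmp/prod_schema.sql /tmp/audit_schema.sql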
We also will, from time to time, test the application against the DDL rather than from a copy of production.
Currently we are working on ways to automate this mostly manual process. The nice thing is, the tools we use allow us to do the work fairly quickly. I like a little manual work in the whole process, though, as it keeps the developers better attuned to the database structure and the changes that are being made to it.
If you are just looking for a way to back up the structure in CVS every day, you might consider writing a script (Perl would be an excellent choice for this) that reads the pg_dump output and splits it out into separate files for each schema/table.
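Perl would certainly do the job; as a quick, untested sketch of the same idea, this splits a schema-only dump on the "-- Name: ...; Type: ...;" comment headers that pg_dump puts in its plain-text output (the directory and database names are invented, and objects that happen to share a name will end up in the same file):

# One file per object, keyed on pg_dump's comment headers.
mkdir -p objects
pg_dump -s mydb |
awk '/^-- Name: / {
         name = $3
         sub(/;$/, "", name)
         gsub(/[^A-Za-z0-9_.]/, "_", name)
         out = "objects/" name ".sql"
     }
     out != "" { print > out }'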
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================