Thread: Schema version control
Hi all,

My company is having trouble managing how we upgrade schema changes across many versions of our software. I imagine this is a common problem and there're probably some neat solutions that we don't know about.

For the last 10 years we have been writing bash shell scripts essentially numbered in order: db0001, db0002, db0003.... The number represents the schema version, which is recorded in the database and updated by the shell scripts. We have a template that provides all the functionality we need; we just copy the script and fill in the blanks. The schema upgrade scripts are committed to svn along with the software changes, and we have a process when installing the software at a site that runs the scripts on the DB in order before starting up the new version of the software.

This has worked really well so far, but we've effectively only had one version of the software in development at any time. We're now in the habit of branching the software to form releases to promote stability when making large changes. The idea is that only really important changes are merged into the releases. This introduces a bit of a problem when some change needs to be merged from one release to another. The typical problem is that we might have two versions of the software, 10.0 at schema version 10057 and 11.0 at 11023, and we need to merge an important bug fix from schema 11023 into 10.0. The issue is that 11023 might depend upon changes introduced in the schema versions before it. Or 11023 might introduce changes that cause later scripts (11000 - 11023) to break when upgrading from 10.0 to 11.0.

One potential solution is to require that schema changes are never merged into a release, but of course sometimes business requires we do =(

I'm really interested to hear how you guys manage schema upgrades in the face of branches and upgrading from many different versions of the database. I've been reading http://pyrseas.wordpress.com/2011/02/07/version-control-part-2-sql-databases/ but I have a feeling that this blog post won't address branches.

Cheers!

--Royce
On 2/10/2011 3:38 PM, Royce Ausburn wrote:
> My company is having trouble managing how we upgrade schema changes
> across many versions of our software. [...]

So, 10.0 at 10057.
11.0 at 11023.

Then 10.1 needs some fixes, so the db is bumped to 10058.

Then, later, you can upgrade 10057 to 11023, but you can't get 10058 to 11023.

Humm... maybe you need smarter upgrade scripts? Would having logic in the script help? Something like:

    if not fieldExists('xyz') then alter table ... add xyz ...

Or maybe your schema numbering system is too broad? Maybe each table could have a version number?

Or some kinda flags, like:

    create table dbver(key text);

Then an update would be named "add xyz to bob", and the update code:

    q = select key from dbver where key = 'add xyz to bob';
    if q.eof then
        alter table bob add xyz

-Andy
On 2/10/2011 4:14 PM, Andy Colson wrote:
> q = select key from dbver where key = 'add xyz to bob';
> if q.eof then
>     alter table bob add xyz

D'oh! A bug in my update script: it never records that the update ran. It should be:

    q = select key from dbver where key = 'add xyz to bob';
    if q.eof then
        alter table bob add xyz
        insert into dbver values ('add xyz to bob');

How embarrassing :-)

-Andy
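Andy's pseudocode maps directly onto real PostgreSQL. A minimal sketch of the same guard, assuming PostgreSQL 9.0+ for the DO block; the table bob, the column xyz, and its integer type are just the placeholder names from the example above:

    -- One-time setup: one row per named change that has been applied.
    create table dbver (key text primary key);

    -- Idempotent upgrade step: apply the change only if its flag is absent.
    DO $$
    BEGIN
        IF NOT EXISTS (SELECT 1 FROM dbver WHERE key = 'add xyz to bob') THEN
            ALTER TABLE bob ADD COLUMN xyz integer;
            INSERT INTO dbver (key) VALUES ('add xyz to bob');
        END IF;
    END
    $$;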
We have this kickass solution we built at work called dbsteward that just takes care of all of this for us, automatically. You just give it the new version and the old version and it generates update statements to feed into PG.

The reason I'm bringing this up is that we're working to release dbsteward as open source for PGCon. So, if you can wait a bit, you can jump on that train.

In response to Andy Colson <andy@squeakycode.net>:
> Humm... maybe you need smarter upgrade scripts? Would having logic in
> the script help? Something like:
>
>     if not fieldExists('xyz') then alter table ... add xyz ...
> [...]

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
On 2/10/2011 4:18 PM, Bill Moran wrote:
> We have this kickass solution we built at work called dbsteward that
> just takes care of all of this for us, automatically. You just give
> it the new version and the old version and it generates update statements
> to feed into PG.

But... that assumes all updates are DDL only? What if I have a lookup table, and want to add some new entries into it?

-Andy
Top-posting is frowned upon by some (not me), but since Bill started it...

I for one will be waiting to see your dbsteward. How does it compare functionally or stylistically with Ruby's migration tools (which I found to be pretty cool and frustrating all in one go)?

On 02/10/2011 03:18 PM, Bill Moran wrote:
> We have this kickass solution we built at work called dbsteward that
> just takes care of all of this for us, automatically. [...]
In response to Andy Colson <andy@squeakycode.net>:
> But... that assumes all updates are DDL only? What if I have a lookup
> table, and want to add some new entries into it?

It has provisions for maintaining static data as well. We have a bunch of lookup tables (too many, in my opinion) and it does an excellent job of maintaining them. They just need to have a primary key, but that's not usually a problem with lookup tables.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Royce Ausburn wrote on 10.02.2011 22:38:
> I'm really interested to hear how you guys manage schema upgrades in
> the face of branches and upgrading from many different versions of
> the database.

We are quite happy with Liquibase. You can simply run it against a database and tell it to migrate to "version x.y". As it keeps track of all changes applied, it automatically knows what to do. It can handle static data as well as stored procedures and any custom SQL.

Regards
Thomas
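For readers who haven't seen it, a Liquibase changelog is a list of named changesets; a rough minimal sketch, with the table, column, and author names invented for illustration:

    <databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog">

      <!-- Liquibase applies each changeSet at most once and records it
           in its own bookkeeping table in the target database. -->
      <changeSet id="add-xyz-to-bob" author="andy">
        <addColumn tableName="bob">
          <column name="xyz" type="int"/>
        </addColumn>
      </changeSet>

    </databaseChangeLog>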
In response to Rob Sargent <robjsargent@gmail.com>:
> Top-posting is frowned upon by some (not me), but since Bill started it...

Oops ... the weird thing is that I'm usually really anal about not top-posting ...

> I for one will be waiting to see your dbsteward. How does it compare
> functionally or stylistically with Ruby's migration tools (which I found
> to be pretty cool and frustrating all in one go)?

I'm not familiar with Ruby's migration tools, so I can't say much.

The overview:
You store your schema and data as XML (this is easy to migrate to, because
it includes a tool that makes the XML from a live database).
Keep your XML schema files in some RCS.
When it's time for a new deployment, you run the dbsteward tool against
the schema XML and it turns it into DDL and DML.
When it's time for an upgrade, you run the dbsteward tool against two
schema XML files, and it calculates what has changed and generates the
appropriate DDL and DML to upgrade.

So ... you know, however that compares with the Ruby stuff is how it does.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Bill Moran wrote on 10.02.2011 23:59:
> When it's time for an upgrade, you run the dbsteward tool against two
> schema XML files, and it calculates what has changed and generates the
> appropriate DDL and DML to upgrade.

This very much sounds like Liquibase. Do you happen to know any differences?

Regards
Thomas
On 02/10/2011 03:59 PM, Bill Moran wrote:
> The overview:
> You store your schema and data as XML (this is easy to migrate to, because
> it includes a tool that makes the XML from a live database).
> Keep your XML schema files in some RCS.
> [...]
> So ... you know, however that compares with the Ruby stuff is how it does.

Now at the bottom :)

It's been a couple years since I played with Ruby ActiveRecord, but it's (of course) radically different from what you describe. The DDL is in the Ruby code, and naturally the code is in RCS. So a revision is a new instance of ActiveRecord (iirc) which does the change(s) (create table ttt, alter table vvv, etc.). Maybe skip a rev. Rollback to a rev is definitely there, because one writes the undo for each new revision. This includes manipulating the data, of course, so there are limitations.

I personally am leery of the 'make the prod match the dev db' approach. Who knows what extras lurk in the depths. I think one should be able to make the dev db from scratch and write the necessary scripts to change to (and from if possible) each revision. Apply to prod when tested.
On Thu, 10 Feb 2011 17:59:30 -0500, Bill Moran <wmoran@potentialtech.com> wrote:
> The overview:
> You store your schema and data as XML (this is easy to migrate to, because
> it includes a tool that makes the XML from a live database).

We're doing a similar thing here, except we're a Lisp shop, so our schema is defined as a set of "defentities" and we can migrate from one version to another using a corresponding set of "defmaps".

> When it's time for an upgrade, you run the dbsteward tool against two
> schema XML files, and it calculates what has changed and generates the
> appropriate DDL and DML to upgrade.

This sounds pretty cool. Ours doesn't do that yet, but that's next on my TODO list.

-- 
Andy Chambers
In response to Thomas Kellerer <spam_eater@gmx.net>:
> This very much sounds like Liquibase. Do you happen to know any differences?

Hrm ... before we started writing dbsteward, we looked around to see if something already existed and didn't find Liquibase. This is the first I've heard of it. I'm thinking it was some time in 2008, and according to their changelog, Liquibase was around at that time. I wonder how we missed it ...

Anyway ... based on nothing more than a quick scan of their quickstart page, here are the differences I see:

* Liquibase is dependent on you creating "changesets". I'm sure this works, but we took a different approach with dbsteward. dbsteward expects you to maintain XML files that represent the entire database, then dbsteward does the work of figuring out what changed. Our opinion was that svn already does the work of tracking changes, why reinvent the wheel.

* Looks like liquibase requires you to talk to the database to push the changes? dbsteward outputs a DDL/DML file that you can push in whatever way is best. This is important to us because we use Slony, and DDL changes have to be submitted through EXECUTE SCRIPT().

* dbsteward has built-in Slony support (i.e. it will make slony configs as well as slony upgrade scripts in addition to DDL/DML).

* Does liquibase support UDFs? dbsteward does.

* liquibase has a lot more supported platforms at this time. dbsteward only supports PostgreSQL and MSSQL (because that's all that we needed) but I expect that other support will come quickly once we release it.

* Does liquibase support things like multi-column indexes and multi-column primary keys? dbsteward does.

I don't think I should go on and on, as I could ask a lot of questions about what liquibase does, and I simply don't have the time right now to research it, or ask all those questions ;)

Anyway ... sorry for the teaser on this, but we're trying to get through all the hoops the company is requiring us to do to release it, and we think we're on track to be ready by PGCon, so there'll be a website up as soon as we can get it.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
In response to Rob Sargent <robjsargent@gmail.com>:
> It's been a couple years since I played with Ruby ActiveRecord, but it's
> (of course) radically different from what you describe. [...] Rollback to
> a rev is definitely there, because one writes the undo for each new
> revision. This includes manipulating the data, of course, so there are
> limitations.

dbsteward can do downgrades ... you just feed it the old schema and the new schema in reverse of how you'd do an upgrade ;)

Oh, also, it allows us to do installation-specific overrides. We use this ONLY for DML for lookup lists where some clients have slightly different names for things than others. In theory, it could do DDL overrides as well, but we decided on a policy of not utilizing that because we wanted the schemas to be consistent on all our installs.

> I personally am leery of the 'make the prod match the dev db' approach.
> Who knows what extras lurk in the depths. I think one should be able to
> make the dev db from scratch and write the necessary scripts to change
> to (and from if possible) each revision. Apply to prod when tested.

dbsteward allows us to do all this. A developer can make a change, rebuild a test database from their change to make sure it works, then test the upgrade process as well, all before even checking the code in.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Bill Moran wrote on 11.02.2011 00:37:
> * Liquibase is dependent on you creating "changesets". I'm sure this
> works, but we took a different approach with dbsteward. dbsteward
> expects you to maintain XML files that represent the entire database,
> then dbsteward does the work of figuring out what changed. Our
> opinion was that svn already does the work of tracking changes, why
> reinvent the wheel.

That sounds like a very nice feature.

> * Looks like liquibase requires you to talk to the database to push
> the changes? dbsteward outputs a DDL/DML file that you can push
> in whatever way is best. This is important to us because we use
> Slony, and DDL changes have to be submitted through EXECUTE SCRIPT().

No, Liquibase can also emit the SQL that it would execute.

> * Does liquibase support things like multi-column indexes and
> multi-column primary keys? dbsteward does.

Yes, without problems (including, of course, the necessary foreign keys).

> Anyway ... sorry for the teaser on this, but we're trying to get through
> all the hoops the company is requiring us to do to release it, and we
> think we're on track to be ready by PGCon, so there'll be a website up
> as soon as we can get it.

Thanks for the feedback, I would really like to see it. The approach that you do not record the changes but simply let the software find them seems like a very nifty feature. I wonder how you detect renaming of a table or a column?

Which programming language is dbsteward based on?

Regards
Thomas
On 02/10/2011 02:38 PM, Royce Ausburn wrote:
> For the last 10 years we have been writing bash shell scripts essentially
> numbered in order: db0001, db0002, db0003.... [...] The schema upgrade
> scripts are committed to svn along with the software changes, and we have
> a process when installing the software at a site that runs the scripts on
> the DB in order before starting up the new version of the software.

Don't the bash scripts get checked in to .../perforce/cvs/svn/git/...? Aren't they part of the resources of the project(s)?
On 02/10/2011 04:44 PM, Bill Moran wrote:
> dbsteward allows us to do all this. A developer can make a change,
> rebuild a test database from their change to make sure it works, then
> test the upgrade process as well, all before even checking the code in.

Good work. Will look forward to it.
On 11/02/2011, at 9:59 AM, Thomas Kellerer wrote:
> We are quite happy with Liquibase. You can simply run it against a
> database and tell it to migrate to "version x.y". As it keeps track of
> all changes applied, it automatically knows what to do. It can handle
> static data as well as stored procedures and any custom SQL.

Thanks! Liquibase looks pretty neat. We'll have to check it out.

--Royce
> Humm... maybe you need smarter upgrade scripts? Would having logic in
> the script help? Something like:
>
>     if not fieldExists('xyz') then alter table ... add xyz ...
>
> Or some kinda flags, like:
>
>     create table dbver(key text);
>
> Then an update would be named "add xyz to bob", and the update code:
>
>     q = select key from dbver where key = 'add xyz to bob';
>     if q.eof then
>         alter table bob add xyz

This is effectively the approach we've been working with so far, but it isn't great. The issue is that you need to be really aware of what changes might or might not have been in the previous databases... This can be hard. There's also the problem of updating data, etc.
> Don't the bash scripts get checked in to .../perforce/cvs/svn/git/...?
> Aren't they part of the resources of the project(s)?

Yep - they absolutely are. The issue is that there're multiple branches *potentially* having new scripts committed. Fortunately it's rare, as the release branches seldom require schema changes, but it's an important enough problem to need a better solution.

--Royce
On Thu, Feb 10, 2011 at 6:44 PM, Bill Moran <wmoran@potentialtech.com> wrote:
> dbsteward can do downgrades ... you just feed it the old schema and
> the new schema in reverse of how you'd do an upgrade ;)
>
> Oh, also, it allows us to do installation-specific overrides. We use
> this ONLY for DML for lookup lists where some clients have slightly
> different names for things than others. In theory, it could do DDL
> overrides as well, but we decided on a policy of not utilizing that
> because we wanted the schemas to be consistent on all our installs.
What about upgrades that can't be derived directly from an inspection of the schema? Some examples:
- Adding a NOT NULL constraint (without adding a DEFAULT). You often want to precede this with filling in any existing NULL values, so the new constraint doesn't fail (a sketch of this pattern follows the list).
- Updating triggers, functions and their effects. For example, when I have an FTS index with a trigger to update an index column, and I change the underlying trigger, I often do something like "UPDATE table SET column = column", to cause all of the update triggers to fire and recalculate the index columns.
- Creating a new column based on an old one, and removing the old one; e.g., add a column "n", run "UPDATE ... SET n = i*j * 2", and then drop the old columns "i" and "j".
- Updating data from an external source, such as ORM model code; for example, if you have a table representing external files, an update may want to calculate and update the SHA-1 of each file.
- For efficiency, dropping a specific index while making a large update, and then recreating the index.
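For the first example, the usual pattern is to backfill and then add the constraint in one transaction; a minimal sketch, with the table and column names invented:

    BEGIN;
    -- Backfill existing NULLs so the new constraint can be satisfied.
    UPDATE account SET status = 'active' WHERE status IS NULL;
    ALTER TABLE account ALTER COLUMN status SET NOT NULL;
    COMMIT;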
In my experience, while generating schema updates automatically is handy, it tends to make nontrivial database updates more complicated. These sorts of things happen often and are an integral part of a database update, so I'm just curious how/if you deal with them.
I've used Ruby's migrations, and for my Django databases I use my own migration system which is based in principle off of it: create scripts to migrate the database from version X to X+1 and X-1, and upgrade or downgrade by running the appropriate scripts in sequence.
It's not ideal, since it can't generate a database at a specific version directly; it always has to run through the entire sequence of migrations to the version you want, and the migrations accumulate. However, it can handle whatever arbitrary steps are needed to update a database, and I don't need to test updates from every version to every other version.
--
Glenn Maynard
In response to Glenn Maynard <glenn@zewt.org>:
> In my experience, while generating schema updates automatically is handy, it
> tends to make nontrivial database updates more complicated. These sorts of
> things happen often and are an integral part of a database update, so I'm
> just curious how/if you deal with them.

You're correct (based on our experience over the past few years).

The big caveat is that 99.9% of the database changes don't fall into those "nontrivial" categories, and dbsteward makes those 99.9% of the changes easy to do, reliable to reproduce, and easy to track.

We've added some stuff to handle the other .1% as well, like <beforeUpdateSQL> and <afterUpdateSQL> where you can put arbitrary SQL strings to be run before or after the remainder of the automatic stuff is done. We probably haven't seen every circumstance that needs special handling, but we've already struggled through a bunch.

All this is part of the reason we're pushing to get this stuff open-sourced. We feel like we've got something that's pretty far along, and we feel that community involvement will help enhance things.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
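Illustratively, such a hook might be used to handle Glenn's trigger-refiring case; only the tag name comes from Bill's description, and the surrounding structure, table, and column names are invented:

    <afterUpdateSQL>
      -- Fire the update triggers so the FTS index columns are
      -- recalculated under the replaced trigger function.
      UPDATE documents SET body = body;
    </afterUpdateSQL>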
On Fri, Feb 11, 2011 at 12:16 AM, Bill Moran <wmoran@potentialtech.com> wrote:
> The big caveat is that 99.9% of the database changes don't fall into those
> "nontrivial" categories, and dbsteward makes those 99.9% of the changes
> easy to do, reliable to reproduce, and easy to track.
My experience is maybe more like 95% than 99.9%, for what it's worth; they're the exception, but not rare.
> We've added some stuff to handle the other .1% as well, like <beforeUpdateSQL>
> and <afterUpdateSQL> where you can put arbitrary SQL strings to be run
> before or after the remainder of the automatic stuff is done. We probably
> haven't seen every circumstance that needs special handling, but we've
> already struggled through a bunch.
Here's a fairly common example, in the abstract:
version 1 has two columns, i and j;
version 2 has one column, k, where k = i + j; and
version 3 has one column, x, where x = k * 2
Not only is updating from 1 to 2 tricky ("k = i + j" must run after the adding of "k" but before the removal of "i" and "j"; it's neither a "before" nor an "after"), but updating directly from 1 to 3 without first migrating to 2 is extremely hard. I suspect you'd need to snapshot the schema at each version where these are needed to update incrementally, rather than always trying to convert directly to the current version--maybe you already do that.
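Concretely, the 1-to-2 step has to interleave the DML between two pieces of DDL; a sketch, with the table name t invented:

    BEGIN;
    ALTER TABLE t ADD COLUMN k integer;
    UPDATE t SET k = i + j;   -- neither "before" nor "after" the DDL
    ALTER TABLE t DROP COLUMN i;
    ALTER TABLE t DROP COLUMN j;
    COMMIT;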
Anyhow, just some thoughts based on my own experience with database updates--good luck.
--
Glenn Maynard
On 10 Feb 2011, at 23:59, Bill Moran wrote:
> The overview:
> You store your schema and data as XML (this is easy to migrate to, because
> it includes a tool that makes the XML from a live database).
> Keep your XML schema files in some RCS.

That reminds me of something I've been wondering about - how well do modern RCSs deal with structured data formats (like XML)? It would appear that most of them still use diff, which is line-based with limited context tracking, to determine change-sets.

Is that combination guaranteed to result in valid XML if you merge revisions that are far enough apart? Or are there RCSs around that know about XML format (and possibly other structured formats) and handle it differently?

I've heavily used RCSs (mostly Subversion) with, for example, HTML, and merge conflicts haven't been unusual. It doesn't help that Subversion's diff is a bit simplistic about white-space; I'm sure some of those conflicts were quite unnecessary.

Alban Hertroys

--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
Hi,

Where can we download dbsteward?

Best Regards,

On Fri, Feb 11, 2011 at 5:16 AM, Bill Moran <wmoran@potentialtech.com> wrote:
> All this is part of the reason we're pushing to get this stuff open-
> sourced. We feel like we've got something that's pretty far along, and
> we feel that community involvement will help enhance things.
In response to Andre Lopes <lopes80andre@gmail.com>:
> Where can we download dbsteward?

You can't yet. We're still in the process of getting everything lined up to release it. We've registered dbsteward.org, so when we have things ready, that'll be where you can go. Expect it to come live in the next few months.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Glenn Maynard writes:
> In my experience, while generating schema updates automatically is handy, it
> tends to make nontrivial database updates more complicated. These sorts of
> things happen often and are an integral part of a database update, so I'm
> just curious how/if you deal with them.

I'm with Glenn on this point. I have found updating a db version to be far more complex (for reasons he illustrates and which I find more common than not) than any automation tool can handle. And I wonder if the time spent developing such a tool (or writing changesets, xml, etc. for a given tool) actually saves development time.

FWIW, this is what I do:

1. I have a table in my database, meta, that contains exactly one row, and holds configuration information. A minimal version of this table:

    CREATE TABLE meta (
        id integer DEFAULT 1 NOT NULL CHECK (id = 1),
        major integer NOT NULL,
        minor integer NOT NULL,
        patch integer NOT NULL
    );

The first column is to guarantee I only have one row. The next three provide a tuple for my version, e.g., 1.1.3. I add other columns for software configuration as needed.

2. My application configuration has a DB_VERSION variable which defines the version of the database the software depends on. If there's a mismatch, my programs can't connect to the database, thus guaranteeing db<=>software happiness.

3. Like Glenn, I have a script that can go forwards and backwards, one revision at a time. The effort in maintaining this script is minimal, actually: in my RCS, I have the current schema, which is generated with `pg_dump -O -s`. Based on diffs between the current system and the last revision I can manually generate the necessary DDL statements for the script. If I have data as part of my "schema," like the row in my meta table, those are simple enough to examine and add to the script with DML statements.
4. Whenever a revision changes the schema, I tag my repository, so I can easily check out that version.

5. As for branches: if production is at 1.1.3 and three branches create versions 1.1.4, another 1.1.4 and 1.2.0, and all three want to merge their various changes back into a 1.1.3 production database? I say, without being flip, don't let this happen. Here is where I think no software process can make anyone happy. It's a human process: declare someone the owner of the database schema, let them own the long-term development of the schema, and if anyone needs a change, they have to communicate and vet those changes with the db designer.

Dan
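The version-bump step such a migration script ends with is then a one-liner against Dan's meta table; a sketch, with the target version 1.1.4 chosen arbitrarily:

    BEGIN;
    -- ... the DDL/DML for this revision goes here ...
    -- Record that the database is now at version 1.1.4.
    UPDATE meta SET major = 1, minor = 1, patch = 4;
    COMMIT;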
In response to Alban Hertroys <dalroi@solfertje.student.utwente.nl>:
> That reminds me of something I've been wondering about - how well do
> modern RCSs deal with structured data formats (like XML)? It would appear
> that most of them still use diff, which is line-based with limited context
> tracking, to determine change-sets.
>
> Is that combination guaranteed to result in valid XML if you merge
> revisions that are far enough apart? Or are there RCSs around that know
> about XML format (and possibly other structured formats) and handle it
> differently?

We keep a lot of XML in RCS (Subversion) aside from just our DB schemas, and our experience has been somewhat similar to your experience with HTML. Since SVN is a line-based RCS, it can occasionally be painful to resolve conflicts.

Probably our best defense against the problem is that we have a commit hook that won't let you commit unless the xml file validates against the DTD first. This forces developers to make sure they haven't mangled the xml.

The second best defense (in my opinion) is careful formatting. By keeping each tag on its own line, and indenting consistently, the number of mangled xml files was never much worse than the number of mangled source code files (even before we had the commit hooks).

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
On Fri, Feb 11, 2011 at 8:35 AM, Daniel Popowich <danielpopowich@gmail.com> wrote:
> FWIW, this is what I do:
>
> 1. I have a table in my database, meta, that contains exactly one
> row, and holds configuration information. A minimal version of this
> table:
>
> CREATE TABLE meta (
>     id integer DEFAULT 1 NOT NULL CHECK (id = 1),
>     major integer NOT NULL,
>     minor integer NOT NULL,
>     patch integer NOT NULL
> );
What both Rails and my own system do is, rather than having a version
number, give each migration a name; for example,
"20110211-193000-create_initial_db". The table then holds the set of
migrations that have been applied to the database. The migrations are
applied in sorted order; hence the prefixed timestamp.

The benefit of this shows up when you're merging changes from separate
branches. Disconnected branches can each add their own migration rules for
the part of the database they affect, and when the two branches are merged,
they fit together naturally. This doesn't work at all when you have ordered
"version numbers".

For example, if "20110211-add_column" is added to trunk, and then a branch
is merged which has "20110210-add_table", the merged migration fits in
implicitly and there are no collisions as with version numbers. (It's
applied out of order if you already ran add_column, of course, so you still
need to make sure that branches don't collide with each other.) Databases
migrate straightforwardly after the merge, regardless of whether they were
migrated on trunk or on the branch.
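(A minimal sketch of the bookkeeping such a named-migration system needs;
the table and the runner logic here are illustrative, not Rails' actual
schema.)

    CREATE TABLE applied_migrations (
        name       text PRIMARY KEY,
        applied_at timestamptz NOT NULL DEFAULT now()
    );

    -- The runner walks the migration files in sorted (timestamp) order
    -- and, for each one, checks whether its name is already recorded:
    SELECT 1 FROM applied_migrations
        WHERE name = '20110211-193000-create_initial_db';

    -- If no row came back, it runs the migration file and then records it:
    INSERT INTO applied_migrations (name)
        VALUES ('20110211-193000-create_initial_db');

Because the state is a set of names rather than a single counter, two
branches that each add their own migrations can merge without renumbering
anything.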
--
Glenn Maynard
> What about upgrades that can't be derived directly from an inspection
> of the schema? Some examples:
>
> - Adding a NOT NULL constraint (without adding a DEFAULT). You often
> want to precede this with filling in any existing NULL values, so the
> new constraint doesn't fail.

This is an important point. The upgrade to the new schema should be
defined with a matching transformer that will initialize such a value.

> - Updating triggers, functions and their effects. For example, when I
> have an FTS index with a trigger to update an index column, and I
> change the underlying trigger, I often do something like "UPDATE table
> SET column = column", to cause all of the update triggers to fire and
> recalculate the index columns.

This is the kind of work one should not need to write for every upgrade.
The upgrade should be specified as succinctly as possible, and everything
else around it should be automatic.

> - Creating a new column based on an old one, and removing the old one;
> eg. add a column "n", run "UPDATE ... SET n = i*j * 2", and then drop
> the old columns "i" and "j".

Again, a transformer for computing such new columns is clearly needed.
With ChronicDB not only can such a transformer be provided, but the old
version of the application client (like a web-app) can continue to use the
old schema while remaining backwards compatible: its query requests are
also transformed.

> - For efficiency, dropping a specific index while making a large
> update, and then recreating the index.
>
> In my experience, while generating schema updates automatically is
> handy, it tends to make nontrivial database updates more complicated.
> These sorts of things happen often and are an integral part of a
> database update, so I'm just curious how/if you deal with them.

Glenn, I think you have hit the nail on the head here. A more systematic
model for dealing with nontrivial database updates is needed. In the case
of making a large update that recreates an index, one approach might be to
instantiate a new schema that has the new index on it, and ensure service
is switched automatically and with data consistency.

> I've used Ruby's migrations, and for my Django databases I use my own
> migration system which is based in principle off of it: create scripts
> to migrate the database from version X to X+1 and X-1, and upgrade or
> downgrade by running the appropriate scripts in sequence.
>
> It's not ideal, since it can't generate a database at a specific
> version directly; it always has to run through the entire sequence of
> migrations to the version you want, and the migrations accumulate.
> However, it can handle whatever arbitrary steps are needed to update a
> database, and I don't need to test updates from every version to every
> other version.

This is something we are planning on adding to ChronicDB. Given a list of
database upgrade definitions, the "end result" should be computed so that
one could upgrade from any arbitrary version to any other arbitrary
version.
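(For reference, the two hand-written transformations discussed above are
straightforward in plain SQL; the table and column names are made up for
illustration, and this is generic SQL, not ChronicDB syntax.)

    -- Backfill existing NULLs, then add the constraint:
    UPDATE account SET status = 'active' WHERE status IS NULL;
    ALTER TABLE account ALTER COLUMN status SET NOT NULL;

    -- Derive a new column from old ones, then drop the originals:
    ALTER TABLE widget ADD COLUMN n integer;
    UPDATE widget SET n = i * j * 2;
    ALTER TABLE widget DROP COLUMN i, DROP COLUMN j;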
On Fri, 2011-02-11 at 08:35 -0500, Daniel Popowich wrote:

> think no software process can make anyone happy. It's a human
> process: declare someone the owner of the database schema, let them
> own the long term development of the schema, and if anyone needs a
> change, they have to communicate and vet those changes with the db
> designer.

This cannot be overstated. It is very much a human process, and the
database schema owner should be governing the data model.
On Thu, Feb 10, 2011 at 02:58:15PM -0700, Rob Sargent wrote:

> On 02/10/2011 02:38 PM, Royce Ausburn wrote:
> > My company is having trouble managing how we upgrade schema changes
> > across many versions of our software. I imagine this is a common
> > problem and there're probably some neat solutions that we don't know
> > about.
> >
> > For the last 10 years we have been writing bash shell scripts
> > essentially numbered in order db0001, db0002, db0003.... The number
> > represents the schema version which is recorded in the database and
> > updated by the shell scripts. We have a template that provides all the
> > functionality we need, we just copy the script and fill in the blanks.
> > The schema upgrade scripts are committed to svn along with the software
> > changes, and we have a process when installing the software at a site
> > that runs the scripts on the DB in order before starting up the new
> > version of the software.
>
> Don't the bash scripts get checked in to .../perforce/cvs/svn/git/...?
> Aren't they part of the resources of the project(s)?

I was thinking about this a little more. With the new CREATE EXTENSION
functionality in Postgres, we have the infrastructure to run various SQL
scripts to migrate between versioned states. Obviously the extension code
relates to extensions such as datatypes, but I was wondering whether it is
sufficiently generic that it could be used to migrate between different
versions of a schema? This wouldn't be using the EXTENSION functionality
itself, just the ability to run the scripts.

This would enable easy upgrades (and downgrades, branching, etc.) between
different schema versions, provided that the appropriate scripts were
installed. If this were optionally also accessible via an SQL syntax such
as an analogue of CREATE and/or ALTER EXTENSION, it would provide a
reliable and standardised method for installing and upgrading a schema,
which would potentially prevent a great deal of wheel-reinvention between
software packages.

Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
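(For extensions proper, the mechanism Roger refers to already works
roughly like this; "myapp" and the version numbers are hypothetical.
Upgrade scripts are named for the versions they connect, e.g.
myapp--1.0.sql installs version 1.0 and myapp--1.0--1.1.sql upgrades 1.0
to 1.1, with a control file declaring the default version.)

    CREATE EXTENSION myapp VERSION '1.0';

    -- Postgres chains the appropriate upgrade scripts together:
    ALTER EXTENSION myapp UPDATE TO '1.1';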
In response to Roger Leigh <rleigh@codelibre.net>:

> On Thu, Feb 10, 2011 at 02:58:15PM -0700, Rob Sargent wrote:
> > On 02/10/2011 02:38 PM, Royce Ausburn wrote:
> > > My company is having trouble managing how we upgrade schema changes
> > > across many versions of our software. I imagine this is a common
> > > problem and there're probably some neat solutions that we don't
> > > know about.

http://dbsteward.org

We've been using this for several years and it's made our schema
versioning almost a non-issue. Finally got the go-ahead to release it just
this month.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/