Thread: Schema version control
Hi all,

My company is having trouble managing how we upgrade schema changes across many versions of our software. I imagine this is a common problem and there're probably some neat solutions that we don't know about.

For the last 10 years we have been writing bash shell scripts essentially numbered in order: db0001, db0002, db0003.... The number represents the schema version, which is recorded in the database and updated by the shell scripts. We have a template that provides all the functionality we need; we just copy the script and fill in the blanks. The schema upgrade scripts are committed to svn along with the software changes, and we have a process when installing the software at a site that runs the scripts on the DB in order before starting up the new version of the software.

This has worked really well so far, but we've effectively only had one version of the software in development at any time. We're now in the habit of branching the software to form releases to promote stability when making large changes. The idea is that only really important changes are merged into the releases. This introduces a bit of a problem when some change needs to be merged from one release to another. The typical problem is that we might have two versions of the software, 10.0 at schema version 10057 and 11.0 at 11023, and we need to merge an important bug fix from schema 11023 into 10.0. The issue is that 11023 might depend upon changes introduced in the schema versions before it. Or 11023 might introduce changes that cause later scripts (11000 - 11023) to break when upgrading from 10.0 to 11.0.

One potential solution is to require that schema changes are never merged into a release, but of course sometimes business requires we do =(

I'm really interested to hear how you guys manage schema upgrades in the face of branches and upgrading from many different versions of the database. I've been reading http://pyrseas.wordpress.com/2011/02/07/version-control-part-2-sql-databases/ but I have a feeling that this blog post won't address branches.

Cheers!

--Royce
On 2/10/2011 3:38 PM, Royce Ausburn wrote:
> My company is having trouble managing how we upgrade schema changes
> across many versions of our software. [...]

So, 10.0 at 10057.
11.0 at 11023.

Then 10.1 needs some fixes, so the db is bumped to 10058.

Then, later, you can upgrade 10057 to 11023, but you can't get 10058 to 11023.

Humm... maybe you need smarter upgrade scripts? Would having logic in the script help? Something like:

    if not fieldExists('xyz') then alter table ... add xyz ...

Or maybe your schema numbering system is too broad? Maybe each table could have a version number?

Or some kinda flags, like:

    create table dbver(key text);

Then an update would be named "add xyz to bob", and the update code:

    q = select key from dbver where key = 'add xyz to bob';
    if q.eof then
        alter table bob add xyz

-Andy
On 2/10/2011 4:14 PM, Andy Colson wrote:
> q = select key from dbver where key = 'add xyz to bob';
> if q.eof then
>     alter table bob add xyz

D'oh! A bug in my update script: it never records that the update ran. It should be:

    q = select key from dbver where key = 'add xyz to bob';
    if q.eof then
        alter table bob add xyz
        insert into dbver values ('add xyz to bob');

How embarrassing :-)

-Andy
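Andy's pseudocode maps directly onto real PostgreSQL. A minimal sketch of the same guard, assuming PostgreSQL 9.0+ for the DO block; the table bob, the column xyz, and its integer type are just the placeholder names from the example above:

    -- One-time setup: one row per named change that has been applied.
    create table dbver (key text primary key);

    -- Idempotent upgrade step: apply the change only if its flag is absent.
    DO $$
    BEGIN
        IF NOT EXISTS (SELECT 1 FROM dbver WHERE key = 'add xyz to bob') THEN
            ALTER TABLE bob ADD COLUMN xyz integer;
            INSERT INTO dbver (key) VALUES ('add xyz to bob');
        END IF;
    END
    $$;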
We have this kickass solution we built at work called dbsteward that just takes care of all of this for us, automatically. You just give it the new version and the old version and it generates update statements to feed into PG.

The reason I'm bringing this up is that we're working to release dbsteward as open source for PGCon. So, if you can wait a bit, you can jump on that train.

In response to Andy Colson <andy@squeakycode.net>:
> Humm... maybe you need smarter upgrade scripts? Would having logic in
> the script help? Something like:
>
>     if not fieldExists('xyz') then alter table ... add xyz ...
> [...]

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
On 2/10/2011 4:18 PM, Bill Moran wrote:
> We have this kickass solution we built at work called dbsteward that
> just takes care of all of this for us, automatically. You just give
> it the new version and the old version and it generates update statements
> to feed into PG.

But... that assumes all updates are DDL only? What if I have a lookup table, and want to add some new entries into it?

-Andy
Top-posting is frowned upon by some (not me), but since Bill started it...

I for one will be waiting to see your dbsteward. How does it compare functionally or stylistically with Ruby's migration tools (which I found to be pretty cool and frustrating all in one go)?

On 02/10/2011 03:18 PM, Bill Moran wrote:
> We have this kickass solution we built at work called dbsteward that
> just takes care of all of this for us, automatically. [...]
In response to Andy Colson <andy@squeakycode.net>:
> But... that assumes all updates are DDL only? What if I have a lookup
> table, and want to add some new entries into it?

It has provisions for maintaining static data as well. We have a bunch of lookup tables (too many, in my opinion) and it does an excellent job of maintaining them. They just need to have a primary key, but that's not usually a problem with lookup tables.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Royce Ausburn wrote on 10.02.2011 22:38:
> I'm really interested to hear how you guys manage schema upgrades in
> the face of branches and upgrading from many different versions of
> the database.

We are quite happy with Liquibase. You can simply run it against a database and tell it to migrate to "version x.y". As it keeps track of all changes applied, it automatically knows what to do. It can handle static data as well as stored procedures and any custom SQL.

Regards
Thomas
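For readers who haven't seen it, a Liquibase changelog is a list of named changesets; a rough minimal sketch, with the table, column, and author names invented for illustration:

    <databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog">

      <!-- Liquibase applies each changeSet at most once and records it
           in its own bookkeeping table in the target database. -->
      <changeSet id="add-xyz-to-bob" author="andy">
        <addColumn tableName="bob">
          <column name="xyz" type="int"/>
        </addColumn>
      </changeSet>

    </databaseChangeLog>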
In response to Rob Sargent <robjsargent@gmail.com>:
> Top-posting is frowned upon by some (not me), but since Bill started it...

Oops ... the weird thing is that I'm usually really anal about not top-posting ...

> I for one will be waiting to see your dbsteward. How does it compare
> functionally or stylistically with Ruby's migration tools (which I found
> to be pretty cool and frustrating all in one go)?

I'm not familiar with Ruby's migration tools, so I can't say much.

The overview:
You store your schema and data as XML (this is easy to migrate to, because
it includes a tool that makes the XML from a live database).
Keep your XML schema files in some RCS.
When it's time for a new deployment, you run the dbsteward tool against
the schema XML and it turns it into DDL and DML.
When it's time for an upgrade, you run the dbsteward tool against two
schema XML files, and it calculates what has changed and generates the
appropriate DDL and DML to upgrade.

So ... you know, however that compares with the Ruby stuff is how it does.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Bill Moran wrote on 10.02.2011 23:59:
> When it's time for an upgrade, you run the dbsteward tool against two
> schema XML files, and it calculates what has changed and generates the
> appropriate DDL and DML to upgrade.

This very much sounds like Liquibase. Do you happen to know any differences?

Regards
Thomas
On 02/10/2011 03:59 PM, Bill Moran wrote:
> The overview:
> You store your schema and data as XML (this is easy to migrate to, because
> it includes a tool that makes the XML from a live database).
> Keep your XML schema files in some RCS.
> [...]
> So ... you know, however that compares with the Ruby stuff is how it does.

Now at the bottom :)

It's been a couple years since I played with Ruby ActiveRecord, but it's (of course) radically different from what you describe. The DDL is in the Ruby code, and naturally the code is in RCS. So a revision is a new instance of ActiveRecord (iirc) which does the change(s) (create table ttt, alter table vvv, etc.). Maybe skip a rev. Rollback to a rev is definitely there, because one writes the undo for each new revision. This includes manipulating the data, of course, so there are limitations.

I personally am leery of the 'make the prod match the dev db' approach. Who knows what extras lurk in the depths. I think one should be able to make the dev db from scratch and write the necessary scripts to change to (and from if possible) each revision. Apply to prod when tested.
On Thu, 10 Feb 2011 17:59:30 -0500, Bill Moran <wmoran@potentialtech.com> wrote:
> The overview:
> You store your schema and data as XML (this is easy to migrate to, because
> it includes a tool that makes the XML from a live database).

We're doing a similar thing here, except we're a Lisp shop, so our schema is defined as a set of "defentities" and we can migrate from one version to another using a corresponding set of "defmaps".

> When it's time for an upgrade, you run the dbsteward tool against two
> schema XML files, and it calculates what has changed and generates the
> appropriate DDL and DML to upgrade.

This sounds pretty cool. Ours doesn't do that yet, but that's next on my TODO list.

-- 
Andy Chambers
In response to Thomas Kellerer <spam_eater@gmx.net>:
> This very much sounds like Liquibase. Do you happen to know any differences?

Hrm ... before we started writing dbsteward, we looked around to see if something already existed and didn't find Liquibase. This is the first I've heard of it. I'm thinking it was some time in 2008, and according to their changelog, Liquibase was around at that time. I wonder how we missed it ...

Anyway ... based on nothing more than a quick scan of their quickstart page, here are the differences I see:

* Liquibase is dependent on you creating "changesets". I'm sure this works, but we took a different approach with dbsteward. dbsteward expects you to maintain XML files that represent the entire database, then dbsteward does the work of figuring out what changed. Our opinion was that svn already does the work of tracking changes, why reinvent the wheel.

* Looks like liquibase requires you to talk to the database to push the changes? dbsteward outputs a DDL/DML file that you can push in whatever way is best. This is important to us because we use Slony, and DDL changes have to be submitted through EXECUTE SCRIPT().

* dbsteward has built-in Slony support (i.e. it will make slony configs as well as slony upgrade scripts in addition to DDL/DML).

* Does liquibase support UDFs? dbsteward does.

* liquibase has a lot more supported platforms at this time. dbsteward only supports PostgreSQL and MSSQL (because that's all that we needed) but I expect that other support will come quickly once we release it.

* Does liquibase support things like multi-column indexes and multi-column primary keys? dbsteward does.

I don't think I should go on and on, as I could ask a lot of questions about what liquibase does, and I simply don't have the time right now to research it, or ask all those questions ;)

Anyway ... sorry for the teaser on this, but we're trying to get through all the hoops the company is requiring us to do to release it, and we think we're on track to be ready by PGCon, so there'll be a website up as soon as we can get it.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
In response to Rob Sargent <robjsargent@gmail.com>:
> It's been a couple years since I played with Ruby ActiveRecord, but it's
> (of course) radically different from what you describe. [...] Rollback to
> a rev is definitely there, because one writes the undo for each new
> revision. This includes manipulating the data, of course, so there are
> limitations.

dbsteward can do downgrades ... you just feed it the old schema and the new schema in reverse of how you'd do an upgrade ;)

Oh, also, it allows us to do installation-specific overrides. We use this ONLY for DML for lookup lists where some clients have slightly different names for things than others. In theory, it could do DDL overrides as well, but we decided on a policy of not utilizing that because we wanted the schemas to be consistent on all our installs.

> I personally am leery of the 'make the prod match the dev db' approach.
> Who knows what extras lurk in the depths. I think one should be able to
> make the dev db from scratch and write the necessary scripts to change
> to (and from if possible) each revision. Apply to prod when tested.

dbsteward allows us to do all this. A developer can make a change, rebuild a test database from their change to make sure it works, then test the upgrade process as well, all before even checking the code in.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Bill Moran wrote on 11.02.2011 00:37:
> * Liquibase is dependent on you creating "changesets". I'm sure this
> works, but we took a different approach with dbsteward. dbsteward
> expects you to maintain XML files that represent the entire database,
> then dbsteward does the work of figuring out what changed. Our
> opinion was that svn already does the work of tracking changes, why
> reinvent the wheel.

That sounds like a very nice feature.

> * Looks like liquibase requires you to talk to the database to push
> the changes? dbsteward outputs a DDL/DML file that you can push
> in whatever way is best. This is important to us because we use
> Slony, and DDL changes have to be submitted through EXECUTE SCRIPT().

No, Liquibase can also emit the SQL that it would execute.

> * Does liquibase support things like multi-column indexes and
> multi-column primary keys? dbsteward does.

Yes, without problems (including, of course, the necessary foreign keys).

> Anyway ... sorry for the teaser on this, but we're trying to get through
> all the hoops the company is requiring us to do to release it, and we
> think we're on track to be ready by PGCon, so there'll be a website up
> as soon as we can get it.

Thanks for the feedback, I would really like to see it. The approach that you do not record the changes but simply let the software find them seems like a very nifty feature. I wonder how you detect renaming of a table or a column?

Which programming language is dbsteward based on?

Regards
Thomas
On 02/10/2011 02:38 PM, Royce Ausburn wrote:
> For the last 10 years we have been writing bash shell scripts essentially
> numbered in order: db0001, db0002, db0003.... [...] The schema upgrade
> scripts are committed to svn along with the software changes, and we have
> a process when installing the software at a site that runs the scripts on
> the DB in order before starting up the new version of the software.

Don't the bash scripts get checked in to .../perforce/cvs/svn/git/...? Aren't they part of the resources of the project(s)?
On 02/10/2011 04:44 PM, Bill Moran wrote:
> dbsteward allows us to do all this. A developer can make a change,
> rebuild a test database from their change to make sure it works, then
> test the upgrade process as well, all before even checking the code in.

Good work. Will look forward to it.
On 11/02/2011, at 9:59 AM, Thomas Kellerer wrote:
> We are quite happy with Liquibase. You can simply run it against a
> database and tell it to migrate to "version x.y". As it keeps track of
> all changes applied, it automatically knows what to do. It can handle
> static data as well as stored procedures and any custom SQL.

Thanks! Liquibase looks pretty neat. We'll have to check it out.

--Royce
> Humm... maybe you need smarter upgrade scripts? Would having logic in
> the script help? Something like:
>
>     if not fieldExists('xyz') then alter table ... add xyz ...
>
> Or some kinda flags, like:
>
>     create table dbver(key text);
>
> Then an update would be named "add xyz to bob", and the update code:
>
>     q = select key from dbver where key = 'add xyz to bob';
>     if q.eof then
>         alter table bob add xyz

This is effectively the approach we've been working with so far, but it isn't great. The issue is that you need to be really aware of what changes might or might not have been in the previous databases... This can be hard. There's also the problem of updating data, etc.
> Don't the bash scripts get checked in to .../perforce/cvs/svn/git/...?
> Aren't they part of the resources of the project(s)?

Yep - they absolutely are. The issue is that there're multiple branches *potentially* having new scripts committed. Fortunately it's rare, as the release branches seldom require schema changes, but it's an important enough problem to need a better solution.

--Royce
On Thu, Feb 10, 2011 at 6:44 PM, Bill Moran <wmoran@potentialtech.com> wrote:
> dbsteward can do downgrades ... you just feed it the old schema and
> the new schema in reverse of how you'd do an upgrade ;)
>
> Oh, also, it allows us to do installation-specific overrides. We use
> this ONLY for DML for lookup lists where some clients have slightly
> different names for things than others. In theory, it could do DDL
> overrides as well, but we decided on a policy of not utilizing that
> because we wanted the schemas to be consistent on all our installs.
What about upgrades that can't be derived directly from an inspection of the schema? Some examples:
- Adding a NOT NULL constraint (without adding a DEFAULT). You often want to precede this with filling in any existing NULL values, so the new constraint doesn't fail (a sketch of this pattern follows the list).
- Updating triggers, functions and their effects. For example, when I have an FTS index with a trigger to update an index column, and I change the underlying trigger, I often do something like "UPDATE table SET column = column", to cause all of the update triggers to fire and recalculate the index columns.
- Creating a new column based on an old one, and removing the old one; e.g., add a column "n", run "UPDATE ... SET n = i*j * 2", and then drop the old columns "i" and "j".
- Updating data from an external source, such as ORM model code; for example, if you have a table representing external files, an update may want to calculate and update the SHA-1 of each file.
- For efficiency, dropping a specific index while making a large update, and then recreating the index.
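For the first example, the usual pattern is to backfill and then add the constraint in one transaction; a minimal sketch, with the table and column names invented:

    BEGIN;
    -- Backfill existing NULLs so the new constraint can be satisfied.
    UPDATE account SET status = 'active' WHERE status IS NULL;
    ALTER TABLE account ALTER COLUMN status SET NOT NULL;
    COMMIT;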
In my experience, while generating schema updates automatically is handy, it tends to make nontrivial database updates more complicated. These sorts of things happen often and are an integral part of a database update, so I'm just curious how/if you deal with them.
I've used Ruby's migrations, and for my Django databases I use my own migration system which is based in principle off of it: create scripts to migrate the database from version X to X+1 and X-1, and upgrade or downgrade by running the appropriate scripts in sequence.
It's not ideal, since it can't generate a database at a specific version directly; it always has to run through the entire sequence of migrations to the version you want, and the migrations accumulate. However, it can handle whatever arbitrary steps are needed to update a database, and I don't need to test updates from every version to every other version.
--
Glenn Maynard
In response to Glenn Maynard <glenn@zewt.org>:
> In my experience, while generating schema updates automatically is handy, it
> tends to make nontrivial database updates more complicated. These sorts of
> things happen often and are an integral part of a database update, so I'm
> just curious how/if you deal with them.

You're correct (based on our experience over the past few years).

The big caveat is that 99.9% of the database changes don't fall into those "nontrivial" categories, and dbsteward makes those 99.9% of the changes easy to do, reliable to reproduce, and easy to track.

We've added some stuff to handle the other .1% as well, like <beforeUpdateSQL> and <afterUpdateSQL> where you can put arbitrary SQL strings to be run before or after the remainder of the automatic stuff is done. We probably haven't seen every circumstance that needs special handling, but we've already struggled through a bunch.

All this is part of the reason we're pushing to get this stuff open-sourced. We feel like we've got something that's pretty far along, and we feel that community involvement will help enhance things.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
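Illustratively, such a hook might be used to handle Glenn's trigger-refiring case; only the tag name comes from Bill's description, and the surrounding structure, table, and column names are invented:

    <afterUpdateSQL>
      -- Fire the update triggers so the FTS index columns are
      -- recalculated under the replaced trigger function.
      UPDATE documents SET body = body;
    </afterUpdateSQL>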
On Fri, Feb 11, 2011 at 12:16 AM, Bill Moran <wmoran@potentialtech.com> wrote:
> The big caveat is that 99.9% of the database changes don't fall into those
> "nontrivial" categories, and dbsteward makes those 99.9% of the changes
> easy to do, reliable to reproduce, and easy to track.
My experience is maybe more like 95% than 99.9%, for what it's worth; they're the exception, but not rare.
> We've added some stuff to handle the other .1% as well, like <beforeUpdateSQL>
> and <afterUpdateSQL> where you can put arbitrary SQL strings to be run
> before or after the remainder of the automatic stuff is done. We probably
> haven't seen every circumstance that needs special handling, but we've
> already struggled through a bunch.
Here's a fairly common example, in the abstract:
version 1 has two columns, i and j;
version 2 has one column, k, where k = i + j; and
version 3 has one column, x, where x = k * 2
Not only is updating from 1 to 2 tricky ("k = i + j" must run after the adding of "k" but before the removal of "i" and "j"; it's neither a "before" nor an "after"), but updating directly from 1 to 3 without first migrating to 2 is extremely hard. I suspect you'd need to snapshot the schema at each version where these are needed to update incrementally, rather than always trying to convert directly to the current version--maybe you already do that.
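Concretely, the 1-to-2 step has to interleave the DML between two pieces of DDL; a sketch, with the table name t invented:

    BEGIN;
    ALTER TABLE t ADD COLUMN k integer;
    UPDATE t SET k = i + j;   -- neither "before" nor "after" the DDL
    ALTER TABLE t DROP COLUMN i;
    ALTER TABLE t DROP COLUMN j;
    COMMIT;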
Anyhow, just some thoughts based on my own experience with database updates--good luck.
--
Glenn Maynard
On 10 Feb 2011, at 23:59, Bill Moran wrote:
> The overview:
> You store your schema and data as XML (this is easy to migrate to, because
> it includes a tool that makes the XML from a live database).
> Keep your XML schema files in some RCS.

That reminds me of something I've been wondering about - how well do modern RCSs deal with structured data formats (like XML)? It would appear that most of them still use diff, which is line-based with limited context tracking, to determine change-sets.

Is that combination guaranteed to result in valid XML if you merge revisions that are far enough apart? Or are there RCSs around that know about XML format (and possibly other structured formats) and handle it differently?

I've heavily used RCSs (mostly Subversion) with, for example, HTML, and merge conflicts haven't been unusual. It doesn't help that Subversion's diff is a bit simplistic about white-space; I'm sure some of those conflicts were quite unnecessary.

Alban Hertroys

--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
Hi,

Where can we download dbsteward?

Best Regards,

On Fri, Feb 11, 2011 at 5:16 AM, Bill Moran <wmoran@potentialtech.com> wrote:
> All this is part of the reason we're pushing to get this stuff open-
> sourced. We feel like we've got something that's pretty far along, and
> we feel that community involvement will help enhance things.
In response to Andre Lopes <lopes80andre@gmail.com>:
> Where can we download dbsteward?

You can't yet. We're still in the process of getting everything lined up to release it. We've registered dbsteward.org, so when we have things ready, that'll be where you can go. Expect it to come live in the next few months.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Glenn Maynard writes:
> In my experience, while generating schema updates automatically is handy, it
> tends to make nontrivial database updates more complicated. These sorts of
> things happen often and are an integral part of a database update, so I'm
> just curious how/if you deal with them.

I'm with Glenn on this point. I have found updating a db version to be far more complex (for reasons he illustrates and which I find more common than not) than any automation tool can handle. And I wonder if the time spent developing such a tool (or writing changesets, xml, etc. for a given tool) actually saves development time.

FWIW, this is what I do:

1. I have a table in my database, meta, that contains exactly one row, and holds configuration information. A minimal version of this table:

    CREATE TABLE meta (
        id integer DEFAULT 1 NOT NULL CHECK (id = 1),
        major integer NOT NULL,
        minor integer NOT NULL,
        patch integer NOT NULL
    );

The first column is to guarantee I only have one row. The next three provide a tuple for my version, e.g., 1.1.3. I add other columns for software configuration as needed.

2. My application configuration has a DB_VERSION variable which defines the version of the database the software depends on. If there's a mismatch, my programs can't connect to the database, thus guaranteeing db<=>software happiness.

3. Like Glenn, I have a script that can go forwards and backwards, one revision at a time. The effort in maintaining this script is minimal, actually: in my RCS, I have the current schema, which is generated with `pg_dump -O -s`. Based on diffs between the current system and the last revision I can manually generate the necessary DDL statements for the script. If I have data as part of my "schema," like the row in my meta table, those are simple enough to examine and add to the script with DML statements.
4. Whenever a revision changes the schema, I tag my repository, so I can easily check out that version.

5. As for branches: if production is at 1.1.3 and three branches create versions 1.1.4, another 1.1.4 and 1.2.0, and all three want to merge their various changes back into a 1.1.3 production database? I say, without being flip, don't let this happen. Here is where I think no software process can make anyone happy. It's a human process: declare someone the owner of the database schema, let them own the long-term development of the schema, and if anyone needs a change, they have to communicate and vet those changes with the db designer.

Dan
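The version-bump step such a migration script ends with is then a one-liner against Dan's meta table; a sketch, with the target version 1.1.4 chosen arbitrarily:

    BEGIN;
    -- ... the DDL/DML for this revision goes here ...
    -- Record that the database is now at version 1.1.4.
    UPDATE meta SET major = 1, minor = 1, patch = 4;
    COMMIT;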
In response to Alban Hertroys <dalroi@solfertje.student.utwente.nl>:
> That reminds me of something I've been wondering about - how well do
> modern RCSs deal with structured data formats (like XML)? It would appear
> that most of them still use diff, which is line-based with limited context
> tracking, to determine change-sets.
>
> Is that combination guaranteed to result in valid XML if you merge
> revisions that are far enough apart? Or are there RCSs around that know
> about XML format (and possibly other structured formats) and handle it
> differently?

We keep a lot of XML in RCS (Subversion) aside from just our DB schemas, and our experience has been somewhat similar to your experience with HTML. Since SVN is a line-based RCS, it can occasionally be painful to resolve conflicts.

Probably our best defense against the problem is that we have a commit hook that won't let you commit unless the xml file validates against the DTD first. This forces developers to make sure they haven't mangled the xml.

The second best defense (in my opinion) is careful formatting. By keeping each tag on its own line, and indenting consistently, the number of mangled xml files was never much worse than the number of mangled source code files (even before we had the commit hooks).

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
On Fri, Feb 11, 2011 at 8:35 AM, Daniel Popowich <danielpopowich@gmail.com> wrote:
> FWIW, this is what I do:
>
> 1. I have a table in my database, meta, that contains exactly one
> row, and holds configuration information. A minimal version of this
> table:
>
> CREATE TABLE meta (
>     id integer DEFAULT 1 NOT NULL CHECK (id = 1),
>     major integer NOT NULL,
>     minor integer NOT NULL,
>     patch integer NOT NULL
> );
What both Rails and my own system do is, rather than having a version
number, give each migration a name; for example,
"20110211-193000-create_initial_db". The table then holds the set of
migrations that have been applied to the database. The migrations are
applied in sorted order; hence the prefixed timestamp.

The benefit of this shows up when you're merging changes from separate
branches. Disconnected branches can each add their own migration rules for
the part of the database they affect, and when the two branches are merged,
they fit together naturally. This doesn't work at all when you have ordered
"version numbers".

For example, if "20110211-add_column" is added to trunk, and then a branch
is merged which has "20110210-add_table", the merged migration fits in
implicitly and there are no collisions as with version numbers. (It's
applied out of order if you already ran add_column, of course, so you still
need to make sure that branches don't collide with each other.) Databases
migrate straightforwardly after the merge, regardless of whether they were
migrated on trunk or on the branch.
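(A minimal sketch of the bookkeeping such a named-migration system needs;
the table and the runner logic here are illustrative, not Rails' actual
schema.)

    CREATE TABLE applied_migrations (
        name       text PRIMARY KEY,
        applied_at timestamptz NOT NULL DEFAULT now()
    );

    -- The runner walks the migration files in sorted (timestamp) order
    -- and, for each one, checks whether its name is already recorded:
    SELECT 1 FROM applied_migrations
        WHERE name = '20110211-193000-create_initial_db';

    -- If no row came back, it runs the migration file and then records it:
    INSERT INTO applied_migrations (name)
        VALUES ('20110211-193000-create_initial_db');

Because the state is a set of names rather than a single counter, two
branches that each add their own migrations can merge without renumbering
anything.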
--
Glenn Maynard
> What about upgrades that can't be derived directly from an inspection
> of the schema? Some examples:
>
> - Adding a NOT NULL constraint (without adding a DEFAULT). You often
> want to precede this with filling in any existing NULL values, so the
> new constraint doesn't fail.

This is an important point. The upgrade to the new schema should be
defined with a matching transformer that will initialize such a value.

> - Updating triggers, functions and their effects. For example, when I
> have an FTS index with a trigger to update an index column, and I
> change the underlying trigger, I often do something like "UPDATE table
> SET column = column", to cause all of the update triggers to fire and
> recalculate the index columns.

This is the kind of work one should not need to write for every upgrade.
The upgrade should be specified as succinctly as possible, and everything
else around it should be automatic.

> - Creating a new column based on an old one, and removing the old one;
> eg. add a column "n", run "UPDATE ... SET n = i*j * 2", and then drop
> the old columns "i" and "j".

Again, a transformer for computing such new columns is clearly needed.
With ChronicDB not only can such a transformer be provided, but the old
version of the application client (like a web-app) can continue to use the
old schema while remaining backwards compatible: its query requests are
also transformed.

> - For efficiency, dropping a specific index while making a large
> update, and then recreating the index.
>
> In my experience, while generating schema updates automatically is
> handy, it tends to make nontrivial database updates more complicated.
> These sorts of things happen often and are an integral part of a
> database update, so I'm just curious how/if you deal with them.

Glenn, I think you have hit the nail on the head here. A more systematic
model for dealing with nontrivial database updates is needed. In the case
of making a large update that recreates an index, one approach might be to
instantiate a new schema that has the new index on it, and ensure service
is switched automatically and with data consistency.

> I've used Ruby's migrations, and for my Django databases I use my own
> migration system which is based in principle off of it: create scripts
> to migrate the database from version X to X+1 and X-1, and upgrade or
> downgrade by running the appropriate scripts in sequence.
>
> It's not ideal, since it can't generate a database at a specific
> version directly; it always has to run through the entire sequence of
> migrations to the version you want, and the migrations accumulate.
> However, it can handle whatever arbitrary steps are needed to update a
> database, and I don't need to test updates from every version to every
> other version.

This is something we are planning on adding to ChronicDB. Given a list of
database upgrade definitions, the "end result" should be computed so that
one could upgrade from any arbitrary version to any other arbitrary
version.
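(For reference, the two hand-written transformations discussed above are
straightforward in plain SQL; the table and column names are made up for
illustration, and this is generic SQL, not ChronicDB syntax.)

    -- Backfill existing NULLs, then add the constraint:
    UPDATE account SET status = 'active' WHERE status IS NULL;
    ALTER TABLE account ALTER COLUMN status SET NOT NULL;

    -- Derive a new column from old ones, then drop the originals:
    ALTER TABLE widget ADD COLUMN n integer;
    UPDATE widget SET n = i * j * 2;
    ALTER TABLE widget DROP COLUMN i, DROP COLUMN j;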
On Fri, 2011-02-11 at 08:35 -0500, Daniel Popowich wrote:

> think no software process can make anyone happy. It's a human
> process: declare someone the owner of the database schema, let them
> own the long term development of the schema, and if anyone needs a
> change, they have to communicate and vet those changes with the db
> designer.

This cannot be overstated. It is very much a human process, and the
database schema owner should be governing the data model.
On Thu, Feb 10, 2011 at 02:58:15PM -0700, Rob Sargent wrote:

> On 02/10/2011 02:38 PM, Royce Ausburn wrote:
> > My company is having trouble managing how we upgrade schema changes
> > across many versions of our software. I imagine this is a common
> > problem and there're probably some neat solutions that we don't know
> > about.
> >
> > For the last 10 years we have been writing bash shell scripts
> > essentially numbered in order db0001, db0002, db0003.... The number
> > represents the schema version which is recorded in the database and
> > updated by the shell scripts. We have a template that provides all the
> > functionality we need, we just copy the script and fill in the blanks.
> > The schema upgrade scripts are committed to svn along with the software
> > changes, and we have a process when installing the software at a site
> > that runs the scripts on the DB in order before starting up the new
> > version of the software.
>
> Don't the bash scripts get checked in to .../perforce/cvs/svn/git/...?
> Aren't they part of the resources of the project(s)?

I was thinking about this a little more. With the new CREATE EXTENSION
functionality in Postgres, we have the infrastructure to run various SQL
scripts to migrate between versioned states. Obviously the extension code
relates to extensions such as datatypes, but I was wondering whether it is
sufficiently generic that it could be used to migrate between different
versions of a schema? This wouldn't be using the EXTENSION functionality
itself, just the ability to run the scripts.

This would enable easy upgrades (and downgrades, branching, etc.) between
different schema versions, provided that the appropriate scripts were
installed. If this were optionally also accessible via an SQL syntax such
as an analogue of CREATE and/or ALTER EXTENSION, it would provide a
reliable and standardised method for installing and upgrading a schema,
which would potentially prevent a great deal of wheel-reinvention between
software packages.

Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
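(For extensions proper, the mechanism Roger refers to already works
roughly like this; "myapp" and the version numbers are hypothetical.
Upgrade scripts are named for the versions they connect, e.g.
myapp--1.0.sql installs version 1.0 and myapp--1.0--1.1.sql upgrades 1.0
to 1.1, with a control file declaring the default version.)

    CREATE EXTENSION myapp VERSION '1.0';

    -- Postgres chains the appropriate upgrade scripts together:
    ALTER EXTENSION myapp UPDATE TO '1.1';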
In response to Roger Leigh <rleigh@codelibre.net>:

> On Thu, Feb 10, 2011 at 02:58:15PM -0700, Rob Sargent wrote:
> > On 02/10/2011 02:38 PM, Royce Ausburn wrote:
> > > My company is having trouble managing how we upgrade schema changes
> > > across many versions of our software. I imagine this is a common
> > > problem and there're probably some neat solutions that we don't
> > > know about.

http://dbsteward.org

We've been using this for several years and it's made our schema
versioning almost a non-issue. Finally got the go-ahead to release it just
this month.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/