Re: Schema version management - Mailing list pgsql-hackers
From: Merlin Moncure
Subject: Re: Schema version management
Msg-id: CAHyXU0yUEQtJ3=JOzkBonFwNw1VOy4ej+xJiFopEJ2NDr0DviA@mail.gmail.com
In response to: Schema version management (Joel Jacobson <joel@trustly.com>)
List: pgsql-hackers
On Sun, May 20, 2012 at 2:41 PM, Joel Jacobson <joel@trustly.com> wrote:
> Hi,
>
> I just read a very interesting post about "schema version management".
>
> Quote: "You could set it up so that every developer gets their own
> test database, sets up the schema there, takes a dump, and checks that
> in. There are going to be problems with that, including that dumps
> produced by pg_dump are ugly and optimized for restoring, not for
> developing with, and they don't have a deterministic output order."
> (http://petereisentraut.blogspot.com/2012/05/my-anti-take-on-database-schema-version.html)
>
> Back in December 2010, I suggested a new option to pg_dump, --split,
> which would write the schema definition of each object to a separate
> file:
>
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg02285.php
>
> Instead of one huge plain-text schema file, impossible to version
> control, all tables/sequences/views/functions are written to separate
> files, allowing a version control system, such as git, to do proper
> version control.
>
> The "deterministic output order" problem mentioned in the post above
> is not a problem if each object (table/sequence/view/function/etc.) is
> written to the same filename every time: the tree of files and their
> content will be identical no matter the order in which the objects are
> dumped.
>
> I remember a lot of hackers were very positive about this option, but
> we somehow failed to agree on the naming of files in the tree
> structure. I'm sure we can work that out, though.
>
> I use this feature in production: I have a cronjob which dumps the
> schema every hour, committing any eventual changes to a separate git
> branch for each database installation, such as production, development
> and test. If no changes to the schema have been made, nothing is
> committed to git, since none of the files have changed.
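[The hourly snapshot job described above can be sketched roughly as follows. This is a hypothetical reconstruction, not Joel's actual script: the function names and paths are made up, and `--split` is the proposed pg_dump patch, not a stock option.]

```shell
# Rough sketch of the hourly snapshot cronjob described above.
# Hypothetical names and paths; --split is the proposed patch, not a stock flag.

# commit_if_changed REPODIR MESSAGE: stage everything under REPODIR and
# commit only if the tree actually changed (git commit exits non-zero when
# there is nothing to commit, which we deliberately ignore).
commit_if_changed() {
    ( cd "$1" && git add -A && git commit -q -m "$2" ) || true
}

# snapshot_schema DBNAME REPODIR: dump one file per object, then commit.
# Untested sketch; requires the proposed pg_dump --split patch.
snapshot_schema() {
    pg_dump --schema-only --split -f "$2/$1.sql" "$1"
    commit_if_changed "$2" "Update of database schema $(date)"
}
```

Run from cron, an unchanged schema produces no commit at all, which is what makes `git log` a clean deployment history.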
>
> It is then drop-dead simple to diff two different branches of the
> database schema, such as development and production, or to diff
> different revisions, allowing point-in-time comparison of the schema.
>
> This is an example of the output of git log --summary for one of the
> automatic commits to our production database's git repo:
>
> --
> commit 18c31f8162d851b0dac3bad7e80529ef2ed18be3
> Author: Production Database <production.database@trustly.com>
> Date:   Fri May 4 15:00:04 2012 +0200
>
>     Update of database schema Linux DB0 2.6.26-2-amd64 #1 SMP Wed Aug
>     19 22:33:18 UTC 2009 x86_64 GNU/Linux Fri, 04 May 2012 15:00:04 +0200
>
>  create mode 100644 gluepay-split/public/CONSTRAINT/openingclosingbalances_pkey.sql
>  create mode 100644 gluepay-split/public/CONSTRAINT/openingclosingbalances_source_key.sql
>  create mode 100644 gluepay-split/public/SEQUENCE/seqopeningclosingbalance.sql
>  create mode 100644 gluepay-split/public/TABLE/openingclosingbalances.sql
> --
>
> Here we can see we apparently deployed a new table,
> "openingclosingbalances", around Fri May 4 15:00:04.
>
> Without any manual work, I'm able to follow all changes actually
> _deployed_ in each database.
>
> At my company, a highly database-centric, stored-procedure-intensive
> business dealing with mission-critical monetary transactions, we've
> been using this technique to do schema version management without any
> hassle for the last two years.
>
> Hopefully this can add to the list of useful schema version management
> methods.

What does your patch do that you can't already do with pg_restore?
create function foo(a int, b int, c text) returns int as $$ select 0; $$ language sql;
CREATE FUNCTION

pg_dump -Fc postgres -s > postgres.dump
pg_restore -l postgres.dump | grep FUNCTION
196; 1255 32939 FUNCTION public foo(integer, integer, text) merlin
pg_restore -P "foo(integer, integer, text)" postgres.dump
<function body follows>

It's fairly easy to wrap pg_restore with a small script that extracts
function bodies and writes them out to file names. This is a great and
underused feature, so I'd argue that if you wanted to formalize
per-object file extraction you should be looking at expanding
pg_restore, not pg_dump.

merlin
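[The "small script" wrapping pg_restore could look something like this. A hedged sketch: `toc_functions` and `extract_functions` are made-up names, and the TOC line format is assumed to match the example above.]

```shell
# Hypothetical wrapper sketch around pg_restore -l / -P, as suggested above.

# toc_functions: read "pg_restore -l" output on stdin and print one function
# signature per line, e.g. "foo(integer, integer, text)", by matching TOC
# lines of the form "196; 1255 32939 FUNCTION public foo(...) merlin".
toc_functions() {
    sed -n 's/^[0-9]*; [0-9]* [0-9]* FUNCTION [^ ]* \(.*(.*)\) .*$/\1/p'
}

# extract_functions DUMPFILE OUTDIR: write each function body to its own
# .sql file. Untested sketch; filename sanitizing here is deliberately crude
# (it just strips everything but lowercase letters, digits and underscores).
extract_functions() {
    pg_restore -l "$1" | toc_functions | while read -r sig; do
        fname=$(printf '%s' "$sig" | tr -cd 'a-z0-9_')
        pg_restore -P "$sig" "$1" > "$2/$fname.sql"
    done
}
```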