Re: Schema version management - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: Schema version management
Date
Msg-id CAHyXU0yUEQtJ3=JOzkBonFwNw1VOy4ej+xJiFopEJ2NDr0DviA@mail.gmail.com
Whole thread Raw
In response to Schema version management  (Joel Jacobson <joel@trustly.com>)
Responses Re: Schema version management
Re: Schema version management
List pgsql-hackers
On Sun, May 20, 2012 at 2:41 PM, Joel Jacobson <joel@trustly.com> wrote:
> Hi,
>
> I just read a very interesting post about "schema version management".
>
> Quote: "You could set it up so that every developer gets their own
> test database, sets up the schema there, takes a dump, and checks that
> in. There are going to be problems with that, including that dumps
> produced by pg_dump are ugly and optimized for restoring, not for
> developing with, and they don't have a deterministic output order." (
> http://petereisentraut.blogspot.com/2012/05/my-anti-take-on-database-schema-version.html
> )
>
> Back in December 2010, I suggested a new option to pg_dump, --split,
> which would write the schema definition of each object in separate
> files:
>
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg02285.php
>
> Instead of a huge plain text schema file, impossible to version
> control, all tables/sequences/views/functions are written to separate
> files, allowing the use of a version control software system, such as
> git, to do proper version controlling.
>
> The "deterministic output order" problem mentioned in the post above,
> is not a problem if each object (table/sequence/view/function/etc) is
> written to the same filename everytime.
> No matter the order, the tree of files and their content will be
> identical, no matter the order in which they are dumped.
>
> I remember a lot of hackers were very positive about this option, but
> we somehow failed to agree on the naming of files in the tree
> structure. I'm sure we can work that out though.
>
> I use this feature in production, I have a cronjob which does a dump
> of the schema every hour, committing any eventual changes to a
> separate git branch for each database installation, such as
> production, development and test.
> If no changes to the schema have been made, nothing will be committed
> to git since none of the files have changed.
>
> It is then drop-dead simple to diff two different branches of the
> database schema, such as development or production, or diffing
> different revisions allowing point-in-time comparison of the schema.
>
> This is an example of the otuput of a git log --summary for one of the
> automatic commits to our production database's git-repo:
>
> --
> commit 18c31f8162d851b0dac3bad7e80529ef2ed18be3
> Author: Production Database <production.database@trustly.com>
> Date:   Fri May 4 15:00:04 2012 +0200
>
>     Update of database schema Linux DB0 2.6.26-2-amd64 #1 SMP Wed Aug
> 19 22:33:18 UTC 2009 x86_64 GNU/Linux Fri, 04 May 2012 15:00:04 +0200
>
>  create mode 100644
> gluepay-split/public/CONSTRAINT/openingclosingbalances_pkey.sql
>  create mode 100644
> gluepay-split/public/CONSTRAINT/openingclosingbalances_source_key.sql
>  create mode 100644 gluepay-split/public/SEQUENCE/seqopeningclosingbalance.sql
>  create mode 100644 gluepay-split/public/TABLE/openingclosingbalances.sql
> --
>
> Here we can see we apparently deployed a new table,
> "openingclosingbalances" around Fri May 4 15:00:04.
>
> Without any manual work, I'm able to follow all changes actually
> _deployed_ in each database.
>
> At my company, a highly database-centric stored-procedure intensive
> business dealing with mission-critical monetary transactions, we've
> been using this technique to successfully do schema version management
> without any hassle for the last two years.
>
> Hopefully this can add to the list of various possible _useful_ schema
> version management methods.

What does your patch do that you can't already do with pg_restore?

create function foo(a int, b int, c text) returns int as $$ select 0;
$$ language sql;
CREATE FUNCTION

pg_dump -Fc postgres -s > postgres.dump
pg_restore -l postgres.dump  | grep FUNCTION
196; 1255 32939 FUNCTION public foo(integer, integer, text) merlin

pg_restore -P "foo(integer, integer, text)" postgres.dump
<function body follows>

it's fairly easy to wrap pg_restore with a smalls script that extracts
function bodies and writes them out to file names.  this is a great
and underused feature, so I'd argue that if you wanted to formalize
per object file extraction you should be looking at expanding
pg_restore, not pg_dump.

merlin


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Schema version management
Next
From: Joe Conway
Date:
Subject: Re: has_language_privilege returns incorrect answer for non-superuser