Re: Version Control? - Mailing list pgsql-general
From | Russ Brown |
---|---|
Subject | Re: Version Control? |
Date | |
Msg-id | 42A9DD1B.2080202@gmail.com Whole thread Raw |
In response to | Re: Version Control? (John DeSoi <desoi@pgedit.com>) |
Responses |
Re: Version Control?
|
List | pgsql-general |
John DeSoi wrote: > > On Jun 10, 2005, at 11:38 AM, Peter Fein wrote: > >> >>> This would obviously have to be pretty damn clever. Amongs the >>> difficulties would be ensuring that the patch applies changes in the >>> correct order (e.g. add column before adding foreign key). It's hard, >>> but I don't believe it's impossible. >> >> >> As an uninformed, off-the-wall idea, could one compare snapshots of the >> system tables to generate these diffs? I know next-to-nothing about >> these, but it seems like they'd contain the info you'd need. > > > I think it would be a requirement; I'm not convinced it could be done > just from comparing table definitions. For example, using the table > definitions only, how could you distinguish between renaming the last > column in the table versus dropping the column and adding a new one of > the same type. The target table would be the same, but the semantics of > getting there are not. > That's an excellent point that I certainly hadn't thought of. Indeed, if you look at the output from the general text-based diff programs, they only output additions or removals. If you change a line it gets marked as two separate changes: a removal and an addition. My ideal is still to be able to create a 'diff' between any two arbitrary revisions of a schema. Therefore, the schema must only contain the state of the database, and not transitional information (since the transition is undefined before the 'diff' takes place). There would need to be some other way to identify the two columns as being the same historically. This could be something in the comment (though I don't like the idea of enforcing metadata in comments) or using some technique such as an OID (though I can't remember if things like columns have OIDs, and it's not particularly portable either). Perhaps the problem is best served by two files: one containing the schema snapshot itself, and the other containing a 'manifest' which lists each entity in the schema along with a unique identifier for each entity. Then you have the problem of how you assign each identifier, since many people may be working on the same schema on different branches. Just picking the next number wouldn't work as you'd get clashes. It would have to be some sort of GUID. An added benefit of this 'manifest' file is that it would be a good place to store additional meta-data on things like whether a given table's data should be dumped or not. (thinks for a while...) Now that I think about it, why stick with the schema dump at all? We could (for example) define a generic XML schema on which the manifest is based which describes everything, from the structure, meta data and the actual required data itself to a list of tables that should be ignored completely. You'd have to provide a script to update the file from a database, and when it encounters something like a column rename it could ask you if you actually did drop and re-add the column or if it was a rename. Depending on the answer you give the column either gets a new GUID or keeps the old one. So taking that example, you do your commit and somebody updates their working copy and sees that the database file has changed. They run the diff command to see what they need to do to update their version and it outputs the DROP COLUMN/ADD COLUMN or RENAME statement that they need, along with a couple of configuration inserts that you did too. This certainly is a far more complex problem than I originally thought it was. I'd like any solution to be able to work in any version control system, and to be applicable to any database engine (my employer is a MySQL house, while I personally prefer Postgres for my personal stuff, so I'm in favour of it being completely portable). Another important design criteria for me would be ease of use. This is a version-control stage on top of the existing version control work that a developer needs to do, and if it's a real hassle they aren't going to bother to do it properly. It therefore needs to be able to do things as automatically as possible, for example by connecting to a database and generating the manifest file directly. I also think a requirement would have to be that a database would need a dedicated table (or view) to store its current version number. The manifest file could specify where to find this. It would be required to know which revision you need to 'diff' against to upgrade the database to the latest version. For extra brownie points the diff command could output the update statement to set the version to the new version automatically, so you don't forget to do it. As I say it's a hard problem, but I certainly don't think it's impossible. -- Russ.
pgsql-general by date: