Re: Upgrading a database dump/restore - Mailing list pgsql-hackers
| From | Mark Woodward |
|---|---|
| Subject | Re: Upgrading a database dump/restore |
| Date | |
| Msg-id | 16560.24.91.171.78.1160409010.squirrel@mail.mohawksoft.com |
| In response to | Re: Upgrading a database dump/restore (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: Upgrading a database dump/restore; Re: Upgrading a database dump/restore |
| List | pgsql-hackers |
> "Mark Woodward" <pgsql@mohawksoft.com> writes: >>> Whenever someone actually writes a pg_upgrade, we'll institute a policy >>> to restrict changes it can't handle. > >> IMHO, *before* any such tool *can* be written, a set of rules must be >> enacted regulating catalog changes. > > That one is easy: there are no rules. We already know how to deal with > catalog restructurings --- you do the equivalent of a pg_dump -s and > reload. Any proposed pg_upgrade that can't cope with this will be > rejected out of hand, because that technology was already proven five > years ago. > > The issues that are actually interesting have to do with the contents > of user tables and indexes, not catalogs. It is becomming virtually impossible to recreate databases. Data storage sizes are increasing faster than the transimssion speeds of the media on which they are stored or the systems by which they are connected. The world is looking at a terabyte as merely a "very large" database these days. tens of terabytes are not far from being common. Dumping out a database is bad enough, but that's only the data, and that can takes (mostly) only hours. Recreating a large database with complex indexes can take days or hours for the data, hours per index, it adds up. No one could expect that this could happen by 8.2, or the release after that, but as a direction for the project, the "directors" of the PostgreSQL project must realize that the dump/restore is becomming like the old locking vacuum problem. It is a *serious* issue for PostgreSQL adoption and arguably a real design flaw. If the barrier to upgrade it too high, people will not upgrade. If people do not upgrade, then older versions will have to be supported longer or users will have to be abandoned. If users are abandoned and there are critical bugs in previous versions of PostgreSQL, then user who eventually have to migrate their data, they will probably not use PostgreSQL in an attempt to avoid repeating this situation. While the economics of open source/ free software are different, there is still a penalty for losing customers, and word of mouth is a dangerous thing. Once or twice in the customers product usage history can you expect to get away with this sort of inconvenience, but if every new major version requres a HUGE process, then the TCO of PostgreSQL gets very high indeed. If it is a data format issue, maybe there should be a forum for a "next gen" version of the current data layout that is extensible without restructuring. This is not something that a couple people can go off and do and submit a patch, it is something that has to be supported and promoted from the core team, otherwise it won't happen. We all know that. The question is whether or not you all think it is worth doing. I've done consulting work for some very large companies that everyone has heard of. These sorts of things matter.