Thread: Why dump/restore to upgrade?
We all understand the reasons why one MUST dump and restore to upgrade to 7.2, but I'd like to make a general call to arms that this (or 7.3) should be the last release to require it.

It doesn't look good. One should be able to upgrade in place, and even revert to an older version, if necessary.

What do you all think?
At 9:56 PM -0500 2/7/02, mlw wrote:
>We all understand the reasons why one MUST dump and restore to upgrade to 7.2,
>but I'd like to make a general call to arms that this (or 7.3) should be the
>last release to require this.
>
>It doesn't look good. One should be able to upgrade in place, and even revert
>to an older version, if necessary.
>
>What do you all think?

At the very least, wouldn't it be easy for pg to automate the existing process? Say, dump/restore using textfiles in /tmp, and maybe even spit out a backup copy of the old database as a big tarball?

IMHO, most of the complaints about PostgreSQL revolve around initial setup and initial ease of use. Sure, none of those issues matter once you're used to the system, but they wouldn't be difficult to fix either.

-pmb
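The automation pmb suggests is essentially scripting the steps users already run by hand. A minimal sketch of what such a tool might generate (Python; `upgrade_plan`, the binary directories, and all paths are illustrative assumptions, not anything that ships with PostgreSQL):

```python
import shlex

def upgrade_plan(datadir, old_bin, new_bin,
                 dump_path="/tmp/pg_upgrade_dump.sql",
                 backup_tar="/tmp/pg_datadir_backup.tar.gz"):
    """Return the shell commands an automated dump/restore upgrade would run.

    Binary locations and paths are assumptions for illustration; a real tool
    would discover them from the installation.
    """
    q = shlex.quote
    return [
        # 1. Dump everything from the old server to a text file in /tmp.
        f"{q(old_bin)}/pg_dumpall > {q(dump_path)}",
        # 2. Stop the old postmaster.
        f"{q(old_bin)}/pg_ctl -D {q(datadir)} stop",
        # 3. Tar up the old data directory so the upgrade can be reverted.
        f"tar czf {q(backup_tar)} {q(datadir)}",
        # 4. Re-initialize the cluster with the new version's initdb.
        f"{q(new_bin)}/initdb -D {q(datadir)}",
        # 5. Start the new postmaster and reload the dump.
        f"{q(new_bin)}/pg_ctl -D {q(datadir)} start",
        f"{q(new_bin)}/psql -f {q(dump_path)} template1",
    ]
```

Generating the plan rather than executing it directly keeps the tarball revert path (step 3) explicit, which is the part of pmb's proposal that addresses the "revert to an older version" complaint.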
-----Original Message-----
From: mlw [mailto:markw@mohawksoft.com]
Sent: Thursday, February 07, 2002 6:56 PM
To: PostgreSQL-development
Subject: [HACKERS] Why dump/restore to upgrade?

We all understand the reasons why one MUST dump and restore to upgrade to 7.2, but I'd like to make a general call to arms that this (or 7.3) should be the last release to require this.

It doesn't look good. One should be able to upgrade in place, and even revert to an older version, if necessary.

What do you all think?

>>-------------------------------------------------------------------
I think sometimes it might be good to change the internal structure. Rdb has made me do a dump and load. SQL*Server has made me do a dump and load. However, if a change to the internal structure requires such a transition, it is traditional to bump the MAJOR version number, since the change is traumatic.

Something that would make the transition a lot less painful is to automatically output the SQL schema in a manner such that dependencies are not violated (e.g. don't try to create a view before its table is defined, stuff like that).

To say "the internal structure will never need to change again" will lead either to certain failure or to stagnation, because you can't update the internals.
<<-------------------------------------------------------------------
mlw <markw@mohawksoft.com> writes:
> but I'd like to make a general call to arms that this (or 7.3) should be the
> last release to require this.

We will never make such a commitment, at least not in the foreseeable future.

I would like to see more attention paid to supporting cross-version upgrades via pg_upgrade (or some improved version thereof) when practical, which it should be more often than not. But to bind ourselves forever to the current on-disk format is sheer folly. And if you have to convert the datafile format then you might as well dump and reload.

			regards, tom lane
On Friday 08 February 2002 12:44 am, Tom Lane wrote:
> mlw <markw@mohawksoft.com> writes:
> > but I'd like to make a general call to arms that this (or 7.3) should be
> > the last release to require this.

> We will never make such a commitment, at least not in the foreseeable
> future.

Why?

> But to bind
> ourselves forever to the current on-disk format is sheer folly.

Certainly true. But upgradability != bound-to-the-same-format.

> And if you have to convert the datafile format then you might as
> well dump and reload.

No. Dump and reload is folly. And it doesn't always work. And that's the problem. I've fought this problem a long time. Too long a time. And it is a problem. Unfortunately, it is a problem that is going to require some deep thought and hackery.

I believe it should be this simple for our users:

1.) Postmaster starts, finds old files. It's OK with that.

2.) A connection starts a postgres backend. When the backend starts, it sees the old-format tree and adjusts to it as best it can -- if this means fewer features, well, it'll just have to get over it. Send a warning down the connection that it is in reduced-functionality mode or some such.

3.) An SQL command could then be issued down the connection that would, in a transaction-safe manner, convert the data on disk into the newest format.

Until the magic-bullet upgrade command is sent, the backend operates in a reduced-functionality mode -- maybe even read-only if necessary. In this mode, a safer pg_dump can be executed -- how many times now have we told people to use a newer pg_dump to dump an old-version database? Just having read-only access to the old data through the new backend would in reality be much better than the fiasco we have now -- then we can safely pg_dump the data, stop postmaster, initdb, start postmaster, and reload the data.
If the conversion program is large enough to cause worry about backend bloat, then make it standalone and don't let postmaster start up on old data -- pg_upgrade on steroids.

No, this isn't a core feature. Yes, there are features that are better uses of developer time. Sure, there is a partially working pg_upgrade utility, for which I thank Bruce for weathering the storm. BUT OUR UPGRADE PROCESS STINKS. Sorry for yelling, but this touches a raw nerve for me.

My apologies if I have offended anyone -- PostgreSQL is just too good an RDBMS to suffer from this problem. The developers here have put in too much hard work of high quality for anyone to disparage PostgreSQL over the lack of good, solid upgrading. And I'm not upset at any one person -- it's the program, the process, and the users who rely on our code that make me this vehement on the subject.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
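Lamar's three steps hinge on the backend recognizing an old on-disk format at startup and degrading gracefully. A toy sketch of that decision (Python; `startup_mode` and the set of readable old versions are hypothetical -- nothing like this exists in PostgreSQL, which simply refuses to start on a mismatched catalog version):

```python
def startup_mode(on_disk_version, server_version):
    """Decide how a new backend should treat an existing data directory.

    A sketch of the behaviour Lamar describes: the matching format runs
    normally, a known older format comes up read-only (reduced
    functionality) until an explicit upgrade command is issued, and
    anything unrecognised -- e.g. a data directory written by a *newer*
    release -- is refused outright.
    """
    # Old formats the hypothetical backend can still read (assumption).
    KNOWN_OLD = {"7.0", "7.1"}
    if on_disk_version == server_version:
        return "normal"
    if on_disk_version in KNOWN_OLD:
        return "read-only"   # warn the client; wait for the upgrade command
    return "refuse"
```

The "read-only" branch is what would let a new pg_dump safely read the old data before the in-place conversion is attempted.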
Tom Lane wrote:
> mlw <markw@mohawksoft.com> writes:
> > but I'd like to make a general call to arms that this (or 7.3) should be the
> > last release to require this.
>
> We will never make such a commitment, at least not in the foreseeable
> future.

Here's the problem. If you have a database that is in service, you cannot upgrade postgres on that machine without taking it out of service for the duration of a backup/restore. A small database is not a big deal; a large database is a problem. A system could be out of service for hours. For a mission-critical installation, this is really unacceptable.

> I would like to see more attention paid to supporting cross-version
> upgrades via pg_upgrade (or some improved version thereof) when
> practical, which it should be more often than not. But to bind
> ourselves forever to the current on-disk format is sheer folly.
> And if you have to convert the datafile format then you might as
> well dump and reload.

The backup/restore to upgrade will be a deal breaker for many installations. If you want more people using PostgreSQL, you need to accept that this is a very real problem, one which should be treated as unacceptable behavior.

I don't want to say "other databases do it, why can't PostgreSQL," because that isn't the point. Databases can be HUGE; pg_dumpall can take an hour or more to run. Then it takes even longer to restore, because indexes have to be recreated.
Hi everyone,

mlw wrote:
> Tom Lane wrote:
<snip>
> For a mission critical installation, this is really unacceptable.
>
> > I would like to see more attention paid to supporting cross-version
> > upgrades via pg_upgrade (or some improved version thereof) when
> > practical, which it should be more often than not. But to bind
> > ourselves forever to the current on-disk format is sheer folly.
> > And if you have to convert the datafile format then you might as
> > well dump and reload.
>
> The backup/restore to upgrade will be a deal breaker for many installations. If
> you want more people using PostgreSQL, you need to accept that this is a very
> real problem, and one which should be addressed as an unacceptable behavior.
>
> I don't want to say "Other databases do it, why can't PostgreSQL" because that
> isn't the point. Databases can be HUGE, pg_dumpall can take an hour or more to
> run. Then, it takes longer to restore because indexes have to be recreated.

Here's a thought for a method which doesn't yet exist, but as we're about to start on the next version of PostgreSQL it might be worth considering:

a) Do a pg_dump of the old-version database in question. It takes note of the latest transaction numbers in progress.

b) On a separate machine, start restoring the data into the new-version database.

c) Take the old-version database out of production, so no new transactions are done on it.

d) Run a yet-to-be-created utility which looks at the differences between the two and updates the new-version database with the entries that have changed since the first snapshot.

e) Put the new-version database into production, assuming the applications hitting it have already been tested for compatibility with the new version.
You could skip step c) (taking the old database offline) and instead do the synchronisation step with it online if there has been a large timeframe of changes, and THEN (once the differences are minimal) take the old-version database offline and do another synchronisation for the remaining differences.

However, I get the feeling a lot of this is very similar to some of the approaches to replication already around or being developed. It might be simpler to make the old-version database a replication master, make the new-version database a slave replica of it, then once they're in sync cut over to the new system (again assuming the applications using it have been tested for compatibility with the new version).

The theory sounds alright, but in practice it might not be that easy. We can live in hope, however. :)

Regards and best wishes,

Justin Clift

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi
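The yet-to-be-created utility in step d) boils down to: record the transaction horizon at dump time, then re-copy every row written after it. A toy model of that diff (Python; `rows_changed_since` and the row/xid layout are illustrative assumptions -- real MVCC visibility, in-progress transactions, and deletes are far more involved):

```python
def rows_changed_since(snapshot_xid, table):
    """Rows touched after the dump snapshot was taken.

    Each row carries the transaction id (xid) that last wrote it; the sync
    utility re-copies any row whose xid is newer than the xid recorded
    when the pg_dump snapshot began.
    """
    return {key: (xid, row)
            for key, (xid, row) in table.items()
            if xid > snapshot_xid}

# Old-version database after a dump taken at xid 100: row "b" was updated
# and row "c" inserted while the restore on the new machine was running.
old_db = {
    "a": (95, "unchanged row"),
    "b": (103, "updated after the dump"),
    "c": (107, "inserted after the dump"),
}
delta = rows_changed_since(100, old_db)
```

Only the delta then needs to be applied during the brief offline window in step c), which is why the repeated-sync variant above shrinks the outage.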
On Fri, 8 Feb 2002, mlw wrote:
> > We will never make such a commitment, at least not in the foreseeable
> > future.
>
> Here's the problem. If you have a database that is in service, you can
> not upgrade postgres on that machine without taking it out of service
> for the duration of a backup/restore. A small database is not a big
> deal, a large database is a problem. A system could be out of service
> for hours.
>
> For a mission critical installation, this is really unacceptable.

Cheap solution: replication. Bring up a slave with the new version. Do the initial sync. Wait for a quiet time, and switch over (possibly still replicating to the old one so you can revert the upgrade). The overhead would be fairly high, but it's a cheap way to get the required properties, and it is easy to revert.

Matthew.
> I don't want to say "Other databases do it, why can't PostgreSQL" because
> that isn't the point. Databases can be HUGE, pg_dumpall can take an hour or
> more to run. Then, it takes longer to restore because indexes have to be
> recreated.

A way to put PostgreSQL into a read-only mode could help this (as well as many other things). That way users could at least have access to the data for the entire time the dump/restore process is going. It could take a lot of disk space, but it would help the case where time is a bigger problem than space.
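The read-only mode suggested here amounts to a gate in front of every statement while the migration runs. A crude sketch (Python; `check_statement` is hypothetical -- real enforcement would happen in parse analysis, and first-keyword matching would wrongly pass a SELECT that calls a data-modifying function):

```python
# Statements a read-only cluster can still service (simplifying assumption).
READ_ONLY_OK = ("SELECT", "FETCH", "SHOW", "EXPLAIN")

def check_statement(sql, read_only_mode):
    """Reject writes while the cluster is in upgrade read-only mode.

    Readers keep working for the duration of the dump/restore; writers get
    a clean error instead of modifying data out from under the migration.
    """
    keyword = sql.lstrip().split(None, 1)[0].upper()
    if read_only_mode and keyword not in READ_ONLY_OK:
        raise PermissionError(
            f"cluster is read-only during upgrade: {keyword} rejected")
    return True
```

The disk-space cost mentioned above comes from holding the old cluster, the dump file, and the new cluster simultaneously -- the gate itself is cheap.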
Lamar Owen <lamar.owen@wgcr.org> writes:
> On Friday 08 February 2002 12:44 am, Tom Lane wrote:
> > mlw <markw@mohawksoft.com> writes:
> > > but I'd like to make a general call to arms that this (or 7.3) should be
> > > the last release to require this.
> >
> > We will never make such a commitment, at least not in the foreseeable
> > future.
>
> Why?
>
> > But to bind
> > ourselves forever to the current on-disk format is sheer folly.
>
> Certainly true. But upgradability != bound-to-the-same-format.
>
> > And if you have to convert the datafile format then you might as
> > well dump and reload.
>
> No. Dump - reload is folly. And it doesn't always work. And that's the
> problem.

Add some unicode into it for even more fun. Bah.

--
Trond Eivind Glomsrød
Red Hat, Inc.