Thread: Why dump/restore to upgrade?

Why dump/restore to upgrade?

From
mlw
Date:
We all understand the reasons why one MUST dump and restore to upgrade to 7.2,
but I'd like to make a general call to arms that this (or 7.3) should be the
last release to require this.

It doesn't look good. One should be able to upgrade in place, and even revert
to an older version, if nessisary.

What do you all think?


Re: Why dump/restore to upgrade?

From
Peter Bierman
Date:
At 9:56 PM -0500 2/7/02, mlw wrote:
>We all understand the reasons why one MUST dump and restore to upgrade to 7.2,
>but I'd like to make a general call to arms that this (or 7.3) should be the
>last release to require this.
>
>It doesn't look good. One should be able to upgrade in place, and even revert
>to an older version, if nessisary.
>
>What do you all think?


At the very least, wouldn't it be easy for pg to automate the existing process?
Say, dump/restore using textfiles in /tmp, and maybe even spit out a backup copy of the old database as a big tarball?

IMHO, most of the complaints about PostgreSQL revolve around initial setup and inital ease of use. Sure, none of those
issuesmatter once you're used to the system, but they wouldn't be difficult to fix either.
 

-pmb




Re: Why dump/restore to upgrade?

From
"Dann Corbit"
Date:
-----Original Message-----
From: mlw [mailto:markw@mohawksoft.com]
Sent: Thursday, February 07, 2002 6:56 PM
To: PostgreSQL-development
Subject: [HACKERS] Why dump/restore to upgrade?


We all understand the reasons why one MUST dump and restore to upgrade
to 7.2,
but I'd like to make a general call to arms that this (or 7.3) should be
the
last release to require this.

It doesn't look good. One should be able to upgrade in place, and even
revert
to an older version, if nessisary.

What do you all think?
>>-------------------------------------------------------------------
I think sometimes it might be good to change the internal structure.

Rdb has made me do dump and load.
SQL*Server has made me do dump and load.

However, if there is a change to internal structure that requires such a
transition, it is traditional to change the MAJOR version number since
the change is traumatic.

Something that can make the transition a lot less painful is to
automatically output the SQL schema in a manner such that dependencies
are not violated.  (e.g. don't try to create a view before the table is
defined, stuff like that).

To try to say "The internal structure will never need to change again"
will lead to either certain failure or stagnation because you can't 
update the internals.
<<-------------------------------------------------------------------


Re: Why dump/restore to upgrade?

From
Tom Lane
Date:
mlw <markw@mohawksoft.com> writes:
> but I'd like to make a general call to arms that this (or 7.3) should be the
> last release to require this.

We will never make such a commitment, at least not in the foreseeable
future.

I would like to see more attention paid to supporting cross-version
upgrades via pg_upgrade (or some improved version thereof) when
practical, which it should be more often than not.  But to bind
ourselves forever to the current on-disk format is sheer folly.
And if you have to convert the datafile format then you might as
well dump and reload.
        regards, tom lane


Re: Why dump/restore to upgrade?

From
Lamar Owen
Date:
On Friday 08 February 2002 12:44 am, Tom Lane wrote:
> mlw <markw@mohawksoft.com> writes:
> > but I'd like to make a general call to arms that this (or 7.3) should be
> > the last release to require this.

> We will never make such a commitment, at least not in the foreseeable
> future.

Why?

> But to bind
> ourselves forever to the current on-disk format is sheer folly.

Certainly true.  But upgradability!=bound-to-the-same-format.

> And if you have to convert the datafile format then you might as
> well dump and reload.

No.  Dump - reload is folly.  And it doesn't always work.  And that's the 
problem.

I've fought this problem a long time.  Too long of a time.  And it is a 
problem.  Unfortunately, it is a problem that is going to require some deep 
thought and hackery.

I believe it should be this simple for our users:

1.) Postmaster starts, finds old files.  It's OK with that.

2.) A connection starts a postgres backend.  When the backend starts, it sees 
the old format tree and adjusts to it as best it can -- if this means fewer 
features, well, it'll just have to get over it.  Send a warning down the 
connection that it is in reduced functionality mode or some such.

3.) An SQL command could then be issued down the connection that would, in a 
transaction-safe manner, convert the data on the disk into the newest format. 
Until the magic bullet upgrade command is sent, the backend operates in a 
reduced functionality mode -- maybe even read-only if necessary. 

In this mode, a safer pg_dump can be executed -- how many times now have we 
told people to use a newer pg_dump to dump an old version database?  Just 
having read-only access to the old data through the new backend would in 
reality be much better than the fiasco we have now -- then we can safely 
pg_dump the data, stop postmaster, initdb, start postmaster, and reload the 
data.

If the conversion program is large enough to cause worry about backend bloat, 
then make it standalone and not let postmaster start up on old data -- 
pg_upgrade on steroids.

No, this isn't a core feature.  Yes, there are features that are better uses 
of developer time.  Sure, there is a partially working pg_upgrade utility, 
for which I thank Bruce for weathering it out upon. 

BUT OUR UPGRADE PROCESS STINKS.  

Sorry for yelling.  But this touches a raw nerve for me.  My apologies if I 
have offended anyone -- PostgreSQL is just too good an RDBMS to suffer from 
this problem.  The developers here have put too much hard work of high 
quality for anyone to disparage PostgreSQL due to the lack of good solid 
upgrading.  And I'm not upset at any one person -- it's the program; the 
process; and the users that rely on our code which cause me to be this 
vehement on this subject.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: Why dump/restore to upgrade?

From
mlw
Date:
Tom Lane wrote:
> 
> mlw <markw@mohawksoft.com> writes:
> > but I'd like to make a general call to arms that this (or 7.3) should be the
> > last release to require this.
> 
> We will never make such a commitment, at least not in the foreseeable
> future.

Here's the problem. If you have a database that is in service, you can not
upgrade postgres on that machine without taking it out of service for the
duration of a backup/restore. A small database is not a big deal, a large
database is a problem. A system could be out of service for hours.

For a mission critical installation, this is really unacceptable.

> 
> I would like to see more attention paid to supporting cross-version
> upgrades via pg_upgrade (or some improved version thereof) when
> practical, which it should be more often than not.  But to bind
> ourselves forever to the current on-disk format is sheer folly.
> And if you have to convert the datafile format then you might as
> well dump and reload.

The backup/restore to upgrade will be a deal breaker for many installations. If
you want more people using PostgreSQL, you need to accept that this is a very
real problem, and one which should be addressed as an unacceptable behavior.

I don't want to say "Other databases do it, why can't PostgreSQL" because that
isn't the point. Databases can be HUGE, pg_dumpall can take an hour or more to
run. Then, it takes longer to restore because indexes have to be recreated.


Re: Why dump/restore to upgrade?

From
Justin Clift
Date:
Hi everyone,

mlw wrote:
> 
> Tom Lane wrote:
<snip> 
> For a mission critical installation, this is really unacceptable.
> 
> >
> > I would like to see more attention paid to supporting cross-version
> > upgrades via pg_upgrade (or some improved version thereof) when
> > practical, which it should be more often than not.  But to bind
> > ourselves forever to the current on-disk format is sheer folly.
> > And if you have to convert the datafile format then you might as
> > well dump and reload.
> 
> The backup/restore to upgrade will be a deal breaker for many installations. If
> you want more people using PostgreSQL, you need to accept that this is a very
> real problem, and one which should be addressed as an unacceptable behavior.
> 
> I don't want to say "Other databases do it, why can't PostgreSQL" because that
> isn't the point. Databases can be HUGE, pg_dumpall can take an hour or more to
> run. Then, it takes longer to restore because indexes have to be recreated.

Here's a thought for a method which doesn't yet exist, but as we're
about to start into the next version of PostgreSQL it might be worth
considering.

a) Do a pg_dump of the old-version database in question.  It takes note
of the latest transaction numbers in progress.

b) On a seperate machine, start doing a restore of the data, into the
new version database.

c) Take the old-version database out-of-production, so no new
transactions are done on it.

d) Run a yet-to-be-created utility which then takes a look at the
difference between the two, and updates the new-version database with
the entries which have changed since the first snapshot.

e) Put the new-version database into production, assuming the
applications hitting it have already been tested for compatibility with
the new version.

You could skip step c) (taking the old database offline) and do instead
the syncronisation step with it online if there has been a large
timeframe of changes, and THEN (one the differences are minimal) take
the old-version database offline and do another syncronisation for the
remaining differences.

However, I get the feeling a lot of this kind of thing is very similar
to some of the approaches to replication already around or being
developed.

It might be simpler to make the old-version database a master replica,
make the new-version database a slave replica of it, then once they're
in sync cut over to the new system (again assuming the applications
using it have been tested for compatibility with the new version).

The theory sounds alright, but in practise it might not be that easy. 
We can live in hope however.  :)

Regards and best wishes,

Justin Clift


> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."  - Indira Gandhi


Re: Why dump/restore to upgrade?

From
Matthew Kirkwood
Date:
On Fri, 8 Feb 2002, mlw wrote:

> > We will never make such a commitment, at least not in the foreseeable
> > future.
>
> Here's the problem. If you have a database that is in service, you can
> not upgrade postgres on that machine without taking it out of service
> for the duration of a backup/restore. A small database is not a big
> deal, a large database is a problem. A system could be out of service
> for hours.
>
> For a mission critical installation, this is really unacceptable.

Cheap solution: replication.  Bring up a slave with the
new version.  Do the initial sync.  Wait for a quiet
time, and switch over (possibly still replicating to
the old one so you can revent the upgrade).

The overhead would be fairly high, but it's a cheap way
to get the required properties, and is easy to revert.

Matthew.



Re: Why dump/restore to upgrade?

From
"Matthew T. O'Connor"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> I don't want to say "Other databases do it, why can't PostgreSQL" because
> that isn't the point. Databases can be HUGE, pg_dumpall can take an hour or
> more to run. Then, it takes longer to restore because indexes have to be
> recreated.

A way to put PostgreSQL into a read only mode could help this (as well as 
many other things.)  That way they could at least let users have access to 
the data the entire time that the dump restore process is going.  It could 
take a lot of disk space, but it would help the case where time is a bigger 
problem than space.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE8Y/oc8BXvT14W9HARAiGgAJ9ckgaq2YHupc/1Wl6RlRNJ5UulNwCeNWrt
ZrVN4qo+L9nDckpIPQ80MYM=
=1WXQ
-----END PGP SIGNATURE-----


Re: Why dump/restore to upgrade?

From
teg@redhat.com (Trond Eivind Glomsrød)
Date:
Lamar Owen <lamar.owen@wgcr.org> writes:

> On Friday 08 February 2002 12:44 am, Tom Lane wrote:
> > mlw <markw@mohawksoft.com> writes:
> > > but I'd like to make a general call to arms that this (or 7.3) should be
> > > the last release to require this.
> 
> > We will never make such a commitment, at least not in the foreseeable
> > future.
> 
> Why?
> 
> > But to bind
> > ourselves forever to the current on-disk format is sheer folly.
> 
> Certainly true.  But upgradability!=bound-to-the-same-format.
> 
> > And if you have to convert the datafile format then you might as
> > well dump and reload.
> 
> No.  Dump - reload is folly.  And it doesn't always work.  And that's the 
> problem.

Add some unicode into it for even more fun. Bah.

-- 
Trond Eivind Glomsrød
Red Hat, Inc.