Re: Upgrading a database dump/restore - Mailing list pgsql-hackers

From Mark Woodward
Subject Re: Upgrading a database dump/restore
Date
Msg-id 16560.24.91.171.78.1160409010.squirrel@mail.mohawksoft.com
In response to Re: Upgrading a database dump/restore  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Upgrading a database dump/restore  (Martijn van Oosterhout <kleptog@svana.org>)
Re: Upgrading a database dump/restore  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
> "Mark Woodward" <pgsql@mohawksoft.com> writes:
>>> Whenever someone actually writes a pg_upgrade, we'll institute a policy
>>> to restrict changes it can't handle.
>
>> IMHO, *before* any such tool *can* be written, a set of rules must be
>> enacted regulating catalog changes.
>
> That one is easy: there are no rules.  We already know how to deal with
> catalog restructurings --- you do the equivalent of a pg_dump -s and
> reload.  Any proposed pg_upgrade that can't cope with this will be
> rejected out of hand, because that technology was already proven five
> years ago.
>
> The issues that are actually interesting have to do with the contents
> of user tables and indexes, not catalogs.
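
For reference, the schema-only dump and reload Tom mentions amounts to
roughly the sketch below; this is an illustration only, the database names
are placeholders, and connection options and version details are omitted:

    import subprocess

    def dump_and_reload_schema(old_db, new_db):
        # Dump only the schema (no user data) from the old cluster.
        subprocess.run(["pg_dump", "-s", "-f", "schema.sql", old_db], check=True)
        # Replay the schema into the freshly initialized new cluster.
        subprocess.run(["psql", "-d", new_db, "-f", "schema.sql"], check=True)

    dump_and_reload_schema("olddb", "newdb")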

It is becoming virtually impossible to recreate databases. Data storage
sizes are increasing faster than the transmission speeds of the media on
which they are stored or the systems by which they are connected. The
world now regards a terabyte as merely a "very large" database, and tens
of terabytes are not far from being common.

Dumping out a database is bad enough, but that is only the data, and it
mostly takes just hours. Restoring a large database with complex indexes
can take days: hours to reload the data, then hours per index to rebuild;
it adds up.
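
To put rough numbers on it (the throughput figures here are assumptions
for illustration, not measurements):

    # Back-of-envelope estimate of dump/restore downtime.
    def restore_hours(db_size_gb, n_indexes,
                      dump_gb_per_hr=100.0,    # assumed dump rate
                      load_gb_per_hr=50.0,     # assumed reload rate
                      hrs_per_index=2.0):      # assumed rebuild time per big index
        return (db_size_gb / dump_gb_per_hr
                + db_size_gb / load_gb_per_hr
                + n_indexes * hrs_per_index)

    # A 2 TB database with 20 large indexes under these assumptions:
    print(round(restore_hours(2048, 20)))   # roughly 100 hours of downtime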

No one expects this to happen by 8.2, or even the release after that, but
as a matter of project direction, the "directors" of the PostgreSQL
project must recognize that dump/restore is becoming like the old locking
vacuum problem. It is a *serious* issue for PostgreSQL adoption and
arguably a real design flaw.

If the barrier to upgrading is too high, people will not upgrade. If
people do not upgrade, then older versions will have to be supported
longer or users will have to be abandoned. If users are abandoned and
there are critical bugs in previous versions of PostgreSQL, then the users
who eventually have to migrate their data will probably not choose
PostgreSQL again, to avoid repeating this situation.

While the economics of open source/free software are different, there is
still a penalty for losing customers, and word of mouth is a dangerous
thing. You can expect to get away with this sort of inconvenience once or
twice in a customer's history with the product, but if every new major
version requires a HUGE process, then the TCO of PostgreSQL gets very high
indeed.

If it is a data format issue, maybe there should be a forum for a "next
gen" version of the current data layout, one that is extensible without
restructuring. This is not something that a couple of people can go off
and do and submit as a patch; it has to be supported and promoted by the
core team, otherwise it won't happen. We all know that.
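
Purely as an illustration of "extensible without restructuring" (this is
not PostgreSQL's actual page or tuple layout), the idea is an on-disk
record that carries its own layout version, so newer code can still read
data written under an older layout instead of forcing a dump/restore:

    import struct

    HEADER_FMT = "!BI"   # layout-version byte + payload length

    def write_record(version: int, payload: bytes) -> bytes:
        return struct.pack(HEADER_FMT, version, len(payload)) + payload

    def read_record(buf: bytes) -> bytes:
        version, length = struct.unpack_from(HEADER_FMT, buf)
        payload = buf[struct.calcsize(HEADER_FMT):][:length]
        if version == 1:
            return payload                  # current layout: raw payload
        if version == 0:
            return payload.rstrip(b"\x00")  # hypothetical older, padded layout
        raise ValueError("unknown layout version %d" % version)

    print(read_record(write_record(1, b"hello")))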

The question is whether or not you all think it is worth doing. I've done
consulting work for some very large companies that everyone has heard of.
These sorts of things matter.

