Thread: Resurrecting pg_upgrade
I am planning to solve the ancient problem of updating to a new major version without dump/reload, by means of writing a new, more bulletproof implementation of Bruce's old pg_upgrade script. Here are some design notes --- please comment.

The upgrade scenario
--------------------

I envision the upgrade process as working like this:

1. Shut down old postmaster.
2. If planning to install over old software, move at least the old "postgres" executable out of the way.
3. Install new-version Postgres software.
4. Run pg_upgrade (of new version). It will require access to the old postgres executable, as well as the new postgres and pg_dump executables.
5. (optional) Rename data directory --- see below.
6. Start new postmaster.
7. When satisfied that upgrade was successful, rm -rf old $PGDATA.

pg_upgrade will build a new $PGDATA directory without changing the contents of the old one in any way --- this policy ensures that you're not screwed if the upgrade fails for some reason. A disadvantage is that you need a place to put the new $PGDATA that's different from the old. Of course you can manually rename the directories before or after running pg_upgrade, but does anyone feel that's not acceptable? We could make this simpler if we were willing to let pg_upgrade move the old files out of the way, but that loses the "no damage to old installation" property. (But I think Oliver was suggesting adopting version-specific pathnames for PGDATA, in which case this is really no issue anyway.)

What pg_upgrade itself does is basically:

1. Run "pg_dumpall --schema-only" (or a near equivalent thereof) to obtain the schema info for the old installation.
2. initdb the new installation.
3. Load the above script into the new installation.
4. Physically link (hard-link) user table and index files into the new PGDATA directory, replacing the empty files created by loading the schema.

Since we use physical links, there is little growth of total disk space used, and yet the old PGDATA directory remains unscathed. There are some additional steps and special considerations, which are elaborated on below, but that's the basic idea. Note that it should be possible to upgrade even very large databases in a short time, since the user table contents are not scanned.

In principle, since pg_dump can dump schemas from several versions back, this scheme allows upgrades to jump across several versions, not just one. The only hard restriction in the current plan is that the contents of user table files cannot change. We could support changes in contents of indexes (for example, 7.3 to 7.4 btree or hash index migration) by issuing a REINDEX instead of moving the old index contents. However, this loses the potential for pg_upgrade to be fast and not chew much additional disk space. It'd still beat a dump/reload though.

Currently the no-table-contents-changes restriction keeps us from upgrading from versions older than 7.4 anyway (since type NUMERIC had its on-disk representation changed in 7.4). We could possibly upgrade 7.3 databases that contain no NUMERIC columns, if we take the hit of rebuilding indexes. But is it worth bothering with?

Communication
-------------

I think it's important to be able to run pg_upgrade with the postmaster shut down. Otherwise there is too much risk that some other user will change the database while we are working. The original pg_upgrade script left it to the DBA to ensure this wouldn't happen, but that seems like a foot-gun of much too large caliber.
Instead, all operations should be done through a standalone backend. An additional advantage of doing it this way is that a standalone backend is implicitly superuser, and so no SQL permissions issues will get in the way.

This makes it difficult to use pg_dump, however, since pg_dump doesn't know how to talk to a standalone backend. I propose the following solution:

1. The backend should be modified so that a standalone backend is capable of interacting with a client using the normal FE/BE protocol over stdin/stdout (this would be selected with a command line switch to override the existing standalone-backend behavior).

2. libpq should be extended so that one way to contact the database server is to fork/exec a standalone backend and communicate with it using the above facility. There are any number of ways we could design this feature, but one convenient way is that a PGHOST value starting with a pipe symbol "|" could be taken to mean doing this, with the rest of the string providing the postgres executable's name and possibly command-line options. libpq would tack on the switch needed for FE/BE protocol and the database name to connect to, and exec() the result.

This would allow both pg_dump and psql to be used with standalone backends. There are probably other uses for such a feature besides pg_upgrade's need.

Does this approach create any problems for the Windows port? I'm assuming we can set up pipes in both directions between a parent and child process --- is that available in Windows?

Gotchas
-------

To do the physical relinking of table files, pg_upgrade will need to discover the mapping from old installation's DB OIDs and relfilenode numbers to new ones. This mapping will get even more complex when tablespaces enter the picture; but it seems very doable.

Beware of old files that are symlinks to someplace else; need to replicate the symlink if so. What should happen if hard link fails because of cross-file-system reference (most likely because a whole database subdir was symlinked)? May be okay to punt, or make a symlink, but the latter raises a risk that rm -rf OLDPGDATA will break things. Do we have any supported platforms that don't implement hard-links (perhaps Windows??)

WAL: need to do pg_resetxlog in new installation to bring WAL end up to or beyond end of old WAL. This ensures LSN values in user table page headers will be considered valid. Do not need to worry about copying old XLOG, fortunately.

pg_clog: need to copy old installation's clog (and hence, its XID counter) so that xact status in existing user tables will be valid. This means doing the resetxlog and clog copy immediately after initdb's VACUUM FREEZE, before we start loading the old schema. Note we do NOT need to vacuum the copied-over tables.

OIDs: it's not essential, but likely a good idea to make the OID counter match the old installation's too.

pg_control: make sure all fields of new pg_control match old, e.g. check that compile-time options are the same. This will require knowledge of past and present pg_control layouts, but that seems tolerable.

Large objects: not a problem, just migrate pg_largeobject contents as though it were a user table. Since LOs will thereby retain their old OIDs, we don't need to worry about fixing references to them from user tables.

TOAST tables need to retain the same OID they had before, since this OID is recorded in TOAST external references, which we don't want to update.
It should not be impossible to adjust their OIDs after loading the schema definition, but it surely has potential to be tedious. (NB: without UPDATE SET oid = val, it won't be possible to do this via a SQL command; might be better to attack that than develop a low-level hack. Are there any other cases where pg_upgrade needs to massage the new installation at levels below SQL? User table contents upgrade is one, but we're not gonna support that in the short term anyway.)

What do we do if desired OID conflicts with some existing table? Renumbering a table in the general case is theoretically possible, but there are far more things to worry about than for TOAST tables (eg, references in stored rules...) Best solution might be to examine all the TOAST table OIDs before deciding where to set the new installation's OID counter. Also possibly detect case where old TOAST table is empty, and don't bother migrating if so, thus reducing number of cases where collision can occur.

Note: we do not change the *name* of a toast table, since that's derived from its parent table's OID not its own. No need to touch the toast table's index, either. We do have to update reltoastrelid in parent. Also note mapping from old DB to new can't rely on table names for matching TOAST tables; have to drive it off the parent tables. TOAST valueid OIDs are not a problem, since we'll migrate the toast-table contents in toto.

What about migrating config files? In particular, changes in names or semantics of pre-existing config vars seem like a big issue.

First cut: just copy the files.

Second cut: extract non-default values from old file, insert into new file (so as to preserve comments about variables that didn't exist in old version).

We could imagine adding smarts about specific variable names here, if particular variables change in ways that we can deal with specially.

Future config-file changes such as changing the syntax of pg_hba.conf would need to come with update code that can reformat the old file; or perhaps in some cases we'd have to punt and leave it to manual updating.

Thoughts?

			regards, tom lane
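For concreteness, a minimal POSIX sketch of the fork/exec-plus-pipes arrangement libpq would need in order to speak FE/BE protocol to a standalone backend over its stdin/stdout. The "--fe-be-over-stdio" switch is hypothetical (the proposal only says some new command-line switch would select the mode), and the actual protocol exchange and error recovery are omitted.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int
main(void)
{
	int		to_backend[2];		/* parent writes -> child's stdin */
	int		from_backend[2];	/* child's stdout -> parent reads */
	pid_t	pid;

	if (pipe(to_backend) < 0 || pipe(from_backend) < 0)
	{
		perror("pipe");
		return 1;
	}

	pid = fork();
	if (pid < 0)
	{
		perror("fork");
		return 1;
	}

	if (pid == 0)
	{
		/* child: wire the pipes to stdin/stdout, then exec the backend */
		dup2(to_backend[0], 0);
		dup2(from_backend[1], 1);
		close(to_backend[1]);
		close(from_backend[0]);
		execlp("postgres", "postgres", "--fe-be-over-stdio", "template1",
			   (char *) NULL);
		perror("execlp");
		_exit(1);
	}

	/* parent: close the child's ends, then speak FE/BE protocol over
	 * to_backend[1] (writes) and from_backend[0] (reads) */
	close(to_backend[0]);
	close(from_backend[1]);

	/* ... startup packet, queries, terminate message would go here ... */
	return 0;
}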
Tom Lane wrote:
> I think it's important to be able to run pg_upgrade with the
> postmaster shut down. Otherwise there is too much risk that some
> other user will change the database while we are working. The
> original pg_upgrade script left it to the DBA to ensure this wouldn't
> happen, but that seems like a foot-gun of much too large caliber.
> Instead, all operations should be done through a standalone backend.
> An additional advantage of doing it this way is that a standalone
> backend is implicitly superuser, and so no SQL permissions issues
> will get in the way.

This would also be a nice solution for people who want a standalone, server-less database system. But for the purpose of pg_upgrade it seems like a lot of work for what could just be a magic switch in the postmaster to really kick everyone else out.

> What about migrating config files? In particular, changes in names
> or semantics of pre-existing config vars seem like a big issue.
> First cut: just copy the files.
> Second cut: extract non-default values from old file, insert into
> new file (so as to preserve comments about variables that didn't
> exist in old version).
> We could imagine adding smarts about specific variable names here,
> if particular variables change in ways that we can deal with
> specially.

I would be very careful about making too many smart guesses when upgrading configuration files. This can get really annoying for users who expect it to behave just a little bit differently. Or you get conflicts of authority with packaging tools. Making adjustments because of syntax or name changes is OK, but everything else must be evaluated carefully.
re Windows: pipes, yes, hard links, no (and no sane symlinks either) - also of course no (sane) shell - is this going to be a script or a C program?

Maybe use an option which you would disable on Windows to copy the files instead of hardlinking them. Yes, it would take lots more time and space, but copying raw files would surely still be a lot faster than loading the dump.

cheers

andrew

Tom Lane wrote:
[snip whole lotta good stuff]
Andrew Dunstan <andrew@dunslane.net> writes:
> re Windows: pipes, yes, hard links, no (and no sane symlinks either) -
> also of course no (sane) shell - is this going to be a script or a C
> program?

C, certainly.

> Maybe use an option which you would disable on Windows to copy the files
> instead of hardlinking them. Yes it would take lots more time and space,
> but copying raw files would surely still be a lot faster than loading
> the dump.

Yeah, that's what we'll have to do if there's no hard-link capability.

			regards, tom lane
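A rough sketch of the "hard-link if possible, otherwise copy" fallback being discussed, for a single relation file. The function names are illustrative only (not pg_upgrade's actual code), error handling is minimal, and Windows is ignored here.

#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

static int
copy_file(const char *src, const char *dst)
{
	char	buf[65536];
	ssize_t nread;
	int		in, out;

	if ((in = open(src, O_RDONLY)) < 0)
		return -1;
	if ((out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600)) < 0)
	{
		close(in);
		return -1;
	}
	while ((nread = read(in, buf, sizeof(buf))) > 0)
	{
		if (write(out, buf, nread) != nread)
		{
			nread = -1;
			break;
		}
	}
	close(in);
	close(out);
	return (nread < 0) ? -1 : 0;
}

/*
 * Try to hard-link an old relation file into the new PGDATA; fall back
 * to a physical copy if linking is impossible (cross-filesystem link,
 * or a filesystem with no hard-link support).
 */
static int
transfer_relfile(const char *oldpath, const char *newpath)
{
	if (link(oldpath, newpath) == 0)
		return 0;
	if (errno == EXDEV || errno == EPERM || errno == ENOSYS)
		return copy_file(oldpath, newpath);
	return -1;
}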
On Fri, 2003-12-12 at 14:51, Andrew Dunstan wrote:
> re Windows: pipes, yes, hard links, no (and no sane symlinks either)

Actually, NTFS does support hard links, there is just no support for it in any MS file management GUI.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnfiles/html/ntfs5.asp

From the above link:

BOOL CreateHardLink(
  LPCTSTR lpFileName,
  LPCTSTR lpExistingFileName,
  LPSECURITY_ATTRIBUTES lpSecurityAttributes
);

> Maybe use an option which you would disable on Windows to copy the files
> instead of hardlinking them. Yes it would take lots more time and space,
> but copying raw files would surely still be a lot faster than loading
> the dump.

I think this would be a good feature even without hard link problems. If I am a paranoid admin, and I can afford the time and disk space required, I would want to keep a complete copy of my database, even after the new server is up and running.
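If NTFS hard links are used, the platform difference could be hidden behind a small wrapper along these lines (a sketch only; CreateHardLink needs Windows 2000 or later and an NTFS volume, and the wrapper name is illustrative):

#ifdef WIN32
#include <windows.h>

static int
pg_link_file(const char *oldpath, const char *newpath)
{
	/* note the argument order: new link name first, existing file second */
	return CreateHardLinkA(newpath, oldpath, NULL) ? 0 : -1;
}
#else
#include <unistd.h>

static int
pg_link_file(const char *oldpath, const char *newpath)
{
	return link(oldpath, newpath);
}
#endif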
On Fri, 12 Dec 2003, Peter Eisentraut wrote:

> Tom Lane wrote:
> > I think it's important to be able to run pg_upgrade with the
> > postmaster shut down. Otherwise there is too much risk that some
> > other user will change the database while we are working. The
> > original pg_upgrade script left it to the DBA to ensure this wouldn't
> > happen, but that seems like a foot-gun of much too large caliber.
> > Instead, all operations should be done through a standalone backend.
> > An additional advantage of doing it this way is that a standalone
> > backend is implicitly superuser, and so no SQL permissions issues
> > will get in the way.
>
> This would also be a nice solution for people who want a standalone,
> server-less database system. But for the purpose of pg_upgrade it
> seems like a lot of work for what could just be a magic switch in the
> postmaster to really kick everyone else out.

I was going to ask that one, in relation to a 'READONLY' mode where only the superuser can do writes (ie. single master, multi slave with slaves used as readonly databases), but ... how often do people write apps that connect as the superuser in their normal interactions? I know in our case, even though I should know better, we still do it to a large extent ... so how would you know who to legitimately 'kick out'?

Hrmmmmm ... how about a READ ONLY EXCEPT FROM <IP> mode? In the case of a slave, the IP would be the master server itself ... in the case of pg_upgrade, it would be localhost?

Then again, in the case of pg_upgrade, wouldn't just disabling access from anywhere except localhost prevent others from getting in?

----
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org   Yahoo!: yscrappy   ICQ: 7615664
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> Instead, all operations should be done through a standalone backend.

> This would also be a nice solution for people who want a standalone,
> server-less database system. But for the purpose of pg_upgrade it
> seems like a lot of work for what could just be a magic switch in the
> postmaster to really kick everyone else out.

I don't think the approach I proposed is really materially harder than convincing the postmaster to boot everyone else out. (For one thing, I'm not sure how the postmaster could reliably distinguish "you" from "everyone else", bearing in mind that "you" will be needing to make multiple connections to the old database.) I also like the fact that using a standalone backend dodges all issues about user permissions and whether pg_hba.conf will let you connect to a particular database.

>> We could imagine adding smarts about specific variable names here,
>> if particular variables change in ways that we can deal with
>> specially.

> I would be very careful about making too many smart guesses when
> upgrading configuration files.

Agreed. That was more in the line of speculation than something I wanted to do in the near term. It does mean that people will need to rein in the urge to rename configuration variables ;-)

			regards, tom lane
"Marc G. Fournier" <scrappy@postgresql.org> writes: > Then again, in the case of pg_upgrade, wouldn't just disabling access from > anywhere except localhost prevent others from getting in? Not if your normal operating mode includes connections from clients running locally. I really don't see any clean way to ensure that pg_upgrade (and subsidiary pg_dump runs invoked by it) are the only ones allowed to connect to the database, if we keep the normal postmaster running. But if we shut down the postmaster then it's trivial. regards, tom lane
Matthew T. O'Connor wrote:

>On Fri, 2003-12-12 at 14:51, Andrew Dunstan wrote:
>
>>re Windows: pipes, yes, hard links, no (and no sane symlinks either)
>
>Actually, NTFS does support hard links, there is just no support for it
>in any MS file management GUI.
>
>http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnfiles/html/ntfs5.asp
>
>[snip]
>

I learn something new every day! :-)

I guess we *are* specifying that only NTFS will be supported on Windows? (I saw someone the other day running XP on FAT - I couldn't believe it!)

>>Maybe use an option which you would disable on Windows to copy the files
>>instead of hardlinking them. Yes it would take lots more time and space,
>>but copying raw files would surely still be a lot faster than loading
>>the dump.
>
>I think this would be a good feature even without hard link problems.
>If I am a paranoid admin, and I can afford the time and disk space
>required, I would want to keep a complete copy of my database, even
>after the new server is up and running.
>

I agree. A little paranoia never hurt anyone. Of course, you could always back it up manually beforehand too.

cheers

andrew
"Matthew T. O'Connor" <matthew@zeut.net> writes: > On Fri, 2003-12-12 at 14:51, Andrew Dunstan wrote: >> Maybe use an option which you would disable on Windows to copy the files >> instead of hardlinking them. > I think this would be a good feature even without hard link problems. > If I am a paranoid admin, and I can afford the time and disk space > required, I would want to keep a complete copy of my database, even > after the new server is up and running. That's a good point --- if the upgrade appears to work, but when you actually start the new postmaster there's some incompatibility that results in corruption of your user table files, then you're screwed under a hard link approach. Even though your old $PGDATA directory structure is still intact, the files it contains are corrupted. (Even if they're not corrupt, but you hit some other reason for backing out the update, you probably can't, because very very soon your old WAL and clog will be irretrievably out of date compared to the data files.) Okay, so we want an option to copy even if we could hard link. No problem. Alternative thought: just recommend that if possible, people take a filesystem dump of their old PGDATA directory after stopping the old postmaster. This would be sufficient for retreating to the prior version if needed. It might or might not be slower than copying all the files to a new PGDATA ... regards, tom lane
On Fri, 2003-12-12 at 15:42, Tom Lane wrote:
> Alternative thought: just recommend that if possible, people take
> a filesystem dump of their old PGDATA directory after stopping
> the old postmaster. This would be sufficient for retreating to
> the prior version if needed. It might or might not be slower
> than copying all the files to a new PGDATA ...

Certainly the easier path code-wise :-) Being the belt, suspenders and steel tip boots (foot gun protection) type that I am, I would make a backup even if pg_upgrade copies all the data files. Having pg_upgrade copy the data files gives you an extra layer of protection if desired, and can possibly save an admin who fails to get a good backup of the old PGDATA for whatever reason.
Matthew T. O'Connor wrote:

>On Fri, 2003-12-12 at 15:42, Tom Lane wrote:
>
>>Alternative thought: just recommend that if possible, people take
>>a filesystem dump of their old PGDATA directory after stopping
>>the old postmaster. This would be sufficient for retreating to
>>the prior version if needed. It might or might not be slower
>>than copying all the files to a new PGDATA ...
>
>Certainly the easier path code-wise :-) Being the belt, suspenders and
>steel tip boots (foot gun protection) type that I am, I would make a
>backup even if pg_upgrade copies all the data files. Having pg_upgrade
>copy the data files gives you an extra layer of protection if desired,
>and can possibly save an admin who fails to get a good backup of the old
>PGDATA for whatever reason.
>

I'd be in favor of a prompt at the beginning of the script: "Have you made a copy of the PGDATA directory?" If answered no, then ask for confirmation to proceed without a backup. To skip the prompt, have an option for '--skip-prompt' for those who are a little more sure of themselves or want to write a more automated script for this process.

This approach gives more flexibility, as there may not be sufficient storage available for double the existing database size for conversion on that mount point / disk. The admin doing the upgrade can copy the existing database wherever they need it: tape, another filesystem, NFS mount, etc.

--
Thomas Swan
Why not go the other way.

1) Dump the schemas.
2) Initdb with the new schemas in a tmp PGDATA
3) backup the schemas in the current PGDATA
4) move the new schemas from the new db into the current one.

This means that doing an update you would only have to have space for the system catalogs not the whole database.

How about a postmaster switch to point to the location of the system catalog files separate from the data files?

On Fri, 2003-12-12 at 16:14, Thomas Swan wrote:
> Matthew T. O'Connor wrote:
>
> >On Fri, 2003-12-12 at 15:42, Tom Lane wrote:
> >
> >>Alternative thought: just recommend that if possible, people take
> >>a filesystem dump of their old PGDATA directory after stopping
> >>the old postmaster. This would be sufficient for retreating to
> >>the prior version if needed. It might or might not be slower
> >>than copying all the files to a new PGDATA ...
> >
> >Certainly the easier path code-wise :-) Being the belt, suspenders and
> >steel tip boots (foot gun protection) type that I am, I would make a
> >backup even if pg_upgrade copies all the data files. Having pg_upgrade
> >copy the data files gives you an extra layer of protection if desired,
> >and can possibly save an admin who fails to get a good backup of the old
> >PGDATA for whatever reason.
>
> I'd be in favor of a prompt at the beginning of the script: "Have you made
> a copy of the PGDATA directory?" If answered no, then ask for
> confirmation to proceed without a backup. To skip the prompt, have an
> option for '--skip-prompt' for those who are a little more sure of
> themselves or want to write a more automated script for this process.
>
> This approach gives more flexibility, as there may not be sufficient
> storage available for double the existing database size for conversion
> on that mount point / disk. The admin doing the upgrade can copy the
> existing database wherever they need it: tape, another filesystem, NFS
> mount, etc.
>
> --
> Thomas Swan

--
Dave Smith
CANdata Systems Ltd
416-493-9020
Dave Smith <dave.smith@candata.com> writes:
> Why not go the other way.
> 1) Dump the schemas.
> 2) Initdb with the new schemas in a tmp PGDATA
> 3) backup the schemas in the current PGDATA
> 4) move the new schemas from the new db into the current one.

This seems like approximately the same thing except you lose the property of not having modified the old DB if you fail partway through. What's the advantage exactly?

> This means that doing an update you would only have to have space for
> the system catalogs not the whole database.

That's true either way.

			regards, tom lane
On Fri, 12 Dec 2003, Tom Lane wrote:

> Alternative thought: just recommend that if possible, people take
> a filesystem dump of their old PGDATA directory after stopping
> the old postmaster. This would be sufficient for retreating to
> the prior version if needed. It might or might not be slower
> than copying all the files to a new PGDATA ...

Being able to do a dump to a gzip'd tar file might be an option here also ... it should take a significantly smaller footprint to create the "backup", and on an idle server, shouldn't take any longer than other backup methods ...

----
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org   Yahoo!: yscrappy   ICQ: 7615664
Tom Lane wrote:

>Dave Smith <dave.smith@candata.com> writes:
>
>>Why not go the other way.
>>1) Dump the schemas.
>>2) Initdb with the new schemas in a tmp PGDATA
>>3) backup the schemas in the current PGDATA
>>4) move the new schemas from the new db into the current one.
>
>This seems like approximately the same thing except you lose the
>property of not having modified the old DB if you fail partway through.
>What's the advantage exactly?
>

I do not think that approach buys you much. More than just the schemas change from each major release. The binary (on-disk) format of the relations can change as well, hence the need for the upgrade program. A schema with corrupt data is worthless.

** Warning the user to back up the PGDATA directory should be sufficient, IMHO. Perhaps even echo a URL to the postgresql.org site for specific backup and upgrade procedures and recommendations. With a full copy of the PGDATA directory, an admin can copy the data back, reinstall the old version of PostgreSQL, and do a postmortem while the old version is still operational, without having to keep the service unavailable. If someone is absolutely certain the upgrade will work without any errors, then they can holster their loaded gun with the safety off. If there is an error, the data can be copied back, the old postmaster started, and the problem possibly corrected (maybe a reindex operation or the like). Then repeat the upgrade procedure. This approach seems much simpler and more flexible, as the admin could back up the database to tape or some other medium, possibly multiple volumes, and then do the upgrade in place.

** If the pg_upgrade program were to read/copy old data and output new data, doubling the storage requirements, then you would have a quick way to restart the upgrade procedure on failure without having to load the old data again. It seems to me that an error in the upgrade program would likely happen again at the same point on a repeat attempt, so I don't think there are any significant advantages to the upgrade program doing the copy/backup operation. Exceptionally large databases would have to find additional storage for the copy operation. If the copy-and-upgrade approach were to be followed, it would be advantageous to the admin to be able to specify where the copy of the existing PGDATA would go, or where the newly generated files could go before they could be moved back to the PGDATA directory.

>>This means that doing an update you would only have to have space for
>>the system catalogs not the whole database.
>
>That's true either way.
>
>			regards, tom lane
On Fri, 2003-12-12 at 14:00, Tom Lane wrote:
> Currently the no-table-contents-changes restriction keeps us from
> upgrading from versions older than 7.4 anyway (since type NUMERIC had its
> on-disk representation changed in 7.4). We could possibly upgrade 7.3
> databases that contain no NUMERIC columns, if we take the hit of
> rebuilding indexes. But is it worth bothering with?

How limiting is the above? Does this mean that pg_upgrade will be rendered invalid if there is an on-disk representation change? Do we think we will make it from 7.4 -> 7.5 without on-disk changes? Do we think at this point most upgrades will be without on-disk changes?

Or am I missing something, and pg_upgrade will / can do some magic to work around on-disk changes?
"Matthew T. O'Connor" <matthew@zeut.net> writes: >> [ pg_upgrade won't be able to change user table representation ] > How limiting is the above? Does this mean that pg_upgrade will be > rendered invalid if there is an on-disk representation change? Do we > think we will make it from 7.4 -> 7.5 without on-disk changes? Do we > think at this point most upgrades will be without on-disk changes? Per prior discussion, we will enforce some sort of limit on how often the representation of user tables/indexes can be changed. The idea will be to "batch" such changes so that you only have to do a dump/reload every N major releases instead of every one. In other words, pg_upgrade will work for most version upgrades but we reserve the right to occasionally make releases where it doesn't work. How large N will be in practice remains to be seen, of course, but I'd expect something on the order of 4 or 5. In theory pg_upgrade could be made to apply changes in user data representation, but I'm unconvinced that such a process would be a big improvement over dump/reload. regards, tom lane
> Per prior discussion, we will enforce some sort of limit on how often
> the representation of user tables/indexes can be changed. The idea will
> be to "batch" such changes so that you only have to do a dump/reload
> every N major releases instead of every one. In other words, pg_upgrade
> will work for most version upgrades but we reserve the right to
> occasionally make releases where it doesn't work.
>
> How large N will be in practice remains to be seen, of course, but I'd
> expect something on the order of 4 or 5.
>
> In theory pg_upgrade could be made to apply changes in user data
> representation, but I'm unconvinced that such a process would be a big
> improvement over dump/reload.

Will we now have to be careful to NEVER re-use OIDs in the system catalogs?

Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> Will we now have to be careful to NEVER re-use OIDs in the system catalogs?

No. The proposed pg_upgrade procedure doesn't try to reproduce OIDs of catalog entries (other than toast-table OIDs, which are never preassigned anyway), so there's no issue.

Good point though --- thanks for thinking about it.

			regards, tom lane
> No. The proposed pg_upgrade procedure doesn't try to reproduce OIDs of
> catalog entries (other than toast-table OIDs, which are never
> preassigned anyway), so there's no issue.
>
> Good point though --- thanks for thinking about it.

What about cached OIDs in view and function definitions, etc...? Like if someone had a view that used the old oidrand() function and now we reused that oid for a new, completely different function, would breakage occur?

Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> What about cached OIDs in view and function definitions, etc...?

Doesn't matter. Catalog entries are dumped and reloaded; there is no carry-forward of OIDs.

I suppose if someone were storing OIDs of tables or functions or views in user tables, this procedure would break the references. But that would be true of a dump/reload under current procedures as well. I'm willing to say that that's unsupported.

			regards, tom lane
On Sun, Dec 14, 2003 at 09:48:20PM -0500, Tom Lane wrote:
> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> > What about cached OIDs in view and function definitions, etc...?
>
> Doesn't matter. Catalog entries are dumped and reloaded; there is no
> carry-forward of OIDs.
>
> I suppose if someone were storing OIDs of tables or functions or views
> in user tables, this procedure would break the references. But that
> would be true of a dump/reload under current procedures as well. I'm
> willing to say that that's unsupported.

Large objects included?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
FOO MANE PADME HUM
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
>> Doesn't matter. Catalog entries are dumped and reloaded; there is no
>> carry-forward of OIDs.

> Large objects included?

No. Large object OIDs are preserved in the given proposal.

(Note to self: I wonder whether the recently-added COMMENT ON LARGE OBJECT facility works at all over dump/reload...)

			regards, tom lane
On Sun, 2003-12-14 at 18:02, Tom Lane wrote:
> "Matthew T. O'Connor" <matthew@zeut.net> writes:
> > How limiting is the above? Does this mean that pg_upgrade will be
> > rendered invalid if there is an on-disk representation change? Do we
> > think we will make it from 7.4 -> 7.5 without on-disk changes? Do we
> > think at this point most upgrades will be without on-disk changes?
>
> How large N will be in practice remains to be seen, of course, but I'd
> expect something on the order of 4 or 5.

Ok, this is what I was looking for. If we are serious about this, would it make sense to start a new policy of bumping the major version number every time an upgrade requires a dump / reload? So PostgreSQL 8.0 would be the next version with on-disk changes, all the 8.x releases would have the same on-disk format, and the next time the disk format changes, then we are on 9.0.
> No. Large object OIDs are preserved in the given proposal.
>
> (Note to self: I wonder whether the recently-added COMMENT ON LARGE
> OBJECT facility works at all over dump/reload...)

How do you mean? pg_dump never writes out the COMMENT ON commands...

Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> (Note to self: I wonder whether the recently-added COMMENT ON LARGE
>> OBJECT facility works at all over dump/reload...)

> How do you mean? pg_dump never writes out the COMMENT ON commands...

Oh, okay, it doesn't work.

Care to think about how to fix that?

			regards, tom lane
>> How do you mean? pg_dump never writes out the COMMENT ON commands...
>
> Oh, okay, it doesn't work.
>
> Care to think about how to fix that?

I think you're going to have to explain the exact problem to me - I don't quite get what you mean?

Do you mean using pg_dump with the '-b' option? How does pg_dump dump the blobs? Does it do a pg_dumplo effort where it puts them all as files in directories?

Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> Oh, okay, it doesn't work.

> Do you mean using pg_dump with the '-b' option?

Yeah. Don't you think that should preserve comments on large objects, now that such comments are alleged to be a supported facility?

> How does pg_dump dump the blobs?

It dumps them, reloads them (which causes them to be assigned new OIDs) and then runs around and tries to fix up references to them to have the correct OIDs. Possibly that last pass could be extended to include pg_description. Not sure of details though.

			regards, tom lane
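One possible shape for that extra pass, as a sketch only: it assumes pg_dump kept its old-to-new large object OID mapping in a table named blob_xref(oldoid, newoid), which is a hypothetical name, and that the session is superuser so pg_description can be updated directly.

#include <stdio.h>
#include <libpq-fe.h>

static void
fixup_blob_comments(PGconn *conn)
{
	PGresult   *res;

	/* remap pg_description entries that point at old large object OIDs */
	res = PQexec(conn,
				 "UPDATE pg_description SET objoid = x.newoid "
				 "FROM blob_xref x "
				 "WHERE pg_description.objoid = x.oldoid "
				 "  AND pg_description.classoid = "
				 "      (SELECT oid FROM pg_class WHERE relname = 'pg_largeobject')");
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
		fprintf(stderr, "comment fixup failed: %s", PQerrorMessage(conn));
	PQclear(res);
}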
"Matthew T. O'Connor" <matthew@zeut.net> writes: > On Sun, 2003-12-14 at 18:02, Tom Lane wrote: >> How large N will be in practice remains to be seen, of course, but I'd >> expect something on the order of 4 or 5. > Ok, this is what I was looking for. If we are serious about this, would > it make sense to start a new policy of bumping the major version number > every time an upgrade requires a dump / reload? That was discussed already. I think it's purely a cosmetic issue, but have no objection to doing it that way... regards, tom lane
> Yeah. Don't you think that should preserve comments on large objects,
> now that such comments are alleged to be a supported facility?
>
>> How does pg_dump dump the blobs?
>
> It dumps them, reloads them (which causes them to be assigned new OIDs)
> and then runs around and tries to fix up references to them to have the
> correct OIDs. Possibly that last pass could be extended to include
> pg_description. Not sure of details though.

OK, I'll look into it.

Chris
Hi,

> Alternative thought: just recommend that if possible, people
> take a filesystem dump of their old PGDATA directory after
> stopping the old postmaster. This would be sufficient for
> retreating to the prior version if needed. It might or might
> not be slower than copying all the files to a new PGDATA ...

Filesystem-level snapshots make this very easy. Combined with:

Dave Smith <dave.smith@candata.com> writes:
> Why not go the other way.
> 1) Dump the schemas.
> 2) Initdb with the new schemas in a tmp PGDATA
> 3) backup the schemas in the current PGDATA
> 4) move the new schemas from the new db into the current one.

Then it would be possible to:

1) Stop old postmaster
2) Make a filesystem snapshot
3) Upgrade the schemas in-place (as described above)
4) Start new postmaster
*) On error: revert filesystem to snapshot

Would be very nice for those who can use filesystem snapshots.

Sander.
Tom Lane wrote:
> "Marc G. Fournier" <scrappy@postgresql.org> writes:
>> Then again, in the case of pg_upgrade, wouldn't just disabling access from
>> anywhere except localhost prevent others from getting in?
>
> Not if your normal operating mode includes connections from clients
> running locally. I really don't see any clean way to ensure that
> pg_upgrade (and subsidiary pg_dump runs invoked by it) are the only
> ones allowed to connect to the database, if we keep the normal
> postmaster running. But if we shut down the postmaster then it's
> trivial.

If you want to prevent "accidental" access, start postmaster on a non-standard port.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
On Tue, 16 Dec 2003, Jan Wieck wrote:

> If you want to prevent "accidental" access, start postmaster on a
> non-standard port.

That seems like an unfriendly thing to do. You'd have to check to see what port is "standard" for this particular installation, and pick something else. You may choose an unused port, but perhaps it needs to be used in a few minutes by some other process, but then will be occupied. The administrator may also not be happy to have an open port facing the world, or even just other possibly untrusted users on the same machine, assuming you bind to localhost.

Jon
On Tue, 16 Dec 2003, Jon Jensen wrote:

> On Tue, 16 Dec 2003, Jan Wieck wrote:
>
> > If you want to prevent "accidental" access, start postmaster on a
> > non-standard port.
>
> That seems like an unfriendly thing to do. You'd have to check to see what
> port is "standard" for this particular installation, and pick something
> else. You may choose an unused port, but perhaps it needs to be used in a
> few minutes by some other process, but then will be occupied. The
> administrator may also not be happy to have an open port facing the world,
> or even just other possibly untrusted users on the same machine, assuming
> you bind to localhost.

But aren't ports above a certain number "fair game"?

Yep, just answered my own question, quoting from /etc/services:

# The latest IANA port assignments can be gotten from
# http://www.iana.org/assignments/port-numbers
# The Well Known Ports are those from 0 through 1023.
# The Registered Ports are those from 1024 through 49151
# The Dynamic and/or Private Ports are those from 49152 through 65535

so as long as we use 49152 and above we're cool.
On Tue, 16 Dec 2003, scott.marlowe wrote:

> > > If you want to prevent "accidental" access, start postmaster on a
> > > non-standard port.
> >
> > That seems like an unfriendly thing to do. You'd have to check to see what
> > port is "standard" for this particular installation, and pick something
> > else. You may choose an unused port, but perhaps it needs to be used in a
> > few minutes by some other process, but then will be occupied. The
> > administrator may also not be happy to have an open port facing the world,
> > or even just other possibly untrusted users on the same machine, assuming
> > you bind to localhost.
>
> But aren't ports above a certain number "fair game"?
>
> Yep, just answered my own question, quoting from /etc/services:
>
> # The latest IANA port assignments can be gotten from
> # http://www.iana.org/assignments/port-numbers
> # The Well Known Ports are those from 0 through 1023.
> # The Registered Ports are those from 1024 through 49151
> # The Dynamic and/or Private Ports are those from 49152 through 65535
>
> so as long as we use 49152 and above we're cool.

I guess that would work, though this objection remains:

> > The administrator may also not be happy to have an open port facing
> > the world, or even just other possibly untrusted users on the same
> > machine, assuming you bind to localhost.

If the PostgreSQL administrator and the system administrator were always the same person, that would be less of a big deal since the sysadmin would know what's going on (assuming s/he reads the docs).

Why not use a different UNIX socket in a directory not accessible to others? That would be more secure.

Jon
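A minimal sketch of that idea: make a private mode-0700 directory, point the postmaster's Unix socket at it, and omit -i so no TCP port is opened at all. The data directory path is only an example, and unix_socket_directory is the 7.x-era setting name; this is an illustration, not anyone's actual proposal.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	char	sockdir[] = "/tmp/pg_upgrade_sock_XXXXXX";
	char	option[512];

	if (mkdtemp(sockdir) == NULL)	/* creates the directory with mode 0700 */
	{
		perror("mkdtemp");
		return 1;
	}
	snprintf(option, sizeof(option), "unix_socket_directory=%s", sockdir);

	/* no -i switch, so only the Unix socket in the private directory exists */
	execlp("postmaster", "postmaster",
		   "-D", "/usr/local/pgsql/data",
		   "-c", option,
		   (char *) NULL);
	perror("execlp");
	return 1;
}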
Jan Wieck <JanWieck@Yahoo.com> writes:
> If you want to prevent "accidental" access, start postmaster on a
> non-standard port.

... thus making pg_upgrade subject to all sorts of interesting questions about whether particular ports get filtered by kernel iptables rules? This doesn't seem like a really great alternative to me ...

			regards, tom lane