Resurrecting pg_upgrade - Mailing list pgsql-hackers
From: Tom Lane
Subject: Resurrecting pg_upgrade
Msg-id: 21176.1071255609@sss.pgh.pa.us
List: pgsql-hackers
I am planning to solve the ancient problem of updating to a new major version without dump/reload, by means of writing a new, more bulletproof implementation of Bruce's old pg_upgrade script. Here are some design notes --- please comment.

The upgrade scenario
--------------------

I envision the upgrade process as working like this:

1. Shut down old postmaster.
2. If planning to install over old software, move at least the old "postgres" executable out of the way.
3. Install new-version Postgres software.
4. Run pg_upgrade (of new version). It will require access to the old postgres executable, as well as the new postgres and pg_dump executables.
5. (optional) Rename data directory --- see below.
6. Start new postmaster.
7. When satisfied that upgrade was successful, rm -rf old $PGDATA.

pg_upgrade will build a new $PGDATA directory without changing the contents of the old one in any way --- this policy ensures that you're not screwed if the upgrade fails for some reason. A disadvantage is that you need a place to put the new $PGDATA that's different from the old. Of course you can manually rename the directories before or after running pg_upgrade, but does anyone feel that's not acceptable? We could make this simpler if we were willing to let pg_upgrade move the old files out of the way, but that loses the "no damage to old installation" property. (But I think Oliver was suggesting adopting version-specific pathnames for PGDATA, in which case this is really no issue anyway.)

What pg_upgrade itself does is basically:

1. Run "pg_dumpall --schema-only" (or a near equivalent thereof) to obtain the schema info for the old installation.
2. initdb the new installation.
3. Load the above script into the new installation.
4. Physically link (hard-link) user table and index files into the new PGDATA directory, replacing the empty files created by loading the schema.

Since we use physical links, there is little growth of total disk space used, and yet the old PGDATA directory remains unscathed. There are some additional steps and special considerations, which are elaborated on below, but that's the basic idea. Note that it should be possible to upgrade even very large databases in a short time, since the user table contents are not scanned.
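For concreteness, a minimal sketch of step 4's relinking of a single relation file (plain C and POSIX calls; the helper name and the symlink fallback are illustrative, and the fallback's downside is taken up under "Gotchas" below):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Hard-link one relation file from the old cluster into the new
     * one, replacing the empty file created when the schema was
     * loaded.  A hard link costs essentially no extra disk space and
     * leaves the old file untouched. */
    static int
    relink_rel_file(const char *oldpath, const char *newpath)
    {
        /* remove the empty file created by loading the schema */
        if (unlink(newpath) != 0 && errno != ENOENT)
        {
            fprintf(stderr, "could not remove \"%s\": %s\n",
                    newpath, strerror(errno));
            return -1;
        }
        if (link(oldpath, newpath) == 0)
            return 0;
        if (errno == EXDEV)
        {
            /* cross-filesystem reference: fall back to a symlink, at
             * the price that "rm -rf OLDPGDATA" would then break the
             * new installation */
            if (symlink(oldpath, newpath) == 0)
                return 0;
        }
        fprintf(stderr, "could not link \"%s\" to \"%s\": %s\n",
                oldpath, newpath, strerror(errno));
        return -1;
    }

    int
    main(int argc, char **argv)
    {
        if (argc != 3)
        {
            fprintf(stderr, "usage: %s oldfile newfile\n", argv[0]);
            return 1;
        }
        return relink_rel_file(argv[1], argv[2]) != 0;
    }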
In principle, since pg_dump can dump schemas from several versions back, this scheme allows upgrades to jump across several versions, not just one. The only hard restriction in the current plan is that the contents of user table files cannot change.

We could support changes in contents of indexes (for example, 7.3 to 7.4 btree or hash index migration) by issuing a REINDEX instead of moving the old index contents. However, this loses the potential for pg_upgrade to be fast and not chew much additional disk space. It'd still beat a dump/reload though.

Currently the no-table-contents-changes restriction keeps us from upgrading from versions older than 7.4 anyway (since type NUMERIC had its on-disk representation changed in 7.4). We could possibly upgrade 7.3 databases that contain no NUMERIC columns, if we take the hit of rebuilding indexes. But is it worth bothering with?

Communication
-------------

I think it's important to be able to run pg_upgrade with the postmaster shut down. Otherwise there is too much risk that some other user will change the database while we are working. The original pg_upgrade script left it to the DBA to ensure this wouldn't happen, but that seems like a foot-gun of much too large caliber. Instead, all operations should be done through a standalone backend. An additional advantage of doing it this way is that a standalone backend is implicitly superuser, and so no SQL permissions issues will get in the way.

This makes it difficult to use pg_dump, however, since pg_dump doesn't know how to talk to a standalone backend. I propose the following solution:

1. The backend should be modified so that a standalone backend is capable of interacting with a client using the normal FE/BE protocol over stdin/stdout (this would be selected with a command line switch to override the existing standalone-backend behavior).

2. libpq should be extended so that one way to contact the database server is to fork/exec a standalone backend and communicate with it using the above facility. There are any number of ways we could design this feature, but one convenient way is that a PGHOST value starting with a pipe symbol "|" could be taken to mean doing this, with the rest of the string providing the postgres executable's name and possibly command-line options. libpq would tack on the switch needed for FE/BE protocol and the database name to connect to, and exec() the result.

This would allow both pg_dump and psql to be used with standalone backends. There are probably other uses for such a feature besides pg_upgrade's need.

Does this approach create any problems for the Windows port? I'm assuming we can set up pipes in both directions between a parent and child process --- is that available in Windows?
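To make the proposal concrete, a sketch of what a client might look like. The "|" host convention is only the proposal above, not an existing libpq feature; the calls themselves are ordinary libpq, and the executable path is made up:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn;
        PGresult   *res;

        /* Proposed convention: a host string beginning with "|" names
         * the postgres binary to fork/exec; libpq would append the
         * FE/BE-over-stdio switch and the database name itself. */
        conn = PQsetdbLogin("|/usr/local/pgsql.old/bin/postgres",
                            NULL, NULL, NULL, "template1", NULL, NULL);
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s",
                    PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        res = PQexec(conn, "SELECT version()");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("old server is: %s\n", PQgetvalue(res, 0, 0));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }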
Gotchas
-------

To do the physical relinking of table files, pg_upgrade will need to discover the mapping from old installation's DB OIDs and relfilenode numbers to new ones. This mapping will get even more complex when tablespaces enter the picture; but it seems very doable.

Beware of old files that are symlinks to someplace else; need to replicate the symlink if so. What should happen if hard link fails because of cross-file-system reference (most likely because a whole database subdir was symlinked)? May be okay to punt, or make a symlink, but the latter raises a risk that rm -rf OLDPGDATA will break things. Do we have any supported platforms that don't implement hard-links (perhaps Windows??)

WAL: need to do pg_resetxlog in new installation to bring WAL end up to or beyond end of old WAL. This ensures LSN values in user table page headers will be considered valid. Do not need to worry about copying old XLOG, fortunately.

pg_clog: need to copy old installation's clog (and hence, its XID counter) so that xact status in existing user tables will be valid. This means doing the resetxlog and clog copy immediately after initdb's VACUUM FREEZE, before we start loading the old schema. Note we do NOT need to vacuum the copied-over tables.

OIDs: it's not essential, but likely a good idea to make the OID counter match the old installation's too.

pg_control: make sure all fields of new pg_control match old, e.g. check that compile-time options are the same. This will require knowledge of past and present pg_control layouts, but that seems tolerable.

Large objects: not a problem, just migrate pg_largeobject contents as though it were a user table. Since LOs will thereby retain their old OIDs, we don't need to worry about fixing references to them from user tables.

TOAST tables need to retain the same OID they had before, since this OID is recorded in TOAST external references, which we don't want to update. It should not be impossible to adjust their OIDs after loading the schema definition, but it surely has potential to be tedious. (NB: without UPDATE SET oid = val, it won't be possible to do this via a SQL command; might be better to attack that than develop a low-level hack. Are there any other cases where pg_upgrade needs to massage the new installation at levels below SQL? User table contents upgrade is one, but we're not gonna support that in the short term anyway.)

What do we do if desired OID conflicts with some existing table? Renumbering a table in the general case is theoretically possible, but there are far more things to worry about than for TOAST tables (eg, references in stored rules...). Best solution might be to examine all the TOAST table OIDs before deciding where to set the new installation's OID counter. Also possibly detect case where old TOAST table is empty, and don't bother migrating if so, thus reducing number of cases where collision can occur.

Note: we do not change the *name* of a toast table, since that's derived from its parent table's OID not its own. No need to touch the toast table's index, either. We do have to update reltoastrelid in parent. Also note mapping from old DB to new can't rely on table names for matching TOAST tables; have to drive it off the parent tables.

TOAST valueid OIDs are not a problem, since we'll migrate the toast-table contents in toto.
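A sketch of that OID-counter scan (relkind 't' marks TOAST tables in pg_class; the connection reuses the proposed "|" syntax from above, the path is made up, and a real implementation would repeat this for every database in the old cluster):

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    /* Find the highest TOAST-table OID in the old installation, so the
     * new installation's OID counter can be set above it and freshly
     * assigned OIDs cannot collide with a TOAST OID we must keep. */
    int
    main(void)
    {
        PGconn     *conn;
        PGresult   *res;
        unsigned long max_oid = 0;
        int         i;

        conn = PQsetdbLogin("|/usr/local/pgsql.old/bin/postgres",
                            NULL, NULL, NULL, "template1", NULL, NULL);
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        res = PQexec(conn,
                     "SELECT oid FROM pg_class WHERE relkind = 't'");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }
        for (i = 0; i < PQntuples(res); i++)
        {
            unsigned long oid = strtoul(PQgetvalue(res, i, 0),
                                        NULL, 10);

            if (oid > max_oid)
                max_oid = oid;
        }
        printf("set new installation's OID counter above %lu\n",
               max_oid);

        PQclear(res);
        PQfinish(conn);
        return 0;
    }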
What about migrating config files? In particular, changes in names or semantics of pre-existing config vars seem like a big issue. First cut: just copy the files. Second cut: extract non-default values from old file, insert into new file (so as to preserve comments about variables that didn't exist in old version). We could imagine adding smarts about specific variable names here, if particular variables change in ways that we can deal with specially. Future config-file changes such as changing the syntax of pg_hba.conf would need to come with update code that can reformat the old file; or perhaps in some cases we'd have to punt and leave it to manual updating.

Thoughts?

			regards, tom lane