[Fwd: pg_migrator: in-place upgrade tool at pgFoundry] - Mailing list pgsql-hackers

From
Subject [Fwd: pg_migrator: in-place upgrade tool at pgFoundry]
Date
Msg-id 1162396732.7998.142.camel@sakai.localdomain
Whole thread Raw
List pgsql-hackers
<br /><font color="#000000">EnterpriseDB has created a new project at pgFoundry - <a
href="http://pgfoundry.org/projects/pg-migrator/">http://pgfoundry.org/projects/pg-migrator/</a></font><br/><br /><font
color="#000000">pg_migratoris a tool that can in-place upgrade existing data without the usual dump/reload
cycle.</font><br/><br /><font color="#000000">The pg_migrator project site </font>(<font color="#000000">at
pgFoundry</font>)<fontcolor="#000000"> contains a complete implementation of the functionality described below as well
asa copy of the introductory document that I've included in this message.</font><br /><br /><font
color="#000000">We</font>would <font color="#000000">w</font>elcome<font color="#000000"> feedback on implementation
detailsand ideas for improvements.  </font><br /><br />         -- Korry<br /><br /><table cellpadding="0"
cellspacing="0"width="100%"><tr><td><br /><br /> --<br />   Korry Douglas    <a
href="mailto:korryd@enterprisedb.com">korryd@enterprisedb.com</a><br/>   EnterpriseDB      <a
href="http://www.enterprisedb.com">http://www.enterprisedb.com</a></td></tr></table><pre>
<font
color="#000000">=====================================================================================================</font>

<font color="#000000">------------------------------------------------------------------------------</font>
<font color="#000000">PG_MIGRATOR: IN-PLACE UPGRADES FOR POSTGRESQL </font>
<font color="#000000">------------------------------------------------------------------------------</font>

<font color="#000000">Upgrading a PostgreSQL database from one release to another can be an</font>
<font color="#000000">expensive process. For minor upgrades, you can simply install new executables</font>
<font color="#000000">and forget about upgrading existing data. But for major upgrades, you have to</font>
<font color="#000000">export all of your data (using pg_dump), install the new release, run initdb</font>
<font color="#000000">to create a new cluster, and then import your old data. If you have a lot of</font>
<font color="#000000">data, that can take a considerable amount of time (hours?, days?). If you have</font>
<font color="#000000">too much data, you may have to buy more storage since you need enough room to</font>
<font color="#000000">hold the original data plus the exported data.</font>

<font color="#000000">EnterpriseDB is contributing a new tool, pg_migrator, that can reduce the</font>
<font color="#000000">amount of time (and disk space) required for many upgrades.</font>

<font color="#000000">------------------------------------------------------------------------------</font>
<font color="#000000">WHAT IT DOES</font>
<font color="#000000">------------------------------------------------------------------------------</font>

<font color="#000000">PG_migrator is a tool (not a complete solution) that performs an in-place</font>
<font color="#000000">upgrade of existing data. For many upgrades, the data stored in user-defined</font>
<font color="#000000">tables does not have to change when moving from one version of PostgreSQL to</font>
<font color="#000000">another. Some upgrades require changes in the on-disk representation of data;</font>
<font color="#000000">pg_migrator cannot help in those upgrades. However, many upgrades require no</font>
<font color="#000000">changes to the on-disk representation of a user-defined table and, in those</font>
<font color="#000000">cases, pg_migrator will move existing user-defined tables from the old</font>
<font color="#000000">database cluster into the new cluster.</font>

<font color="#000000">There are two factors that determine whether an in-place upgrade is practical.</font>

<font color="#000000">Every table in a cluster (actually, every table created by a given version)</font>
<font color="#000000">shares the same infrastructure layout. By infrastructure, we mean the on-disk</font>
<font color="#000000">representation of the table headers and trailers and the on-disk</font>
<font color="#000000">representation of tuple headers. If the infrastructure changes between the old</font>
<font color="#000000">version of PostgreSQL and the new version, pg_migrator cannot move existing</font>
<font color="#000000">tables to the new cluster (you'll have to pg_dump the old data and then import</font>
<font color="#000000">that data into the new cluster).</font>

<font color="#000000">Occasionally, a PostgreSQL release introduces a change to the on-disk</font>
<font color="#000000">representation of a data type. For example, PostgreSQL version 8.2 changes the</font>
<font color="#000000">layout for values of type INET and CIDR. If you are not storing any values of</font>
<font color="#000000">type INET (or CIDR), pg_migrator can upgrade any table in your cluster. If you</font>
<font color="#000000">are storing values of type INET (or CIDR) in some tables, you must</font>
<font color="#000000">export/import those tables, but pg_migrator can in-place upgrade other tables</font>
<font color="#000000">(a change in infrastructure means that you have to export/import every table).</font>

<font color="#000000">If a new version of PostgreSQL does not change the infrastructure layout and</font>
<font color="#000000">does not change the on-disk representation of a data type (that you are</font>
<font color="#000000">using), you can pg_migrator to save a tremendous amount of time (and disk</font>
<font color="#000000">space).</font>

<font color="#000000">------------------------------------------------------------------------------</font>
<font color="#000000">HOW IT WORKS</font>
<font color="#000000">------------------------------------------------------------------------------</font>

<font color="#000000">To use pg_migrator during an upgrade, you start by installing a fresh cluster</font>
<font color="#000000">(using the newest version) in a new directory. When you've finished installing</font>
<font color="#000000">the new version, the new cluster will contain the new executables (postmaster,</font>
<font color="#000000">pg_dump, ...) and the usual template0, template1, and postgresql databases,</font>
<font color="#000000">but no user-defined tables. At this point, you can shutdown the new postmaster</font>
<font color="#000000">(we presume that you shutdown the old postmaster prior to creating the new</font>
<font color="#000000">cluster) and invoke pg_migrator.</font>

<font color="#000000">When pg_migrator starts, it runs through a verification process that ensures</font>
<font color="#000000">that all required executables (the old postmaster, the new postmaster,</font>
<font color="#000000">pg_dump, pg_resetxlog, ...) are present and contain the expected version</font>
<font color="#000000">numbers. The verification process also checks the old and new $PGDATA</font>
<font color="#000000">directories to ensure that the expected files and subdirectories (base,</font>
<font color="#000000">global, pg_clog, pg_xlog, ...) are in place.  If the verification process</font>
<font color="#000000">succeeds, pg_migrator starts the old postmaster and runs pg_dumpall</font>
<font color="#000000">--schema-only to capture the metadata contained in the old cluster. The script</font>
<font color="#000000">produced by pg_dumpall will be used in a later step to recreate the following</font>
<font color="#000000">user-defined objects in the new cluster:</font>

<font color="#000000">             databases</font>
<font color="#000000">             roles (users)</font>
<font color="#000000">             privileges (grants)</font>
<font color="#000000">             schemas</font>
<font color="#000000">             tables</font>
<font color="#000000">             indexes</font>
<font color="#000000">             views</font>
<font color="#000000">             packages (EnterpriseDB clusters only)</font>
<font color="#000000">             synonyms (EnterpriseDB clusters only)</font>
<font color="#000000">             functions</font>
<font color="#000000">             procedures (EnterpriseDB clusters only)</font>
<font color="#000000">             data types</font>
<font color="#000000">             procedural languages</font>
<font color="#000000">             aggregate functions</font>
<font color="#000000">             operators</font>
<font color="#000000">             operator classes</font>
<font color="#000000">             encoding conversions </font>
<font color="#000000">             inheritance relationships</font>
<font color="#000000">             rewrite rules</font>
<font color="#000000">             type casts</font>
<font color="#000000">             constraints</font>
<font color="#000000">             triggers</font>

<font color="#000000">Note that the script produced by pg_dumpall will only recreate user-defined</font>
<font color="#000000">objects, not system-defined objects (most object types are considered</font>
<font color="#000000">“system-defined' if they reside in a schema whose name begins with “pg_”, if</font>
<font color="#000000">they reside in the information_schema or, in the case of an EnterpriseDB</font>
<font color="#000000">cluster, reside in the sys or dbo schemas). The new cluster will contain the</font>
<font color="#000000">system-defined objects created by the latest version of PostgreSQL.</font>

<font color="#000000">Once pg_migrator has extracted the metadata from the old cluster, it performs</font>
<font color="#000000">a number of bookkeeping tasks required to “sync up” the new cluster with the</font>
<font color="#000000">existing data.</font>

<font color="#000000">First, pg_migrator temporarily renames any tablespace directories (in the old</font>
<font color="#000000">cluster) – the new cluster will point to the same tablespace directories and</font>
<font color="#000000">will complain (ERROR: directory not empty) if those directories exist when</font>
<font color="#000000">pg_migrator imports the metadata (in a later step).</font>

<font color="#000000">Next, pg_migrator copies the “next transaction ID” from the old cluster to the</font>
<font color="#000000">new cluster. This is the first step in ensuring that the proper tuples are</font>
<font color="#000000">visible (and other tuples are hidden) from the new cluster (remember,</font>
<font color="#000000">pg_migrator does not export/import the content of user-defined tables so the</font>
<font color="#000000">transaction ID's in the new cluster must match the transaction ID's in the old</font>
<font color="#000000">data). pg_migrator also copies the starting address for write-ahead logs from</font>
<font color="#000000">the old cluster to the new cluster.</font>

<font color="#000000">pg_migrator now shuts down the postmaster and copies the commit logs</font>
<font color="#000000">($PGDATA/pg_clog/*) from the old cluster to the new cluster. At this point,</font>
<font color="#000000">the new cluster and the old cluster agree on the visibility of tuples in</font>
<font color="#000000">user-defined tables.</font>

<font color="#000000">Now pg_migrator begins reconstructing the metadata (obtained from the old</font>
<font color="#000000">cluster), running createdb for each database defined in the old cluster.</font>

<font color="#000000">Once all of the databases have been created in the new cluster, pg_migrator</font>
<font color="#000000">tackles the problem of naming toast relations. Toast tables are used to store</font>
<font color="#000000">oversized data out-of-line (that is, in a separate file). When the server</font>
<font color="#000000">decides to move a datum out of a tuple and into a toast table, it stores a</font>
<font color="#000000">pointer in the original slot in the tuple. That pointer contains the</font>
<font color="#000000">relfilenode (i.e. filename) of the toast table. That means that any table</font>
<font color="#000000">which contains toasted data will contain the filename of the toast table (in</font>
<font color="#000000">each toast pointer). Therefore, it is very important that toast tables retain</font>
<font color="#000000">their old names when they are created in the new cluster. At the moment, the</font>
<font color="#000000">CREATE TABLE command does not offer any explicit support for naming toast</font>
<font color="#000000">tables (and it probably never will offer such a feature). To ensure that the</font>
<font color="#000000">toast table names retain their old names, pg_migrator “reserves” the name of</font>
<font color="#000000">each toast table before importing the metadata from the old cluster. To</font>
<font color="#000000">reserve a filename, pg_migrator simply creates an empty file (with the</font>
<font color="#000000">appropriate name) and the server avoids that name when it detects a</font>
<font color="#000000">collision. In a later step, pg_migrator removes the “fake” toast tables and</font>
<font color="#000000">patch the proper reltoastrelids back into the pg_class table so the new</font>
<font color="#000000">cluster can find the old toast tables.</font>

<font color="#000000">Next, pg_migrator executes the script produced earlier by pg_dumpall – this</font>
<font color="#000000">script effectively copies the complete user-defined metadata from the old</font>
<font color="#000000">cluster to the new cluster. When that script completes, pg_migrator (after</font>
<font color="#000000">shutting down the new postmaster) deletes the fake toast tables and patches</font>
<font color="#000000">the old toast tuple names into the new cluster (pg_migrator also removes and</font>
<font color="#000000">recreates the corresponding pg_class and pg_type entries).</font>

<font color="#000000">Finally, pg_migrator links (or copies, depending on the command-line supplied</font>
<font color="#000000">by the user) each user-defined table (including data tables, index tables, and</font>
<font color="#000000">toast tables) from the old cluster to the new cluster. This is the time-saver;</font>
<font color="#000000">instead of exporting and importing every tuple, pg_migrator simply renames</font>
<font color="#000000">each user-defined table (or, at worst, copies the entire table if the clusters</font>
<font color="#000000">reside on different filesystems). Note that in this last step, pg_migrator</font>
<font color="#000000">assigns a new name to each relation (the name of the relation matches the</font>
<font color="#000000">relfilenode in the new cluster).</font>

<font color="#000000">An important feature of the pg_migrator design is that it leaves the original</font>
<font color="#000000">cluster intact – if a problem occurs during the upgrade, you can still run the</font>
<font color="#000000">previous version.</font>

<font color="#000000">------------------------------------------------------------------------------</font>
<font color="#000000">WHERE IT FITS</font>
<font color="#000000">------------------------------------------------------------------------------</font>

<font color="#000000">We see pg_migrator as a component within a larger upgrade toolkit.  </font>

<font color="#000000">One of the consequences of this perspective is that pg_migrator relies on the</font>
<font color="#000000">user to determine whether an upgrade-in-place is possible. pg_migrator does</font>
<font color="#000000">not check the CATALOG_VERSION so it will happily perform an in-place upgrade</font>
<font color="#000000">even though it would be sufficient to simply replace the PostgreSQL</font>
<font color="#000000">executables. pg_migrator cannot detect changes in relation-file headers or</font>
<font color="#000000">trailers or in tuple headers. pg_migrator cannot detect changes in the on-disk</font>
<font color="#000000">representation of a data type. Instead, pg_migrator should be invoked by a</font>
<font color="#000000">tool (or user) that has already determined that an in-place upgrade is</font>
<font color="#000000">practical.</font>

<font color="#000000">Since pg_migrator is expected to function as part of a larger toolkit, it is</font>
<font color="#000000">designed to produce status messages (and error messages) that can be easily</font>
<font color="#000000">processed by other tools.</font>

<font color="#000000">------------------------------------------------------------------------------</font>
<font color="#000000">IMPLEMENTATION NOTES</font>
<font color="#000000">------------------------------------------------------------------------------</font>
<font color="#000000">pg_migrator is written in C. To reduce source-code dependencies, pg_migrator</font>
<font color="#000000">makes heavy use of existing tools such as pg_ctl, pg_dumpall, and pg_resetlog.</font>

<font color="#000000">pg_migrator can upgrade a cluster from one PostgreSQL release to another</font>
<font color="#000000">PostgreSQL release, from one EnterpriseDB release to another, or from a</font>
<font color="#000000">PostgreSQL cluster to an EnterpriseDB cluster.</font>
</pre><br />

pgsql-hackers by date:

Previous
From: Dave Cramer
Date:
Subject: Re: Extended protocol logging
Next
From: "luis garcia"
Date:
Subject: ¿¿¿past chunk end???