Re: [WIP] In-place upgrade - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [WIP] In-place upgrade
Msg-id 200811061715.mA6HF9g00508@momjian.us
In response to Re: [WIP] In-place upgrade  ("Robert Haas" <robertmhaas@gmail.com>)
Responses Re: [WIP] In-place upgrade  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Robert Haas wrote:
> > That's all fine and dandy, except that it presumes that you can perform
> > SELECT/UPDATE/DELETE on V3 tuple versions; you can't just pretend that
> > A-E aren't there until they get converted.  Which is exactly the
> > overhead we were looking to avoid.
> 
> I don't understand this comment at all.  Unless you have some sort of
> magical wand in your back pocket that will instantaneously transform
> the entire database, there is going to be a period of time when you
> have to cope with both V3 and V4 pages.  ISTM that what we should be
> talking about here is:
> 
> (1) How are we going to do that in a way that imposes near-zero
> overhead once the entire database has been converted?
> (2) How are we going to do that in a way that is minimally invasive to the code?
> (3) Can we accomplish (1) and (2) while still retaining somewhat
> reasonable performance for V3 pages?
> 
> Zdenek's initial proposal did this by replacing all of the tuple
> header macros with functions that were conditionalized on page
> version.  I think we agree that's not going to work.  That doesn't
> mean that there is no approach that can work, and we were discussing
> possible ways to make it work upthread until the thread got hijacked
> to discuss the right way of handling page expansion.  Now that it
> seems we agree that a transaction can be used to move tuples onto new
> pages, I think we'd be well served to stop talking about page
> expansion and get back to the original topic: where and how to insert
> the hooks for V3 tuple handling.

I think the above is a good summary.  For me, the problem with any
approach that has information about prior-version block formats in the
main code path is code complexity, and secondarily performance.

I know there is concern that converting all blocks on read-in might
expand the page beyond 8k in size.  One idea Heikki had was to require
that some tool be run on minor releases before a major upgrade to
guarantee there is enough free space to convert the block to the current
format on read-in, which would localize the information about prior
block formats.  We could release the tool in minor branches around the
same time as a major release.  Also consider that there are very few
releases that expand the page size.
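
As a rough illustration of the pre-upgrade free-space check described
above, here is a minimal sketch in Python.  Everything in it is
hypothetical: the Page structure, the per-tuple and per-header growth
figures, and the function names are assumptions made for illustration,
not PostgreSQL code and not the tool Heikki proposed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    """Hypothetical, simplified view of an old-format page."""
    block_no: int
    tuple_count: int
    free_bytes: int


def bytes_needed(page: Page, growth_per_tuple: int, header_growth: int) -> int:
    """Extra space the page would need once converted to the new format.

    growth_per_tuple and header_growth are assumed inputs; real values
    would depend on the format change between the two releases.
    """
    return header_growth + page.tuple_count * growth_per_tuple


def pages_lacking_space(pages: List[Page], growth_per_tuple: int,
                        header_growth: int) -> List[int]:
    """Return block numbers that could not be converted in place."""
    return [
        p.block_no
        for p in pages
        if bytes_needed(p, growth_per_tuple, header_growth) > p.free_bytes
    ]


if __name__ == "__main__":
    pages = [Page(0, 100, 500), Page(1, 200, 300), Page(2, 10, 0)]
    # Assume each tuple grows by 4 bytes and the page header by 8 bytes.
    print(pages_lacking_space(pages, growth_per_tuple=4, header_growth=8))
```

A tool along these lines would report the blocks that need tuples moved
elsewhere before the upgrade, so the read-in conversion never has to
deal with a page that overflows.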

For these reasons, the expand-the-page-beyond-8k problem should not be
dictating what approach we take for upgrade-in-place because there are
workarounds for the problem, and the problem is rare.  I would like us
to again focus on converting the pages to the current version format on
read-in, and perhaps a tool to convert all old pages to the new format.
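
To make the convert-on-read-in idea concrete, here is a minimal Python
sketch.  The Page type, the converter table, and read_page() are all
hypothetical names chosen for illustration; the V3 and V4 version
numbers come from the discussion quoted above, and the converter body is
a placeholder rather than a real format change.

```python
from dataclasses import dataclass
from typing import Callable, Dict

CURRENT_VERSION = 4  # the thread speaks of V3 and V4 pages


@dataclass
class Page:
    """Hypothetical in-memory page: a format version plus opaque contents."""
    version: int
    payload: bytes


def v3_to_v4(page: Page) -> Page:
    # Placeholder: a real converter would rewrite the page and tuple headers.
    return Page(version=4, payload=page.payload)


# One-step converters, keyed by the version they convert from.
CONVERTERS: Dict[int, Callable[[Page], Page]] = {3: v3_to_v4}


def read_page(raw: Page) -> Page:
    """Return the page in the current format, converting on read-in if needed."""
    page = raw
    while page.version < CURRENT_VERSION:
        convert = CONVERTERS.get(page.version)
        if convert is None:
            raise ValueError(f"no converter for page version {page.version}")
        page = convert(page)
    return page
```

The point of this shape is that knowledge of prior formats lives only in
the converters; code that calls read_page() sees current-format pages
and nothing else.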

FYI, we are also going to need the ability to convert all pages to the
current format for multi-release upgrades.  For example, if you did
upgrade-in-place from 8.2 to 8.3, you are going to need to update all
pages to the 8.3 format before doing upgrade-in-place to 8.4;  perhaps
vacuum can do something like this on a per-table basis, and we can
record that status in a pg_class column.
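
A minimal sketch of that per-table bookkeeping, in Python, might look
like the following.  The Table type, the all_pages_current flag (a
stand-in for the suggested pg_class column), and both function names are
assumptions for illustration only; no such column or functions exist.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Table:
    """Hypothetical table: page format versions plus a recorded status flag."""
    name: str
    page_versions: List[int]
    all_pages_current: bool = False  # stand-in for the proposed pg_class column


def convert_table(table: Table, current_version: int) -> None:
    """Vacuum-like pass: bring every page to current_version, then record it."""
    table.page_versions = [current_version for _ in table.page_versions]
    table.all_pages_current = True


def tables_blocking_upgrade(tables: List[Table]) -> List[str]:
    """Names of tables whose status has not been recorded as fully converted."""
    return [t.name for t in tables if not t.all_pages_current]


if __name__ == "__main__":
    tables = [Table("a", [3, 4]), Table("b", [4, 4])]
    convert_table(tables[0], current_version=4)
    # Table "b" still blocks: its pages are current, but no pass has recorded it.
    print(tables_blocking_upgrade(tables))
```

The next upgrade-in-place would be refused while the returned list is
non-empty, so it only ever has to understand one prior page format.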

Also, consider that when we did PITR, we required commands before and
after the tar so that there was a consistent API for PITR, and later had
to add capabilities to those functions, but the user API didn't change.

I envision a similar system where we have utilities to guarantee all
pages have enough free space, and all pages are the current version,
before allowing an upgrade-in-place to the next version.  Such a
consistent API will make the job for users easier and our job simpler,
and with upgrade-in-place, where we have limited time and resources to
code this for each release, simplicity is important.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

