Thread: Proposal: In-Place upgrade concept

Proposal: In-Place upgrade concept

From
Zdenek Kotala
Date:
I attach the In-Place upgrade project concept. Any technical details about the
implementation of each part will be sent later (after the concept is accepted).

    Please, let me know your comments.

        thanks Zdenek
In-place Upgrade project
---------------

Overview
--------
The PostgreSQL community releases a new version of PostgreSQL at least once a year, but the released version is not
able to run with old data files. Upgrading a bigger database is a nightmare for the user, and upgrading a very large
database is impossible. This project will try to solve this issue and provide upgrade functionality for PostgreSQL
that covers all user requirements.


Current status
--------------
1) Data upgrade is supported only by exporting and importing data using pg_dump
   - slow and causes a long downtime
   - extra disk space needed
   - requires both versions (old & new)
2) pg_migrator
   - faster than 1), but it still has a long downtime when the on-disk structure has changed
   - a previous version of PostgreSQL is needed
   - no downgrade
   - requires both versions (old & new)


Project Goals
-----
1) Minimize the database downtime.
2) Do not require extra disk space.
3) Do not require an old version of PostgreSQL to be installed.
4) Allow a fallback to the original PostgreSQL version if something goes wrong.
5) Support all data types, including user-defined types.
6) Be easy to use.


Functional specification
----------------
There are several possible approaches to performing the upgrade. We assume that we need to convert a very big
database. If we upgrade the data files offline, the database downtime might be critical, so we will instead try to
convert the data online. This brings a requirement that the new major version of PostgreSQL must understand the old
data structures and be able to work with them.

The next important requirement is the ability to fall back (downgrade) to the previous PostgreSQL version. If the new
major version does not work well for the user, he needs to be able to go back to the previous version with minimal
downtime. This brings a requirement that PostgreSQL should also be able to write data in the old format. However, we
might expect some performance penalty.

The above requirements lead to PostgreSQL running in three modes:

compatibility mode (compat) - the new version runs on the old data files. It is possible to read and write data as
in the previous version. Features of the new version might not be available. Fallback to the previous version is
possible.

upgrade mode - a conversion of the data files runs in the background. PostgreSQL has to be able to work with both
the new and the old data file format. Features of the new version might not be available. PostgreSQL has to be able
to recover and continue in upgrade mode in case of a power/system failure. Fallback to the previous version is not
possible.

normal mode - all required data files have been upgraded to the new format. Features of the new version are available.
Fallback to the previous version is not possible.
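
To illustrate the intended state machine, here is a purely hypothetical C sketch (none of these symbols exist in
PostgreSQL); it only encodes the rule that transitions are one-way once conversion starts:

    /* Hypothetical sketch; none of these symbols exist in PostgreSQL. */
    typedef enum UpgradeMode
    {
        UPGRADE_MODE_COMPAT,    /* old on-disk format; fallback still possible */
        UPGRADE_MODE_UPGRADE,   /* background conversion running; no fallback */
        UPGRADE_MODE_NORMAL     /* conversion finished; new features enabled */
    } UpgradeMode;

    /* Once conversion starts, only forward transitions are legal. */
    static bool
    upgrade_mode_transition_ok(UpgradeMode from, UpgradeMode to)
    {
        return (from == UPGRADE_MODE_COMPAT && to == UPGRADE_MODE_UPGRADE) ||
               (from == UPGRADE_MODE_UPGRADE && to == UPGRADE_MODE_NORMAL);
    }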

Project Roadmap
-------------
1) Create a detailed design for each affected part
2) Design a testing environment/process
3) Implementation
4) Change the development/release processes - a patch author has to be responsible for providing the required upgrade
operations/functions. Some upgrade-related development might also happen after feature freeze. In-place upgrade should
be available from the x.y.1 version.


List of affected parts
----------------------

1) Data Page Structure
1.1) Page Layout
1.2) Tuple header and structure
1.3) Data encoding - data type changes
2) Control File
3) WAL
4) Catalog
5) Configuration files
6) Others (procedural languages, communication protocol, ...)



Re: Proposal: In-Place upgrade concept

From
Heikki Linnakangas
Date:
Zdenek Kotala wrote:
> Project Goals
> -----
> ...
> 3) Do not require an old version of PostgreSQL to be installed.

Why not? Having two versions installed at the same time doesn't seem 
like a problem to me. You can remove the old version as soon as the 
upgrade is done.

> 4) Allow a fallback to the original PostgreSQL version if something goes wrong.
> ...
> compatibility mode (compat) - the new version runs on the old data files. It is possible to read and write data as
> in the previous version. Features of the new version might not be available. Fallback to the previous version is
> possible.

That's madness. Why would you want to do that? The complexity it adds is 
just mind-boggling.

You still wouldn't be able to downgrade after you switch from 
compatibility mode, and it seems likely that any problems, whatever they 
might be, would not appear until you switch. That means you'll need to 
be prepared to downgrade using a backup anyway, so the compatibility 
mode doesn't buy you much.

> upgrade mode - a conversion of the data files runs in the background. PostgreSQL has to be able to work with both
> the new and the old data file format. Features of the new version might not be available. PostgreSQL has to be able
> to recover and continue in upgrade mode in case of a power/system failure. Fallback to the previous version is not
> possible.

That's feasible, though I don't see why new features wouldn't be available.

As before, upgrade can be done, it's just a matter of someone scratching 
the itch. pg_migrator can handle the catalog changes. Doing the page 
conversion from 8.2 -> 8.3 is possible, and it could be done on-the-fly 
inside PostgreSQL the first time a page is read in.
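
For concreteness, a rough sketch of what such an on-the-fly conversion could look like once the buffer manager has the
raw block in memory. The page-header fields and PG_PAGE_LAYOUT_VERSION are real (see bufpage.h), but
maybe_convert_page and convert_page_to_current are hypothetical names, not existing code:

    #include "storage/bufpage.h"
    #include "utils/rel.h"

    /*
     * Hypothetical hook, called right after the block has been read from
     * disk and before the page is handed to anyone else.
     */
    static void
    maybe_convert_page(Relation reln, Page page)
    {
        /* The low byte of pd_pagesize_version holds the page layout version. */
        uint16      version = ((PageHeader) page)->pd_pagesize_version & 0x00FF;

        if (version == PG_PAGE_LAYOUT_VERSION)
            return;                 /* already in the current format */

        /*
         * Rewrite the page in place into the current layout.  The relation's
         * tuple descriptor tells us which attributes are varlena and thus
         * need their headers converted.
         */
        convert_page_to_current(page, RelationGetDescr(reln), version);
    }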

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Proposal: In-Place upgrade concept

From
Gregory Stark
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

> As before, upgrade can be done, it's just a matter of someone scratching the
> itch. pg_migrator can handle the catalog changes. Doing the page conversion
> from 8.2 -> 8.3 is possible, and it could be done on-the-fly inside PostgreSQL
> the first time a page is read in.

I was previously thinking a converter for the packed varlena change wouldn't
be necessary since it handles things just fine when it finds a 4-byte header
where a 1-byte header might have been used. I just realized that's not true.

All varlena headers would have to be shifted two bits to the left (on
little-endian machines) and have their toast bits fiddled, even if we don't
bother converting them to shrink their size. Externally toasted varlenas
would, however, necessarily change size because they must use the new format.
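
To make the bit-fiddling concrete, here is a rough little-endian sketch of rewriting one inline 4-byte header from the
8.2 layout to the 8.3 packed layout. The old flag masks match the 8.2 postgres.h; the helper itself is hypothetical,
and externally toasted values are punted because, as noted above, they change size:

    #include <stdint.h>

    /* 8.2-era varlena flag bits, from the old postgres.h. */
    #define OLD_VARATT_FLAG_EXTERNAL    0x80000000
    #define OLD_VARATT_FLAG_COMPRESSED  0x40000000
    #define OLD_VARATT_MASK_SIZE        0x3fffffff

    /*
     * Hypothetical helper: rewrite one inline 4-byte varlena header in
     * place, assuming a little-endian machine.  The length (which includes
     * the header itself) is shifted two bits left; the low bits become the
     * new-format tag: 00 = 4-byte uncompressed, 10 = 4-byte compressed.
     */
    static void
    old_varlena_header_to_new(uint32_t *hdr)
    {
        uint32_t    size = *hdr & OLD_VARATT_MASK_SIZE;

        if (*hdr & OLD_VARATT_FLAG_EXTERNAL)
            return;                 /* new external format differs in size;
                                     * cannot be converted in place */

        if (*hdr & OLD_VARATT_FLAG_COMPRESSED)
            *hdr = (size << 2) | 0x02;      /* 4-byte compressed */
        else
            *hdr = size << 2;               /* 4-byte uncompressed */
    }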

This is actually a bit of a problem. We would need to know when we read in a
page what the tupledescriptor for that relation looks like to know which
fields are varlena. I'm not sure how easy it would be to arrange for the tuple
descriptor to be passed down that far.

Conceivably we could grab another infomask bit to indicate "uses new-style
varlenas" and then have heaptuple.c understand how to convert them in place.
But that leads to a ton of memory management or page locking problems.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Proposal: In-Place upgrade concept

From
Zdenek Kotala
Date:
Heikki Linnakangas wrote:
> Zdenek Kotala wrote:
>> Project Goals
>> -----
>> ...
>> 3) Do not require an old version of PostgreSQL to be installed.
> 
> Why not? Having two versions installed at the same time doesn't seem 
> like a problem to me. You can remove the old version as soon as the 
> upgrade is done.

This becomes a problem if you want to upgrade the operating system together
with the DB and the new operating system does not ship the old version. This
requirement is based on operating system vendor needs and also on my
experience with an Oracle upgrade on a Tru64 cluster, where two installed
versions of Oracle did not work well together - OK, PostgreSQL is better in
this respect :-).


>> 4) Allow a fallback to the original PostgreSQL version if something goes
>> wrong.
>> ...
>> compatibility mode (compat) - the new version runs on the old data files.
>> It is possible to read and write data as in the previous version. Features
>> of the new version might not be available. Fallback to the previous
>> version is possible.
> 
> That's madness. Why would you want to do that? The complexity it adds is 
> just mind-boggling.

I don't think that it increases complexity much. You will have a conversion
function to convert a page/data from version x to x+1, and the same mechanism
could be used to convert data back. It needs more investigation, but I'm not
afraid of the complexity (yet :-).

> You still wouldn't be able to downgrade after you switch from 
> compatibility mode, and it seems likely that any problems, whatever they 
> might be, would not appear until you switch. That means you'll need to 
> be prepared to downgrade using a backup anyway, so the compatibility 
> mode doesn't buy you much.

One kind of problem is user-defined types/functions and extensions. You can
test a lot of things on test machines, but some problems only occur on big
tables/storage. Not many PostgreSQL users have a multi-terabyte disk array
for testing.

The second kind of problem is an operating system upgrade. If you upgrade the
OS and PostgreSQL together and something fails (not only the new version of
PostgreSQL, but many other things), then the admin wants to revert back to
the old OS version with the old postgres version. On Solaris this is called
the Live Upgrade feature.

This feature makes it possible to do that easily. However, backup/restore is
a possible solution - unfortunately not a very comfortable one. I think this
is a nice-to-have requirement: if it proves possible we can do it, and if it
brings a lot of complications, we can remove it.

>> upgrade mode - a conversion of the data files runs in the background.
>> PostgreSQL has to be able to work with both the new and the old data file
>> format. Features of the new version might not be available. PostgreSQL has
>> to be able to recover and continue in upgrade mode in case of a
>> power/system failure. Fallback to the previous version is not possible.
> 
> That's feasible, though I don't see why new features wouldn't be available.

For example, Oracle 10 implemented a new incremental backup feature, but you
can use it only after you have converted the database to the correct format.
I'm not Oracle, and I'm not able to predict which new PostgreSQL feature
will, for example, require that all pages be updated. I don't expect that
upgrade mode will affect a lot of new features, but we must keep it in mind.

Zdenek


Re: Proposal: In-Place upgrade concept

From
Heikki Linnakangas
Date:
Gregory Stark wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> 
>> As before, upgrade can be done, it's just a matter of someone scratching the
>> itch. pg_migrator can handle the catalog changes. Doing the page conversion
>> from 8.2 -> 8.3 is possible, and it could be done on-the-fly inside PostgreSQL
>> the first time a page is read in.
> 
> I was previously thinking a converter for the packed varlena change wouldn't
> be necessary since it handles things just fine when it finds a 4-byte header
> where a 1-byte header might have been used. I just realized that's not true.
> 
> All varlena headers would have to be shifted two bits to the left (on
> little-endian machines) and have their toast bits fiddled, even if we don't
> bother converting them to shrink their size. Externally toasted varlenas
> would, however, necessarily change size because they must use the new format.
> 
> This is actually a bit of a problem. We would need to know when we read in a
> page what the tupledescriptor for that relation looks like to know which
> fields are varlena. I'm not sure how easy it would be to arrange for the tuple
> descriptor to be passed down that far.

Speaking of on-the-fly upgrading, ReadBuffer is already passed the 
Relation, which contains the TupleDesc, so I don't think that's a 
problem. Not sure how easy that would be to do in an external program 
like pg_migrator.

> Conceivably we could grab another infomask bit to indicate "uses new-style
> varlenas" and then have heaptuple.c understand how to convert them in place.
> But that leads to a ton of memory management or page locking problems.

My thinking is that when a page in the old format is read in, it's 
converted to the new format before doing anything else with it.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Proposal: In-Place upgrade concept

From
Zdenek Kotala
Date:
Heikki Linnakangas wrote:
> Gregory Stark wrote:

> 
>> Conceivably we could grab another infomask bit to indicate "uses 
>> new-style
>> varlenas" and then have heaptuple.c understand how to convert them in 
>> place.
>> But that leads to a ton of memory management or page locking problems.
> 
> My thinking is that when a page in the old format is read in, it's 
> converted to the new format before doing anything else with it.

Yes, I agree with Heikki. Any other place for the conversion would generate a
lot of maintenance complexity.
    Zdenek




Re: Proposal: In-Place upgrade concept

From
Martijn van Oosterhout
Date:
On Tue, Jul 03, 2007 at 12:05:07PM +0100, Heikki Linnakangas wrote:
> >This is actually a bit of a problem. We would need to know when we
> >read in a page what the tupledescriptor for that relation looks like
> >to know which fields are varlena. I'm not sure how easy it would be
> >to arrange for the tuple descriptor to be passed down that far.
>
> Speaking of on-the-fly upgrading, ReadBuffer is already passed the
> Relation, which contains the TupleDesc, so I don't think that's a
> problem. Not sure how easy that would be to do in an external program
> like pg_migrator.

My reading of this thread so far is that we're intending to upgrade
pages on the fly? Which means that at any point in time, some pages
will be converted and some not, so the tuple descriptor isn't going to
help, surely you need some identifier on the page telling you if it's
upgraded or not?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Proposal: In-Place upgrade concept

From
Heikki Linnakangas
Date:
Martijn van Oosterhout wrote:
> On Tue, Jul 03, 2007 at 12:05:07PM +0100, Heikki Linnakangas wrote:
>>> This is actually a bit of a problem. We would need to know when we
>>> read in a page what the tupledescriptor for that relation looks like
>>> to know which fields are varlena. I'm not sure how easy it would be
>>> to arrange for the tuple descriptor to be passed down that far.
>> Speaking of on-the-fly upgrading, ReadBuffer is already passed the 
>> Relation, which contains the TupleDesc, so I don't think that's a 
>> problem. Not sure how easy that would be to do in an external program 
>> like pg_migrator.
> 
> My reading of this thread so far is that we're intending to upgrade
> pages on the fly? Which means that at any point in time, some pages
> will be converted and some not, so the tuple descriptor isn't going to
> help, surely you need some identifier on the page telling you if it's
> upgraded or not?

Yes, we already have such an identifier. The page layout version number 
is stored on every page.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Proposal: In-Place upgrade concept

From
Zdenek Kotala
Date:
Martijn van Oosterhout wrote:
> On Tue, Jul 03, 2007 at 12:05:07PM +0100, Heikki Linnakangas wrote:
>>> This is actually a bit of a problem. We would need to know when we
>>> read in a page what the tupledescriptor for that relation looks like
>>> to know which fields are varlena. I'm not sure how easy it would be
>>> to arrange for the tuple descriptor to be passed down that far.
>> Speaking of on-the-fly upgrading, ReadBuffer is already passed the 
>> Relation, which contains the TupleDesc, so I don't think that's a 
>> problem. Not sure how easy that would be to do in an external program 
>> like pg_migrator.
> 
> My reading of this thread so far is that we're intending to upgrade
> pages on the fly? Which means that at any point in time, some pages
> will be converted and some not, so the tuple descriptor isn't going to
> help, surely you need some identifier on the page telling you if it's
> upgraded or not?

Currently there is a page layout version number. But this information is not
useful for conversion from 8.1 or 8.2, because both versions have the same
number, while some data types (inet/cidr) have a different storage format.

There is a short discussion about page versioning here:

http://archives.postgresql.org/pgsql-hackers/2007-06/msg00745.php

Page identification is one of the things that must be improved.
        Zdenek


Re: Proposal: In-Place upgrade concept

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> My thinking is that when a page in the old format is read in, it's 
> converted to the new format before doing anything else with it.

Yeah, I'm with Heikki on this.  What I see as a sane project definition
is:

* pg_migrator or equivalent to convert the system catalogs
* a hook in ReadBuffer to allow a data page conversion procedure to be
  applied, on the basis of checking for an old page layout version.

I think insisting on a downgrade option is an absolutely certain way
of guaranteeing that the project will fail.

I'm not sure it's feasible to expect that we can change representations
of user-defined types, either.  I don't see how you would do that
without catalog access (to look up the UDT), and the page conversion
procedure is going to have to be able to operate without catalog
accesses.  (Thought experiment: a page is read in during crash recovery
or PITR slave operation, and discovered to have the old format.)


BTW, I thought of a likely upgrade problem that we haven't discussed
(AFAIR) in any of the many threads on this subject.  What about an index
access method change that involves an index-wide restructuring, such
that it can't be done one page at a time?  A plausible example is
changing hash indexes to have multiple buckets per page.  Presumably
you can fix the index with REINDEX, but that doesn't meet the goal of
limited downtime, if the index is big.  Is there another way?
        regards, tom lane


Re: Proposal: In-Place upgrade concept

From
Zdenek Kotala
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> My thinking is that when a page in the old format is read in, it's 
>> converted to the new format before doing anything else with it.
> 
> Yeah, I'm with Heikki on this.  What I see as a sane project definition
> is:
> 
> * pg_migrator or equivalent to convert the system catalogs
> * a hook in ReadBuffer to allow a data page conversion procedure to
>   be applied, on the basis of checking for old page layout version.
> 

pg_migrator is a separate tool which requires the old postgres version, and I
would like to have a solution in the postgres binary that works without the
old version being present. Very often the new postgres version is stored in
the same location (e.g. /usr/bin), and normal users could have a problem.

I see three possible solutions:

1) a special postgres startup mode - postgres --upgrade-catalog
2) automatic conversion - postgres converts the catalog automatically on
first startup on an old db cluster
3) (in compat mode) the catalog is converted on the fly (read/write) until
upgrade mode starts
> I think insisting on a downgrade option is an absolutely certain way
> of guaranteeing that the project will fail.

As I mentioned before, this is a nice-to-have requirement. I would like to
keep it in mind, and if it starts a complexity explosion we can remove it
from the requirement list.

> I'm not sure it's feasible to expect that we can change representations
> of user-defined types, either.  I don't see how you would do that
> without catalog access (to look up the UDT), and the page conversion
> procedure is going to have to be able to operate without catalog
> accesses.  (Thought experiment: a page is read in during crash recovery
> or PITR slave operation, and discovered to have the old format.)

The idea for handling an on-disk representation change of a data type is to
keep both the old and the new data type in/out functions. Newly created
tables will contain the new type implementation, and old tables can be
converted with the ALTER TABLE command on user request. The old data type
could be stored in a compat library.

> BTW, I thought of a likely upgrade problem that we haven't discussed
> (AFAIR) in any of the many threads on this subject.  What about an index
> access method change that involves an index-wide restructuring, such
> that it can't be done one page at a time?  A plausible example is
> changing hash indexes to have multiple buckets per page.  Presumably
> you can fix the index with REINDEX, but that doesn't meet the goal of
> limited downtime, if the index is big.  Is there another way?
> 

Yes, there is a way: keep both the old and the new implementation of the
index, each with a different OID. The primary key for the pg_am table would
be name+pg_version - it is similar to the UDT solution. CREATE INDEX and
REINDEX would use the current implementation.

    Zdenek


Re: Proposal: In-Place upgrade concept

From
Martijn van Oosterhout
Date:
On Tue, Jul 03, 2007 at 11:36:03AM -0400, Tom Lane wrote:
> I'm not sure it's feasible to expect that we can change representations
> of user-defined types, either.  I don't see how you would do that
> without catalog access (to look up the UDT), and the page conversion
> procedure is going to have to be able to operate without catalog
> accesses.  (Thought experiment: a page is read in during crash recovery
> or PITR slave operation, and discovered to have the old format.)

Well, there are two types of conversions:
1. Simple byte rearrangement. If there aren't too many, you could simply
build them into pg_migrator. Doesn't help with user-defined types, but
maybe you allow plugins to define a separate hook whose only purpose is
to upgrade the value (without catalog access...).

2. Otherwise you could do a VACUUM over the table to touch every page,
thus solving it. Dunno what to do about crashing at this point.

Hmm, actually, what's the problem with PITR restoring a page in the old
format? As long as it's clear it's the old format, it'll get fixed when
the page is actually used.

> BTW, I thought of a likely upgrade problem that we haven't discussed
> (AFAIR) in any of the many threads on this subject.  What about an index
> access method change that involves an index-wide restructuring, such
> that it can't be done one page at a time?  A plausible example is
> changing hash indexes to have multiple buckets per page.  Presumably
> you can fix the index with REINDEX, but that doesn't meet the goal of
> limited downtime, if the index is big.  Is there another way?

Well, we have concurrent index builds these days. I certainly don't
have ideas on how to fix this, especially if the index is on a datatype
that has changed format... I suppose those indexes will just have to be
rebuilt (a REINDEX will upgrade every page in the table anyway...). I
think it'd still be cheaper than dump/restore.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Proposal: In-Place upgrade concept

From
Martijn van Oosterhout
Date:
On Tue, Jul 03, 2007 at 07:13:29PM +0200, Zdenek Kotala wrote:
> pg_migrator is a separate tool which requires the old postgres version, and I
> would like to have a solution in the postgres binary that works without the
> old version being present. Very often the new postgres version is stored in
> the same location (e.g. /usr/bin), and normal users could have a problem.

Possibly. But you have to demonstrate it actually works, and it's best
to do that with a separate process. Or fix pg_migrator to not require
the old version; either way will do.

> > I think insisting on a downgrade option is an absolutely certain way
> > of guaranteeing that the project will fail.
>
> As I mentioned before, this is a nice-to-have requirement. I would like
> to keep it in mind, and if it starts a complexity explosion we can remove
> it from the requirement list.

It seems to me that if every step of the process is WAL logged, then
downgrading is simply a matter of restoring and using PITR. Downgrades
are hard work, primarily because in the long run it's going to be
totally untested code because hardly anyone is going to need it.

> Yes, there is a way: keep both the old and the new implementation of the
> index, each with a different OID. The primary key for the pg_am table would
> be name+pg_version - it is similar to the UDT solution. CREATE INDEX and
> REINDEX would use the current implementation.

Bad idea. Indexes are already complicated pieces of code; there's no
guarantee the old code will even work with the new version. It seems like
too much work considering REINDEX will simply fix the problem outright.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Proposal: In-Place upgrade concept

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> Tom Lane wrote:
>> Yeah, I'm with Heikki on this.  What I see as a sane project definition
>> is:
>> 
>> * pg_migrator or equivalent to convert the system catalogs
>> * a hook in ReadBuffer to allow a data page conversion procedure to
>> be applied, on the basis of checking for old page layout version.

> pg_migrator is a separate tool which requires the old postgres version, and I
> would like to have a solution in the postgres binary that works without the
> old version being present. Very often the new postgres version is stored in
> the same location (e.g. /usr/bin), and normal users could have a problem.

Again, you are setting yourself up for complete failure if you insist
on having every possible nicety in the first version.  An incremental
approach is far more likely to succeed than a "big bang".

I don't see a strong need to have a solution in-the-binary at all.
I would envision that packagers of, say, 8.4 would include a minimal
8.3 build under an old/ subdirectory, and pg_migrator or a similar
tool could invoke the old postmaster from there to do the catalog
dumping.  (In an RPM or similar environment, the user could even
"rpm -e postgresql-upgrade" to get rid of the deadwood after completing
the upgrade, whereas with an integrated binary you're stuck carrying
around a lot of one-time-use code.)

This strikes me as approximately a thousand percent more maintainable
than trying to have a single set of code coping with multiple catalog
representations.  Also it scales easily to supporting more than one back
version, whereas doing the same inside one binary will not scale at all.

Keep in mind that if your proposal involves any serious limitation on
the developers' freedom to refactor internal backend APIs or change
catalog representations around, it *will be rejected*.  Do not have any
illusions on that point.  It'll be a tough enough sell freezing on-disk
representations for user data.  Demanding the internal ability to read
old catalog versions would be a large and ongoing drag on development;
I do not think we'll hold still for it.  (To point out just one of many
problems, it'd largely destroy the C-struct-overlay technique for
reading catalogs.)
        regards, tom lane


Re: Proposal: In-Place upgrade concept

From
Zdenek Kotala
Date:
Martijn van Oosterhout wrote:
> On Tue, Jul 03, 2007 at 07:13:29PM +0200, Zdenek Kotala wrote:
>> pg_migrator is a separate tool which requires the old postgres version, and I
>> would like to have a solution in the postgres binary that works without the
>> old version being present. Very often the new postgres version is stored in
>> the same location (e.g. /usr/bin), and normal users could have a problem.
> 
> Possibly. But you have to demonstrate it actually works and it's best
> to do that with a seperate process. Or fix pg_migrator to not require
> the old version, either way will do.

pg_migrator uses pg_dump to dump the catalog, and its main concept is based
on the presence of the old postgres version. I think "fix it" here means
"rewrite it".

>>> I think insisting on a downgrade option is an absolutely certain way
>>> of guaranteeing that the project will fail.
>> As I mentioned before, this is a nice-to-have requirement. I would like
>> to keep it in mind, and if it starts a complexity explosion we can remove
>> it from the requirement list.
> 
> It seems to me that if every step of the process is WAL logged, then
> downgrading is simply a matter of restoring and using PITR. Downgrades
> are hard work, primarily because in the long run it's going to be
> totally untested code because hardly anyone is going to need it.

It is not a downgrade. It is about keeping the old structure until the user
says to convert to the new data structure.

>> Yes, there is a way: keep both the old and the new implementation of the
>> index, each with a different OID. The primary key for the pg_am table would
>> be name+pg_version - it is similar to the UDT solution. CREATE INDEX and
>> REINDEX would use the current implementation.
> 
> Bad idea. Indexes are already complicated peices of code, there's no
> guarentee the old code will even work with the new version. Seems like
> too much work considering REINDEX will simply fix the problem outright.

I do not expect the old code to work with the new index structure. I want to
keep both implementations: old indexes will be processed by the old code and
new ones by the new implementation. Each will have a different OID, and
pg_class.relam will point to the correct implementation. The obsolete index
methods could be located in a separate compat lib.
    Zdenek


Re: Proposal: In-Place upgrade concept

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> It is not a downgrade. It is about keeping the old structure until the user
> says to convert to the new data structure.

As Martijn already pointed out, the odds of problems surfacing only
after that conversion starts seem high enough to render the whole idea
a bit pointless.  IMHO it's not worth the enormous development costs
it will add ... and it's *certainly* unwise to tie success of the
entire in-place-upgrade project to that one feature.

The attractive point about pg_migrator plus page-at-a-time data upgrade
is that it'd solve 90% of the problem with 10% of the work.  If you get
that going, and people get accustomed to working with the development
restrictions associated with data upgradability, then you might be able
to come back and make a case for catalog upgradability and/or
downgradability in some future version.  But right now you're asking
people to do 90% of the work before having anything at all.

> I do not expect the old code to work with the new index structure. I want to
> keep both implementations: old indexes will be processed by the old code and
> new ones by the new implementation. Each will have a different OID, and
> pg_class.relam will point to the correct implementation.

I don't think it's quite that easy when you consider user-defined
datatypes.  Where are you going to get two sets of opclasses from?
        regards, tom lane


Re: Proposal: In-Place upgrade concept

From
Zdenek Kotala
Date:
Tom Lane wrote:

> 
> Again, you are setting yourself up for complete failure if you insist
> on having every possible nicety in the first version.  An incremental
> approach is far more likely to succeed than a "big bang".

Yes, I know. I don't want to solve everything in one patch. I'm just looking
ahead and trying to summarize the complexity of the problem. I don't want to
end up on a dead-end road and say "oops" after a couple of months of
development ...
    Zdenek


Re: Proposal: In-Place upgrade concept

From
Richard Huxton
Date:
Tom Lane wrote:
> Keep in mind that if your proposal involves any serious limitation on
> the developers' freedom to refactor internal backend APIs or change
> catalog representations around, it *will be rejected*.  Do not have any
> illusions on that point.  It'll be a tough enough sell freezing on-disk
> representations for user data.  Demanding the internal ability to read
> old catalog versions would be a large and ongoing drag on development;
> I do not think we'll hold still for it.  (To point out just one of many
> problems, it'd largely destroy the C-struct-overlay technique for
> reading catalogs.)

One thing no-one's mentioned is how we're going to deal with definitive 
incompatibilities.

Examples:
- Tightening of the UTF8 code. Means some text from the old version won't transfer.
- Changing the behaviour of greatest() - recently discussed. Might
invalidate views/application queries.

It's the second example that I can see biting; the UTF stuff is big
enough that it'll be noticed. It'd be all too easy to have a change in
some inet-addr function that you don't notice your app is using. I can't
think of any way of definitively auditing what features are in use (or
have changed between versions).

Or are these examples of changes that will only be allowed, e.g., every
other major version?

--   Richard Huxton  Archonet Ltd


Re: Proposal: In-Place upgrade concept

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Tue, Jul 03, 2007 at 11:36:03AM -0400, Tom Lane wrote:
>> ... (Thought experiment: a page is read in during crash recovery
>> or PITR slave operation, and discovered to have the old format.)

> Hmm, actually, what's the problem with PITR restoring a page in the old
> format. As long as it's clear it's the old format it'll get fixed when
> the page is actually used.

Well, what I'm concerned about is something like a WAL record providing
a new-format tuple to be inserted into a page, and then you find that
the page contains old-format tuples.

[ thinks some more... ]  Actually, so long as we are willing to posit that

1. You're only allowed to upgrade a DB that's been cleanly shut down
(no replay of old-format WAL logs allowed)

2. Page format conversion is WAL-logged as a complete page replacement

then AFAICS WAL-reading operations should never have to apply any
updates to an old-format page; the first touch of any old page in the
WAL sequence should be a page replacement that updates it to new format.
This is not different from the argument why full_page_writes ensures
recovery from write failures.
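
For illustration, a sketch of point 2, modeled loosely on the existing XLOG_HEAP_NEWPAGE full-page records; the record
type XLOG_UPGRADE_PAGE is invented for this sketch, and the XLogRecData usage follows the 8.2-era API:

    #include "access/xlog.h"
    #include "storage/buf.h"
    #include "storage/bufpage.h"

    /*
     * Hypothetical: after converting a buffer to the new layout, WAL-log it
     * as a complete page replacement so that replay never needs to apply an
     * ordinary record to an old-format page.
     */
    static XLogRecPtr
    log_converted_page(Page page)
    {
        XLogRecData rdata;

        rdata.data = (char *) page;
        rdata.len = BLCKSZ;
        rdata.buffer = InvalidBuffer;   /* full copy, no backup-block logic */
        rdata.buffer_std = false;
        rdata.next = NULL;

        /* XLOG_UPGRADE_PAGE is a made-up record type for this sketch. */
        return XLogInsert(RM_XLOG_ID, XLOG_UPGRADE_PAGE, &rdata);
    }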

So in principle the page-conversion stuff should always operate in a
live transaction.  (Which is good, because now that I think about it
we couldn't emit a WAL record for the page conversion in those other
contexts.)  I still feel pretty twitchy about letting it do catalog
access, though, because it has to operate at such a low level of the
system.  bufmgr.c has no business invoking anything that might do
catalog access.  If nothing else there are deadlock issues.

On the whole I think we could define format conversions for user-defined
types as "not our problem".  A new version of a UDT that has an
incompatible representation on disk can simply be treated as a new type
with a different OID, exactly as Zdenek was suggesting for index AMs.
To upgrade a database containing such a column, you install
"my_udt_old.so" that services the old representation, ALTER TYPE my_udt
RENAME TO my_udt_old, then install new type my_udt and start using that.
Anyway that seems good enough for version 1.0 --- I don't recall that
we've ever changed the on-disk representation of any contrib/ types,
so how important is this scenario in the real world?
        regards, tom lane


Re: Proposal: In-Place upgrade concept

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> 2. Page format conversion is WAL-logged as a complete page replacement

It seems we get that for free. By definition any modification to a page after
conversion will be the first record for that page since a checkpoint. It just
means we have to force full_page_writes to be true, at least for a buffer
that's been converted if not whenever we're processing a mixed format
database.

> On the whole I think we could define format conversions for user-defined
> types as "not our problem".  

Hm, perhaps we could do it by doing as much of the work up-front as possible:
require datatypes to define a function in pg_proc to do the conversion, and
require that it be a function which can stand on its own without catalog
lookups or other facilities. Then when we start up on a mixed-format database
we load all those functions into a hash by typeoid and provide that hash to
bufmgr. By definition any datatypes we expect to find in the database must
already be in the catalog before we start.
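
A rough sketch of that startup-time hash using the backend's dynahash API (hash_create/hash_search and oid_hash are
real; the conversion-function signature and the loader are assumptions, not an existing interface):

    #include "postgres.h"
    #include "utils/hsearch.h"

    /* Hypothetical per-type callback: convert one old-format datum in place. */
    typedef void (*DatumConvertFn) (void *datum);

    typedef struct ConvertEntry
    {
        Oid             typeoid;        /* hash key */
        DatumConvertFn  convert;
    } ConvertEntry;

    static HTAB *convert_hash = NULL;

    /*
     * Built once at backend startup, before any old-format page can be read;
     * afterwards bufmgr only does hash lookups and never touches the catalogs.
     */
    static void
    build_convert_hash(void)
    {
        HASHCTL     ctl;

        MemSet(&ctl, 0, sizeof(ctl));
        ctl.keysize = sizeof(Oid);
        ctl.entrysize = sizeof(ConvertEntry);
        ctl.hash = oid_hash;

        convert_hash = hash_create("datatype conversion functions", 64,
                                   &ctl, HASH_ELEM | HASH_FUNCTION);

        /* Hypothetical: scan pg_proc here for the registered conversion
         * functions and fill in the hash entries. */
    }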

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Proposal: In-Place upgrade concept

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> 2. Page format conversion is WAL-logged as a complete page replacement

> It seems we get that for free. By definition any modification to a page after
> conversion will be the first record for that page since a checkpoint.

No, I don't think so.  Consider a page that's been sucked in for just a
SELECT.  Or did you plan to pay the page conversion cost over and over
again until it's finally modified?  Anyway the whole idea fails if the
first mod that needs to be made is just a hint-bit update.

> Hm, perhaps we could do it by doing as much of the work up-front as possible:
> require datatypes to define a function in pg_proc to do the conversion, and
> require that it be a function which can stand on its own without catalog
> lookups or other facilities.

I don't think you quite got the point of "no catalog lookups".  That
includes not looking up pg_type or pg_proc entries.
        regards, tom lane


Re: Proposal: In-Place upgrade concept

From
Tom Lane
Date:
Richard Huxton <dev@archonet.com> writes:
> One thing no-one's mentioned is how we're going to deal with definitive 
> incompatibilities.

I don't really think that in-place update changes that story at all.
The advice has always been "read the release notes and test your
applications before updating".  The only thing that in-place update
will offer is shorter update downtime than dump/reload.
        regards, tom lane


Re: Proposal: In-Place upgrade concept

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>> 2. Page format conversion is WAL-logged as a complete page replacement
>
>> It seems we get that for free. By definition any modification to a page after
>> conversion will be the first record for that page since a checkpoint.
>
> No, I don't think so.  Consider a page that's been sucked in for just a
> SELECT.  Or did you plan to pay the page conversion cost over and over
> again until it's finally modified?  Anyway the whole idea fails if the
> first mod that needs to be made is just a hint-bit update.

Ah, true, it would have to dirty the page, and that means putting an LSN in,
and that means needing a WAL record for it to point to. But the actual record
is almost irrelevant, as any record will cause the full page to be attached.

>> Hm, perhaps we could do it by doing as much of the work up-front as possible:
>> require datatypes to define a function in pg_proc to do the conversion, and
>> require that it be a function which can stand on its own without catalog
>> lookups or other facilities.
>
> I don't think you quite got the point of "no catalog lookups".  That
> includes not looking up pg_type or pg_proc entries.

ISTM that it's really just not looking up pg_type or pg_proc entries *from
within the low level storage code* that's evil. If they're all sucked in on
backend startup and restricted by policy to not be able to do anything fancy
themselves then I don't see your objection.

I do have an objection though: TOAST data. Even were the bufmgr to be provided
with a mechanism to find a data type conversion function, I don't see what it
would do about a toasted datum. Obviously bufmgr can't fiddle with it, and if
it leaves it for pg_detoast_datum, then even if there were some way for
pg_detoast_datum to know whether it's an old-format datum or a new-format one,
there's no way for it to replace the TOAST datum on disk with the new one.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Proposal: In-Place upgrade concept

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> I do have an objection though: TOAST data. Even were the bufmgr to be
> provided with a mechanism to find a data type conversion function I
> don't see what it would do about a toasted datum.

Urgh, that *is* a nasty thought :-(

Out-of-line datums aren't the only issue, either: consider inline
compressed datums.  A data representation change, even one that is known
not to increase the ordinary uncompressed size of the datum, could
easily render it slightly less compressible, resulting in a risk that
the tuple doesn't fit on the page anymore.  It hardly seems practical
(maybe not even possible) to guarantee that this cannot happen.

So maybe we are up against the conclusion that in-place updates cannot
support datatype representation changes, at least not for toastable
datatypes.  We could still handle 'em by the expedient suggested
upthread for user-defined types, ie the "new representation" is treated
as a whole new type.  That's not terribly appetizing though; I had
expected we could be more efficient for the case of changes in built-in
types.
        regards, tom lane


Re: Proposal: In-Place upgrade concept

From
Zdenek Kotala
Date:
Tom Lane wrote:

> 
>> I do not expect the old code to work with the new index structure. I want to
>> keep both implementations: old indexes will be processed by the old code and
>> new ones by the new implementation. Each will have a different OID, and
>> pg_class.relam will point to the correct implementation.
> 
> I don't think it's quite that easy when you consider user-defined
> datatypes.  Where are you going to get two sets of opclasses from?

Good point. I think that if the new index implementation does not change the
meaning of the strategy and support function numbers, the old opclass set
could be used. And this matters only for newly created or recreated indexes;
existing indexes will work fine with the old opclass set.

        Zdenek


Re: Proposal: In-Place upgrade concept

From
Martijn van Oosterhout
Date:
On Tue, Jul 03, 2007 at 09:51:59PM -0400, Tom Lane wrote:
> Out-of-line datums aren't the only issue, either: consider inline
> compressed datums.  A data representation change, even one that is known
> not to increase the ordinary uncompressed size of the datum, could
> easily render it slightly less compressible, resulting in a risk that
> the tuple doesn't fit on the page anymore.  It hardly seems practical
> (maybe not even possible) to guarantee that this cannot happen.

Does the converted page have to fit into a single page? It's a bit of a
corner case, and there's slack on every page. Wouldn't it be possible to
replace an oversize tuple with a pointer to a new tuple on a new page
(indexes are the problem here)?

> So maybe we are up against the conclusion that in-place updates cannot
> support datatype representation changes, at least not for toastable
> datatypes.  We could still handle 'em by the expedient suggested
> upthread for user-defined types, ie the "new representation" is treated
> as a whole new type.  That's not terribly appetizing though; I had
> expected we could be more efficient for the case of changes in built-in
> types.

Well, it seems to me 99% of cases can be handled easily; we just might
have to have a backup method for the cases where the simple approach
doesn't work...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Proposal: In-Place upgrade concept

From
Zdenek Kotala
Date:
Martijn van Oosterhout wrote:
> On Tue, Jul 03, 2007 at 09:51:59PM -0400, Tom Lane wrote:
>> Out-of-line datums aren't the only issue, either: consider inline
>> compressed datums.  A data representation change, even one that is known
>> not to increase the ordinary uncompressed size of the datum, could
>> easily render it slightly less compressible, resulting in a risk that
>> the tuple doesn't fit on the page anymore.  It hardly seems practical
>> (maybe not even possible) to guarantee that this cannot happen.
> 
> Does the converted page have to fit into a single page? It's a bit of a
> corner case, and there's slack on every page. Wouldn't it be possible to
> replace an oversize tuple with a pointer to a new tuple on a new page
> (indexes are the problem here)?

I think it is the same problem Tom mentioned about catalog access, which you
probably need in order to handle the index modification. I have a wild idea
about how to handle it, but it needs more experiments.

>> So maybe we are up against the conclusion that in-place updates cannot
>> support datatype representation changes, at least not for toastable
>> datatypes.  We could still handle 'em by the expedient suggested
>> upthread for user-defined types, ie the "new representation" is treated
>> as a whole new type.  That's not terribly appetizing though; I had
>> expected we could be more efficient for the case of changes in built-in
>> types.
> 
> Well, it seems to me 99% of cases can be handled easily; we just might
> have to have a backup method for the cases where the simple approach
> doesn't work...

The question is how to check whether a database falls into the 1% of corner
cases. One suggested solution was to run a pre-upgrade check on the old
version, but that raises the problem of how to avoid "ugly" modifications of
already-checked pages.

        Zdenek