Re: [WIP] In-place upgrade - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [WIP] In-place upgrade
Date
Msg-id 603c8f070811040632pf877480uf6f4d7ab7fd2525b@mail.gmail.com
In response to Re: [WIP] In-place upgrade  (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
Responses Re: [WIP] In-place upgrade
List pgsql-hackers
> OK. The original idea was "convert on read", which has several
> problems with no easy solution. One is that the new data may not fit on the
> page, and a second big problem is how to convert TOAST table data. Another,
> more general problem is how to convert indexes...
>
> Convert on read has minimal impact on core when the latest version is processed.
> But the problem is what happens when you need to migrate a tuple from one page
> to a new one, modify the index, and also convert TOAST value(s)... The
> response could take a long time for some queries, because it invokes a lot of
> changes and conversions.  I think in a corner case it could require converting
> all indexes when you request one record.

I don't think I'm proposing convert on read, exactly.  If you actually
try to convert the entire page when you read it in, I think you're
doomed to failure, because, as you rightly point out, there is
absolutely no guarantee that the page contents in their new format
will still fit into one block.  I think what you want to do is convert
the structures within the page one by one as you read them out of the
page.  The proposed refactoring of ExecStoreTuple will do exactly
this, for example.

HEAD uses a pointer into the actual buffer for a V4 tuple that comes
from an existing relation, and a pointer to a palloc'd structure for a
tuple that is generated during query execution.  The proposed
refactoring will keep these rules, plus add a new rule that if you
happen to read a V3 page, you will palloc space for a new V4 tuple
that is semantically equivalent to the V3 tuple on the page, and use
that pointer instead.  That, it seems to me, is exactly the right
balance - the PAGE is still a V3 page, but all of the tuples that the
upper-level code ever sees are V4 tuples.
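
To make the rule above concrete, here is a minimal sketch in plain C.
It is not the actual ExecStoreTuple code or the proposed patch: the
TupleV3/TupleV4 layouts, the Slot struct, and every function name below
are made up for illustration, and malloc stands in for palloc.  It only
shows the shape of the idea: a V4 tuple is referenced in place, while a
V3 tuple is copied into a freshly allocated V4 tuple that the slot owns.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified layouts; NOT PostgreSQL's real tuple headers. */
typedef struct { uint16_t len; int32_t value; } TupleV3;   /* old on-disk format */
typedef struct { uint32_t len; int64_t value; } TupleV4;   /* current format */

typedef struct {
    const TupleV4 *tuple;   /* what upper-level code sees: always a V4 tuple */
    int should_free;        /* 1 if the slot owns a private copy of the tuple */
} Slot;

/* Build a semantically equivalent V4 tuple (malloc stands in for palloc). */
static TupleV4 *convert_v3_to_v4(const TupleV3 *old)
{
    TupleV4 *t = malloc(sizeof *t);
    assert(t != NULL);
    t->len = (uint32_t) sizeof *t;
    t->value = old->value;          /* widen the payload, keep its meaning */
    return t;
}

/* Release a private copy, if any, and empty the slot. */
static void slot_clear(Slot *slot)
{
    if (slot->should_free)
        free((void *) slot->tuple);
    slot->tuple = NULL;
    slot->should_free = 0;
}

/* Store the tuple found at `raw` on a page with the given layout version. */
static void slot_store_from_page(Slot *slot, int page_version, const void *raw)
{
    assert(page_version == 3 || page_version == 4);
    slot_clear(slot);
    if (page_version == 4) {
        /* Existing rule: point directly into the buffer, no copy. */
        slot->tuple = (const TupleV4 *) raw;
        slot->should_free = 0;
    } else {
        /* New rule: the page stays V3, the caller gets a converted copy. */
        slot->tuple = convert_v3_to_v4((const TupleV3 *) raw);
        slot->should_free = 1;
    }
}
```

A caller would start from an empty slot (tuple NULL, should_free 0) and
never needs to know which page version the tuple came from.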

I'm not sure how far this particular approach can be generalized.
ExecStoreTuple has the advantage that it already has to deal with both
direct buffer pointers and palloc'd structures, so the code doesn't
need to be much more complex to handle this case as well.  I think the
thing to do is go through and scrutinize all of the ReadBuffer call
sites and figure out an approach to each one.  I haven't looked at
your latest code yet, so you may have already done this, but just for
example, RelationGetBufferForTuple should probably just reject any V3
pages encountered as if they were full, including updating the FSM
where appropriate.  I would think that it would be possible to
implement that with almost zero performance impact.  I'm happy to look
at and discuss the problem cases with you, and hopefully others will
chime in as well since my knowledge of the code is far from
exhaustive.
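
As a rough illustration of the RelationGetBufferForTuple idea, here is
a toy model in plain C.  It does not use the real function's signature
or the real FSM interface: the Page struct, the fsm array, and
get_page_for_tuple are all hypothetical.  The point is only that an
old-format page can be skipped with one extra comparison, and that
zeroing its free space map entry keeps later inserts from visiting it
again.

```c
#include <assert.h>
#include <stddef.h>

#define CURRENT_LAYOUT_VERSION 4

/* Hypothetical, simplified page descriptor; NOT PostgreSQL's page header. */
typedef struct {
    int    layout_version;  /* 3 = old format, 4 = current format */
    size_t free_bytes;      /* free space actually available on the page */
} Page;

/*
 * Pick a page with room for `needed` bytes.  fsm[i] is the advertised
 * free space for pages[i] (a toy stand-in for the free space map).
 * Returns the page index, or -1 if the caller would have to extend
 * the relation.
 */
int get_page_for_tuple(const Page *pages, size_t *fsm, int npages, size_t needed)
{
    assert(pages != NULL && fsm != NULL && npages >= 0);

    for (int i = 0; i < npages; i++) {
        if (fsm[i] < needed)
            continue;               /* map says no room: do not touch the page */

        if (pages[i].layout_version != CURRENT_LAYOUT_VERSION) {
            fsm[i] = 0;             /* old-format page: treat it as full */
            continue;
        }

        if (pages[i].free_bytes >= needed)
            return i;               /* current-format page with enough room */

        fsm[i] = pages[i].free_bytes;   /* map entry was stale: correct it */
    }
    return -1;
}
```

In this model a V3 page costs at most one wasted visit, after which the
map no longer advertises it for inserts.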

...Robert

