Re: Prototype: In-place upgrade v02 - Mailing list pgsql-hackers

From Zdenek Kotala
Subject Re: Prototype: In-place upgrade v02
Date
Msg-id 48C4DC1E.1080408@sun.com
Whole thread Raw
In response to Re: Prototype: In-place upgrade v02  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Prototype: In-place upgrade v02  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> Heikki Linnakangas napsal(a):
>>> The patch seems to be missing the new htup.c file.
>>
>> I'm sorry. I've attached a new version which is synchronized with current
>> head. I would also like to add some more comments.
>>
>> 1) The patch also contains changes which were discussed during the July
>> commit fest:
>>     - PageGetTempPage modification suggested by Tom
>>     - another hash.h backward-compatibility cleanup
> 
> It might be a good idea to split that into a separate patch. The sheer 
> size of this patch is quite daunting, even though the bulk of it is 
> straightforward search&replace.

Yes, I will do it.

>> 2) I added a tuplimits.h header file which contains the tuple limits for
>> the different access methods. It is not finished yet, but the idea is to
>> keep all the limits in one file and easily add limits for a different page
>> layout version - for example, replacing the static computation with a
>> dynamic one based on the relation (maxtuplesize could be stored in
>> pg_class for each relation).
>>
>> I also need this header because I ran into a cycle in the header
>> dependencies.
>>
>> 3) I already sent the Page API performance results in
>> http://archives.postgresql.org/pgsql-hackers/2008-08/msg00398.php
>>
>> I replaced the call sequence PageGetItemId, PageGetItem with the
>> PageGetIndexTuple and PageGetHeapTuple functions. That is the main
>> difference in this patch. PageGetHeapTuple fills t_ver in HeapTuple to
>> identify the correct tuple header version.
>>
>> It would be good to mention that the Page API (and tuple API)
>> implementation is only a prototype, without any performance optimization.
> 
> You mentioned a 5% performance degradation in that thread. What test case 
> was that? What would be a worst-case scenario, and how bad is it?

Paul van den Bogaart tested a long-running OLTP workload on it. He used the iGen test.

> 5% is a pretty hefty price, especially when it's paid by not only 
> upgraded installations, but also freshly initialized clusters. I think 
> you'll need to pursue those performance optimizations.

5% is the worst-case scenario. The current version is not optimized; it is 
written for easy debugging and (D)tracing. The page header structures are very 
similar, and we can easily remove the switches for most of the attributes and 
replace the functions with macros or inline functions.

>> 4) This patch contains more topics for decision. First is general if 
>> this approach is acceptable.
> 
> I don't like the invasiveness of this approach. It's pretty invasive 
> already, and ISTM you'll need similar switch-case handling of all data 
> types that have changed the internal representation as well.

I agree in general. But the new page API, for example, is not so invasive, and 
in my opinion it should be implemented (with or without multiversion support) 
because it cleans up the code. HeapTuple processing is easy too, but 
unfortunately it requires a lot of modifications in many places. I was 
surprised by how many pieces of code access HeapTupleHeader directly and do 
not use the HeapTuple data structure. I think we should reach a conclusion on 
the recommended usage of HeapTupleHeader versus HeapTuple. Most of the changes 
in the code are of the form of replacing HeapTupleHeaderGetXmax(tuple->t_data) 
with HeapTupleGetXmax(tuple), and so on. I think it should be cleaned up anyway.

You mentioned data types, but that is not a problem. You can easily extend a 
data type's attributes with version information and call the correct in/out 
functions, or use a different Oid for the new data type version. There are 
several easy solutions possible for data types, and for conversion you can use 
the ALTER TABLE command. The main idea is to keep data of all format versions 
in a relation. This approach could also be used for the integer/float datetime 
problem.

> We've talked about this before, so you'll remember that I favor the 
> approach of converting the page format, page at a time, when the pages 
> are read in. I grant you that there are non-trivial issues with that as 
> well, like if the converted data takes more space and doesn't fit in the 
> page anymore.

I like conversion on read too, because it is easy, but there are more problems.

The non-fitting page is one of them. Other problems are with indexes. For 
example, the hash index stores a bitmap in a page, and this is not marked 
anywhere on the page itself; only the hash AM knows which pages contain this 
kind of data. It is probably impossible to convert such a page during a read. :(

> I wonder if we could go with some sort of a hybrid approach? Convert the 
>  whole page when it's read in, but if it doesn't fit, fall back to 
> tricks like loosening the alignment requirements on platforms that can 
> handle non-aligned data, or support a special truncated page header, 
> without pd_tli and pd_prune_xid fields. Just a thought, not sure how 
> feasible those particular tricks are, but something along those lines..

OK, I have backup idea :-). Stay tuned :-)

> All in all, though. I find it a bit hard to see the big picture. For 
> upgrade-in-place, what are all the pieces that we need? To keep this 
> concrete, let's focus on PG 8.2 -> PG 8.3 (or are you focusing on PG 8.3 
> -> 8.4? That's fine with me as well, but let's pick one) and forget 
> about hypothetical changes that might occur in a future version. I can see:
> 1. Handling page layout changes (pd_prune_xid, pd_flags)
> 2. Handling tuple header changes (infomask2, HOT bits, combocid)
2.5 + composite data type
> 3. Handling changes in data type representation (packed varlens)
3.5 Data types generally (cidr/inet)
> 4. Toast chunk size
4.5 general MaxTupleSize for each different AM
> 5. Catalogs
6. AM methods

> 
> After putting all those together, how large a patch are we talking 
> about, and what's the performance penalty then? How much of all that 
> needs to be in core, and how much can live in a pgfoundry project or an 
> extra binary in src/bin or contrib? I realize that none of us have a 
> crystal ball, and one has to start somewhere, but I feel uneasy 
> committing to an approach until we have a full plan.

Unfortunately, I'm still in the analysis phase. The presented patch is a 
prototype of one possible approach. I have hit a lot of problems, and I don't 
yet have answers to all of them. I'm going to update the wiki page to share 
all this information.

At this moment, I think I can implement offline heap conversion (8.2->8.4), 
with all indexes rebuilt by REINDEX. That is what we can have for 8.4. Online 
conversion has a lot of problems which we are not able to answer at this moment.
    Zdenek



-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql


