Re: [WIP] In-place upgrade - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [WIP] In-place upgrade |
Date | |
Msg-id | 603c8f070811040632pf877480uf6f4d7ab7fd2525b@mail.gmail.com |
In response to | Re: [WIP] In-place upgrade (Zdenek Kotala <Zdenek.Kotala@Sun.COM>) |
Responses | Re: [WIP] In-place upgrade |
List | pgsql-hackers |
> OK. It was original idea to make "Convert on read" which has several
> problems with no easy solution. One is that new data does not fit on the
> page and second big problem is how to convert TOAST table data. Another
> problem which is general is how to convert indexes...
>
> Convert on read has minimal impact on core when latest version is processed.
> But problem is what happen when you need to migrate tuple form page to new
> one modify index and also needs convert toast value(s)... Problem is that
> response could be long in some query, because it invokes a lot of changes
> and conversion. I think in corner case it could requires converts all index
> when you request one record.

I don't think I'm proposing convert on read, exactly. If you actually try to convert the entire page when you read it in, I think you're doomed to failure, because, as you rightly point out, there is absolutely no guarantee that the page contents in their new format will still fit into one block.

I think what you want to do is convert the structures within the page one by one as you read them out of the page. The proposed refactoring of ExecStoreTuple will do exactly this, for example. HEAD uses a pointer into the actual buffer for a V4 tuple that comes from an existing relation, and a pointer to a palloc'd structure for a tuple that is generated during query execution. The proposed refactoring will keep these rules, plus add a new rule that if you happen to read a V3 page, you will palloc space for a new V4 tuple that is semantically equivalent to the V3 tuple on the page, and use that pointer instead. That, it seems to me, is exactly the right balance - the PAGE is still a V3 page, but all of the tuples that the upper-level code ever sees are V4 tuples.

I'm not sure how far this particular approach can be generalized.
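
The per-tuple rule described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: V3Tuple, V4Tuple, TupleSlot, slot_store and slot_clear are hypothetical names with toy layouts, not PostgreSQL's real structures or the actual ExecStoreTuple signature, and malloc stands in for palloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy layouts; NOT PostgreSQL's real tuple headers. */
typedef struct { uint16_t len; int32_t value; } V3Tuple;   /* old page format */
typedef struct { uint32_t len; int64_t value; } V4Tuple;   /* current format  */

/* Hypothetical slot: upper-level code only ever reads slot->tuple. */
typedef struct {
    const V4Tuple *tuple;   /* points into the buffer or at a private copy */
    bool should_free;       /* true only when tuple is a private copy      */
} TupleSlot;

/* Release whatever the slot currently owns and reset it. */
void slot_clear(TupleSlot *slot)
{
    assert(slot != NULL);
    if (slot->should_free)
        free((void *) slot->tuple);
    slot->tuple = NULL;
    slot->should_free = false;
}

/* Build a V4 tuple that is semantically equivalent to a V3 tuple. */
V4Tuple *convert_v3_to_v4(const V3Tuple *old)
{
    V4Tuple *copy = malloc(sizeof *copy);   /* stand-in for palloc */
    if (copy == NULL)
        abort();
    copy->len = (uint32_t) sizeof *copy;
    copy->value = old->value;
    return copy;
}

/*
 * Store the tuple at 'raw', which lives on a page of the given layout
 * version. A V4 tuple is referenced in place; a V3 tuple is converted
 * into a freshly allocated V4 copy. The page itself is never rewritten.
 */
void slot_store(TupleSlot *slot, int page_version, const void *raw)
{
    assert(page_version == 3 || page_version == 4);
    slot_clear(slot);
    if (page_version == 4) {
        slot->tuple = (const V4Tuple *) raw;
        slot->should_free = false;
    } else {
        slot->tuple = convert_v3_to_v4((const V3Tuple *) raw);
        slot->should_free = true;
    }
}
```

A caller would initialize the slot to { NULL, false }, call slot_store for each tuple it reads, and call slot_clear when done. The point of the design is that the conversion cost is paid per tuple actually read, and the converted copy does not have to fit back on the old page.
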
ExecStoreTuple has the advantage that it already has to deal with both direct buffer pointers and palloc'd structures, so the code doesn't need to be much more complex to handle this case as well.

I think the thing to do is go through and scrutinize all of the ReadBuffer call sites and figure out an approach to each one. I haven't looked at your latest code yet, so you may have already done this, but just for example, RelationGetBufferForTuple should probably just reject any V3 pages encountered as if they were full, including updating the FSM where appropriate. I would think that it would be possible to implement that with almost zero performance impact.

I'm happy to look at and discuss the problem cases with you, and hopefully others will chime in as well since my knowledge of the code is far from exhaustive.

...Robert
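
The suggestion that RelationGetBufferForTuple treat V3 pages as full can be illustrated with a toy model. The sketch below makes several assumptions: PageInfo, usable_free_space and choose_page_for_insert are hypothetical names, and the free space map is reduced to a plain array with one entry per page. It is not how PostgreSQL's FSM or RelationGetBufferForTuple are implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CURRENT_LAYOUT 4   /* assumed current page layout version */

/* Hypothetical per-page summary; real pages carry this in their header. */
typedef struct {
    uint8_t layout_version;   /* 3 = old format, 4 = current format */
    size_t  free_bytes;       /* physically free bytes on the page   */
} PageInfo;

/* Old-format pages report no usable space, exactly as if they were full. */
size_t usable_free_space(const PageInfo *page)
{
    assert(page != NULL);
    return page->layout_version == CURRENT_LAYOUT ? page->free_bytes : 0;
}

/*
 * Pick a page with at least 'needed' usable bytes. 'fsm' is a toy free
 * space map holding one recorded free-byte count per page. When the map
 * offers a page that turns out to be unusable, the map entry is corrected
 * so the page is not offered again. Returns the page index, or -1 when no
 * page fits (a real caller would then extend the relation).
 */
int choose_page_for_insert(const PageInfo *pages, size_t *fsm,
                           size_t npages, size_t needed)
{
    for (size_t i = 0; i < npages; i++) {
        if (fsm[i] < needed)
            continue;                       /* map says there is no room */
        size_t actual = usable_free_space(&pages[i]);
        if (actual >= needed)
            return (int) i;
        fsm[i] = actual;                    /* fix the stale map entry */
    }
    return -1;
}
```

The extra work on the insert path is one comparison of the layout version per candidate page, and each V3 page is rejected at most once because its map entry drops to zero afterwards.
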