Thread: Prototype: In-place upgrade
The attached patch is a prototype of in-place upgrade as presented at PGCon this year.

The main idea is to teach postgres to handle different versions of the page and tuple structures.

1) page - The patch contains a new page API, and all code accesses pages through this API. The functions check the page version and return the correct data to the caller. It is mostly complete now; only the ItemId flags need finishing.

2) tuple - The HeapTuple structure has been extended with a t_ver attribute which contains the page layout version, and direct access to HeapTupleHeader is forbidden. It is now possible only through the HeapTuple* functions (see htup.c). (HeapTupleHeader access still remains in several functions like heap_form_tuple.)

This patch version still does not allow reading an old database, but it shows how it should work. The main disadvantage of this approach is the performance penalty.

Please let me know your opinion about this approach.

Future work:
1) teach WAL to process different tuple structure versions
2) tuple conversion to the new version, hooked into the executor (ExecStoreTuple)
3) multiversion MaxItemSize constant

Thanks for your comments

Zdenek

--
Zdenek Kotala            Sun Microsystems
Prague, Czech Republic   http://sun.com/postgresql
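To make the idea concrete, here is a minimal sketch in C of the kind of version-dispatching page accessor the mail describes. The struct layouts, field sets, and the PageGetLower/PageGetPageLayoutVersion names below are illustrative assumptions, not code from the patch:

#include <stdint.h>
#include <stdlib.h>

typedef char *Page;

/* hypothetical per-version page headers; the real ones carry more fields */
typedef struct
{
    uint16_t pd_pagesize_version;   /* layout version in the low byte */
    uint16_t pd_lower;
    uint16_t pd_upper;
} PageHeaderData_03;

typedef struct
{
    uint16_t pd_pagesize_version;
    uint16_t pd_flags;              /* new field in this layout */
    uint16_t pd_lower;
    uint16_t pd_upper;
} PageHeaderData_04;

static uint8_t
PageGetPageLayoutVersion(Page page)
{
    /* assume the version byte sits at the same offset in every layout */
    return (uint8_t) (((PageHeaderData_03 *) page)->pd_pagesize_version & 0xFF);
}

/*
 * Callers never cast a page to a header struct themselves; every field is
 * fetched through an accessor that switches on the layout version.
 */
static uint16_t
PageGetLower(Page page)
{
    switch (PageGetPageLayoutVersion(page))
    {
        case 3:
            return ((PageHeaderData_03 *) page)->pd_lower;
        case 4:
            return ((PageHeaderData_04 *) page)->pd_lower;
        default:
            abort();                /* unknown page layout version */
    }
}

The per-call switch is also where the performance penalty mentioned above comes from; a real implementation would presumably flatten the fields that are common to all layouts back into macros.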
The patch seems to be missing the new htup.c file.

Zdenek Kotala wrote:
> The attached patch is a prototype of in-place upgrade as presented at
> PGCon this year.
>
> The main idea is to teach postgres to handle different versions of the
> page and tuple structures.
>
> 1) page - The patch contains a new page API, and all code accesses pages
> through this API. The functions check the page version and return the
> correct data to the caller. It is mostly complete now; only the ItemId
> flags need finishing.
>
> 2) tuple - The HeapTuple structure has been extended with a t_ver
> attribute which contains the page layout version, and direct access to
> HeapTupleHeader is forbidden. It is now possible only through the
> HeapTuple* functions (see htup.c). (HeapTupleHeader access still remains
> in several functions like heap_form_tuple.)
>
> This patch version still does not allow reading an old database, but it
> shows how it should work. The main disadvantage of this approach is the
> performance penalty.
>
> Please let me know your opinion about this approach.
>
> Future work:
> 1) teach WAL to process different tuple structure versions
> 2) tuple conversion to the new version, hooked into the executor
> (ExecStoreTuple)
> 3) multiversion MaxItemSize constant

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas napsal(a):
> The patch seems to be missing the new htup.c file.

Oops, I'm sorry. I'm going to fix it and will send a new version ASAP.

Zdenek

--
Zdenek Kotala            Sun Microsystems
Prague, Czech Republic   http://sun.com/postgresql
Heikki Linnakangas napsal(a):
> The patch seems to be missing the new htup.c file.

I'm sorry. I attached a new version which is synchronized with current head. I would like to add a few more comments as well.

1) The patch also contains changes which were discussed during the July commit fest:
- the PageGetTempPage modification suggested by Tom
- another hash.h backward-compatibility cleanup

2) I added a tuplimits.h header file which contains tuple limits for the different access methods. It is not finished yet, but the idea is to keep all limits in one file and easily add limits for different page layout versions - for example, replacing static computation with dynamic computation based on the relation (maxtuplesize could be stored in pg_class for each relation). I also need this header because I ran into a cycle in the header dependencies.

3) I already sent Page API performance results in http://archives.postgresql.org/pgsql-hackers/2008-08/msg00398.php

I replaced the call sequence PageGetItemId, PageGetItem with the PageGetIndexTuple and PageGetHeapTuple functions. That is the main difference in this patch. PageGetHeapTuple fills in t_ver in the HeapTuple to identify the correct tuple header version; see the sketch after this mail. It is worth mentioning that the page API (and tuple API) implementation is only a prototype, without any performance optimization.

4) This patch contains several topics for decision. The first is whether this approach is acceptable in general. The second is about the new page API - whether to replace all page access with the new proposed macros/(inline) functions. The third is how to name and where to store the different data structure versions. My idea is to use a suffix with an underscore and the page layout version, and to keep all versions in the same header file.

5) I got another idea about usage of the page API. I call it "3 in 1". Because all page access will go through the new API, it could be used for WAL logging, and other WAL recording could be reduced. Replication could easily be added based on page modifications. It is just an idea for thinking about.

6) That is probably all for a Friday evening.

Zdenek

--
Zdenek Kotala            Sun Microsystems
Prague, Czech Republic   http://sun.com/postgresql
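A sketch of what the PageGetHeapTuple described above might look like, assuming the stock PageGetItemId/ItemIdGetOffset/ItemIdGetLength macros from PostgreSQL's bufpage.h and itemid.h, the version helper from the earlier sketch, and a simplified HeapTupleData carrying the new t_ver field; the exact shape in the patch may differ:

/* simplified; the real HeapTupleData has more fields (t_self, t_tableOid, ...) */
typedef struct
{
    uint32_t    t_len;
    uint8_t     t_ver;      /* page layout version of the tuple header */
    void       *t_data;     /* the on-page HeapTupleHeader */
} HeapTupleData;

static void
PageGetHeapTuple(Page page, int offnum, HeapTupleData *tuple)
{
    /* the old two-step sequence, folded into one call */
    ItemId      itemid = PageGetItemId(page, offnum);

    tuple->t_data = (void *) ((char *) page + ItemIdGetOffset(itemid));
    tuple->t_len  = ItemIdGetLength(itemid);

    /* stamp the layout version so HeapTuple* accessors can dispatch on it */
    tuple->t_ver  = PageGetPageLayoutVersion(page);
}

Callers that used to write PageGetItem(page, PageGetItemId(page, offnum)) would call this instead and get back a tuple that knows which header layout it carries.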
Zdenek Kotala wrote:
> Heikki Linnakangas napsal(a):
>> The patch seems to be missing the new htup.c file.
>
> I'm sorry. I attached a new version which is synchronized with current
> head. I would like to add a few more comments as well.
>
> 1) The patch also contains changes which were discussed during the July
> commit fest:
> - the PageGetTempPage modification suggested by Tom
> - another hash.h backward-compatibility cleanup

It might be a good idea to split that into a separate patch. The sheer size of this patch is quite daunting, even though the bulk of it is straightforward search & replace.

> 2) I added a tuplimits.h header file which contains tuple limits for the
> different access methods. It is not finished yet, but the idea is to
> keep all limits in one file and easily add limits for different page
> layout versions - for example, replacing static computation with dynamic
> computation based on the relation (maxtuplesize could be stored in
> pg_class for each relation). I also need this header because I ran into
> a cycle in the header dependencies.
>
> 3) I already sent Page API performance results in
> http://archives.postgresql.org/pgsql-hackers/2008-08/msg00398.php
>
> I replaced the call sequence PageGetItemId, PageGetItem with the
> PageGetIndexTuple and PageGetHeapTuple functions. That is the main
> difference in this patch. PageGetHeapTuple fills in t_ver in the
> HeapTuple to identify the correct tuple header version.
>
> It is worth mentioning that the page API (and tuple API) implementation
> is only a prototype, without any performance optimization.

You mentioned a 5% performance degradation in that thread. What test case was that? What would be a worst-case scenario, and how bad is it?

5% is a pretty hefty price, especially when it's paid not only by upgraded installations, but also by freshly initialized clusters. I think you'll need to pursue those performance optimizations.

> 4) This patch contains several topics for decision. The first is whether
> this approach is acceptable in general.

I don't like the invasiveness of this approach. It's pretty invasive already, and ISTM you'll need similar switch-case handling of all data types that have changed their internal representation as well.

We've talked about this before, so you'll remember that I favor the approach of converting the page format, a page at a time, when the pages are read in. I grant you that there are non-trivial issues with that as well, like what to do if the converted data takes more space and doesn't fit in the page anymore.

I wonder if we could go with some sort of a hybrid approach? Convert the whole page when it's read in, but if it doesn't fit, fall back to tricks like loosening the alignment requirements on platforms that can handle non-aligned data, or supporting a special truncated page header without the pd_tli and pd_prune_xid fields. Just a thought - not sure how feasible those particular tricks are, but something along those lines.

All in all, though, I find it a bit hard to see the big picture. For upgrade-in-place, what are all the pieces that we need? To keep this concrete, let's focus on PG 8.2 -> PG 8.3 (or are you focusing on PG 8.3 -> 8.4? That's fine with me as well, but let's pick one) and forget about hypothetical changes that might occur in a future version. I can see:

1. Handling page layout changes (pd_prune_xid, pd_flags)
2. Handling tuple header changes (infomask2, HOT bits, combocid)
3. Handling changes in data type representation (packed varlens)
4. Toast chunk size
5. Catalogs

After putting all those together, how large a patch are we talking about, and what's the performance penalty then? How much of all that needs to be in core, and how much can live in a pgfoundry project or an extra binary in src/bin or contrib? I realize that none of us have a crystal ball, and one has to start somewhere, but I feel uneasy committing to an approach until we have a full plan.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
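A rough sketch of the convert-on-read alternative favored above, as it might sit in the buffer manager's read path. ConvertPageLayout_03_to_04 is a hypothetical helper (the "doesn't fit anymore" problem would surface as its failure return); Page, PageGetPageLayoutVersion, bool and elog are assumed from the surrounding backend code:

static bool ConvertPageLayout_03_to_04(Page page);     /* hypothetical */

static void
ConvertPageOnRead(Page page)
{
    switch (PageGetPageLayoutVersion(page))
    {
        case 4:
            break;              /* already current, nothing to do */
        case 3:
            /* rewrite header, line pointers and tuples in place */
            if (!ConvertPageLayout_03_to_04(page))
                elog(ERROR, "converted page contents no longer fit");
            break;
        default:
            elog(ERROR, "unrecognized page layout version");
    }
}

The appeal is that the rest of the backend only ever sees current-format pages, so none of the per-access version dispatch is needed; the cost is handling pages whose converted contents grow past BLCKSZ.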
On Fri, 5 Sep 2008, Heikki Linnakangas wrote:

> All in all, though, I find it a bit hard to see the big picture.

I've been working on trying to see that myself lately, and have been dumping links to all the interesting material at http://wiki.postgresql.org/wiki/In-place_upgrade in case there's any of it you haven't seen before.

> To keep this concrete, let's focus on PG 8.2 -> PG 8.3 (or are you
> focusing on PG 8.3 -> 8.4? That's fine with me as well, but let's pick
> one)

From a complexity perspective, the changes needed to go from 8.2->8.3 seem much larger than what's needed for 8.3->8.4. There's also a huge PR win if 8.4 goes out the door saying that in-place upgrades are available from the previous version starting at the 8.4 release.

Given the limited time left, I would think that focusing on nailing down the 8.3->8.4 conversion first, and then slipping in support for earlier revs later, would be one way to get this into more manageable chunks. Obviously, if you can fit in infrastructure that makes the 8.2 conversion easier, that's worth doing, but I'd hate to see this get bogged down worrying too much about things that haven't actually changed since 8.3.

The specific areas I am getting up to speed to help out with here are catalog updates and working on integration/testing.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
Heikki Linnakangas wrote:
> > 4) This patch contains several topics for decision. The first is
> > whether this approach is acceptable in general.
>
> I don't like the invasiveness of this approach. It's pretty invasive
> already, and ISTM you'll need similar switch-case handling of all data
> types that have changed their internal representation as well.
>
> We've talked about this before, so you'll remember that I favor the
> approach of converting the page format, a page at a time, when the pages
> are read in. I grant you that there are non-trivial issues with that as
> well, like what to do if the converted data takes more space and doesn't
> fit in the page anymore.

I 100% agree with Heikki here; having the conversion spill out into the main backend is very expensive and adds lots of complexity. The only argument for Zdenek's approach is that it allows conversion to happen at a more natural time than when the page is read in, but frankly I think the conversion needs are going to be pretty limited and are better handled in a localized way at page read-in time.

As for the page not fitting after conversion, what about some user command that converts an entire table to the new format if page expansion fails?

> I wonder if we could go with some sort of a hybrid approach? Convert the
> whole page when it's read in, but if it doesn't fit, fall back to tricks
> like loosening the alignment requirements on platforms that can handle
> non-aligned data, or supporting a special truncated page header without
> the pd_tli and pd_prune_xid fields. Just a thought - not sure how
> feasible those particular tricks are, but something along those lines.
>
> All in all, though, I find it a bit hard to see the big picture. For
> upgrade-in-place, what are all the pieces that we need? To keep this
> concrete, let's focus on PG 8.2 -> PG 8.3 (or are you focusing on PG 8.3
> -> 8.4? That's fine with me as well, but let's pick one) and forget
> about hypothetical changes that might occur in a future version. I can see:
>
> 1. Handling page layout changes (pd_prune_xid, pd_flags)
> 2. Handling tuple header changes (infomask2, HOT bits, combocid)
> 3. Handling changes in data type representation (packed varlens)
> 4. Toast chunk size
> 5. Catalogs
>
> After putting all those together, how large a patch are we talking
> about, and what's the performance penalty then? How much of all that
> needs to be in core, and how much can live in a pgfoundry project or an
> extra binary in src/bin or contrib? I realize that none of us have a
> crystal ball, and one has to start somewhere, but I feel uneasy
> committing to an approach until we have a full plan.

Yes, another very good point. I am ready to focus on these issues for 8.4; all this needs to be fleshed out, perhaps on a wiki. As a starting point, what would be really nice is to start a wiki page that lists all data format changes for every major release.

--
Bruce Momjian <bruce@momjian.us>   http://momjian.us
EnterpriseDB                       http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> As for the page not fitting after conversion, what about some user
> command that converts an entire table to the new format if page
> expansion fails?

VACUUM?

Having to run a manual command defeats the purpose somewhat, though. Especially if you have no way of knowing which tables it needs to be run on.

> I am ready to focus on these issues for 8.4; all this needs to be
> fleshed out, perhaps on a wiki. As a starting point, what would be
> really nice is to start a wiki page that lists all data format changes
> for every major release.

Have you looked at http://wiki.postgresql.org/wiki/In-place_upgrade already, which Greg Smith mentioned elsewhere in this thread? That's a good starting point.

In fact, I don't think there are any low-level data format changes yet between 8.3 and 8.4, so this would be a comparatively easy release for which to implement upgrade-in-place. There are just the catalog changes, but AFAICS nothing that would require scanning through relations.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
> Bruce Momjian wrote:
>> As for the page not fitting after conversion, what about some user
>> command that converts an entire table to the new format if page
>> expansion fails?
>
> VACUUM?
>
> Having to run a manual command defeats the purpose somewhat, though.
> Especially if you have no way of knowing which tables it needs to be
> run on.

My assumption is that the page not fitting would be a rare case, so requiring something like vacuum to fix it would be OK. What I don't want to do is add lots of complexity to the code just to handle the page expansion case, when such a case is rare and perhaps can be fixed by a vacuum.

>> I am ready to focus on these issues for 8.4; all this needs to be
>> fleshed out, perhaps on a wiki. As a starting point, what would be
>> really nice is to start a wiki page that lists all data format changes
>> for every major release.
>
> Have you looked at http://wiki.postgresql.org/wiki/In-place_upgrade
> already, which Greg Smith mentioned elsewhere in this thread? That's a
> good starting point.

Agreed.

> In fact, I don't think there are any low-level data format changes yet
> between 8.3 and 8.4, so this would be a comparatively easy release for
> which to implement upgrade-in-place. There are just the catalog changes,
> but AFAICS nothing that would require scanning through relations.

Yep.

--
Bruce Momjian <bruce@momjian.us>   http://momjian.us
EnterpriseDB                       http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> In fact, I don't think there are any low-level data format changes yet
> between 8.3 and 8.4, so this would be a comparatively easy release for
> which to implement upgrade-in-place. There are just the catalog changes,
> but AFAICS nothing that would require scanning through relations.

After a quick scan of the catversion.h changelog (which hopefully covers any such changes): we changed sequences incompatibly, we changed hash indexes incompatibly (even without the pending patch that would change their contents beyond recognition), and Teodor did some stuff to GIN indexes that might or might not represent an on-disk format change; you'd have to ask him. We also whacked around the sort order of bpchar_pattern_ops btree indexes.

I didn't see anything that looked like an immediate change in user table contents, unless they used the "name" type; but what of relation forks?

regards, tom lane
Tom Lane wrote:
> I didn't see anything that looked like an immediate change in user table
> contents, unless they used the "name" type; but what of relation forks?

Relation forks didn't change anything inside the relation files, so no scanning of relations is required because of that. Neither will the FSM rewrite. Not sure about DSM yet.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Relation forks didn't change anything inside the relation files, so no
> scanning of relations is required because of that. Neither will the FSM
> rewrite. Not sure about DSM yet.

And just to confirm -- they don't change the names of the files the postmaster expects to find in its data directory, right?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
Gregory Stark wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Relation forks didn't change anything inside the relation files, so no
>> scanning of relations is required because of that. Neither will the FSM
>> rewrite. Not sure about DSM yet.
>
> And just to confirm -- they don't change the names of the files the
> postmaster expects to find in its data directory, right?

Right. But it wouldn't be a big issue anyway: renaming would be quick regardless of the relation sizes. FSM and DSM will introduce new files, though, that probably need to be created as part of the upgrade, but again they're not very big.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> Heikki Linnakangas napsal(a):
>>> The patch seems to be missing the new htup.c file.
>>
>> I'm sorry. I attached a new version which is synchronized with current
>> head. I would like to add a few more comments as well.
>>
>> 1) The patch also contains changes which were discussed during the July
>> commit fest:
>> - the PageGetTempPage modification suggested by Tom
>> - another hash.h backward-compatibility cleanup
>
> It might be a good idea to split that into a separate patch. The sheer
> size of this patch is quite daunting, even though the bulk of it is
> straightforward search & replace.

Yes, I will do it.

>> 2) I added a tuplimits.h header file which contains tuple limits for
>> the different access methods. It is not finished yet, but the idea is
>> to keep all limits in one file and easily add limits for different page
>> layout versions - for example, replacing static computation with
>> dynamic computation based on the relation (maxtuplesize could be stored
>> in pg_class for each relation). I also need this header because I ran
>> into a cycle in the header dependencies.
>>
>> 3) I already sent Page API performance results in
>> http://archives.postgresql.org/pgsql-hackers/2008-08/msg00398.php
>>
>> I replaced the call sequence PageGetItemId, PageGetItem with the
>> PageGetIndexTuple and PageGetHeapTuple functions. That is the main
>> difference in this patch. PageGetHeapTuple fills in t_ver in the
>> HeapTuple to identify the correct tuple header version.
>>
>> It is worth mentioning that the page API (and tuple API) implementation
>> is only a prototype, without any performance optimization.
>
> You mentioned a 5% performance degradation in that thread. What test
> case was that? What would be a worst-case scenario, and how bad is it?

Paul van den Bogaart tested a long-running OLTP workload on it. He used the iGen test.

> 5% is a pretty hefty price, especially when it's paid not only by
> upgraded installations, but also by freshly initialized clusters. I
> think you'll need to pursue those performance optimizations.

5% is the worst scenario. The current version is not optimized; it is written for easy debugging and (D)tracing. The page header structures are very similar, and we can easily remove the switches for most of the attributes and replace the functions with macros or inline functions.

>> 4) This patch contains several topics for decision. The first is
>> whether this approach is acceptable in general.
>
> I don't like the invasiveness of this approach. It's pretty invasive
> already, and ISTM you'll need similar switch-case handling of all data
> types that have changed their internal representation as well.

I agree in general. But, for example, the new page API is not so invasive, and in my opinion it should be implemented (with or without multiversion support), because it cleans up the code. The HeapTuple processing is easy too, but unfortunately it requires a lot of modifications in many places. I was surprised how many pieces of code access HeapTupleHeader directly and do not use the HeapTuple data structure. I think we should reach a conclusion on the recommended usage of HeapTupleHeader versus HeapTuple. Most of the changes in the code amount to replacing HeapTupleHeaderGetXmax(tuple->t_data) with HeapTupleGetXmax(tuple) and so on (see the sketch after this mail). I think that should be cleaned up anyway.

You mentioned data types, but that is not a problem. You can easily extend the data type's catalog entry with version information and call the correct in/out functions, or use a different Oid for the new data type version. There are several easy solutions possible for data types, and for conversion you can use the ALTER TABLE command. The main idea is to keep data in all formats in a relation. This approach could also be used for the integer/float datetime problem.

> We've talked about this before, so you'll remember that I favor the
> approach of converting the page format, a page at a time, when the pages
> are read in. I grant you that there are non-trivial issues with that as
> well, like what to do if the converted data takes more space and doesn't
> fit in the page anymore.

I like conversion on read too, because it is easy, but there are more problems. The page that no longer fits is one of them. Other problems are with indexes. For example, a hash index stores a bitmap in a page, and this is not marked anywhere; only the hash AM knows which pages contain this kind of data. It is probably impossible to convert such a page during a read. :(

> I wonder if we could go with some sort of a hybrid approach? Convert the
> whole page when it's read in, but if it doesn't fit, fall back to tricks
> like loosening the alignment requirements on platforms that can handle
> non-aligned data, or supporting a special truncated page header without
> the pd_tli and pd_prune_xid fields. Just a thought - not sure how
> feasible those particular tricks are, but something along those lines.

OK, I have a backup idea :-). Stay tuned :-)

> All in all, though, I find it a bit hard to see the big picture. For
> upgrade-in-place, what are all the pieces that we need? To keep this
> concrete, let's focus on PG 8.2 -> PG 8.3 (or are you focusing on PG 8.3
> -> 8.4? That's fine with me as well, but let's pick one) and forget
> about hypothetical changes that might occur in a future version. I can see:
>
> 1. Handling page layout changes (pd_prune_xid, pd_flags)
> 2. Handling tuple header changes (infomask2, HOT bits, combocid)

2.5 + composite data types

> 3. Handling changes in data type representation (packed varlens)

3.5 Data types generally (cidr/inet)

> 4. Toast chunk size

4.5 A general MaxTupleSize for each different AM

> 5. Catalogs

6. AM methods

> After putting all those together, how large a patch are we talking
> about, and what's the performance penalty then? How much of all that
> needs to be in core, and how much can live in a pgfoundry project or an
> extra binary in src/bin or contrib? I realize that none of us have a
> crystal ball, and one has to start somewhere, but I feel uneasy
> committing to an approach until we have a full plan.

Unfortunately, I'm still in the analysis phase. The presented patch is a prototype of one possible approach. I have hit a lot of problems, and I don't have answers to all of them yet. I'm going to update the wiki page to share all this information. At this moment, I think that I can implement an offline heap conversion (8.2->8.4), with all indexes being reindexed. That is what we can have for 8.4. Online conversion has a lot of problems which we are not able to answer at this moment.

Zdenek

--
Zdenek Kotala            Sun Microsystems
Prague, Czech Republic   http://sun.com/postgresql
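The accessor change described above, sketched out in C. The _03/_04 macro names are hypothetical stand-ins for per-version HeapTupleHeader decoders; TransactionId, HeapTuple, elog and InvalidTransactionId are the stock backend definitions:

static TransactionId
HeapTupleGetXmax(HeapTuple tuple)
{
    switch (tuple->t_ver)
    {
        case 3:
            return HeapTupleHeaderGetXmax_03(tuple->t_data);
        case 4:
            return HeapTupleHeaderGetXmax_04(tuple->t_data);
        default:
            elog(ERROR, "unknown tuple header version %u", tuple->t_ver);
            return InvalidTransactionId;    /* keep compiler quiet */
    }
}

so the mechanical change at each call site is just:

    /* before */  xmax = HeapTupleHeaderGetXmax(tuple->t_data);
    /* after  */  xmax = HeapTupleGetXmax(tuple);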
Bruce Momjian napsal(a):
> As for the page not fitting after conversion, what about some user
> command that converts an entire table to the new format if page
> expansion fails?

Keep in mind that there are more kinds of pages. Heap is easy, but each index AM has its own specifics :(. A better approach is to move the tuple to a new page and invalidate all related table indexes; a subsequent reindex then automatically converts the whole table.

>> After putting all those together, how large a patch are we talking
>> about, and what's the performance penalty then? How much of all that
>> needs to be in core, and how much can live in a pgfoundry project or an
>> extra binary in src/bin or contrib? I realize that none of us have a
>> crystal ball, and one has to start somewhere, but I feel uneasy
>> committing to an approach until we have a full plan.
>
> Yes, another very good point.
>
> I am ready to focus on these issues for 8.4; all this needs to be
> fleshed out, perhaps on a wiki. As a starting point, what would be
> really nice is to start a wiki page that lists all data format changes
> for every major release.

As Greg mentioned in his mail, the wiki page is already there. Unfortunately, I have not had time to put up-to-date information on it. I'm going to do so soon.

Zdenek

--
Zdenek Kotala            Sun Microsystems
Prague, Czech Republic   http://sun.com/postgresql
Bruce Momjian napsal(a):
> Heikki Linnakangas wrote:
>> Bruce Momjian wrote:
>>> As for the page not fitting after conversion, what about some user
>>> command that converts an entire table to the new format if page
>>> expansion fails?
>>
>> VACUUM?
>>
>> Having to run a manual command defeats the purpose somewhat, though.
>> Especially if you have no way of knowing which tables it needs to be
>> run on.
>
> My assumption is that the page not fitting would be a rare case, so
> requiring something like vacuum to fix it would be OK.

It is 1-2% of records per heap. I assume it is more for btree.

> What I don't want to do is add lots of complexity to the code just to
> handle the page expansion case, when such a case is rare and perhaps can
> be fixed by a vacuum.

Unfortunately, it is not so rare. Only heap pages on the 32-bit x86 platform (4-byte max alignment) are no problem; all index pages are affected.

>> In fact, I don't think there are any low-level data format changes yet
>> between 8.3 and 8.4, so this would be a comparatively easy release for
>> which to implement upgrade-in-place. There are just the catalog changes,
>> but AFAICS nothing that would require scanning through relations.
>
> Yep.

I have not tested it recently, but the pg_upgrade.sh script worked fine in May, without any modification, for the 8.3->8.4 conversion.

Zdenek

--
Zdenek Kotala            Sun Microsystems
Prague, Czech Republic   http://sun.com/postgresql
Tom Lane napsal(a):
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> In fact, I don't think there are any low-level data format changes yet
>> between 8.3 and 8.4, so this would be a comparatively easy release for
>> which to implement upgrade-in-place. There are just the catalog changes,
>> but AFAICS nothing that would require scanning through relations.
>
> After a quick scan of the catversion.h changelog (which hopefully covers
> any such changes): we changed sequences incompatibly, we changed hash
> indexes incompatibly (even without the pending patch that would change
> their contents beyond recognition), and Teodor did some stuff to GIN
> indexes that might or might not represent an on-disk format change;
> you'd have to ask him. We also whacked around the sort order of
> bpchar_pattern_ops btree indexes.

Hmm, it seems that reindex is the only good answer to all these changes. Sequences should be converted during the catalog conversion.

Another idea is to create backward-compatible AMs and put them into a separate library. If these AMs also work with the old page structure, then there should be no reason for reindexing or index page conversion after the upgrade.

Zdenek

--
Zdenek Kotala            Sun Microsystems
Prague, Czech Republic   http://sun.com/postgresql
Heikki Linnakangas napsal(a):
> Tom Lane wrote:
>> I didn't see anything that looked like an immediate change in user table
>> contents, unless they used the "name" type; but what of relation forks?
>
> Relation forks didn't change anything inside the relation files, so no
> scanning of relations is required because of that. Neither will the FSM
> rewrite. Not sure about DSM yet.

Does that mean that if you "inject" an old data file after the catalog upgrade, the FSM will work without any problem?

Zdenek

PS: I plan to review FSM this week.

--
Zdenek Kotala            Sun Microsystems
Prague, Czech Republic   http://sun.com/postgresql
Zdenek Kotala wrote:
> Heikki Linnakangas napsal(a):
>> Relation forks didn't change anything inside the relation files, so no
>> scanning of relations is required because of that. Neither will the
>> FSM rewrite. Not sure about DSM yet.
>
> Does that mean that if you "inject" an old data file after the catalog
> upgrade, the FSM will work without any problem?

Yes. You'll need to construct an FSM, but it doesn't necessarily need to reflect reality. You could just fill it with zeros, meaning that there's no free space anywhere, and let the next vacuum fill it with real information. Or you could read the old pg_fsm.cache file and fill the new FSM accordingly.

> PS: I plan to review FSM this week.

Thanks!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
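A sketch of the zero-filled FSM construction described above, as a standalone helper an upgrade tool might use. This is not code from any patch; the helper name, path handling and page count are the caller's problem:

#include <stdio.h>

#define BLCKSZ 8192

/* write npages all-zero pages as the relation's new _fsm fork */
static int
write_zeroed_fsm(const char *fsm_path, int npages)
{
    static const char zeropage[BLCKSZ];     /* zero-initialized */
    FILE   *f = fopen(fsm_path, "wb");

    if (f == NULL)
        return -1;
    for (int i = 0; i < npages; i++)
    {
        if (fwrite(zeropage, 1, BLCKSZ, f) != BLCKSZ)
        {
            fclose(f);
            return -1;
        }
    }
    return fclose(f);
}

An all-zero FSM simply claims there is no free space anywhere, which is safe: inserts extend the relation until the next VACUUM records the real numbers.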
Zdenek Kotala wrote:
> Tom Lane napsal(a):
>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>> In fact, I don't think there are any low-level data format changes yet
>>> between 8.3 and 8.4, so this would be a comparatively easy release for
>>> which to implement upgrade-in-place. There are just the catalog
>>> changes, but AFAICS nothing that would require scanning through
>>> relations.
>>
>> After a quick scan of the catversion.h changelog (which hopefully covers
>> any such changes): we changed sequences incompatibly, we changed hash
>> indexes incompatibly (even without the pending patch that would change
>> their contents beyond recognition), and Teodor did some stuff to GIN
>> indexes that might or might not represent an on-disk format change;
>> you'd have to ask him. We also whacked around the sort order of
>> bpchar_pattern_ops btree indexes.
>
> Hmm, it seems that reindex is the only good answer to all these changes.

Isn't that exactly what we want to avoid with upgrade-in-place? As long as the conversion can be done page-at-a-time, without consulting other pages, we can do it when the page is read in.

I'm not sure what the GIN changes were, but I didn't see any changes to the page layout at a quick glance. The bpchar_pattern_ops change you mentioned must be this one:

> A not-immediately-obvious incompatibility is that the sort order within
> bpchar_pattern_ops indexes changes --- it had been identical to plain
> strcmp, but is now trailing-blank-insensitive. This will impact
> in-place upgrades, if those ever happen.

The way I read that, bpchar_pattern_ops just became less sensitive. Some values are now considered equal that weren't before, and thus can now be stored in any order. That's not an incompatible change, right?

> Sequences should be converted during the catalog conversion.

Agreed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> Another idea is to create backward-compatible AMs and put them into a
> separate library. If these AMs also work with the old page structure,
> then there should be no reason for reindexing or index page conversion
> after the upgrade.

I don't think that'd be real workable. It would require duplicating all the entries for that AM in pg_opfamily, pg_amop, etc. Which we could do for the built-in entries, I suppose, but what happens to user-defined operator classes?

At least for the index changes proposed so far for 8.4, it seems to me that the best solution is to mark affected indexes as not "indisvalid" and require a post-conversion REINDEX to fix 'em. Obviously a better solution would be nice later, but we have to avoid putting huge amounts of work into noncritical problems, else the whole feature is just not going to get finished.

regards, tom lane
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> The bpchar_pattern_ops change you mentioned must be this one:
>> A not-immediately-obvious incompatibility is that the sort order within
>> bpchar_pattern_ops indexes changes --- it had been identical to plain
>> strcmp, but is now trailing-blank-insensitive. This will impact
>> in-place upgrades, if those ever happen.

Yup.

> The way I read that, bpchar_pattern_ops just became less sensitive. Some
> values are now considered equal that weren't before, and thus can now be
> stored in any order. That's not an incompatible change, right?

No, consider 'abc^I' vs 'abc ' (^I denoting a tab character). These are unequal in either case, but the sort order has flipped.

regards, tom lane
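A tiny self-contained illustration of that flip; blank_insensitive_cmp below is just an illustrative stand-in for the new bpchar_pattern_ops behavior, not the actual backend comparator:

#include <stdio.h>
#include <string.h>

static int
blank_insensitive_cmp(const char *a, const char *b)
{
    size_t  la = strlen(a), lb = strlen(b);

    while (la > 0 && a[la - 1] == ' ') la--;    /* ignore trailing blanks */
    while (lb > 0 && b[lb - 1] == ' ') lb--;

    int cmp = memcmp(a, b, la < lb ? la : lb);
    if (cmp != 0)
        return cmp;
    return (la > lb) - (la < lb);               /* shorter string sorts first */
}

int
main(void)
{
    /* tab (0x09) < space (0x20), so plain strcmp puts "abc\t" first ... */
    printf("strcmp:   %d\n", strcmp("abc\t", "abc "));                    /* negative */

    /* ... but with trailing blanks stripped, "abc " becomes "abc", a
     * prefix of "abc\t", which now sorts first: the order has flipped. */
    printf("stripped: %d\n", blank_insensitive_cmp("abc\t", "abc "));     /* positive */
    return 0;
}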
Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> Tom Lane napsal(a):
>>> After a quick scan of the catversion.h changelog (which hopefully
>>> covers any such changes): we changed sequences incompatibly, we
>>> changed hash indexes incompatibly (even without the pending patch
>>> that would change their contents beyond recognition), and Teodor did
>>> some stuff to GIN indexes that might or might not represent an
>>> on-disk format change; you'd have to ask him. We also whacked around
>>> the sort order of bpchar_pattern_ops btree indexes.
>>
>> Hmm, it seems that reindex is the only good answer to all these changes.
>
> Isn't that exactly what we want to avoid with upgrade-in-place? As long
> as the conversion can be done page-at-a-time, without consulting other
> pages, we can do it when the page is read in.

Yes, but I meant what we can do for 8.4.

Zdenek
Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> Does that mean that if you "inject" an old data file after the catalog
>> upgrade, the FSM will work without any problem?
>
> Yes. You'll need to construct an FSM, but it doesn't necessarily need to
> reflect reality. You could just fill it with zeros, meaning that there's
> no free space anywhere, and let the next vacuum fill it with real
> information. Or you could read the old pg_fsm.cache file and fill the
> new FSM accordingly.

I think a zeroed FSM is good, because new items should not be added onto old pages.

Zdenek

--
Zdenek Kotala            Sun Microsystems
Prague, Czech Republic   http://sun.com/postgresql
Tom Lane napsal(a):
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> Another idea is to create backward-compatible AMs and put them into a
>> separate library. If these AMs also work with the old page structure,
>> then there should be no reason for reindexing or index page conversion
>> after the upgrade.
>
> I don't think that'd be real workable. It would require duplicating all
> the entries for that AM in pg_opfamily, pg_amop, etc. Which we could do
> for the built-in entries, I suppose, but what happens to user-defined
> operator classes?

When the catalog upgrade is performed directly, user-defined operator classes should stay in the catalog. But the question is what happens with the regproc records, and whether all the functions will be compatible with the new server... This suggests that we need a stable API for operator and data type implementations. Any data type which uses only this API could then be used on newer PostgreSQL versions without recompilation.

> At least for the index changes proposed so far for 8.4, it seems to me
> that the best solution is to mark affected indexes as not "indisvalid"
> and require a post-conversion REINDEX to fix 'em. Obviously a better
> solution would be nice later, but we have to avoid putting huge amounts
> of work into noncritical problems, else the whole feature is just not
> going to get finished.

Agreed.

Zdenek

--
Zdenek Kotala            Sun Microsystems
Prague, Czech Republic   http://sun.com/postgresql
Zdenek Kotala wrote:
> You mentioned data types, but that is not a problem. You can easily
> extend the data type's catalog entry with version information and call
> the correct in/out functions, or use a different Oid for the new data
> type version. There are several easy solutions possible for data types,
> and for conversion you can use the ALTER TABLE command. The main idea is
> to keep data in all formats in a relation. This approach could also be
> used for the integer/float datetime problem.

This kind of code structure scares me: I worry that our system will become so complex that it will hinder our ability to continue making improvements.

--
Bruce Momjian <bruce@momjian.us>   http://momjian.us
EnterpriseDB                       http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +