Thread: [PATCH] Space reservation v02
I attached the second version of the space reservation patch. You can see
the first version here:
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00886.php

I thought about Heikki's comments and I removed all the catalog changes,
because they do not need to be in pg_class. Instead, the pg_upgrade script
should create its own schema (pg_upgrade) and whatever tables it needs.

This patch implements the settings as relation options. Tom had an
objection to this approach. I can rewrite it and extend pg_class instead.
However, before I do that I would like to hear opinions on the rest of the
patch.

And the last point is the most important. Space reservation MUST be
implemented if we want to have an 8.4->8.5 upgrade. Otherwise we will be
back at the beginning...

		Zdenek
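(Since the patch stores the reservation as relation options, the
pre-upgrade script would presumably issue something like the following.
This is only a hedged sketch -- the option names here are invented from
the description above, not taken from the patch itself:)

    -- reserve 32 bytes per page and 8 bytes per tuple ahead of the upgrade
    ALTER TABLE some_table SET (upgrade_page_reservation = 32,
                                upgrade_tuple_reservation = 8);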
Zdenek Kotala wrote:
> I attached the second version of the space reservation patch. You can
> see the first version here:
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00886.php
>
> I thought about Heikki's comments and I removed all the catalog changes,
> because they do not need to be in pg_class. Instead, the pg_upgrade
> script should create its own schema (pg_upgrade) and whatever tables it
> needs.
>
> This patch implements the settings as relation options. Tom had an
> objection to this approach. I can rewrite it and extend pg_class
> instead. However, before I do that I would like to hear opinions on the
> rest of the patch.
>
> And the last point is the most important. Space reservation MUST be
> implemented if we want to have an 8.4->8.5 upgrade. Otherwise we will be
> back at the beginning...

The patch has two space reservations, one per page and another per tuple.
Now, thinking back, what types of changes have we made that increase
storage size? The first one I can think of is where we made a data type
require larger storage (I think inet/cidr). That case could not be handled
by this patch, because if a row had _two_ values of that type, there would
be no way to express it using the two supplied parameters.

I think we should try backpatching the space reservation routines for a
few releases to see whether we really need these fixed parameters.

One thing I think would help would be a pg_class column that says whether
the table is ready for upgrading. This is something we can't easily
backpatch, and it would be helpful so people could do their upgrade
preparation in a staged manner rather than having to do it all at once,
and it would give the upgrade scripts confidence that the backpatch had
done everything needed. The backpatched code would set this pg_class
column value when it was done making sure the table is ready for upgrade
(probably via vacuum). I recommend an int2 column to store
PG_VERSION_NUM / 100.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
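(A sketch of how that readiness check might look; the column name
relpreparedfor is invented here for illustration. An 8.4 server has
PG_VERSION_NUM = 80400, so the column would store 804, and an upgrade tool
could look for anything not yet prepared:)

    SELECT relname
      FROM pg_class
     WHERE relkind IN ('r', 'i', 't')            -- tables, indexes, TOAST
       AND coalesce(relpreparedfor, 0) < 804;    -- not yet vacuumed/prepared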
On Thu, 29 Jan 2009, Bruce Momjian wrote:

> One thing I think would help would be a pg_class column that says
> whether the table is ready for upgrading.

Some quick reminders here: Tom included that idea from one of your earlier
discussions, but also suggested a similar column in pg_database:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00436.php

The main objection I saw was Heikki suggesting that this isn't necessarily
forward progress if it's not actually sufficient to save all the state
information needed:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00449.php

This particular idea keeps coming up though, and I would imagine the worst
case here is that it gets included in 8.4 but isn't enough; it's gotta at
least help open options for future in-place upgrade work. (The other
complaints revolved around multi-version upgrades, which are a pipe dream
now anyway.)

Zdenek suggested that "latest processed block" would also be useful here:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00540.php

P.S. If anybody would like a 275-message "best of" archive covering just
the last four months of the pg_upgrade saga, let me know off-list; that's
how I found all those so fast. As that subset alone consists of 22
threads, it's kind of messy to sort through most other ways.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Bruce Momjian wrote:
> The patch has two space reservations, one per page and another per
> tuple. Now, thinking back, what types of changes have we made that
> increase storage size? The first one I can think of is where we made a
> data type require larger storage (I think inet/cidr). That case could
> not be handled by this patch, because if a row had _two_ values of that
> type, there would be no way to express it using the two supplied
> parameters.

Well, I believe the idea was that the pre-upgrade script that sets the
space reservation would look at the catalogs to decide the right
reservation for each table. There are still nasty cases like arrays of
inets, however.

> I think we should try backpatching the space reservation routines for a
> few releases to see whether we really need these fixed parameters.

Agreed.

> One thing I think would help would be a pg_class column that says
> whether the table is ready for upgrading. This is something we can't
> easily backpatch, and it would be helpful so people could do their
> upgrade preparation in a staged manner rather than having to do it all
> at once, and it would give the upgrade scripts confidence that the
> backpatch had done everything needed. The backpatched code would set
> this pg_class column value when it was done making sure the table is
> ready for upgrade (probably via vacuum). I recommend an int2 column to
> store PG_VERSION_NUM / 100.

I think being able to stop and restart the pre-upgrade process is a luxury
we can add later. Also note that the pre-upgrade tool can use a flat file
in the data directory to store state in a more free-form fashion. To
implement restartability, for example, you could dump a list of
relfilenodes not yet scanned to the file at start, and strike them out as
you go.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
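(To make the per-table idea concrete, here is a rough sketch of the kind
of catalog query such a pre-upgrade script might run. The 4-byte growth
figure and the use of inet as the widening type are assumptions for
illustration only:)

    -- per table, count the columns of a type whose on-disk size grows,
    -- and derive a per-tuple reservation from it
    SELECT c.oid::regclass AS tablename,
           count(*) * 4 AS reserve_bytes_per_tuple
      FROM pg_class c
      JOIN pg_attribute a ON a.attrelid = c.oid
      JOIN pg_type t ON t.oid = a.atttypid
     WHERE c.relkind = 'r'
       AND a.attnum > 0
       AND NOT a.attisdropped
       AND t.typname = 'inet'
     GROUP BY c.oid;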
Heikki Linnakangas wrote:
> Bruce Momjian wrote:
> > The patch has two space reservations, one per page and another per
> > tuple. Now, thinking back, what types of changes have we made that
> > increase storage size? The first one I can think of is where we made a
> > data type require larger storage (I think inet/cidr). That case could
> > not be handled by this patch, because if a row had _two_ values of
> > that type, there would be no way to express it using the two supplied
> > parameters.
>
> Well, I believe the idea was that the pre-upgrade script that sets the
> space reservation would look at the catalogs to decide the right
> reservation for each table.

Interesting --- so you set the reservation per table --- that seems much
better than a GUC, certainly. I assume we would still need a per-page GUC
that affects all tables? Or one for heap pages and one for index pages?

> > One thing I think would help would be a pg_class column that says
> > whether the table is ready for upgrading. This is something we can't
> > easily backpatch, and it would be helpful so people could do their
> > upgrade preparation in a staged manner rather than having to do it all
> > at once, and it would give the upgrade scripts confidence that the
> > backpatch had done everything needed. The backpatched code would set
> > this pg_class column value when it was done making sure the table is
> > ready for upgrade (probably via vacuum). I recommend an int2 column to
> > store PG_VERSION_NUM / 100.
>
> I think being able to stop and restart the pre-upgrade process is a
> luxury we can add later. Also note that the pre-upgrade tool can use a
> flat file in the data directory to store state in a more free-form
> fashion. To implement restartability, for example, you could dump a list
> of relfilenodes not yet scanned to the file at start, and strike them
> out as you go.

Well, I was thinking the new pg_class column would allow the upgrade to
verify that the pre-upgrade script was run properly, but a flat file works
just as well if we assume we are going to pre-upgrade in one pass.

However, I am afraid that requiring this pre-upgrade to run while the
server is basically in single-user mode will make upgrade-in-place a long
process for many users, and if it takes a significant amount of time
compared to dump/reload, they might as well dump/reload. But again, all
this is trying to handle cases where the data size increases, which is a
rare event for us.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote on Fri, 30 Jan 2009 at 10:41 -0500:
> Heikki Linnakangas wrote:
> > Well, I believe the idea was that the pre-upgrade script that sets the
> > space reservation would look at the catalogs to decide the right
> > reservation for each table.
>
> Interesting --- so you set the reservation per table --- that seems much
> better than a GUC, certainly. I assume we would still need a per-page
> GUC that affects all tables? Or one for heap pages and one for index
> pages?

Each access method has different requirements, and it depends heavily on
the specific relation. TOAST tables also have different requirements. A
GUC variable is not a good option.

> > I think being able to stop and restart the pre-upgrade process is a
> > luxury we can add later. Also note that the pre-upgrade tool can use a
> > flat file in the data directory to store state in a more free-form
> > fashion. To implement restartability, for example, you could dump a
> > list of relfilenodes not yet scanned to the file at start, and strike
> > them out as you go.
>
> Well, I was thinking the new pg_class column would allow the upgrade to
> verify that the pre-upgrade script was run properly, but a flat file
> works just as well if we assume we are going to pre-upgrade in one pass.

A flat file or a special table for pg_upgrade will work fine.

> However, I am afraid that requiring this pre-upgrade to run while the
> server is basically in single-user mode will make upgrade-in-place a
> long process for many users, and if it takes a significant amount of
> time compared to dump/reload, they might as well dump/reload.

The pre-upgrade script should run during normal operation. There will be
some limitations; for example, CREATE/ALTER TABLE can cause problems.

		Zdenek
Zdenek Kotala wrote:
> Bruce Momjian wrote on Fri, 30 Jan 2009 at 10:41 -0500:
> > Well, I was thinking the new pg_class column would allow the upgrade
> > to verify that the pre-upgrade script was run properly, but a flat
> > file works just as well if we assume we are going to pre-upgrade in
> > one pass.
>
> A flat file or a special table for pg_upgrade will work fine.

Right, there's no difference in what you can achieve whether you store the
additional info in a flat file, a special table, or extra pg_class
columns. If you can store something in pg_class, you can store it
elsewhere just as well.

> > However, I am afraid that requiring this pre-upgrade to run while the
> > server is basically in single-user mode will make upgrade-in-place a
> > long process for many users, and if it takes a significant amount of
> > time compared to dump/reload, they might as well dump/reload.
>
> The pre-upgrade script should run during normal operation. There will be
> some limitations.

Right. That's the whole point of having a pre-upgrade script; otherwise
you might as well run the conversion in the new version.

> For example, CREATE/ALTER TABLE can cause problems.

Yeah, if the pre-upgrade script determines the amount of reserved space
for each table and sets it in pg_class or reloptions or whatever, it's not
clear what to do with tables created after the script is run. I guess we
need a quick scan of pg_class before the actual upgrade to check that
there are no newly-created tables, and refuse the upgrade if there are.

However, if we have the logic to determine how much space to reserve for a
table in the backend, as a back-ported patch, then we can invoke it for
new tables just as well.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Zdenek Kotala wrote:
> > A flat file or a special table for pg_upgrade will work fine.
>
> Right, there's no difference in what you can achieve whether you store
> the additional info in a flat file, a special table, or extra pg_class
> columns. If you can store something in pg_class, you can store it
> elsewhere just as well.

Well, having a column in pg_class does have some advantages. For example,
you could look at the value from an SQL session more easily. And if there
are operations which we know are unsafe -- such as adding columns -- we
could clear it from the server side easily.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning
Heikki Linnakangas wrote:
> > For example, CREATE/ALTER TABLE can cause problems.
>
> Yeah, if the pre-upgrade script determines the amount of reserved space
> for each table and sets it in pg_class or reloptions or whatever, it's
> not clear what to do with tables created after the script is run. I
> guess we need a quick scan of pg_class before the actual upgrade to
> check that there are no newly-created tables, and refuse the upgrade if
> there are.

This is where a pg_class column would be useful. You default the column
value to -1; the pre-upgrade script sets the proper reserved space, new
tables get a -1, and you check for those just before the upgrade.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
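(A sketch of how that -1 convention would work in practice, with an
invented column name, reltuplereserve:)

    -- the pre-upgrade script sets the real reservation per table;
    -- new tables keep the default of -1, so just before the upgrade
    -- any remaining -1 flags a table created after the script ran:
    SELECT relname
      FROM pg_class
     WHERE relkind = 'r'
       AND reltuplereserve = -1;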
Bruce Momjian wrote:
> Heikki Linnakangas wrote:
> > Yeah, if the pre-upgrade script determines the amount of reserved
> > space for each table and sets it in pg_class or reloptions or
> > whatever, it's not clear what to do with tables created after the
> > script is run. I guess we need a quick scan of pg_class before the
> > actual upgrade to check that there are no newly-created tables, and
> > refuse the upgrade if there are.
>
> This is where a pg_class column would be useful. You default the column
> value to -1; the pre-upgrade script sets the proper reserved space, new
> tables get a -1, and you check for those just before the upgrade.

You can do that with a flat file too: if there are any tables in the
database that were not present when the pre-upgrade script was started,
throw an error.

It might be a bit simpler with a pg_class column, but if we don't know
exactly what we need to store there, and might need to resort to different
storage anyway, it doesn't seem worth it.

An extra table, as Zdenek suggested in passing, might give the best of
both worlds. The pre-upgrade script can create it when it's run, so we
don't need to decide beforehand what columns we need, and it's a table, so
you can query it, etc.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
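(A sketch of what such a bookkeeping table might look like; the schema
name follows Zdenek's suggestion upthread, but the columns are
illustrative guesses, including a "latest processed block" field for
restartability:)

    CREATE SCHEMA pg_upgrade;

    CREATE TABLE pg_upgrade.relation_state (
        reloid         oid PRIMARY KEY,      -- relation being prepared
        reserve_bytes  integer NOT NULL,     -- per-tuple reservation chosen
        last_block     bigint,               -- latest processed block
        done           boolean NOT NULL DEFAULT false
    );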
Heikki Linnakangas wrote:
> You can do that with a flat file too: if there are any tables in the
> database that were not present when the pre-upgrade script was started,
> throw an error.
>
> It might be a bit simpler with a pg_class column, but if we don't know
> exactly what we need to store there, and might need to resort to
> different storage anyway, it doesn't seem worth it.
>
> An extra table, as Zdenek suggested in passing, might give the best of
> both worlds. The pre-upgrade script can create it when it's run, so we
> don't need to decide beforehand what columns we need, and it's a table,
> so you can query it, etc.

Yep, makes sense. I had forgotten that idea; sorry.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Gregory Stark <stark@enterprisedb.com> writes:
> Well, having a column in pg_class does have some advantages. For
> example, you could look at the value from an SQL session more easily.
> And if there are operations which we know are unsafe -- such as adding
> columns -- we could clear it from the server side easily.

Why would there be any unsafe operations? Surely the patch would add
sufficient logic to prevent the old version from de-fixing any page that
had already been fixed. If this is not so, the entire concept is broken,
because you're still going to have to go to single-user mode for a long
time to make sure that the whole database is in good shape.

On the whole I agree with Heikki's earlier criticism: this is all about
guessing the future, and the odds seem high that the actual requirements
will not be what you designed for anyway.

			regards, tom lane
On Tue, 2009-01-27 at 13:06 +0100, Zdenek Kotala wrote:
> Space reservation MUST be implemented if we want to have an 8.4->8.5
> upgrade.

Why not just add a few dummy columns onto each catalog table? If we need
space to expand a row, we can just drop one of the dummy columns from the
new catalog definition.

That's an old DBA trick to avoid having to rewrite a table when you want
to add a column.

It seems much simpler to add columns than to write special code,
especially when we might find the new code has bugs and hasn't done what
we thought it would.

--
  Simon Riggs           www.2ndQuadrant.com
  PostgreSQL Training, Services and Support
On Sat, Jan 31, 2009 at 4:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Why not just add a few dummy columns onto each catalog table? If we need
> space to expand a row, we can just drop one of the dummy columns from
> the new catalog definition.
>
> That's an old DBA trick to avoid having to rewrite a table when you want
> to add a column.
>
> It seems much simpler to add columns than to write special code,
> especially when we might find the new code has bugs and hasn't done what
> we thought it would.

Wow, that is really sneaky. I like it!

Actually, this even handles cases that the patch won't. For example,
suppose there's an ARRAY[] of 2-byte objects and in PG 8.5 we make all of
them 4 bytes. A fixed amount of space reservation per tuple is useless,
but your idea works fine: just add a new column of the same type and set
them equal. As long as you can roughly predict how much extra crap to
stuff in there, you're good. (A sketch of this follows below.)

Can some variant of this be made to work if the index tuples expand? What
if we're expanding the page header?

Even if it can, space reservation (whether through adding dummy columns or
an explicit facility) seems like an effort to avoid the hard problem of
moving stuff around between pages during the upgrade process itself. I'm
not sure that any such effort can be 100% successful, but then again, if
we don't need any special code changes to support it, maybe that doesn't
matter right now.

...Robert
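(A sketch of the same-type shadow column described above, with made-up
names; int2[] stands in for the hypothetical type whose elements widen
from 2 to 4 bytes. Note the UPDATE rewrites every row, so a real tool
would want to vacuum afterwards:)

    -- carry a second copy of the column so each tuple already holds
    -- roughly the post-upgrade payload
    ALTER TABLE t ADD COLUMN vals_shadow int2[];
    UPDATE t SET vals_shadow = vals;

    -- the new catalog definition simply omits (drops) the shadow column;
    -- DROP COLUMN does not rewrite the table
    ALTER TABLE t DROP COLUMN vals_shadow;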
Simon Riggs wrote:
> Why not just add a few dummy columns onto each catalog table? If we need
> space to expand a row, we can just drop one of the dummy columns from
> the new catalog definition.

The problem is: how much space do we need in each catalog table? Suppose
that after two releases we need x + 1 bytes per tuple but we only foresaw
x bytes; then we have to deprecate support for that version and advise the
DBA to run pg_upgrade twice. I prefer a solution where we know beforehand
the size per tuple, so that when we're rewriting we can use that
information to upgrade.

--
  Euler Taveira de Oliveira
  http://www.timbira.com/
On Sat, Jan 31, 2009 at 01:28:58PM -0200, Euler Taveira de Oliveira wrote:
> The problem is: how much space do we need in each catalog table? Suppose
> that after two releases we need x + 1 bytes per tuple but we only
> foresaw x bytes; then we have to deprecate support for that version and
> advise the DBA to run pg_upgrade twice. I prefer a solution where we
> know beforehand the size per tuple, so that when we're rewriting we can
> use that information to upgrade.

Just add one column of a variable-length type, let pg_upgrade fill it with
the required amount of padding prior to the upgrade, and you're done.

Of course, the simplest way to me of handling type changes seems to be to
keep the old type OID reserved and give the new version of the type a new
OID. Then the entire problem vanishes. But it was decided a long time ago
not to do that.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
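(A minimal sketch of the varlena-padding idea with invented names. The
UPDATE itself rewrites every row, which a real pre-upgrade tool would have
to account for:)

    ALTER TABLE t ADD COLUMN upgrade_pad bytea;
    -- reserve roughly 16 bytes of slack per tuple for the new format
    UPDATE t SET upgrade_pad = decode(repeat('00', 16), 'hex');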
> Of course, the simplest way to me of handling type changes seems to be
> to keep the old type OID reserved and give the new version of the type
> a new OID. Then the entire problem vanishes. But it was decided a long
> time ago not to do that.

Why was that decision made? Suppose you have a type called widget and you
decide it sucks and you want to reimplement it. In release N+1, you rename
the old type to old_shitty_widget and leave it with the same OID, add the
new type under the name widget with a different OID, and document that
old_shitty_widget should not be used. Then in release N+2 you remove
old_shitty_widget altogether.

People who upgrade via pg_dump will automatically get the new and improved
widget type, because that is what is now called widget. People who
in-place upgrade will end up with the old_shitty_widget type; then you
just run some dead-simple post-update script that goes through and issues
ALTER TABLE commands to change each old_shitty_widget column to a widget
column.

...Robert
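(A sketch of what that post-update script could generate, using the
placeholder type names above; the output of this query would then be
executed as a second pass:)

    -- emit one ALTER TABLE per column still using the deprecated type
    SELECT 'ALTER TABLE ' || a.attrelid::regclass::text
           || ' ALTER COLUMN ' || quote_ident(a.attname)
           || ' TYPE widget USING ' || quote_ident(a.attname) || '::widget;'
      FROM pg_attribute a
      JOIN pg_type t ON t.oid = a.atttypid
     WHERE t.typname = 'old_shitty_widget'
       AND a.attnum > 0
       AND NOT a.attisdropped;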
Robert Haas wrote:
> People who upgrade via pg_dump will automatically get the new and
> improved widget type, because that is what is now called widget. People
> who in-place upgrade will end up with the old_shitty_widget type; then
> you just run some dead-simple post-update script that goes through and
> issues ALTER TABLE commands to change each old_shitty_widget column to
> a widget column.

Altering a column's type with ALTER TABLE needs to take an exclusive lock
and rewrite the whole table, so you might as well pg_dump+restore that
table.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Robert Haas wrote:
> Why was that decision made? Suppose you have a type called widget and
> you decide it sucks and you want to reimplement it. In release N+1, you
> rename the old type to old_shitty_widget and leave it with the same OID,
> add the new type under the name widget with a different OID, and
> document that old_shitty_widget should not be used. Then in release N+2
> you remove old_shitty_widget altogether.

Yeah, that works. The other approach is to convert the data types along
with the new page format. That works too, and avoids having to keep the
old type around and all that deprecation and stuff.

I don't remember any hard decision on that, and we're not facing any data
type changes in this release, IIRC. It is something we should consider on
a case-by-case basis when we get there. There might be reasons to do it
like that, for example if the old format can't be converted to the new
format in a non-lossy fashion (e.g. float timestamps -> integer
timestamps). And authors of third-party data types are naturally free to
do what they want.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
On Jan 31, 2009, at 2:44 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Altering a column's type with ALTER TABLE needs to take an exclusive
> lock and rewrite the whole table, so you might as well pg_dump+restore
> that table.

Not at all. Lots of DDL operations take table locks, but they're still a
lot more convenient than dump+restore. Think dependencies.

A stickier wicket is how to handle functions and views that depend on the
old type, so maybe this isn't quite as clean as I thought.

...Robert
Gregory Stark wrote on Fri, 30 Jan 2009 at 16:56 +0000:
> Well, having a column in pg_class does have some advantages. For
> example, you could look at the value from an SQL session more easily.
> And if there are operations which we know are unsafe -- such as adding
> columns -- we could clear it from the server side easily.

I think it would be more useful for the pg_upgrade script to have the
ability to register triggers on metadata changes. That is a general
feature, and with it you can do whatever you want.

		Zdenek
Heikki Linnakangas wrote on Sat, 31 Jan 2009 at 21:56 +0200:
> Yeah, that works. The other approach is to convert the data types along
> with the new page format. That works too, and avoids having to keep the
> old type around and all that deprecation and stuff.
>
> I don't remember any hard decision on that, and we're not facing any
> data type changes in this release, IIRC. It is something we should
> consider on a case-by-case basis when we get there.

I think there is some confusion, because a tuple changes size when:

1) an on-disk structure such as the tuple header, varlena header, or
   array header changes size, or

2) a data type's representation changes size.

We have mostly discussed case #1, which may have given the impression that
ALTER TABLE was being ignored, but that is not true. I agree with Heikki:
data type conversion should be handled case by case, and ALTER TABLE is
also a good solution.

		Zdenek