Thread: Lots 'o patches
I've just committed a bunch of patches, mostly to help with parsing and
type conversion. The quick summary:

1) The UNION construct will now try to coerce types across each UNION
clause. At the moment, the types are converted to match the _first_
select clause, rather than matching the "best" data type across all the
clauses. I can see arguments for either behavior, and I'm pretty sure
either behavior can be implemented. Since the first clause is a bit
"special" anyway (that is the one which can name output columns, for
example), it seemed that perhaps this was a good choice. Any comments??

2) The name data type will now transparently convert to and from other
string types. For example,

  SELECT USER || ' is me';

now works.

3) A regression test for UNIONs has been added. SQL92 string functions
are now included in the "strings" regression test. Other regression
tests have been updated, and all tests pass on my Linux/i686 box.

I'm planning on writing a section in the new docs discussing type
conversion and coercion, once the behavior becomes set for v6.4.

I think the new type conversion/coercion stuff is pretty solid, and I've
tested as much as I can think of wrt behavior. It can benefit from
testing by others to uncover any unanticipated problems, so let me know
what you find...

                    - Tom

Oh, requires a dump/reload to get the string conversions for the name
data type.
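To make point (1) concrete, here is a minimal sketch of the intended
coercion, using hypothetical tables f8 and i4 (an illustration only, not
taken from the patch or the regression tests):

  -- The first select clause fixes the output column name ("val") and its
  -- type (float8); the int4 values from the second clause are coerced to
  -- float8 to match it.
  CREATE TABLE f8 (val float8);
  CREATE TABLE i4 (n int4);

  SELECT val FROM f8
  UNION
  SELECT n FROM i4;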
> I've just committed a bunch of patches, mostly to help with parsing and
> type conversion. The quick summary:
>
> 1) The UNION construct will now try to coerce types across each UNION
> clause. At the moment, the types are converted to match the _first_
> select clause, rather than matching the "best" data type across all the
> clauses. I can see arguments for either behavior, and I'm pretty sure
> either behavior can be implemented. Since the first clause is a bit
> "special" anyway (that is the one which can name output columns, for
> example), it seemed that perhaps this was a good choice. Any comments??

I think this is good. The important thing really is that we have a
consistent "story" we can tell about how and why it works, so that a
user can form a mental model of the system that is useful when trying to
compose a query. I.e., the principle of least surprise.

The story "the first select picks the names and types for the columns,
and the other selects are coerced to match" seems quite clear and easy
to understand.

The story "the first select picks the names, and then we consider all
the possible conversions throughout the other selects and resolve them
using the type hierarchy" is not quite as obvious.

What we don't want is a story that approximates "we sacrifice a goat and
examine the entrails".

> 2) The name data type will now transparently convert to and from other
> string types. For example,
>
>   SELECT USER || ' is me';
>
> now works.

Good.

> 3) A regression test for UNIONs has been added. SQL92 string functions
> are now included in the "strings" regression test. Other regression
> tests have been updated, and all tests pass on my Linux/i686 box.

Very good.

> I'm planning on writing a section in the new docs discussing type
> conversion and coercion, once the behavior becomes set for v6.4.

Even better.

> I think the new type conversion/coercion stuff is pretty solid, and I've
> tested as much as I can think of wrt behavior. It can benefit from
> testing by others to uncover any unanticipated problems, so let me know
> what you find...

Will do.

> - Tom
>
> Oh, requires a dump/reload to get the string conversions for the name
> data type.

Ooops. I guess we need to add "make a useful upgrade procedure" to our
todo list. I am not picking on this patch; it is a problem of long
standing, but as we get into real applications it will become
increasingly unacceptable.

-dg

David Gould           dg@illustra.com          510.628.3783 or 510.305.9468
Informix Software  (No, really)       300 Lakeside Drive  Oakland, CA 94612
"Of course, someone who knows more about this will correct me if I'm wrong,
 and someone who knows less will correct me if I'm right."
               --David Palmer (palmer@tybalt.caltech.edu)
I don't quite understand "to get the string conversions for the name
data type" (unless it refers to inserting the appropriate info into the
system catalogs), but dump/reload isn't a problem at all for me. It used
to really suck, mostly because it was broken, but now it works great.

On Sun, 31 May 1998, at 16:31:27, David Gould wrote:

> > Oh, requires a dump/reload to get the string conversions for the name
> > data type.
>
> Ooops. I guess we need to add "make a useful upgrade procedure" to our
> todo list. I am not picking on this patch; it is a problem of long
> standing, but as we get into real applications it will become
> increasingly unacceptable.
>
> -dg
>
> David Gould           dg@illustra.com          510.628.3783 or 510.305.9468
> Informix Software  (No, really)       300 Lakeside Drive  Oakland, CA 94612
> "Of course, someone who knows more about this will correct me if I'm wrong,
>  and someone who knows less will correct me if I'm right."
>                --David Palmer (palmer@tybalt.caltech.edu)
> Ooops. I guess we need to add "make a useful upgrade procedure" to our
> todo list. I am not picking on this patch; it is a problem of long
> standing, but as we get into real applications it will become
> increasingly unacceptable.

You don't like the fact that upgrades require a dump/reload? I am not
sure we will ever succeed in not requiring that. We change the system
tables too much, because we are a type-neutral system.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
> I don't quite understand "to get the string conversions for the name
> data type" (unless it refers to inserting the appropriate info into the
> system catalogs), but dump/reload isn't a problem at all for me. It used
> to really suck, mostly because it was broken, but now it works great.
>
> On Sun, 31 May 1998, at 16:31:27, David Gould wrote:
>
> > > Oh, requires a dump/reload to get the string conversions for the name
> > > data type.
> >
> > Ooops. I guess we need to add "make a useful upgrade procedure" to our
> > todo list. I am not picking on this patch; it is a problem of long
> > standing, but as we get into real applications it will become
> > increasingly unacceptable.

One of the Illustra customers moving to Informix UDO that I have had the
pleasure of working with is Egghead Software. They sell stuff over the
web. 24 hours a day. Every day. Their database takes something like 20
hours to dump and reload. The last time they did that, they were down
the whole time, and it made the headline spot on CNET News. Not good. I
don't think they want to do it again.

If we want PostgreSQL to be usable by real businesses, requiring
downtime is not acceptable.

A proper upgrade would just update the catalogs online and fix any other
issues without needing a dump/restore cycle.

As a Sybase customer once told one of our support people in a very loud
voice: "THIS is NOT a 'Name and Address' database. WE SELL STOCKS!"

-dg

David Gould           dg@illustra.com          510.628.3783 or 510.305.9468
Informix Software                     300 Lakeside Drive  Oakland, CA 94612
 - A child of five could understand this! Fetch me a child of five.
> One of the Illustra customers moving to Informix UDO that I have had the
> pleasure of working with is Egghead Software. They sell stuff over the
> web. 24 hours a day. Every day. Their database takes something like 20
> hours to dump and reload. The last time they did that, they were down
> the whole time, and it made the headline spot on CNET News. Not good. I
> don't think they want to do it again.
>
> If we want PostgreSQL to be usable by real businesses, requiring
> downtime is not acceptable.
>
> A proper upgrade would just update the catalogs online and fix any other
> issues without needing a dump/restore cycle.
>
> As a Sybase customer once told one of our support people in a very loud
> voice: "THIS is NOT a 'Name and Address' database. WE SELL STOCKS!"

That is going to be difficult to do. We used to have some SQL scripts
that could make the required database changes, but when the system table
structure changes, I can't imagine how we would migrate that without a
dump/reload. I suppose we could keep the data/index files with user
data, run initdb, and move the data files back, but we need the system
table info reloaded into the new system tables.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
> That is going to be difficult to do. We used to have some SQL scripts
> that could make the required database changes, but when the system table
> structure changes, I can't imagine how we would migrate that without a
> dump/reload. I suppose we could keep the data/index files with user
> data, run initdb, and move the data files back, but we need the system
> table info reloaded into the new system tables.

If the tuple header info doesn't change, this doesn't seem that tough.
Just dump the pg_* tables and reload them. The system tables are "small"
compared to the size of user data/indexes, no?

Or is there some extremely obvious reason that this is harder than it
seems?

But then again, what are the odds that the changes for a release will
only affect system tables, so as not to require a data dump? Not good,
I'd say.

darrenk
> > That is going to be difficult to do. We used to have some SQL scripts
> > that could make the required database changes, but when the system table
> > structure changes, I can't imagine how we would migrate that without a
> > dump/reload. I suppose we could keep the data/index files with user
> > data, run initdb, and move the data files back, but we need the system
> > table info reloaded into the new system tables.
>
> If the tuple header info doesn't change, this doesn't seem that tough.
> Just dump the pg_* tables and reload them. The system tables are "small"
> compared to the size of user data/indexes, no?

I like this idea.

> Or is there some extremely obvious reason that this is harder than it
> seems?
>
> But then again, what are the odds that the changes for a release will
> only affect system tables, so as not to require a data dump? Not good,
> I'd say.

Hmmm, not bad either, especially if we are a little bit careful not to
break existing on-disk structures, or to make things downward
compatible.

For example, if we added a b-tree clustered index access method, this
should not invalidate all existing tables and indexes; they just
couldn't take advantage of it until rebuilt.

On the other hand, if we decided to change to, say, 64-bit OIDs, I can
see a reload being required.

I guess that in our situation we will occasionally have changes that
require a dump/load. But this should really only be required for the
addition of a major feature that offers enough benefit to the user that
they can see that it is worth the pain.

Without knowing the history, the impression I have formed is that we
have sort of assumed that each release will require a dump/load to do
the upgrade. I would like to see us adopt a policy of trying to avoid
this unless there is a compelling reason to make an exception.

-dg

David Gould           dg@illustra.com          510.628.3783 or 510.305.9468
Informix Software  (No, really)       300 Lakeside Drive  Oakland, CA 94612
"Don't worry about people stealing your ideas. If your ideas are any good,
 you'll have to ram them down people's throats." -- Howard Aiken
David Gould wrote:
>
> > > That is going to be difficult to do. We used to have some SQL scripts
> > > that could make the required database changes, but when the system
> > > table structure changes, I can't imagine how we would migrate that
> > > without a dump/reload. I suppose we could keep the data/index files
> > > with user data, run initdb, and move the data files back, but we need
> > > the system table info reloaded into the new system tables.
> >
> > If the tuple header info doesn't change, this doesn't seem that tough.
> > Just dump the pg_* tables and reload them. The system tables are "small"
> > compared to the size of user data/indexes, no?
>
> I like this idea.
>
> > Or is there some extremely obvious reason that this is harder than it
> > seems?
> >
> > But then again, what are the odds that the changes for a release will
> > only affect system tables, so as not to require a data dump? Not good,
> > I'd say.
>
> Hmmm, not bad either, especially if we are a little bit careful not to
> break existing on-disk structures, or to make things downward
> compatible.
>
> For example, if we added a b-tree clustered index access method, this
> should not invalidate all existing tables and indexes; they just
> couldn't take advantage of it until rebuilt.
>
> On the other hand, if we decided to change to, say, 64-bit OIDs, I can
> see a reload being required.
>
> I guess that in our situation we will occasionally have changes that
> require a dump/load. But this should really only be required for the
> addition of a major feature that offers enough benefit to the user that
> they can see that it is worth the pain.
>
> Without knowing the history, the impression I have formed is that we
> have sort of assumed that each release will require a dump/load to do
> the upgrade. I would like to see us adopt a policy of trying to avoid
> this unless there is a compelling reason to make an exception.

How about making a file specifying what to do when upgrading from one
version of pg to another? Then a program, let's call it 'pgconv', would
read this file and do the conversions from the old to the new format
using pg_dump and psql and/or some other helper programs.

pgconv should be able to skip versions (upgrade from 6.2 to 6.4, for
example, skipping 6.2.1, 6.3 and 6.3.2) by simply going through all the
steps from version to version.

Wouldn't this be much easier than having to follow instructions written
in HRF? Nobody could mess up their data, because the program would
always do the correct conversions.

Btw, does pg_dump quote identifiers? CREATE TABLE "table" ("int" int,
"char" char), for example? I know it did not use to, but perhaps it does
now?

(Very simplified example follows):
----------------------------------
% cd /usr/src/pgsql6.4
% pgconv /usr/local/pgsql -y
-- PgConv 1.0 - PostgreSQL data conversion program --
Found old version 6.3 in /usr/local/pgsql/
Convert to 6.4 (y/n)? (yes)
>> Converting 6.3 -> 6.3.2
 > Creating shadow passwords
>> Converting 6.3.2 -> 6.4
 > System tables converted
 > Data files converted
PgConv done. Now delete the old binaries, install the new binaries with
'make install', and make sure you have your PATH set correctly. Please
don't forget to run 'ldconfig' after installing the new libraries.
(pgconv.data):
--------------
#From    To       What to do
#
epoch    6.2      ERROR("Can not upgrade - too old version")
6.2      6.3      SQL("some-sql-commands-here")
                  DELETE("obsolete-file")
                  OLDVER_DUMPALL()    # To temp file
                  NEWVER_LOADALL()    # From temp file
6.3      6.3.2    PRINT("Creating shadow passwords")
                  SQL("create-pg_shadow")
                  SYSTEM("chmod go-rwx pg_user")
                  SQL("some-sql-commands")
6.3.2    6.4      SQL("some-commands")
                  SYSTEM("chmod some-files")
                  PRINT("System tables converted")
                  SQL("some-other-commands")
                  PRINT("Data files converted")

/* m */
Mattias Kregert wrote:
> How about making a file specifying what to do when upgrading from one
> version of pg to another? Then a program, let's call it 'pgconv', would
> read this file and do the conversions from the old to the new format
> using pg_dump and psql and/or some other helper programs.

I think what is needed is a replication program. Since pgsql uses socket
communication, it is quite easy to run two concurrent systems, say one
each of 6.3.2 and 6.4, and copy between them at run-time.

The easiest way would be to use dump & load, but as David pointed out,
in a case where dump & load takes 20 hours it means 20 hours of downtime
unless we want inconsistent data (data inserted/updated while copying).

A smarter replication would make the downtime shorter, since most data
would be up to date and only the latest changes would need to be
transferred during the "update downtime".

Such a mechanism would be even more useful for other purposes like
clustering/backup/redundancy etc.

Has anyone looked at this? The only thing I have seen is the Mariposa
project, which seems to be somewhat overkill for most applications.

--
---------------------------------------------
Göran Thyni, sysadm, JMS Bildbasen, Kiruna
> > ... if we are a little bit careful not to
> > break existing on-disk structures, or to make things downward
> > compatible.
> > For example, if we added a b-tree clustered index access method,
> > this should not invalidate all existing tables and indexes; they
> > just couldn't take advantage of it until rebuilt.
> > On the other hand, if we decided to change to, say, 64-bit OIDs,
> > I can see a reload being required.
> > I guess that in our situation we will occasionally have changes
> > that require a dump/load. But this should really only be required
> > for the addition of a major feature that offers enough benefit to
> > the user that they can see that it is worth the pain.
> > Without knowing the history, the impression I have formed is that we
> > have sort of assumed that each release will require a dump/load to
> > do the upgrade. I would like to see us adopt a policy of trying to
> > avoid this unless there is a compelling reason to make an exception.

We tried pretty hard to do this at the start of the v6.x releases, and
failed. A few of the reasons, as I recall:

1) Most changes/improvements involve changes to one or more system
   catalogs.

2) Postgres does not allow updates/inserts to at least some system
   catalogs (perhaps because of interactions with the compiled catalog
   cache?).

3) System catalogs appear in every database directory, so all databases
   would need to be upgraded.

> How about making a file specifying what to do when upgrading from one
> version of pg to another? Then a program, let's call it 'pgconv',
> would read this file and do the conversions from the old to the new
> format using pg_dump and psql and/or some other helper programs.
>
> pgconv should be able to skip versions (upgrade from 6.2 to 6.4, for
> example, skipping 6.2.1, 6.3 and 6.3.2) by simply going through all
> the steps from version to version.
>
> Wouldn't this be much easier than having to follow instructions
> written in HRF? Nobody could mess up their data, because the
> program would always do the correct conversions.

This will be a good bit of work, and would be nice to have, but we'd
probably need a few people to take this on as a project. Right now, the
most active developers are already spending more time than they should
working on Postgres :)

I haven't been too worried about this, but then I don't run big
databases which need to be upgraded. Seems the dump/reload frees us to
make substantial improvements with each release without a huge burden of
ensuring backward compatibility. At the prices we charge, it might be a
good tradeoff for users...

> Btw, does pg_dump quote identifiers? CREATE TABLE "table"
> ("int" int, "char" char), for example? I know it did not
> use to, but perhaps it does now?

If it doesn't yet (I assume it doesn't), I'm planning on looking at it
for v6.4. Or do you want to look at it, Bruce? We should be looking to
have all identifiers double-quoted, to preserve case, reserved words,
and weird characters in names.

                    - Tom
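As a sketch of why the quoting matters (hypothetical names, assuming the
parser's double-quote handling follows SQL92): unquoted identifiers are
folded to lower case and cannot be reserved words, so a dump that leaves
out the quotes would not reload definitions like these faithfully:

  -- Mixed case survives only inside double quotes; without them the
  -- name would fold to "mixedcase" on reload.
  CREATE TABLE "MixedCase" ("int" int4, "char" char);

  -- "table" and "select" are reserved words and are legal names only
  -- when quoted.
  CREATE TABLE "table" ("select" int4);

  INSERT INTO "MixedCase" ("int", "char") VALUES (1, 'x');
  SELECT "select" FROM "table";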
On Tue, 2 Jun 1998, Goran Thyni wrote:

> Mattias Kregert wrote:
> > How about making a file specifying what to do when upgrading from one
> > version of pg to another? Then a program, let's call it 'pgconv', would
> > read this file and do the conversions from the old to the new format
> > using pg_dump and psql and/or some other helper programs.
>
> I think what is needed is a replication program. Since pgsql uses socket
> communication, it is quite easy to run two concurrent systems, say one
> each of 6.3.2 and 6.4, and copy between them at run-time.
>
> The easiest way would be to use dump & load, but as David pointed out,
> in a case where dump & load takes 20 hours it means 20 hours of downtime
> unless we want inconsistent data (data inserted/updated while copying).
>
> A smarter replication would make the downtime shorter, since most data
> would be up to date and only the latest changes would need to be
> transferred during the "update downtime".
>
> Such a mechanism would be even more useful for other purposes like
> clustering/backup/redundancy etc.
>
> Has anyone looked at this? The only thing I have seen is the Mariposa
> project, which seems to be somewhat overkill for most applications.

Someone had scripts for this that they were going to submit, but I never
heard further on it :(
> If it doesn't yet (I assume it doesn't), I'm planning on looking at it
> for v6.4. Or do you want to look at it, Bruce? We should be looking to
> have all identifiers double-quoted, to preserve case, reserved words,
> and weird characters in names.

Would someone research this, so I can add it to the todo list? I have
never used quoted identifiers.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
> How about making a file specifying what to do when upgrading from one
> version of pg to another? Then a program, let's call it 'pgconv', would
> read this file and do the conversions from the old to the new format
> using pg_dump and psql and/or some other helper programs.

We already have the migration directory, but it contains only text, no
scripts, currently. During 1.*, we did supply scripts for the upgrade,
but the feature changes were small.

> (pgconv.data):
> --------------
> #From    To       What to do
> #
> epoch    6.2      ERROR("Can not upgrade - too old version")
> 6.2      6.3      SQL("some-sql-commands-here")
>                   DELETE("obsolete-file")
>                   OLDVER_DUMPALL()    # To temp file
>                   NEWVER_LOADALL()    # From temp file
> 6.3      6.3.2    PRINT("Creating shadow passwords")
>                   SQL("create-pg_shadow")
>                   SYSTEM("chmod go-rwx pg_user")
>                   SQL("some-sql-commands")
> 6.3.2    6.4      SQL("some-commands")
>                   SYSTEM("chmod some-files")
>                   PRINT("System tables converted")
>                   SQL("some-other-commands")
>                   PRINT("Data files converted")

Interesting ideas, but in fact, all installs will probably require a new
initdb. Because of the interdependent nature of the system tables, it is
hard to make changes to them using SQL statements.

What we could try is doing a pg_dump_all -schema-only, moving all the
non-pg_* files to a separate directory, running initdb, loading the
dumped schema, then moving the data files back into place. That may
work. But if we change the on-disk layout of the data, like we did when
we made varchar() variable-length, a dump/reload would be required.
Vadim made on-disk data improvements for many releases.

We could make it happen even for complex cases, but then we come up
against the problem of whether it is wise to allocate limited
development time to migration issues.

I think the requirement of running the new initdb and moving the data
files back into place is our best bet. I would be interested to see if
that works. Does someone want to try doing this with the regression test
database? Do a pg_dump with data before and after the operation, and see
if it is the same. This is a good way to test pg_dump too.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)