Thread: Lots 'o patches

Lots 'o patches

From
"Thomas G. Lockhart"
Date:
I've just committed a bunch of patches, mostly to help with parsing and
type conversion. The quick summary:

1) The UNION construct will now try to coerce types across each UNION
clause. At the moment, the types are converted to match the _first_
select clause, rather than matching the "best" data type across all the
clauses. I can see arguments for either behavior, and I'm pretty sure
either behavior can be implemented. Since the first clause is a bit
"special" anyway (that is the one which can name output columns, for
example), it seemed that perhaps this was a good choice. Any comments??
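
For example (my sketch, not an example from the patch itself, and assuming
the usual int-to-float promotion applies), something like this should now
work, with the second clause coerced to match the first:

  SELECT 1.5 AS value   -- first clause fixes the column name and type (float8)
  UNION
  SELECT 2;             -- the int constant here is coerced to float8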

2) The name data type will now transparently convert to and from other
string types. For example,

  SELECT USER || ' is me';

now works.
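
The conversion applies to name columns as well; for instance (again my
sketch, not an example from the patch):

  SELECT relname || ' is a system catalog' FROM pg_class
    WHERE relname = 'pg_type';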

3) A regression test for UNIONs has been added. SQL92 string functions
are now included in the "strings" regression test. Other regression
tests have been updated, and all tests pass on my Linux/i686 box.

I'm planning on writing a section in the new docs discussing type
conversion and coercion, once the behavior becomes set for v6.4.

I think the new type conversion/coercion stuff is pretty solid, and I've
tested as much as I can think of wrt behavior. It can benefit from
testing by others to uncover any unanticipated problems, so let me know
what you find...

                      - Tom

Oh, requires a dump/reload to get the string conversions for the name
data type.

Re: [HACKERS] Lots 'o patches

From
dg@illustra.com (David Gould)
Date:
> I've just committed a bunch of patches, mostly to help with parsing and
> type conversion. The quick summary:
>
> 1) The UNION construct will now try to coerce types across each UNION
> clause. At the moment, the types are converted to match the _first_
> select clause, rather than matching the "best" data type across all the
> clauses. I can see arguments for either behavior, and I'm pretty sure
> either behavior can be implemented. Since the first clause is a bit
> "special" anyway (that is the one which can name output columns, for
> example), it seemed that perhaps this was a good choice. Any comments??

I think this is good. The important thing really is that we have a
consistent "story" we can tell about how and why it works, so that a user can
form a mental model of the system that is useful when trying to compose
a query. I.e., the principle of least surprise.

The story "the first select picks the names and types for the columns and
the other selects are cooerced match" seems quite clear and easy to understand.

The story "the first select picks the names and then we consider all the
possible conversions throughout the other selects and resolve them using
the type hierarchy" is not quite as obvious.

What we don't want is a story that approximates "we sacrifice a goat and
examine the entrails".

> 2) The name data type will now transparently convert to and from other
> string types. For example,
>
>   SELECT USER || ' is me';
>
> now works.

Good.

> 3) A regression test for UNIONs has been added. SQL92 string functions
> are now included in the "strings" regression test. Other regression
> tests have been updated, and all tests pass on my Linux/i686 box.

Very good.

> I'm planning on writing a section in the new docs discussing type
> conversion and coercion, once the behavior becomes set for v6.4.

Even better.

> I think the new type conversion/coercion stuff is pretty solid, and I've
> tested as much as I can think of wrt behavior. It can benefit from
> testing by others to uncover any unanticipated problems, so let me know
> what you find...

Will do.

>                       - Tom
>
> Oh, requires a dump/reload to get the string conversions for the name
> data type.

Ooops. I guess we need to add "make a useful upgrade procedure" to our
todo list. I am not picking on this patch; it is a problem of long standing,
but as we get into real applications it will become increasingly
unacceptable.

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
"Of course, someone who knows more about this will correct me if I'm wrong,
 and someone who knows less will correct me if I'm right."
               --David Palmer (palmer@tybalt.caltech.edu)

Re: [HACKERS] Lots 'o patches

From
Brett McCormick
Date:
I don't quite understand "to get the string conversions for the name
data type" (unless it refers to inserting the appropriate info into
the system catalogs), but dump/reload isn't a problem at all for
me.  It used to really suck, mostly because it was broken, but now it
works great.

On Sun, 31 May 1998, at 16:31:27, David Gould wrote:

> > Oh, requires a dump/reload to get the string conversions for the name
> > data type.
>
> Ooops. I guess we need to add "make a useful upgrade procedure" to our
> todo list. I am not picking on this patch; it is a problem of long standing,
> but as we get into real applications it will become increasingly
> unacceptable.
>
> -dg
>
> David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
> Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
> "Of course, someone who knows more about this will correct me if I'm wrong,
>  and someone who knows less will correct me if I'm right."
>                --David Palmer (palmer@tybalt.caltech.edu)
>

Re: [HACKERS] Lots 'o patches

From
Bruce Momjian
Date:
> Ooops. I guess we need to add "make a useful upgrade procedure" to our
> todo list. I am not picking on this patch; it is a problem of long standing,
> but as we get into real applications it will become increasingly
> unacceptable.

You don't like the fact that upgrades require a dump/reload?  I am not
sure we will ever succeed in not requiring that.  We change the system
tables too much, because we are a type-neutral system.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] Lots 'o patches

From
dg@illustra.com (David Gould)
Date:
> I don't quite understand "to get the string conversions for the name
> data type" (unless it refers to inserting the appropriate info into
> the system catalogs), but dump/reload isn't a problem at all for
> me.  It used to really suck, mostly because it was broken, but now it
> works great.
>
> On Sun, 31 May 1998, at 16:31:27, David Gould wrote:
>
> > > Oh, requires a dump/reload to get the string conversions for the name
> > > data type.
> >
> > Ooops. I guess we need to add "make a useful upgrade procedure" to our
> > todo list. I am not picking on this patch; it is a problem of long standing,
> > but as we get into real applications it will become increasingly
> > unacceptable.

One of the Illustra customers moving to Informix UDO that I have had the
pleasure of working with is Egghead Software. They sell stuff over the web.
24 hours a day. Every day. Their database takes something like 20 hours to
dump and reload. The last time they did that, they were down the whole time,
and it made the headline spot on cnet news. Not good. I don't think they
want to do it again.

If we want postgresql to be usable by real businesses, requiring downtime is
not acceptable.

A proper upgrade would just update the catalogs online and fix any other
issues without needing a dump/restore cycle.

As a Sybase customer once told one of our support people in a very loud
voice: "THIS is NOT a 'Name and Address' database. WE SELL STOCKS!"

-dg


David Gould           dg@illustra.com            510.628.3783 or 510.305.9468
Informix Software                      300 Lakeside Drive   Oakland, CA 94612
 - A child of five could understand this!  Fetch me a child of five.


Re: [HACKERS] Lots 'o patches

From
Bruce Momjian
Date:
> One of the Illustra customers moving to Informix UDO that I have had the
> pleasure of working with is Egghead software. They sell stuff over the web.
> 24 hours a day. Every day. Their database takes something like 20 hours to
> dump and reload. The last time they did that they were down the whole time
> and it made the headline spot on cnet news. Not good. I don't think they
> want to do it again.
>
> If we want postgresql to be usable by real businesses, requiring downtime is
> not acceptable.
>
> A proper upgrade would just update the catalogs online and fix any other
> issues without needing a dump/restore cycle.
>
> As a Sybase customer once told one of our support people in a very loud
> voice: "THIS is NOT a 'Name and Address' database. WE SELL STOCKS!"

That is going to be difficult to do.  We used to have some SQL scripts
that could make the required database changes, but when system table
structure changes, I can't imagine how we would migrate that without a
dump/reload.  I suppose we could keep the data/index files with user data,
run initdb, and move the data files back, but we need the system table
info reloaded into the new system tables.


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

RE: [HACKERS] Lots 'o patches

From
"Stupor Genius"
Date:
>
> That is going to be difficult to do.  We used to have some SQL scripts
> that could make the required database changes, but when system table
> structure changes, I can't imagine how we would migrate that without a
> dump/reload.  I suppose we could keep the data/index files with user data,
> run initdb, and move the data files back, but we need the system table
> info reloaded into the new system tables.

If the tuple header info doesn't change, this doesn't seem that tough.
Just dump the pg_* tables and reload them.  The system tables are
"small" compared to the size of user data/indexes, no?

Or is there some extremely obvious reason that this is harder than it
seems?

But then again, what are the odds that changes for a release will only
affect system tables so as not to require a data dump?  Not good, I'd say.

darrenk


Re: [HACKERS] Lots 'o patches

From
dg@illustra.com (David Gould)
Date:
>
> >
> > That is going to be difficult to do.  We used to have some SQL scripts
> > that could make the required database changes, but when system table
> > structure changes, I can't imagine how we would migrate that without a
> > dump/reload.  I suppose we could keep the data/index files with user data,
> > run initdb, and move the data files back, but we need the system table
> > info reloaded into the new system tables.
>
> If the tuple header info doesn't change, this doesn't seem that tough.
> Just dump the pg_* tables and reload them.  The system tables are
> "small" compared to the size of user data/indexes, no?

I like this idea.

> Or is there some extremely obvious reason that this is harder than it
> seems?
>
> But then again, what are the odds that changes for a release will only
> affect system tables so as not to require a data dump?  Not good, I'd say.

Hmmm, not bad either, especially if we are a little bit careful not to
break existing on disk structures, or to make things downward compatible.

For example, if we added a b-tree clustered index access method, this should
not invalidate all existing tables and indexes; they just couldn't take
advantage of it until rebuilt.

On the other hand, if we decided to change to say 64 bit oids, I can see
a reload being required.

I guess that in our situation we will occasionally have changes that require
a dump/load. But this should really only be required for the addition of a
major feature that offers enough benefit to the user that they can see that
it is worth the pain.

Without knowing the history, the impression I have formed is that we have
sort of assumed that each release will require a dump/load to do the upgrade.
I would like to see us adopt a policy of trying to avoid this unless there
is a compelling reason to make an exception.

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
"Don't worry about people stealing your ideas.  If your ideas are any
 good, you'll have to ram them down people's throats." -- Howard Aiken



An easier way to upgrade (Was: Lots 'o patches)

From
Mattias Kregert
Date:
David Gould wrote:
>
> >
> > >
> > > That is going to be difficult to do.  We used to have some SQL scripts
> > > that could make the required database changes, but when system table
> > > structure changes, I can't imagine how we would migrate that without a
> > > dump/reload.  I suppose we could keep the data/index files with user data,
> > > run initdb, and move the data files back, but we need the system table
> > > info reloaded into the new system tables.
> >
> > If the tuple header info doesn't change, this doesn't seem that tough.
> > Just dump the pg_* tables and reload them.  The system tables are
> > "small" compared to the size of user data/indexes, no?
>
> I like this idea.
>
> > Or is there some extremely obvious reason that this is harder than it
> > seems?
> >
> > But then again, what are the odds that changes for a release will only
> > affect system tables so as not to require a data dump?  Not good, I'd say.
>
> Hmmm, not bad either, especially if we are a little bit careful not to
> break existing on disk structures, or to make things downward compatible.
>
> For example, if we added a b-tree clustered index access method, this should
> not invalidate all existing tables and indexes; they just couldn't take
> advantage of it until rebuilt.
>


> On the other hand, if we decided to change to say 64 bit oids, I can see
> a reload being required.
>
> I guess that in our situation we will occasionally have changes that require
> a dump/load. But this should really only be required for the addition of a
> major feature that offers enough benefit to the user that they can see that
> it is worth the pain.
>
> Without knowing the history, the impression I have formed is that we have
> sort of assumed that each release will require a dump/load to do the upgrade.
> I would like to see us adopt a policy of trying to avoid this unless there
> is a compelling reason to make an exception.


How about making a file specifying what to do when upgrading from one
version of pg to another? Then a program, let's call it 'pgconv', would
read this file and do the conversions from the old to the new format
using pg_dump and psql and/or some other helper programs.

pgconv should be able to skip versions (upgrade from 6.2 to 6.4 for
example, skipping 6.2.1, 6.3 and 6.3.2) by simply going through all
steps from version to version.

Wouldn't this be much easier than having to follow instructions
written in HRF (human-readable form)? Nobody could mess up their data, because the
program would always do the correct conversions.

Btw, does pg_dump quote identifiers? CREATE TABLE "table"
("int" int, "char" char) for example? I know it did not
use to, but perhaps it does now?


(Very simplified example follows):
----------------------------------
% cd /usr/src/pgsql6.4
% pgconv /usr/local/pgsql -y
-- PgConv1.0 - PostgreSQL data conversion program --

Found old version 6.3 in /usr/local/pgsql/
Convert to 6.4 (y/n)? (yes)

>> Converting 6.3->6.3.2
> Creating shadow passwords

>> Converting 6.3.2->6.3.4
> System tables converted
> Data files converted

PgConv done. Now delete the old binaries, install
the new binaries with 'make install' and make sure
you have your PATH set correctly.
Please don't forget to run 'ldconfig' after
installing the new libraries.


(pgconv.data):
--------------
#From    To    What to do
#
epoch    6.2    ERROR("Cannot upgrade - version too old")
6.2    6.3    SQL("some-sql-commands-here")
        DELETE("obsolete-file")
        OLDVER_DUMPALL()        # To temp file
        NEWVER_LOADALL()        # From temp file
6.3    6.3.2    PRINT("Creating shadow passwords")
        SQL("create-pg_shadow")
        SYSTEM("chmod go-rwx pg_user")
        SQL("some-sql-commands")
6.3.2    6.4    SQL("some-commands")
        SYSTEM("chmod some-files")
        PRINT("System tables converted")
        SQL("some-other-commands")
        PRINT("Data files converted")

/* m */

Re: [HACKERS] An easier way to upgrade (Was: Lots 'o patches)

From
Goran Thyni
Date:
Mattias Kregert wrote:
> How about making a file specifying what to do when upgrading from one
> version of pg to another? Then a program, let's call it 'pgconv', would
> read this file and do the conversions from the old to the new format
> using pg_dump and psql and/or some other helper programs.

I think what is needed is a replication program. Since pgsql uses
socket communication, it is quite easy to run two concurrent systems,
say one each of 6.3.2 and 6.4, and copy between them at run-time.

The easiest way would be to use dump & load, but as David pointed out,
in a case where dump & load takes 20 hours it means 20 hours of downtime,
unless we want inconsistent data (data inserted/updated while copying).

A smarter replication scheme would make the downtime shorter, since most data
would be up to date and only the latest changes would need to be transferred
during the "update downtime".

Such a mechanism would be even more useful for other purposes like
clustering/backup/redundancy etc.

Has anyone looked at this?
The only thing I have seen is the Mariposa project, which seems to be
somewhat overkill for most applications.

--
---------------------------------------------
Göran Thyni, sysadm, JMS Bildbasen, Kiruna


Re: [HACKERS] An easier way to upgrade (Was: Lots 'o patches)

From
"Thomas G. Lockhart"
Date:
> > ... if we are a little bit careful not to
> > break existing on disk structures, or to make things downward
> > compatible.
> > For example, if we added a b-tree clustered index access method,
> > this should not invalidate all existing tables and indexes; they
> > just couldn't take advantage of it until rebuilt.
> > On the other hand, if we decided to change to say 64 bit oids, I can
> > see a reload being required.
> > I guess that in our situation we will occasionally have changes
> > that require a dump/load. But this should really only be required
> > for the addition of a major feature that offers enough benefit to
> > the user that they can see that it is worth the pain.
> > Without knowing the history, the impression I have formed is that we
> > have sort of assumed that each release will require a dump/load to
> > do the upgrade. I would like to see us adopt a policy of trying to
> > avoid this unless there is a compelling reason to make an exception.

We tried pretty hard to do this at the start of the v6.x releases, and
failed. A few of the reasons as I recall:
1) most changes/improvements involve changes to one or more system
catalogs
2) postgres does not allow updates/inserts to at least some system
catalogs (perhaps because of interactions with the compiled catalog
cache?).
3) system catalogs appear in every database directory, so all databases
would need to be upgraded

> How about making a file specifying what to do when upgrading from one
> version of pg to another? Then a program, let's call it 'pgconv',
> would read this file and do the conversions from the old to the new
> format using pg_dump and psql and/or some other helper programs.
>
> pgconv should be able to skip versions (upgrade from 6.2 to 6.4 for
> example, skipping 6.2.1, 6.3 and 6.3.2) by simply going through all
> steps from version to version.
>
> Wouldn't this be much easier than having to follow instructions
> written in HRF (human-readable form)? Nobody could mess up their data, because the
> program would always do the correct conversions.

This will be a good bit of work, and would be nice to have, but we'd
probably need a few people to take this on as a project. Right now, the
most active developers are already spending more time than they should
working on Postgres :)

I haven't been too worried about this, but then I don't run big
databases which need to be upgraded. Seems the dump/reload frees us to
make substantial improvements with each release without a huge burden of
ensuring backward compatibility. At the prices we charge, it might be a
good tradeoff for users...

> Btw, does pg_dump quote identifiers? CREATE TABLE "table"
> ("int" int, "char" char) for example? I know it did not
> use to, but perhaps it does now?

If it doesn't yet (I assume it doesn't), I'm planning on looking at it
for v6.4. Or do you want to look at it Bruce? We should be looking to
have all identifiers double-quoted, to preserve case, reserved words,
and weird characters in names.
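
For instance, a dump would then need to emit something like this to
round-trip mixed case, reserved words, and embedded spaces (a hypothetical
example, not actual pg_dump output):

  CREATE TABLE "Mixed Case" (
      "select" int4,    -- reserved word usable as a column name when quoted
      "odd name" text   -- embedded space preserved by quoting
  );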

                          - Tom

Re: [HACKERS] An easier way to upgrade (Was: Lots 'o patches)

From
The Hermit Hacker
Date:
On Tue, 2 Jun 1998, Goran Thyni wrote:

> Mattias Kregert wrote:
> > How about making a file specifying what to do when upgrading from one
> > version of pg to another? Then a program, let's call it 'pgconv', would
> > read this file and do the conversions from the old to the new format
> > using pg_dump and psql and/or some other helper programs.
>
> I think what is needed is a replication program. Since pgsql uses
> socket communication, it is quite easy to run two concurrent systems,
> say one each of 6.3.2 and 6.4, and copy between them at run-time.
>
> The easiest way would be to use dump & load, but as David pointed out,
> in a case where dump & load takes 20 hours it means 20 hours of downtime,
> unless we want inconsistent data (data inserted/updated while copying).
>
> A smarter replication scheme would make the downtime shorter, since most data
> would be up to date and only the latest changes would need to be transferred
> during the "update downtime".
>
> Such a mechanism would be even more useful for other purposes like
> clustering/backup/redundancy etc.
>
> Has anyone looked at this?
> The only thing I have seen is the Mariposa project, which seems to be
> somewhat overkill for most applications.

    Someone had scripts for this that they were going to submit, but I
never heard further on it :(



Re: [HACKERS] An easier way to upgrade (Was: Lots 'o patches)

From
Bruce Momjian
Date:
> If it doesn't yet (I assume it doesn't), I'm planning on looking at it
> for v6.4. Or do you want to look at it Bruce? We should be looking to
> have all identifiers double-quoted, to preserve case, reserved words,
> and weird characters in names.

Would someone research this, so I can add it to the todo list?  I have
never used quoted identifiers.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] An easier way to upgrade (Was: Lots 'o patches)

From
Bruce Momjian
Date:
> How about making a file specifying what to do when upgrading from one
> version of pg to another? Then a program, let's call it 'pgconv', would
> read this file and do the conversions from the old to the new format
> using pg_dump and psql and/or some other helper programs.

We already have the migration directory, but it contains only text, no
scripts currently.  During 1.*, we did supply scripts for the upgrade, but
the feature changes were small.

>
> (pgconv.data):
> --------------
> #From    To    What to do
> #
> epoch    6.2    ERROR("Cannot upgrade - version too old")
> 6.2    6.3    SQL("some-sql-commands-here")
>         DELETE("obsolete-file")
>         OLDVER_DUMPALL()        # To temp file
>         NEWVER_LOADALL()        # From temp file
> 6.3    6.3.2    PRINT("Creating shadow passwords")
>         SQL("create-pg_shadow")
>         SYSTEM("chmod go-rwx pg_user")
>         SQL("some-sql-commands")
> 6.3.2    6.4    SQL("some-commands")
>         SYSTEM("chmod some-files")
>         PRINT("System tables converted")
>         SQL("some-other-commands")
>         PRINT("Data files converted")

Interesting ideas, but in fact, all installs will probably require a new
initdb.  Because of the interdependent nature of the system tables, it
is hard to make changes to them using SQL statements.  What we could try
is doing a schema-only pg_dumpall, moving all the non-pg_* files to a
separate directory, running initdb, loading the pg_dumped schema, then
moving the data files back into place.

That may work.  But if we change the on-disk layout of the data, like we
did when we made varchar() variable length, a dump/reload would be
required.  Vadim has made on-disk data improvements over many releases.

We could make it happen even for complex cases, but then we come up against
the problem of whether it is wise to allocate limited development time
to migration issues.

I think the requirement of running the new initdb and moving the data
files back into place is our best bet.

I would be interested to see if that works.  Does someone want to try
doing this with the regression test database?  Do a pg_dump with data
before and after the operation, and see if it is the same.  This is a good
way to test pg_dump too.


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)