Thread: (A) native Windows port

(A) native Windows port

From
Jan Wieck
Date:
Hackers,

as some of you figured already, Katie Ward and I are working full-time on
PostgreSQL and are actually doing a native Win32 port. This port is not
based on CygWIN, Apache or any other compatibility library but uses 100%
native Windows functionality only.

We already have it far enough to create and drop databases, tables and
of course do the usual stuff (like INSERT, UPDATE, DELETE and SELECT).
But there is still plenty of work, so don't worry, all of you will have
a chance to leave your finger- and/or footprints.

What I want to start today is a discussion about project coordination and
code management. Our proposal is to provide a diff first. I have no clue
when exactly this will happen, but assuming the usual PostgreSQL
schedule behaviour I would say it's measured in weeks :-). A given is
that we will contribute this work under the BSD license.

We will upload the diff to developer.postgresql.org and post a link
together with build instructions to hackers. After some discussion we
can create a CVS branch and apply that patch to there. Everyone who
wants to contribute to the Win32 port can then work in that branch.
Katie and I will take care that changes in trunk will periodically get
merged into the Win32 branch.

This model guarantees that we don't change the mainstream PostgreSQL
until the developer community decides to follow this road and chooses
this implementation as the PostgreSQL Win32 port. At that point we can
merge the Win32 port into the trunk and ship it with the next release.

As for project coordination, I am willing to setup and maintain a page
similar to the (horribly outdated) ones that I did for Toast and RI.
Summarizing project status, pointing to resources, instructions, maybe a
roadmap, TODO, you name it.

Comments? Suggestions?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: (A) native Windows port

From
"Dave Page"
Date:

> -----Original Message-----
> From: Jan Wieck [mailto:JanWieck@Yahoo.com]
> Sent: 26 June 2002 15:45
> To: HACKERS
> Subject: [HACKERS] (A) native Windows port
>
>
> As for project coordination, I am willing to setup and
> maintain a page similar to the (horribly outdated) ones that
> I did for Toast and RI. Summarizing project status, pointing
> to resources, instructions, maybe a roadmap, TODO, you name it.
>
> Comments? Suggestions?

Great, can't wait to see your work.

I can probably sort out an installer shortly after you have the first
code available - that way we can work out kinks in a binary
distribution, as well as hopefully get some more testers who may not
have compilers etc on their windows boxes. Let me know if you'd like me
to work on that...

Regards, Dave.





Re: (A) native Windows port

From
Bruce Momjian
Date:
Jan Wieck wrote:
> As for project coordination, I am willing to setup and maintain a page
> similar to the (horribly outdated) ones that I did for Toast and RI.
> Summarizing project status, pointing to resources, instructions, maybe a
> roadmap, TODO, you name it.

Great.  Please see roadmap in TODO.detail/win32 for a list of items and
possible approaches.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026




Re: (A) native Windows port

From
"Christopher Kings-Lynne"
Date:
> As for project coordination, I am willing to setup and maintain a page
> similar to the (horribly outdated) ones that I did for Toast and RI.
> Summarizing project status, pointing to resources, instructions, maybe a
> roadmap, TODO, you name it.

I am willing to supply a complete, friendly, powerful and pretty installer
program, based on NSIS.

http://www.winamp.com/nsdn/nsis/index.jhtml

I suggest that pgAdmin is included in the install process.  Imagine it - a
win32 person downloads a single .exe, with contents bzip2'd.  They run the
installer, it asks them to agree to license, shows splash screen, asks them
where to install it, gets them to supply an installation password and
installs pgadmin.  It could set up a folder in their start menu with
start/stop, edit configs, uninstall and run pgadmin.

It would all work out of the box and would do wonderful things for the
Postgres community.

Chris






Re: (A) native Windows port

From
Lamar Owen
Date:
On Wednesday 26 June 2002 11:48 pm, Christopher Kings-Lynne wrote:
> I suggest that pgAdmin is included in the install process.  Imagine it - a
> win32 person downloads a single .exe, with contents bzip2'd.  They run the
> installer, it asks them to agree to license, shows splash screen, asks them
> where to install it, gets them to supply an installation password and
> installs pgadmin.  It could set up a folder in their start menu with
> start/stop, edit configs, uninstall and run pgadmin.

> It would all work out of the box and would do wonderful things for the
> Postgres community.

I like this idea, but let me just bring one little issue to note: are you 
going to handle upgrades, and if so, how?  How are you going to do a major 
version upgrade?
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11




Re: (A) native Windows port

From
"Christopher Kings-Lynne"
Date:
> > It would all work out of the box and would do wonderful things for the
> > Postgres community.
>
> I like this idea, but let me just bring one little issue to note: are you
> going to handle upgrades, and if so, how?  How are you going to
> do a major
> version upgrade?

Well, the easiest way would be to get them to uninstall the old version
first, but I'm sure it can be worked out.  Perhaps even we shouldn't
overwrite the old version anyway?

Chris





Re: (A) native Windows port

From
"Nicolas Bazin"
Date:
How does the upgrade work on UNIX? Is there anything available apart from
reading the release note?
----- Original Message -----
From: "Christopher Kings-Lynne" <chriskl@familyhealth.com.au>
To: "Lamar Owen" <lamar.owen@wgcr.org>; "Jan Wieck" <JanWieck@Yahoo.com>;
"HACKERS" <pgsql-hackers@postgresql.org>
Sent: Tuesday, July 02, 2002 12:48 PM
Subject: Re: [HACKERS] (A) native Windows port


> > > It would all work out of the box and would do wonderful things for the
> > > Postgres community.
> >
> > I like this idea, but let me just bring one little issue to note: are
> > you going to handle upgrades, and if so, how?  How are you going to
> > do a major version upgrade?
>
> Well, the easiest way would be to get them to uninstall the old version
> first, but I'm sure it can be worked out.  Perhaps even we shouldn't
> overwrite the old version anyway?
>
> Chris
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>
>






Re: (A) native Windows port

From
Jan Wieck
Date:
Christopher Kings-Lynne wrote:
> 
> > > It would all work out of the box and would do wonderful things for the
> > > Postgres community.
> >
> > I like this idea, but let me just bring one little issue to note: are you
> > going to handle upgrades, and if so, how?  How are you going to
> > do a major
> > version upgrade?
> 
> Well, the easiest way would be to get them to uninstall the old version
> first, but I'm sure it can be worked out.  Perhaps even we shouldn't
> overwrite the old version anyway?

The question is not how to replace some .EXE and .DLL files or modify
something in the registry. The question is what to do with the existing
databases in the case of a catalog version change. You have to dump and
restore. 
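A minimal sketch of that check, assuming only that a cluster's data directory records the major release that created it in its PG_VERSION file (e.g. "7.2"), and that minor releases share a catalog format:

```python
def needs_dump_restore(old_pg_version: str, new_pg_version: str) -> bool:
    """Compare the contents of an existing cluster's PG_VERSION file with
    the version being installed; a differing major release means the
    system catalogs changed, so a dump and restore is required."""
    def major(version: str):
        # "7.2.1" -> ["7", "2"]; minor releases keep the catalog format.
        return version.strip().split(".")[:2]
    return major(old_pg_version) != major(new_pg_version)
```

An installer could read PG_VERSION from the target data directory and refuse to replace the binaries until a dump has been taken.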


Jan





Re: (A) native Windows port

From
Lamar Owen
Date:
On Tuesday 02 July 2002 09:52 am, Jan Wieck wrote:
> Christopher Kings-Lynne wrote:
> > > > It would all work out of the box and would do wonderful things for
> > > > the Postgres community.

> > > I like this idea, but let me just bring one little issue to note: are
> > > you going to handle upgrades, and if so, how?  How are you going to do
> > > a major
> > > version upgrade?

> > Well, the easiest way would be to get them to uninstall the old version
> > first, but I'm sure it can be worked out.  Perhaps even we shouldn't
> > overwrite the old version anyway?

> The question is not how to replace some .EXE and .DLL files or modify
> something in the registry. The question is what to do with the existing
> databases in the case of a catalog version change. You have to dump and
> restore.

Now, riddle me this: we're going to explain the vagaries of 
dump/initdb/restore to a typical Windows user, and further explain why the 
dump won't necessarily restore because of a bug in the older version's 
dump....

The typical Windows user is going to barf when confronted with our extant 
'upgrade' process.  While I really could not care less if PostgreSQL goes to 
Windows or not, I am of a mind to support the Win32 effort if it gets an 
upgrade path done so that everyone can upgrade sanely.  At least the Windows 
installer can check for existing database structures and ask what to do -- 
the RPM install cannot do this.  In fact, the Windows installer *must* check 
for an existing database installation, or we're going to get fried by typical 
Windows users.

And if having a working, usable, Win32 native port gets the subject of good 
upgrading higher up the priority list, BY ALL MEANS LET'S SUPPORT WIN32 
NATIVELY! :-) (and I despise Win32....)

But it shouldn't be an installer issue -- this is an issue which causes pain
for all of our users, not just Windows or RPM (or Debian) users.  Upgrading
(pg_upgrade is a start -- but it's not going to work as written on Windows) 
needs to be core functionality.  If I can't easily upgrade my database, what 
good are new features going to do for me?

Martin O has come up with a 'pg_fsck' utility that, IMHO, holds a great deal 
of promise for seamless binary 'in place' upgrading.  He has been able to 
write code to read multiple versions' database structures -- proving that it 
CAN be done.

Windows programs such as Lotus Organizer, Microsoft Access, Lotus Approach, 
and others, allow you to convert the old to the new as part of initial 
startup.  This will be a prerequisite for wide acceptance in the Windows 
world, methinks.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11




Re: (A) native Windows port

From
"Matthew T. O'Connor"
Date:
> The question is not how to replace some .EXE and .DLL files or modify
> something in the registry. The question is what to do with the existing
> databases in the case of a catalog version change. You have to dump and
> restore. 

pg_upgrade?

Otherwise: no upgrades per se, but you can install the new version into a new
directory and then have an automated pg_dump / restore between the old and 
the new.  This would require a lot of disk space, but I don't see any other 
clean way to automate it.
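A small wrapper could drive the pipeline described above; the sketch below (binary paths and port numbers are illustrative assumptions, not a real tool) just builds the two command lines, so the old cluster is never touched if the load fails:

```python
def dump_command(old_bindir: str, old_port: int) -> list:
    """pg_dumpall from the *old* installation, talking to the old server."""
    return ["%s/pg_dumpall" % old_bindir, "-p", str(old_port)]

def restore_command(new_bindir: str, new_port: int) -> list:
    """psql from the *new* installation, loading into the new server."""
    return ["%s/psql" % new_bindir, "-p", str(new_port), "template1"]

# A driver would run dump_command piped into restore_command (e.g. with
# subprocess.Popen) while both postmasters run on different ports and
# data directories; nothing in the old directory is removed until the
# load succeeds.
```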




Re: (A) native Windows port

From
Jan Wieck
Date:
Lamar Owen wrote:
> [...]
> 
> And if having a working, usable, Win32 native port gets the subject of good
> upgrading higher up the priority list, BY ALL MEANS LET'S SUPPORT WIN32
> NATIVELY! :-) (and I despise Win32....)

Hehehe :-)

> [...]
> Martin O has come up with a 'pg_fsck' utility that, IMHO, holds a great deal
> of promise for seamless binary 'in place' upgrading.  He has been able to
> write code to read multiple versions' database structures -- proving that it
> CAN be done.

Unfortunately it's not the on-disk binary format of files that causes
the big problems. Our dump/initdb/restore sequence is also the solution
for system catalog changes. If we add/remove internal functions, there
will be changes to pg_proc. When the representation of parsetrees
changes, there will be changes to pg_rewrite (dunno how to convert
that). Consider adding another attribute to pg_class. You'd have to add
a row in pg_attribute, possibly (because it likely isn't added at the
end) increment the attno for 50% of all pg_attribute entries, and of
course insert an attribute in the middle of all existing pg_class rows
... ewe.
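Jan's pg_class example can be simulated with a toy model of pg_attribute rows, each an (attname, attnum) pair; inserting a column in the middle forces every later attnum up by one, which is exactly the in-place catalog surgery a binary upgrade would have to perform:

```python
def insert_attribute(attrs, position, name):
    """Insert a new attribute at the given attnum and renumber the rest.

    attrs mimics the pg_attribute rows of one relation: a list of
    (attname, attnum) pairs with attnum counting from 1.
    """
    renumbered = [(attname, attnum + 1 if attnum >= position else attnum)
                  for attname, attnum in attrs]
    renumbered.append((name, position))
    return sorted(renumbered, key=lambda row: row[1])
```

Even in the toy model most rows change; against a real catalog this also means rewriting the stored tuples of every relation involved.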


Jan





Re: (A) native Windows port

From
Lamar Owen
Date:
On Tuesday 02 July 2002 03:14 pm, Jan Wieck wrote:
> Lamar Owen wrote:
> > [...]
> > Martin O has come up with a 'pg_fsck' utility that, IMHO, holds a great
> > deal of promise for seamless binary 'in place' upgrading.  He has been
> > able to write code to read multiple versions' database structures --
> > proving that it CAN be done.

> Unfortunately it's not the on-disk binary format of files that causes
> the big problems. Our dump/initdb/restore sequence is also the solution
> for system catalog changes.

Hmmm.  They get in there via the bki interface, right?  Is there an OID issue 
with these?  Could differential BKI files be possible, with known system 
catalog changes that can be applied via a 'patchdb' utility?  I know pretty 
much how pg_upgrade is doing things now -- and, frankly, it's a little bit of 
a kludge.

Yes, I do understand the things a dump restore does on somewhat of a detailed 
level.  I know the restore repopulates the entries in the system catalogs for 
the restored data, etc, etc.

Currently dump/restore handles the catalog changes.  But by what other means 
could we upgrade the system catalog in place?

Our very extensibility is our weakness for upgrades.  Can it be worked around?  
Anyone have any ideas?

Improving pg_upgrade may be the ticket -- but if the on-disk binary format 
changes (like it has before), then something will have to do the binary 
format translation -- something like pg_fsck. 

Incidentally, pg_fsck, or a program like it, should be in the core 
distribution.  Maybe not named pg_fsck, as our database isn't a filesystem, 
but pg_dbck, or pg_dbcheck, or pg_dbfix, or similar.  Although pg_fsck is
more of a pg_dbdump.

I've seen too many people bitten by upgrades gone awry.  The more we can do in
that regard, the better.

And the Windows user will likely demand it.  I never thought I'd be grateful 
for a Win32 native PostgreSQL port... :-)
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11




Re: (A) native Windows port

From
Jean-Michel POURE
Date:
Le Jeudi 27 Juin 2002 05:48, Christopher Kings-Lynne a écrit :
> I am willing to supply a complete, friendly, powerful and pretty installer
> program, based on NSIS.

Maybe you should contact Dave Page, who wrote pgAdmin2 and the ODBC
installers. Maybe you can both work on the installer.

By the way, when will Dave be added to the main developer list? He wrote 99%
of pgAdmin on his own.

Cheers, Jean-Michel POURE




Re: (A) native Windows port

From
Hannu Krosing
Date:
On Tue, 2002-07-02 at 21:50, Lamar Owen wrote:
> On Tuesday 02 July 2002 03:14 pm, Jan Wieck wrote:
> > Lamar Owen wrote:
> > > [...]
> > > Martin O has come up with a 'pg_fsck' utility that, IMHO, holds a great
> > > deal of promise for seamless binary 'in place' upgrading.  He has been
> > > able to write code to read multiple versions' database structures --
> > > proving that it CAN be done.
> 
> > Unfortunately it's not the on-disk binary format of files that causes
> > the big problems. Our dump/initdb/restore sequence is also the solution
> > for system catalog changes.
> 
> Hmmm.  They get in there via the bki interface, right?  Is there an OID issue 
> with these?  Could differential BKI files be possible, with known system 
> catalog changes that can be applied via a 'patchdb' utility?  I know pretty 
> much how pg_upgrade is doing things now -- and, frankly, it's a little bit of 
> a kludge.
> 
> Yes, I do understand the things a dump restore does on somewhat of a detailed 
> level.  I know the restore repopulates the entries in the system catalogs for 
> the restored data, etc, etc.
> 
> Currently dump/restore handles the catalog changes.  But by what other means 
> could we upgrade the system catalog in place?
> 
> Our very extensibility is our weakness for upgrades.  Can it be worked around?  
> Anyone have any ideas?

Perhaps we can keep an old postgres binary + old backend around and then
use it in single-user mode to do a pg_dump into our running backend.

IIRC Access does its database upgrade by copying the old database to the new.

Our approach could be like

$OLD/postgres -D $OLD_DATA <pg_dump_cmds | $NEW/postgres -D $NEW_DATA

or perhaps, while old backend is still running:

pg_dumpall | path_to_new_backend/bin/postgres


I don't think we should assume that we will be able to do an upgrade
while we have less free space than is currently used by the databases (or at
least by the data - indexes can be added later).

Trying to do an in-place upgrade is an interesting CS project, but any
serious DBA will have backups, so they can do
$ psql < dumpfile

Speeding up COPY FROM could be a good thing (perhaps enabling it to run
without any checks and outside transactions when used in loading dumps)

And home users will have databases small enough that they should have
enough free space to have both old and new version for some time.

What we do need is a more-or-less solid upgrade path using pg_dump.

BTW, how hard would it be to move pg_dump inside the backend (perhaps
using a dynamically loaded function to save space when not used) so that
it could be used like COPY ?

pg> DUMP  table [ WITH 'other cmdline options' ] TO stdout ;

pg> DUMP * [ WITH 'other cmdline options' ] TO stdout ;

----------------
Hannu





Re: (A) native Windows port

From
Bruce Momjian
Date:
Lamar Owen wrote:
> On Tuesday 02 July 2002 03:14 pm, Jan Wieck wrote:
> > Lamar Owen wrote:
> > > [...]
> > > Martin O has come up with a 'pg_fsck' utility that, IMHO, holds a great
> > > deal of promise for seamless binary 'in place' upgrading.  He has been
> > > able to write code to read multiple versions' database structures --
> > > proving that it CAN be done.
> 
> > Unfortunately it's not the on-disk binary format of files that causes
> > the big problems. Our dump/initdb/restore sequence is also the solution
> > for system catalog changes.
> 
> Hmmm.  They get in there via the bki interface, right?  Is there an OID issue 
> with these?  Could differential BKI files be possible, with known system 
> catalog changes that can be applied via a 'patchdb' utility?  I know pretty 
> much how pg_upgrade is doing things now -- and, frankly, it's a little bit of 
> a kludge.

Sure, if it wasn't a kludge, I wouldn't have written it.  ;-)

Does everyone remember my LIKE indexing kludge in gram.y?  Until people
found a way to get it into the optimizer, it did its job.  I guess
that's where pg_upgrade is at this point.

Actually, how can pg_upgrade be improved?  

Also, we have committed to making file format changes for 7.3, so it
seems pg_upgrade will not be useful for that release unless we get some
binary conversion tool working.


> Yes, I do understand the things a dump restore does on somewhat of a detailed 
> level.  I know the restore repopulates the entries in the system catalogs for 
> the restored data, etc, etc.
> 
> Currently dump/restore handles the catalog changes.  But by what other means 
> could we upgrade the system catalog in place?
> 
> Our very extensibility is our weakness for upgrades.  Can it be worked around?  
> Anyone have any ideas?
> 
> Improving pg_upgrade may be the ticket -- but if the on-disk binary format 
> changes (like it has before), then something will have to do the binary 
> format translation -- something like pg_fsck. 

Yep.

> Incidentally, pg_fsck, or a program like it, should be in the core 
> distribution.  Maybe not named pg_fsck, as our database isn't a filesystem, 
> but pg_dbck, or pg_dbcheck, pr pg_dbfix, or similar.  Although pg_fsck is 
> more of a pg_dbdump.
> 
> I've seen too many people bitten by upgrades gone awry.  The more we can do in 
> the regard, the better.

I should mention that 7.3 will have pg_depend, which should make our
post-7.3 reload process much cleaner because we will not have dangling
objects as often.

> And the Windows user will likely demand it.  I never thought I'd be grateful 
> for a Win32 native PostgreSQL port... :-)

Yea, the trick is to get an something working that will require minimal
change from release to release.





Re: (A) native Windows port

From
Bruce Momjian
Date:
Hannu Krosing wrote:
> > Our very extensibility is our weakness for upgrades.  Can it be worked around?  
> > Anyone have any ideas?
> 
> Perhaps we can keep an old postgres binary + old backend around and then
> use it in single-user mode to do a pg_dump into our running backend.

That brings up an interesting idea.  Right now we dump the entire
database out to a file, delete the old database, and load in the file.

What if we could move over one table at a time?  Copy out the table,
load it into the new database, then delete the old table and move on to
the next.  That would allow us to upgrade having free space for just
the largest table.  Another idea would be to record and remove all
indexes in the old database.  That certainly would save disk space
during the upgrade.
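The space saving is easy to quantify in a sketch (sizes in bytes; indexes assumed already dropped, as suggested): copying the whole database needs free space for the sum of all tables, while moving one table at a time, deleting each old table after a successful load, never duplicates more than the largest one:

```python
def peak_extra_space(table_sizes, one_table_at_a_time):
    """Peak additional free space a copy-based upgrade needs."""
    if one_table_at_a_time:
        # Only a single table ever exists in both clusters at once.
        return max(table_sizes)
    # Old and new copies of everything coexist until the old is deleted.
    return sum(table_sizes)
```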

However, the limiting factor is that we don't have a mechanism to have
both databases running at the same time currently.  Seems this may be
the direction to head in.

> BTW, how hard would it be to move pg_dump inside the backend (perhaps
> using a dynamically loaded function to save space when not used) so that
> it could be used like COPY ?
> 
> pg> DUMP  table [ WITH 'other cmdline options' ] TO stdout ;
> 
> pg> DUMP * [ WITH 'other cmdline options' ] TO stdout ;

Interesting idea, but I am not sure what that buys us.  Having pg_dump
separate makes maintenance easier.





Re: (A) native Windows port

From
Hannu Krosing
Date:
On Wed, 2002-07-03 at 17:28, Bruce Momjian wrote:
> Hannu Krosing wrote:
> > > Our very extensibility is our weakness for upgrades.  Can it be worked around?  
> > > Anyone have any ideas?
> > 
> > Perhaps we can keep an old postgres binary + old backend around and then
> > use it in single-user mode to do a pg_dump into our running backend.
> 
> That brings up an interesting idea.  Right now we dump the entire
> database out to a file, delete the old database, and load in the file.
> 
> What if we could move over one table at a time?  Copy out the table,
> load it into the new database, then delete the old table and move on to
> the next.  That would allow use to upgrade having free space for just
> the largest table.  Another idea would be to record and remove all
> indexes in the old database.  That certainly would save disk space
> during the upgrade.
> 
> However, the limiting factor is that we don't have a mechanism to have
> both databases running at the same time currently. 

How so ?

AFAIK I can run as many backends as I like (up to some practical limit)
on the same computer at the same time, as long as they use different
ports and different data directories.

> Seems this may be
> the direction to head in.
> 
> > BTW, how hard would it be to move pg_dump inside the backend (perhaps
> > using a dynamically loaded function to save space when not used) so that
> > it could be used like COPY ?
> > 
> > pg> DUMP  table [ WITH 'other cmdline options' ] TO stdout ;
> > 
> > pg> DUMP * [ WITH 'other cmdline options' ] TO stdout ;
> 
> Interesting idea, but I am not sure what that buys us.  Having pg_dump
> separate makes maintenance easier.

can pg_dump connect to a single-user-mode backend?

--------------------
Hannu





Re: (A) native Windows port

From
Bruce Momjian
Date:
Hannu Krosing wrote:
> > 
> > However, the limiting factor is that we don't have a mechanism to have
> > both databases running at the same time currently. 
> 
> How so ?
> 
> AFAIK I can run as many backends as I like (up to some practical limit)
> on the same computer at the same time, as long as they use different
> ports and different data directories.

We don't have an automated system for doing this.  Certainly it is done
all the time.

> > Interesting idea, but I am not sure what that buys us.  Having pg_dump
> > separate makes maintenance easier.
> 
> can pg_dump connect to single-user-mode backend ?

Uh, no, I don't think so.





Re: (A) native Windows port

From
Lamar Owen
Date:
On Wednesday 03 July 2002 12:09 pm, Bruce Momjian wrote:
> Hannu Krosing wrote:
> > AFAIK I can run as many backends as I like (up to some practical limit)
> > on the same computer at the same time, as long as they use different
> > ports and different data directories.

> We don't have an automated system for doing this.  Certainly it is done
> all the time.

Good.  Dialog.  This is better than what I am used to when I bring up 
upgrading. :-)

Bruce, pg_upgrade isn't as kludgey as what I have been doing with the RPMset 
for these nearly three years.

No, what I envisioned was a standalone dumper that can produce dump output 
without having a backend at all.  If this dumper knows about the various 
binary formats, and knows how to get my data into a form I can then restore 
reliably, I will be satisfied.  If it can be easily automated so much the 
better.  Doing it table by table would be ok as well.

I'm looking for a sequence such as:

----
PGDATA=location/of/data/base
TEMPDATA=location/of/temp/space/on/same/file/system

mv $PGDATA/* $TEMPDATA
initdb -D $PGDATA
pg_dbdump $TEMPDATA |pg_restore  {with its associated options, etc}

With an rm -rf of $TEMPDATA much further down the pike.....

Keys to this working:
1.)    Must not require the old version executable backend.  There are a number 
of reasons why this might be, but the biggest is due to the way much 
upgrading works in practice -- the old executables are typically gone by the 
time the new package is installed.

2.)    Uses pg_dbdump of the new version.  This dumper can be tailored to provide 
the input pg_restore wants to see.  The dump-restore sequence has always had 
dumped-data version mismatch as its biggest problem -- there have been issues 
before where you would have to install the new version of pg_dump to run 
against the old backend.  This is unacceptable in the real world of binary 
packages.

One other usability note: why can't postmaster perform the steps of an initdb 
if -D points to an empty directory?  It's not that much code, is it?  (I know 
that one extra step isn't backbreaking, but I'm looking at this from a rank 
newbie's point of view -- or at least I'm trying to look at it in that way, 
as it's been a while since I was a rank newbie at PostgreSQL)  Oh well, just 
a random thought.

But I believe a backend-independent data dumper would be very useful in many 
contexts, particularly those where a backend cannot be run for whatever 
reason, but you need your data (corrupted system catalogs, high system load, 
whatever).  Upgrading is just one of those contexts.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11






Re: (A) native Windows port

From
Bruce Momjian
Date:
Lamar Owen wrote:
> On Wednesday 03 July 2002 12:09 pm, Bruce Momjian wrote:
> > Hannu Krosing wrote:
> > > AFAIK I can run as many backends as I like (up to some practical limit)
> > > on the same computer at the same time, as long as they use different
> > > ports and different data directories.
> 
> > We don't have an automated system for doing this.  Certainly it is done
> > all the time.
> 
> Good.  Dialog.  This is better than what I am used to when I bring up 
> upgrading. :-)
> 
> Bruce, pg_upgrade isn't as kludgey as what I have been doing with the RPMset 
> for these nearly three years.
> 
> No, what I envisioned was a standalone dumper that can produce dump output 
> without having a backend at all.  If this dumper knows about the various 
> binary formats, and knows how to get my data into a form I can then restore 
> reliably, I will be satisfied.  If it can be easily automated so much the 
> better.  Doing it table by table would be ok as well.

The problem with a standalone dumper is that you would have to recode
this for every release, with little testing possible.  Having the old
backend active saves us that step.  If we get it working, we can use it
over and over again for each release with little work on our part.

> Keys to this working:
> 1.)    Must not require the old version executable backend.  There are a number 
> of reasons why this might be, but the biggest is due to the way much 
> upgrading works in practice -- the old executables are typically gone by the 
> time the new package is installed.

Oh, that is a problem.  We would have to require the old executables.

> 2.)    Uses pg_dbdump of the new version.  This dumper can be tailored to provide 
> the input pg_restore wants to see.  The dump-restore sequence has always had 
> dumped-data version mismatch as its biggest problem -- there have been issues 
> before where you would have to install the new version of pg_dump to run 
> against the old backend.  This is unacceptable in the real world of binary 
> packages.
> 
> One other usability note: why can't postmaster perform the steps of an initdb 
> if -D points to an empty directory?  It's not that much code, is it?  (I know 
> that one extra step isn't backbreaking, but I'm looking at this from a rank 
> newbie's point of view -- or at least I'm trying to look at it in that way, 
> as it's been a while since I was a rank newbie at PostgreSQL)  Oh well, just 
> a random thought.

The issue is that if you have PGDATA pointed to the wrong place, it
creates a new instance automatically.  Could be strange for people, but
we could prompt them to run initdb I guess.
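The prompt-instead-of-auto-initdb behaviour could be sketched like this, with the postmaster classifying the -D target from its directory listing (PG_VERSION marking an initialized cluster) and never creating one implicitly:

```python
def startup_check(entries):
    """Decide what to do with the -D directory; entries is its listing
    (what os.listdir would return)."""
    if not entries:
        # Empty: tell the user to run initdb rather than doing it for them.
        return "empty directory: run initdb here first"
    if "PG_VERSION" not in entries:
        # Non-empty but not a cluster: refuse, to avoid clobbering anything.
        return "not a PostgreSQL data directory: refusing to start"
    return "ok"
```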

> But I believe a backend-independent data dumper would be very useful in many 
> contexts, particularly those where a backend cannot be run for whatever 
> reason, but you need your data (corrupted system catalogs, high system load, 
> whatever).  Upgrading is just one of those contexts.

Yes, but who wants to write one of those for every release?  That is
where we get stuck; with our limited resources, is it really desirable to
encourage people to work on it?





Re: (A) native Windows port

From
Oliver Elphick
Date:
On Fri, 2002-07-05 at 17:39, Lamar Owen wrote:
> No, what I envisioned was a standalone dumper that can produce dump output 
> without having a backend at all.  If this dumper knows about the various 
> binary formats, and knows how to get my data into a form I can then restore 
> reliably, I will be satisfied.  If it can be easily automated so much the 
> better.  Doing it table by table would be ok as well.
...
> 1.)    Must not require the old version executable backend.  There are a number 
> of reasons why this might be, but the biggest is due to the way much 
> upgrading works in practice -- the old executables are typically gone by the 
> time the new package is installed.
> 
> 2.)    Uses pg_dbdump of the new version.  This dumper can be tailored to provide 
> the input pg_restore wants to see.  The dump-restore sequence has always had 
> dumped-data version mismatch as its biggest problem -- there have been issues 
> before where you would have to install the new version of pg_dump to run 
> against the old backend.  This is unacceptable in the real world of binary 
> packages.

I concur completely!

As a package maintainer, this would remove my biggest problem.


Oliver Elphick
(Debian maintainer)






Re: (A) native Windows port

From
Andrew Sullivan
Date:
On Fri, Jul 05, 2002 at 12:39:13PM -0400, Lamar Owen wrote:

> One other usability note: why can't postmaster perform the steps of
> an initdb if -D points to an empty directory?  It's not that much
> code, is it?  (I know that one extra step isn't backbreaking, but
> I'm looking at this from a rank newbie's point of view -- or at
> least I'm trying to look at it in that way, as it's been a while
> since I was a rank newbie at PostgreSQL) Oh well, just a random
> thought.

Rank newbies shouldn't be protected in this way, partly because if
something goes wrong, _they won't know what to do_.  Please, please,
don't be putting automagic, database destroying functions like that
into the postmaster.  It's a sure way to cause a disaster at some
point.

A

-- 
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110





Re: (A) native Windows port

From
"Dann Corbit"
Date:
On generic recovery...

What is wrong with this strategy...

0. Put the database in single user mode.

1. Dump the Schema, with creation order properly defined, and with all
constraints written to a separate file.  (IOW, one file contains the
bare tables with no index, constraint or trigger stuff, and the other
contains all the RI stuff.)

2. Dump the tables (one by one) to text files with "copy"

3. Create a new database in a new location.

4. Feed it the bare table schema

5. Pump in the table data using "copy" from the saved text files

6. Run the RI script to rebuild index, trigger, PKey, FKey, etc.

I find that is the most trouble-free way to do it with most DBMS
systems.

An attempted dump from DBMS X.Y and a load into DBMS (X+1).Y is always a
pile of trouble waiting to happen -- no matter what the system is.
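That sequence can be sketched in shell.  The pg_dump flags below are an
assumption: they come from modern releases (--section to split schema
output into pre-data and post-data parts appeared much later), so treat
them as illustrative, not as what any particular version accepts.

```shell
#!/bin/sh
# Steps 0-6 above, as a sketch.  Database names, file names and pg_dump
# options are assumptions for illustration; check your version's
# pg_dump documentation for the exact flags.
OLD=mydb
NEW=mydb_new

# 1. Bare tables only (no indexes, constraints or triggers) ...
pg_dump --schema-only --section=pre-data "$OLD" > bare_tables.sql
# ... and all the RI/index/trigger stuff in a separate file.
pg_dump --schema-only --section=post-data "$OLD" > constraints.sql

# 2. Table data dumped as COPY statements.
pg_dump --data-only "$OLD" > data.sql

# 3-6. New database, bare schema, data, then constraints last.
createdb "$NEW"
psql -f bare_tables.sql "$NEW"
psql -f data.sql "$NEW"
psql -f constraints.sql "$NEW"
```

Loading the constraints last avoids per-row RI and index maintenance
while the data streams in, which is usually the slow part of a restore.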




Re: (A) native Windows port

From
Tom Lane
Date:
Andrew Sullivan <andrew@libertyrms.info> writes:
> On Fri, Jul 05, 2002 at 12:39:13PM -0400, Lamar Owen wrote:
>> One other usability note: why can't postmaster perform the steps of
>> an initdb if -D points to an empty directory?

> Rank newbies shouldn't be protected in this way, partly because if
> something goes wrong, _they won't know what to do_.  Please, please,
> don't be putting automagic, database destroying functions like that
> into the postmaster.

I agree completely with Andrew, even though an auto-initdb on an empty
directory presumably won't destroy any data.  What it *does* do is
effectively mask a DBA error.  We'll be getting panic-stricken support
calls/emails saying "all my databases are gone!  Postgres sucks!" when
the problem is just that PG was restarted with the wrong -D pointer.
The existing behavior points that out loud and clear, in a context
where the DBA shouldn't have too much trouble figuring out what he
did wrong.
        regards, tom lane




Re: (A) native Windows port

From
Lamar Owen
Date:
On Saturday 06 July 2002 11:15 am, Tom Lane wrote:
> Andrew Sullivan <andrew@libertyrms.info> writes:
> > On Fri, Jul 05, 2002 at 12:39:13PM -0400, Lamar Owen wrote:
> >> One other usability note: why can't postmaster perform the steps of
> >> an initdb if -D points to an empty directory?

> > Rank newbies shouldn't be protected in this way, partly because if
> > something goes wrong, _they won't know what to do_.

> I agree completely with Andrew, even though an auto-initdb on an empty
> directory presumably won't destroy any data.

Good grief, I was just asking a question. :-)

>  What it *does* do is
> effectively mask a DBA error.

This is a satisfactory answer.  In the context of the RPM distribution, if the 
initscript is used the DBA error probability is greatly reduced, thus the 
initscript can safely initdb.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11




Re: (A) native Windows port

From
"Marc G. Fournier"
Date:
On Sat, 6 Jul 2002, Tom Lane wrote:

> Andrew Sullivan <andrew@libertyrms.info> writes:
> > On Fri, Jul 05, 2002 at 12:39:13PM -0400, Lamar Owen wrote:
> >> One other usability note: why can't postmaster perform the steps of
> >> an initdb if -D points to an empty directory?
>
> > Rank newbies shouldn't be protected in this way, partly because if
> > something goes wrong, _they won't know what to do_.  Please, please,
> > don't be putting automagic, database destroying functions like that
> > into the postmaster.
>
> I agree completely with Andrew, even though an auto-initdb on an empty
> directory presumably won't destroy any data.  What it *does* do is
> effectively mask a DBA error.  We'll be getting panic-stricken support
> calls/emails saying "all my databases are gone!  Postgres sucks!" when
> the problem is just that PG was restarted with the wrong -D pointer. The
> existing behavior points that out loud and clear, in a context where the
> DBA shouldn't have too much trouble figuring out what he did wrong.

Okay, I'm sitting on the fence on this one ... but, as DBA for several
PgSQL installs on at least a half dozen machines or more, if someone
restarts PG with the wrong -D pointer, they haven't set up their machine
to live through a reboot ... the first thing any DBA *should* be doing
after they have 'initdb'd their system is add the appropriate start-up
scripts for after the reboot ...

Also, what is the difference between forgetting where you put it in an
initdb or on the first postmaster?  Why not put in a 'safety'?  If you
start up postmaster with -D on a directory that doesn't yet exist, it
prompts the DBA as to whether they are certain that they wish to do this?

Just thoughts ... I'm happy enough with initdb *shrug*
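Marc's 'safety' could be as simple as checking for the marker file that
initdb leaves behind before doing anything else.  A minimal sketch in
portable shell (the function name and messages are made up for
illustration; this is not actual postmaster code):

```shell
#!/bin/sh
# Refuse to treat a directory as a cluster unless it was initdb'd:
# initdb writes a PG_VERSION file at the top of the data directory,
# so its absence means "not a cluster" rather than "create one".
check_pgdata() {
    pgdata=$1
    if [ ! -d "$pgdata" ] || [ ! -f "$pgdata/PG_VERSION" ]; then
        echo "$pgdata does not look like an initialized cluster;" >&2
        echo "run initdb there first if that is really what you want" >&2
        return 1
    fi
    return 0
}
```

An interactive "are you certain?" prompt could then be layered on top of
the failing case, keeping the destructive step (initdb) an explicit,
confirmed action rather than an automatic one.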





Re: (A) native Windows port

From
Tom Lane
Date:
Lamar Owen <lamar.owen@wgcr.org> writes:
>> What it *does* do is effectively mask a DBA error.

> This is a satisfactory answer.  In the context of the RPM distribution, if the 
> initscript is used the DBA error probability is greatly reduced, thus the 
> initscript can safely initdb.

Fair enough --- if the upper-layer script thinks it has enough safeties
in place, let it auto-initdb.  I just don't think the postmaster should
do that.
        regards, tom lane




Re: (A) native Windows port

From
"Zeugswetter Andreas SB SD"
Date:
> No, what I envisioned was a standalone dumper that can produce dump output
> without having a backend at all.  If this dumper knows about the various
> binary formats, and knows how to get my data into a form I can then restore
> reliably, I will be satisfied.  If it can be easily automated so much the
> better.  Doing it table by table would be ok as well.

Unless it dumps binary representation of columns, a standalone dumper
would still need to load all the output function shared libs for custom types
(or not support custom types which would imho not be good).

Andreas




Re: (A) native Windows port

From
Jan Wieck
Date:
Zeugswetter Andreas SB SD wrote:
> 
> > No, what I envisioned was a standalone dumper that can produce dump output
> > without having a backend at all.  If this dumper knows about the various
> > binary formats, and knows how to get my data into a form I can then restore
> > reliably, I will be satisfied.  If it can be easily automated so much the
> > better.  Doing it table by table would be ok as well.
> 
> Unless it dumps binary representation of columns, a standalone dumper
> would still need to load all the output function shared libs for custom types
> (or not support custom types which would imho not be good).

And now we change the internal representation of NUMERIC to a short
integer array holding the number in base 10,000 and what exactly does
the standalone dumpster do with our data?

Another good example: let's add a field to some parsenode struct (was
there a release where this didn't happen?). This causes the NodeOut()
results to become a little different, which actually changes the textual
content of a likely toasted pg_rewrite attribute. Stored compressed and
sliced. I am quite a bit familiar with TOAST and the rewrite system.
Yet, someone has to help me a little to understand how we can do this
conversion in binary on the fly with an external tool. Especially where
this conversion results in different raw and compressed sizes of the
TOASTed attribute, which has to propagate up into the TOAST reference in
the main table ... not to speak of possible required block splits in the
toast table and index because of needing one more slice!


Jan

-- 

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: (A) native Windows port

From
Hannu Krosing
Date:
On Tue, 2002-07-09 at 00:20, Jan Wieck wrote:
> Zeugswetter Andreas SB SD wrote:
> > 
> > > No, what I envisioned was a standalone dumper that can produce dump output
> > > without having a backend at all.  If this dumper knows about the various
> > > binary formats, and knows how to get my data into a form I can then restore
> > > reliably, I will be satisfied.  If it can be easily automated so much the
> > > better.  Doing it table by table would be ok as well.
> > 
> > Unless it dumps binary representation of columns, a standalone dumper
> > would still need to load all the output function shared libs for custom types
> > (or not support custom types which would imho not be good).
> 
> And now we change the internal representation of NUMERIC to a short
> integer array holding the number in base 10,000 and what exactly does
> the standalone dumpster do with our data?
>
> Another good example: let's add a field to some parsenode struct (was
> there a release where this didn't happen?). This causes the NodeOut()
> results to become a little different, which actually changes the textual
> content of a likely toasted pg_rewrite attribute. Stored compressed and
> sliced. I am quite a bit familiar with TOAST and the rewrite system.
> Yet, someone has to help me a little to understand how we can do this
> conversion in binary on the fly with an external tool. Especially where
> this conversion results in different raw and compressed sizes of the
> TOASTed attribute, which has to propagate up into the TOAST reference in
> the main table ... not to speak of possible required block splits in the
> toast table and index because of needing one more slice!

This brings us back to my original proposal : this "external tool" needs
to be either a full postgres backend with added DUMP command or
something that can use a (possibly single-user) backend as either a 
library or, yes, a "backend" ;)

---------------------
Hannu






Re: (A) native Windows port

From
"Matthew T. O'Connor"
Date:
> > Keys to this working:
> > 1.)    Must not require the old version executable backend.  There are a 
number 
> > of reasons why this might be, but the biggest is due to the way much 
> > upgrading works in practice -- the old executables are typically gone by 
the 
> > time the new package is installed.
> 
> Oh, that is a problem.  We would have to require the old executables.

Could this be solved with packaging?  Meaning, can postmasters from old
versions be packaged with a new release strictly for the purpose of
upgrading?  It is my understanding that the only old executable needed
is the postmaster -- is that correct?  Perhaps this also requires adding
functionality so that pg_dump can run against a single-user postmaster.

Example: When PG 7.3 is released, the RPM / deb / setup.exe include the 
postmaster binary for v 7.2 (perhaps two or three older versions...).  An 
upgrade script is included that does the automatic dump / restore 
described earlier in this thread.  Effectively, you are using old 
versions of the postmaster as your standalone dumper. 

I think this could sidestep the problem of having to create / test / 
maintain a new version of a dumper or pg_upgrade for every release.

By default perhaps the postmaster for the previous version of postgres 
is included, and postmasters from older versions are distributed in 
separate packages, so if I am still running 6.5.3 and I want to upgrade 
to 7.3, I have to install the 6.5.3 upgrade package.  Or perhaps there 
is one pg_upgrade rpm package that includes every postmaster since 6.4.  
This would allow the upgrade script to know that all backends are 
available to it depending on what it finds in PG_VERSION, and it also 
allows the admin to remove them all easily once they are no longer 
needed.




Re: (A) native Windows port

From
Oliver Elphick
Date:
On Tue, 2002-07-09 at 01:30, Matthew T. O'Connor wrote:
> > Oh, that is a problem.  We would have to require the old executables.
> 
> Could this be solved with packaging? Meaning can postmasters from old versions 
> be packed with a new release strictly for the purpose of upgrading?  It is my 
> understanding that the only old executable needed is the postmaster is that 
> correct?  Perhaps this also requires adding functionality so that pg_dump can 
> run against a singer user postmaster.
> 
> Example: When PG 7.3 is released, the RPM / deb / setup.exe include the 
> postmaster binary for v 7.2 (perhaps two or three older versions...). 

That isn't usable for Debian.  A package must be buildable from source;
so I would have to include separate (though possibly cut-down) source
for n previous packages.  It's a horrid prospect and a dreadful kludge
of a solution - a maintainer's nightmare.

Oliver  





Re: (A) native Windows port

From
Lamar Owen
Date:
On Monday 08 July 2002 03:20 pm, Jan Wieck wrote:
> Zeugswetter Andreas SB SD wrote:
> > Unless it dumps binary representation of columns, a standalone dumper
> > would still need to load all the output function shared libs for custom
> > types (or not support custom types which would imho not be good).

> And now we change the internal representation of NUMERIC to a short
> integer array holding the number in base 10,000 and what exactly does
> the standalone dumpster do with our data?

What does a standard dump/restore do then as well?  Is the restore process 
complicated by a rebuild of the function(s) involved in custom types?  This, 
IMHO, is a pathological case even for a standard dump/restore.  Someone doing 
this sort of thing is going to have more to do than a simple package upgrade.

> Another good example: let's add a field to some parsenode struct (was
> there a release where this didn't happen?). This causes the NodeOut()
> results to become a little different, which actually changes the textual
> content of a likely toasted pg_rewrite attribute. Stored compressed and
> sliced. I am quite a bit familiar with TOAST and the rewrite system.
> Yet, someone has to help me a little to understand how we can do this
> conversion in binary on the fly with an external tool. Especially where
> this conversion results in different raw and compressed sizes of the
> TOASTed attribute, which has to propagate up into the TOAST reference in
> the main table ... not to speak of possible required block splits in the
> toast table and index because of needing one more slice!

This is more difficult, certainly.  Martijn, how does pg_fsck handle such 
things now?

Again, this tool has utility outside upgrading.  And I'm talking about dumping 
the binary down to ASCII to be restored, not binary to binary on the fly.

This is the best dialog yet on the issue of upgrading.  Keep it coming! :-)
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: (A) native Windows port

From
Hannu Krosing
Date:
On Tue, 2002-07-09 at 17:19, Lamar Owen wrote:
> On Monday 08 July 2002 03:20 pm, Jan Wieck wrote:
> > Another good example: let's add a field to some parsenode struct (was
> > there a release where this didn't happen?). This causes the NodeOut()
> > results to become a little different, which actually changes the textual
> > content of a likely toasted pg_rewrite attribute. Stored compressed and
> > sliced. I am quite a bit familiar with TOAST and the rewrite system.
> > Yet, someone has to help me a little to understand how we can do this
> > conversion in binary on the fly with an external tool. Especially where
> > this conversion results in different raw and compressed sizes of the
> > TOASTed attribute, which has to propagate up into the TOAST reference in
> > the main table ... not to speak of possible required block splits in the
> > toast table and index because of needing one more slice!
> 
> This is more difficult, certainly.  Martijn, how does pg_fsck handle such 
> things now?
> 
> Again, this tool has utility outside upgrading.  And I'm talking about dumping 
> the binary down to ASCII to be restored, not binary to binary on the fly.

You seem to be talking about pg_dump + old backend, no ? ;)

For me it seems a given that you need old binary-reading code to read
old binary format data. The most convenient place to keep it is inside
an old backend. 

It may be possible to migrate simple user table data from old version to
new, but it gets complicated real fast for most other things, especially
for stuff in system tables.

---------------
Hannu



Re: (A) native Windows port

From
Lamar Owen
Date:
On Tuesday 09 July 2002 11:41 am, Hannu Krosing wrote:
> The old postmaster should not be built/distributed. As it is for
> _upgrading_ only, you just have to _keep_ it when doing an upgrade, not
> build a new "old" one ;)

Let me reiterate one thing about this.  In the midst of a total OS upgrade, 
during which PostgreSQL is being upgraded as well (the new OS release 
includes a 'better' PostgreSQL), you also get library upgrades.  If the 
upgrade is from an old enough version of the OS, the old postmaster/postgres 
may not even be able to execute AT ALL.

Some may say that this is a problem for the vendor.  Well I know of one vendor 
that has thrown up its hands in disgust over our lack of upgradability that 
they have now quit supporting even the kludgy semi-automatic upgrade process 
I did up three years ago.  They will refuse to support any mechanism that 
requires any portion of an old package to remain around.  The new package 
must be self-contained and must be able to upgrade the old data, or they will 
not accept it.

Their statement now is simply that PostgreSQL upgrading is broken; dump before 
upgrading and complain to the PostgreSQL developers.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: (A) native Windows port

From
Oliver Elphick
Date:
On Tue, 2002-07-09 at 16:41, Hannu Krosing wrote:
> On Tue, 2002-07-09 at 13:48, Oliver Elphick wrote:
> > On Tue, 2002-07-09 at 01:30, Matthew T. O'Connor wrote:
> > > Example: When PG 7.3 is released, the RPM / deb / setup.exe include the 
> > > postmaster binary for v 7.2 (perhaps two or three older versions...). 
> > 
> > That isn't usable for Debian.  A package must be buildable from source;
> > so I would have to include separate (though possibly cut-down) source
> > for n previous packages.  It's a horrid prospect and a dreadful kludge
> > of a solution - a maintainer's nightmare.
> 
> The old postmaster should not be built/distributed. As it is for
> _upgrading_ only, you just have to _keep_ it when doing an upgrade, not
> build a new "old" one ;)

No, it doesn't work like that.  You cannot rely on anything's being left
from an old distribution; apt is quite likely to delete it altogether
before installing the new version (to enable dependencies to be
satisfied).  At present I have the preremoval script copy the old
binaries to a special location in case they will be needed, but that
fails if the version is very old (and doesn't contain that code), and
it's a very fragile mechanism.

I never have understood why the basic table structure changes so much
that it can't be read; just what is involved in getting the ability to
read old versions?




Re: (A) native Windows port

From
Hannu Krosing
Date:
On Tue, 2002-07-09 at 17:49, Oliver Elphick wrote:
> On Tue, 2002-07-09 at 16:41, Hannu Krosing wrote:
> > On Tue, 2002-07-09 at 13:48, Oliver Elphick wrote:
> > > On Tue, 2002-07-09 at 01:30, Matthew T. O'Connor wrote:
> > > > Example: When PG 7.3 is released, the RPM / deb / setup.exe include the 
> > > > postmaster binary for v 7.2 (perhaps two or three older versions...). 
> > > 
> > > That isn't usable for Debian.  A package must be buildable from source;
> > > so I would have to include separate (though possibly cut-down) source
> > > for n previous packages.  It's a horrid prospect and a dreadful kludge
> > > of a solution - a maintainer's nightmare.
> > 
> > The old postmaster should not be built/distributed. As it is for
> > _upgrading_ only, you just have to _keep_ it when doing an upgrade, not
> > build a new "old" one ;)
> 
> No, it doesn't work like that.  You cannot rely on anything's being left
> from an old distribution; apt is quite likely to delete it altogether
> before installing the new version (to enable dependencies to be
> satisfied).  At present I have the preremoval script copy the old
> binaries to a special location in case they will be needed, but that
> fails if the version is very old (and doesn't contain that code), and
> it's a very fragile mechanism.
> 
> I never have understood why the basic table structure changes so much
> that it can't be read; just what is involved in getting the ability to
> read old versions?

The big change was from 6.x to 7.x where a chunk of data moved from end
of page to start of page and the tableoid column was added. Otherwise the
table structure is quite simple. The difficulties with user _data_ are
mainly because of binary format changes for some types and such.

But I still can't see how having a binary dumper that does mostly
the work of [ old_backend -c "COPY tablex TO STDOUT" ] will help us here. 

IIRC the main difficulties in upgrading have always been elsewhere, like
migrating always changing system table data.

----------
Hannu


Re: (A) native Windows port

From
Oliver Elphick
Date:
On Tue, 2002-07-09 at 18:05, Hannu Krosing wrote:
> The big change was from 6.x to 7.x where a chunk of data moved from end
> of page to start of page and tableoid column was added. Otherways the
> table structure is quite simple. The difficulties with user _data_ can
> be mainly because of binary format changes for some types and such.
> 
> But I still can't see how will having a binary dumper that does mostly
> the work of [ old_backend -c "COPY tablex TO STDOUT" ] help us here. 
> 
> IIRC the main difficulties in upgrading have always been elsewhere, like
> migrating always changing system table data.

The main problem is getting access to the user data after an upgrade. 
There's no particular problem in having to do an initdb, though it is an
inconvenience; the difficulty is simply that any packaged distribution
(rpm, deb, xxx) is going to have to replace all the old binaries.  So by
the time the package is ready to do the database upgrade, it has
destroyed the means of dumping the old data.  Lamar and I have to jump
through hoops to get round this -- small hoops with flaming rags round
them!

The current upgrade process for PostgreSQL is founded on the idea that
people build from source.  With binary distributions, half the users
wouldn't know what to do with source; they expect (and are entitled to
expect) that an upgrade will progress without the need for significant
intervention on their part.  PostgreSQL makes this really difficult for
the package maintainers, and this has a knock-on effect on the
reliability of the upgrade process and thus on PostgreSQL itself.




Re: (A) native Windows port

From
Hannu Krosing
Date:
On Tue, 2002-07-09 at 18:30, Oliver Elphick wrote:
> On Tue, 2002-07-09 at 18:05, Hannu Krosing wrote:
> > The big change was from 6.x to 7.x where a chunk of data moved from end
> > of page to start of page and tableoid column was added. Otherways the
> > table structure is quite simple. The difficulties with user _data_ can
> > be mainly because of binary format changes for some types and such.
> > 
> > But I still can't see how will having a binary dumper that does mostly
> > the work of [ old_backend -c "COPY tablex TO STDOUT" ] help us here. 
> > 
> > IIRC the main difficulties in upgrading have always been elsewhere, like
> > migrating always changing system table data.
> 
> The main problem is getting access to the user data after an upgrade. 

Can't it be dumped in pre-upgrade script ?

> There's no particular problem in having to do an initdb, though it is an
> inconvenience; the difficulty is simply that any packaged distribution
> (rpm, deb, xxx) is going to have to replace all the old binaries.  So by
> the time the package is ready to do the database upgrade, it has
> destroyed the means of dumping the old data.  Lamar and I have to jump
> through hoops to get round this -- small hoops with flaming rags round
> them!

IMHO, if rpm and apt can't run a pre-install script before deleting the
old binaries they are going to replace/upgrade then you should complain
to authors of rpm and apt. 

It seems that they are doing things in the wrong order. 

The right order should of course be

1) run pre-upgrade (pg_dumpall >dumpfile)
2) upgrade
3) run post-upgrade (initdb; psql < dumpfile)

---------------
Hannu



Re: (A) native Windows port

From
Lamar Owen
Date:
On Tuesday 09 July 2002 01:46 pm, Hannu Krosing wrote:
> On Tue, 2002-07-09 at 18:30, Oliver Elphick wrote:
> > The main problem is getting access to the user data after an upgrade.

> Can't it be dumped in pre-upgrade script ?

The pre-upgrade script is run in an environment that isn't robust enough to 
handle that.  What if you run out of disk space during the dump?  What if a 
postmaster is running -- and many people stop their postmaster before 
upgrading their version of PostgreSQL?

Besides, at least in the case of the RPM, during OS upgrade time the %pre 
scriptlet (the one you allude to) isn't running in a system with all the 
normal tools available.  Nor is there a postmaster running.  Due to a largish 
RAMdisk, a running postmaster might cause all manner of problems.

And an error in the scriptlet could potentially cause the OS upgrade to abort 
in midstream -- not a nice thing to do to users, having a package during 
upgrade abort their OS upgrade when it is a little over half through, and in 
an unbootable state.... No, any dumping of data cannot happen during the %pre 
script -- too many issues there.

> IMHO, if rpm and apt can't run a pre-install script before deleting the
> old binaries they are going to replace/upgrade then you should complain
> to authors of rpm and apt.

Oh, so it's RPM's and APT's problem that we require so many resources during 
upgrade.... :-)

> The right order should of course be

> 1) run pre-upgrade (pg_dumpall >dumpfile)
> 2) upgrade
> 3) run post-upgrade (initdb; psql < dumpfile)

All but the first step work fine.  The first step is impossible in the 
environment in which the %pre script runs.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: (A) native Windows port

From
Hannu Krosing
Date:
On Tue, 2002-07-09 at 13:48, Oliver Elphick wrote:
> On Tue, 2002-07-09 at 01:30, Matthew T. O'Connor wrote:
> > > Oh, that is a problem.  We would have to require the old executables.
> > 
> > Could this be solved with packaging? Meaning can postmasters from old versions 
> > be packed with a new release strictly for the purpose of upgrading?  It is my 
> > understanding that the only old executable needed is the postmaster is that 
> > correct?  Perhaps this also requires adding functionality so that pg_dump can 
> > run against a singer user postmaster.
> > 
> > Example: When PG 7.3 is released, the RPM / deb / setup.exe include the 
> > postmaster binary for v 7.2 (perhaps two or three older versions...). 
> 
> That isn't usable for Debian.  A package must be buildable from source;
> so I would have to include separate (though possibly cut-down) source
> for n previous packages.  It's a horrid prospect and a dreadful kludge
> of a solution - a maintainer's nightmare.

The old postmaster should not be built/distributed. As it is for
_upgrading_ only, you just have to _keep_ it when doing an upgrade, not
build a new "old" one ;)

--------------
Hannu





Re: (A) native Windows port

From
Peter Eisentraut
Date:
Oliver Elphick writes:

> I never have understood why the basic table structure changes so much
> that it can't be read; just what is involved in getting the ability to
> read old versions?

The problem in an extensible system such as PostgreSQL is that virtually
every feature change is reflected by a change in the structure of the
system catalogs.  It wouldn't be such a terribly big problem in theory to
make the backend handle these changes, but you'd end up with a huge bunch
of

if (dataVersion == 1) do this;
else if (dataVersion == 2) do that;
...

which would become slow and unwieldy, and would scare away developers.
That would of course be a self-serving scheme, because if the development
progress slowed down, you would have to update less frequently.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: (A) native Windows port

From
Hannu Krosing
Date:
On Tue, 2002-07-09 at 22:10, Lamar Owen wrote:
> On Tuesday 09 July 2002 01:46 pm, Hannu Krosing wrote:
> > On Tue, 2002-07-09 at 18:30, Oliver Elphick wrote:
> > > The main problem is getting access to the user data after an upgrade.
> 
> > Can't it be dumped in pre-upgrade script ?
> 
> The pre-upgrade script is run in an environment that isn't robust enough to 
> handle that.  What if you run out of disk space during the dump? 

You can either check beforehand or abort and delete the offending
dumpfile.

> What if a postmaster is running -- and many people stop their postmaster before 
> upgrading their version of PostgreSQL?

It is quite easy to both check for a running postmaster and start/stop
one.

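Both checks are a few lines of portable shell.  A sketch (the function
names and the space heuristic are illustrative assumptions, not from
any actual packaging scriptlet):

```shell
#!/bin/sh
# Two pre-upgrade safety checks: is a postmaster still running against
# the data directory, and is there room for the dump?  A running
# postmaster leaves a postmaster.pid lock file at the top of its data
# directory.

no_postmaster_running() {
    [ ! -f "$1/postmaster.pid" ]
}

# Crude space check: assume the plain-text dump needs no more kilobytes
# than the cluster itself occupies on disk.
enough_free_space() {
    need=$(du -sk "$1" | awk '{print $1}')
    have=$(df -Pk "$2" | awk 'NR==2 {print $4}')
    [ "$have" -ge "$need" ]
}
```

A %pre scriptlet could run these first and abort cleanly, instead of
failing halfway through pg_dumpall.
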
> Besides, at least in the case of the RPM, during OS upgrade time the %pre 
> scriptlet (the one you allude to) isn't running in a system with all the 
> normal tools available.

I don't think that postmaster needs very many normal tools - it should
be quite independent, except for compat libs for larger version
upgrades.

> Nor is there a postmaster running.  Due to a largish 
> RAMdisk, a postmaster running might cause all manners of problems.

I don't know anything about the largish RAMdisk; what I meant was that
postmaster (a 2.7 MB program with ~4 MB RAM footprint) could include the
functionality of pg_dump and be runnable in single-user mode for dumping
old databases. 

> And an error in the scriptlet could potentially cause the OS upgrade to abort 
> in midstream -- not a nice thing to do to users, having a package during 
> upgrade abort their OS upgrade when it is a little over half through, and in 
> an unbootable state.... No, any dumping of data cannot happen during the %pre 
> script -- too many issues there.

But is it not the same with _every_ package?  Is there any actual
upgrading done in the pre/post scripts, or are they generally not to be
trusted?

> > IMHO, if rpm and apt can't run a pre-install script before deleting the
> > old binaries they are going to replace/upgrade then you should complain
> > to authors of rpm and apt.
> 
> Oh, so it's RPM's and APT's problem that we require so many resources during 
> upgrade.... :-)

As you said: "The pre-upgrade script is run in an environment that isn't
robust enough to handle that". Ok, maybe it's the environmental issue
then ;)

But more seriously - it is a DATAbase upgrade, not a usual program
upgrade, which has a minuscule data part, usually not more than a
configuration file. Postgres, as a very extensible database, has the
ability to keep much of its functionality in the database.

We already do a pretty good job with pg_dump, but I would still not
trust it to do everything automatically and erase the originals.

If we start claiming that postgresql can do automatic "binary" upgrades,
there will be much fun with people who have some application that runs
fine on 7.0.3 but barfs on 7.1.2, even if it is due to stricter
adherence to SQL99 and the SQL is completely outside the control of rpm/apt.

There may even be some lazy people who will think that now is the time
to auto-upgrade from 6.x ;/

> > The right order should of course be
> 
> > 1) run pre-upgrade (pg_dumpall >dumpfile)
> > 2) upgrade
> > 3) run post-upgrade (initdb; psql < dumpfile)
> 
> All but the first step works fine.  The first step is impossible in the 
> environment in which the %pre script runs.

Ok. But would it be impossible to move the old postmaster to some other
place, or is the environment too fragile even for that?

If we move the old postmaster instead of copying it, there will be a
lot fewer issues about running out of disk space :)

What we are facing here is a problem similar to trying to upgrade all
users' C programs when upgrading gcc. While it would be a good thing,
nobody actually tries to do it - we require them to have the source code
and to do the "upgrade" manually.

That's what I propose - dump all databases in pre-upgrade (if you are
concerned about disk usage, run it twice, first to | wc and then to a
file) and try to load in post-upgrade. 

There will still be some things that are impossible to "upgrade", like
upgrading a.out "C" functions to an ELF-format backend.

Perhaps we will be able to detect what we can actually upgrade and bail
out if we find something unupgradable?

-------------------
Hannu




Re: (A) native Windows port

From
Lamar Owen
Date:
On Tuesday 09 July 2002 04:17 pm, Hannu Krosing wrote:
> On Tue, 2002-07-09 at 22:10, Lamar Owen wrote:
> > The pre-upgrade script is run in an environment that isn't robust enough
> > to handle that.  What if you run out of disk space during the dump?

> You can either check beforehand or abort and delete the offending
> dumpfile.

And what if you have enough disk space to do the dump, but then that causes 
the OS upgrade to abort because there wasn't enough space left to finish 
upgrading (larger packages, perhaps)?  The system's hosed, and it's our 
fault.

> > What if a postmaster is running -- and many people stop their postmaster
> > before upgrading their version of PostgreSQL?

> It is quite easy to both check for a running postmaster and start/stop
> one.

Not when there is no ps in your path.  Or pg_ctl for that matter.  Nor is 
there necessarily a /proc tree waiting to be exploited.  We're talking the 
anaconda environment, which is tailored for OS installation and upgrading.  
You cannot start a postmaster; you cannot check to see if one is running -- 
you can't even check to see if you're in the anaconda chroot or not, so that 
you can use more tools if not in the OS installation mode.  Again -- the 
total OS upgrade path is a big part of this scenario, as far as the RPM's are 
concerned.  The Debian package may or may not have as grievous a structure.

The only tool you can really use under the anaconda chroot is busybox, and it 
may not do what you want it to.

> > Besides, at least in the case of the RPM, during OS upgrade time the %pre
> > scriptlet (the one you allude to) isn't running in a system with all the
> > normal tools available.

> I don't think that postmaster needs very many normal tools - it should
> be quite independent, except for compat  libs for larger version
> upgrades

The problem there is that you really have no way to tell the system which sets 
of libraries you want.  More to the point: RPM dependencies cannot take 
conditionals and have no concept of if..then.  Nor can you tell the system to 
_install_ the new postgresql instead of _upgrade_ (incidentally, in the RPM 
context an upgrade is an install of the new version followed by an uninstall 
of the old one -- if the new one overwrote files their traces are just wiped 
from the RPM database, if they weren't overwritten, the files get wiped along 
with their respective database entries).  If I could _force_ no upgrades, it 
would be much easier -- but I can't.  Nor can I be sure the %pre scriptlet 
will be run -- some people are so paranoid that they use rpm -U --noscripts 
religiously.

Thus, when the old postgresql rpm's database entries (in practice virtually 
every old executable gets overwritten) are removed, its dependency 
information is also removed.  As the install/upgrade path builds a complete 
dependency tree of the final installation as part of the process, it knows 
whether the compat libs are needed or not.  If no other program needs them, 
you don't get them, even if you kept an old backend around that does need 
them.  But you really can't make the -server subpackage Require the compat 
packages, because you don't necessarily know what they will be named, or 
anything else they will provide.  If compat libs are even available for the 
version you're upgrading from.

> > Nor is there a postmaster running.  Due to a largish
> > RAMdisk, a postmaster running might cause all manners of problems.

> I don't know anything about the largish RAMdisk,what I meant was that
> postmaster (a 2.7 MB program with ~4 MB RAM footprint) could include the
> functionality of pg_dump and be runnable in single-user mode for dumping
> old databases.

If a standalone backend could reliably dump the database without needing 
networking and many of the other things we take for granted (the install mode 
is a cut-down single-user mode of sorts, running in a chroot of a sort), then 
it might be worth looking at.

> > And an error in the scriptlet could potentially cause the OS upgrade to
> > abort in midstream -- not a nice thing to do to users, having a package

> But is it not the same with _every_ package ? Is there any actual
> upgrading done in the pre/post scripts or are they generally not to be
> trusted ?

No other package is so *different* as to require such a complicated upgrade 
process.  Some packages do more with their scriptlets than others, but no 
package does anything near as complicated as dumping a database.  

> We already do a pretty good job with pg_dump, but I would still not
> trust it to do everything automatically and erase the originals.

And that's a big problem.  We shouldn't have that ambivalence.  IOW, I think 
we need more upgrade testing.  I don't think I've seen a cycle yet that 
didn't have upgrade problems.

> If we start claiming that postgresql can do automatic "binary" upgrades
> there will be much fun with people who have some application that runs
> fine on 7.0.3 but barfs on 7.1.2, even if it is due to stricter
> adherence to SQL99 and the SQL is completely out of control or rpm/apt.

That's just us not being backward compatible.  I'm impacted by those things, 
being that I'm running OpenACS here on 7.2.1, when OACS is optimized for 7.1.  
Certain things are very broken.

> There may be even some lazy people who will think that now is the time
> to auto-upgrade from 6.x ;/

And why not?  If Red Hat Linux can upgrade a whole operating environment from 
version 2.0 all the way up to 7.3 (which they claim), why can't we?  If we 
can just give people the tools to deal with potential problems after the 
upgrade, then I think we can do it.  Such a tool as an old-version dumper 
would be a lifesaver to people, I believe.

> Ok. But would it be impossible to move the old postmaster to some other
> place, or is the environment too fragile even for that ?

That is what I have done in the past -- the old backend got copied over (the 
executable), then a special script was run (after upgrade, manually, by the 
user) that tried to pull a dump using the old backend.  It wasn't reliable.  
The biggest problem is that I have no way of ensuring that the old backend's 
dependencies stay satisfied -- 'satisfied' meaning that the old glibc stays 
installed for compatibility.  Glibc, after all, is being upgraded out from 
under us, and I can't stop it or even slow it down.

And this could even be that most pathological of cases, where an a.out based 
system is being upgraded to an elf system without a.out kernel support.  
(point of note: PostgreSQL first appeared in official Red Hat Linux as 
version 6.2.1, released with Red Hat Linux 5.0, which was ELF/glibc 
(contrasted to 3.0.3 which was a.out/libc4 and 4.x which was ELF/libc5) -- 
but I don't know about the Debian situation and its pathology.)

> If we move the old postmaster instead of copying then there will be a
> lot less issues about running out of disk space :)

The disk space issue is with the ASCII dump file itself.  Furthermore, what 
happens if the dumpfile is greater than MAXFILESIZE?  Again, wc isn't in the 
path, because it too is being upgraded out from under us -- nothing is left 
untouched by the upgrade EXCEPT the install-image RAMdisk, which has a very 
limited set of tools (and a nonstandard kernel, to boot).  Networking might 
or might not be available.  Unix domain sockets might or might not be available.

But the crux is that the OS upgrade environment is designed to do one thing 
and one thing alone -- get the OS installed and/or upgraded.  General purpose 
tools just take up space on the install media, a place where space is at a 
very high premium.

> What we are facing here is a problem similar to trying upgrade all users
> C programs when upgrading gcc. While it would be a good thing, nobody
> actually tries to do it - we require them to have source code and to do
> the "upgrade" manually.

Is that directly comparable?  If you have a lot of user functions written in 
C, then possibly.  But I'm not interested in pathological cases -- I'm
interested in something that works OK for the majority of users.  As long as 
it works properly for users who aren't sophisticated enough to need the 
pathological cases handled, it should be available.  Besides, one can 
always dump and restore if one wants to.  And just how well does the 
venerable dump/restore cycle work in the presence of these pathological 
cases?

Red Hat Linux doesn't claim upgradability in the presence of highly 
pathological cases (such as rogue software installed from non-RPM sources, or 
non-Red Hat RPMs installed -- particularly Ximian Gnome).  So you have to go 
through a process with that.  But it is something you can recover from after 
the upgrade is complete.  That's what I'm after.  I don't hold out hope for a 
fully automatic upgrade -- it would be nice, but we are too extensible for it 
to be practical.  No -- I want tools to be able to recover my old data 
without the old-version backend held over from the previous install.  And I 
think this is a very reasonable expectation.

> That's what I propose - dump all databases in pre-upgrade (if you are
> concerned about disk usage, run it twice, first to | wc and then to a
> file) and try to load in post-upgrade.

The wc utility isn't in the path in an OS install situation.  The df utility 
isn't in the path, either.  You can use python, though. :-)  Not that that 
would be a good thing in this context, however.

> There will still be some things that are be impossible to "upgrade" like
> upgrading a.out "C" functions to elf format backend.

If a user is sophisticated enough to write such, that user is sophisticated 
enough to take responsibility for the upgrade.  I'm not talking about users 
of that level here.  But even then, it would be nice to at least get the data 
back out -- the function can then be rebuilt easily enough from source.

> Perhaps we will be able to detect what we can actually upgrade and bail
> out if we find something unupgradable ?

All is alleviated if I can run a utility after the fact to read in my old 
data, without requiring the old packaged binaries.  I don't have to 
workaround ANYTHING.

Again I say -- would such a data dumper not be useful in cases of system 
catalog corruption that prevents a postmaster from starting?  I'm talking 
about a multipurpose utility here, not just something to make my life as RPM 
maintainer easy.

The pg_fsck program is a good beginning to such a program.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: (A) native Windows port

From
Rod Taylor
Date:
On Tue, 2002-07-09 at 19:09, Lamar Owen wrote:
> On Tuesday 09 July 2002 04:17 pm, Hannu Krosing wrote:
> > On Tue, 2002-07-09 at 22:10, Lamar Owen wrote:
> > > The pre-upgrade script is run in an environment that isn't robust enough
> > > to handle that.  What if you run out of disk space during the dump?
> 
> > You can either check beforehand or abort and delete the offending
> > dumpfile.
> 
> And what if you have enough disk space to do the dump, but then that causes 
> the OS upgrade to abort because there wasn't enough space left to finish 
> upgrading (larger packages, perhaps)?  The system's hosed, and it's our 
> fault.

What normally happens when you have low amounts of free diskspace and
attempt to upgrade the system?

On FreeBSD (portupgrade) it rolls back any changes it was attempting.  I
don't know other systems well enough to say.


Postgresql may require more disk space to upgrade than most packages, 
but if the tools cannot fail cleanly, that is already a problem that needs
to be addressed.



Re: (A) native Windows port

From
Lamar Owen
Date:
On Tuesday 09 July 2002 06:20 pm, Peter Eisentraut wrote:
> The problem in an extensible system such as PostgreSQL is that virtually
> every feature change is reflected by a change in the structure of the
> system catalogs.  It wouldn't be such a terribly big problem in theory to
> make the backend handle these changes, but you'd end up with a huge bunch
> of

> if (dataVersion == 1)
>   do this;
> else if (dataVersion == 2)
>   do that;

Ok, pardon me while I take a moment to braindump here.  And Peter, you of all 
people caused this braindump, so, 'hold on to your hat' :-).

You know, it occurs to me that we are indeed an Object RDBMS, but not in the 
conventional sense.  Our whole system is object oriented -- we are extensible 
by the data and the methods (functions) that operate on that data. In fact, 
the base system is simply a set of objects, all the way down to the base data 
types and their functions. So the problem boils down to this:

How does one upgrade the method portion of the object, bringing in new object 
data if necessary, while leaving non-impacted data alone?  Is there a way of 
partitioning the method-dependent object data from the non-object data?  This 
would require a complete system catalog redesign -- or would it?  

Can such a migration be object-oriented in itself, with the new version 
inheriting the old version and extending it.... (like I said, I'm 
braindumping here -- this may not be at all coherent -- but my stream of 
consciousness rarely is [coherent]).  Can our core be written/rewritten in 
such a way as to be _completely_ object driven? Someone steeped a little 
better in object theory please take over now....

Or am I totally out in left field here?
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: (A) native Windows port

From
Lamar Owen
Date:
On Tuesday 09 July 2002 07:19 pm, Rod Taylor wrote:
> On Tue, 2002-07-09 at 19:09, Lamar Owen wrote:
> > And what if you have enough disk space to do the dump, but then that
> > causes the OS upgrade to abort because there wasn't enough space left to
> > finish upgrading (larger packages, perhaps)?  The system's hosed, and
> > it's our fault.

> What normally happens when you have low amounts of free diskspace and
> attempt to upgrade the system?

Anaconda calculates (internally -- it's a Python program) the space required 
by the upgrade and won't let you proceed if you don't have enough space as 
reported by the RPM headers.  It's impossible to know ahead of time how much 
space will be required by an ASCII dump of the PostgreSQL database, and thus 
it cannot be taken into account by that algorithm.

As to failing cleanly, work is underway to allow RPM to rollback entire OS 
upgrades.  But again the disk space requirement shoots through the ceiling if 
you do this.  Already RPM can rollback the transaction being done on the RPM 
database (it's a db3 database system), but rolling back the filesystem is a 
little different.

But anaconda (which doesn't use the command line RPM anymore, it uses librpm 
to do its own RPM processing) checks beforehand how much space is needed and 
won't let you overspend disk space during the system upgrade.

The command-line RPM will also do this, and won't let you upgrade RPM's if 
there's not enough disk space, as calculated by reading the RPM header, which 
has the amount of space the uncompressed package takes (calculated as part of 
the RPM build process).

But if you throw in an unknown increase in space that anaconda/rpm cannot 
grok, then you cause a situation.

Can the ports system take into account the space required for a dumpfile?? :-)
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: (A) native Windows port

From
Lamar Owen
Date:
[replying to myself]
On Tuesday 09 July 2002 07:34 pm, Lamar Owen wrote:
> if you do this.  Already RPM can rollback the transaction being done on the
> RPM database (it's a db3 database system), but rolling back the filesystem
> is a little different.

As a note of interest, RPM itself is backed by a database, db3.  Prior to 
version 4.x, it was backed by db1.  Upgrading between the versions of RPM is 
simple -- install db3 and its dependencies, upgrade RPM, and run 'rpm 
--rebuilddb' -- which works most of the time, but there are pathological 
cases.....

You now are running db3 instead of db1, if you didn't get bit by a 
pathological case. :-)
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: (A) native Windows port

From
Rod Taylor
Date:
> Can the ports system take into account the space required for a dumpfile?? :-)

It cheats by keeping a backup of the old version -- makes an installable
package out of the currently installed version.  This is removed once
the package has been successfully upgraded (including dependencies).

On failure, it rolls back any packages (and those that depend on it) to
prior versions it backed up and continues on trying to upgrade other
parts of the system which don't depend on the rolled back portion.

Portupgrade regularly upgrades only part of the system if the ports tree is
broken, something won't build (architecture issues), it couldn't download
XYZ item, or it has run into other problems.  PostgreSQL in this case simply
wouldn't get upgraded with everything else -- reporting errors at the end.
That said, Postgresql also may no longer work after the upgrade -- but I
guess that's what the 'test' mode is used to prevent.



Re: (A) native Windows port

From
Oliver Elphick
Date:
On Wed, 2002-07-10 at 00:09, Lamar Owen wrote:
> On Tuesday 09 July 2002 04:17 pm, Hannu Krosing wrote:
> > It is quite easy to both check for a running postmaster and start/stop
> > one.
> 
> Not when there is no ps in your path.  Or pg_ctl for that matter.  Nor is 
> there necessarily a /proc tree waiting to be exploited.  We're talking the 
> anaconda environment, which is tailored for OS installation and upgrading.  
> You cannot start a postmaster; you cannot check to see if one is running -- 
> you can't even check to see if you're in the anaconda chroot or not, so that 
> you can use more tools if not in the OS installation mode.  Again -- the 
> total OS upgrade path is a big part of this scenario, as far as the RPM's are 
> concerned.  The Debian package may or may not have as grievous a structure.

No.  I don't have anything like your problems to contend with!

I can and do copy the old binaries and libraries in the pre-removal
script, which means I have a pretty good chance of accomplishing an
upgrade without user intervention.  If I had your problems I'd give up!




Re: (A) native Windows port

From
Jan Wieck
Date:
Oliver Elphick wrote:
> 
> The current upgrade process for PostgreSQL is founded on the idea that
> people build from source.  With binary distributions, half the users
> wouldn't know what to do with source; they expect (and are entitled to
> expect) that an upgrade will progress without the need for significant
> intervention on their part.  PostgreSQL makes this really difficult for
> the package maintainers, and this has a knock-on effect on the
> reliability of the upgrade process and thus on PostgreSQL itself.

I have to object here. The PostgreSQL upgrade process is based on
the idea of dump, install, initdb, restore. That has nothing to
do with building from source or installing from binaries.

The reason this conflicts with these package managers is that
they work package by package, instead of looking at the
big picture. Who said you can replace package A before running
the pre-upgrade script of dependent package B? Somehow this looks
like a foreign key violation to me. Oh, I forgot, RI constraints
are for documentation purposes only ... Greetings from the MySQL
documentation ;-)


Jan


-- 

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: (A) native Windows port

From
Jan Wieck
Date:
Lamar Owen wrote:
> 
> [replying to myself]
> On Tuesday 09 July 2002 07:34 pm, Lamar Owen wrote:
> > if you do this.  Already RPM can rollback the transaction being done on the
> > RPM database (it's a db3 database system), but rolling back the filesystem
> > is a little different.
> 
> As a note of interest, RPM itself is backed by a database, db3.  Prior to
> version 4.x, it was backed by db1.  Upgrading between the versions of RPM is
> simply -- installing db3 and dependenies, upgrade RPM, and run 'rpm
> --rebuilddb' -- which works most of the time, but there are pathological
> cases.....
> 
> You now are running db3 instead of db1, if you didn't get bit by a
> pathological case. :-)

And how big/complex is the db1/3 system catalog we're talking
about exactly? How many rewrite rules have to be converted into
the new parsetree format during an RPM upgrade? 


Jan

-- 

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: (A) native Windows port

From
Hannu Krosing
Date:
On Wed, 2002-07-10 at 01:09, Lamar Owen wrote:
> On Tuesday 09 July 2002 04:17 pm, Hannu Krosing wrote:
> > On Tue, 2002-07-09 at 22:10, Lamar Owen wrote:
> > > The pre-upgrade script is run in an environment that isn't robust enough
> > > to handle that.  What if you run out of disk space during the dump?
> 
> > You can either check beforehand or abort and delete the offending
> > dumpfile.
> 
...
> > That's what I propose - dump all databases in pre-upgrade (if you are
> > concerned about disk usage, run it twice, first to | wc and then to a
> > file) and try to load in post-upgrade.
> 
> The wc utility isn't in the path in an OS install situation.  The df utility 
> isn't in the path, either.  You can use python, though. :-)  Not that that 
> would be a good thing in this context, however.

Why not ? 

The following is wc in python

#!/usr/bin/python
import sys, string
bytes, words, lines = 0, 0, 0
while 1:
    s = sys.stdin.readline()
    if not s: break
    bytes = bytes + len(s)
    words = words + len(string.split(s))
    lines = lines + 1

sys.stdout.write('%7d %7d %7d\n' % (lines, words, bytes))


And I have written custom postgres table dumpers in python without too
much effort (except reverse-engineering the page structure ;) for both
6.x and 7.x database tables, so we could actually use python here too. 

The basic user_data extractor part is done in about 50 lines - I did not
need much else as I wrote custom datatype converters for the specific
table I needed.

The generic part (conversions and determining whether tuples are live)
should also not be too difficult.

The only part I can see right away as hard to re-implement in python is
TOAST.

Still I guess that the basic db_dump.py app will be somewhere between
500 and 5000 lines long, with possibly the TOAST compression module done
as a C-language module, modtoast.so.


The only problem with this approach is that it needs maintaining
separately from postgres proper. OTOH, this may also be a good thing, as
a separate reimplementation is the only known working guarantee that we
actually know what our page format is ;) as the docs have always been
wrong about this.
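The first step of such a reimplementation is just parsing the page header. A minimal sketch (the 20-byte header layout and the 15/2/15-bit line-pointer packing below are my assumptions modeled on the 7.x sources and little-endian x86 bitfield order -- exactly the platform-dependence that makes this fun, so treat none of it as authoritative):

```python
import struct

# Assumed 7.x-style page header: pd_lsn (8 bytes, read as two uint32s),
# pd_sui (uint32), then pd_lower, pd_upper, pd_special, pd_opaque (uint16s).
HEADER_FMT = '<IIIHHHH'
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 20 bytes
LP_SIZE = 4                                 # one ItemIdData line pointer

def parse_page_header(page):
    """Return free-space bounds and (offset, length, flags) for each line
    pointer on one raw page image.  Bitfield order is an x86 assumption."""
    _, _, _, lower, upper, special, _ = struct.unpack_from(HEADER_FMT, page, 0)
    n_items = (lower - HEADER_SIZE) // LP_SIZE  # pointers fill header..pd_lower
    items = []
    for i in range(n_items):
        lp, = struct.unpack_from('<I', page, HEADER_SIZE + i * LP_SIZE)
        items.append((lp & 0x7FFF,        # lp_off:  low 15 bits
                      lp >> 17,           # lp_len:  high 15 bits
                      (lp >> 15) & 0x3))  # lp_flags: middle 2 bits
    return {'lower': lower, 'upper': upper, 'special': special,
            'items': items}
```

From there, a dumper walks the items, reads each tuple at its offset, and hands it to a per-table field extractor.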

> Again I say -- would such a data dumper not be useful in cases of system 
> catalog corruption that prevents a postmaster from starting?  I'm talking 
> about a multipurpose utility here, not just something to make my life as RPM 
> maintainer easy.
> 
> The pg_fsck program is a good beginning to such a program.

Where can I find pg_fsck? 

It is not in recent CVS snapshots.

-------------
Hannu



Re: (A) native Windows port

From
Lamar Owen
Date:
On Wednesday 10 July 2002 03:24 am, Jan Wieck wrote:
> Oliver Elphick wrote:
> > The current upgrade process for PostgreSQL is founded on the idea that
> > people build from source.  With binary distributions, half the users
> > wouldn't know what to do with source; they expect (and are entitled to

> I have to object here. The PostgreSQL upgrade process is based on
> the idea of dump, install, initdb, restore. That has nothing to
> do with building from source or installing from binaries.

Let me interject a minor point here.  I recall upgrade cycles where I had to 
install a newer pg_dump in order to get my data out of the old system due to 
bugs in the prior pg_dump.  Getting two versions of PostgreSQL to coexist 
peacefully in a binary packaged environment is a completely different problem 
than the typical 'from source' installation path -- which almost implies two 
versions available concurrently.  I believe this is the artifact Oliver was 
alluding to.  

I personally have not had the luxury of having two complete installations 
available at one instant during RPM upgrades.  Nor will any users of 
prepackaged binaries.

> The problem why this conflicts with these package managers is,
> because they work package per package, instead of looking at the
> big picture. Who said you can replace package A before running
> the pre-upgrade script of dependent package B?

How does this create the problem?  The postgresql-server subpackages of two 
versions are 'Package A' above.  There is no package B. 

Define 'the big picture' for all possible permutations of installed packages, 
please.

> Somehow this looks
> like a foreign key violation to me. Oh, I forgot, RI constraints
> are for documentation purposes only ... Greetings from the MySQL
> documentation ;-)

Is sarcasm really necessary?
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: (A) native Windows port

From
Lamar Owen
Date:
[cc: trimmed]

On Wednesday 10 July 2002 03:42 am, Jan Wieck wrote:
> Lamar Owen wrote:
> > As a note of interest, RPM itself is backed by a database, db3.  Prior to
> > version 4.x, it was backed by db1.  Upgrading between the versions of RPM
> > is simply -- installing db3 and dependenies, upgrade RPM, and run 'rpm
> > --rebuilddb' -- which works most of the time, but there are pathological
> > cases.....

> > You now are running db3 instead of db1, if you didn't get bit by a
> > pathological case. :-)

> And how big/complex is the db1/3 system catalog we're talking
> about exactly? 

Well, on a fully installed system it's about 44MB.  The RPM database isn't 
terribly complicated, but it's not trivial, either.

However, unless I am mistaken, the generic db1-to-db3 migration itself is easy.

>How many rewrite rules have to be converted into
> the new parsetree format during an RPM upgrade?

Don't know if anything comparable exists.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: (A) native Windows port

From
Lamar Owen
Date:
On Wednesday 10 July 2002 09:11 am, Hannu Krosing wrote:
> On Wed, 2002-07-10 at 01:09, Lamar Owen wrote:
> > The wc utility isn't in the path in an OS install situation.  The df
> > utility isn't in the path, either.  You can use python, though. :-)  Not
> > that that would be a good thing in this context, however.

> Why not ?

> The following is wc in python

[snip]

> And I have written custom postgres table dumpers in python without too
> much effort (except reverse-engineering the page structure ;) for both
> 6.x and 7.x database tables, so we could actually use python here too.

I'm willing to look into this.  However, the dump still has to be pulled with 
a standalone backend -- no networking availability can be assumed.

> The only problem with this approach is that it needs maintaining
> separately from postgres proper. OTOH, this may also be a good thing, as
> a separate reimplementation is only known working guarantee that we
> actually know what our page format is ;) as the docs have always been
> wrong about this.

Well, I could deal with that.

> > The pg_fsck program is a good beginning to such a program.

> Where can I fing pg_fsck ?

[looking in my bookmarks....]
http://svana.org/kleptog/pgsql/pgfsck.html

> It is not in recent CVS snapshots.

Martijn hasn't submitted it yet (AFAICT) for inclusion.  I believe if nothing 
else it should be in contrib.

Contrary to some people's apparent perception, I'm actually fairly flexible on 
this as long as the basic points can be dealt with.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: (A) native Windows port

From
Hannu Krosing
Date:
On Wed, 2002-07-10 at 16:20, Lamar Owen wrote:
> On Wednesday 10 July 2002 09:11 am, Hannu Krosing wrote:
> > On Wed, 2002-07-10 at 01:09, Lamar Owen wrote:
> > > The wc utility isn't in the path in an OS install situation.  The df
> > > utility isn't in the path, either.  You can use python, though. :-)  Not
> > > that that would be a good thing in this context, however.
>
> > Why not ?
>
> > The following is wc in python
>
> [snip]
>
> > And I have written custom postgres table dumpers in python without too
> > much effort (except reverse-engineering the page structure ;) for both
> > 6.x and 7.x database tables, so we could actually use python here too.
>
> I'm willing to look into this.  However, the dump still has to be pulled with
> a standalone backend -- no networking availability can be assumed.

Actually it works on the raw table file ;)

I attach code that dumps the data from a page file for a table of 4 ints,
all NOT NULL, like

create table fourints(
  i1 int not null,
  i2 int not null,
  i3 int not null,
  i4 int not null
);

The script is meant for quick and dirty rescue operations, and requires
that one write their own data-field extractor code. I have used it
mainly to resurrect accidentally deleted data.

It is for the 7.x-style page-file layout.
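[For readers who don't have the attachment: the general shape of such a raw-page
reader can be sketched from scratch. This is NOT Hannu's script, and every
offset below is an assumption standing in for the real 7.x on-disk layout;
check src/include/storage/bufpage.h and access/htup.h before trusting any of it.]

```python
import struct

# All of these constants are assumptions for illustration, not the
# verified 7.x layout.
BLCKSZ = 8192        # default PostgreSQL block size
PD_LOWER_OFF = 12    # assumed byte offset of pd_lower in the page header
LP_ARRAY_OFF = 20    # assumed start of the line-pointer array
LP_SIZE = 4          # one ItemIdData is a packed 32-bit word

def line_pointers(page):
    """Yield (item offset, item length) for each used line pointer."""
    (pd_lower,) = struct.unpack_from('<H', page, PD_LOWER_OFF)
    for pos in range(LP_ARRAY_OFF, pd_lower, LP_SIZE):
        (word,) = struct.unpack_from('<I', page, pos)
        lp_off = word & 0x7FFF       # low 15 bits: offset into the page
        lp_len = word >> 17          # high 15 bits: item length
        if lp_off:                   # unused slots have offset 0
            yield lp_off, lp_len

def dump_fourints(page, t_hoff):
    """Decode the 4 x int32 user data of each fourints tuple.

    t_hoff (the tuple-header size) is supplied by the caller, since it
    differs between PostgreSQL versions -- another assumption to verify.
    """
    return [struct.unpack_from('<4i', page, off + t_hoff)
            for off, _len in line_pointers(page)]
```

As in Hannu's description, the data-field extractor (here hardwired to
'<4i') is the part one rewrites per table.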

-------------------
Hannu


Attachment

Re: (A) native Windows port

From
Hannu Krosing
Date:
On Wed, 2002-07-10 at 16:15, Lamar Owen wrote:
> [cc: trimmed]
> 
> On Wednesday 10 July 2002 03:42 am, Jan Wieck wrote:
> >How many rewrite rules have to be converted into
> > the new parsetree format during an RPM upgrade?
> 
> Don't know if anything comparable exists.
>

IMHO the best solution here is to also keep the source code for anything
that is usually kept as parse trees or some such, so that one can get at
it without a full backend running.

-------------
Hannu



Re: (A) native Windows port

From
Hannu Krosing
Date:
On Wed, 2002-07-10 at 16:20, Lamar Owen wrote:
> On Wednesday 10 July 2002 09:11 am, Hannu Krosing wrote:
> 
> > The only problem with this approach is that it needs maintaining
> > separately from postgres proper. OTOH, this may also be a good thing, as
> > a separate reimplementation is the only known working guarantee that we
> > actually know what our page format is ;) as the docs have always been
> > wrong about this.
> 
> Well, I could deal with that.

Also, we must be aware that the page format is most likely
platform-dependent.

-----------
Hannu


Re: (A) native Windows port

From
Lamar Owen
Date:
On Wednesday 10 July 2002 11:48 am, Hannu Krosing wrote:
> On Wed, 2002-07-10 at 16:20, Lamar Owen wrote:
> > On Wednesday 10 July 2002 09:11 am, Hannu Krosing wrote:
> > > And I have written custom postgres table dumpers in python without too
> > > much effort (except reverse-engineering the page structure ;) for both
> > > 6.x and 7.x database tables, so we could actually use python here too.

> > I'm willing to look into this.  However, the dump still has to be pulled
> > with a standalone backend -- no networking availability can be assumed.

> Actually, it works on the raw table file ;)

> The script is meant for quick and dirty rescue operations, and requires
> that one write their own data-field extractor code. I have used it
> mainly to resurrect accidentally deleted data.

> It is for the 7.x-style page-file layout

Hmmm.  This is interesting stuff.  I'll have to take a look at it once I'm 
finished re-learning Fortran 77 for a project I'm doing (34MB of DEC Fortran 
source that g77 doesn't like very well) for work.  I have a hard time 
switching language gears.  Particularly the Fortran 77 -> Python gear... :-) 
Although at least the fixed-form paradigm stays there in the transition. :-)  
It's been a very long time since I've done Fortran of this complexity.  
Actually, I've never done Fortran of _this_ complexity -- this is serious 
number-crunching stuff that uses all manner of higher math (tensors, even).  
There is no direct C equivalent to some of the stuff this code is doing -- 
which is part of the reason g77 is having problems.  But I digress.

Getting the %pre scriptlet to use a non-sh interpreter is undocumented, but 
not hard.  :-) (actually, I stumbled upon it by accident one time -- that 
time it was a bug....)  Now to see if it can be done consistently in both the 
anaconda chroot as well as a standard rpm command line invocation.
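[For what it's worth, the undocumented mechanism alluded to here is, I believe,
rpm's -p option on scriptlet tags, which swaps the interpreter. A hypothetical
fragment -- not from the actual postgresql.spec:]

```spec
# Hypothetical sketch only: %pre run under python instead of /bin/sh.
%pre -p /usr/bin/python
import os
# the scriptlet body is handed to the named interpreter by rpm
if not os.path.isdir('/usr/lib/pgsql/backups'):
    os.makedirs('/usr/lib/pgsql/backups')
```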
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: (A) native Windows port

From
Hannu Krosing
Date:
On Wed, 2002-07-10 at 19:56, Lamar Owen wrote:
> On Wednesday 10 July 2002 11:48 am, Hannu Krosing wrote:
> > On Wed, 2002-07-10 at 16:20, Lamar Owen wrote:
> > > On Wednesday 10 July 2002 09:11 am, Hannu Krosing wrote:
> > > > And I have written custom postgres table dumpers in python without too
> > > > much effort (except reverse-engineering the page structure ;) for both
> > > > 6.x and 7.x database tables, so we could actually use python here too.
>
> > > I'm willing to look into this.  However, the dump still has to be pulled
> > > with a standalone backend -- no networking availability can be assumed.
>
> > Actually, it works on the raw table file ;)
>
> > The script is meant for quick and dirty rescue operations, and requires
> > that one write their own data-field extractor code. I have used it
> > mainly to resurrect accidentally deleted data.
>
> > It is for the 7.x-style page-file layout
>
> Hmmm.  This is interesting stuff.  I'll have to take a look at it once I'm
> finished re-learning Fortran 77 for a project I'm doing (34MB of DEC Fortran
> source that g77 doesn't like very well) for work.  I have a hard time
> switching language gears.  Particularly the Fortran 77 -> Python gear... :-)
> Although at least the fixed-form paradigm stays there in the transition. :-)
> It's been a very long time since I've done Fortran of this complexity.
> Actually, I've never done Fortran of _this_ complexity -- this is serious
> number-crunching stuff that uses all manners of higher math (tensors, even).
> There is no direct C equivalent to some of the stuff this code is doing --
> which is part of the reason g77 is having problems.  But I digress.

Once you understand what the code is doing, you can port it to Python
using Numerical Python (http://www.pfdubois.com/numpy/) and/or
Scientific Python (http://starship.python.net/~hinsen/ScientificPython/)
to get a head start on a total conversion to Python ;)

You may even try using F2PY, the Fortran to Python Interface Generator
(http://cens.ioc.ee/projects/f2py2e/).

> Getting the %pre scriptlet to use a non-sh interpreter is undocumented, but
> not hard.  :-) (actually, I stumbled upon it by accident one time -- that
> time it was a bug....)  Now to see if it can be done consistently in both the
> anaconda chroot as well as a standard rpm command line invocation.

Actually, if the python dumper can be made to work somewhat reliably, it
can be run after install/upgrade without too much trouble.

--------------
Hannu



Re: (A) native Windows port

From
Lamar Owen
Date:
On Wednesday 10 July 2002 10:26 am, Hannu Krosing wrote:
> Actually, if the python dumper can be made to work somewhat reliably, it
> can be run after install/upgrade without too much trouble.

Yes, yes, of course.  My bad -- brain needs oil change... :-)

Thanks for the links to the python stuff, particularly the fortran to python 
translator.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: (A) native Windows port

From
Jan Wieck
Date:
Lamar Owen wrote:

> On Wednesday 10 July 2002 03:24 am, Jan Wieck wrote:
> 
> > The problem why this conflicts with these package managers is,
> > because they work package per package, instead of looking at the
> > big picture. Who said you can replace package A before running
> > the pre-upgrade script of dependent package B?
> 
> How does this create the problem?  The postgresql-server subpackages of two
> versions are 'Package A' above.  There is no package B.

Someone was talking about doing a complete OS upgrade and updating
something that the new PG release (scheduled for update later) needs,
but which makes the current old release no longer functional. Maybe I
misunderstood something.

> 
> Define 'the big picture' for all possible permutations of installed packages,
> please.

Got me on that. Sure, with all the possible permutations there is
always an unsolvable dependency. What I think is that, knowing all the
packages that are installed and all that are to be added/removed/updated,
it would be possible to run the pre-install, pre-update, and pre-remove
scripts for all packages first. They would have to clean up, save info
and the like (a dump in our case, maybe install a new version of pg_dump
runnable in the old environment), but NOT disable functionality of any
package. Second, install all binaries. Third, run a second round of
scripts for all packages, finalizing each package's action.
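[A toy sketch of that whole-transaction ordering, with an invented Package
class standing in for real packages and scriptlets -- nothing here is rpm API:]

```python
# Jan's proposal: run each phase across ALL packages before starting the
# next, instead of the package-at-a-time pre/install/post cycle.
class Package:
    def __init__(self, name):
        self.name = name
        self.log = []
    def pre(self):      # clean up, save info (e.g. dump); disable nothing
        self.log.append('pre')
    def install(self):  # replace the binaries
        self.log.append('install')
    def post(self):     # finalize: restore/convert the saved state
        self.log.append('post')

def run_transaction(packages):
    """Three global phases over the whole package set."""
    for phase in ('pre', 'install', 'post'):
        for pkg in packages:
            getattr(pkg, phase)()

pkgs = [Package('postgresql-server'), Package('openssl')]
run_transaction(pkgs)
# every package's pre ran before ANY binary was replaced
assert all(p.log == ['pre', 'install', 'post'] for p in pkgs)
```

The point of the ordering is visible in the log: no package's install
phase can precede another package's pre phase.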

> 
> > Somehow this looks
> > like a foreign key violation to me. Oh, I forgot, RI constraints
> > are for documentation purposes only ... Greetings from the MySQL
> > documentation ;-)
> 
> Is sarcasm really necessary?

Really really! I am dependent on it. If I don't get my daily dose of
sarcasm, I become extremely ironic or sometimes cynical.


Jan

-- 

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: (A) native Windows port

From
Lamar Owen
Date:
On Wednesday 10 July 2002 04:42 pm, Jan Wieck wrote:
> Lamar Owen wrote:
> > On Wednesday 10 July 2002 03:24 am, Jan Wieck wrote:
> > > The problem why this conflicts with these package managers is,
> > > because they work package per package, instead of looking at the
> > > big picture. Who said you can replace package A before running
> > > the pre-upgrade script of dependent package B?

> > The postgresql-server subpackages of
> > two versions are 'Package A' above.  There is no package B.

> Someone was talking about doing a complete OS upgrade and updating
> something the new PG release (that is scheduled for update later) needs
> but what makes the current old release not functional any more. Maybe I
> misunderstood something.

Yes, you misunderstood.  The whole release is upgraded, and it's the database 
itself that breaks.  How is the package manager supposed to know you had to 
make a backup copy of an executable in order to cater to the broken upgrade 
cycle?  (Is that sarcastic enough :-)... being that you like your daily dose 
:-)).

The backup executable no longer 'belongs' to any package as far as the rpm 
database is concerned.  

Suppose the upgrade in question was from PostgreSQL 7.0.3-2 to 7.2.1-5 (the 
'-2' and '-5' are the release numbers of that particular RPMset -- a version 
number for the package independent of the upstream program).  The backend 
itself belongs to package 'postgresql-server' in both versions.  After 
checking that postinstallation dependencies will be satisfied by its actions, 
the upgrade proceeds to install postgresql-server-7.2.1-5, which has a %pre 
scriptlet that makes a copy of /usr/bin/postgres and links, along with libpq 
and pg_dump for that version, into /usr/lib/pgsql/backups (IIRC -- it's been 
a long day and I haven't checked the accuracy of that detail).  
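[As a sketch, that %pre step might look like the fragment below. The paths
follow the "IIRC" recollection above, so treat them as assumptions rather
than the real spec:]

```spec
%pre server
# Sketch only -- file list and paths follow the prose above, not the
# actual postgresql.spec.  Stash the old binaries before rpm overwrites them.
mkdir -p /usr/lib/pgsql/backups
cp -p /usr/bin/postgres /usr/bin/pg_dump /usr/lib/pgsql/backups/ || :
cp -p /usr/lib/libpq.so* /usr/lib/pgsql/backups/ || :
```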

Said %pre scriptlet runs, then rpm unpacks its payload, a cpio archive 
containing the files of the package.  /usr/bin/postgres is one of the files 
overwritten in this process.  There could be trigger scripts installed by 
other packages run at this time.  Then the %post scriptlet is run, which in 
practice creates a postgres user and group, chowns a few directories, and 
runs ldconfig to get any new shared libraries.  

Now the postgresql-server-7.0.3-2 package gets uninstalled.  First, the 
%preun scriptlet runs.  Note that a conditional is available to 
distinguish between an upgrade 'uninstall' and a real uninstall.  Then any 
registered triggers are run.  Then any non-overwritten files are removed, and 
the database entries for 7.0.3-2 are removed.  Finally, the %postun 
scriptlet runs.

You now have the new package in place.

During an OS upgrade the dependencies are finagled in such a way that the 
'satisfied dependencies' for postgresql-server-7.0.3-2, which are about to be 
replaced by postgresql-server-7.2.1-5's, won't be required any more.  Unless 
another package requires the various shared libraries the 7.0.3-2 backend 
required, those shared libraries may get 'upgraded' out of the way -- the 
scriptlets have no way of communicating to the upgrade process 'hey! hold on 
to the dependency information for postgresql-server-7.0.3-2, even though that 
package is no longer marked as being installed.'

Whew.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11