Thread: pg_upgrade if 'postgres' database is dropped

pg_upgrade if 'postgres' database is dropped

From
Heikki Linnakangas
Date:
pg_upgrade doesn't work if the 'postgres' database has been dropped in 
the old cluster:

~/pgsql.master$ bin/pg_upgrade -b ~/pgsql.91stable/bin -B bin/ -d 
~/pgsql.91stable/data -D data-upgraded/
Performing Consistency Checks
-----------------------------
Checking current, bin, and data directories                 ok
Checking cluster versions                                   ok
Checking database user is a superuser                       ok
Checking for prepared transactions                          ok
Checking for reg* system OID user data types                ok
Checking for contrib/isn with bigint-passing mismatch       ok
Creating catalog dump                                       ok
Checking for prepared transactions                          ok

New cluster database "postgres" does not exist in the old cluster
Failure, exiting


That's a bit unfortunate. We have some other tools that don't work 
without the 'postgres' database, like 'reindexdb -all', but it would 
still be nice if pg_upgrade did.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Tue, Oct 4, 2011 at 12:11 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> pg_upgrade doesn't work if the 'postgres' database has been dropped in the
> old cluster:
>
> ~/pgsql.master$ bin/pg_upgrade -b ~/pgsql.91stable/bin -B bin/ -d
> ~/pgsql.91stable/data -D data-upgraded/
> Performing Consistency Checks
> -----------------------------
> Checking current, bin, and data directories                 ok
> Checking cluster versions                                   ok
> Checking database user is a superuser                       ok
> Checking for prepared transactions                          ok
> Checking for reg* system OID user data types                ok
> Checking for contrib/isn with bigint-passing mismatch       ok
> Creating catalog dump                                       ok
> Checking for prepared transactions                          ok
>
> New cluster database "postgres" does not exist in the old cluster
> Failure, exiting
>
>
> That's a bit unfortunate. We have some other tools that don't work without
> the 'postgres' database, like 'reindexdb -all', but it would still be nice
> if pg_upgrade did.

+1.  I think our usual policy is to try postgres first and then try
template1, so it would seem logical for pg_upgrade to do the same.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Tue, Oct 4, 2011 at 12:11 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> > pg_upgrade doesn't work if the 'postgres' database has been dropped in the
> > old cluster:
> >
> > ~/pgsql.master$ bin/pg_upgrade -b ~/pgsql.91stable/bin -B bin/ -d
> > ~/pgsql.91stable/data -D data-upgraded/
> > Performing Consistency Checks
> > -----------------------------
> > Checking current, bin, and data directories ? ? ? ? ? ? ? ? ok
> > Checking cluster versions ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ok
> > Checking database user is a superuser ? ? ? ? ? ? ? ? ? ? ? ok
> > Checking for prepared transactions ? ? ? ? ? ? ? ? ? ? ? ? ?ok
> > Checking for reg* system OID user data types ? ? ? ? ? ? ? ?ok
> > Checking for contrib/isn with bigint-passing mismatch ? ? ? ok
> > Creating catalog dump ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ok
> > Checking for prepared transactions ? ? ? ? ? ? ? ? ? ? ? ? ?ok
> >
> > New cluster database "postgres" does not exist in the old cluster
> > Failure, exiting
> >
> >
> > That's a bit unfortunate. We have some other tools that don't work without
> > the 'postgres' database, like 'reindexdb -all', but it would still be nice
> > if pg_upgrade did.
> 
> +1.  I think our usual policy is to try postgres first and then try
> template1, so it would seem logical for pg_upgrade to do the same.

Well, it is a little tricky.  The problem is that I am not just
connecting to a database --- I am creating support functions in the
database.  Now, this is complex because template1 is the template for
new databases, except for pg_dump which uses template0.

So, it is going to be confusing to support both databases because there
is the cleanup details I have to document if I use template1.

Also, pg_dumpall unconditionally connects to the postgres database to
restore roles:
    fprintf(OPF, "\\connect postgres\n\n");

We could connect to template1 for this, but I am not comfortable
changing this.  And when pg_dumpall throws an error for a missing
postgres database, pg_upgrade is going to fail.

We started using the postgres database as a database for connections ---
do we want to change that?  We certainly can't have the pg_dumpall
output _conditionally_ connecting to template1.

I am feeling this isn't worth pursuing.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Stephen Frost
Date:
* Bruce Momjian (bruce@momjian.us) wrote:
> So, it is going to be confusing to support both databases because there
> is the cleanup details I have to document if I use template1.

Presumably there's some other database in the system besides template1
if postgres doesn't exist..  Would it be possible to just make it
configurable?  Then the user could pick a non-template database.  Having
it fail if the option isn't used and the default postgres isn't there is
fine, imv.
Thanks,
    Stephen

Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Stephen Frost wrote:
-- Start of PGP signed section.
> * Bruce Momjian (bruce@momjian.us) wrote:
> > So, it is going to be confusing to support both databases because there
> > is the cleanup details I have to document if I use template1.
> 
> Presumably there's some other database in the system besides template1
> if postgres doesn't exist..  Would it be possible to just make it
> configurable?  Then the user could pick a non-template database.  Having
> it fail if the option isn't used and the default postgres isn't there is
> fine, imv.

I have not seen enough demand to make this a user-visible configuration.
We can just tell them to create a postgres database.   Frankly, they
would have had to _remove_ the postgres database after initdb for it not
to be there, and they are instructed to change nothing about the new
database.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Stephen Frost
Date:
* Bruce Momjian (bruce@momjian.us) wrote:
> I have not seen enough demand to make this a user-visible configuration.
> We can just tell them to create a postgres database.   Frankly, they
> would have had to _remove_ the postgres database after initdb for it not
> to be there, and they are instructed to change nothing about the new
> database.

Yes, they would have removed it because they didn't want it.  As I
recall, part of the agreement to create an extra database by default was
that it could be removed if users didn't want it.  Turning around and
then saying "but things won't work if it's not there" isn't exactly
supporting users who decide to remove it.

Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can
be configured to connect to other databases instead, including for
globals.
Thanks,
    Stephen

Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Stephen Frost wrote:
-- Start of PGP signed section.
> * Bruce Momjian (bruce@momjian.us) wrote:
> > I have not seen enough demand to make this a user-visible configuration.
> > We can just tell them to create a postgres database.   Frankly, they
> > would have had to _remove_ the postgres database after initdb for it not
> > to be there, and they are instructed to change nothing about the new
> > database.
> 
> Yes, they would have removed it because they didn't want it.  As I
> recall, part of the agreement to create an extra database by default was
> that it could be removed if users didn't want it.  Turning around and
> then saying "but things won't work if it's not there" isn't exactly
> supporting users who decide to remove it.

Well, you would have to remove it _after_ you did the pg_upgrade.  Right
now if you do a normal dump/restore upgrade, you also have to re-remove
the postgres database.  We don't have any mechanism to drop a database
as part of pg_dumpall's restore if it didn't exist in the old cluster.

> Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can
> be configured to connect to other databases instead, including for
> globals.

Well, please show me the code, because the C code I showed you had the
'\connect postgres' string hardcoded in there.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Stephen Frost
Date:
* Bruce Momjian (bruce@momjian.us) wrote:
> Well, you would have to remove it _after_ you did the pg_upgrade.  Right
> now if you do a normal dump/restore upgrade, you also have to re-remove
> the postgres database.  We don't have any mechanism to drop a database
> as part of pg_dumpall's restore if it didn't exist in the old cluster.

Perhaps not, but it *could* be removed after the restore and all would
be well, yes?

> > Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can
> > be configured to connect to other databases instead, including for
> > globals.
>
> Well, please show me the code, because the C code I showed you had the
> '\connect postgres' string hardcoded in there.

I guess there's a difference between "can be used and will work
correctly, but might create some extra garbage" and "can't be used at
all".  pg_dumpall has a -l option for connecting to whatever *existing*
database you have to pull the global data, and then it'll restore into a
clean initdb'd cluster, after which you could remove postgres.
Admittedly, if you initdb the cluster, drop postgres, and then try a
restore, it would fail.  Personally, I'm not a big fan of that (why
don't we use what was passed in to -l for that..?), but, practically,
it's not that big a deal.  I don't know many folks who are going to
restore a pg_dumpall dump into an existing configuration where they've
monkied with things (that could cause all kinds of other issues anyway,
role conflicts, etc).

If I understood correctly (perhaps I didn't..), is that pg_upgrade
doesn't have the pg_dumpall equivilant of the '-l' or '--database'
option, and that's what is at issue here.
Thanks,
    Stephen

Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Stephen Frost wrote:
-- Start of PGP signed section.
> * Bruce Momjian (bruce@momjian.us) wrote:
> > Well, you would have to remove it _after_ you did the pg_upgrade.  Right
> > now if you do a normal dump/restore upgrade, you also have to re-remove
> > the postgres database.  We don't have any mechanism to drop a database
> > as part of pg_dumpall's restore if it didn't exist in the old cluster.
> 
> Perhaps not, but it *could* be removed after the restore and all would
> be well, yes?

I am not sure how much pg_dump worries about removing system objects
during a restore --- I don't think it does that at all actually.  I
thought we did that for plpgsql, but I don't see that in the C code now,
and testing doesn't show it being removed by pg_dump.

> > > Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can
> > > be configured to connect to other databases instead, including for
> > > globals.
> > 
> > Well, please show me the code, because the C code I showed you had the
> > '\connect postgres' string hardcoded in there.
> 
> I guess there's a difference between "can be used and will work
> correctly, but might create some extra garbage" and "can't be used at
> all".  pg_dumpall has a -l option for connecting to whatever *existing*
> database you have to pull the global data, and then it'll restore into a
> clean initdb'd cluster, after which you could remove postgres.

Keep in mind -l might connect to a specified database to do the dump,
but it will still connect to the 'postgres' database to recreate them.

> Admittedly, if you initdb the cluster, drop postgres, and then try a
> restore, it would fail.  Personally, I'm not a big fan of that (why

Right, same with pg_upgrade.

> don't we use what was passed in to -l for that..?), but, practically,

No idea.

> it's not that big a deal.  I don't know many folks who are going to
> restore a pg_dumpall dump into an existing configuration where they've
> monkied with things (that could cause all kinds of other issues anyway,
> role conflicts, etc).
> 
> If I understood correctly (perhaps I didn't..), is that pg_upgrade
> doesn't have the pg_dumpall equivilant of the '-l' or '--database'
> option, and that's what is at issue here.

Well, I can modify pg_upgrade to connect to template1 easily (no need
for a switch) --- the problem is that pg_dumpall requires the 'postgres'
database to restore its dump file.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Stephen Frost wrote:
>> Yes, they would have removed it because they didn't want it.  As I
>> recall, part of the agreement to create an extra database by default was
>> that it could be removed if users didn't want it.  Turning around and
>> then saying "but things won't work if it's not there" isn't exactly
>> supporting users who decide to remove it.

> Well, you would have to remove it _after_ you did the pg_upgrade.

As far as the *target* cluster is concerned, I have no sympathy for
someone who messes with its contents before running pg_upgrade.
That's an RTFM matter: you're supposed to upgrade into a virgin
just-initdb'd cluster.

However, it would be nice if pg_upgrade supported transferring from a
*source* cluster that didn't have the postgres DB.

What about creating a new, single-purpose database in the source
cluster and then removing it again after we're done?
        regards, tom lane


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Stephen Frost wrote:
> >> Yes, they would have removed it because they didn't want it.  As I
> >> recall, part of the agreement to create an extra database by default was
> >> that it could be removed if users didn't want it.  Turning around and
> >> then saying "but things won't work if it's not there" isn't exactly
> >> supporting users who decide to remove it.
> 
> > Well, you would have to remove it _after_ you did the pg_upgrade.
> 
> As far as the *target* cluster is concerned, I have no sympathy for
> someone who messes with its contents before running pg_upgrade.
> That's an RTFM matter: you're supposed to upgrade into a virgin
> just-initdb'd cluster.
> 
> However, it would be nice if pg_upgrade supported transferring from a
> *source* cluster that didn't have the postgres DB.

I have this C comment in pg_upgrade about this:
*  If someone removes the 'postgres' database from the old cluster and*  the new cluster has a 'postgres' database, the
numberof databases*  will not match.  We actually could upgrade such a setup, but it would*  violate the 1-to-1 mapping
ofdatabase counts, so we throw an error*  instead.  We would detect this as a database count mismatch during*  upgrade,
butwe want to detect it during the check phase and report*  the database name.
 

Is this worth fixing?  Another problem is that pg_dumpall doesn't remove
the postgres database in the new cluster if it was not in the old one,
so they are going to end up with a postgres database in the new cluster
anyway.  I could argue that pg_upgrade is better because reports it
cannot recreate the new cluster exactly, while pg_dumpall just keeps a
postgres database that was not in the old cluster.

> What about creating a new, single-purpose database in the source
> cluster and then removing it again after we're done?

That is not a problem --- I can easily use template1.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Thu, Oct 27, 2011 at 11:35 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> What about creating a new, single-purpose database in the source
>> cluster and then removing it again after we're done?
>
> That is not a problem --- I can easily use template1.

Huh?

You just said upthread that you didn't want to use template1 because
you didn't want to modify the template database.  I think the point is
that if you're doing something to the database that someone might
object to, you oughtn't be doing it to the postgres database either.
You should create a database just for pg_upgrade's use and install its
crap in there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Thu, Oct 27, 2011 at 11:35 PM, Bruce Momjian <bruce@momjian.us> wrote:
> >> What about creating a new, single-purpose database in the source
> >> cluster and then removing it again after we're done?
> >
> > That is not a problem --- I can easily use template1.
> 
> Huh?
> 
> You just said upthread that you didn't want to use template1 because
> you didn't want to modify the template database.  I think the point is

I don't want to use postgres and then fall back to template1 if
necessary --- I would just use template1 always.

> that if you're doing something to the database that someone might
> object to, you oughtn't be doing it to the postgres database either.
> You should create a database just for pg_upgrade's use and install its
> crap in there.

It installs crap in all databases to set oids on system tables, for
example, so we are only creating it early in postgres (or template1) to
set auth_id.  Our sticking point now is that pg_dumpall has the
'postgres' database hardcoded for role creation.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Robert Haas wrote:
>> that if you're doing something to the database that someone might
>> object to, you oughtn't be doing it to the postgres database either.
>> You should create a database just for pg_upgrade's use and install its
>> crap in there.

> It installs crap in all databases to set oids on system tables,

It seems like you're both confusing the source and target clusters.
None of that stuff gets installed in the source, does it?
        regards, tom lane


Re: pg_upgrade if 'postgres' database is dropped

From
Magnus Hagander
Date:
<p>On Oct 28, 2011 5:22 AM, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br />
><br/> > Bruce Momjian <<a href="mailto:bruce@momjian.us">bruce@momjian.us</a>> writes:<br /> > >
StephenFrost wrote:<br /> > >> Yes, they would have removed it because they didn't want it.  As I<br /> >
>>recall, part of the agreement to create an extra database by default was<br /> > >> that it could be
removedif users didn't want it.  Turning around and<br /> > >> then saying "but things won't work if it's not
there"isn't exactly<br /> > >> supporting users who decide to remove it.<br /> ><br /> > > Well, you
wouldhave to remove it _after_ you did the pg_upgrade.<br /> ><br /> > As far as the *target* cluster is
concerned,I have no sympathy for<br /> > someone who messes with its contents before running pg_upgrade.<br /> >
That'san RTFM matter: you're supposed to upgrade into a virgin<br /> > just-initdb'd cluster.<br /> ><br /> >
However,it would be nice if pg_upgrade supported transferring from a<br /> > *source* cluster that didn't have the
postgresDB.<br /> ><br /> > What about creating a new, single-purpose database in the source<br /> > cluster
andthen removing it again after we're done?<p>How about naming this newly created database "postgres"? That would make
thecode simple enough - always use the postgres database, just drop it at the end if it didn't exist in the source
cluster.<p>/Magnus <br /> 

Re: pg_upgrade if 'postgres' database is dropped

From
Magnus Hagander
Date:
<p>On Oct 28, 2011 5:19 AM, "Bruce Momjian" <<a href="mailto:bruce@momjian.us">bruce@momjian.us</a>> wrote:<br />
><br/> > Stephen Frost wrote:<br /> ><br /> > > > > Regarding pg_dumpall and pg_restore, I'm
prettysure both of those can<br /> > > > > be configured to connect to other databases instead, including
for<br/> > > > > globals.<br /> > > ><br /> > > > Well, please show me the code, because
theC code I showed you had the<br /> > > > '\connect postgres' string hardcoded in there.<br /> > ><br
/>> > I guess there's a difference between "can be used and will work<br /> > > correctly, but might create
someextra garbage" and "can't be used at<br /> > > all".  pg_dumpall has a -l option for connecting to whatever
*existing*<br/> > > database you have to pull the global data, and then it'll restore into a<br /> > >
cleaninitdb'd cluster, after which you could remove postgres.<br /> ><br /> > Keep in mind -l might connect to a
specifieddatabase to do the dump,<br /> > but it will still connect to the 'postgres' database to recreate them.<br
/>><br /> > > Admittedly, if you initdb the cluster, drop postgres, and then try a<br /> > > restore, it
wouldfail.  Personally, I'm not a big fan of that (why<br /> ><br /> > Right, same with pg_upgrade.<br /> ><br
/>> > don't we use what was passed in to -l for that..?), but, practically,<br /> ><br /> > No
idea.<p>Chicken/egg?If we did that, the pg_dumpall dump could no longer be loaded into an empty cluster since the db it
wantedto talk to didn't exist yet. And restoring into an empty cluster has to be the main use for pg_dumpall after
all....<p>/Magnus  

Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Robert Haas wrote:
> >> that if you're doing something to the database that someone might
> >> object to, you oughtn't be doing it to the postgres database either.
> >> You should create a database just for pg_upgrade's use and install its
> >> crap in there.
>
> > It installs crap in all databases to set oids on system tables,
>
> It seems like you're both confusing the source and target clusters.
> None of that stuff gets installed in the source, does it?

Right, only in the target.

Let me summarize:

The postgres database is required in the source because pg_upgrade likes
to have a 1-1 database mapping of old and new clusters.  This can be
changed, but it makes pg_upgrade slightly more complex.

The postgres database is required on the target because pg_upgrade
creates the support functions first in that database.  That can be
changed, but pg_dumpall restores roles in the postgres database by
default, not template1.  Again, this can be changed.

Because of this 'postgres' new cluster requirement, you can't just
delete the postgres database from the new cluster and run pg_upgrade.

Tom wants pg_upgrade to work if the old cluster doesn't have a postgres
database.  I see two solutions --- either remove the 1-1 mapping of
old/new databases, or remove the pg_upgrade and pg_dumpall dependence on
the postgres database and tell users to remove postgres from the new
cluster before the upgrade.

They already get a clear error message about the problem, which I think
is why we haven't seen more problem reports.  My guess is they are just
creating the postgres database on the old cluster before the upgrade
after they get the error.

I have applied the attached patch to at least clarify that they need the
postgres database in the old cluster, rather than them trying to remove
the postgres database from the new cluster.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index 5b9b4cd..e400814
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*************** check_old_cluster_has_new_cluster_dbs(vo
*** 403,410 ****
                         new_cluster.dbarr.dbs[new_dbnum].db_name) == 0)
                  break;
          if (old_dbnum == old_cluster.dbarr.ndbs)
!             pg_log(PG_FATAL, "New cluster database \"%s\" does not exist in the old cluster\n",
!                    new_cluster.dbarr.dbs[new_dbnum].db_name);
      }
  }

--- 403,415 ----
                         new_cluster.dbarr.dbs[new_dbnum].db_name) == 0)
                  break;
          if (old_dbnum == old_cluster.dbarr.ndbs)
!         {
!             if (strcmp(new_cluster.dbarr.dbs[new_dbnum].db_name, "postgres") == 0)
!                 pg_log(PG_FATAL, "The \"postgres\" database must exist in the old cluster\n");
!             else
!                 pg_log(PG_FATAL, "New cluster database \"%s\" does not exist in the old cluster\n",
!                        new_cluster.dbarr.dbs[new_dbnum].db_name);
!         }
      }
  }


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Magnus Hagander wrote:
> On Oct 28, 2011 5:22 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> >
> > Bruce Momjian <bruce@momjian.us> writes:
> > > Stephen Frost wrote:
> > >> Yes, they would have removed it because they didn't want it.  As I
> > >> recall, part of the agreement to create an extra database by default
> was
> > >> that it could be removed if users didn't want it.  Turning around and
> > >> then saying "but things won't work if it's not there" isn't exactly
> > >> supporting users who decide to remove it.
> >
> > > Well, you would have to remove it _after_ you did the pg_upgrade.
> >
> > As far as the *target* cluster is concerned, I have no sympathy for
> > someone who messes with its contents before running pg_upgrade.
> > That's an RTFM matter: you're supposed to upgrade into a virgin
> > just-initdb'd cluster.
> >
> > However, it would be nice if pg_upgrade supported transferring from a
> > *source* cluster that didn't have the postgres DB.
> >
> > What about creating a new, single-purpose database in the source
> > cluster and then removing it again after we're done?
> 
> How about naming this newly created database "postgres"? That would make the
> code simple enough - always use the postgres database, just drop it at the
> end if it didn't exist in the source cluster.

Yes, that would work, but see my summarization email on this.  Using
template1 is not a problem for pg_upgrade, it is the modifications to
pg_dumpall that are an issue.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Magnus Hagander wrote:
> On Oct 28, 2011 5:19 AM, "Bruce Momjian" <bruce@momjian.us> wrote:
> >
> > Stephen Frost wrote:
> >
> > > > > Regarding pg_dumpall and pg_restore, I'm pretty sure both of those
> can
> > > > > be configured to connect to other databases instead, including for
> > > > > globals.
> > > >
> > > > Well, please show me the code, because the C code I showed you had the
> > > > '\connect postgres' string hardcoded in there.
> > >
> > > I guess there's a difference between "can be used and will work
> > > correctly, but might create some extra garbage" and "can't be used at
> > > all".  pg_dumpall has a -l option for connecting to whatever *existing*
> > > database you have to pull the global data, and then it'll restore into a
> > > clean initdb'd cluster, after which you could remove postgres.
> >
> > Keep in mind -l might connect to a specified database to do the dump,
> > but it will still connect to the 'postgres' database to recreate them.
> >
> > > Admittedly, if you initdb the cluster, drop postgres, and then try a
> > > restore, it would fail.  Personally, I'm not a big fan of that (why
> >
> > Right, same with pg_upgrade.
> >
> > > don't we use what was passed in to -l for that..?), but, practically,
> >
> > No idea.
> 
> Chicken/egg? If we did that, the pg_dumpall dump could no longer be loaded
> into an empty cluster since the db it wanted to talk to didn't exist yet.
> And restoring into an empty cluster has to be the main use for pg_dumpall
> after all....

True.  My assumption was that they had created some special database
before they did the pg_dumpall restore, but it would be odd because the
database would have been hard-coded into the dump, which isn't good.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Fri, Oct 28, 2011 at 8:12 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Yes, that would work, but see my summarization email on this.  Using
> template1 is not a problem for pg_upgrade, it is the modifications to
> pg_dumpall that are an issue.

I just did a bit of testing on this.  It appears that pg_dumpall, if
given a cluster containing no postgres database, will happily try to
connect to template1 instead.  If template1 isn't available either,
you can use "-l SOMEDBNAME" to specify the name of another database to
connect to instead.  So there is infinite flexibility there.

But regardless of which database it uses to *generate* the dump, the
dump itself will *always* contain this, right at the very beginning:

\connect postgres

That line is in fact hard-coded as a literal string in pg_dumpall.c.
It seems like the easiest fix here might be to just remove that line
from the dump, because AFAICS it's completely pointless.  During the
time for which that setting is in effect, we're just restoring
globals, so it shouldn't matter which database we're connected to;
only that we have a valid connection.  So trying to switch the
connection from whatever the user is connected to currently to
postgres doesn't accomplish anything useful, but it does make it
possible for dump restoration to unnecessarily fail.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Thom Brown
Date:
On 28 October 2011 14:28, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 28, 2011 at 8:12 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> Yes, that would work, but see my summarization email on this.  Using
>> template1 is not a problem for pg_upgrade, it is the modifications to
>> pg_dumpall that are an issue.
>
> I just did a bit of testing on this.  It appears that pg_dumpall, if
> given a cluster containing no postgres database, will happily try to
> connect to template1 instead.  If template1 isn't available either,
> you can use "-l SOMEDBNAME" to specify the name of another database to
> connect to instead.  So there is infinite flexibility there.
>
> But regardless of which database it uses to *generate* the dump, the
> dump itself will *always* contain this, right at the very beginning:
>
> \connect postgres
>
> That line is in fact hard-coded as a literal string in pg_dumpall.c.
> It seems like the easiest fix here might be to just remove that line
> from the dump, because AFAICS it's completely pointless.  During the
> time for which that setting is in effect, we're just restoring
> globals, so it shouldn't matter which database we're connected to;
> only that we have a valid connection.  So trying to switch the
> connection from whatever the user is connected to currently to
> postgres doesn't accomplish anything useful, but it does make it
> possible for dump restoration to unnecessarily fail.

This was the kind of qualm I had with createdb, which I submitted a
patch for earlier this year:
http://archives.postgresql.org/pgsql-hackers/2011-03/msg00999.php

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Fri, Oct 28, 2011 at 8:12 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > Yes, that would work, but see my summarization email on this. ?Using
> > template1 is not a problem for pg_upgrade, it is the modifications to
> > pg_dumpall that are an issue.
> 
> I just did a bit of testing on this.  It appears that pg_dumpall, if
> given a cluster containing no postgres database, will happily try to
> connect to template1 instead.  If template1 isn't available either,
> you can use "-l SOMEDBNAME" to specify the name of another database to
> connect to instead.  So there is infinite flexibility there.
> 
> But regardless of which database it uses to *generate* the dump, the
> dump itself will *always* contain this, right at the very beginning:
> 
> \connect postgres
> 
> That line is in fact hard-coded as a literal string in pg_dumpall.c.
> It seems like the easiest fix here might be to just remove that line
> from the dump, because AFAICS it's completely pointless.  During the
> time for which that setting is in effect, we're just restoring
> globals, so it shouldn't matter which database we're connected to;
> only that we have a valid connection.  So trying to switch the
> connection from whatever the user is connected to currently to
> postgres doesn't accomplish anything useful, but it does make it
> possible for dump restoration to unnecessarily fail.

If you remove that line, I can modify pg_upgrade to use template1
instead of postgres, and then the user should just remove the postgres
database from the new cluster before the upgrade --- we can give them a
clear error message on that.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Fri, Oct 28, 2011 at 9:55 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> On Fri, Oct 28, 2011 at 8:12 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> > Yes, that would work, but see my summarization email on this. ?Using
>> > template1 is not a problem for pg_upgrade, it is the modifications to
>> > pg_dumpall that are an issue.
>>
>> I just did a bit of testing on this.  It appears that pg_dumpall, if
>> given a cluster containing no postgres database, will happily try to
>> connect to template1 instead.  If template1 isn't available either,
>> you can use "-l SOMEDBNAME" to specify the name of another database to
>> connect to instead.  So there is infinite flexibility there.
>>
>> But regardless of which database it uses to *generate* the dump, the
>> dump itself will *always* contain this, right at the very beginning:
>>
>> \connect postgres
>>
>> That line is in fact hard-coded as a literal string in pg_dumpall.c.
>> It seems like the easiest fix here might be to just remove that line
>> from the dump, because AFAICS it's completely pointless.  During the
>> time for which that setting is in effect, we're just restoring
>> globals, so it shouldn't matter which database we're connected to;
>> only that we have a valid connection.  So trying to switch the
>> connection from whatever the user is connected to currently to
>> postgres doesn't accomplish anything useful, but it does make it
>> possible for dump restoration to unnecessarily fail.
>
> If you remove that line,

I'm happy to do that, unless someone can see a hole in my logic.

> I can modify pg_upgrade to use template1
> instead of postgres, and then the user should just remove the postgres
> database from the new cluster before the upgrade --- we can give them a
> clear error message on that.

What I would prefer is to have the upgrade succeed, and just ignore
the existence of a postgres database in the new cluster.  Maybe give
the user a notice and let them decide whether they wish to take any
action.  I understand that failing is probably less code, but IMHO one
of the biggest problems with pg_upgrade is that it's too fragile:
there are too many seemingly innocent things that can make it croak
(which isn't good, when you consider that anyone using pg_upgrade is
probably in a hurry to get the upgrade done and the database back
on-line).  It seems like this is an opportunity to get rid of one of
those unnecessary failure cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> >> But regardless of which database it uses to *generate* the dump, the
> >> dump itself will *always* contain this, right at the very beginning:
> >>
> >> \connect postgres
> >>
> >> That line is in fact hard-coded as a literal string in pg_dumpall.c.
> >> It seems like the easiest fix here might be to just remove that line
> >> from the dump, because AFAICS it's completely pointless. ?During the
> >> time for which that setting is in effect, we're just restoring
> >> globals, so it shouldn't matter which database we're connected to;
> >> only that we have a valid connection. ?So trying to switch the
> >> connection from whatever the user is connected to currently to
> >> postgres doesn't accomplish anything useful, but it does make it
> >> possible for dump restoration to unnecessarily fail.
> >
> > If you remove that line,
> 
> I'm happy to do that, unless someone can see a hole in my logic.
> 
> > I can modify pg_upgrade to use template1
> > instead of postgres, and then the user should just remove the postgres
> > database from the new cluster before the upgrade --- we can give them a
> > clear error message on that.
> 
> What I would prefer is to have the upgrade succeed, and just ignore
> the existence of a postgres database in the new cluster.  Maybe give
> the user a notice and let them decide whether they wish to take any
> action.  I understand that failing is probably less code, but IMHO one
> of the biggest problems with pg_upgrade is that it's too fragile:
> there are too many seemingly innocent things that can make it croak
> (which isn't good, when you consider that anyone using pg_upgrade is
> probably in a hurry to get the upgrade done and the database back
> on-line).  It seems like this is an opportunity to get rid of one of
> those unnecessary failure cases.

OK, then the simplest fix, once you modify pg_dumpall, would be to
modify pg_upgrade to remove reference to the postgres database in the
new cluster if it doesn't exist in the old one.  That would allow
pg_upgrade to maintain a 1-1 matching of databases in the old and new
cluster --- it allows the change to be locallized without affecting much
code.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> action.  I understand that failing is probably less code, but IMHO one
> of the biggest problems with pg_upgrade is that it's too fragile:
> there are too many seemingly innocent things that can make it croak
> (which isn't good, when you consider that anyone using pg_upgrade is
> probably in a hurry to get the upgrade done and the database back
> on-line).  It seems like this is an opportunity to get rid of one of
> those unnecessary failure cases.

FYI, the original design goal of pg_upgrade was to be do reliable
upgrades and fail at the hint of any inconsistency.  Seems it is time to
adjust its goals.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Fri, Oct 28, 2011 at 10:07 AM, Bruce Momjian <bruce@momjian.us> wrote:
> OK, then the simplest fix, once you modify pg_dumpall, would be to
> modify pg_upgrade to remove reference to the postgres database in the
> new cluster if it doesn't exist in the old one.  That would allow
> pg_upgrade to maintain a 1-1 matching of databases in the old and new
> cluster --- it allows the change to be locallized without affecting much
> code.

That sounds just fine.  +1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Fri, Oct 28, 2011 at 10:09 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> action.  I understand that failing is probably less code, but IMHO one
>> of the biggest problems with pg_upgrade is that it's too fragile:
>> there are too many seemingly innocent things that can make it croak
>> (which isn't good, when you consider that anyone using pg_upgrade is
>> probably in a hurry to get the upgrade done and the database back
>> on-line).  It seems like this is an opportunity to get rid of one of
>> those unnecessary failure cases.
>
> FYI, the original design goal of pg_upgrade was to be do reliable
> upgrades and fail at the hint of any inconsistency.  Seems it is time to
> adjust its goals.

We definitely don't want it to do anything that could compromise data
integrity.  But in this case there seems no risk of that, so it seems
we can have our cake and eat it, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Fri, Oct 28, 2011 at 10:07 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > OK, then the simplest fix, once you modify pg_dumpall, would be to
> > modify pg_upgrade to remove reference to the postgres database in the
> > new cluster if it doesn't exist in the old one. ?That would allow
> > pg_upgrade to maintain a 1-1 matching of databases in the old and new
> > cluster --- it allows the change to be locallized without affecting much
> > code.
> 
> That sounds just fine.  +1.

FYI, I don't want to modify pg_dumpall myself because I didn't want to
have pg_upgrade forcing a pg_dumpall change that applies to
non-binary-upgrade dumps.  pg_dumpall is too important.  I am fine if
someone else does it, though.  :-)

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Fri, Oct 28, 2011 at 10:09 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> action. ?I understand that failing is probably less code, but IMHO one
> >> of the biggest problems with pg_upgrade is that it's too fragile:
> >> there are too many seemingly innocent things that can make it croak
> >> (which isn't good, when you consider that anyone using pg_upgrade is
> >> probably in a hurry to get the upgrade done and the database back
> >> on-line). ?It seems like this is an opportunity to get rid of one of
> >> those unnecessary failure cases.
> >
> > FYI, the original design goal of pg_upgrade was to be do reliable
> > upgrades and fail at the hint of any inconsistency. ?Seems it is time to
> > adjust its goals.
> 
> We definitely don't want it to do anything that could compromise data
> integrity.  But in this case there seems no risk of that, so it seems
> we can have our cake and eat it, too.

Agreed.  I was extra cautious.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Robert Haas wrote:
> > On Fri, Oct 28, 2011 at 10:07 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > > OK, then the simplest fix, once you modify pg_dumpall, would be to
> > > modify pg_upgrade to remove reference to the postgres database in the
> > > new cluster if it doesn't exist in the old one. ?That would allow
> > > pg_upgrade to maintain a 1-1 matching of databases in the old and new
> > > cluster --- it allows the change to be locallized without affecting much
> > > code.
> > 
> > That sounds just fine.  +1.
> 
> FYI, I don't want to modify pg_dumpall myself because I didn't want to
> have pg_upgrade forcing a pg_dumpall change that applies to
> non-binary-upgrade dumps.  pg_dumpall is too important.  I am fine if
> someone else does it, though.  :-)

If you want crazy, you could suppress the "\connect postgres" line only
in --binary-upgrade dumps, but that seems to error-prone because then
only --binary-upgrade dumps are exercising that behavior.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Fri, Oct 28, 2011 at 10:16 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> On Fri, Oct 28, 2011 at 10:07 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> > OK, then the simplest fix, once you modify pg_dumpall, would be to
>> > modify pg_upgrade to remove reference to the postgres database in the
>> > new cluster if it doesn't exist in the old one. ?That would allow
>> > pg_upgrade to maintain a 1-1 matching of databases in the old and new
>> > cluster --- it allows the change to be locallized without affecting much
>> > code.
>>
>> That sounds just fine.  +1.
>
> FYI, I don't want to modify pg_dumpall myself because I didn't want to
> have pg_upgrade forcing a pg_dumpall change that applies to
> non-binary-upgrade dumps.  pg_dumpall is too important.  I am fine if
> someone else does it, though.  :-)

OK, done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Fri, Oct 28, 2011 at 10:18 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Bruce Momjian wrote:
>> Robert Haas wrote:
>> > On Fri, Oct 28, 2011 at 10:07 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> > > OK, then the simplest fix, once you modify pg_dumpall, would be to
>> > > modify pg_upgrade to remove reference to the postgres database in the
>> > > new cluster if it doesn't exist in the old one. ?That would allow
>> > > pg_upgrade to maintain a 1-1 matching of databases in the old and new
>> > > cluster --- it allows the change to be locallized without affecting much
>> > > code.
>> >
>> > That sounds just fine.  +1.
>>
>> FYI, I don't want to modify pg_dumpall myself because I didn't want to
>> have pg_upgrade forcing a pg_dumpall change that applies to
>> non-binary-upgrade dumps.  pg_dumpall is too important.  I am fine if
>> someone else does it, though.  :-)
>
> If you want crazy, you could suppress the "\connect postgres" line only
> in --binary-upgrade dumps, but that seems to error-prone because then
> only --binary-upgrade dumps are exercising that behavior.

Also, then it would still be doing something silly in the non
--binary-upgrade case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Fri, Oct 28, 2011 at 10:16 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> On Fri, Oct 28, 2011 at 10:07 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >> > OK, then the simplest fix, once you modify pg_dumpall, would be to
> >> > modify pg_upgrade to remove reference to the postgres database in the
> >> > new cluster if it doesn't exist in the old one. ?That would allow
> >> > pg_upgrade to maintain a 1-1 matching of databases in the old and new
> >> > cluster --- it allows the change to be locallized without affecting much
> >> > code.
> >>
> >> That sounds just fine. ?+1.
> >
> > FYI, I don't want to modify pg_dumpall myself because I didn't want to
> > have pg_upgrade forcing a pg_dumpall change that applies to
> > non-binary-upgrade dumps. ?pg_dumpall is too important. ?I am fine if
> > someone else does it, though. ?:-)
>
> OK, done.

OK, the attached, applied patch removes the pg_upgrade dependency on the
'postgres' database existing in the new cluster.  However, vacuumdb,
used by pg_upgrade, still has this dependency:

        conn = connectDatabase("postgres", host, port, username,
        prompt_password, progname);

In fact, all the /scripts binaries use the postgres database, except for
createdb/dropdb, which has this heuristic:

    /*
     * Connect to the 'postgres' database by default, except have the
     * 'postgres' user use 'template1' so he can create the 'postgres'
     * database.
     */
    conn = connectDatabase(strcmp(dbname, "postgres") == 0 ? "template1" : "postgres",
                           host, port, username, prompt_password, progname);

This makes sense because you might be creating or dropping the postgres
database.  Do we want these to have smarter database selection code?

I will now work on code to allow the old cluster to optionally not have
a postgres database.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index 273561e..12df463
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*************** static void set_frozenxids(void);
*** 52,60 ****
  static void setup(char *argv0, bool live_check);
  static void cleanup(void);

- /* This is the database used by pg_dumpall to restore global tables */
- #define GLOBAL_DUMP_DB    "postgres"
-
  ClusterInfo old_cluster,
              new_cluster;
  OSInfo        os_info;
--- 52,57 ----
*************** prepare_new_databases(void)
*** 233,242 ****
      prep_status("Creating databases in the new cluster");

      /*
!      * Install support functions in the global-restore database to preserve
!      * pg_authid.oid.
       */
!     install_support_functions_in_new_db(GLOBAL_DUMP_DB);

      /*
       * We have to create the databases first so we can install support
--- 230,241 ----
      prep_status("Creating databases in the new cluster");

      /*
!      * Install support functions in the global-object restore database to
!      * preserve pg_authid.oid.  pg_dumpall uses 'template0' as its template
!      * database so objects we add into 'template1' are not propogated.  They
!      * are removed on pg_upgrade exit.
       */
!     install_support_functions_in_new_db("template1");

      /*
       * We have to create the databases first so we can install support
*************** create_new_objects(void)
*** 270,276 ****
          DbInfo       *new_db = &new_cluster.dbarr.dbs[dbnum];

          /* skip db we already installed */
!         if (strcmp(new_db->db_name, GLOBAL_DUMP_DB) != 0)
              install_support_functions_in_new_db(new_db->db_name);
      }
      check_ok();
--- 269,275 ----
          DbInfo       *new_db = &new_cluster.dbarr.dbs[dbnum];

          /* skip db we already installed */
!         if (strcmp(new_db->db_name, "template1") != 0)
              install_support_functions_in_new_db(new_db->db_name);
      }
      check_ok();

Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Fri, Oct 28, 2011 at 9:22 PM, Bruce Momjian <bruce@momjian.us> wrote:
> OK, the attached, applied patch removes the pg_upgrade dependency on the
> 'postgres' database existing in the new cluster.  However, vacuumdb,
> used by pg_upgrade, still has this dependency:
>
>            conn = connectDatabase("postgres", host, port, username,
>                prompt_password, progname);
>
> In fact, all the /scripts binaries use the postgres database, except for
> createdb/dropdb, which has this heuristic:
>
>    /*
>     * Connect to the 'postgres' database by default, except have the
>     * 'postgres' user use 'template1' so he can create the 'postgres'
>     * database.
>     */
>    conn = connectDatabase(strcmp(dbname, "postgres") == 0 ? "template1" : "postgres",
>                           host, port, username, prompt_password, progname);
>
> This makes sense because you might be creating or dropping the postgres
> database.  Do we want these to have smarter database selection code?

Well, I suppose as long as we're cleaning this up, we might as well be
thorough, so, sure, why not?  I think the algorithm pg_dumpall uses is
pretty sensible: let the user specify the database to use if they so
desire; if not, try postgres first and then template1.  I think we
could stick some logic for that in common.c which could be shared by
clusterdb, createdb, dropdb, dropuser, reindexdb, and vacuumdb.

However, we need to rethink the flag to be used for this: pg_dumpall
uses -l, but many of the other utilities already use that for some
other purpose, and it's not exactly mnemonic anyway.  "-d" for
database could work, but that's also in use in some places, and
furthermore somewhat confusing since many if not all of these
utilities have an option to operate on a single database only, and you
might think that -d would specify the database to operate on, rather
than the one to be used to get the list of databases.  pgAdmin uses
the term "maintenance database" to refer to a database to be used when
none is explicitly specified, and I think that's fairly clear
terminology.  So I propose that we add a --maintenance-db option (with
no short form, since this is a relatively obscure need) to the tools
listed above.  The tools will pass the associated value (or NULL if
the option is not specified) to the above-mentioned routine in
common.c, which will do the rest.

If nobody objects, I'll go do that.  Hopefully that should be enough
to put this problem to bed more or less permanently.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Tue, Nov 1, 2011 at 2:49 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> >> > It turns out there was only one place that expected a 1-1 mapping of old
> >> >> > and new databases (file transfer), so I just modified that code to allow
> >> >> > skipping a database in the new cluster that didn't exist in the old
> >> >> > cluster.
> >> >>
> >> >> Urp. ?But that means that if someone has any data in that database,
> >> >> pg_upgrade will basically eat it. ?That does not seem like a step
> >> >> forward.
> >> >
> >> > Please clarify? ?We already check that all the new cluster databases are
> >> > empty, so we are effectively skipping the transfering of files into
> >> > empty new cluster databases. ?It processes all old cluster databases and
> >> > forces a new cluster match --- it is only empty new cluster database
> >> > that are being skipped.
> >>
> >> Aren't you saying that if a postgres database exists in the old
> >> database (and potentially contains data) but is missing in the new
> >> database, we'll just fail to migrate it?
> >
> > No, the reverse. ?If the 'postgres' database exists in the new cluster,
> > but not in the old, we allow it to upgrade (we skip over the 'postgres'
> > database in the new cluster use the loop in the patch).
> 
> Oh, OK.  That seems fine - in fact, that seems perfect.
> 
> > Unless I am missing something. ?Did you see something odd in the patch
> > or in my wording?
> 
> Your wording confused me, but on further review I think I'm just
> easily confused.

The concept is pretty confusing and I had to think a while before I came
up with this approach.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Tue, Nov 1, 2011 at 1:53 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Bruce Momjian wrote:
> >> > What I would prefer is to have the upgrade succeed, and just ignore
> >> > the existence of a postgres database in the new cluster. ?Maybe give
> >> > the user a notice and let them decide whether they wish to take any
> >> > action. ?I understand that failing is probably less code, but IMHO one
> >> > of the biggest problems with pg_upgrade is that it's too fragile:
> >> > there are too many seemingly innocent things that can make it croak
> >> > (which isn't good, when you consider that anyone using pg_upgrade is
> >> > probably in a hurry to get the upgrade done and the database back
> >> > on-line). ?It seems like this is an opportunity to get rid of one of
> >> > those unnecessary failure cases.
> >>
> >> OK, then the simplest fix, once you modify pg_dumpall, would be to
> >> modify pg_upgrade to remove reference to the postgres database in the
> >> new cluster if it doesn't exist in the old one. ?That would allow
> >> pg_upgrade to maintain a 1-1 matching of databases in the old and new
> >> cluster --- it allows the change to be locallized without affecting much
> >> code.
> >
> > I fixed this a different way. ?I originally thought I could skip over
> > the 'postgres' database in the new cluster if it didn't exist in the old
> > cluster, but we have do things like check it is empty, so that was going
> > to be awkward.
> >
> > It turns out there was only one place that expected a 1-1 mapping of old
> > and new databases (file transfer), so I just modified that code to allow
> > skipping a database in the new cluster that didn't exist in the old
> > cluster.
> 
> Urp.  But that means that if someone has any data in that database,
> pg_upgrade will basically eat it.  That does not seem like a step
> forward.

Please clarify?  We already check that all the new cluster databases are
empty, so we are effectively skipping the transfering of files into
empty new cluster databases.  It processes all old cluster databases and
forces a new cluster match --- it is only empty new cluster database
that are being skipped.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> >> > It turns out there was only one place that expected a 1-1 mapping of old
> >> > and new databases (file transfer), so I just modified that code to allow
> >> > skipping a database in the new cluster that didn't exist in the old
> >> > cluster.
> >>
> >> Urp. ?But that means that if someone has any data in that database,
> >> pg_upgrade will basically eat it. ?That does not seem like a step
> >> forward.
> >
> > Please clarify? ?We already check that all the new cluster databases are
> > empty, so we are effectively skipping the transfering of files into
> > empty new cluster databases. ?It processes all old cluster databases and
> > forces a new cluster match --- it is only empty new cluster database
> > that are being skipped.
> 
> Aren't you saying that if a postgres database exists in the old
> database (and potentially contains data) but is missing in the new
> database, we'll just fail to migrate it?

No, the reverse.  If the 'postgres' database exists in the new cluster,
but not in the old, we allow it to upgrade (we skip over the 'postgres'
database in the new cluster use the loop in the patch).

Unless I am missing something.  Did you see something odd in the patch
or in my wording?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> > What I would prefer is to have the upgrade succeed, and just ignore
> > the existence of a postgres database in the new cluster.  Maybe give
> > the user a notice and let them decide whether they wish to take any
> > action.  I understand that failing is probably less code, but IMHO one
> > of the biggest problems with pg_upgrade is that it's too fragile:
> > there are too many seemingly innocent things that can make it croak
> > (which isn't good, when you consider that anyone using pg_upgrade is
> > probably in a hurry to get the upgrade done and the database back
> > on-line).  It seems like this is an opportunity to get rid of one of
> > those unnecessary failure cases.
>
> OK, then the simplest fix, once you modify pg_dumpall, would be to
> modify pg_upgrade to remove reference to the postgres database in the
> new cluster if it doesn't exist in the old one.  That would allow
> pg_upgrade to maintain a 1-1 matching of databases in the old and new
> cluster --- it allows the change to be locallized without affecting much
> code.

I fixed this a different way.  I originally thought I could skip over
the 'postgres' database in the new cluster if it didn't exist in the old
cluster, but we have do things like check it is empty, so that was going
to be awkward.

It turns out there was only one place that expected a 1-1 mapping of old
and new databases (file transfer), so I just modified that code to allow
skipping a database in the new cluster that didn't exist in the old
cluster.

Attached patch applied.  This allows an upgrade if the 'postgres'
database is missing from the old cluster.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index e400814..d32a84c
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
***************
*** 14,20 ****

  static void set_locale_and_encoding(ClusterInfo *cluster);
  static void check_new_cluster_is_empty(void);
- static void check_old_cluster_has_new_cluster_dbs(void);
  static void check_locale_and_encoding(ControlData *oldctrl,
                            ControlData *newctrl);
  static void check_is_super_user(ClusterInfo *cluster);
--- 14,19 ----
*************** check_new_cluster(void)
*** 127,133 ****

      check_new_cluster_is_empty();
      check_for_prepared_transactions(&new_cluster);
-     check_old_cluster_has_new_cluster_dbs();

      check_loadable_libraries();

--- 126,131 ----
*************** check_new_cluster_is_empty(void)
*** 382,420 ****


  /*
-  *    If someone removes the 'postgres' database from the old cluster and
-  *    the new cluster has a 'postgres' database, the number of databases
-  *    will not match.  We actually could upgrade such a setup, but it would
-  *    violate the 1-to-1 mapping of database counts, so we throw an error
-  *    instead.  We would detect this as a database count mismatch during
-  *    upgrade, but we want to detect it during the check phase and report
-  *    the database name.
-  */
- static void
- check_old_cluster_has_new_cluster_dbs(void)
- {
-     int            old_dbnum,
-                 new_dbnum;
-
-     for (new_dbnum = 0; new_dbnum < new_cluster.dbarr.ndbs; new_dbnum++)
-     {
-         for (old_dbnum = 0; old_dbnum < old_cluster.dbarr.ndbs; old_dbnum++)
-             if (strcmp(old_cluster.dbarr.dbs[old_dbnum].db_name,
-                        new_cluster.dbarr.dbs[new_dbnum].db_name) == 0)
-                 break;
-         if (old_dbnum == old_cluster.dbarr.ndbs)
-         {
-             if (strcmp(new_cluster.dbarr.dbs[new_dbnum].db_name, "postgres") == 0)
-                 pg_log(PG_FATAL, "The \"postgres\" database must exist in the old cluster\n");
-             else
-                 pg_log(PG_FATAL, "New cluster database \"%s\" does not exist in the old cluster\n",
-                        new_cluster.dbarr.dbs[new_dbnum].db_name);
-         }
-     }
- }
-
-
- /*
   * create_script_for_old_cluster_deletion()
   *
   *    This is particularly useful for tablespace deletion.
--- 380,385 ----
*************** create_script_for_old_cluster_deletion(c
*** 462,468 ****
                  fprintf(script, RM_CMD " %s%s/PG_VERSION\n",
                   os_info.tablespaces[tblnum], old_cluster.tablespace_suffix);

!             for (dbnum = 0; dbnum < new_cluster.dbarr.ndbs; dbnum++)
              {
                  fprintf(script, RMDIR_CMD " %s%s/%d\n",
                    os_info.tablespaces[tblnum], old_cluster.tablespace_suffix,
--- 427,433 ----
                  fprintf(script, RM_CMD " %s%s/PG_VERSION\n",
                   os_info.tablespaces[tblnum], old_cluster.tablespace_suffix);

!             for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
              {
                  fprintf(script, RMDIR_CMD " %s%s/%d\n",
                    os_info.tablespaces[tblnum], old_cluster.tablespace_suffix,
diff --git a/contrib/pg_upgrade/function.c b/contrib/pg_upgrade/function.c
new file mode 100644
index 0f80089..b154f03
*** a/contrib/pg_upgrade/function.c
--- b/contrib/pg_upgrade/function.c
*************** get_loadable_libraries(void)
*** 132,139 ****
      int            totaltups;
      int            dbnum;

!     ress = (PGresult **)
!         pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
      totaltups = 0;

      /* Fetch all library names, removing duplicates within each DB */
--- 132,138 ----
      int            totaltups;
      int            dbnum;

!     ress = (PGresult **) pg_malloc(old_cluster.dbarr.ndbs * sizeof(PGresult *));
      totaltups = 0;

      /* Fetch all library names, removing duplicates within each DB */
diff --git a/contrib/pg_upgrade/relfilenode.c b/contrib/pg_upgrade/relfilenode.c
new file mode 100644
index 1aefd33..ad893be
*** a/contrib/pg_upgrade/relfilenode.c
--- b/contrib/pg_upgrade/relfilenode.c
*************** const char *
*** 34,55 ****
  transfer_all_new_dbs(DbInfoArr *old_db_arr,
                     DbInfoArr *new_db_arr, char *old_pgdata, char *new_pgdata)
  {
!     int            dbnum;
      const char *msg = NULL;

      prep_status("Restoring user relation files\n");

!     if (old_db_arr->ndbs != new_db_arr->ndbs)
!         pg_log(PG_FATAL, "old and new clusters have a different number of databases\n");
!
!     for (dbnum = 0; dbnum < old_db_arr->ndbs; dbnum++)
      {
!         DbInfo       *old_db = &old_db_arr->dbs[dbnum];
!         DbInfo       *new_db = &new_db_arr->dbs[dbnum];
          FileNameMap *mappings;
          int            n_maps;
          pageCnvCtx *pageConverter = NULL;

          if (strcmp(old_db->db_name, new_db->db_name) != 0)
              pg_log(PG_FATAL, "old and new databases have different names: old \"%s\", new \"%s\"\n",
                     old_db->db_name, new_db->db_name);
--- 34,63 ----
  transfer_all_new_dbs(DbInfoArr *old_db_arr,
                     DbInfoArr *new_db_arr, char *old_pgdata, char *new_pgdata)
  {
!     int            old_dbnum, new_dbnum;
      const char *msg = NULL;

      prep_status("Restoring user relation files\n");

!     /* Scan the old cluster databases and transfer their files */
!     for (old_dbnum = new_dbnum = 0;
!          old_dbnum < old_db_arr->ndbs && new_dbnum < new_db_arr->ndbs;
!          old_dbnum++, new_dbnum++)
      {
!         DbInfo       *old_db = &old_db_arr->dbs[old_dbnum];
!         DbInfo       *new_db = &new_db_arr->dbs[new_dbnum];
          FileNameMap *mappings;
          int            n_maps;
          pageCnvCtx *pageConverter = NULL;

+         /*
+          *    Advance past any databases that exist in the new cluster
+          *    but not in the old, e.g. "postgres".
+          */
+         while (strcmp(old_db->db_name, new_db->db_name) != 0 &&
+                new_dbnum < new_db_arr->ndbs)
+             new_db = &new_db_arr->dbs[++new_dbnum];
+
          if (strcmp(old_db->db_name, new_db->db_name) != 0)
              pg_log(PG_FATAL, "old and new databases have different names: old \"%s\", new \"%s\"\n",
                     old_db->db_name, new_db->db_name);

Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Tue, Nov 1, 2011 at 2:36 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> On Tue, Nov 1, 2011 at 1:53 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> > Bruce Momjian wrote:
>> >> > What I would prefer is to have the upgrade succeed, and just ignore
>> >> > the existence of a postgres database in the new cluster. ?Maybe give
>> >> > the user a notice and let them decide whether they wish to take any
>> >> > action. ?I understand that failing is probably less code, but IMHO one
>> >> > of the biggest problems with pg_upgrade is that it's too fragile:
>> >> > there are too many seemingly innocent things that can make it croak
>> >> > (which isn't good, when you consider that anyone using pg_upgrade is
>> >> > probably in a hurry to get the upgrade done and the database back
>> >> > on-line). ?It seems like this is an opportunity to get rid of one of
>> >> > those unnecessary failure cases.
>> >>
>> >> OK, then the simplest fix, once you modify pg_dumpall, would be to
>> >> modify pg_upgrade to remove reference to the postgres database in the
>> >> new cluster if it doesn't exist in the old one. ?That would allow
>> >> pg_upgrade to maintain a 1-1 matching of databases in the old and new
>> >> cluster --- it allows the change to be locallized without affecting much
>> >> code.
>> >
>> > I fixed this a different way. ?I originally thought I could skip over
>> > the 'postgres' database in the new cluster if it didn't exist in the old
>> > cluster, but we have do things like check it is empty, so that was going
>> > to be awkward.
>> >
>> > It turns out there was only one place that expected a 1-1 mapping of old
>> > and new databases (file transfer), so I just modified that code to allow
>> > skipping a database in the new cluster that didn't exist in the old
>> > cluster.
>>
>> Urp.  But that means that if someone has any data in that database,
>> pg_upgrade will basically eat it.  That does not seem like a step
>> forward.
>
> Please clarify?  We already check that all the new cluster databases are
> empty, so we are effectively skipping the transfering of files into
> empty new cluster databases.  It processes all old cluster databases and
> forces a new cluster match --- it is only empty new cluster database
> that are being skipped.

Aren't you saying that if a postgres database exists in the old
database (and potentially contains data) but is missing in the new
database, we'll just fail to migrate it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Tue, Nov 1, 2011 at 2:49 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> >> > It turns out there was only one place that expected a 1-1 mapping of old
>> >> > and new databases (file transfer), so I just modified that code to allow
>> >> > skipping a database in the new cluster that didn't exist in the old
>> >> > cluster.
>> >>
>> >> Urp. ?But that means that if someone has any data in that database,
>> >> pg_upgrade will basically eat it. ?That does not seem like a step
>> >> forward.
>> >
>> > Please clarify? ?We already check that all the new cluster databases are
>> > empty, so we are effectively skipping the transfering of files into
>> > empty new cluster databases. ?It processes all old cluster databases and
>> > forces a new cluster match --- it is only empty new cluster database
>> > that are being skipped.
>>
>> Aren't you saying that if a postgres database exists in the old
>> database (and potentially contains data) but is missing in the new
>> database, we'll just fail to migrate it?
>
> No, the reverse.  If the 'postgres' database exists in the new cluster,
> but not in the old, we allow it to upgrade (we skip over the 'postgres'
> database in the new cluster use the loop in the patch).

Oh, OK.  That seems fine - in fact, that seems perfect.

> Unless I am missing something.  Did you see something odd in the patch
> or in my wording?

Your wording confused me, but on further review I think I'm just
easily confused.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Tue, Nov 1, 2011 at 1:53 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Bruce Momjian wrote:
>> > What I would prefer is to have the upgrade succeed, and just ignore
>> > the existence of a postgres database in the new cluster.  Maybe give
>> > the user a notice and let them decide whether they wish to take any
>> > action.  I understand that failing is probably less code, but IMHO one
>> > of the biggest problems with pg_upgrade is that it's too fragile:
>> > there are too many seemingly innocent things that can make it croak
>> > (which isn't good, when you consider that anyone using pg_upgrade is
>> > probably in a hurry to get the upgrade done and the database back
>> > on-line).  It seems like this is an opportunity to get rid of one of
>> > those unnecessary failure cases.
>>
>> OK, then the simplest fix, once you modify pg_dumpall, would be to
>> modify pg_upgrade to remove reference to the postgres database in the
>> new cluster if it doesn't exist in the old one.  That would allow
>> pg_upgrade to maintain a 1-1 matching of databases in the old and new
>> cluster --- it allows the change to be locallized without affecting much
>> code.
>
> I fixed this a different way.  I originally thought I could skip over
> the 'postgres' database in the new cluster if it didn't exist in the old
> cluster, but we have do things like check it is empty, so that was going
> to be awkward.
>
> It turns out there was only one place that expected a 1-1 mapping of old
> and new databases (file transfer), so I just modified that code to allow
> skipping a database in the new cluster that didn't exist in the old
> cluster.

Urp.  But that means that if someone has any data in that database,
pg_upgrade will basically eat it.  That does not seem like a step
forward.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Sat, Oct 29, 2011 at 4:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 28, 2011 at 9:22 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> OK, the attached, applied patch removes the pg_upgrade dependency on the
>> 'postgres' database existing in the new cluster.  However, vacuumdb,
>> used by pg_upgrade, still has this dependency:
>>
>>            conn = connectDatabase("postgres", host, port, username,
>>                prompt_password, progname);
>>
>> In fact, all the /scripts binaries use the postgres database, except for
>> createdb/dropdb, which has this heuristic:
>>
>>    /*
>>     * Connect to the 'postgres' database by default, except have the
>>     * 'postgres' user use 'template1' so he can create the 'postgres'
>>     * database.
>>     */
>>    conn = connectDatabase(strcmp(dbname, "postgres") == 0 ? "template1" : "postgres",
>>                           host, port, username, prompt_password, progname);
>>
>> This makes sense because you might be creating or dropping the postgres
>> database.  Do we want these to have smarter database selection code?
>
> Well, I suppose as long as we're cleaning this up, we might as well be
> thorough, so, sure, why not?  I think the algorithm pg_dumpall uses is
> pretty sensible: let the user specify the database to use if they so
> desire; if not, try postgres first and then template1.  I think we
> could stick some logic for that in common.c which could be shared by
> clusterdb, createdb, dropdb, dropuser, reindexdb, and vacuumdb.
>
> However, we need to rethink the flag to be used for this: pg_dumpall
> uses -l, but many of the other utilities already use that for some
> other purpose, and it's not exactly mnemonic anyway.  "-d" for
> database could work, but that's also in use in some places, and
> furthermore somewhat confusing since many if not all of these
> utilities have an option to operate on a single database only, and you
> might think that -d would specify the database to operate on, rather
> than the one to be used to get the list of databases.  pgAdmin uses
> the term "maintenance database" to refer to a database to be used when
> none is explicitly specified, and I think that's fairly clear
> terminology.  So I propose that we add a --maintenance-db option (with
> no short form, since this is a relatively obscure need) to the tools
> listed above.  The tools will pass the associated value (or NULL if
> the option is not specified) to the above-mentioned routine in
> common.c, which will do the rest.
>
> If nobody objects, I'll go do that.  Hopefully that should be enough
> to put this problem to bed more or less permanently.

All right, I've worked up a (rather boring and tedious) patch to do
this, which is attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> > However, we need to rethink the flag to be used for this: pg_dumpall
> > uses -l, but many of the other utilities already use that for some
> > other purpose, and it's not exactly mnemonic anyway. ?"-d" for
> > database could work, but that's also in use in some places, and
> > furthermore somewhat confusing since many if not all of these
> > utilities have an option to operate on a single database only, and you
> > might think that -d would specify the database to operate on, rather
> > than the one to be used to get the list of databases. ?pgAdmin uses
> > the term "maintenance database" to refer to a database to be used when
> > none is explicitly specified, and I think that's fairly clear
> > terminology. ?So I propose that we add a --maintenance-db option (with
> > no short form, since this is a relatively obscure need) to the tools
> > listed above. ?The tools will pass the associated value (or NULL if
> > the option is not specified) to the above-mentioned routine in
> > common.c, which will do the rest.
> >
> > If nobody objects, I'll go do that. ?Hopefully that should be enough
> > to put this problem to bed more or less permanently.
> 
> All right, I've worked up a (rather boring and tedious) patch to do
> this, which is attached.

I wonder if we should bother using a flag for this.  No one has asked
for one, and the new code to conditionally connect to databases should
function fine for most use cases.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Wed, Nov 2, 2011 at 2:05 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> > However, we need to rethink the flag to be used for this: pg_dumpall
>> > uses -l, but many of the other utilities already use that for some
>> > other purpose, and it's not exactly mnemonic anyway. ?"-d" for
>> > database could work, but that's also in use in some places, and
>> > furthermore somewhat confusing since many if not all of these
>> > utilities have an option to operate on a single database only, and you
>> > might think that -d would specify the database to operate on, rather
>> > than the one to be used to get the list of databases. ?pgAdmin uses
>> > the term "maintenance database" to refer to a database to be used when
>> > none is explicitly specified, and I think that's fairly clear
>> > terminology. ?So I propose that we add a --maintenance-db option (with
>> > no short form, since this is a relatively obscure need) to the tools
>> > listed above. ?The tools will pass the associated value (or NULL if
>> > the option is not specified) to the above-mentioned routine in
>> > common.c, which will do the rest.
>> >
>> > If nobody objects, I'll go do that. ?Hopefully that should be enough
>> > to put this problem to bed more or less permanently.
>>
>> All right, I've worked up a (rather boring and tedious) patch to do
>> this, which is attached.
>
> I wonder if we should bother using a flag for this.  No one has asked
> for one, and the new code to conditionally connect to databases should
> function fine for most use cases.

True, but OTOH we have such a flag for pg_dumpall, and I've already
done the work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> >> > If nobody objects, I'll go do that. ?Hopefully that should be enough
> >> > to put this problem to bed more or less permanently.
> >>
> >> All right, I've worked up a (rather boring and tedious) patch to do
> >> this, which is attached.
> >
> > I wonder if we should bother using a flag for this. ?No one has asked
> > for one, and the new code to conditionally connect to databases should
> > function fine for most use cases.
> 
> True, but OTOH we have such a flag for pg_dumpall, and I've already
> done the work.

Well, every user-visible API option has a cost, and I am not sure there
is enough usefulness to overcome the cost of this.

As far as pg_dumpall, you could argue that the -l flag isn't needed; 
the docs say:
      -l dbname, --database=dbname          Specifies the name of the database to connect to to          dump global
objectsand discover what other databases          should be dumped. If not specified, the postgres          database
willbe used, and if that does not exist,          template1 will be used.
 

What is the value of this flag?  The only value I can see would be if
the 'postgres' database does not exist and you are concerned that you
might block create database operations during pg_dumpall's dump of
global objects, or you don't have permissions for template1 for some
reason.

Also, if we are going to add this flag, we should have pg_dumpall use it
too and just depricate the old options.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Wed, Nov 2, 2011 at 8:31 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> >> > If nobody objects, I'll go do that. ?Hopefully that should be enough
>> >> > to put this problem to bed more or less permanently.
>> >>
>> >> All right, I've worked up a (rather boring and tedious) patch to do
>> >> this, which is attached.
>> >
>> > I wonder if we should bother using a flag for this. ?No one has asked
>> > for one, and the new code to conditionally connect to databases should
>> > function fine for most use cases.
>>
>> True, but OTOH we have such a flag for pg_dumpall, and I've already
>> done the work.
>
> Well, every user-visible API option has a cost, and I am not sure there
> is enough usefulness to overcome the cost of this.

I am not sure why you think this is worth the time it takes to argue
about it, but if you want to whack the patch around or just forget the
whole thing, go ahead.  The difference between what you're proposing
and what I'm proposing is about 25 lines of code, so it hardly needs
an acre of justification.  To me, making the tools consistent with
each other and not dependent on the user's choice of database names is
worth the tiny amount of code it takes to make that happen.

> Also, if we are going to add this flag, we should have pg_dumpall use it
> too and just depricate the old options.

I thought about that, but couldn't think of a compelling reason to
break backward compatibility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Wed, Nov 2, 2011 at 8:31 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> >> > If nobody objects, I'll go do that. ?Hopefully that should be enough
> >> >> > to put this problem to bed more or less permanently.
> >> >>
> >> >> All right, I've worked up a (rather boring and tedious) patch to do
> >> >> this, which is attached.
> >> >
> >> > I wonder if we should bother using a flag for this. ?No one has asked
> >> > for one, and the new code to conditionally connect to databases should
> >> > function fine for most use cases.
> >>
> >> True, but OTOH we have such a flag for pg_dumpall, and I've already
> >> done the work.
> >
> > Well, every user-visible API option has a cost, and I am not sure there
> > is enough usefulness to overcome the cost of this.
> 
> I am not sure why you think this is worth the time it takes to argue
> about it, but if you want to whack the patch around or just forget the
> whole thing, go ahead.  The difference between what you're proposing
> and what I'm proposing is about 25 lines of code, so it hardly needs
> an acre of justification.  To me, making the tools consistent with
> each other and not dependent on the user's choice of database names is
> worth the tiny amount of code it takes to make that happen.

Well, it would be good to get other opinions on this.  The amount of
code isn't really the issue for me, but rather keeping the user API as
clean as possible.

I don't want someone to say, "Oh, here's a new user option.  Wonder why
I should use it?  Hmm, no one can tell me."

If an option's use-case is not clear, we have to explain in the docs why
to use it, and right now no one can tell me why we should use it.

> > Also, if we are going to add this flag, we should have pg_dumpall use it
> > too and just deprecate the old options.
> 
> I thought about that, but couldn't think of a compelling reason to
> break backward compatibility.

Well, I figure we better have something compelling to do any change,
including a new command-line option.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> I fixed this a different way.  I originally thought I could skip over
> the 'postgres' database in the new cluster if it didn't exist in the old
> cluster, but we have do things like check it is empty, so that was going
> to be awkward.
>
> It turns out there was only one place that expected a 1-1 mapping of old
> and new databases (file transfer), so I just modified that code to allow
> skipping a database in the new cluster that didn't exist in the old
> cluster.
>
> Attached patch applied.  This allows an upgrade if the 'postgres'
> database is missing from the old cluster.

OK, I thought some more and didn't like the way the code could loop off
the end of the new cluster without matching all the old cluster
database.

The attached, applied patches improves this.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/relfilenode.c b/contrib/pg_upgrade/relfilenode.c
new file mode 100644
index 382588f..d67d01f
*** a/contrib/pg_upgrade/relfilenode.c
--- b/contrib/pg_upgrade/relfilenode.c
*************** transfer_all_new_dbs(DbInfoArr *old_db_a
*** 41,51 ****

      /* Scan the old cluster databases and transfer their files */
      for (old_dbnum = new_dbnum = 0;
!          old_dbnum < old_db_arr->ndbs && new_dbnum < new_db_arr->ndbs;
           old_dbnum++, new_dbnum++)
      {
!         DbInfo       *old_db = &old_db_arr->dbs[old_dbnum];
!         DbInfo       *new_db = &new_db_arr->dbs[new_dbnum];
          FileNameMap *mappings;
          int            n_maps;
          pageCnvCtx *pageConverter = NULL;
--- 41,50 ----

      /* Scan the old cluster databases and transfer their files */
      for (old_dbnum = new_dbnum = 0;
!          old_dbnum < old_db_arr->ndbs;
           old_dbnum++, new_dbnum++)
      {
!         DbInfo       *old_db = &old_db_arr->dbs[old_dbnum], *new_db;
          FileNameMap *mappings;
          int            n_maps;
          pageCnvCtx *pageConverter = NULL;
*************** transfer_all_new_dbs(DbInfoArr *old_db_a
*** 55,67 ****
           *    but not in the old, e.g. "postgres".  (The user might
           *    have removed the 'postgres' database from the old cluster.)
           */
!         while (strcmp(old_db->db_name, new_db->db_name) != 0 &&
!                new_dbnum < new_db_arr->ndbs)
!             new_db = &new_db_arr->dbs[++new_dbnum];

!         if (strcmp(old_db->db_name, new_db->db_name) != 0)
!             pg_log(PG_FATAL, "old and new databases have different names: old \"%s\", new \"%s\"\n",
!                    old_db->db_name, new_db->db_name);

          n_maps = 0;
          mappings = gen_db_file_maps(old_db, new_db, &n_maps, old_pgdata,
--- 54,69 ----
           *    but not in the old, e.g. "postgres".  (The user might
           *    have removed the 'postgres' database from the old cluster.)
           */
!         for (; new_dbnum < new_db_arr->ndbs; new_dbnum++)
!         {
!             new_db = &new_db_arr->dbs[new_dbnum];
!             if (strcmp(old_db->db_name, new_db->db_name) == 0)
!                 break;
!         }

!         if (new_dbnum >= new_db_arr->ndbs)
!             pg_log(PG_FATAL, "old database \"%s\" not found in the new cluster\n",
!                    old_db->db_name);

          n_maps = 0;
          mappings = gen_db_file_maps(old_db, new_db, &n_maps, old_pgdata,

Re: pg_upgrade if 'postgres' database is dropped

From
Magnus Hagander
Date:
On Thu, Nov 3, 2011 at 11:20, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> On Wed, Nov 2, 2011 at 8:31 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> > Robert Haas wrote:
>> >> >> > If nobody objects, I'll go do that. ?Hopefully that should be enough
>> >> >> > to put this problem to bed more or less permanently.
>> >> >>
>> >> >> All right, I've worked up a (rather boring and tedious) patch to do
>> >> >> this, which is attached.
>> >> >
>> >> > I wonder if we should bother using a flag for this. ?No one has asked
>> >> > for one, and the new code to conditionally connect to databases should
>> >> > function fine for most use cases.
>> >>
>> >> True, but OTOH we have such a flag for pg_dumpall, and I've already
>> >> done the work.
>> >
>> > Well, every user-visible API option has a cost, and I am not sure there
>> > is enough usefulness to overcome the cost of this.
>>
>> I am not sure why you think this is worth the time it takes to argue
>> about it, but if you want to whack the patch around or just forget the
>> whole thing, go ahead.  The difference between what you're proposing
>> and what I'm proposing is about 25 lines of code, so it hardly needs
>> an acre of justification.  To me, making the tools consistent with
>> each other and not dependent on the user's choice of database names is
>> worth the tiny amount of code it takes to make that happen.
>
> Well, it would be good to get other opinions on this.  The amount of
> code isn't really the issue for me, but rather keeping the user API as
> clean as possible.

Seems reasonably clean to me. Not sure what would be unclean about it?
Are you saying we need to explain the concept of maintenance db
somewhere in the docs?

>> > Also, if we are going to add this flag, we should have pg_dumpall use it
>> > too and just deprecate the old options.
>>
>> I thought about that, but couldn't think of a compelling reason to
>> break backward compatibility.

Adding it to pg_dumpal lwithout removing the old one doesn't cause
backwards compatibility break. Then mark the old one as deprecated,
and remove it a few releases down the road. We can't keep every switch
around forever ;)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Magnus Hagander wrote:
> On Thu, Nov 3, 2011 at 11:20, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> On Wed, Nov 2, 2011 at 8:31 PM, Bruce Momjian <bruce@momjian.us> wrote:
> >> > Robert Haas wrote:
> >> >> >> > If nobody objects, I'll go do that. ?Hopefully that should be enough
> >> >> >> > to put this problem to bed more or less permanently.
> >> >> >>
> >> >> >> All right, I've worked up a (rather boring and tedious) patch to do
> >> >> >> this, which is attached.
> >> >> >
> >> >> > I wonder if we should bother using a flag for this. ?No one has asked
> >> >> > for one, and the new code to conditionally connect to databases should
> >> >> > function fine for most use cases.
> >> >>
> >> >> True, but OTOH we have such a flag for pg_dumpall, and I've already
> >> >> done the work.
> >> >
> >> > Well, every user-visible API option has a cost, and I am not sure there
> >> > is enough usefulness to overcome the cost of this.
> >>
> >> I am not sure why you think this is worth the time it takes to argue
> >> about it, but if you want to whack the patch around or just forget the
> >> whole thing, go ahead. ?The difference between what you're proposing
> >> and what I'm proposing is about 25 lines of code, so it hardly needs
> >> an acre of justification. ?To me, making the tools consistent with
> >> each other and not dependent on the user's choice of database names is
> >> worth the tiny amount of code it takes to make that happen.
> >
> > Well, it would be good to get other opinions on this. ?The amount of
> > code isn't really the issue for me, but rather keeping the user API as
> > clean as possible.
> 
> Seems reasonably clean to me. Not sure what would be unclean about it?
> Are you saying we need to explain the concept of maintenance db
> somewhere in the docs?
> 
> >> > Also, if we are going to add this flag, we should have pg_dumpall use it
> >> > too and just deprecate the old options.
> >>
> >> I thought about that, but couldn't think of a compelling reason to
> >> break backward compatibility.
> 
> Adding it to pg_dumpal lwithout removing the old one doesn't cause
> backwards compatibility break. Then mark the old one as deprecated,
> and remove it a few releases down the road. We can't keep every switch
> around forever ;)

OK, good.  I will work on this.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Bruce Momjian
Date:
Magnus Hagander wrote:
> On Thu, Nov 3, 2011 at 11:20, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> On Wed, Nov 2, 2011 at 8:31 PM, Bruce Momjian <bruce@momjian.us> wrote:
> >> > Robert Haas wrote:
> >> >> >> > If nobody objects, I'll go do that. ?Hopefully that should be enough
> >> >> >> > to put this problem to bed more or less permanently.
> >> >> >>
> >> >> >> All right, I've worked up a (rather boring and tedious) patch to do
> >> >> >> this, which is attached.
> >> >> >
> >> >> > I wonder if we should bother using a flag for this. ?No one has asked
> >> >> > for one, and the new code to conditionally connect to databases should
> >> >> > function fine for most use cases.
> >> >>
> >> >> True, but OTOH we have such a flag for pg_dumpall, and I've already
> >> >> done the work.
> >> >
> >> > Well, every user-visible API option has a cost, and I am not sure there
> >> > is enough usefulness to overcome the cost of this.
> >>
> >> I am not sure why you think this is worth the time it takes to argue
> >> about it, but if you want to whack the patch around or just forget the
> >> whole thing, go ahead. ?The difference between what you're proposing
> >> and what I'm proposing is about 25 lines of code, so it hardly needs
> >> an acre of justification. ?To me, making the tools consistent with
> >> each other and not dependent on the user's choice of database names is
> >> worth the tiny amount of code it takes to make that happen.
> >
> > Well, it would be good to get other opinions on this. ?The amount of
> > code isn't really the issue for me, but rather keeping the user API as
> > clean as possible.
> 
> Seems reasonably clean to me. Not sure what would be unclean about it?
> Are you saying we need to explain the concept of maintenance db
> somewhere in the docs?

My point is that no one can explain why they would need to specify an
alternate database when using 'postgres' and falling back to 'template1'
works for almost everyone.  I say just make it automatic, as it was in
Robert's patch, and be done with it.

> >> > Also, if we are going to add this flag, we should have pg_dumpall use it
> >> > too and just deprecate the old options.
> >>
> >> I thought about that, but couldn't think of a compelling reason to
> >> break backward compatibility.
> 
> Adding it to pg_dumpall without removing the old one doesn't cause
> backwards compatibility break. Then mark the old one as deprecated,
> and remove it a few releases down the road. We can't keep every switch
> around forever ;)

OK.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade if 'postgres' database is dropped

From
Robert Haas
Date:
On Tue, Dec 6, 2011 at 7:00 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Seems reasonably clean to me. Not sure what would be unclean about it?

Based on this feedback, I went ahead and committed my previous patch.
This means that if pg_upgrade wants to accept a --maintenance-db
option, it will be able to pass it through to any other commands it
invokes.  And if it doesn't do that, vacuumdb et. al. should still
work anyway, as long as either template1 or postgres is accessible.

>>> > Also, if we are going to add this flag, we should have pg_dumpall use it
>>> > too and just deprecate the old options.
>>>
>>> I thought about that, but couldn't think of a compelling reason to
>>> break backward compatibility.
>
> Adding it to pg_dumpal lwithout removing the old one doesn't cause
> backwards compatibility break. Then mark the old one as deprecated,
> and remove it a few releases down the road. We can't keep every switch
> around forever ;)

I'm not really sold on tinkering with pg_dumpall; if it ain't broke,
don't fix it.  But we can bikeshed about it if we like.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company