Thread: Problem with pg_upgrade?

Problem with pg_upgrade?

From
Bruce Momjian
Date:
I have gotten two reports via IRC that months after using 9.0
pg_upgrade, some of the clog files have been removed while there is
still data in the table needing those clog files.  These reports came to
me through Rhodiumtoad who analyzed the systems.

Looking at pg_upgrade, I am concerned that somehow autovacuum is running
in frozen mode before I have restored the frozen xids for the table or
database.  Here is the code I am using:
    snprintf(cmd, sizeof(cmd),
             SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
             "-o \"-p %d -c autovacuum=off "
             "-c autovacuum_freeze_max_age=2000000000\" "
             "start >> \"%s\" 2>&1" SYSTEMQUOTE,
             bindir,

Does anyone have any other suggestions on how to make sure autovacuum
does not run in freeze mode?  I know 'autovacuum=off' turns off normal
autovacuum.  Would increasing autovacuum_naptime help?  It looks like
the autovacuum code sleeps before processing anything, but I am not
certain.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Problem with pg_upgrade?

From
Jeff Davis
Date:
On Tue, 2011-03-29 at 15:52 -0400, Bruce Momjian wrote:
> Does anyone have any other suggestions on how to make sure autovacuum
> does not run in freeze mode?

Can you run in single user mode?

Regards,
    Jeff Davis



Re: Problem with pg_upgrade?

From
Alvaro Herrera
Date:
Excerpts from Jeff Davis's message of mar mar 29 21:27:34 -0300 2011:
> On Tue, 2011-03-29 at 15:52 -0400, Bruce Momjian wrote:
> > Does anyone have any other suggestions on how to make sure autovacuum
> > does not run in freeze mode?
> 
> Can you run in single user mode?

I asked the same thing.  Apparently the problem is that it would make
error handling a lot more difficult.  I think it would be better to have
some sort of option to disable autovacuum completely which would be used
only during pg_upgrade.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Problem with pg_upgrade?

From
Jeff Davis
Date:
On Tue, 2011-03-29 at 21:43 -0300, Alvaro Herrera wrote:
> I think it would be better to have
> some sort of option to disable autovacuum completely which would be used
> only during pg_upgrade.

Sounds reasonable to me.

Regards,
    Jeff Davis




Re: Problem with pg_upgrade?

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Excerpts from Jeff Davis's message of mar mar 29 21:27:34 -0300 2011:
> > On Tue, 2011-03-29 at 15:52 -0400, Bruce Momjian wrote:
> > > Does anyone have any other suggestions on how to make sure autovacuum
> > > does not run in freeze mode?
> > 
> > Can you run in single user mode?
> 
> I asked the same thing.  Apparently the problem is that it would make
> error handling a lot more difficult.  I think it would be better to have
> some sort of option to disable autovacuum completely which would be used
> only during pg_upgrade.

Yes, also consider that pg_dumpall assumes psql with its use of
\connect, and I would have to start/stop the single-user backend for
every database change, plus I use psql with ON_ERROR_STOP=on.

I think we have three options:
        o  find if the use of autovacuum_freeze_max_age is safe, or make
           it safe
        o  document that autovacuum_naptime always happens before
           autovacuum does anything and set it high
        o  modify autovacuum to be an enum, with values on/off/disabled

I think the last one makes more sense, and is safer if we need to
backpatch this.  Creating a new variable for this would be confusing
because it could conflict with the 'autovacuum' setting.

Also, I am unclear if this is really our bug.  At least one of the
systems was on Ubuntu/Debian, and they might both have been, and I know
Debian changes our source code.  Where can I find a copy of the diffs
they have made?

I am hearing only second-hand reports of this problem through
Rhodiumtoad on IRC.  I don't have IRC access this week so if someone can
get details from him that would help.  I think the fix he found was to
pull the clog files off of an old file system backup.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Problem with pg_upgrade?

From
Peter Eisentraut
Date:
On ons, 2011-03-30 at 10:57 -0400, Bruce Momjian wrote:
> Also, I am unclear if this is really our bug.  At least one of the
> systems was on Ubuntu/Debian, and they might both have been, and I know
> Debian changes our source code.  Where can I find a copy of the diffs
> they have made?

http://bazaar.launchpad.net/~pitti/postgresql/debian-9.0/files/head:/debian/patches/




Re: Problem with pg_upgrade?

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> On ons, 2011-03-30 at 10:57 -0400, Bruce Momjian wrote:
> > Also, I am unclear if this is really our bug.  At least one of the
> > systems was on Ubuntu/Debian, and they might both have been, and I know
> > Debian changes our source code.  Where can I find a copy of the diffs
> > they have made?
> 
> http://bazaar.launchpad.net/~pitti/postgresql/debian-9.0/files/head:/debian/patches/

These all seem reasonable, but I am confused by this.  Someone reported
last week that the equals sign is not optional in postgresql.conf on
Debian but I don't see that patch in here.  Also I thought they modified
pg_hba.conf in some odd way.  Are these changes no longer made in 9.0?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Problem with pg_upgrade?

From
Robert Haas
Date:
On Wed, Mar 30, 2011 at 10:57 AM, Bruce Momjian <bruce@momjian.us> wrote:
> I think we have three options:
>
>        o  find if the use of autovacuum_freeze_max_age is safe, or make
>           it safe
>        o  document that autovacuum_naptime always happens before
>           autovacuum does anything and set it high
>        o  modify autovacuum to be an enum, with values on/off/disabled
>
> I think the last one makes more sense, and is safer if we need to
> backpatch this.  Creating a new variable for this would be confusing
> because it could conflict with the 'autovacuum' setting.

I have to admit the prospect of abuse is slightly frightening to me
here.  I guess we can't be held responsible for users who do dumb
things, but it might not be too clear to someone what the difference
is between autovacuum=off and autovacuum=disabled.  I don't really
understand why this is an issue in the first place, though.  Surely we
must be setting the XID counter on the new cluster to match the one on
the old cluster, and migrating the relfrozenxid and datfrozenxid
settings, so why does it matter if someone runs vacuum freeze?
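
(For the record, I assume the counter itself is carried over with something
along these lines, the value coming from the old cluster's control data ---
just a sketch, not necessarily the exact pg_upgrade invocation:)

    pg_resetxlog -f -x "$OLD_NEXT_XID" "$NEW_PGDATA"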

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Problem with pg_upgrade?

From
Jeff Davis
Date:
On Wed, 2011-03-30 at 16:46 -0400, Robert Haas wrote:
> I don't really
> understand why this is an issue in the first place, though.  Surely we
> must be setting the XID counter on the new cluster to match the one on
> the old cluster, and migrating the relfrozenxid and datfrozenxid
> settings, so why does it matter if someone runs vacuum freeze?

Because autovacuum may run before those things are properly set, as
Bruce said in the original email:

"I am concerned that somehow autovaccum is running
in frozen mode before I have restored the frozen xids for the table or
database."

I think some kind of hidden GUC might be the best option. I tend to
agree that a third option to the "autovacuum" setting would be
confusing.

Regards,
    Jeff Davis



Re: Problem with pg_upgrade?

From
Peter Eisentraut
Date:
On ons, 2011-03-30 at 15:39 -0400, Bruce Momjian wrote:
> Peter Eisentraut wrote:
> > On ons, 2011-03-30 at 10:57 -0400, Bruce Momjian wrote:
> > > Also, I am unclear if this is really our bug.  At least one of the
> > > systems was on Ubuntu/Debian, and they might both have been, and I know
> > > Debian changes our source code.  Where can I find a copy of the diffs
> > > they have made?
> > 
> > http://bazaar.launchpad.net/~pitti/postgresql/debian-9.0/files/head:/debian/patches/
> 
> These all seem reasonable, but I am confused by this.  Someone reported
> last week that the equals sign is not optional in postgresql.conf on
> Debian but I don't see that patch in here.

That's probably because some other tool processes the configuration
file to find the data directory or something.

> Also I thought they modified
> pg_hba.conf in some odd way.  Are these changes no longer made in 9.0?

I think that was about 10 years ago.




Re: Problem with pg_upgrade?

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Wed, Mar 30, 2011 at 10:57 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > I think we have three options:
> >
> >        o  find if the use of autovacuum_freeze_max_age is safe, or make
> >           it safe
> >        o  document that autovacuum_naptime always happens before
> >           autovacuum does anything and set it high
> >        o  modify autovacuum to be an enum, with values on/off/disabled
> >
> > I think the last one makes more sense, and is safer if we need to
> > backpatch this.  Creating a new variable for this would be confusing
> > because it could conflict with the 'autovacuum' setting.
> 
> I have to admit the prospect of abuse is slightly frightening to me
> here.  I guess we can't be held responsible for users who do dumb
> things, but it might not be too clear to someone what the difference
> is between autovacuum=off and autovacuum=disabled.  I don't really
> understand why this is an issue in the first place, though.  Surely we
> must be setting the XID counter on the new cluster to match the one on
> the old cluster, and migrating the relfrozenxid and datfrozenxid
> settings, so why does it matter if someone runs vacuum freeze?

First, I am not sure it is a problem, but based on the IRC reports I
felt I should ask here for confirmation.  Here is a sample pg_dump
output:
    CREATE TABLE sample (
        x integer
    );

    -- For binary upgrade, set relfrozenxid.
    UPDATE pg_catalog.pg_class
    SET relfrozenxid = '703'
    WHERE oid = 'sample'::pg_catalog.regclass;

So, we set the cluster xid while we do this schema-only restore.  I
believe it might be possible for autovacuum to run while the schema is
restored, see an empty table, and set the relfrozenxid to be the current
xid, when in fact we are about to put a heap file in place of the
current empty file.  I thought the autovacuum_freeze_max_age=2000000000
would prevent this but now I am not sure.  I assumed that since the gap
between the restored relfrozenxid and the current counter would
certainly be < 2000000000 that autovacuum would not touch it.  It is
possible these users had drastically modified autovacuum_freeze_max_age
to cause 3-billion gaps --- again, I have no direct contact with the
reporters, but I figured being paranoid is a good thing where pg_upgrade
is involved.

I wonder if the fact that these people never reported the bug means
they were doing something odd with their servers.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Problem with pg_upgrade?

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> First, I am not sure it is a problem, but based on the IRC reports I
> felt I should ask here for confirmation.  Here is a sample pg_dump
> output:
> 
>     CREATE TABLE sample (
>         x integer
>     );
>     
>     -- For binary upgrade, set relfrozenxid.
>     UPDATE pg_catalog.pg_class
>     SET relfrozenxid = '703'
>     WHERE oid = 'sample'::pg_catalog.regclass;
> 
> So, we set the cluster xid while we do this schema-only restore.  I
> believe it might be possible for autovacuum to run while the schema is
> restored, see an empty table, and set the relfrozenxid to be the current
> xid, when in fact we are about to put a heap file in place of the
> current empty file.  I thought the autovacuum_freeze_max_age=2000000000
> would prevent this but now I am not sure.  I assumed that since the gap
> between the restored relfrozenxid and the current counter would
> certainly be < 2000000000 that autovacuum would not touch it.  It is
> possible these users had drastically modified autovacuum_freeze_max_age
> to cause 3-billion gaps --- again, I have no direct contact with the
> reporters, but I figured being paranoid is a good thing where pg_upgrade
> is involved.
> 
> I wonder if the fact that these people never reported the bug means
> they were doing something odd with their servers.

I just updated the C comment about what we are doing:
     * Using autovacuum=off disables cleanup vacuum and analyze, but
     * freeze vacuums can still happen, so we set
     * autovacuum_freeze_max_age very high.  We assume all datfrozenxid and
     * relfrozen values are less than a gap of 2000000000 from the current
     * xid counter, so autovacuum will not touch them.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Problem with pg_upgrade?

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> > I wonder if the fact that these people never reported the bug means
> > they were doing something odd with their servers.
> 
> I just updated the C comment about what we are doing:
> 
>      * Using autovacuum=off disables cleanup vacuum and analyze, but
>      * freeze vacuums can still happen, so we set
>      * autovacuum_freeze_max_age very high.  We assume all datfrozenxid and
>      * relfrozen values are less than a gap of 2000000000 from the current
>      * xid counter, so autovacuum will not touch them.

FYI, 2000000000 is the maximum value for autovacuum_freeze_max_age, so a
user can't set it higher.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Problem with pg_upgrade?

From
Robert Haas
Date:
On Wed, Mar 30, 2011 at 5:27 PM, Bruce Momjian <bruce@momjian.us> wrote:
> First, I am not sure it is a problem, but based on the IRC reports I
> felt I should ask here for confirmation.  Here is a sample pg_dump
> output:
>
>        CREATE TABLE sample (
>            x integer
>        );
>
>        -- For binary upgrade, set relfrozenxid.
>        UPDATE pg_catalog.pg_class
>        SET relfrozenxid = '703'
>        WHERE oid = 'sample'::pg_catalog.regclass;
>
> So, we set the cluster xid while we do this schema-only restore.  I
> believe it might be possible for autovacuum to run while the schema is
> restored, see an empty table, and set the relfrozenxid to be the current
> xid, when in fact we are about to put a heap file in place of the
> current empty file.  I thought the autovacuum_freeze_max_age=2000000000
> would prevent this but now I am not sure.  I assumed that since the gap
> between the restored relfrozenxid and the current counter would
> certainly be < 2000000000 that autovacuum would not touch it.  It is
> possible these users had drastically modified autovacuum_freeze_max_age
> to cause 3-billion gaps --- again, I have no direct contact with the
> reporters, but I figured being paranoid is a good thing where pg_upgrade
> is involved.

It does seem possible that that could happen, but I'm not sure exactly
what would be causing autovacuum to fire in the first place.  It
wouldn't have to be triggered by the anti-wraparound machinery - if
the table appeared to be in need of vacuuming, then we'd vacuum it,
discover that it was empty, and update relfrozenxid.  Hmm... could it
fire just because the table has no stats?  But if that were the case
you'd think we'd be seeing this more often.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Problem with pg_upgrade?

From
Bruce Momjian
Date:
Robert Haas wrote:
> > So, we set the cluster xid while we do this schema-only restore.  I
> > believe it might be possible for autovacuum to run while the schema is
> > restored, see an empty table, and set the relfrozenxid to be the current
> > xid, when in fact we are about to put a heap file in place of the
> > current empty file.  I thought the autovacuum_freeze_max_age=2000000000
> > would prevent this but now I am not sure.  I assumed that since the gap
> > between the restored relfrozenxid and the current counter would
> > certainly be < 2000000000 that autovacuum would not touch it.  It is
> > possible these users had drastically modified autovacuum_freeze_max_age
> > to cause 3-billion gaps --- again, I have no direct contact with the
> > reporters, but I figured being paranoid is a good thing where pg_upgrade
> > is involved.
> 
> It does seem possible that that could happen, but I'm not sure exactly
> what would be causing autovacuum to fire in the first place.  It
> wouldn't have to be triggered by the anti-wraparound machinery - if
> the table appeared to be in need of vacuuming, then we'd vacuum it,
> discover that it was empty, and update relfrozenxid.  Hmm... could it
> fire just because the table has no stats?  But if that were the case
> you'd think we'd be seeing this more often.

Well, autovacuum=off, so it should only run in freeze mode, and I can't
see how that could happen.  I am thinking I have to study autovacuum.c.

I wonder if datfrozenxid could be incremented because the database is
originally empty.  It would just need to scan pg_class, not actually
vacuum anything.  I wonder if we do that.  The bottom line is I am
hanging too much on autovacuum_freeze_max_age causing autovacuum to do
nothing.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Problem with pg_upgrade?

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> > It does seem possible that that could happen, but I'm not sure exactly
> > what would be causing autovacuum to fire in the first place.  It
> > wouldn't have to be triggered by the anti-wraparound machinery - if
> > the table appeared to be in need of vacuuming, then we'd vacuum it,
> > discover that it was empty, and update relfrozenxid.  Hmm... could it
> > fire just because the table has no stats?  But if that were the case
> > you'd think we'd be seeing this more often.
> 
> Well, autovacuum=off, so it should only run in freeze mode, and I can't
> see how that could happen.  I am thinking I have to study autovacuum.c.
> 
> I wonder if datfrozenxid could be incremented because the database is
> originally empty.  It would just need to scan pg_class, not actually
> vacuum anything.  I wonder if we do that.  The bottom line is I am
> hanging too much on autovacuum_freeze_max_age causing autovacuum to do
> nothing.

What if we allow autovacuum_max_workers to be set to zero;  the current
minimum is one.
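
pg_upgrade could then simply add it to the options it already passes when
starting the servers; a rough sketch (paths and port are placeholders):

    "$BINDIR/pg_ctl" -l "$LOGFILE" -D "$DATADIR" \
        -o "-p 50432 -c autovacuum_max_workers=0" start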

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Problem with pg_upgrade?

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Bruce Momjian wrote:
> > > It does seem possible that that could happen, but I'm not sure exactly
> > > what would be causing autovacuum to fire in the first place.  It
> > > wouldn't have to be triggered by the anti-wraparound machinery - if
> > > the table appeared to be in need of vacuuming, then we'd vacuum it,
> > > discover that it was empty, and update relfrozenxid.  Hmm... could it
> > > fire just because the table has no stats?  But if that were the case
> > > you'd think we'd be seeing this more often.
> > 
> > Well, autovacuum=off, so it should only run in freeze mode, and I can't
> > see how that could happen.  I am thinking I have to study autovacuum.c.
> > 
> > I wonder if datfrozenxid could be incremented because the database is
> > originally empty.  It would just need to scan pg_class, not actually
> > vacuum anything.  I wonder if we do that.  The bottom line is I am
> > hanging too much on autovacuum_freeze_max_age causing autovacuum to do
> > nothing.
> 
> What if we allow autovacuum_max_workers to be set to zero;  the current
> minimum is one.

I can think of one case where autovacuum_freeze_max_age would be
insufficient.  If you set autovacuum_freeze_max_age in the old cluster
to 2B, and you had a database that was near that limit, the tables
created by pg_upgrade's --schema-only restore might create enough new
transactions to cause autovacuum to run in freeze mode.  While I think
it is unlikely that is the cause of the problem report, it is enough for
me to discount using autovacuum_freeze_max_age to disable autovacuum
freeze.

I will work on code to allow autovacuum_max_workers to be set to zero in
HEAD and 9.0, and have pg_upgrade use that.  I think the maintenance
overhead of an invisible variable is too much.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Problem with pg_upgrade?

From
Heikki Linnakangas
Date:
On 31.03.2011 17:55, Bruce Momjian wrote:
> I will work on code to allow autovacuum_max_workers to be set to zero in
> HEAD and 9.0, and have pg_upgrade use that.

We've intentionally not allowed the user to disable anti-wraparound 
autovacuum before. Do we really want to allow it now for the sake of 
pg_upgrade?

>  I think the maintenance
> overhead of an invisible variable is too much.

A simple GUC or command-line switch isn't much code.

Is the problem just that the clog files get removed too early, or is 
there something else? If it's just the clog files, we could simply copy 
them (again) after updating datfrozenxids.
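
Roughly like this, re-run after the datfrozenxid updates (old/new data
directory paths are placeholders, sketch only):

    cp "$OLD_PGDATA"/pg_clog/* "$NEW_PGDATA"/pg_clog/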

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: Problem with pg_upgrade?

From
Bruce Momjian
Date:
Heikki Linnakangas wrote:
> On 31.03.2011 17:55, Bruce Momjian wrote:
> > I will work on code to allow autovacuum_max_workers to be set to zero in
> > HEAD and 9.0, and have pg_upgrade use that.
> 
> We've intentionally not allowed the user to disable anti-wraparound 
> autovacuum before. Do we really want to allow it now for the sake of 
> pg_upgrade?

Not sure.

> >  I think the maintenance
> > overhead of an invisible variable is too much.
> 
> A simple GUC or command-line switch isn't much code.

Well, is this going to show in SHOW ALL or pg_settings?  Do we have the
ability to easily disable display of this?

> Is the problem just that the clog files get removed too early, or is 
> there something else? If it's just the clog files, we could simply copy 
> them (again) after updating datfrozenxids.

The problem is that pg_upgrade through pg_dumpall is setting
pg_database/pg_class frozen xid values and I can't have autovacuum
modifying the system while this is happening.
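
The restore is full of statements roughly like these (a sketch, values
illustrative), and autovacuum must not rewrite those columns while the
script is still running:

    UPDATE pg_catalog.pg_database
    SET datfrozenxid = '703'
    WHERE datname = 'postgres';

    UPDATE pg_catalog.pg_class
    SET relfrozenxid = '703'
    WHERE oid = 'sample'::pg_catalog.regclass;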

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Problem with pg_upgrade?

From
Robert Haas
Date:
On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>>  I think the maintenance
>> overhead of an invisible variable is too much.
>
> A simple GUC or command-line switch isn't much code.

I like the idea of a command-line switch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Problem with pg_upgrade?

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> >>  I think the maintenance
> >> overhead of an invisible variable is too much.
> >
> > A simple GUC or command-line switch isn't much code.
> 
> I like the idea of a command-line switch.

If you want to do that you should generalize it as --binary-upgrade in
case we have other needs for it.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Problem with pg_upgrade?

From
Robert Haas
Date:
On Thu, Mar 31, 2011 at 12:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>> >>  I think the maintenance
>> >> overhead of an invisible variable is too much.
>> >
>> > A simple GUC or command-line switch isn't much code.
>>
>> I like the idea of a command-line switch.
>
> If you want to do that you should generalize it as --binary-upgrade in
> case we have other needs for it.

Yeah.  Or we could do a binary_upgrade GUC which has the effect of
forcibly suppressing autovacuum, and maybe other things later.  I
think that's a lot less hazardous than fiddling with the autovacuum
GUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Problem with pg_upgrade?

From
Gurjeet Singh
Date:
On Wed, Mar 30, 2011 at 10:57 AM, Bruce Momjian <bruce@momjian.us> wrote:

> I am hearing only second-hand reports of this problem through
> Rhodiumtoad on IRC.  I don't have IRC access this week

If the firewalls allow port 80, you can use Freenode's web interface: webchat.freenode.net

Regards,
--
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: Problem with pg_upgrade?

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Thu, Mar 31, 2011 at 12:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
> >> <heikki.linnakangas@enterprisedb.com> wrote:
> >> >>  I think the maintenance
> >> >> overhead of an invisible variable is too much.
> >> >
> >> > A simple GUC or command-line switch isn't much code.
> >>
> >> I like the idea of a command-line switch.
> >
> > If you want to do that you should generalize it as --binary-upgrade in
> > case we have other needs for it.
> 
> Yeah.  Or we could do a binary_upgrade GUC which has the effect of
> forcibly suppressing autovacuum, and maybe other things later.  I
> think that's a lot less hazardous than fiddling with the autovacuum
> GUC.

I like the idea of a command-line flag because it forces everything to
be affected, and cannot be turned on and off in sessions --- if you are
doing a binary upgrade, locked-down is good. :-)

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Problem with pg_upgrade?

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Robert Haas wrote:
> > On Thu, Mar 31, 2011 at 12:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > > Robert Haas wrote:
> > >> On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
> > >> <heikki.linnakangas@enterprisedb.com> wrote:
> > >> >>  I think the maintenance
> > >> >> overhead of an invisible variable is too much.
> > >> >
> > >> > A simple GUC or command-line switch isn't much code.
> > >>
> > >> I like the idea of a command-line switch.
> > >
> > > If you want to do that you should generalize it as --binary-upgrade in
> > > case we have other needs for it.
> > 
> > Yeah.  Or we could do a binary_upgrade GUC which has the effect of
> > forcibly suppressing autovacuum, and maybe other things later.  I
> > think that's a lot less hazardous than fiddling with the autovacuum
> > GUC.
> 
> I like the idea of a command-line flag because it forces everything to
> be affected, and cannot be turned on and off in sessions --- if you are
> doing a binary upgrade, locked-down is good. :-)

Someone emailed me mentioning we might want to use this flag to cause
the server to use only local connections or lock it down in some other
way in the future.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


pg_upgrade bug found!

From
Bruce Momjian
Date:
OK, thanks to RhodiumToad on IRC, I was able to determine the cause of
the two reported pg_upgrade problems he saw via IRC.  It seems toast
tables have xids and pg_dump is not preserving the toast relfrozenxids
as it should.  Heap tables have preserved relfrozenxids, but if you
update a heap row but don't change the toast value, and the old heap row
is later removed, the toast table can have an older relfrozenxid than
the heap table.

The fix for this is to have pg_dump preserve toast relfrozenxids, which
can be easily added and backpatched.  We might want to push a 9.0.4 for
this.  Second, we need to find a way for people to detect and fix
existing systems that have this problem, perhaps looming when the
pg_class relfrozenxid passes the toast relfrozenxid, and thirdly, we
need to figure out how to get this information to users.  Perhaps the
communication comes through the 9.0.4 release announcement.
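
(One crude way to flag candidates on an already-upgraded cluster --- only a
sketch, since a toast table that was legitimately vacuumed after the upgrade
will look similar --- is to compare each heap's relfrozenxid age with its
toast table's:)

    SELECT c.relname,
           age(c.relfrozenxid) AS heap_age,
           age(t.relfrozenxid) AS toast_age
    FROM pg_catalog.pg_class c
    JOIN pg_catalog.pg_class t ON t.oid = c.reltoastrelid
    WHERE c.relkind = 'r'
      AND age(c.relfrozenxid) > age(t.relfrozenxid)
    ORDER BY age(c.relfrozenxid) - age(t.relfrozenxid) DESC;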

Yes, this is not good!  :-(

I will still add a special flag to postgres to turn off autovacuum, but
as we suspected, this is only a marginal improvement and not the cause
of the 9.0.X failures.  The good news is that only two people have seen
this problem and it only happens when the hint bits have not been set on
the toast rows and the oldest heap rows have been updated.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> OK, thanks to RhodiumToad on IRC, I was able to determine the cause of
> the two reported pg_upgrade problems he saw via IRC.  It seems toast
> tables have xids and pg_dump is not preserving the toast relfrozenxids
> as it should.  Heap tables have preserved relfrozenxids, but if you
> update a heap row but don't change the toast value, and the old heap row
> is later removed, the toast table can have an older relfrozenxid than
> the heap table.
>
> The fix for this is to have pg_dump preserve toast relfrozenxids, which
> can be easily added and backpatched.  We might want to push a 9.0.4 for
> this.  Second, we need to find a way for people to detect and fix
> existing systems that have this problem, perhaps looming when the
> pg_class relfrozenxid passes the toast relfrozenxid, and thirdly, we
> need to figure out how to get this information to users.  Perhaps the
> communication comes through the 9.0.4 release announcement.

I am not sure how to interpret the lack of replies to this email.
Either it is confidence, shock, or we told you so.  ;-)

Anyway, the attached patch fixes the problem.  The fix is for pg_dump's
binary upgrade mode.  This would need to be backpatched back to 8.4
because pg_migrator needs this too.

I have added a personal regression test to show which
pg_class.relfrozenxid values are not preserved, and with this patch the
only ones not preserved are toast tables used by system tables, which
are not copied from the old cluster (FirstNormalObjectId = 16384).  I am
attaching that old/new pg_class.relfrozenxid diff as well.
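
(The test is essentially a before/after listing along these lines, diffed
between the old and new clusters --- just a sketch of the pg_class side:)

    SELECT n.nspname, c.relname, c.relfrozenxid
    FROM pg_catalog.pg_class c
    JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind IN ('r', 't')
    ORDER BY n.nspname, c.relname;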

Any idea how to correct existing systems?  Would VACUUM FREEZE of just
the toast tables work?  I perhaps could create a short DO block that
would vacuum freeze just toast tables;  it would have to be run in every
database.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
new file mode 100644
index 3f6e77b..1ccdb4d
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*************** getTables(int *numTables)
*** 3812,3817 ****
--- 3812,3819 ----
      int            i_relhasrules;
      int            i_relhasoids;
      int            i_relfrozenxid;
+     int            i_toastoid;
+     int            i_toastfrozenxid;
      int            i_relpersistence;
      int            i_owning_tab;
      int            i_owning_col;
*************** getTables(int *numTables)
*** 3855,3861 ****
                            "(%s c.relowner) AS rolname, "
                            "c.relchecks, c.relhastriggers, "
                            "c.relhasindex, c.relhasrules, c.relhasoids, "
!                           "c.relfrozenxid, c.relpersistence, "
                            "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS
reloftype," 
                            "d.refobjid AS owning_tab, "
                            "d.refobjsubid AS owning_col, "
--- 3857,3865 ----
                            "(%s c.relowner) AS rolname, "
                            "c.relchecks, c.relhastriggers, "
                            "c.relhasindex, c.relhasrules, c.relhasoids, "
!                           "c.relfrozenxid, tc.oid AS toid, "
!                           "tc.relfrozenxid AS tfrozenxid, "
!                           "c.relpersistence, "
                            "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS
reloftype," 
                            "d.refobjid AS owning_tab, "
                            "d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 3889,3895 ****
                            "(%s c.relowner) AS rolname, "
                            "c.relchecks, c.relhastriggers, "
                            "c.relhasindex, c.relhasrules, c.relhasoids, "
!                           "c.relfrozenxid, 'p' AS relpersistence, "
                            "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS
reloftype," 
                            "d.refobjid AS owning_tab, "
                            "d.refobjsubid AS owning_col, "
--- 3893,3901 ----
                            "(%s c.relowner) AS rolname, "
                            "c.relchecks, c.relhastriggers, "
                            "c.relhasindex, c.relhasrules, c.relhasoids, "
!                           "c.relfrozenxid, tc.oid AS toid, "
!                           "tc.relfrozenxid AS tfrozenxid, "
!                           "'p' AS relpersistence, "
                            "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS
reloftype," 
                            "d.refobjid AS owning_tab, "
                            "d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 3922,3928 ****
                            "(%s c.relowner) AS rolname, "
                            "c.relchecks, c.relhastriggers, "
                            "c.relhasindex, c.relhasrules, c.relhasoids, "
!                           "c.relfrozenxid, 'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "d.refobjid AS owning_tab, "
                            "d.refobjsubid AS owning_col, "
--- 3928,3936 ----
                            "(%s c.relowner) AS rolname, "
                            "c.relchecks, c.relhastriggers, "
                            "c.relhasindex, c.relhasrules, c.relhasoids, "
!                           "c.relfrozenxid, tc.oid AS toid, "
!                           "tc.relfrozenxid AS tfrozenxid, "
!                           "'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "d.refobjid AS owning_tab, "
                            "d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 3955,3961 ****
                            "(%s relowner) AS rolname, "
                            "relchecks, (reltriggers <> 0) AS relhastriggers, "
                            "relhasindex, relhasrules, relhasoids, "
!                           "relfrozenxid, 'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "d.refobjid AS owning_tab, "
                            "d.refobjsubid AS owning_col, "
--- 3963,3972 ----
                            "(%s relowner) AS rolname, "
                            "relchecks, (reltriggers <> 0) AS relhastriggers, "
                            "relhasindex, relhasrules, relhasoids, "
!                           "relfrozenxid, "
!                           "0 AS toid, "
!                           "0 AS tfrozenxid, "
!                           "'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "d.refobjid AS owning_tab, "
                            "d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 3987,3993 ****
                            "(%s relowner) AS rolname, "
                            "relchecks, (reltriggers <> 0) AS relhastriggers, "
                            "relhasindex, relhasrules, relhasoids, "
!                           "0 AS relfrozenxid, 'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "d.refobjid AS owning_tab, "
                            "d.refobjsubid AS owning_col, "
--- 3998,4007 ----
                            "(%s relowner) AS rolname, "
                            "relchecks, (reltriggers <> 0) AS relhastriggers, "
                            "relhasindex, relhasrules, relhasoids, "
!                           "0 AS relfrozenxid, "
!                           "0 AS toid, "
!                           "0 AS tfrozenxid, "
!                           "'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "d.refobjid AS owning_tab, "
                            "d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 4019,4025 ****
                            "(%s relowner) AS rolname, "
                            "relchecks, (reltriggers <> 0) AS relhastriggers, "
                            "relhasindex, relhasrules, relhasoids, "
!                           "0 AS relfrozenxid, 'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "d.refobjid AS owning_tab, "
                            "d.refobjsubid AS owning_col, "
--- 4033,4042 ----
                            "(%s relowner) AS rolname, "
                            "relchecks, (reltriggers <> 0) AS relhastriggers, "
                            "relhasindex, relhasrules, relhasoids, "
!                           "0 AS relfrozenxid, "
!                           "0 AS toid, "
!                           "0 AS tfrozenxid, "
!                           "'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "d.refobjid AS owning_tab, "
                            "d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 4047,4053 ****
                            "(%s relowner) AS rolname, "
                            "relchecks, (reltriggers <> 0) AS relhastriggers, "
                            "relhasindex, relhasrules, relhasoids, "
!                           "0 AS relfrozenxid, 'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "NULL::oid AS owning_tab, "
                            "NULL::int4 AS owning_col, "
--- 4064,4073 ----
                            "(%s relowner) AS rolname, "
                            "relchecks, (reltriggers <> 0) AS relhastriggers, "
                            "relhasindex, relhasrules, relhasoids, "
!                           "0 AS relfrozenxid, "
!                           "0 AS toid, "
!                           "0 AS tfrozenxid, "
!                           "'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "NULL::oid AS owning_tab, "
                            "NULL::int4 AS owning_col, "
*************** getTables(int *numTables)
*** 4070,4076 ****
                            "relchecks, (reltriggers <> 0) AS relhastriggers, "
                            "relhasindex, relhasrules, "
                            "'t'::bool AS relhasoids, "
!                           "0 AS relfrozenxid, 'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "NULL::oid AS owning_tab, "
                            "NULL::int4 AS owning_col, "
--- 4090,4099 ----
                            "relchecks, (reltriggers <> 0) AS relhastriggers, "
                            "relhasindex, relhasrules, "
                            "'t'::bool AS relhasoids, "
!                           "0 AS relfrozenxid, "
!                           "0 AS toid, "
!                           "0 AS tfrozenxid, "
!                           "'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "NULL::oid AS owning_tab, "
                            "NULL::int4 AS owning_col, "
*************** getTables(int *numTables)
*** 4103,4109 ****
                            "relchecks, (reltriggers <> 0) AS relhastriggers, "
                            "relhasindex, relhasrules, "
                            "'t'::bool AS relhasoids, "
!                           "0 as relfrozenxid, 'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "NULL::oid AS owning_tab, "
                            "NULL::int4 AS owning_col, "
--- 4126,4135 ----
                            "relchecks, (reltriggers <> 0) AS relhastriggers, "
                            "relhasindex, relhasrules, "
                            "'t'::bool AS relhasoids, "
!                           "0 as relfrozenxid, "
!                           "0 AS toid, "
!                           "0 AS tfrozenxid, "
!                           "'p' AS relpersistence, "
                            "NULL AS reloftype, "
                            "NULL::oid AS owning_tab, "
                            "NULL::int4 AS owning_col, "
*************** getTables(int *numTables)
*** 4149,4154 ****
--- 4175,4182 ----
      i_relhasrules = PQfnumber(res, "relhasrules");
      i_relhasoids = PQfnumber(res, "relhasoids");
      i_relfrozenxid = PQfnumber(res, "relfrozenxid");
+     i_toastoid = PQfnumber(res, "toid");
+     i_toastfrozenxid = PQfnumber(res, "tfrozenxid");
      i_relpersistence = PQfnumber(res, "relpersistence");
      i_owning_tab = PQfnumber(res, "owning_tab");
      i_owning_col = PQfnumber(res, "owning_col");
*************** getTables(int *numTables)
*** 4190,4195 ****
--- 4218,4225 ----
          tblinfo[i].hastriggers = (strcmp(PQgetvalue(res, i, i_relhastriggers), "t") == 0);
          tblinfo[i].hasoids = (strcmp(PQgetvalue(res, i, i_relhasoids), "t") == 0);
          tblinfo[i].frozenxid = atooid(PQgetvalue(res, i, i_relfrozenxid));
+         tblinfo[i].toast_oid = atooid(PQgetvalue(res, i, i_toastoid));
+         tblinfo[i].toast_frozenxid = atooid(PQgetvalue(res, i, i_toastfrozenxid));
          if (PQgetisnull(res, i, i_reloftype))
              tblinfo[i].reloftype = NULL;
          else
*************** dumpTableSchema(Archive *fout, TableInfo
*** 12221,12233 ****
                  }
              }

!             appendPQExpBuffer(q, "\n-- For binary upgrade, set relfrozenxid\n");
              appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
                                "SET relfrozenxid = '%u'\n"
                                "WHERE oid = ",
                                tbinfo->frozenxid);
              appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout);
              appendPQExpBuffer(q, "::pg_catalog.regclass;\n");
          }

          /* Loop dumping statistics and storage statements */
--- 12251,12273 ----
                  }
              }

!             appendPQExpBuffer(q, "\n-- For binary upgrade, set heap's relfrozenxid\n");
              appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
                                "SET relfrozenxid = '%u'\n"
                                "WHERE oid = ",
                                tbinfo->frozenxid);
              appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout);
              appendPQExpBuffer(q, "::pg_catalog.regclass;\n");
+
+             if (tbinfo->toast_oid)
+             {
+                 /* We preserve the toast oids, so we can use it during restore */
+                 appendPQExpBuffer(q, "\n-- For binary upgrade, set toast's relfrozenxid\n");
+                 appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
+                                   "SET relfrozenxid = '%u'\n"
+                                   "WHERE oid = '%u';\n",
+                                   tbinfo->toast_frozenxid, tbinfo->toast_oid);
+             }
          }

          /* Loop dumping statistics and storage statements */
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
new file mode 100644
index 113ecb1..6559e23
*** a/src/bin/pg_dump/pg_dump.h
--- b/src/bin/pg_dump/pg_dump.h
*************** typedef struct _tableInfo
*** 248,253 ****
--- 248,255 ----
      bool        hastriggers;    /* does it have any triggers? */
      bool        hasoids;        /* does it have OIDs? */
      uint32        frozenxid;        /* for restore frozen xid */
+     Oid            toast_oid;        /* for restore toast frozen xid */
+     uint32        toast_frozenxid;/* for restore toast frozen xid */
      int            ncheck;            /* # of CHECK expressions */
      char       *reloftype;        /* underlying type for typed table */
      /* these two are set only if table is a sequence owned by a column: */
3c3
< postgres|654
---
> postgres|1457
6,7c6,7
< template0|654
< template1|654
---
> template0|1457
> template1|1457
11,19c11,19
< pg_toast|pg_toast_11454|654
< pg_toast|pg_toast_11459|654
< pg_toast|pg_toast_11464|654
< pg_toast|pg_toast_11469|654
< pg_toast|pg_toast_11474|654
< pg_toast|pg_toast_11479|654
< pg_toast|pg_toast_11484|654
< pg_toast|pg_toast_1255|654
< pg_toast|pg_toast_1262|654
---
> pg_toast|pg_toast_11511|1457
> pg_toast|pg_toast_11516|1457
> pg_toast|pg_toast_11521|1457
> pg_toast|pg_toast_11526|1457
> pg_toast|pg_toast_11531|1457
> pg_toast|pg_toast_11536|1457
> pg_toast|pg_toast_11541|1457
> pg_toast|pg_toast_1255|1457
> pg_toast|pg_toast_1262|1457
96,103c96,104
< pg_toast|pg_toast_2396|654
< pg_toast|pg_toast_2604|654
< pg_toast|pg_toast_2606|654
< pg_toast|pg_toast_2609|654
< pg_toast|pg_toast_2618|654
< pg_toast|pg_toast_2619|654
< pg_toast|pg_toast_2620|654
< pg_toast|pg_toast_2964|654
---
> pg_toast|pg_toast_2396|1457
> pg_toast|pg_toast_2604|1457
> pg_toast|pg_toast_2606|1457
> pg_toast|pg_toast_2609|1457
> pg_toast|pg_toast_2618|1457
> pg_toast|pg_toast_2619|1457
> pg_toast|pg_toast_2620|1457
> pg_toast|pg_toast_2964|1457
> pg_toast|pg_toast_3596|1457
279c280
< (268 rows)
---
> (269 rows)

Re: pg_upgrade bug found!

From
Josh Berkus
Date:
On 4/7/11 9:16 AM, Bruce Momjian wrote:
> OK, thanks to RhodiumToad on IRC, I was able to determine the cause of
>> the two reported pg_upgrade problems he saw via IRC.

BTW, just for the release notes, RhodiumToad == Andrew Gierth.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: pg_upgrade bug found!

From
Jeff Davis
Date:
On Thu, 2011-04-07 at 12:16 -0400, Bruce Momjian wrote:
> Bruce Momjian wrote:
> > OK, thanks to RhodiumToad on IRC, I was able to determine the cause of
> > the two reported pg_upgrade problems he saw via IRC.  It seems toast
> > tables have xids and pg_dump is not preserving the toast relfrozenxids
> > as it should.  Heap tables have preserved relfrozenxids, but if you
> > update a heap row but don't change the toast value, and the old heap row
> > is later removed, the toast table can have an older relfrozenxid than
> > the heap table.
> > 
> > The fix for this is to have pg_dump preserve toast relfrozenxids, which
> > can be easily added and backpatched.  We might want to push a 9.0.4 for
> > this.  Second, we need to find a way for people to detect and fix
> > existing systems that have this problem, perhaps looming when the
> > pg_class relfrozenxid passes the toast relfrozenxid, and thirdly, we
> > need to figure out how to get this information to users.  Perhaps the
> > communication comes through the 9.0.4 release announcement.
> 
> I am not sure how to interpret the lack of replies to this email. 
> Either it is confidence, shock, or we told you so.  ;-)
> 
> Anyway, the attached patch fixes the problem.  The fix is for pg_dump's
> binary upgrade mode.  This would need to be backpatched back to 8.4
> because pg_migrator needs this too.
> 
> I have added a personal regression test to show which
> pg_class.relfrozenxid values are not preserved, and with this patch the
> only ones not preserved are toast tables used by system tables, which
> are not copied from the old cluster (FirstNormalObjectId = 16384).  I am
> attaching that old/new pg_class.relfrozenxid diff as well.
> 
> Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> the toast tables work?

VACUUM FREEZE will never set the relfrozenxid backward. If it was never
preserved to begin with, I assume that the existing value could be
arbitrarily before or after, so it might not be updated.

I think that after you VACUUM FREEZE the toast table, then the real
oldest frozen xid (as opposed to the bad value in relfrozenxid for the
toast table) would have to be the same or newer than that of the heap.
Right? That means you could safely copy the heap's relfrozenxid to the
relfrozenxid of its toast table.
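
Something along these lines, I think (sketch only; needs superuser, a pass
over each database, and only after the VACUUM FREEZE of the toast tables):

    UPDATE pg_catalog.pg_class t
    SET relfrozenxid = c.relfrozenxid
    FROM pg_catalog.pg_class c
    WHERE c.reltoastrelid = t.oid
      AND c.relkind = 'r';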

> I perhaps could create a short DO block that
> would vacuum freeze just toast tables;  it would have to be run in every
> database.

Well, that won't work, because VACUUM can't be executed in a transaction
block or function.

Regards,
    Jeff Davis



Re: pg_upgrade bug found!

From
Noah Misch
Date:
On Thu, Apr 07, 2011 at 12:16:55PM -0400, Bruce Momjian wrote:
> Bruce Momjian wrote:
> > OK, thanks to RhodiumToad on IRC, I was able to determine the cause of
> > the two reported pg_upgrade problems he saw via IRC.  It seems toast
> > tables have xids and pg_dump is not preserving the toast relfrozenxids
> > as it should.  Heap tables have preserved relfrozenxids, but if you
> > update a heap row but don't change the toast value, and the old heap row
> > is later removed, the toast table can have an older relfrozenxid than
> > the heap table.
> > 
> > The fix for this is to have pg_dump preserve toast relfrozenxids, which
> > can be easily added and backpatched.  We might want to push a 9.0.4 for
> > this.  Second, we need to find a way for people to detect and fix
> > existing systems that have this problem, perhaps looming when the
> > pg_class relfrozenxid passes the toast relfrozenxid, and thirdly, we
> > need to figure out how to get this information to users.  Perhaps the
> > communication comes through the 9.0.4 release announcement.
> 
> I am not sure how to interpret the lack of replies to this email. 
> Either it is confidence, shock, or we told you so.  ;-)

Your explanation and patch make sense.  Seems all too clear in retrospect.

> Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> the toast tables work?  I perhaps could create a short DO block that
> would vacuum freeze just toast tables;  it would have to be run in every
> database.

I see three cases:

1) The pg_class.relfrozenxid that the TOAST table should have received ("true
relfrozenxid") is still covered by available clog files.  Fixable with some
combination of pg_class.relfrozenxid twiddling and "SET vacuum_freeze_table_age
= 0; VACUUM toasttbl".

2) The true relfrozenxid is no longer covered by available clog files.  The fix
for case 1 will get "file "foo" doesn't exist, reading as zeroes" log messages,
and we will treat all transactions as uncommitted.  Not generally fixable after
that has happened.  We could probably provide a recipe for checking whether it
could have happened given access to a backup from just before the upgrade.

3) Enough transaction xids have elapsed such that the true relfrozenxid is again
covered by clog files, but those records are unrelated to the original
transactions.  Actually, I don't think this can happen, even with the maximum
autovacuum_freeze_max_age.

I haven't tested those, so I'm sure there's some error in that assessment.

nm


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Jeff Davis wrote:
> > I have added a personal regression test to show which
> > pg_class.relfrozenxid values are not preserved, and with this patch the
> > only ones not preserved are toast tables used by system tables, which
> > are not copied from the old cluster (FirstNormalObjectId = 16384).  I am
> > attaching that old/new pg_class.relfrozenxid diff as well.
> > 
> > Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> > the toast tables work?
> 
> VACUUM FREEZE will never set the relfrozenxid backward. If it was never
> preserved to begin with, I assume that the existing value could be
> arbitrarily before or after, so it might not be updated.
> 
> I think that after you VACUUM FREEZE the toast table, then the real
> oldest frozen xid (as opposed to the bad value in relfrozenxid for the
> toast table) would have to be the same or newer than that of the heap.
> Right? That means you could safely copy the heap's relfrozenxid to the
> relfrozenxid of its toast table.

OK, so the only other idea I have is to write some pretty complicated
query function that does a sequential scan of each toast table and pulls
the earliest xmin/xmax from the tables and use that to set the
relfrozenxid (pretty complicated because it has to deal with the freeze
horizon and wraparound).

> > I perhaps could create a short DO block that
> > would vacuum freeze just toast tables;  it would have to be run in every
> > database.
> 
> Well, that won't work, because VACUUM can't be executed in a transaction
> block or function.

Good point.
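
Generating the commands and feeding them back through psql would avoid the
function restriction; a rough per-database sketch:

    -- run the output of this through psql in each database
    SELECT 'VACUUM FREEZE pg_toast.' || quote_ident(t.relname) || ';'
    FROM pg_catalog.pg_class c
    JOIN pg_catalog.pg_class t ON t.oid = c.reltoastrelid
    WHERE c.relkind = 'r';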

The only bright part of this is that missing clog will throw an error so
we are not returning incorrect data, and hopefully people will report
problems to us when it happens.

Ideally I would like to get this patch and correction code out into the
field in case more people run into this problem.  I know some will, I
just don't know how many.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Robert Haas
Date:
On Thu, Apr 7, 2011 at 3:46 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Jeff Davis wrote:
>> > I have added a personal regression test to show which
>> > pg_class.relfrozenxid values are not preserved, and with this patch the
>> > only ones not preserved are toast tables used by system tables, which
>> > are not copied from the old cluster (FirstNormalObjectId = 16384).  I am
>> > attaching that old/new pg_class.relfrozenxid diff as well.
>> >
>> > Any idea how to correct existing systems?  Would VACUUM FREEZE of just
>> > the toast tables work?
>>
>> VACUUM FREEZE will never set the relfrozenxid backward. If it was never
>> preserved to begin with, I assume that the existing value could be
>> arbitrarily before or after, so it might not be updated.
>>
>> I think that after you VACUUM FREEZE the toast table, then the real
>> oldest frozen xid (as opposed to the bad value in relfrozenxid for the
>> toast table) would have to be the same or newer than that of the heap.
>> Right? That means you could safely copy the heap's relfrozenxid to the
>> relfrozenxid of its toast table.
>
> OK, so the only other idea I have is to write some pretty complicated
> query function that does a sequential scan of each toast table and pulls
> the earliest xmin/xmax from the tables and use that to set the
> relfrozenxid (pretty complicated because it has to deal with the freeze
> horizon and wraparound).
>
>> > I perhaps could create a short DO block that
>> > would vacuum freeze just toast tables;  it would have to be run in every
>> > database.
>>
>> Well, that won't work, because VACUUM can't be executed in a transaction
>> block or function.
>
> Good point.
>
> The only bright part of this is that missing clog will throw an error so
> we are not returning incorrect data, and hopefully people will report
> problems to us when it happens.
>
> Ideally I would like to get this patch and correction code out into the
> field in case more people run into this problem.  I know some will, I
> just don't know how many.

ISTM we need to force a minor release once we are sure this has been
corrected.  We should also probably put out an announcement warning
people that have already used pg_upgrade of possible data corruption.
I'm not sure exactly what the language around that should be, but this
does seem pretty bad.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade bug found!

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> ISTM we need to force a minor release once we are sure this has
> been corrected.  We should also probably put out an announcement
> warning people that have already used pg_upgrade of possible data
> corruption. I'm not sure exactly what the language around that
> should be, but this does seem pretty bad.
We just used this to upgrade all of our databases to 9.0.  Most of
those (particularly the databases where data originates) have VACUUM
FREEZE ANALYZE run nightly, and we ran this against all databases
right after each pg_upgrade.  Will that have offered us some
protection from this bug?
-Kevin


Re: pg_upgrade bug found!

From
Jeff Davis
Date:
On Thu, 2011-04-07 at 15:46 -0400, Bruce Momjian wrote:
> OK, so the only other idea I have is to write some pretty complicated
> query function that does a sequential scan of each toast table and pulls
> the earliest xmin/xmax from the tables and use that to set the
> relfrozenxid (pretty complicated because it has to deal with the freeze
> horizon and wraparound).

That sounds like the correct way to fix the situation, although it's a
little more work to install another function just for this one-time
purpose. TransactionIdPrecedes() should already account for wraparound,
so I don't think that it will be too complicated (make sure to read
every tuple though, not just the ones currently visible).

Stepping back a second to make sure I understand the problem: the only
problem is that relfrozenxid on the toast table after an upgrade is
wrong. Correct?

Regards,Jeff Davis




Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Jeff Davis wrote:
> On Thu, 2011-04-07 at 15:46 -0400, Bruce Momjian wrote:
> > OK, so the only other idea I have is to write some pretty complicated
> > query function that does a sequential scan of each toast table and pulls
> > the earliest xmin/xmax from the tables and use that to set the
> > relfrozenxid (pretty complicated because it has to deal with the freeze
> > horizon and wraparound).
> 
> That sounds like the correct way to fix the situation, although it's a
> little more work to install another function just for this one-time
> purpose. TransactionIdPrecedes() should already account for wraparound,
> so I don't think that it will be too complicated (make sure to read
> every tuple though, not just the ones currently visible).

I want to avoid anything that requires a compile because compiled code is
hard for many sites to install, so TransactionIdPrecedes() is out.  We will
need to do this in PL/pgSQL probably.

> Stepping back a second to make sure I understand the problem: the only
> problem is that relfrozenxid on the toast table after an upgrade is
> wrong. Correct?

Yes, it was not restored from the old cluster.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Jeff Davis wrote:
> On Thu, 2011-04-07 at 15:46 -0400, Bruce Momjian wrote:
> > OK, so the only other idea I have is to write some pretty complicated
> > query function that does a sequential scan of each toast table and pulls
> > the earliest xmin/xmax from the tables and use that to set the
> > relfrozenxid (pretty complicated because it has to deal with the freeze
> > horizon and wraparound).
> 
> That sounds like the correct way to fix the situation, although it's a
> little more work to install another function just for this one-time
> purpose. TransactionIdPrecedes() should already account for wraparound,
> so I don't think that it will be too complicated (make sure to read
> every tuple though, not just the ones currently visible).
> 
> Stepping back a second to make sure I understand the problem: the only
> problem is that relfrozenxid on the toast table after an upgrade is
> wrong. Correct?

One minimal solution might be to set the toast relfrozenxid to match the
heap's relfrozenxid?  Ideas?  It is not 100% accurate but it might help.
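
A rough sketch of that, assuming superuser access and that a direct UPDATE
of pg_class is acceptable here (the binary-upgrade restore script already
issues the same kind of statement), would be:

    UPDATE pg_catalog.pg_class t
    SET    relfrozenxid = c.relfrozenxid
    FROM   pg_catalog.pg_class c
    WHERE  c.reltoastrelid = t.oid AND
           t.relkind = 't';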

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Robert Haas wrote:
> >> Well, that won't work, because VACUUM can't be executed in a transaction
> >> block or function.
> >
> > Good point.
> >
> > The only bright part of this is that missing clog will throw an error so
> > we are not returning incorrect data, and hopefully people will report
> > problems to us when it happens.
> >
> > Ideally I would like to get this patch and correction code out into the
> > field in case more people run into this problem.  I know some will, I
> > just don't know how many.
> 
> ISTM we need to force a minor release once we are sure this has been
> corrected.  We should also probably put out an announcement warning
> people that have already used pg_upgrade of possible data corruption.
> I'm not sure exactly what the language around that should be, but this
> does seem pretty bad.

Yep, "pretty bad" it is.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Robert Haas wrote:
> > >> Well, that won't work, because VACUUM can't be executed in a transaction
> > >> block or function.
> > >
> > > Good point.
> > >
> > > The only bright part of this is that missing clog will throw an error so
> > > we are not returning incorrect data, and hopefully people will report
> > > problems to us when it happens.
> > >
> > > Ideally I would like to get this patch and correction code out into the
> > > field in case more people run into this problem.  I know some will, I
> > > just don't know how many.
> > 
> > ISTM we need to force a minor release once we are sure this has been
> > corrected.  We should also probably put out an announcement warning
> > people that have already used pg_upgrade of possible data corruption.
> > I'm not sure exactly what the language around that should be, but this
> > does seem pretty bad.
> 
> Yep, "pretty bad" it is.

The bug exists because I did not realize, until the IRC reports were
diagnosed, that the toast relfrozenxid is tracked independently of the heap's.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Jeff Davis
Date:
On Thu, 2011-04-07 at 17:06 -0400, Bruce Momjian wrote:
> I want to avoid anything that requires a compile because they are hard
> for many sites to install so TransactionIdPrecedes() is out.  We will
> need to do this in PL/pgSQL probably.

PL/pgSQL can't see dead rows, so that would not be correct. It's
guaranteed to be the same value you see from the heap or newer, because
if it's not visible in the heap, it's not going to be visible in the
toast table.

Regards,Jeff Davis



Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Jeff Davis wrote:
> On Thu, 2011-04-07 at 17:06 -0400, Bruce Momjian wrote:
> > I want to avoid anything that requires a compile because they are hard
> > for many sites to install so TransactionIdPrecedes() is out.  We will
> > need to do this in PL/pgSQL probably.
> 
> PL/pgSQL can't see dead rows, so that would not be correct. It's
> guaranteed to be the same value you see from the heap or newer; because
> if it's not visible in the heap, it's not going to be visible in the
> toast table.

Well, frankly all we need to do is set those hint bits before the clog
gets removed, so maybe just a SELECT * would do the trick!  That and
maybe set the relfrozenxid to match the heap.

It is there now or more people would be reporting problems.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Jeff Davis wrote:
> > On Thu, 2011-04-07 at 17:06 -0400, Bruce Momjian wrote:
> > > I want to avoid anything that requires a compile because they are hard
> > > for many sites to install so TransactionIdPrecedes() is out.  We will
> > > need to do this in PL/pgSQL probably.
> > 
> > PL/pgSQL can't see dead rows, so that would not be correct. It's
> > guaranteed to be the same value you see from the heap or newer; because
> > if it's not visible in the heap, it's not going to be visible in the
> > toast table.
> 
> Well, frankly all we need to do is set those hint bits before the clog
> gets removed, so maybe just a SELECT * would do the trick!  That and
> maybe set the relfrozenxid to match the heap.
> 
> It is there now or more people would be reporting problems.

Clarification, "the clog" is there now or more people would be reporting
problems.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
"Kevin Grittner"
Date:
Bruce Momjian <bruce@momjian.us> wrote:
> all we need to do is set those hint bits before the clog gets
> removed, so maybe just a SELECT * would do the trick!
Does that mean that those experiencing the problem are failing to do
the vacuumdb run which is recommended in the pg_upgrade instructions?
-Kevin


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Kevin Grittner wrote:
> Bruce Momjian <bruce@momjian.us> wrote:
>  
> > all we need to do is set those hint bits before the clog gets
> > removed, so maybe just a SELECT * would do the trick!
>  
> Does that mean that those experiencing the problem are failing to do
> the vacuumdb run which is recommended in the pg_upgrade instructions?

You know, I looked at that, but I don't think that is going to save me. 
:-(   It says:
Upgrade complete
----------------
| Optimizer statistics are not transferred by pg_upgrade
| so consider running:
|     vacuumdb --all --analyze-only
| on the newly-upgraded cluster.
| Running this script will delete the old cluster's data files:
|      /usr/var/local/pgdev/pgfoundry/pg_migrator/pg_migrator/delete_old_cluster.sh

We recommend 'vacuumdb --all --analyze-only' which I assume only samples
random pages and does not set all the hint bits.  In fact, you can't
even analyze TOAST tables:
    test=> ANALYZE pg_toast.pg_toast_3596;
    WARNING:  skipping "pg_toast_3596" --- cannot analyze non-tables or special system tables
    ANALYZE

but you can SELECT from them:
     chunk_id | chunk_seq | chunk_data
    ----------+-----------+------------
    (0 rows)

Also, if we force VACUUM FREEZE on the toast tables we would have no
need to advance their relfrozenxids because all the xids would be fixed.
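
For a single toast table that boils down to (with a placeholder name):

    VACUUM FREEZE pg_toast.pg_toast_16385;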

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Jeff Davis
Date:
On Thu, 2011-04-07 at 12:38 -0700, Jeff Davis wrote: 
> > Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> > the toast tables work?
> 
> VACUUM FREEZE will never set the relfrozenxid backward. If it was never
> preserved to begin with, I assume that the existing value could be
> arbitrarily before or after, so it might not be updated.

Now that I understand the problem a little better, I think VACUUM FREEZE
might work, after all.

Originally, I thought that the toast table's relfrozenxid could be some
arbitrarily wrong value. But actually, the CREATE TABLE is issued after
the xid of the new cluster has already been advanced to the xid of the
old cluster, so it should be a "somewhat reasonable" value.

That means that VACUUM FREEZE of the toast table, if there are no
concurrent transactions, will freeze all of the tuples; and the
newFrozenXid should always be seen as newer than the existing (and
wrong) relfrozenxid. Then, it will set relfrozenxid to newFrozenXid and
everything should be fine. Right?

Regards,Jeff Davis



Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Jeff Davis wrote:
> On Thu, 2011-04-07 at 12:38 -0700, Jeff Davis wrote: 
> > > Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> > > the toast tables work?
> > 
> > VACUUM FREEZE will never set the relfrozenxid backward. If it was never
> > preserved to begin with, I assume that the existing value could be
> > arbitrarily before or after, so it might not be updated.
> 
> Now that I understand the problem a little better, I think VACUUM FREEZE
> might work, after all.

Good.  I don't want to be inventing something complex if I can avoid it.
Simple is good, especially if admins panic.  I would rather simple and
longer than short but complex  :-)

> Originally, I thought that the toast table's relfrozenxid could be some
> arbitrarily wrong value. But actually, the CREATE TABLE is issued after
> the xid of the new cluster has already been advanced to the xid of the
> old cluster, so it should be a "somewhat reasonable" value.

Yes, it will be reasonable.

> That means that VACUUM FREEZE of the toast table, if there are no
> concurrent transactions, will freeze all of the tuples; and the
> newFrozenXid should always be seen as newer than the existing (and
> wrong) relfrozenxid. Then, it will set relfrozenxid to newFrozenXid and
> everything should be fine. Right?

Right.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Robert Haas
Date:
On Thu, Apr 7, 2011 at 5:52 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Jeff Davis wrote:
>> On Thu, 2011-04-07 at 12:38 -0700, Jeff Davis wrote:
>> > > Any idea how to correct existing systems?  Would VACUUM FREEZE of just
>> > > the toast tables work?
>> >
>> > VACUUM FREEZE will never set the relfrozenxid backward. If it was never
>> > preserved to begin with, I assume that the existing value could be
>> > arbitrarily before or after, so it might not be updated.
>>
>> Now that I understand the problem a little better, I think VACUUM FREEZE
>> might work, after all.
>
> Good.  I don't want to be inventing something complex if I can avoid it.
> Simple is good, especially if admins panic.  I would rather simple and
> longer than short but complex  :-)
>
>> Originally, I thought that the toast table's relfrozenxid could be some
>> arbitrarily wrong value. But actually, the CREATE TABLE is issued after
>> the xid of the new cluster has already been advanced to the xid of the
>> old cluster, so it should be a "somewhat reasonable" value.
>
> Yes, it will be reasonable.
>
>> That means that VACUUM FREEZE of the toast table, if there are no
>> concurrent transactions, will freeze all of the tuples; and the
>> newFrozenXid should always be seen as newer than the existing (and
>> wrong) relfrozenxid. Then, it will set relfrozenxid to newFrozenXid and
>> everything should be fine. Right?
>
> Right.

This depends on how soon after the upgrade VACUUM FREEZE is run,
doesn't it?  If the XID counter has advanced too far...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Thu, Apr 7, 2011 at 5:52 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Jeff Davis wrote:
> >> On Thu, 2011-04-07 at 12:38 -0700, Jeff Davis wrote:
> >> > > Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> >> > > the toast tables work?
> >> >
> >> > VACUUM FREEZE will never set the relfrozenxid backward. If it was never
> >> > preserved to begin with, I assume that the existing value could be
> >> > arbitrarily before or after, so it might not be updated.
> >>
> >> Now that I understand the problem a little better, I think VACUUM FREEZE
> >> might work, after all.
> >
> > Good.  I don't want to be inventing something complex if I can avoid it.
> > Simple is good, especially if admins panic.  I would rather simple and
> > longer than short but complex  :-)
> >
> >> Originally, I thought that the toast table's relfrozenxid could be some
> >> arbitrarily wrong value. But actually, the CREATE TABLE is issued after
> >> the xid of the new cluster has already been advanced to the xid of the
> >> old cluster, so it should be a "somewhat reasonable" value.
> >
> > Yes, it will be reasonable.
> >
> >> That means that VACUUM FREEZE of the toast table, if there are no
> >> concurrent transactions, will freeze all of the tuples; and the
> >> newFrozenXid should always be seen as newer than the existing (and
> >> wrong) relfrozenxid. Then, it will set relfrozenxid to newFrozenXid and
> >> everything should be fine. Right?
> >
> > Right.
> 
> This depends on how soon after the upgrade VACUUM FREEZE is run,
> doesn't it?  If the XID counter has advanced too far...

Well, I assume VACUUM FREEZE is going to sequentially scan the table and
replace every xid.  If the clog is gone, well, we have problems.  I
think the IRC reporter pulled the clog files from a backup.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> > > Yes, it will be reasonable.
> > >
> > >> That means that VACUUM FREEZE of the toast table, if there are no
> > >> concurrent transactions, will freeze all of the tuples; and the
> > >> newFrozenXid should always be seen as newer than the existing (and
> > >> wrong) relfrozenxid. Then, it will set relfrozenxid to newFrozenXid and
> > >> everything should be fine. Right?
> > >
> > > Right.
> > 
> > This depends on how soon after the upgrade VACUUM FREEZE is run,
> > doesn't it?  If the XID counter has advanced too far...
> 
> Well, I assume VACUUM FREEZE is going to sequentially scan the table and
> replace every xid.  If the clog is gone, well, we have problems.  I
> think the IRC reporter pulled the clog files from a backup.

So I think we have four possible approaches to correct databases:
    1) SELECT * to set the hint bits
    2) VACUUM to set the hint bits
    3) VACUUM FREEZE to remove the old xids
    4) some complicated function
 

I don't like #4, and I think I can script #2 and #3 in psql by using COPY
to create a VACUUM script and then run it with \i.  #1 is easy in a DO
block with PL/pgSQL.
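
For what it's worth, a minimal sketch of that #1 DO block -- assuming a 9.0
server, since DO does not exist in 8.4, and avoiding format(), which is not
there either -- would just read every row of every toast table so the hint
bits get set:

    DO $$
    DECLARE
        t     regclass;
        dummy bigint;
    BEGIN
        FOR t IN
            SELECT c.oid::regclass
            FROM pg_class c
                 JOIN pg_namespace n ON c.relnamespace = n.oid
            WHERE n.nspname = 'pg_toast' AND c.relkind = 't'
        LOOP
            -- reading the rows sets the commit hint bits as a side effect
            EXECUTE 'SELECT count(*) FROM ' || t INTO dummy;
        END LOOP;
    END $$;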

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Jeff Davis
Date:
On Thu, 2011-04-07 at 20:14 -0400, Bruce Momjian wrote:
> So I think we have four possible approaches to correct databases:
> 
>     1) SELECT * to set the hint bits
>     2) VACUUM to set the hint bits
>     3) VACUUM FREEZE to remove the old xids
>     4) some complicated function
> 
> I don't like #4, and I think I can script #2 and #3 in psql by using COPY
> to create a VACUUM script and then run it with \i.  #1 is easy in a DO
> block with PL/pgSQL.

The only one that sounds very reasonable to me is #3. If there are any
xids older than the relfrozenxid, we need to get rid of them. If there
is some reason that doesn't work, I suppose we can consider the
alternatives. But I don't like the hint-bit-setting approach much.

What if the xmax is really a transaction that got an exclusive lock on
the tuple, rather than actually deleting it? Are you sure that a SELECT
(or even a normal VACUUM) would get rid of that xid, or might something
still try to look it up in the clog later?

Not only that, but hint-bit-setting is not WAL-logged, so you'd really
have to do a checkpoint afterward.

Regards,Jeff Davis



Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Jeff Davis wrote:
> On Thu, 2011-04-07 at 20:14 -0400, Bruce Momjian wrote:
> > So I think we have four possible approaches to correct databases:
> > 
> >     1) SELECT * to set the hint bits
> >     2) VACUUM to set the hint bits
> >     3) VACUUM FREEZE to remove the old xids
> >     4) some complicated function
> > 
> > I don't like #4, and I think I can script #2 and #3 in psql by using COPY
> > to create a VACUUM script and then run it with \i.  #1 is easy in a DO
> > block with PL/pgSQL.
> 
> The only one that sounds very reasonable to me is #3. If there are any
> xids older than the relfrozenxid, we need to get rid of them. If there
> is some reason that doesn't work, I suppose we can consider the
> alternatives. But I don't like the hint-bit-setting approach much.
> 
> What if the xmax is really a transaction that got an exclusive lock on
> the tuple, rather than actually deleting it? Are you sure that a SELECT
> (or even a normal VACUUM) would get rid of that xid, or might something
> still try to look it up in the clog later?
> 
> Not only that, but hint-bit-setting is not WAL-logged, so you'd really
> have to do a checkpoint afterward.

Glad you said that!  Here is a script which does what we want:
-- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
-- servers that were upgraded by pg_upgrade and pg_migrator.
-- Run the script using psql for every database in the cluster,
-- except 'template0', e.g.
--     psql -f pg_upgrade_fix dbname
-- It will not lock any tables but will generate I/O.
--
SET vacuum_freeze_min_age = 0;
SET vacuum_freeze_table_age = 0;
CREATE TEMPORARY TABLE pg_upgrade_fix AS
    SELECT 'VACUUM FREEZE pg_toast.' ||
           quote_ident(relname) || ';'
    FROM pg_class c, pg_namespace n
    WHERE c.relnamespace = n.oid AND
          n.nspname = 'pg_toast' AND
          c.relkind = 't';
\copy pg_upgrade_fix TO 'pg_upgrade_fix.sql';
\i pg_upgrade_fix.sql
DROP TABLE pg_upgrade_fix;

Looks pretty simple to copy/paste and use.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Noah Misch wrote:
> On Thu, Apr 07, 2011 at 12:16:55PM -0400, Bruce Momjian wrote:
> > Bruce Momjian wrote:
> > > OK, thanks to RhodiumToad on IRC, I was able to determine the cause of
> > > the two reported pg_upgrade problems he saw via IRC.  It seems toast
> > > tables have xids and pg_dump is not preserving the toast relfrozenxids
> > > as it should.  Heap tables have preserved relfrozenxids, but if you
> > > update a heap row but don't change the toast value, and the old heap row
> > > is later removed, the toast table can have an older relfrozenxid than
> > > the heap table.
> > > 
> > > The fix for this is to have pg_dump preserve toast relfrozenxids, which
> > > can be easily added and backpatched.  We might want to push a 9.0.4 for
> > > this.  Second, we need to find a way for people to detect and fix
> > > existing systems that have this problem, perhaps looming when the
> > > pg_class relfrozenxid passes the toast relfrozenxid, and thirdly, we
> > > need to figure out how to get this information to users.  Perhaps the
> > > communication comes through the 9.0.4 release announcement.
> > 
> > I am not sure how to interpret the lack of replies to this email. 
> > Either it is confidence, shock, or we told you so.  ;-)
> 
> Your explanation and patch make sense.  Seems all too clear in retrospect.

Yeah, like "duh" for me.

> > Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> > the toast tables work?  I perhaps could create a short DO block that
> > would vacuum freeze just toast tables;  it would have to be run in every
> > database.
> 
> I see three cases:
> 
> 1) The pg_class.relfrozenxid that the TOAST table should have received
> ("true relfrozenxid") is still covered by available clog files.  Fixable
> with some combination of pg_class.relfrozenxid twiddling and "SET
> vacuum_freeze_table_age = 0; VACUUM toasttbl".

Right, VACUUM FREEZE.  I now see I don't need to set
vacuum_freeze_table_age if I use the FREEZE keyword, e.g. gram.y has:
    if (n->options & VACOPT_FREEZE)
        n->freeze_min_age = n->freeze_table_age = 0;

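In other words, using Noah's placeholder table name, the FREEZE keyword is
essentially shorthand for zeroing both settings for that one VACUUM:

    VACUUM FREEZE toasttbl;
    -- behaves, for that session, like:
    SET vacuum_freeze_min_age = 0;
    SET vacuum_freeze_table_age = 0;
    VACUUM toasttbl;
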
> 2) The true relfrozenxid is no longer covered by available clog files.
> The fix for case 1 will get "file "foo" doesn't exist, reading as
> zeroes" log messages, and we will treat all transactions as uncommitted.

Uh, are you sure?  I think it would return an error message about a
missing clog file for the query;  here is a report of a case not related
to pg_upgrade:
http://archives.postgresql.org/pgsql-admin/2010-09/msg00109.php

> Not generally fixable after that has happened.  We could probably
> provide a recipe for checking whether it could have happened given
> access to a backup from just before the upgrade.

The IRC folks pulled the clog files off of backups.

> 3) Enough transaction xids have elapsed such that the true relfrozenxid
> is again covered by clog files, but those records are unrelated to the
> original transactions.  Actually, I don't think this can happen, even
> with the maximum autovacuum_freeze_max_age.

Yes, I don't think that can happen either.

One concern I have is that existing heap tables are protecting clog
files, but once those are frozen, the system might remove clog files not
realizing it has to freeze the heap tables too.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Noah Misch
Date:
On Thu, Apr 07, 2011 at 10:21:06PM -0400, Bruce Momjian wrote:
> Noah Misch wrote:
> > 1) The pg_class.relfrozenxid that the TOAST table should have received
> > ("true relfrozenxid") is still covered by available clog files.  Fixable
> > with some combination of pg_class.relfrozenxid twiddling and "SET
> > vacuum_freeze_table_age = 0; VACUUM toasttbl".
> 
> Right, VACUUM FREEZE.  I now see I don't need to set
> vacuum_freeze_table_age if I use the FREEZE keyword, e.g. gram.y has:
> 
>     if (n->options & VACOPT_FREEZE)
>     n->freeze_min_age = n->freeze_table_age = 0;

True; it just performs more work than strictly necessary.  We don't actually
need earlier-than-usual freezing.  We need only ensure that the relfrozenxid
will guide future VACUUMs to do that freezing early enough.  However, I'm not
sure how to do that without directly updating relfrozenxid, so it's probably
just as well to cause some extra work and stick to the standard interface.

> > 2) The true relfrozenxid is no longer covered by available clog files.
> > The fix for case 1 will get "file "foo" doesn't exist, reading as
> > zeroes" log messages, and we will treat all transactions as uncommitted.
> 
> Uh, are you sure?  I think it would return an error message about a
> missing clog file for the query;  here is a report of a case not related
> to pg_upgrade:
> 
>     http://archives.postgresql.org/pgsql-admin/2010-09/msg00109.php

My statement was indeed incorrect.  (Was looking at the "reading as zeroes"
message in slru.c, but it only applies during recovery.)

> > Not generally fixable after that has happened.  We could probably
> > provide a recipe for checking whether it could have happened given
> > access to a backup from just before the upgrade.
> 
> The IRC folks pulled the clog files off of backups.

Since we do get the error after all, that should always be enough.

> One concern I have is that existing heap tables are protecting clog
> files, but once those are frozen, the system might remove clog files not
> realizing it has to freeze the heap tables too.

Yes.  On a similar note, would it be worth having your prototype fixup script
sort the VACUUM FREEZE calls in descending relfrozenxid order?

Thanks,
nm


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Noah Misch wrote:
> On Thu, Apr 07, 2011 at 10:21:06PM -0400, Bruce Momjian wrote:
> > Noah Misch wrote:
> > > 1) The pg_class.relfrozenxid that the TOAST table should have received
> > > ("true relfrozenxid") is still covered by available clog files.  Fixable
> > > with some combination of pg_class.relfrozenxid twiddling and "SET
> > > vacuum_freeze_table_age = 0; VACUUM toasttbl".
> > 
> > Right, VACUUM FREEZE.  I now see I don't need to set
> > vacuum_freeze_table_age if I use the FREEZE keyword, e.g. gram.y has:
> > 
> >     if (n->options & VACOPT_FREEZE)
> >     n->freeze_min_age = n->freeze_table_age = 0;
> 
> True; it just performs more work than strictly necessary.  We don't actually
> need earlier-than-usual freezing.  We need only ensure that the relfrozenxid
> will guide future VACUUMs to do that freezing early enough.  However, I'm not
> sure how to do that without directly updating relfrozenxid, so it's probably
> just as well to cause some extra work and stick to the standard interface.

Looking at the code, it seems VACUUM FREEZE will freeze any row it can,
and because we restarted the cluster after the upgrade, all the rows we
care about are visible or dead to all transactions.  pg_upgrade
certainly relies on that fact already.

> > > 2) The true relfrozenxid is no longer covered by available clog files.
> > > The fix for case 1 will get "file "foo" doesn't exist, reading as
> > > zeroes" log messages, and we will treat all transactions as uncommitted.
> > 
> > Uh, are you sure?  I think it would return an error message about a
> > missing clog file for the query;  here is a report of a case not related
> > to pg_upgrade:
> > 
> >     http://archives.postgresql.org/pgsql-admin/2010-09/msg00109.php
> 
> My statement was indeed incorrect.  (Was looking at the "reading as zeroes"
> message in slru.c, but it only applies during recovery.)

No problem.  Thanks for the review.

> > > Not generally fixable after that has happened.  We could probably
> > > provide a recipe for checking whether it could have happened given
> > > access to a backup from just before the upgrade.
> > 
> > The IRC folks pulled the clog files off of backups.
> 
> Since we do get the error after all, that should always be enough.

That was my thought too.

> > One concern I have is that existing heap tables are protecting clog
> > files, but once those are frozen, the system might remove clog files not
> > realizing it has to freeze the heap tables too.
> 
> Yes.  On a similar note, would it be worth having your prototype fixup script
> sort the VACUUM FREEZE calls in descending relfrozenxid order?

Good question.  I don't think the relfrozenxid ordering would make sense
because I think it is unlikely problems will happen while they are
running the script.  The other problem is that because of wraparound it
is unclear which xid is earliest.  What I did add was to order by the
oid, so at least the toast numbers are in order and people can more
easily track its progress.

New version;  I made some other small adjustments:
-- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
-- servers that were upgraded by pg_upgrade and pg_migrator.
-- Run the script using psql for every database in the cluster
-- except 'template0', e.g.:
--     psql -U postgres -a -f pg_upgrade_fix.sql dbname
-- This must be run from a writable directory.
-- It will not lock any tables but will generate I/O.
--
CREATE TEMPORARY TABLE pg_upgrade_fix AS
    SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
    FROM pg_class c, pg_namespace n
    WHERE c.relnamespace = n.oid AND
          n.nspname = 'pg_toast' AND
          c.relkind = 't'
    ORDER BY c.oid;
\copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
\i pg_upgrade_tmp.sql


--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Jeff Davis
Date:
On Thu, 2011-04-07 at 22:21 -0400, Bruce Momjian wrote:
> One concern I have is that existing heap tables are protecting clog
> files, but once those are frozen, the system might remove clog files not
> realizing it has to freeze the heap tables too.

I don't understand. Can you elaborate?

Regards,Jeff Davis




Re: pg_upgrade bug found!

From
Jeff Davis
Date:
On Fri, 2011-04-08 at 07:08 -0400, Noah Misch wrote:
> > Right, VACUUM FREEZE.  I now see I don't need to set
> > vacuum_freeze_table_age if I use the FREEZE keyword, e.g. gram.y has:
> > 
> >     if (n->options & VACOPT_FREEZE)
> >     n->freeze_min_age = n->freeze_table_age = 0;
> 
> True; it just performs more work than strictly necessary.  We don't actually
> need earlier-than-usual freezing.  We need only ensure that the relfrozenxid
> will guide future VACUUMs to do that freezing early enough.  However, I'm not
> sure how to do that without directly updating relfrozenxid, so it's probably
> just as well to cause some extra work and stick to the standard interface.

If there are tuples in a toast table containing xids that are older than
the toast table's relfrozenxid, then there are only two options:

1. Make relfrozenxid go backward to the right value. There is currently
no mechanism to do this without compiling C code into the server,
because (a) VACUUM FREEZE will never move the relfrozenxid backward; and
(b) there is no way to find the oldest xid in a table with a normal
snapshot.

2. Get rid of those xids older than relfrozenxid (i.e. VACUUM FREEZE). 

I don't know what you mean about VACUUM FREEZE doing extra work. I
suppose you could set the vacuum_freeze_min_age to be exactly the right
value such that it freezes everything before the existing (and wrong)
relfrozenxid, but in practice I think it would be the same amount of
work.

Regards,Jeff Davis



Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Bruce Momjian wrote:
> > OK, thanks to RhodiumToad on IRC, I was able to determine the cause of
> > the two reported pg_upgrade problems he saw via IRC.  It seems toast
> > tables have xids and pg_dump is not preserving the toast relfrozenxids
> > as it should.  Heap tables have preserved relfrozenxids, but if you
> > update a heap row but don't change the toast value, and the old heap row
> > is later removed, the toast table can have an older relfrozenxid than
> > the heap table.
> >
> > The fix for this is to have pg_dump preserve toast relfrozenxids, which
> > can be easily added and backpatched.  We might want to push a 9.0.4 for
> > this.  Second, we need to find a way for people to detect and fix
> > existing systems that have this problem, perhaps looming when the
> > pg_class relfrozenxid passes the toast relfrozenxid, and thirdly, we
> > need to figure out how to get this information to users.  Perhaps the
> > communication comes through the 9.0.4 release announcement.
>
> I am not sure how to interpret the lack of replies to this email.
> Either it is confidence, shock, or we told you so.  ;-)
>
> Anyway, the attached patch fixes the problem.  The fix is for pg_dump's
> binary upgrade mode.  This would need to be backpatched back to 8.4
> because pg_migrator needs this too.

OK, I have applied the attached three patches to 8.4, 9.0, and 9.1.
They are all slightly different because of code drift, and I created a
unified diff which I find is clearer for single-line changes.

I was very careful about the patching of queries because many of these
queries are only activated when dumping an older database, and are
therefore hard to test for SQL query errors.  I included all the version
patches in case someone sees something I missed.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 3842895..c5057f7 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3185,6 +3185,8 @@ getTables(int *numTables)
     int            i_relhasrules;
     int            i_relhasoids;
     int            i_relfrozenxid;
+    int            i_toastoid;
+    int            i_toastfrozenxid;
     int            i_owning_tab;
     int            i_owning_col;
     int            i_reltablespace;
@@ -3226,7 +3228,8 @@ getTables(int *numTables)
                           "(%s c.relowner) AS rolname, "
                           "c.relchecks, c.relhastriggers, "
                           "c.relhasindex, c.relhasrules, c.relhasoids, "
-                          "c.relfrozenxid, "
+                          "c.relfrozenxid, tc.oid AS toid, "
+                          "tc.relfrozenxid AS tfrozenxid, "
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
                           "(SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, "
@@ -3259,6 +3262,8 @@ getTables(int *numTables)
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, relhasoids, "
                           "relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
                           "(SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, "
@@ -3290,6 +3295,8 @@ getTables(int *numTables)
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, relhasoids, "
                           "0 AS relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
                           "(SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, "
@@ -3321,6 +3328,8 @@ getTables(int *numTables)
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, relhasoids, "
                           "0 AS relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
                           "NULL AS reltablespace, "
@@ -3348,6 +3357,8 @@ getTables(int *numTables)
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, relhasoids, "
                           "0 AS relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
                           "NULL::oid AS owning_tab, "
                           "NULL::int4 AS owning_col, "
                           "NULL AS reltablespace, "
@@ -3370,6 +3381,8 @@ getTables(int *numTables)
                           "relhasindex, relhasrules, "
                           "'t'::bool AS relhasoids, "
                           "0 AS relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
                           "NULL::oid AS owning_tab, "
                           "NULL::int4 AS owning_col, "
                           "NULL AS reltablespace, "
@@ -3446,6 +3459,8 @@ getTables(int *numTables)
     i_relhasrules = PQfnumber(res, "relhasrules");
     i_relhasoids = PQfnumber(res, "relhasoids");
     i_relfrozenxid = PQfnumber(res, "relfrozenxid");
+    i_toastoid = PQfnumber(res, "toid");
+    i_toastfrozenxid = PQfnumber(res, "tfrozenxid");
     i_owning_tab = PQfnumber(res, "owning_tab");
     i_owning_col = PQfnumber(res, "owning_col");
     i_reltablespace = PQfnumber(res, "reltablespace");
@@ -3484,6 +3499,8 @@ getTables(int *numTables)
         tblinfo[i].hastriggers = (strcmp(PQgetvalue(res, i, i_relhastriggers), "t") == 0);
         tblinfo[i].hasoids = (strcmp(PQgetvalue(res, i, i_relhasoids), "t") == 0);
         tblinfo[i].frozenxid = atooid(PQgetvalue(res, i, i_relfrozenxid));
+        tblinfo[i].toast_oid = atooid(PQgetvalue(res, i, i_toastoid));
+        tblinfo[i].toast_frozenxid = atooid(PQgetvalue(res, i, i_toastfrozenxid));
         tblinfo[i].ncheck = atoi(PQgetvalue(res, i, i_relchecks));
         if (PQgetisnull(res, i, i_owning_tab))
         {
@@ -10044,13 +10061,23 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
                 }
             }

-            appendPQExpBuffer(q, "\n-- For binary upgrade, set relfrozenxid.\n");
+            appendPQExpBuffer(q, "\n-- For binary upgrade, set heap's relfrozenxid\n");
             appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
                               "SET relfrozenxid = '%u'\n"
                               "WHERE oid = ",
                               tbinfo->frozenxid);
             appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout);
             appendPQExpBuffer(q, "::pg_catalog.regclass;\n");
+
+            if (tbinfo->toast_oid)
+            {
+                /* We preserve the toast oids, so we can use it during restore */
+                appendPQExpBuffer(q, "\n-- For binary upgrade, set toast's relfrozenxid\n");
+                appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
+                                  "SET relfrozenxid = '%u'\n"
+                                  "WHERE oid = '%u';\n",
+                                  tbinfo->toast_frozenxid, tbinfo->toast_oid);
+            }
         }

         /* Loop dumping statistics and storage statements */
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 3339bf7..390b20c 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -227,6 +227,8 @@ typedef struct _tableInfo
     bool        hastriggers;    /* does it have any triggers? */
     bool        hasoids;        /* does it have OIDs? */
     uint32        frozenxid;        /* for restore frozen xid */
+    Oid            toast_oid;        /* for restore toast frozen xid */
+    uint32        toast_frozenxid;/* for restore toast frozen xid */
     int            ncheck;            /* # of CHECK expressions */
     /* these two are set only if table is a sequence owned by a column: */
     Oid            owning_tab;        /* OID of table owning sequence */
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f93affd..8721e65 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3409,6 +3409,8 @@ getTables(int *numTables)
     int            i_relhasrules;
     int            i_relhasoids;
     int            i_relfrozenxid;
+    int            i_toastoid;
+    int            i_toastfrozenxid;
     int            i_owning_tab;
     int            i_owning_col;
     int            i_reltablespace;
@@ -3451,7 +3453,8 @@ getTables(int *numTables)
                           "(%s c.relowner) AS rolname, "
                           "c.relchecks, c.relhastriggers, "
                           "c.relhasindex, c.relhasrules, c.relhasoids, "
-                          "c.relfrozenxid, "
+                          "c.relfrozenxid, tc.oid AS toid, "
+                          "tc.relfrozenxid AS tfrozenxid, "
                           "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype,
"
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
@@ -3484,7 +3487,8 @@ getTables(int *numTables)
                           "(%s c.relowner) AS rolname, "
                           "c.relchecks, c.relhastriggers, "
                           "c.relhasindex, c.relhasrules, c.relhasoids, "
-                          "c.relfrozenxid, "
+                          "c.relfrozenxid, tc.oid AS toid, "
+                          "tc.relfrozenxid AS tfrozenxid, "
                           "NULL AS reloftype, "
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
@@ -3518,6 +3522,8 @@ getTables(int *numTables)
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, relhasoids, "
                           "relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
                           "NULL AS reloftype, "
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
@@ -3550,6 +3556,8 @@ getTables(int *numTables)
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, relhasoids, "
                           "0 AS relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
                           "NULL AS reloftype, "
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
@@ -3582,6 +3590,8 @@ getTables(int *numTables)
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, relhasoids, "
                           "0 AS relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
                           "NULL AS reloftype, "
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
@@ -3610,6 +3620,8 @@ getTables(int *numTables)
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, relhasoids, "
                           "0 AS relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
                           "NULL AS reloftype, "
                           "NULL::oid AS owning_tab, "
                           "NULL::int4 AS owning_col, "
@@ -3633,6 +3645,8 @@ getTables(int *numTables)
                           "relhasindex, relhasrules, "
                           "'t'::bool AS relhasoids, "
                           "0 AS relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
                           "NULL AS reloftype, "
                           "NULL::oid AS owning_tab, "
                           "NULL::int4 AS owning_col, "
@@ -3666,6 +3680,8 @@ getTables(int *numTables)
                           "relhasindex, relhasrules, "
                           "'t'::bool AS relhasoids, "
                           "0 as relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
                           "NULL AS reloftype, "
                           "NULL::oid AS owning_tab, "
                           "NULL::int4 AS owning_col, "
@@ -3711,6 +3727,8 @@ getTables(int *numTables)
     i_relhasrules = PQfnumber(res, "relhasrules");
     i_relhasoids = PQfnumber(res, "relhasoids");
     i_relfrozenxid = PQfnumber(res, "relfrozenxid");
+    i_toastoid = PQfnumber(res, "toid");
+    i_toastfrozenxid = PQfnumber(res, "tfrozenxid");
     i_owning_tab = PQfnumber(res, "owning_tab");
     i_owning_col = PQfnumber(res, "owning_col");
     i_reltablespace = PQfnumber(res, "reltablespace");
@@ -3750,6 +3768,8 @@ getTables(int *numTables)
         tblinfo[i].hastriggers = (strcmp(PQgetvalue(res, i, i_relhastriggers), "t") == 0);
         tblinfo[i].hasoids = (strcmp(PQgetvalue(res, i, i_relhasoids), "t") == 0);
         tblinfo[i].frozenxid = atooid(PQgetvalue(res, i, i_relfrozenxid));
+        tblinfo[i].toast_oid = atooid(PQgetvalue(res, i, i_toastoid));
+        tblinfo[i].toast_frozenxid = atooid(PQgetvalue(res, i, i_toastfrozenxid));
         if (PQgetisnull(res, i, i_reloftype))
             tblinfo[i].reloftype = NULL;
         else
@@ -10852,13 +10872,23 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
                 }
             }

-            appendPQExpBuffer(q, "\n-- For binary upgrade, set relfrozenxid.\n");
+            appendPQExpBuffer(q, "\n-- For binary upgrade, set heap's relfrozenxid\n");
             appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
                               "SET relfrozenxid = '%u'\n"
                               "WHERE oid = ",
                               tbinfo->frozenxid);
             appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout);
             appendPQExpBuffer(q, "::pg_catalog.regclass;\n");
+
+            if (tbinfo->toast_oid)
+            {
+                /* We preserve the toast oids, so we can use it during restore */
+                appendPQExpBuffer(q, "\n-- For binary upgrade, set toast's relfrozenxid\n");
+                appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
+                                  "SET relfrozenxid = '%u'\n"
+                                  "WHERE oid = '%u';\n",
+                                  tbinfo->toast_frozenxid, tbinfo->toast_oid);
+            }
         }

         /* Loop dumping statistics and storage statements */
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index c309f69..1dc7157 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -228,6 +228,8 @@ typedef struct _tableInfo
     bool        hastriggers;    /* does it have any triggers? */
     bool        hasoids;        /* does it have OIDs? */
     uint32        frozenxid;        /* for restore frozen xid */
+    Oid            toast_oid;        /* for restore toast frozen xid */
+    uint32        toast_frozenxid;/* for restore toast frozen xid */
     int            ncheck;            /* # of CHECK expressions */
     char       *reloftype;        /* underlying type for typed table */
     /* these two are set only if table is a sequence owned by a column: */
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 3f6e77b..1ccdb4d 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3812,6 +3812,8 @@ getTables(int *numTables)
     int            i_relhasrules;
     int            i_relhasoids;
     int            i_relfrozenxid;
+    int            i_toastoid;
+    int            i_toastfrozenxid;
     int            i_relpersistence;
     int            i_owning_tab;
     int            i_owning_col;
@@ -3855,7 +3857,9 @@ getTables(int *numTables)
                           "(%s c.relowner) AS rolname, "
                           "c.relchecks, c.relhastriggers, "
                           "c.relhasindex, c.relhasrules, c.relhasoids, "
-                          "c.relfrozenxid, c.relpersistence, "
+                          "c.relfrozenxid, tc.oid AS toid, "
+                          "tc.relfrozenxid AS tfrozenxid, "
+                          "c.relpersistence, "
                           "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype,
"
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
@@ -3889,7 +3893,9 @@ getTables(int *numTables)
                           "(%s c.relowner) AS rolname, "
                           "c.relchecks, c.relhastriggers, "
                           "c.relhasindex, c.relhasrules, c.relhasoids, "
-                          "c.relfrozenxid, 'p' AS relpersistence, "
+                          "c.relfrozenxid, tc.oid AS toid, "
+                          "tc.relfrozenxid AS tfrozenxid, "
+                          "'p' AS relpersistence, "
                           "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype,
"
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
@@ -3922,7 +3928,9 @@ getTables(int *numTables)
                           "(%s c.relowner) AS rolname, "
                           "c.relchecks, c.relhastriggers, "
                           "c.relhasindex, c.relhasrules, c.relhasoids, "
-                          "c.relfrozenxid, 'p' AS relpersistence, "
+                          "c.relfrozenxid, tc.oid AS toid, "
+                          "tc.relfrozenxid AS tfrozenxid, "
+                          "'p' AS relpersistence, "
                           "NULL AS reloftype, "
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
@@ -3955,7 +3963,10 @@ getTables(int *numTables)
                           "(%s relowner) AS rolname, "
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, relhasoids, "
-                          "relfrozenxid, 'p' AS relpersistence, "
+                          "relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
+                          "'p' AS relpersistence, "
                           "NULL AS reloftype, "
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
@@ -3987,7 +3998,10 @@ getTables(int *numTables)
                           "(%s relowner) AS rolname, "
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, relhasoids, "
-                          "0 AS relfrozenxid, 'p' AS relpersistence, "
+                          "0 AS relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
+                          "'p' AS relpersistence, "
                           "NULL AS reloftype, "
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
@@ -4019,7 +4033,10 @@ getTables(int *numTables)
                           "(%s relowner) AS rolname, "
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, relhasoids, "
-                          "0 AS relfrozenxid, 'p' AS relpersistence, "
+                          "0 AS relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
+                          "'p' AS relpersistence, "
                           "NULL AS reloftype, "
                           "d.refobjid AS owning_tab, "
                           "d.refobjsubid AS owning_col, "
@@ -4047,7 +4064,10 @@ getTables(int *numTables)
                           "(%s relowner) AS rolname, "
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, relhasoids, "
-                          "0 AS relfrozenxid, 'p' AS relpersistence, "
+                          "0 AS relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
+                          "'p' AS relpersistence, "
                           "NULL AS reloftype, "
                           "NULL::oid AS owning_tab, "
                           "NULL::int4 AS owning_col, "
@@ -4070,7 +4090,10 @@ getTables(int *numTables)
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, "
                           "'t'::bool AS relhasoids, "
-                          "0 AS relfrozenxid, 'p' AS relpersistence, "
+                          "0 AS relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
+                          "'p' AS relpersistence, "
                           "NULL AS reloftype, "
                           "NULL::oid AS owning_tab, "
                           "NULL::int4 AS owning_col, "
@@ -4103,7 +4126,10 @@ getTables(int *numTables)
                           "relchecks, (reltriggers <> 0) AS relhastriggers, "
                           "relhasindex, relhasrules, "
                           "'t'::bool AS relhasoids, "
-                          "0 as relfrozenxid, 'p' AS relpersistence, "
+                          "0 as relfrozenxid, "
+                          "0 AS toid, "
+                          "0 AS tfrozenxid, "
+                          "'p' AS relpersistence, "
                           "NULL AS reloftype, "
                           "NULL::oid AS owning_tab, "
                           "NULL::int4 AS owning_col, "
@@ -4149,6 +4175,8 @@ getTables(int *numTables)
     i_relhasrules = PQfnumber(res, "relhasrules");
     i_relhasoids = PQfnumber(res, "relhasoids");
     i_relfrozenxid = PQfnumber(res, "relfrozenxid");
+    i_toastoid = PQfnumber(res, "toid");
+    i_toastfrozenxid = PQfnumber(res, "tfrozenxid");
     i_relpersistence = PQfnumber(res, "relpersistence");
     i_owning_tab = PQfnumber(res, "owning_tab");
     i_owning_col = PQfnumber(res, "owning_col");
@@ -4190,6 +4218,8 @@ getTables(int *numTables)
         tblinfo[i].hastriggers = (strcmp(PQgetvalue(res, i, i_relhastriggers), "t") == 0);
         tblinfo[i].hasoids = (strcmp(PQgetvalue(res, i, i_relhasoids), "t") == 0);
         tblinfo[i].frozenxid = atooid(PQgetvalue(res, i, i_relfrozenxid));
+        tblinfo[i].toast_oid = atooid(PQgetvalue(res, i, i_toastoid));
+        tblinfo[i].toast_frozenxid = atooid(PQgetvalue(res, i, i_toastfrozenxid));
         if (PQgetisnull(res, i, i_reloftype))
             tblinfo[i].reloftype = NULL;
         else
@@ -12221,13 +12251,23 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
                 }
             }

-            appendPQExpBuffer(q, "\n-- For binary upgrade, set relfrozenxid\n");
+            appendPQExpBuffer(q, "\n-- For binary upgrade, set heap's relfrozenxid\n");
             appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
                               "SET relfrozenxid = '%u'\n"
                               "WHERE oid = ",
                               tbinfo->frozenxid);
             appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout);
             appendPQExpBuffer(q, "::pg_catalog.regclass;\n");
+
+            if (tbinfo->toast_oid)
+            {
+                /* We preserve the toast oids, so we can use it during restore */
+                appendPQExpBuffer(q, "\n-- For binary upgrade, set toast's relfrozenxid\n");
+                appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
+                                  "SET relfrozenxid = '%u'\n"
+                                  "WHERE oid = '%u';\n",
+                                  tbinfo->toast_frozenxid, tbinfo->toast_oid);
+            }
         }

         /* Loop dumping statistics and storage statements */
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 113ecb1..6559e23 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -248,6 +248,8 @@ typedef struct _tableInfo
     bool        hastriggers;    /* does it have any triggers? */
     bool        hasoids;        /* does it have OIDs? */
     uint32        frozenxid;        /* for restore frozen xid */
+    Oid            toast_oid;        /* for restore toast frozen xid */
+    uint32        toast_frozenxid;/* for restore toast frozen xid */
     int            ncheck;            /* # of CHECK expressions */
     char       *reloftype;        /* underlying type for typed table */
     /* these two are set only if table is a sequence owned by a column: */

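For illustration, a hedged sketch of the extra statements the patched --binary-upgrade dump would now emit for one table; the table name, xid value, and TOAST oid below are made-up placeholders, not output from a real dump:

    -- For binary upgrade, set heap's relfrozenxid
    UPDATE pg_catalog.pg_class
    SET relfrozenxid = '1273380000'
    WHERE oid = 'some_table'::pg_catalog.regclass;

    -- For binary upgrade, set toast's relfrozenxid
    UPDATE pg_catalog.pg_class
    SET relfrozenxid = '1273380000'
    WHERE oid = '16390';
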
Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Jeff Davis wrote:
> On Thu, 2011-04-07 at 22:21 -0400, Bruce Momjian wrote:
> > One concern I have is that existing heap tables are protecting clog
> > files, but once those are frozen, the system might remove clog files not
> > realizing it has to freeze the heap tables too.
> 
> I don't understand. Can you elaborate?

Well, when you initially run pg_upgrade, your heap relfrozenxid is
preserved, and we only remove clog files when _all_ relations in all
databases do not need them, so for a time the heap tables will keep the
clogs around.  Over time, the heap files will be vacuum frozen, and
their relfrozenxid advanced.  Once that happens to all heaps, the system
thinks it can remove clog files, and doesn't realize the toast tables
also need vacuuming.  This is the "it might become more of a problem in
the future" concern I have.  The script I posted does fix this, and the
code changes prevent it from happening completely.
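
As an aside, a hedged way to watch that horizon from SQL (not a query anyone posted here): clog segments can only be removed once the oldest datfrozenxid across all databases has advanced past them, which this shows:

    SELECT datname, datfrozenxid, age(datfrozenxid) AS xids_behind
    FROM pg_database
    ORDER BY age(datfrozenxid) DESC;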

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Jeff Davis wrote:
> On Fri, 2011-04-08 at 07:08 -0400, Noah Misch wrote:
> > > Right, VACUUM FREEZE.  I now see I don't need to set
> > > vacuum_freeze_table_age if I use the FREEZE keyword, e.g. gram.y has:
> > > 
> > >     if (n->options & VACOPT_FREEZE)
> > >     n->freeze_min_age = n->freeze_table_age = 0;
> > 
> > True; it just performs more work than strictly necessary.  We don't actually
> > need earlier-than-usual freezing.  We need only ensure that the relfrozenxid
> > will guide future VACUUMs to do that freezing early enough.  However, I'm not
> > sure how to do that without directly updating relfrozenxid, so it's probably
> > just as well to cause some extra work and stick to the standard interface.
> 
> If there are tuples in a toast table containing xids that are older than
> the toast table's relfrozenxid, then there are only two options:
> 
> 1. Make relfrozenxid go backward to the right value. There is currently
> no mechanism to do this without compiling C code into the server,
> because (a) VACUUM FREEZE will never move the relfrozenxid backward; and
> (b) there is no way to find the oldest xid in a table with a normal
> snapshot.

Right, this is all too complicated.

> 2. Get rid of those xids older than relfrozenxid (i.e. VACUUM FREEZE). 
> 
> I don't know what you mean about VACUUM FREEZE doing extra work. I
> suppose you could set the vacuum_freeze_min_age to be exactly the right
> value such that it freezes everything before the existing (and wrong)
> relfrozenxid, but in practice I think it would be the same amount of
> work.

We don't know how far back to go with freezing, so we just have to
freeze it all.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Noah Misch
Date:
On Fri, Apr 08, 2011 at 10:05:01AM -0700, Jeff Davis wrote:
> On Fri, 2011-04-08 at 07:08 -0400, Noah Misch wrote:
> > > Right, VACUUM FREEZE.  I now see I don't need to set
> > > vacuum_freeze_table_age if I use the FREEZE keyword, e.g. gram.y has:
> > > 
> > >     if (n->options & VACOPT_FREEZE)
> > >     n->freeze_min_age = n->freeze_table_age = 0;
> > 
> > True; it just performs more work than strictly necessary.  We don't actually
> > need earlier-than-usual freezing.  We need only ensure that the relfrozenxid
> > will guide future VACUUMs to do that freezing early enough.  However, I'm not
> > sure how to do that without directly updating relfrozenxid, so it's probably
> > just as well to cause some extra work and stick to the standard interface.
> 
> If there are tuples in a toast table containing xids that are older than
> the toast table's relfrozenxid, then there are only two options:
> 
> 1. Make relfrozenxid go backward to the right value. There is currently
> no mechanism to do this without compiling C code into the server,
> because (a) VACUUM FREEZE will never move the relfrozenxid backward; and
> (b) there is no way to find the oldest xid in a table with a normal
> snapshot.

Couldn't you set relfrozenxid and datfrozenxid to txid_current() - 1100000000
(the highest possible vacuum_freeze_min_age, plus some slop), then run "SET
vacuum_freeze_table_age = 0; VACUUM tbl" on all tables for which you did this?
There's no need to set relfrozenxid back to a particular "right" value.  Not
suggesting we recommend this, but I can't think offhand why it wouldn't suffice.
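
A hedged sketch of that idea (the xid value, table name, and oid are placeholders, and direct catalog updates like this are unsupported; it only illustrates the mechanics):

    -- Push relfrozenxid/datfrozenxid far enough back; '400000000' stands in
    -- for roughly txid_current() minus 1.1 billion, computed beforehand.
    UPDATE pg_catalog.pg_class
    SET relfrozenxid = '400000000'
    WHERE oid = 'pg_toast.pg_toast_16384'::pg_catalog.regclass;

    UPDATE pg_catalog.pg_database
    SET datfrozenxid = '400000000'
    WHERE datname = current_database();

    -- Then force a full-table vacuum of the affected table.
    SET vacuum_freeze_table_age = 0;
    VACUUM pg_toast.pg_toast_16384;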

> 2. Get rid of those xids older than relfrozenxid (i.e. VACUUM FREEZE). 
>
> I don't know what you mean about VACUUM FREEZE doing extra work. I
> suppose you could set the vacuum_freeze_min_age to be exactly the right
> value such that it freezes everything before the existing (and wrong)
> relfrozenxid, but in practice I think it would be the same amount of
> work.

Suppose that your next xid at pg_upgrade time was 500M, and it's now 505M.  If
you're using the default vacuum_freeze_min_age = 50M, "SET
vacuum_freeze_table_age = 0; VACUUM tbl" will only freeze tuples covering 5M
transaction ids.  "VACUUM FREEZE tbl" (a.k.a "SET vacuum_freeze_table_age = 0;
SET vacuum_freeze_min_age = 0; VACUUM tbl") will freeze tuples covering 55M
transaction ids.

nm


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> New version;  I made some other small adjustments:
> 
>     -- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
>     -- servers that was upgraded by pg_upgrade and pg_migrator.
>     -- Run the script using psql for every database in the cluster
>     -- except 'template0', e.g.:
>     --     psql -U postgres -a -f pg_upgrade_fix.sql dbname
>     -- This must be run from a writable directory.
>     -- It will not lock any tables but will generate I/O.
>     --
>     CREATE TEMPORARY TABLE pg_upgrade_fix AS
>         SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
>         FROM     pg_class c, pg_namespace n 
>         WHERE     c.relnamespace = n.oid AND 
>             n.nspname = 'pg_toast' AND
>             c.relkind = 't'
>         ORDER by c.oid;
>     \copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
>     \i pg_upgrade_tmp.sql

OK, now that I have committed the fixes to git, I think it is time to
consider how we are going to handle this fix for people who have already
used pg_upgrade, or are using it in currently released versions.

I am thinking an announce list email with this query would be enough,
and state that we are planning a minor release with this fix in the
next few weeks.  I can provide details on the cause and behavior of the
bug.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> OK, now that I have committed the fixes to git, I think it is time to
> consider how we are going to handle this fix for people who have already
> used pg_upgrade, or are using it in currently released versions.
> 
> I am thinking an announce list email with this query would be enough,
> and state that we are planning a minor release with this fix in the
> next few weeks.  I can provide details on the cause and behavior of the
> bug.

OK, here is a draft email announcement:

---------------------------------------------------------------------------

A bug has been discovered in all released Postgres versions that
performed major version upgrades using pg_upgrade and (formerly)
pg_migrator.  The bug can cause queries to return the following
error:
    ERROR: could not access status of transaction ######
    DETAIL: could not open file "pg_clog/####": No such file or directory

This error prevents access to very wide values stored in the database. 
To prevent such failures, users are encouraged to run the following
psql script in all upgraded databases as soon as possible;  a fix will be
included in upcoming Postgres releases 8.4.8 and 9.0.4:
    -- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
    -- servers that was upgraded by pg_upgrade and pg_migrator.
    -- Run the script using psql for every database in the cluster
    -- except 'template0', e.g.:
    --     psql -U postgres -a -f pg_upgrade_fix.sql dbname
    -- This must be run from a writable directory.
    -- It will not lock any tables but will generate I/O.
    --
    CREATE TEMPORARY TABLE pg_upgrade_fix AS
        SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
        FROM     pg_class c, pg_namespace n
        WHERE     c.relnamespace = n.oid AND
            n.nspname = 'pg_toast' AND
            c.relkind = 't'
        ORDER by c.oid;
    \copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
    \i pg_upgrade_tmp.sql

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> OK, here is a draft email announcement:

Couple suggestions (also on IRC):

> ---------------------------------------------------------------------------

A bug has been discovered in all released versions of pg_upgrade and
(formerly) pg_migrator.  Anyone who has used pg_upgrade or pg_migrator
should take the following corrective actions as soon as possible.

This bug can cause queries to return the following error:
    ERROR: could not access status of transaction ######
    DETAIL: could not open file "pg_clog/####": No such file or directory

This error prevents access to very wide values stored in the database.
To prevent such failures users need to run the following psql script,
as the superuser, in all upgraded databases as soon as possible:
    -- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
    -- servers that was upgraded by pg_upgrade and pg_migrator.
    -- Run the script using psql for every database in the cluster
    -- except 'template0', e.g.:
    --     psql -U postgres -a -f pg_upgrade_fix.sql dbname
    -- This must be run from a writable directory.
    -- It will not lock any tables but will generate I/O.
    --
    CREATE TEMPORARY TABLE pg_upgrade_fix AS
        SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
        FROM     pg_class c, pg_namespace n
        WHERE     c.relnamespace = n.oid AND
            n.nspname = 'pg_toast' AND
            c.relkind = 't'
        ORDER by c.oid;
    \copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
    \i pg_upgrade_tmp.sql

A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
The fixed version of pg_upgrade will remove the need for the above script
by correctly updating the TOAST tables in the migrated database.
Thanks,
    Stephen

Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Stephen Frost wrote:
-- Start of PGP signed section.
> Bruce,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > OK, here is a draft email announcement:
> 
> Couple suggestions (also on IRC):

Yes, I like your version better;  I did adjust the wording of the last
sentence to mention that it is really the release, not the new pg_upgrade,
that fixes the problem: the fixes are in pg_dump, so a new pg_upgrade
binary alone will not work;  you need a new install.

---------------------------------------------------------------------------

A bug has been discovered in all released versions of pg_upgrade and
(formerly) pg_migrator.  Anyone who has used pg_upgrade or pg_migrator
should take the following corrective actions as soon as possible.

This bug can cause queries to return the following error:
    ERROR: could not access status of transaction ######
    DETAIL: could not open file "pg_clog/####": No such file or directory

This error prevents access to very wide values stored in the database.
To prevent such failures users need to run the following psql script,
as the superuser, in all upgraded databases as soon as possible:
    -- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
    -- servers that was upgraded by pg_upgrade and pg_migrator.
    -- Run the script using psql for every database in the cluster
    -- except 'template0', e.g.:
    --     psql -U postgres -a -f pg_upgrade_fix.sql dbname
    -- This must be run from a writable directory.
    -- It will not lock any tables but will generate I/O.
    --
    CREATE TEMPORARY TABLE pg_upgrade_fix AS
        SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
        FROM     pg_class c, pg_namespace n
        WHERE     c.relnamespace = n.oid AND
            n.nspname = 'pg_toast' AND
            c.relkind = 't'
        ORDER by c.oid;
    \copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
    \i pg_upgrade_tmp.sql

A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
These releases will remove the need for the above script by correctly
updating all TOAST tables in the migrated databases.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Stephen Frost
Date:
* Bruce Momjian (bruce@momjian.us) wrote:
> Yes, I like your version better;  I did adjust the wording of the last
> sentence to mention it is really the release, not the new pg_upgrade,
> which fixes the problem because the fixes are in pg_dump, and hence a
> new pg_upgrade binary will not work;  you need a new install.

Err, right, good point.  You might even want to call that out
specifically, so no one is confused.  Also this:

>     -- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
>     -- servers that was upgraded by pg_upgrade and pg_migrator.

'that was' should be 'that were'.

> A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
> These releases will remove the need for the above script by correctly
> updating all TOAST tables in the migrated databases.

My suggestion:

A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
These releases will include an updated pg_dump which will remove the
need for the above script by correctly dumping all TOAST tables in the
migrated databases.
Thanks,
    Stephen

Re: pg_upgrade bug found!

From
Jeff Davis
Date:
On Fri, 2011-04-08 at 13:35 -0400, Noah Misch wrote:
> > 1. Make relfrozenxid go backward to the right value. There is currently
> > no mechanism to do this without compiling C code into the server,
> > because (a) VACUUM FREEZE will never move the relfrozenxid backward; and
> > (b) there is no way to find the oldest xid in a table with a normal
> > snapshot.
> 
> Couldn't you set relfrozenxid and datfrozenxid to txid_current() - 1100000000
> (the highest possible vacuum_freeze_min_age, plus some slop), then run "SET
> vacuum_freeze_table_age = 0; VACUUM tbl" on all tables for which you did this?
> There's no need to set relfrozenxid back to a particular "right" value. 

That's a good point that we don't need relfrozenxid to really be the
right value; we just need it to be less than or equal to the right
value. I don't think you need to mess around with
vacuum_freeze_table_age though -- that looks like it's taken care of in
the logic for deciding when to do a full table vacuum.

This has the additional merit that transaction IDs are not needlessly
removed; therefore leaving some forensic information if there are
further problems.

> 
> Suppose that your next xid at pg_upgrade time was 500M, and it's now 505M.  If
> you're using the default vacuum_freeze_min_age = 50M, "SET
> vacuum_freeze_table_age = 0; VACUUM tbl" will only freeze tuples covering 5M
> transaction ids.

If the pg_upgrade time was at txid 500M, then the relfrozenxid of the
toast table will be about 500M. That means you need to get rid of all
xids less than about 500M (unless you already fixed relfrozenxid,
perhaps using the process you mention above).

So if you only freeze tuples less than about 455M (505M - 50M), then
that is wrong.

The only difference really is that you don't really need to freeze those
last 5M transactions since the upgrade happened.
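
To see the value being described on a real system, a hedged helper query (not something posted in this thread) that lists each TOAST table's relfrozenxid and how far behind it is:

    SELECT c.oid::regclass AS toast_table, c.relfrozenxid,
           age(c.relfrozenxid) AS xids_behind
    FROM pg_class c
    WHERE c.relkind = 't'
    ORDER BY age(c.relfrozenxid) DESC;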

Regards,Jeff Davis



Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Stephen Frost wrote:
-- Start of PGP signed section.
> * Bruce Momjian (bruce@momjian.us) wrote:
> > Yes, I like your version better;  I did adjust the wording of the last
> > sentence to mention it is really the release, not the new pg_upgrade,
> > which fixes the problem because the fixes are in pg_dump, and hence a
> > new pg_upgrade binary will not work;  you need a new install.
> 
> Err, right, good point.  You might even want to call that out
> specifically, so no one is confused.  Also this:
> 
> >     -- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
> >     -- servers that was upgraded by pg_upgrade and pg_migrator.
> 
> 'that was' should be 'that were'.
> 
> > A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
> > These releases will remove the need for the above script by correctly
> > updating all TOAST tables in the migrated databases.
> 
> My suggestion:
> 
> A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
> These releases will include an updated pg_dump which will remove the
> need for the above script by correctly dumping all TOAST tables in the
> migrated databases.

I am worried if I mention pg_dump that people will think pg_dump is
broken, when in fact it is only the --binary-upgrade mode of pg_dump
that is broken.

I adjusted the wording of the last paragraph slightly to be clearer, but
hopefully not confusing.

We don't actually check the pg_dump version and I am hesitant to add
such a check.

I was thinking of sending this out on Monday, but now think people might
like to have the weekend to fix this so I am thinking of sending it to
announce tonight, in 8 hours.  OK?

---------------------------------------------------------------------------

A bug has been discovered in all released versions of pg_upgrade and
(formerly) pg_migrator.  Anyone who has used pg_upgrade or pg_migrator
should take the following corrective actions as soon as possible.

This bug can cause queries to return the following error:
    ERROR: could not access status of transaction ######
    DETAIL: could not open file "pg_clog/####": No such file or directory

This error prevents access to very wide values stored in the database.
To prevent such failures users need to run the following psql script,
as the superuser, in all upgraded databases as soon as possible:
    -- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
    -- servers that was upgraded by pg_upgrade and pg_migrator.
    -- Run the script using psql for every database in the cluster
    -- except 'template0', e.g.:
    --     psql -U postgres -a -f pg_upgrade_fix.sql dbname
    -- This must be run from a writable directory.
    -- It will not lock any tables but will generate I/O.
    --
    CREATE TEMPORARY TABLE pg_upgrade_fix AS
        SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
        FROM     pg_class c, pg_namespace n
        WHERE     c.relnamespace = n.oid AND
            n.nspname = 'pg_toast' AND
            c.relkind = 't'
        ORDER by c.oid;
    \copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
    \i pg_upgrade_tmp.sql

A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
These releases will remove the need for the above script by correctly
dumping all TOAST tables in the migrated databases.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> I am worried if I mention pg_dump that people will think pg_dump is
> broken, when in fact it is only the --binary-upgrade mode of pg_dump
> that is broken.
> 
> I adjusted the wording of the last paragraph slighly to be clearer, but
> hopefully not confuse.
> 
> We don't actually check the pg_dump version and I am hesistant to add
> such a check.
> 
> I was thinking of sending this out on Monday, but now think people might
> like to have the weekend to fix this so I am thinking of sending it to
> announce tonight, in 8 hours.  OK?

Updated version with IRC user suggestions:

---------------------------------------------------------------------------
       Critical Fix for pg_upgrade/pg_migrator Users
       ---------------------------------------------

A bug has been discovered in all released versions of pg_upgrade and
(formerly) pg_migrator.  Anyone who has used pg_upgrade or pg_migrator
should take the following corrective actions as soon as possible.

This bug can cause queries to return the following error:
    ERROR: could not access status of transaction ######
    DETAIL: could not open file "pg_clog/####": No such file or directory

This error prevents access to very wide values stored in the database.
To prevent such failures users need to run the following psql script,
as the superuser, in all upgraded databases as soon as possible:
    -- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
    -- servers that were upgraded by pg_upgrade and pg_migrator.
    -- Run the script using psql for every database in the cluster
    -- except 'template0', e.g.:
    --     psql -U postgres -a -f pg_upgrade_fix.sql dbname
    -- This must be run from a writable directory.
    -- It will not lock any tables but will generate I/O.
    --
    CREATE TEMPORARY TABLE pg_upgrade_fix AS
        SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
        FROM     pg_class c, pg_namespace n
        WHERE     c.relnamespace = n.oid AND
            n.nspname = 'pg_toast' AND
            c.relkind = 't'
        ORDER by c.oid;
    \copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
    \i pg_upgrade_tmp.sql

A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
These releases will remove the need for the above script by correctly
restoring all TOAST tables in the migrated databases.

2011-04-08

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Noah Misch
Date:
On Fri, Apr 08, 2011 at 12:16:50PM -0700, Jeff Davis wrote:
> On Fri, 2011-04-08 at 13:35 -0400, Noah Misch wrote:
> > > 1. Make relfrozenxid go backward to the right value. There is currently
> > > no mechanism to do this without compiling C code into the server,
> > > because (a) VACUUM FREEZE will never move the relfrozenxid backward; and
> > > (b) there is no way to find the oldest xid in a table with a normal
> > > snapshot.
> > 
> > Couldn't you set relfrozenxid and datfrozenxid to txid_current() - 1100000000
> > (the highest possible vacuum_freeze_min_age, plus some slop), then run "SET
> > vacuum_freeze_table_age = 0; VACUUM tbl" on all tables for which you did this?
> > There's no need to set relfrozenxid back to a particular "right" value. 
> 
> That's a good point that we don't need relfrozenxid to really be the
> right value; we just need it to be less than or equal to the right
> value. I don't think you need to mess around with
> vacuum_freeze_table_age though -- that looks like it's taken care of in
> the logic for deciding when to do a full table vacuum.

Actually, I think the only reason to VACUUM at all after hacking relfrozenxid is
to visit every tuple and see whether you need to restore any clog segments from
backup.  Suppose your postgresql.conf overrides vacuum_freeze_table_age to the
maximum of 2B.  If you hacked relfrozenxid and just VACUUMed without modifying
vacuum_freeze_table_age, you wouldn't get a full table scan.  In another ~1B
transactions, you'll get that full-table VACUUM, and it might then discover
missing clog segments.  Though you wouldn't risk any new clog loss in the
meantime, by doing the VACUUM with vacuum_freeze_table_age=0 on each affected table,
you can go away confident that any clog restoration is behind you.

> This has the additional merit that transaction IDs are not needlessly
> removed; therefore leaving some forensic information if there are
> further problems.
> 
> > 
> > Suppose that your next xid at pg_upgrade time was 500M, and it's now 505M.  If
> > you're using the default vacuum_freeze_min_age = 50M, "SET
> > vacuum_freeze_table_age = 0; VACUUM tbl" will only freeze tuples covering 5M
> > transaction ids.
> 
> If the pg_upgrade time was at txid 500M, then the relfrozenxid of the
> toast table will be about 500M. That means you need to get rid of all
> xids less than about 500M (unless you already fixed relfrozenxid,
> perhaps using the process you mention above).
> 
> So if you only freeze tuples less than about 455M (505M - 50M), then
> that is wrong.

Agreed.  If you don't fix relfrozenxid, you can't win much in that example.

> 
> The only difference really is that you don't really need to freeze those
> last 5M transactions since the upgrade happened.

But change the numbers somewhat.  Say you ran pg_upgrade at xid 110M.  Your
TOAST table had relfrozenxid = 100M before pg_upgrade and 110M+epsilon after.
The next xid now sits at 170M.  Without any manual relfrozenxid changes, any
full-table VACUUM will bump the relfrozenxid to 120M.  A VACUUM FREEZE would
freeze tuples covering 70M transactions, while a VACUUM with
vacuum_freeze_table_age = 0 would freeze tuples across only 20M transactions.
An unadorned VACUUM wouldn't even perform a full-table scan.

All that being said, recommending VACUUM FREEZE seems sensibly conservative.

Thanks,
nm


Re: pg_upgrade bug found!

From
Josh Berkus
Date:
>     -- It will not lock any tables but will generate I/O.

add:
IMPORTANT: Depending on the size and configuration of your database,
this script may generate a lot of I/O and degrade database performance.
Users should execute this script during a low traffic period and watch
the database load.

>     --
>     CREATE TEMPORARY TABLE pg_upgrade_fix AS
>         SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
>         FROM     pg_class c, pg_namespace n
>         WHERE     c.relnamespace = n.oid AND
>             n.nspname = 'pg_toast' AND
>             c.relkind = 't'
>         ORDER by c.oid;
>     \copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
>     \i pg_upgrade_tmp.sql
> 
> A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
> These releases will remove the need for the above script by correctly
> restoring all TOAST tables in the migrated databases.

add: However, users of databases which have already been migrated still
need to run the script, even if they upgrade to 9.0.4.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: pg_upgrade bug found!

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> -- It will not lock any tables but will generate I/O.

> add:
> IMPORTANT: Depending on the size and configuration of your database,
> this script may generate a lot of I/O and degrade database performance.
> Users should execute this script during a low traffic period and watch
> the database load.

It might be worth suggesting that people can adjust their vacuum delay
parameters to control the I/O load.
        regards, tom lane


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> >> -- It will not lock any tables but will generate I/O.
> 
> > add:
> > IMPORTANT: Depending on the size and configuration of your database,
> > this script may generate a lot of I/O and degrade database performance.
> > Users should execute this script during a low traffic period and watch
> > the database load.
> 
> It might be worth suggesting that people can adjust their vacuum delay
> parameters to control the I/O load.

I talked to Tom about this and I am worried people will adjust it so it
takes days to complete.  Is that a valid concern?  Does anyone have a
suggested value?
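
For reference, a hedged illustration of the knobs in question; the values below are only illustrative, not a recommendation from this thread:

    -- Session-level settings before running the freeze script; a larger
    -- delay reduces the I/O impact but makes the vacuum run longer.
    SET vacuum_cost_delay = 20;     -- ms to sleep once the cost limit is hit
    SET vacuum_cost_limit = 200;    -- cost accumulated before each sleep
    \i pg_upgrade_tmp.sql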

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Tom Lane wrote:
> > Josh Berkus <josh@agliodbs.com> writes:
> > >> -- It will not lock any tables but will generate I/O.
> > 
> > > add:
> > > IMPORTANT: Depending on the size and configuration of your database,
> > > this script may generate a lot of I/O and degrade database performance.
> > > Users should execute this script during a low traffic period and watch
> > > the database load.
> > 
> > It might be worth suggesting that people can adjust their vacuum delay
> > parameters to control the I/O load.
> 
> I talked to Tom about this and I am worried people will adjust it so it
> takes days to complete.  Is that a valid concern?  Does anyone have a
> suggested value?

Josh Berkus helped me get lots more details on a wiki page for this:
http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix

I will reference the wiki in the email announcement.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Jeff Davis
Date:
On Fri, 2011-04-08 at 15:03 -0400, Bruce Momjian wrote:
> A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
> These releases will remove the need for the above script by correctly
> updating all TOAST tables in the migrated databases.

You might want to clarify that the fix may be required if you ever used
pg_upgrade before. Using the new version of pg_upgrade/dump when you
still have a bad relfrozenxid doesn't help.

Regards,Jeff Davis




Re: pg_upgrade bug found!

From
bricklen
Date:
On Fri, Apr 8, 2011 at 4:00 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2011-04-08 at 15:03 -0400, Bruce Momjian wrote:
>> A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
>> These releases will remove the need for the above script by correctly
>> updating all TOAST tables in the migrated databases.
>
> You might want to clarify that the fix may be required if you ever used
> pg_upgrade before. Using the new version of pg_upgrade/dump when you
> still have a bad relfrozenxid doesn't help.
>
> Regards,
>        Jeff Davis
>

I've been noticing in my logs for the past few days the message you
note in the wiki. It seems to occur during a vacuum around 7:30am
every day. I will be running the suggested script shortly, but can
anyone tell me in how bad of shape my db is in? This is our production
db with two hot standby's running off it.

grep -i 'could not access status of transaction' postgresql-2011-04*.log
postgresql-2011-04-06.log:2011-04-06 07:28:27 PDT [15882]: [1-1]
(user=postgres) (rhost=[local]) ERROR:  could not access status of
transaction 1273385235
postgresql-2011-04-07.log:2011-04-07 07:27:14 PDT [29790]: [1-1]
(user=postgres) (rhost=[local]) ERROR:  could not access status of
transaction 1273385235
postgresql-2011-04-08.log:2011-04-08 07:26:35 PDT [2402]: [1-1]
(user=postgres) (rhost=[local]) ERROR:  could not access status of
transaction 1273385235
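
For what it's worth, a hedged way (not posted in this thread) to see which clog segment a reported transaction id lives in: each 256KB pg_clog segment covers 1,048,576 transactions (two status bits per xid), so:

    SELECT to_hex(1273385235 / 1048576) AS clog_segment;   -- returns 4be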


Re: pg_upgrade bug found!

From
bricklen
Date:
On Fri, Apr 8, 2011 at 4:51 PM, bricklen <bricklen@gmail.com> wrote:
> I've been noticing in my logs for the past few days the message you
> note in the wiki. It seems to occur during a vacuum around 7:30am
> every day. I will be running the suggested script shortly, but can
> anyone tell me in how bad of shape my db is in? This is our production
> db with two hot standby's running off it.
>
> grep -i 'could not access status of transaction' postgresql-2011-04*.log
> postgresql-2011-04-06.log:2011-04-06 07:28:27 PDT [15882]: [1-1]
> (user=postgres) (rhost=[local]) ERROR:  could not access status of
> transaction 1273385235
> postgresql-2011-04-07.log:2011-04-07 07:27:14 PDT [29790]: [1-1]
> (user=postgres) (rhost=[local]) ERROR:  could not access status of
> transaction 1273385235
> postgresql-2011-04-08.log:2011-04-08 07:26:35 PDT [2402]: [1-1]
> (user=postgres) (rhost=[local]) ERROR:  could not access status of
> transaction 1273385235

version 9.03, if that helps


Re: pg_upgrade bug found!

From
Stephen Frost
Date:
bricklen,

* bricklen (bricklen@gmail.com) wrote:
> I've been noticing in my logs for the past few days the message you
> note in the wiki. It seems to occur during a vacuum around 7:30am
> every day. I will be running the suggested script shortly, but can
> anyone tell me in how bad of shape my db is in? This is our production
> db with two hot standby's running off it.

Unfortunately, I don't think the script that Bruce posted will help if
the clog files have been removed (which appears to be the case here).
Do you have a backup which includes older files which existed under the
'pg_clog' directory under your database's root?  Hopefully you do and
can restore those and restart the database.  If you restore and then
restart then Bruce's script could be run and hopefully would clear out
these errors.

Bruce, please correct me if I got any part of this wrong.
Thanks,
    Stephen

Re: pg_upgrade bug found!

From
bricklen
Date:
Hi Stephen,

On Fri, Apr 8, 2011 at 6:57 PM, Stephen Frost <sfrost@snowman.net> wrote:
> bricklen,
>
> * bricklen (bricklen@gmail.com) wrote:
>> I've been noticing in my logs for the past few days the message you
>> note in the wiki. It seems to occur during a vacuum around 7:30am
>> every day. I will be running the suggested script shortly, but can
>> anyone tell me in how bad of shape my db is in? This is our production
>> db with two hot standby's running off it.
>
> Unfortunately, I don't think the script that Bruce posted will help if
> the clog files have been removed (which appears to be the case here).
> Do you have a backup which includes older files which existed under the
> 'pg_clog' directory under your database's root?  Hopefully you do and
> can restore those and restart the database.  If you restore and then
> restart then Bruce's script could be run and hopefully would clear out
> these errors.
>
> Bruce, please correct me if I got any part of this wrong.
>
>        Thanks,
>
>                Stephen

I looked deeper into our backup archives, and it appears that I do
have the clog file referenced in the error message "DETAIL:  Could not
open file "pg_clog/04BE": No such file or directory."

It exists in an untouched backup directory that I originally made when
I set up the backup and ran pg_upgrade. I'm not sure if it is from
version 8.4 or 9.0.2 though. Is it safe to just copy it into my
production pg_clog dir and restart?


Re: pg_upgrade bug found!

From
Stephen Frost
Date:
bricklen,

* bricklen (bricklen@gmail.com) wrote:
> I looked deeper into our backup archives, and it appears that I do
> have the clog file reference in the error message "DETAIL:  Could not
> open file "pg_clog/04BE": No such file or directory."

Great!  And there's no file in pg_clog which matches that name (or any
which exist that are smaller in value), right?

> It exists in an untouched backup directory that I originally made when
> I set up the backup and ran pg_upgrade. I'm not sure if it is from
> version 8.4 or 9.0.2 though. Is it safe to just copy it into my
> production pg_clog dir and restart?

It should be, provided you're not overwriting any files or putting a
clog file in place which is greater than the other clog files in that
directory.
Thanks,
    Stephen

Re: pg_upgrade bug found!

From
bricklen
Date:
On Fri, Apr 8, 2011 at 7:11 PM, Stephen Frost <sfrost@snowman.net> wrote:
> bricklen,
>
> * bricklen (bricklen@gmail.com) wrote:
>> I looked deeper into our backup archives, and it appears that I do
>> have the clog file reference in the error message "DETAIL:  Could not
>> open file "pg_clog/04BE": No such file or directory."
>
> Great!  And there's no file in pg_clog which matches that name (or
> exist which are smaller in value), right?
>
>> It exists in an untouched backup directory that I originally made when
>> I set up the backup and ran pg_upgrade. I'm not sure if it is from
>> version 8.4 or 9.0.2 though. Is it safe to just copy it into my
>> production pg_clog dir and restart?
>
> It should be, provided you're not overwriting any files or putting a
> clog file in place which is greater than the other clog files in that
> directory.

It appears that there are no files lower.

Missing clog: 04BE

production pg_clog dir:
ls -lhrt 9.0/data/pg_clog
total 38M
-rw------- 1 postgres postgres 256K Jan 25 21:04 04BF
-rw------- 1 postgres postgres 256K Jan 26 12:35 04C0
-rw------- 1 postgres postgres 256K Jan 26 20:58 04C1
-rw------- 1 postgres postgres 256K Jan 27 13:02 04C2
-rw------- 1 postgres postgres 256K Jan 28 01:00 04C3
...

old backup pg_clog dir (possibly v8.4)
...
-rw------- 1 postgres postgres 256K Jan 23 21:11 04BB
-rw------- 1 postgres postgres 256K Jan 24 08:56 04BC
-rw------- 1 postgres postgres 256K Jan 25 06:32 04BD
-rw------- 1 postgres postgres 256K Jan 25 10:58 04BE
-rw------- 1 postgres postgres 256K Jan 25 20:44 04BF
-rw------- 1 postgres postgres 8.0K Jan 25 20:54 04C0


So, if I have this right, my steps to take are:
- copy the backup 04BE to production pg_clog dir
- restart the database
- run Bruce's script

Does that sound right? Has anyone else experienced this? I'm leery of
testing this on my production db, as our last pg_dump was from early
this morning, so I apologize for being so cautious.

Thanks,

Bricklen


Re: pg_upgrade bug found!

From
bricklen
Date:
On Fri, Apr 8, 2011 at 7:20 PM, bricklen <bricklen@gmail.com> wrote:
> On Fri, Apr 8, 2011 at 7:11 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> bricklen,
>>
>> * bricklen (bricklen@gmail.com) wrote:
>>> I looked deeper into our backup archives, and it appears that I do
>>> have the clog file reference in the error message "DETAIL:  Could not
>>> open file "pg_clog/04BE": No such file or directory."
>>
>> Great!  And there's no file in pg_clog which matches that name (or
>> exist which are smaller in value), right?
>>
>>> It exists in an untouched backup directory that I originally made when
>>> I set up the backup and ran pg_upgrade. I'm not sure if it is from
>>> version 8.4 or 9.0.2 though. Is it safe to just copy it into my
>>> production pg_clog dir and restart?
>>
>> It should be, provided you're not overwriting any files or putting a
>> clog file in place which is greater than the other clog files in that
>> directory.
>
> It appears that there are no files lower.
>
> Missing clog: 04BE
>
> production pg_clog dir:
> ls -lhrt 9.0/data/pg_clog
> total 38M
> -rw------- 1 postgres postgres 256K Jan 25 21:04 04BF
> -rw------- 1 postgres postgres 256K Jan 26 12:35 04C0
> -rw------- 1 postgres postgres 256K Jan 26 20:58 04C1
> -rw------- 1 postgres postgres 256K Jan 27 13:02 04C2
> -rw------- 1 postgres postgres 256K Jan 28 01:00 04C3
> ...
>
> old backup pg_clog dir (possibly v8.4)
> ...
> -rw------- 1 postgres postgres 256K Jan 23 21:11 04BB
> -rw------- 1 postgres postgres 256K Jan 24 08:56 04BC
> -rw------- 1 postgres postgres 256K Jan 25 06:32 04BD
> -rw------- 1 postgres postgres 256K Jan 25 10:58 04BE
> -rw------- 1 postgres postgres 256K Jan 25 20:44 04BF
> -rw------- 1 postgres postgres 8.0K Jan 25 20:54 04C0
>
>
> So, if I have this right, my steps to take are:
> - copy the backup 04BE to production pg_clog dir
> - restart the database
> - run Bruce's script
>
> Does that sound right? Has anyone else experienced this? I'm leery of
> testing this on my production db, as our last pg_dump was from early
> this morning, so I apologize for being so cautious.
>
> Thanks,
>
> Bricklen

What I've tested and current status:

When I saw the announcement a few hours ago, I started setting up a
9.0.3 hot standby. I brought it live a few minutes ago.
- I copied over the 04BE clog from the original backup,
- restarted the standby cluster
- ran the script against the main database,
which turned up a bunch of other transactions with missing clog files:

psql:pg_upgrade_tmp.sql:539: ERROR:  could not access status of
transaction 1248683931
DETAIL:  Could not open file "pg_clog/04A6": No such file or directory.

psql:pg_upgrade_tmp.sql:540: ERROR:  could not access status of
transaction 1249010987
DETAIL:  Could not open file "pg_clog/04A7": No such file or directory.

psql:pg_upgrade_tmp.sql:541: ERROR:  could not access status of
transaction 1250325059
DETAIL:  Could not open file "pg_clog/04A8": No such file or directory.

psql:pg_upgrade_tmp.sql:542: ERROR:  could not access status of
transaction 1252759918
DETAIL:  Could not open file "pg_clog/04AA": No such file or directory.

psql:pg_upgrade_tmp.sql:543: ERROR:  could not access status of
transaction 1254527871
DETAIL:  Could not open file "pg_clog/04AC": No such file or directory.

psql:pg_upgrade_tmp.sql:544: ERROR:  could not access status of
transaction 1256193334
DETAIL:  Could not open file "pg_clog/04AD": No such file or directory.

psql:pg_upgrade_tmp.sql:556: ERROR:  could not access status of
transaction 1268739471
DETAIL:  Could not open file "pg_clog/04B9": No such file or directory.

I checked, and found that each one of those files exists in the
original backup location.

- scp'd those files to the hot standby clog directory,
- pg_ctl stop -m fast
- pg_ctl start
- ran the script

Hit a bunch of missing clog file errors like above, repeated the scp +
bounce + script process 4 or 5 more times until no more missing clog
file messages surfaced.

Now, is this safe to run against my production database?

**Those steps again, to run against prod:

cp the clog files from the original backup dir to my production pg_clog dir
bounce the database
run the script against all database in the cluster

Anyone have any suggestions or changes before I commit myself to this
course of action?

Thanks,

Bricklen


Re: pg_upgrade bug found!

From
Stephen Frost
Date:
bricklen,

* bricklen (bricklen@gmail.com) wrote:
> Now, is this safe to run against my production database?

Yes, with a few caveats.  One recommendation is to also increase
autovacuum_freeze_max_age to 500000000 (500m), which will hopefully
prevent autovacuum from 'butting in' and causing issues during the
process.  Also, a database-wide 'VACUUM FREEZE;' should be lower-risk,
if you can afford it (it will cause a lot of i/o on the system).  The
per-table 'VACUUM FREEZE <table>;' that the script does can end up
removing clog files prematurely.
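
A hedged sketch of those caveats in concrete form (the values are the ones suggested above; treat them as illustrative):

    -- In postgresql.conf (requires a server restart on 8.4/9.0):
    --     autovacuum_freeze_max_age = 500000000
    --
    -- Then, if the I/O is acceptable, run a database-wide freeze in each
    -- database:
    VACUUM FREEZE;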

> Anyone have any suggestions or changes before I commit myself to this
> course of action?

If you run into problems, and perhaps even before starting, you may want
to pop in to #postgresql on irc.freenode.net, there are people there who
can help you with this process who are very familiar with PG.
Thanks,
    Stephen

Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Stephen Frost wrote:
-- Start of PGP signed section.
> bricklen,
> 
> * bricklen (bricklen@gmail.com) wrote:
> > Now, is this safe to run against my production database?
> 
> Yes, with a few caveats.  One recommendation is to also increase
> autovacuum_freeze_max_age to 500000000 (500m), which will hopefully
> prevent autovacuum from 'butting in' and causing issues during the
> process.  Also, a database-wide 'VACUUM FREEZE;' should be lower-risk,
> if you can afford it (it will cause a lot of i/o on the system).  The
> per-table 'VACUUM FREEZE <table>;' that the script does can end up
> removing clog files prematurely.
> 
> > Anyone have any suggestions or changes before I commit myself to this
> > course of action?
> 
> If you run into problems, and perhaps even before starting, you may want
> to pop in to #postgresql on irc.freenode.net, there are people there who
> can help you with this process who are very familiar with PG.

Stephen is 100% correct and we have updated the wiki to explain recovery
details:
http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
bricklen
Date:
On Fri, Apr 8, 2011 at 8:07 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Stephen Frost wrote:
> -- Start of PGP signed section.
>> bricklen,
>>
>> * bricklen (bricklen@gmail.com) wrote:
>> > Now, is this safe to run against my production database?
>>
>> Yes, with a few caveats.  One recommendation is to also increase
>> autovacuum_freeze_max_age to 500000000 (500m), which will hopefully
>> prevent autovacuum from 'butting in' and causing issues during the
>> process.  Also, a database-wide 'VACUUM FREEZE;' should be lower-risk,
>> if you can afford it (it will cause a lot of i/o on the system).  The
>> per-table 'VACUUM FREEZE <table>;' that the script does can end up
>> removing clog files prematurely.
>>
>> > Anyone have any suggestions or changes before I commit myself to this
>> > course of action?
>>
>> If you run into problems, and perhaps even before starting, you may want
>> to pop in to #postgresql on irc.freenode.net, there are people there who
>> can help you with this process who are very familiar with PG.
>
> Stephen is 100% correct and we have updated the wiki to explain recovery
> details:
>
>        http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix
>

Thanks guys, I really appreciate your help. For the vacuum freeze, you
say database-wide, should I run vacuumdb -a -v -F ? Will freezing the
other tables in the cluster help (not sure how that works with
template0/1 databases?)


Re: pg_upgrade bug found!

From
Stephen Frost
Date:
bricklen,

* bricklen (bricklen@gmail.com) wrote:
> Thanks guys, I really appreciate your help. For the vacuum freeze, you
> say database-wide, should I run vacuumdb -a -v -F ? Will freezing the
> other tables in the cluster help (not sure how that works with
> template0/1 databases?)

Yes, using the command-line 'vacuumdb -a -v -F' would work.  It won't
try to vacuum template0, and doing template1 is correct.
Thanks,
    Stephen

Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
bricklen wrote:
> On Fri, Apr 8, 2011 at 8:07 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Stephen Frost wrote:
> > -- Start of PGP signed section.
> >> bricklen,
> >>
> >> * bricklen (bricklen@gmail.com) wrote:
> >> > Now, is this safe to run against my production database?
> >>
> >> Yes, with a few caveats. ?One recommendation is to also increase
> >> autovacuum_freeze_max_age to 500000000 (500m), which will hopefully
> >> prevent autovacuum from 'butting in' and causing issues during the
> >> process. ?Also, a database-wide 'VACUUM FREEZE;' should be lower-risk,
> >> if you can afford it (it will cause a lot of i/o on the system). ?The
> >> per-table 'VACUUM FREEZE <table>;' that the script does can end up
> >> removing clog files prematurely.
> >>
> >> > Anyone have any suggestions or changes before I commit myself to this
> >> > course of action?
> >>
> >> If you run into problems, and perhaps even before starting, you may want
> >> to pop in to #postgresql on irc.freenode.net, there are people there who
> >> can help you with this process who are very familiar with PG.
> >
> > Stephen is 100% correct and we have updated the wiki to explain recovery
> > details:
> >
> > ? ? ? ?http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix
> >
> 
> Thanks guys, I really appreciate your help. For the vacuum freeze, you
> say database-wide, should I run vacuumdb -a -v -F ? Will freezing the
> other tables in the cluster help (not sure how that works with
> template0/1 databases?)

Exactly.  Internally pg_upgrade uses:
             vacuumdb --all --freeze

in the empty new cluster to remove any links to pg_clog files (because
we copy the old pg_clog files into the new cluster directory).  (This is
proof that the old and new clog files are the same format.)  If you run
vacuumdb as above in the new cluster, it will again remove any
requirement on pg_clog, which is our goal.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Alvaro Herrera
Date:
Why is it important to have the original pg_clog files around?  Since
the transactions in question are below the freeze horizon, surely the
tuples that involve those transaction have all been visited by vacuum
and thus removed if they were leftover from aborted transactions or
deleted, no?  So you could just fill those files with the 0x55 pattern
(signalling "all transactions are committed") and the net result should
be the same.  No?
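
For context on the 0x55 pattern: clog stores two status bits per transaction and 01 means committed, so a byte of 01010101 (0x55) marks all four xids it covers as committed.  A hedged check of the arithmetic, just for illustration:

    SELECT (B'01010101')::int AS byte_value,
           to_hex((B'01010101')::int) AS in_hex;   -- 85, '55'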

Forgive me if I'm missing something.  I haven't been following this
thread and I'm more than a little tired (but wanted to shoot this today
because I'm gonna be able to, until Monday).

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: pg_upgrade bug found!

From
bricklen
Date:
On Fri, Apr 8, 2011 at 9:28 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
>
> Why is it important to have the original pg_clog files around?  Since
> the transactions in question are below the freeze horizon, surely the
> tuples that involve those transaction have all been visited by vacuum
> and thus removed if they were leftover from aborted transactions or
> deleted, no?  So you could just fill those files with the 0x55 pattern
> (signalling "all transactions are committed") and the net result should
> be the same.  No?
>
> Forgive me if I'm missing something.  I haven't been following this
> thread and I'm more than a little tired (but wanted to shoot this today
> because I'm gonna be able to, until Monday).
>
> --
> Álvaro Herrera <alvherre@commandprompt.com>
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Update on the status of the steps we took, which were:
- test on a hot standby by bringing it live, running the script,
determining the missing clog files, copying them into the live (hot
standby) pg_clog dir

Now, on the master, copied the same old clog files into the production
*master*, ran vacuumdb -a -v -F. The step I should have taken on the
master before the vacuumdb -F would have been to run the
http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix script to see
if I was missing any clog files on the master.
That vacuum freeze step pointed out another missing clog file; I copied that into
the master pg_clog dir and ran the aforementioned script. It didn't fail
on any of the clog files this time, so now I am rerunning the vacuum
freeze command and hoping like hell it works!

If the current run of the vacuum freeze fails, I'll report back.

Thanks again for everyone's help.


Re: pg_upgrade bug found!

From
bricklen
Date:
On Fri, Apr 8, 2011 at 10:01 PM, bricklen <bricklen@gmail.com> wrote:
> Update on the status of the steps we took, which were:
> - test on a hot standby by bringing it live, running the script,
> determing the missing clog files, copying them into the live (hot
> standby) pg_clog dir
>
> Now, on the master, copied the same old clog files into the production
> *master*, ran vacuumdb -a -v -F. The step I should have taken on the
> master before the vacuumdb -F would have been to run the
> http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix script to see
> if I was missing any clog files on the master.
> That vacuum freeze step pointed out a clog file, I copied that into
> the master pg_clog dir, ran the aforementioned script. It didn't fail
> on any of the clog files this time, so now I am rerunning the vacuum
> freeze command and hoping like hell it works!
>
> If the current run of the vacuum freeze fails, I'll report back.
>
> Thanks again for everyone's help.
>

The vacuumdb -a -v -F completed successfully this time.

Cheers!


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
bricklen wrote:
> On Fri, Apr 8, 2011 at 10:01 PM, bricklen <bricklen@gmail.com> wrote:
> > Update on the status of the steps we took, which were:
> > - test on a hot standby by bringing it live, running the script,
> > determing the missing clog files, copying them into the live (hot
> > standby) pg_clog dir
> >
> > Now, on the master, copied the same old clog files into the production
> > *master*, ran vacuumdb -a -v -F. The step I should have taken on the
> > master before the vacuumdb -F would have been to run the
> > http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix script to see
> > if I was missing any clog files on the master.
> > That vacuum freeze step pointed out a clog file, I copied that into
> > the master pg_clog dir, ran the aforementioned script. It didn't fail
> > on any of the clog files this time, so now I am rerunning the vacuum
> > freeze command and hoping like hell it works!
> >
> > If the current run of the vacuum freeze fails, I'll report back.
> >
> > Thanks again for everyone's help.
> >
> 
> The vacuumdb -a -v F completed successfully this time.

YEA!

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> 
> Why is it important to have the original pg_clog files around?  Since
> the transactions in question are below the freeze horizon, surely the
> tuples that involve those transaction have all been visited by vacuum
> and thus removed if they were leftover from aborted transactions or
> deleted, no?  So you could just fill those files with the 0x55 pattern
> (signalling "all transactions are committed") and the net result should
> be the same.  No?
> 
> Forgive me if I'm missing something.  I haven't been following this
> thread and I'm more than a little tired (but wanted to shoot this today
> because I'm gonna be able to, until Monday).

Well, in most cases vacuum (or SELECT?) is going to set those xids as
committed on the tuple, but if there have been few deletes in the toast
table, it is possible vacuum did not run.  I think the fact that we have only
three reported query error cases is because in most cases vacuum is
already taking care of this as part of space reuse.

relfrozenxid is not going to cause a freeze to run, and therefore those
xids, even though marked as committed, are still on the tuple, so we
need this script to be run.

In fact, if the tuple is marked as committed, do we even bother to mark
the xids as fixed via vacuum freeze?  Seems we don't have to.
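
As for the 0x55 idea itself, producing such a segment is mechanically
trivial: each clog byte holds four 2-bit transaction statuses, so 0x55
marks all four as committed, and a full segment is 32 pages of 8kB.  A
sketch, with a placeholder segment name:

    # write a 256kB clog segment in which every xid reads as "committed"
    dd if=/dev/zero bs=8192 count=32 2>/dev/null | tr '\0' '\125' > 0041
    # \125 (octal) == 0x55 == 01010101, i.e. four "committed" statuses per byte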

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Alvaro Herrera wrote:
> > 
> > Why is it important to have the original pg_clog files around?  Since
> > the transactions in question are below the freeze horizon, surely the
> > tuples that involve those transaction have all been visited by vacuum
> > and thus removed if they were leftover from aborted transactions or
> > deleted, no?  So you could just fill those files with the 0x55 pattern
> > (signalling "all transactions are committed") and the net result should
> > be the same.  No?
> > 
> > Forgive me if I'm missing something.  I haven't been following this
> > thread and I'm more than a little tired (but wanted to shoot this today
> > because I'm gonna be able to, until Monday).

To answer your other question, it is true we _probably_ could assume all
the rows were committed, except that again, vacuum might not have run
and the pages might not be full so single-page cleanup wasn't done
either.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade bug found!

From
Aidan Van Dyk
Date:
On Sat, Apr 9, 2011 at 7:03 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Bruce Momjian wrote:
>> Alvaro Herrera wrote:
>> >
>> > Why is it important to have the original pg_clog files around?  Since
>> > the transactions in question are below the freeze horizon, surely the
>> > tuples that involve those transaction have all been visited by vacuum
>> > and thus removed if they were leftover from aborted transactions or
>> > deleted, no?  So you could just fill those files with the 0x55 pattern
>> > (signalling "all transactions are committed") and the net result should
>> > be the same.  No?
>> >
>> > Forgive me if I'm missing something.  I haven't been following this
>> > thread and I'm more than a little tired (but wanted to shoot this today
>> > because I'm gonna be able to, until Monday).
>
> To answer your other question, it is true we _probably_ could assume all
> the rows were committed, except that again, vacuum might not have run
> and the pages might not be full so single-page cleanup wasn't done
> either.

OK, continuing the thought of just making all the old clog files as
"all committed"...

Since it only affects "toast" tables, the only time the system (with
normal queries) would check for a particular toast tuple, the tuple
referring to it would have been committed, right?  So forcing "all
transactions committed" for the older clog segments might mean a scan
on a *toast* heap might return tuples as committed when they might
have been aborted, but the real table heap would never refer to those,
right?

a.


--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.


Re: pg_upgrade bug found!

From
Noah Misch
Date:
On Sat, Apr 09, 2011 at 09:05:42AM -0400, Aidan Van Dyk wrote:
> On Sat, Apr 9, 2011 at 7:03 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > Bruce Momjian wrote:
> >> Alvaro Herrera wrote:
> >> >
> >> > Why is it important to have the original pg_clog files around?  Since
> >> > the transactions in question are below the freeze horizon, surely the
> >> > tuples that involve those transaction have all been visited by vacuum
> >> > and thus removed if they were leftover from aborted transactions or
> >> > deleted, no?  So you could just fill those files with the 0x55 pattern
> >> > (signalling "all transactions are committed") and the net result should
> >> > be the same.  No?
> >> >
> >> > Forgive me if I'm missing something.  I haven't been following this
> >> > thread and I'm more than a little tired (but wanted to shoot this today
> >> > because I'm gonna be able to, until Monday).
> >
> > To answer your other question, it is true we _probably_ could assume all
> > the rows were committed, except that again, vacuum might not have run
> > and the pages might not be full so single-page cleanup wasn't done
> > either.
> 
> OK, continuing the thought of just making all the old clog files as
> "all committed"...
> 
> Since it only affects "toast" tables, the only time the system (with
> normal queries) would check for a particular toast tuple, the tuple
> referring to it would have been committed, right?  So forcing "all
> transactions committed" for the older clog segments might mean a scan
> on a *toast* heap might return tuples as committed when they might
> have been aborted, but the real table heap would never refer to those,
> right?

Yes; it would be relatively harmless to retain some unreferenced TOAST chunks.
However, "all xacts committed" is not equivalent to "all tuples visible".  If
the user rolled back a DELETE shortly before the pg_upgrade run, we need to
recognize that outcome to keep any deleted TOAST entries visible.
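
As a concrete (hypothetical) illustration, with placeholder table and
database names:

    psql -d mydb <<'SQL'
    BEGIN;
    DELETE FROM t WHERE id = 1;   -- stamps xmax on the heap tuple and its TOAST chunks
    ROLLBACK;                     -- clog records that xid as aborted, so the row stays visible
    SQL
    # If the clog segment covering that xid is later overwritten with the
    # "all committed" pattern, the xmax reads as a committed delete and the
    # row (and its TOAST data) silently disappears.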


Re: pg_upgrade bug found!

From
Bruce Momjian
Date:
Aidan Van Dyk wrote:
> On Sat, Apr 9, 2011 at 7:03 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > Bruce Momjian wrote:
> >> Alvaro Herrera wrote:
> >> >
> >> > Why is it important to have the original pg_clog files around?  Since
> >> > the transactions in question are below the freeze horizon, surely the
> >> > tuples that involve those transaction have all been visited by vacuum
> >> > and thus removed if they were leftover from aborted transactions or
> >> > deleted, no?  So you could just fill those files with the 0x55 pattern
> >> > (signalling "all transactions are committed") and the net result should
> >> > be the same.  No?
> >> >
> >> > Forgive me if I'm missing something.  I haven't been following this
> >> > thread and I'm more than a little tired (but wanted to shoot this today
> >> > because I'm gonna be able to, until Monday).
> >
> > To answer your other question, it is true we _probably_ could assume all
> > the rows were committed, except that again, vacuum might not have run
> > and the pages might not be full so single-page cleanup wasn't done
> > either.
> 
> OK, continuing the thought of just making all the old clog files as
> "all committed"...
> 
> Since it only affects "toast" tables, the only time the system (with
> normal queries) would check for a particular toast tuple, the tuple
> referring to it would have been committed, right?  So forcing "all
> transactions committed" for the older clog segments might mean a scan
> on a *toast* heap might return tuples as committed when they might
> have been aborted, but the real table heap would never refer to those,
> right?

Uh, good point.  I think you are right that you only get to a toast row
from a non-aborted heap row.  I think the problem might be in following
the toast chain, but even then I am unclear on how that works.  Anyone?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Patch for pg_upgrade to turn off autovacuum

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Robert Haas wrote:
> > On Thu, Mar 31, 2011 at 12:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > > Robert Haas wrote:
> > >> On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
> > >> <heikki.linnakangas@enterprisedb.com> wrote:
> > >> >> I think the maintenance
> > >> >> overhead of an invisible variable is too much.
> > >> >
> > >> > A simple GUC or command-line switch isn't much code.
> > >>
> > >> I like the idea of a command-line switch.
> > >
> > > If you want to do that you should gereralize it as --binary-upgrade in
> > > case we have other needs for it.
> >
> > Yeah.  Or we could do a binary_upgrade GUC which has the effect of
> > forcibly suppressing autovacuum, and maybe other things later.  I
> > think that's a lot less hazardous than fiddling with the autovacuum
> > GUC.
>
> I like the idea of a command-line flag because it forces everything to
> be affected, and cannot be turned on and off in sessions --- if you are
> doing a binary upgrade, locked-down is good. :-)

The attached patch adds a new postmaster/postgres binary upgrade mode
(-b) which disables autovacuum, allows only super-user connections, and
prevents pg_upgrade_support oid assignment when not in upgrade mode.
It also modifies pg_upgrade to use this new mode rather than play with
trying to stop autovacuum.

This does fix a very rare bug that could happen if
autovacuum_freeze_max_age were set to maximum by the user.

I think this should be applied to PG 9.1.
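
With the patch, pg_upgrade ends up starting the server with something
like the following (port and paths are illustrative; see the server.c
hunk below):

    "$bindir/pg_ctl" -l "$logfile" -D "$datadir" -o "-p 50432 -b" start
    # -b puts the postmaster in binary-upgrade mode: the autovacuum
    # launcher is never started and only superusers may connect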

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index 2a0f50e..df2f2db
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*************** start_postmaster(ClusterInfo *cluster, b
*** 173,178 ****
--- 173,183 ----
      const char *datadir;
      unsigned short port;
      bool        exit_hook_registered = false;
+ #ifndef WIN32
+     char        *output_filename = log_opts.filename;
+ #else
+     char        *output_filename = DEVNULL;
+ #endif

      bindir = cluster->bindir;
      datadir = cluster->pgdata;
*************** start_postmaster(ClusterInfo *cluster, b
*** 193,216 ****
       * same file because we get the error: "The process cannot access the file
       * because it is being used by another process." so we have to send all
       * other output to 'nul'.
!      *
!      * Using autovacuum=off disables cleanup vacuum and analyze, but freeze
!      * vacuums can still happen, so we set autovacuum_freeze_max_age to its
!      * maximum.  We assume all datfrozenxid and relfrozen values are less than
!      * a gap of 2000000000 from the current xid counter, so autovacuum will
!      * not touch them.
       */
      snprintf(cmd, sizeof(cmd),
               SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
!              "-o \"-p %d -c autovacuum=off "
!              "-c autovacuum_freeze_max_age=2000000000\" "
!              "start >> \"%s\" 2>&1" SYSTEMQUOTE,
!              bindir,
! #ifndef WIN32
!              log_opts.filename, datadir, port, log_opts.filename);
! #else
!              DEVNULL, datadir, port, DEVNULL);
! #endif
      exec_prog(true, "%s", cmd);

      /* wait for the server to start properly */
--- 198,212 ----
       * same file because we get the error: "The process cannot access the file
       * because it is being used by another process." so we have to send all
       * other output to 'nul'.
!      * Use binary upgrade mode on the server (-b), if supported.
       */
      snprintf(cmd, sizeof(cmd),
               SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
!              "-o \"-p %d %s\" start >> \"%s\" 2>&1" SYSTEMQUOTE,
!              bindir, output_filename, datadir, port,
!              (GET_MAJOR_VERSION(cluster->major_version) >= 901) ? "-b" : "",
!              log_opts.filename);
!
      exec_prog(true, "%s", cmd);

      /* wait for the server to start properly */
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
new file mode 100644
index 8c5670f..870ddba
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
*************** heap_create_with_catalog(const char *rel
*** 1051,1057 ****
           * Use binary-upgrade override for pg_class.oid/relfilenode, if
           * supplied.
           */
!         if (OidIsValid(binary_upgrade_next_heap_pg_class_oid) &&
              (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE ||
               relkind == RELKIND_VIEW || relkind == RELKIND_COMPOSITE_TYPE ||
               relkind == RELKIND_FOREIGN_TABLE))
--- 1051,1058 ----
           * Use binary-upgrade override for pg_class.oid/relfilenode, if
           * supplied.
           */
!         if (IsBinaryUpgrade &&
!             OidIsValid(binary_upgrade_next_heap_pg_class_oid) &&
              (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE ||
               relkind == RELKIND_VIEW || relkind == RELKIND_COMPOSITE_TYPE ||
               relkind == RELKIND_FOREIGN_TABLE))
*************** heap_create_with_catalog(const char *rel
*** 1059,1065 ****
              relid = binary_upgrade_next_heap_pg_class_oid;
              binary_upgrade_next_heap_pg_class_oid = InvalidOid;
          }
!         else if (OidIsValid(binary_upgrade_next_toast_pg_class_oid) &&
                   relkind == RELKIND_TOASTVALUE)
          {
              relid = binary_upgrade_next_toast_pg_class_oid;
--- 1060,1067 ----
              relid = binary_upgrade_next_heap_pg_class_oid;
              binary_upgrade_next_heap_pg_class_oid = InvalidOid;
          }
!         else if (IsBinaryUpgrade &&
!                  OidIsValid(binary_upgrade_next_toast_pg_class_oid) &&
                   relkind == RELKIND_TOASTVALUE)
          {
              relid = binary_upgrade_next_toast_pg_class_oid;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
new file mode 100644
index c79402c..a662cfc
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
*************** index_create(Relation heapRelation,
*** 790,796 ****
           * Use binary-upgrade override for pg_class.oid/relfilenode, if
           * supplied.
           */
!         if (OidIsValid(binary_upgrade_next_index_pg_class_oid))
          {
              indexRelationId = binary_upgrade_next_index_pg_class_oid;
              binary_upgrade_next_index_pg_class_oid = InvalidOid;
--- 790,797 ----
           * Use binary-upgrade override for pg_class.oid/relfilenode, if
           * supplied.
           */
!         if (IsBinaryUpgrade &&
!             OidIsValid(binary_upgrade_next_index_pg_class_oid))
          {
              indexRelationId = binary_upgrade_next_index_pg_class_oid;
              binary_upgrade_next_index_pg_class_oid = InvalidOid;
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
new file mode 100644
index 08d8aa1..61a9322
*** a/src/backend/catalog/pg_enum.c
--- b/src/backend/catalog/pg_enum.c
***************
*** 21,26 ****
--- 21,27 ----
  #include "catalog/pg_enum.h"
  #include "catalog/pg_type.h"
  #include "storage/lmgr.h"
+ #include "miscadmin.h"
  #include "utils/builtins.h"
  #include "utils/fmgroids.h"
  #include "utils/rel.h"
*************** restart:
*** 311,317 ****
      }

      /* Get a new OID for the new label */
!     if (OidIsValid(binary_upgrade_next_pg_enum_oid))
      {
          /*
           * Use binary-upgrade override for pg_enum.oid, if supplied. During
--- 312,318 ----
      }

      /* Get a new OID for the new label */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_enum_oid))
      {
          /*
           * Use binary-upgrade override for pg_enum.oid, if supplied. During
diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c
new file mode 100644
index 9e35e73..80c1bfc
*** a/src/backend/catalog/pg_type.c
--- b/src/backend/catalog/pg_type.c
*************** TypeShellMake(const char *typeName, Oid
*** 125,131 ****
      tup = heap_form_tuple(tupDesc, values, nulls);

      /* Use binary-upgrade override for pg_type.oid, if supplied. */
!     if (OidIsValid(binary_upgrade_next_pg_type_oid))
      {
          HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid);
          binary_upgrade_next_pg_type_oid = InvalidOid;
--- 125,131 ----
      tup = heap_form_tuple(tupDesc, values, nulls);

      /* Use binary-upgrade override for pg_type.oid, if supplied. */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_type_oid))
      {
          HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid);
          binary_upgrade_next_pg_type_oid = InvalidOid;
*************** TypeCreate(Oid newTypeOid,
*** 430,436 ****
          if (OidIsValid(newTypeOid))
              HeapTupleSetOid(tup, newTypeOid);
          /* Use binary-upgrade override for pg_type.oid, if supplied. */
!         else if (OidIsValid(binary_upgrade_next_pg_type_oid))
          {
              HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid);
              binary_upgrade_next_pg_type_oid = InvalidOid;
--- 430,436 ----
          if (OidIsValid(newTypeOid))
              HeapTupleSetOid(tup, newTypeOid);
          /* Use binary-upgrade override for pg_type.oid, if supplied. */
!         else if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_type_oid))
          {
              HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid);
              binary_upgrade_next_pg_type_oid = InvalidOid;
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
new file mode 100644
index 85fe57f..362d26d
*** a/src/backend/catalog/toasting.c
--- b/src/backend/catalog/toasting.c
*************** create_toast_table(Relation rel, Oid toa
*** 157,163 ****
       * creation even if it seems not to need one.
       */
      if (!needs_toast_table(rel) &&
!         !OidIsValid(binary_upgrade_next_toast_pg_class_oid))
          return false;

      /*
--- 157,164 ----
       * creation even if it seems not to need one.
       */
      if (!needs_toast_table(rel) &&
!         (!IsBinaryUpgrade ||
!          !OidIsValid(binary_upgrade_next_toast_pg_class_oid)))
          return false;

      /*
*************** create_toast_table(Relation rel, Oid toa
*** 202,208 ****
          namespaceid = PG_TOAST_NAMESPACE;

      /* Use binary-upgrade override for pg_type.oid, if supplied. */
!     if (OidIsValid(binary_upgrade_next_toast_pg_type_oid))
      {
          toast_typid = binary_upgrade_next_toast_pg_type_oid;
          binary_upgrade_next_toast_pg_type_oid = InvalidOid;
--- 203,209 ----
          namespaceid = PG_TOAST_NAMESPACE;

      /* Use binary-upgrade override for pg_type.oid, if supplied. */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_toast_pg_type_oid))
      {
          toast_typid = binary_upgrade_next_toast_pg_type_oid;
          binary_upgrade_next_toast_pg_type_oid = InvalidOid;
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
new file mode 100644
index 1a20b0d..b3b6dc2
*** a/src/backend/commands/typecmds.c
--- b/src/backend/commands/typecmds.c
*************** AssignTypeArrayOid(void)
*** 1550,1556 ****
      Oid            type_array_oid;

      /* Use binary-upgrade override for pg_type.typarray, if supplied. */
!     if (OidIsValid(binary_upgrade_next_array_pg_type_oid))
      {
          type_array_oid = binary_upgrade_next_array_pg_type_oid;
          binary_upgrade_next_array_pg_type_oid = InvalidOid;
--- 1550,1556 ----
      Oid            type_array_oid;

      /* Use binary-upgrade override for pg_type.typarray, if supplied. */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_array_pg_type_oid))
      {
          type_array_oid = binary_upgrade_next_array_pg_type_oid;
          binary_upgrade_next_array_pg_type_oid = InvalidOid;
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
new file mode 100644
index 3f7d499..838d6eb
*** a/src/backend/commands/user.c
--- b/src/backend/commands/user.c
*************** CreateRole(CreateRoleStmt *stmt)
*** 388,394 ****
       * pg_largeobject_metadata contains pg_authid.oid's, so we use the
       * binary-upgrade override, if specified.
       */
!     if (OidIsValid(binary_upgrade_next_pg_authid_oid))
      {
          HeapTupleSetOid(tuple, binary_upgrade_next_pg_authid_oid);
          binary_upgrade_next_pg_authid_oid = InvalidOid;
--- 388,394 ----
       * pg_largeobject_metadata contains pg_authid.oid's, so we use the
       * binary-upgrade override, if specified.
       */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_authid_oid))
      {
          HeapTupleSetOid(tuple, binary_upgrade_next_pg_authid_oid);
          binary_upgrade_next_pg_authid_oid = InvalidOid;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
new file mode 100644
index 6e7f664..c0cf033
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** PostmasterMain(int argc, char *argv[])
*** 529,535 ****
       * tcop/postgres.c (the option sets should not conflict) and with the
       * common help() function in main/main.c.
       */
!     while ((opt = getopt(argc, argv, "A:B:c:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1)
      {
          switch (opt)
          {
--- 529,535 ----
       * tcop/postgres.c (the option sets should not conflict) and with the
       * common help() function in main/main.c.
       */
!     while ((opt = getopt(argc, argv, "A:B:bc:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1)
      {
          switch (opt)
          {
*************** PostmasterMain(int argc, char *argv[])
*** 541,546 ****
--- 541,551 ----
                  SetConfigOption("shared_buffers", optarg, PGC_POSTMASTER, PGC_S_ARGV);
                  break;

+             case 'b':
+                 /* Undocumented flag used for binary upgrades */
+                 IsBinaryUpgrade = true;
+                 break;
+
              case 'D':
                  userDoption = optarg;
                  break;
*************** ServerLoop(void)
*** 1480,1487 ****
          if (WalWriterPID == 0 && pmState == PM_RUN)
              WalWriterPID = StartWalWriter();

!         /* If we have lost the autovacuum launcher, try to start a new one */
!         if (AutoVacPID == 0 &&
              (AutoVacuumingActive() || start_autovac_launcher) &&
              pmState == PM_RUN)
          {
--- 1485,1497 ----
          if (WalWriterPID == 0 && pmState == PM_RUN)
              WalWriterPID = StartWalWriter();

!         /*
!          *    If we have lost the autovacuum launcher, try to start a new one.
!          *    We don't want autovacuum to run in binary upgrade mode because
!          *    autovacuum might update relfrozenxid for empty tables before
!          *    the physical files are put in place.
!          */
!         if (!IsBinaryUpgrade && AutoVacPID == 0 &&
              (AutoVacuumingActive() || start_autovac_launcher) &&
              pmState == PM_RUN)
          {
*************** reaper(SIGNAL_ARGS)
*** 2413,2419 ****
               */
              if (WalWriterPID == 0)
                  WalWriterPID = StartWalWriter();
!             if (AutoVacuumingActive() && AutoVacPID == 0)
                  AutoVacPID = StartAutoVacLauncher();
              if (XLogArchivingActive() && PgArchPID == 0)
                  PgArchPID = pgarch_start();
--- 2423,2429 ----
               */
              if (WalWriterPID == 0)
                  WalWriterPID = StartWalWriter();
!             if (!IsBinaryUpgrade && AutoVacuumingActive() && AutoVacPID == 0)
                  AutoVacPID = StartAutoVacLauncher();
              if (XLogArchivingActive() && PgArchPID == 0)
                  PgArchPID = pgarch_start();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
new file mode 100644
index 59b7666..a07661f
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
*************** process_postgres_switches(int argc, char
*** 3238,3244 ****
       * postmaster/postmaster.c (the option sets should not conflict) and with
       * the common help() function in main/main.c.
       */
!     while ((flag = getopt(argc, argv, "A:B:c:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:v:W:-:")) != -1)
      {
          switch (flag)
          {
--- 3238,3244 ----
       * postmaster/postmaster.c (the option sets should not conflict) and with
       * the common help() function in main/main.c.
       */
!     while ((flag = getopt(argc, argv, "A:B:bc:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:v:W:-:")) != -1)
      {
          switch (flag)
          {
*************** process_postgres_switches(int argc, char
*** 3250,3255 ****
--- 3250,3260 ----
                  SetConfigOption("shared_buffers", optarg, ctx, gucsource);
                  break;

+             case 'b':
+                 /* Undocumented flag used for binary upgrades */
+                 IsBinaryUpgrade = true;
+                 break;
+
              case 'D':
                  if (secure)
                      userDoption = strdup(optarg);
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
new file mode 100644
index 984ffd0..c4c4154
*** a/src/backend/utils/init/globals.c
--- b/src/backend/utils/init/globals.c
*************** pid_t        PostmasterPid = 0;
*** 85,90 ****
--- 85,91 ----
   */
  bool        IsPostmasterEnvironment = false;
  bool        IsUnderPostmaster = false;
+ bool        IsBinaryUpgrade = false;

  bool        ExitOnAnyError = false;

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
new file mode 100644
index a4c5d4c..1f6fba5
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
*************** InitPostgres(const char *in_dbname, Oid
*** 626,631 ****
--- 626,641 ----
      }

      /*
+      * Binary upgrades only allowed super-user connections
+      */
+     if (IsBinaryUpgrade && !am_superuser)
+     {
+             ereport(FATAL,
+                     (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+             errmsg("must be superuser to connect in binary upgrade mode")));
+     }
+
+     /*
       * The last few connections slots are reserved for superusers. Although
       * replication connections currently require superuser privileges, we
       * don't allow them to consume the reserved slots, which are intended for
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
new file mode 100644
index aa8cce5..9d19417
*** a/src/include/miscadmin.h
--- b/src/include/miscadmin.h
*************** do { \
*** 124,129 ****
--- 124,130 ----
  extern pid_t PostmasterPid;
  extern bool IsPostmasterEnvironment;
  extern PGDLLIMPORT bool IsUnderPostmaster;
+ extern bool IsBinaryUpgrade;

  extern bool ExitOnAnyError;


Re: Patch for pg_upgrade to turn off autovacuum

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Bruce Momjian wrote:
> > Robert Haas wrote:
> > > On Thu, Mar 31, 2011 at 12:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > > > Robert Haas wrote:
> > > >> On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
> > > >> <heikki.linnakangas@enterprisedb.com> wrote:
> > > >> >> I think the maintenance
> > > >> >> overhead of an invisible variable is too much.
> > > >> >
> > > >> > A simple GUC or command-line switch isn't much code.
> > > >>
> > > >> I like the idea of a command-line switch.
> > > >
> > > > If you want to do that you should gereralize it as --binary-upgrade in
> > > > case we have other needs for it.
> > > 
> > > Yeah.  Or we could do a binary_upgrade GUC which has the effect of
> > > forcibly suppressing autovacuum, and maybe other things later.  I
> > > think that's a lot less hazardous than fiddling with the autovacuum
> > > GUC.
> > 
> > I like the idea of a command-line flag because it forces everything to
> > be affected, and cannot be turned on and off in sessions --- if you are
> > doing a binary upgrade, locked-down is good. :-)
> 
> The attached patch adds a new postmaster/postgres binary upgrade mode
> (-b) which disables autovacuum, allows only super-user connections, and
> prevents pg_upgrade_support oid assignment when not in upgrade mode. 
> It also modifies pg_upgrade to use this new mode rather than play with
> trying to stop autovacuum.
> 
> This does fix a very rare bug that could happen if
> autovacuum_freeze_max_age were set to maximum by the user.
> 
> I think this should be applied to PG 9.1.

One big problem with this patch is that it will not allow people to use
pg_upgrade when upgrading from 9.1 alpha to beta.  What I could do is to
use the mode only on the "new" cluster for 9.1.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Patch for pg_upgrade to turn off autovacuum

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
>> The attached patch adds a new postmaster/postgres binary upgrade mode
>> (-b) which disables autovacuum, allows only super-user connections, and
>> prevents pg_upgrade_support oid assignment when not in upgrade mode. 
>> It also modifies pg_upgrade to use this new mode rather than play with
>> trying to stop autovacuum.

> One big problem with this patch is that it will not allow people to use
> pg_upgrade when upgrading from 9.1 alpha to beta.

Huh?  Why would that be?  Seems like you've done something in the wrong
place if that's an issue.
        regards, tom lane


Re: Patch for pg_upgrade to turn off autovacuum

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> >> The attached patch adds a new postmaster/postgres binary upgrade mode
> >> (-b) which disables autovacuum, allows only super-user connections, and
> >> prevents pg_upgrade_support oid assignment when not in upgrade mode. 
> >> It also modifies pg_upgrade to use this new mode rather than play with
> >> trying to stop autovacuum.
> 
> > One big problem with this patch is that it will not allow people to use
> > pg_upgrade when upgrading from 9.1 alpha to beta.
> 
> Huh?  Why would that be?  Seems like you've done something in the wrong
> place if that's an issue.

Yeah, it is complicated.  I don't really care if autovacuum runs on the
old cluster (we only move the files while the server is down).  We only
want autovacuum not to mess with the relfrozenxids we set on the new
cluster while the table file is empty.

The other issue is that the old alpha binary will not know about the -b
flag and hence will not start.

This all came up when we were looking for the relfrozenxid bug, which we
found was TOAST-related and which has since been fixed.  This is a very
small problem, so maybe we should just skip the fix for 9.1.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Patch for pg_upgrade to turn off autovacuum

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> Huh?  Why would that be?  Seems like you've done something in the wrong
>> place if that's an issue.

> Yeah, it is complicated.  I don't really care if autovacuum runs on the
> old cluster (we only move the files while the server is down).  We only
> want autovacuum not to mess with the relfrozenxids we set on the new
> cluster while the table file is empty.

> The other issue is that the old alpha binary will not know about the -b
> flag and hence will not start.

Well, once again, why are you trying to do that?  It's not the source
postmaster that needs this flag.
        regards, tom lane


Re: Patch for pg_upgrade to turn off autovacuum

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> Huh?  Why would that be?  Seems like you've done something in the wrong
> >> place if that's an issue.
> 
> > Yeah, it is complicated.  I don't really care if autovacuum runs on the
> > old cluster (we only move the files while the server is down).  We only
> > want autovacuum not to mess with the relfrozenxids we set on the new
> > cluster while the table file is empty.
> 
> > The other issue is that the old alpha binary will not know about the -b
> > flag and hence will not start.
> 
> Well, once again, why are you trying to do that?  It's not the source
> postmaster that needs this flag.

Well, consider that this also locks out non-superusers, so I figured it
would be good to run the old and new clusters in the same binary upgrade
mode.  Again, we can do just the new cluster for 9.1.  I can also control
the behavior based on the catalog version number, which seems the most
logical.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Patch for pg_upgrade to turn off autovacuum

From
Jeff Davis
Date:
On Thu, 2011-04-21 at 18:22 -0400, Bruce Momjian wrote:
> I can also control the
> behavior based on the catalog version number, which seems the most
> logical.

It seems like we want a simple "use -b if available; else don't". Is
that right?

If so, switching based on the version seems reasonable. However, can you
get the information directly from the binary, rather than trying to
infer it from the catalog version?
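
For instance, something along these lines (just a sketch of the idea):

    "$bindir/postgres" --version
    # prints e.g. "postgres (PostgreSQL) 9.1devel"; if the major version
    # is 9.1 or later, assume the new switch is available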

Regards,Jeff Davis



Re: Patch for pg_upgrade to turn off autovacuum

From
Robert Haas
Date:
On Apr 21, 2011, at 6:22 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> Tom Lane wrote:
>>>> Huh?  Why would that be?  Seems like you've done something in the wrong
>>>> place if that's an issue.
>>
>>> Yeah, it is complicated.  I don't really care if autovacuum runs on the
>>> old cluster (we only move the files while the server is down).  We only
>>> want autovacuum not to mess with the relfrozenxids we set on the new
>>> cluster while the table file is empty.
>>
>>> The other issue is that the old alpha binary will not know about the -b
>>> flag and hence will not start.
>>
>> Well, once again, why are you trying to do that?  It's not the source
>> postmaster that needs this flag.
>
> Well, consider that this also locks out non-super users so I figured it
> would be good to run the old and new in the same binary upgrade mode.
> Again, we can do just the new cluster for 9.1.   I can also control the
> behavior based on the catalog version number, which seems the most
> logical.

I think you are over-engineering this. Just use it for the new cluster only, full stop, and you'll be right as rain.

...Robert

Re: Patch for pg_upgrade to turn off autovacuum

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Apr 21, 2011, at 6:22 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Tom Lane wrote:
> >> Bruce Momjian <bruce@momjian.us> writes:
> >>> Tom Lane wrote:
> >>>> Huh?  Why would that be?  Seems like you've done something in the wrong
> >>>> place if that's an issue.
> >>
> >>> Yeah, it is complicated.  I don't really care if autovacuum runs on the
> >>> old cluster (we only move the files while the server is down).  We only
> >>> want autovacuum not to mess with the relfrozenxids we set on the new
> >>> cluster while the table file is empty.
> >>
> >>> The other issue is that the old alpha binary will not know about the -b
> >>> flag and hence will not start.
> >>
> >> Well, once again, why are you trying to do that?  It's not the source
> >> postmaster that needs this flag.
> >
> > Well, consider that this also locks out non-super users so I figured it
> > would be good to run the old and new in the same binary upgrade mode.
> > Again, we can do just the new cluster for 9.1.   I can also control the
> > behavior based on the catalog version number, which seems the most
> > logical.
>
> I think you are over-engineering this. Just use it for the new cluster
> only, full stop, and you'll be right as rain.

I thought some more about this and I don't want autovacuum to run on the
old server.  This is because pg_dumpall --binary-upgrade --schema-only
grabs the datfrozenxid for all the databases at the start, then connects
to each database to get the relfrozenxids.  I don't want to risk any
advancement of either of those during the pg_dumpall run.
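
Roughly speaking, the dump does the equivalent of this (simplified; the
real queries live in pg_dumpall/pg_dump, and the database name is just a
placeholder):

    psql -d postgres -Atc "SELECT datname, datfrozenxid FROM pg_database"
    # ... then, connecting to each database in turn:
    psql -d somedb -Atc "SELECT relname, relfrozenxid FROM pg_class WHERE relkind IN ('r', 't')"
    # the worry is autovacuum advancing either value partway through the run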

FYI, the existing code already doesn't allow autovacuum to run on the old
or new cluster by setting autovacuum off and autovacuum_freeze_max_age
very high, so this is not a behavior change --- just a more formalized
way of turning off autovacuum.

The attached patch uses the catalog version to test for the new flag; we
already use catalog version checking for tablespace subdirectories.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index d1dc5db..7623c5e
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*************** check_cluster_compatibility(bool live_ch
*** 264,270 ****

      /* Is it 9.0 but without tablespace directories? */
      if (GET_MAJOR_VERSION(new_cluster.major_version) == 900 &&
!         new_cluster.controldata.cat_ver < TABLE_SPACE_SUBDIRS)
          pg_log(PG_FATAL, "This utility can only upgrade to PostgreSQL version 9.0 after 2010-01-11\n"
                 "because of backend API changes made during development.\n");
  }
--- 264,270 ----

      /* Is it 9.0 but without tablespace directories? */
      if (GET_MAJOR_VERSION(new_cluster.major_version) == 900 &&
!         new_cluster.controldata.cat_ver < CAT_VER_TABLE_SPACE_SUBDIRS)
          pg_log(PG_FATAL, "This utility can only upgrade to PostgreSQL version 9.0 after 2010-01-11\n"
                 "because of backend API changes made during development.\n");
  }
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index 5ca570e..70d74ba
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
***************
*** 58,64 ****
  #define atooid(x)  ((Oid) strtoul((x), NULL, 10))

  /* OID system catalog preservation added during PG 9.0 development */
! #define TABLE_SPACE_SUBDIRS 201001111

  /*
   * Each relation is represented by a relinfo structure.
--- 58,66 ----
  #define atooid(x)  ((Oid) strtoul((x), NULL, 10))

  /* OID system catalog preservation added during PG 9.0 development */
! #define CAT_VER_TABLE_SPACE_SUBDIRS 201001111
! /* postmaster/postgres -b (binary_upgrade) flag added during PG 9.1 development */
! #define CAT_VER_BINARY_UPGRADE_SERVER_FLAG 201104221

  /*
   * Each relation is represented by a relinfo structure.
diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index 2a0f50e..f7d7653
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*************** start_postmaster(ClusterInfo *cluster, b
*** 173,178 ****
--- 173,183 ----
      const char *datadir;
      unsigned short port;
      bool        exit_hook_registered = false;
+ #ifndef WIN32
+     char        *output_filename = log_opts.filename;
+ #else
+     char        *output_filename = DEVNULL;
+ #endif

      bindir = cluster->bindir;
      datadir = cluster->pgdata;
*************** start_postmaster(ClusterInfo *cluster, b
*** 193,216 ****
       * same file because we get the error: "The process cannot access the file
       * because it is being used by another process." so we have to send all
       * other output to 'nul'.
!      *
!      * Using autovacuum=off disables cleanup vacuum and analyze, but freeze
!      * vacuums can still happen, so we set autovacuum_freeze_max_age to its
!      * maximum.  We assume all datfrozenxid and relfrozen values are less than
!      * a gap of 2000000000 from the current xid counter, so autovacuum will
!      * not touch them.
       */
      snprintf(cmd, sizeof(cmd),
               SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
!              "-o \"-p %d -c autovacuum=off "
!              "-c autovacuum_freeze_max_age=2000000000\" "
!              "start >> \"%s\" 2>&1" SYSTEMQUOTE,
!              bindir,
! #ifndef WIN32
!              log_opts.filename, datadir, port, log_opts.filename);
! #else
!              DEVNULL, datadir, port, DEVNULL);
! #endif
      exec_prog(true, "%s", cmd);

      /* wait for the server to start properly */
--- 198,213 ----
       * same file because we get the error: "The process cannot access the file
       * because it is being used by another process." so we have to send all
       * other output to 'nul'.
!      * Use binary upgrade mode on the server (-b), if supported.
       */
      snprintf(cmd, sizeof(cmd),
               SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
!              "-o \"-p %d%s\" start >> \"%s\" 2>&1" SYSTEMQUOTE,
!              bindir, output_filename, datadir, port,
!              (cluster->controldata.cat_ver >=
!                 CAT_VER_BINARY_UPGRADE_SERVER_FLAG) ? " -b" : "",
!              log_opts.filename);
!
      exec_prog(true, "%s", cmd);

      /* wait for the server to start properly */
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
new file mode 100644
index 8c5670f..870ddba
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
*************** heap_create_with_catalog(const char *rel
*** 1051,1057 ****
           * Use binary-upgrade override for pg_class.oid/relfilenode, if
           * supplied.
           */
!         if (OidIsValid(binary_upgrade_next_heap_pg_class_oid) &&
              (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE ||
               relkind == RELKIND_VIEW || relkind == RELKIND_COMPOSITE_TYPE ||
               relkind == RELKIND_FOREIGN_TABLE))
--- 1051,1058 ----
           * Use binary-upgrade override for pg_class.oid/relfilenode, if
           * supplied.
           */
!         if (IsBinaryUpgrade &&
!             OidIsValid(binary_upgrade_next_heap_pg_class_oid) &&
              (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE ||
               relkind == RELKIND_VIEW || relkind == RELKIND_COMPOSITE_TYPE ||
               relkind == RELKIND_FOREIGN_TABLE))
*************** heap_create_with_catalog(const char *rel
*** 1059,1065 ****
              relid = binary_upgrade_next_heap_pg_class_oid;
              binary_upgrade_next_heap_pg_class_oid = InvalidOid;
          }
!         else if (OidIsValid(binary_upgrade_next_toast_pg_class_oid) &&
                   relkind == RELKIND_TOASTVALUE)
          {
              relid = binary_upgrade_next_toast_pg_class_oid;
--- 1060,1067 ----
              relid = binary_upgrade_next_heap_pg_class_oid;
              binary_upgrade_next_heap_pg_class_oid = InvalidOid;
          }
!         else if (IsBinaryUpgrade &&
!                  OidIsValid(binary_upgrade_next_toast_pg_class_oid) &&
                   relkind == RELKIND_TOASTVALUE)
          {
              relid = binary_upgrade_next_toast_pg_class_oid;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
new file mode 100644
index c79402c..a662cfc
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
*************** index_create(Relation heapRelation,
*** 790,796 ****
           * Use binary-upgrade override for pg_class.oid/relfilenode, if
           * supplied.
           */
!         if (OidIsValid(binary_upgrade_next_index_pg_class_oid))
          {
              indexRelationId = binary_upgrade_next_index_pg_class_oid;
              binary_upgrade_next_index_pg_class_oid = InvalidOid;
--- 790,797 ----
           * Use binary-upgrade override for pg_class.oid/relfilenode, if
           * supplied.
           */
!         if (IsBinaryUpgrade &&
!             OidIsValid(binary_upgrade_next_index_pg_class_oid))
          {
              indexRelationId = binary_upgrade_next_index_pg_class_oid;
              binary_upgrade_next_index_pg_class_oid = InvalidOid;
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
new file mode 100644
index 08d8aa1..61a9322
*** a/src/backend/catalog/pg_enum.c
--- b/src/backend/catalog/pg_enum.c
***************
*** 21,26 ****
--- 21,27 ----
  #include "catalog/pg_enum.h"
  #include "catalog/pg_type.h"
  #include "storage/lmgr.h"
+ #include "miscadmin.h"
  #include "utils/builtins.h"
  #include "utils/fmgroids.h"
  #include "utils/rel.h"
*************** restart:
*** 311,317 ****
      }

      /* Get a new OID for the new label */
!     if (OidIsValid(binary_upgrade_next_pg_enum_oid))
      {
          /*
           * Use binary-upgrade override for pg_enum.oid, if supplied. During
--- 312,318 ----
      }

      /* Get a new OID for the new label */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_enum_oid))
      {
          /*
           * Use binary-upgrade override for pg_enum.oid, if supplied. During
diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c
new file mode 100644
index 9e35e73..80c1bfc
*** a/src/backend/catalog/pg_type.c
--- b/src/backend/catalog/pg_type.c
*************** TypeShellMake(const char *typeName, Oid
*** 125,131 ****
      tup = heap_form_tuple(tupDesc, values, nulls);

      /* Use binary-upgrade override for pg_type.oid, if supplied. */
!     if (OidIsValid(binary_upgrade_next_pg_type_oid))
      {
          HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid);
          binary_upgrade_next_pg_type_oid = InvalidOid;
--- 125,131 ----
      tup = heap_form_tuple(tupDesc, values, nulls);

      /* Use binary-upgrade override for pg_type.oid, if supplied. */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_type_oid))
      {
          HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid);
          binary_upgrade_next_pg_type_oid = InvalidOid;
*************** TypeCreate(Oid newTypeOid,
*** 430,436 ****
          if (OidIsValid(newTypeOid))
              HeapTupleSetOid(tup, newTypeOid);
          /* Use binary-upgrade override for pg_type.oid, if supplied. */
!         else if (OidIsValid(binary_upgrade_next_pg_type_oid))
          {
              HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid);
              binary_upgrade_next_pg_type_oid = InvalidOid;
--- 430,436 ----
          if (OidIsValid(newTypeOid))
              HeapTupleSetOid(tup, newTypeOid);
          /* Use binary-upgrade override for pg_type.oid, if supplied. */
!         else if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_type_oid))
          {
              HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid);
              binary_upgrade_next_pg_type_oid = InvalidOid;
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
new file mode 100644
index 85fe57f..362d26d
*** a/src/backend/catalog/toasting.c
--- b/src/backend/catalog/toasting.c
*************** create_toast_table(Relation rel, Oid toa
*** 157,163 ****
       * creation even if it seems not to need one.
       */
      if (!needs_toast_table(rel) &&
!         !OidIsValid(binary_upgrade_next_toast_pg_class_oid))
          return false;

      /*
--- 157,164 ----
       * creation even if it seems not to need one.
       */
      if (!needs_toast_table(rel) &&
!         (!IsBinaryUpgrade ||
!          !OidIsValid(binary_upgrade_next_toast_pg_class_oid)))
          return false;

      /*
*************** create_toast_table(Relation rel, Oid toa
*** 202,208 ****
          namespaceid = PG_TOAST_NAMESPACE;

      /* Use binary-upgrade override for pg_type.oid, if supplied. */
!     if (OidIsValid(binary_upgrade_next_toast_pg_type_oid))
      {
          toast_typid = binary_upgrade_next_toast_pg_type_oid;
          binary_upgrade_next_toast_pg_type_oid = InvalidOid;
--- 203,209 ----
          namespaceid = PG_TOAST_NAMESPACE;

      /* Use binary-upgrade override for pg_type.oid, if supplied. */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_toast_pg_type_oid))
      {
          toast_typid = binary_upgrade_next_toast_pg_type_oid;
          binary_upgrade_next_toast_pg_type_oid = InvalidOid;
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
new file mode 100644
index 1a20b0d..b3b6dc2
*** a/src/backend/commands/typecmds.c
--- b/src/backend/commands/typecmds.c
*************** AssignTypeArrayOid(void)
*** 1550,1556 ****
      Oid            type_array_oid;

      /* Use binary-upgrade override for pg_type.typarray, if supplied. */
!     if (OidIsValid(binary_upgrade_next_array_pg_type_oid))
      {
          type_array_oid = binary_upgrade_next_array_pg_type_oid;
          binary_upgrade_next_array_pg_type_oid = InvalidOid;
--- 1550,1556 ----
      Oid            type_array_oid;

      /* Use binary-upgrade override for pg_type.typarray, if supplied. */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_array_pg_type_oid))
      {
          type_array_oid = binary_upgrade_next_array_pg_type_oid;
          binary_upgrade_next_array_pg_type_oid = InvalidOid;
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
new file mode 100644
index 3f7d499..838d6eb
*** a/src/backend/commands/user.c
--- b/src/backend/commands/user.c
*************** CreateRole(CreateRoleStmt *stmt)
*** 388,394 ****
       * pg_largeobject_metadata contains pg_authid.oid's, so we use the
       * binary-upgrade override, if specified.
       */
!     if (OidIsValid(binary_upgrade_next_pg_authid_oid))
      {
          HeapTupleSetOid(tuple, binary_upgrade_next_pg_authid_oid);
          binary_upgrade_next_pg_authid_oid = InvalidOid;
--- 388,394 ----
       * pg_largeobject_metadata contains pg_authid.oid's, so we use the
       * binary-upgrade override, if specified.
       */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_authid_oid))
      {
          HeapTupleSetOid(tuple, binary_upgrade_next_pg_authid_oid);
          binary_upgrade_next_pg_authid_oid = InvalidOid;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
new file mode 100644
index 6e7f664..c0cf033
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** PostmasterMain(int argc, char *argv[])
*** 529,535 ****
       * tcop/postgres.c (the option sets should not conflict) and with the
       * common help() function in main/main.c.
       */
!     while ((opt = getopt(argc, argv, "A:B:c:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1)
      {
          switch (opt)
          {
--- 529,535 ----
       * tcop/postgres.c (the option sets should not conflict) and with the
       * common help() function in main/main.c.
       */
!     while ((opt = getopt(argc, argv, "A:B:bc:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1)
      {
          switch (opt)
          {
*************** PostmasterMain(int argc, char *argv[])
*** 541,546 ****
--- 541,551 ----
                  SetConfigOption("shared_buffers", optarg, PGC_POSTMASTER, PGC_S_ARGV);
                  break;

+             case 'b':
+                 /* Undocumented flag used for binary upgrades */
+                 IsBinaryUpgrade = true;
+                 break;
+
              case 'D':
                  userDoption = optarg;
                  break;
*************** ServerLoop(void)
*** 1480,1487 ****
          if (WalWriterPID == 0 && pmState == PM_RUN)
              WalWriterPID = StartWalWriter();

!         /* If we have lost the autovacuum launcher, try to start a new one */
!         if (AutoVacPID == 0 &&
              (AutoVacuumingActive() || start_autovac_launcher) &&
              pmState == PM_RUN)
          {
--- 1485,1497 ----
          if (WalWriterPID == 0 && pmState == PM_RUN)
              WalWriterPID = StartWalWriter();

!         /*
!          *    If we have lost the autovacuum launcher, try to start a new one.
!          *    We don't want autovacuum to run in binary upgrade mode because
!          *    autovacuum might update relfrozenxid for empty tables before
!          *    the physical files are put in place.
!          */
!         if (!IsBinaryUpgrade && AutoVacPID == 0 &&
              (AutoVacuumingActive() || start_autovac_launcher) &&
              pmState == PM_RUN)
          {
*************** reaper(SIGNAL_ARGS)
*** 2413,2419 ****
               */
              if (WalWriterPID == 0)
                  WalWriterPID = StartWalWriter();
!             if (AutoVacuumingActive() && AutoVacPID == 0)
                  AutoVacPID = StartAutoVacLauncher();
              if (XLogArchivingActive() && PgArchPID == 0)
                  PgArchPID = pgarch_start();
--- 2423,2429 ----
               */
              if (WalWriterPID == 0)
                  WalWriterPID = StartWalWriter();
!             if (!IsBinaryUpgrade && AutoVacuumingActive() && AutoVacPID == 0)
                  AutoVacPID = StartAutoVacLauncher();
              if (XLogArchivingActive() && PgArchPID == 0)
                  PgArchPID = pgarch_start();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
new file mode 100644
index 59b7666..a07661f
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
*************** process_postgres_switches(int argc, char
*** 3238,3244 ****
       * postmaster/postmaster.c (the option sets should not conflict) and with
       * the common help() function in main/main.c.
       */
!     while ((flag = getopt(argc, argv, "A:B:c:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:v:W:-:")) != -1)
      {
          switch (flag)
          {
--- 3238,3244 ----
       * postmaster/postmaster.c (the option sets should not conflict) and with
       * the common help() function in main/main.c.
       */
!     while ((flag = getopt(argc, argv, "A:B:bc:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:v:W:-:")) != -1)
      {
          switch (flag)
          {
*************** process_postgres_switches(int argc, char
*** 3250,3255 ****
--- 3250,3260 ----
                  SetConfigOption("shared_buffers", optarg, ctx, gucsource);
                  break;

+             case 'b':
+                 /* Undocumented flag used for binary upgrades */
+                 IsBinaryUpgrade = true;
+                 break;
+
              case 'D':
                  if (secure)
                      userDoption = strdup(optarg);
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
new file mode 100644
index 984ffd0..c4c4154
*** a/src/backend/utils/init/globals.c
--- b/src/backend/utils/init/globals.c
*************** pid_t        PostmasterPid = 0;
*** 85,90 ****
--- 85,91 ----
   */
  bool        IsPostmasterEnvironment = false;
  bool        IsUnderPostmaster = false;
+ bool        IsBinaryUpgrade = false;

  bool        ExitOnAnyError = false;

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
new file mode 100644
index a4c5d4c..1f6fba5
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
*************** InitPostgres(const char *in_dbname, Oid
*** 626,631 ****
--- 626,641 ----
      }

      /*
+      * Binary upgrades only allowed super-user connections
+      */
+     if (IsBinaryUpgrade && !am_superuser)
+     {
+             ereport(FATAL,
+                     (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+             errmsg("must be superuser to connect in binary upgrade mode")));
+     }
+
+     /*
       * The last few connections slots are reserved for superusers. Although
       * replication connections currently require superuser privileges, we
       * don't allow them to consume the reserved slots, which are intended for
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
new file mode 100644
index 53c684a..22926b0
*** a/src/include/catalog/catversion.h
--- b/src/include/catalog/catversion.h
***************
*** 53,58 ****
   */

  /*                            yyyymmddN */
! #define CATALOG_VERSION_NO    201104181

  #endif
--- 53,58 ----
   */

  /*                            yyyymmddN */
! #define CATALOG_VERSION_NO    201104221

  #endif
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
new file mode 100644
index aa8cce5..9d19417
*** a/src/include/miscadmin.h
--- b/src/include/miscadmin.h
*************** do { \
*** 124,129 ****
--- 124,130 ----
  extern pid_t PostmasterPid;
  extern bool IsPostmasterEnvironment;
  extern PGDLLIMPORT bool IsUnderPostmaster;
+ extern bool IsBinaryUpgrade;

  extern bool ExitOnAnyError;


Re: Patch for pg_upgrade to turn off autovacuum

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> I thought some more about this and I don't want autovacuum to run on the
> old server.  This is because pg_dumpall --binary-upgrade --schema-only
> grabs the datfrozenxid for all the databases at the start, then connects
> to each database to get the relfrozenxids.  I don't want to risk any
> advancement of either of those during the pg_dumpall run.

Why?  It doesn't really matter --- if you grab a value that is older
than the latest, it's still valid.  As Robert said, you're
over-engineering this, and thereby introducing potential failure modes,
for no gain.
        regards, tom lane


Re: Patch for pg_upgrade to turn off autovacuum

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> > > Well, consider that this also locks out non-super users so I figured it
> > > would be good to run the old and new in the same binary upgrade mode.
> > > Again, we can do just the new cluster for 9.1.   I can also control the
> > > behavior based on the catalog version number, which seems the most
> > > logical.
> >
> > I think you are over-engineering this. Just use it for the new cluster
> > only, full stop, and you'll be right as rain.
>
> I thought some more about this and I don't want autovacuum to run on the
> old server.  This is because pg_dumpall --binary-upgrade --schema-only
> grabs the datfrozenxid for all the databases at the start, then connects
> to each database to get the relfrozenxids.  I don't want to risk any
> advancement of either of those during the pg_dumpall run.
>
> FYI, the existing code already doesn't allow autovacuum to run on the old
> or new cluster by setting autovacuum off and autovacuum_freeze_max_age
> very high, so this is not a behavior change --- just a more formalized
> way of turning off autovacuum.
>
> The attached patch uses catalog version to test;  we use catalog version
> checking already for tablespace subdirectories.

OK, thinking even more, it seems I have to keep the old vacuum-disable
flag for pre-9.1 old clusters.  This new patch does that.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index d1dc5db..7623c5e
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*************** check_cluster_compatibility(bool live_ch
*** 264,270 ****

      /* Is it 9.0 but without tablespace directories? */
      if (GET_MAJOR_VERSION(new_cluster.major_version) == 900 &&
!         new_cluster.controldata.cat_ver < TABLE_SPACE_SUBDIRS)
          pg_log(PG_FATAL, "This utility can only upgrade to PostgreSQL version 9.0 after 2010-01-11\n"
                 "because of backend API changes made during development.\n");
  }
--- 264,270 ----

      /* Is it 9.0 but without tablespace directories? */
      if (GET_MAJOR_VERSION(new_cluster.major_version) == 900 &&
!         new_cluster.controldata.cat_ver < CAT_VER_TABLE_SPACE_SUBDIRS)
          pg_log(PG_FATAL, "This utility can only upgrade to PostgreSQL version 9.0 after 2010-01-11\n"
                 "because of backend API changes made during development.\n");
  }
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index 5ca570e..70d74ba
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
***************
*** 58,64 ****
  #define atooid(x)  ((Oid) strtoul((x), NULL, 10))

  /* OID system catalog preservation added during PG 9.0 development */
! #define TABLE_SPACE_SUBDIRS 201001111

  /*
   * Each relation is represented by a relinfo structure.
--- 58,66 ----
  #define atooid(x)  ((Oid) strtoul((x), NULL, 10))

  /* OID system catalog preservation added during PG 9.0 development */
! #define CAT_VER_TABLE_SPACE_SUBDIRS 201001111
! /* postmaster/postgres -b (binary_upgrade) flag added during PG 9.1 development */
! #define CAT_VER_BINARY_UPGRADE_SERVER_FLAG 201104221

  /*
   * Each relation is represented by a relinfo structure.
diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index 2a0f50e..ca4a957
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*************** start_postmaster(ClusterInfo *cluster, b
*** 173,178 ****
--- 173,183 ----
      const char *datadir;
      unsigned short port;
      bool        exit_hook_registered = false;
+ #ifndef WIN32
+     char        *output_filename = log_opts.filename;
+ #else
+     char        *output_filename = DEVNULL;
+ #endif

      bindir = cluster->bindir;
      datadir = cluster->pgdata;
*************** start_postmaster(ClusterInfo *cluster, b
*** 193,199 ****
       * same file because we get the error: "The process cannot access the file
       * because it is being used by another process." so we have to send all
       * other output to 'nul'.
-      *
       * Using autovacuum=off disables cleanup vacuum and analyze, but freeze
       * vacuums can still happen, so we set autovacuum_freeze_max_age to its
       * maximum.  We assume all datfrozenxid and relfrozen values are less than
--- 198,203 ----
*************** start_postmaster(ClusterInfo *cluster, b
*** 202,216 ****
       */
      snprintf(cmd, sizeof(cmd),
               SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
!              "-o \"-p %d -c autovacuum=off "
!              "-c autovacuum_freeze_max_age=2000000000\" "
!              "start >> \"%s\" 2>&1" SYSTEMQUOTE,
!              bindir,
! #ifndef WIN32
!              log_opts.filename, datadir, port, log_opts.filename);
! #else
!              DEVNULL, datadir, port, DEVNULL);
! #endif
      exec_prog(true, "%s", cmd);

      /* wait for the server to start properly */
--- 206,218 ----
       */
      snprintf(cmd, sizeof(cmd),
               SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
!              "-o \"-p %d %s\" start >> \"%s\" 2>&1" SYSTEMQUOTE,
!              bindir, output_filename, datadir, port,
!              (cluster->controldata.cat_ver >=
!                 CAT_VER_BINARY_UPGRADE_SERVER_FLAG) ? "-b" :
!                 "-c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
!              log_opts.filename);
!
      exec_prog(true, "%s", cmd);

      /* wait for the server to start properly */
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
new file mode 100644
index 8c5670f..870ddba
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
*************** heap_create_with_catalog(const char *rel
*** 1051,1057 ****
           * Use binary-upgrade override for pg_class.oid/relfilenode, if
           * supplied.
           */
!         if (OidIsValid(binary_upgrade_next_heap_pg_class_oid) &&
              (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE ||
               relkind == RELKIND_VIEW || relkind == RELKIND_COMPOSITE_TYPE ||
               relkind == RELKIND_FOREIGN_TABLE))
--- 1051,1058 ----
           * Use binary-upgrade override for pg_class.oid/relfilenode, if
           * supplied.
           */
!         if (IsBinaryUpgrade &&
!             OidIsValid(binary_upgrade_next_heap_pg_class_oid) &&
              (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE ||
               relkind == RELKIND_VIEW || relkind == RELKIND_COMPOSITE_TYPE ||
               relkind == RELKIND_FOREIGN_TABLE))
*************** heap_create_with_catalog(const char *rel
*** 1059,1065 ****
              relid = binary_upgrade_next_heap_pg_class_oid;
              binary_upgrade_next_heap_pg_class_oid = InvalidOid;
          }
!         else if (OidIsValid(binary_upgrade_next_toast_pg_class_oid) &&
                   relkind == RELKIND_TOASTVALUE)
          {
              relid = binary_upgrade_next_toast_pg_class_oid;
--- 1060,1067 ----
              relid = binary_upgrade_next_heap_pg_class_oid;
              binary_upgrade_next_heap_pg_class_oid = InvalidOid;
          }
!         else if (IsBinaryUpgrade &&
!                  OidIsValid(binary_upgrade_next_toast_pg_class_oid) &&
                   relkind == RELKIND_TOASTVALUE)
          {
              relid = binary_upgrade_next_toast_pg_class_oid;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
new file mode 100644
index c79402c..a662cfc
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
*************** index_create(Relation heapRelation,
*** 790,796 ****
           * Use binary-upgrade override for pg_class.oid/relfilenode, if
           * supplied.
           */
!         if (OidIsValid(binary_upgrade_next_index_pg_class_oid))
          {
              indexRelationId = binary_upgrade_next_index_pg_class_oid;
              binary_upgrade_next_index_pg_class_oid = InvalidOid;
--- 790,797 ----
           * Use binary-upgrade override for pg_class.oid/relfilenode, if
           * supplied.
           */
!         if (IsBinaryUpgrade &&
!             OidIsValid(binary_upgrade_next_index_pg_class_oid))
          {
              indexRelationId = binary_upgrade_next_index_pg_class_oid;
              binary_upgrade_next_index_pg_class_oid = InvalidOid;
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
new file mode 100644
index 08d8aa1..61a9322
*** a/src/backend/catalog/pg_enum.c
--- b/src/backend/catalog/pg_enum.c
***************
*** 21,26 ****
--- 21,27 ----
  #include "catalog/pg_enum.h"
  #include "catalog/pg_type.h"
  #include "storage/lmgr.h"
+ #include "miscadmin.h"
  #include "utils/builtins.h"
  #include "utils/fmgroids.h"
  #include "utils/rel.h"
*************** restart:
*** 311,317 ****
      }

      /* Get a new OID for the new label */
!     if (OidIsValid(binary_upgrade_next_pg_enum_oid))
      {
          /*
           * Use binary-upgrade override for pg_enum.oid, if supplied. During
--- 312,318 ----
      }

      /* Get a new OID for the new label */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_enum_oid))
      {
          /*
           * Use binary-upgrade override for pg_enum.oid, if supplied. During
diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c
new file mode 100644
index 9e35e73..80c1bfc
*** a/src/backend/catalog/pg_type.c
--- b/src/backend/catalog/pg_type.c
*************** TypeShellMake(const char *typeName, Oid
*** 125,131 ****
      tup = heap_form_tuple(tupDesc, values, nulls);

      /* Use binary-upgrade override for pg_type.oid, if supplied. */
!     if (OidIsValid(binary_upgrade_next_pg_type_oid))
      {
          HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid);
          binary_upgrade_next_pg_type_oid = InvalidOid;
--- 125,131 ----
      tup = heap_form_tuple(tupDesc, values, nulls);

      /* Use binary-upgrade override for pg_type.oid, if supplied. */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_type_oid))
      {
          HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid);
          binary_upgrade_next_pg_type_oid = InvalidOid;
*************** TypeCreate(Oid newTypeOid,
*** 430,436 ****
          if (OidIsValid(newTypeOid))
              HeapTupleSetOid(tup, newTypeOid);
          /* Use binary-upgrade override for pg_type.oid, if supplied. */
!         else if (OidIsValid(binary_upgrade_next_pg_type_oid))
          {
              HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid);
              binary_upgrade_next_pg_type_oid = InvalidOid;
--- 430,436 ----
          if (OidIsValid(newTypeOid))
              HeapTupleSetOid(tup, newTypeOid);
          /* Use binary-upgrade override for pg_type.oid, if supplied. */
!         else if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_type_oid))
          {
              HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid);
              binary_upgrade_next_pg_type_oid = InvalidOid;
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
new file mode 100644
index 85fe57f..362d26d
*** a/src/backend/catalog/toasting.c
--- b/src/backend/catalog/toasting.c
*************** create_toast_table(Relation rel, Oid toa
*** 157,163 ****
       * creation even if it seems not to need one.
       */
      if (!needs_toast_table(rel) &&
!         !OidIsValid(binary_upgrade_next_toast_pg_class_oid))
          return false;

      /*
--- 157,164 ----
       * creation even if it seems not to need one.
       */
      if (!needs_toast_table(rel) &&
!         (!IsBinaryUpgrade ||
!          !OidIsValid(binary_upgrade_next_toast_pg_class_oid)))
          return false;

      /*
*************** create_toast_table(Relation rel, Oid toa
*** 202,208 ****
          namespaceid = PG_TOAST_NAMESPACE;

      /* Use binary-upgrade override for pg_type.oid, if supplied. */
!     if (OidIsValid(binary_upgrade_next_toast_pg_type_oid))
      {
          toast_typid = binary_upgrade_next_toast_pg_type_oid;
          binary_upgrade_next_toast_pg_type_oid = InvalidOid;
--- 203,209 ----
          namespaceid = PG_TOAST_NAMESPACE;

      /* Use binary-upgrade override for pg_type.oid, if supplied. */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_toast_pg_type_oid))
      {
          toast_typid = binary_upgrade_next_toast_pg_type_oid;
          binary_upgrade_next_toast_pg_type_oid = InvalidOid;
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
new file mode 100644
index 1a20b0d..b3b6dc2
*** a/src/backend/commands/typecmds.c
--- b/src/backend/commands/typecmds.c
*************** AssignTypeArrayOid(void)
*** 1550,1556 ****
      Oid            type_array_oid;

      /* Use binary-upgrade override for pg_type.typarray, if supplied. */
!     if (OidIsValid(binary_upgrade_next_array_pg_type_oid))
      {
          type_array_oid = binary_upgrade_next_array_pg_type_oid;
          binary_upgrade_next_array_pg_type_oid = InvalidOid;
--- 1550,1556 ----
      Oid            type_array_oid;

      /* Use binary-upgrade override for pg_type.typarray, if supplied. */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_array_pg_type_oid))
      {
          type_array_oid = binary_upgrade_next_array_pg_type_oid;
          binary_upgrade_next_array_pg_type_oid = InvalidOid;
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
new file mode 100644
index 3f7d499..838d6eb
*** a/src/backend/commands/user.c
--- b/src/backend/commands/user.c
*************** CreateRole(CreateRoleStmt *stmt)
*** 388,394 ****
       * pg_largeobject_metadata contains pg_authid.oid's, so we use the
       * binary-upgrade override, if specified.
       */
!     if (OidIsValid(binary_upgrade_next_pg_authid_oid))
      {
          HeapTupleSetOid(tuple, binary_upgrade_next_pg_authid_oid);
          binary_upgrade_next_pg_authid_oid = InvalidOid;
--- 388,394 ----
       * pg_largeobject_metadata contains pg_authid.oid's, so we use the
       * binary-upgrade override, if specified.
       */
!     if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_authid_oid))
      {
          HeapTupleSetOid(tuple, binary_upgrade_next_pg_authid_oid);
          binary_upgrade_next_pg_authid_oid = InvalidOid;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
new file mode 100644
index 6e7f664..c0cf033
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** PostmasterMain(int argc, char *argv[])
*** 529,535 ****
       * tcop/postgres.c (the option sets should not conflict) and with the
       * common help() function in main/main.c.
       */
!     while ((opt = getopt(argc, argv, "A:B:c:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1)
      {
          switch (opt)
          {
--- 529,535 ----
       * tcop/postgres.c (the option sets should not conflict) and with the
       * common help() function in main/main.c.
       */
!     while ((opt = getopt(argc, argv, "A:B:bc:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1)
      {
          switch (opt)
          {
*************** PostmasterMain(int argc, char *argv[])
*** 541,546 ****
--- 541,551 ----
                  SetConfigOption("shared_buffers", optarg, PGC_POSTMASTER, PGC_S_ARGV);
                  break;

+             case 'b':
+                 /* Undocumented flag used for binary upgrades */
+                 IsBinaryUpgrade = true;
+                 break;
+
              case 'D':
                  userDoption = optarg;
                  break;
*************** ServerLoop(void)
*** 1480,1487 ****
          if (WalWriterPID == 0 && pmState == PM_RUN)
              WalWriterPID = StartWalWriter();

!         /* If we have lost the autovacuum launcher, try to start a new one */
!         if (AutoVacPID == 0 &&
              (AutoVacuumingActive() || start_autovac_launcher) &&
              pmState == PM_RUN)
          {
--- 1485,1497 ----
          if (WalWriterPID == 0 && pmState == PM_RUN)
              WalWriterPID = StartWalWriter();

!         /*
!          *    If we have lost the autovacuum launcher, try to start a new one.
!          *    We don't want autovacuum to run in binary upgrade mode because
!          *    autovacuum might update relfrozenxid for empty tables before
!          *    the physical files are put in place.
!          */
!         if (!IsBinaryUpgrade && AutoVacPID == 0 &&
              (AutoVacuumingActive() || start_autovac_launcher) &&
              pmState == PM_RUN)
          {
*************** reaper(SIGNAL_ARGS)
*** 2413,2419 ****
               */
              if (WalWriterPID == 0)
                  WalWriterPID = StartWalWriter();
!             if (AutoVacuumingActive() && AutoVacPID == 0)
                  AutoVacPID = StartAutoVacLauncher();
              if (XLogArchivingActive() && PgArchPID == 0)
                  PgArchPID = pgarch_start();
--- 2423,2429 ----
               */
              if (WalWriterPID == 0)
                  WalWriterPID = StartWalWriter();
!             if (!IsBinaryUpgrade && AutoVacuumingActive() && AutoVacPID == 0)
                  AutoVacPID = StartAutoVacLauncher();
              if (XLogArchivingActive() && PgArchPID == 0)
                  PgArchPID = pgarch_start();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
new file mode 100644
index 59b7666..a07661f
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
*************** process_postgres_switches(int argc, char
*** 3238,3244 ****
       * postmaster/postmaster.c (the option sets should not conflict) and with
       * the common help() function in main/main.c.
       */
!     while ((flag = getopt(argc, argv, "A:B:c:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:v:W:-:")) != -1)
      {
          switch (flag)
          {
--- 3238,3244 ----
       * postmaster/postmaster.c (the option sets should not conflict) and with
       * the common help() function in main/main.c.
       */
!     while ((flag = getopt(argc, argv, "A:B:bc:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:v:W:-:")) != -1)
      {
          switch (flag)
          {
*************** process_postgres_switches(int argc, char
*** 3250,3255 ****
--- 3250,3260 ----
                  SetConfigOption("shared_buffers", optarg, ctx, gucsource);
                  break;

+             case 'b':
+                 /* Undocumented flag used for binary upgrades */
+                 IsBinaryUpgrade = true;
+                 break;
+
              case 'D':
                  if (secure)
                      userDoption = strdup(optarg);
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
new file mode 100644
index 984ffd0..c4c4154
*** a/src/backend/utils/init/globals.c
--- b/src/backend/utils/init/globals.c
*************** pid_t        PostmasterPid = 0;
*** 85,90 ****
--- 85,91 ----
   */
  bool        IsPostmasterEnvironment = false;
  bool        IsUnderPostmaster = false;
+ bool        IsBinaryUpgrade = false;

  bool        ExitOnAnyError = false;

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
new file mode 100644
index a4c5d4c..1f6fba5
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
*************** InitPostgres(const char *in_dbname, Oid
*** 626,631 ****
--- 626,641 ----
      }

      /*
+      * Binary upgrades only allowed super-user connections
+      */
+     if (IsBinaryUpgrade && !am_superuser)
+     {
+             ereport(FATAL,
+                     (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+             errmsg("must be superuser to connect in binary upgrade mode")));
+     }
+
+     /*
       * The last few connections slots are reserved for superusers. Although
       * replication connections currently require superuser privileges, we
       * don't allow them to consume the reserved slots, which are intended for
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
new file mode 100644
index 53c684a..22926b0
*** a/src/include/catalog/catversion.h
--- b/src/include/catalog/catversion.h
***************
*** 53,58 ****
   */

  /*                            yyyymmddN */
! #define CATALOG_VERSION_NO    201104181

  #endif
--- 53,58 ----
   */

  /*                            yyyymmddN */
! #define CATALOG_VERSION_NO    201104221

  #endif
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
new file mode 100644
index aa8cce5..9d19417
*** a/src/include/miscadmin.h
--- b/src/include/miscadmin.h
*************** do { \
*** 124,129 ****
--- 124,130 ----
  extern pid_t PostmasterPid;
  extern bool IsPostmasterEnvironment;
  extern PGDLLIMPORT bool IsUnderPostmaster;
+ extern bool IsBinaryUpgrade;

  extern bool ExitOnAnyError;

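The backend hunks above all apply the same consume-once override pattern: a
module-level "next OID" variable is honored only when IsBinaryUpgrade is set,
and is cleared as soon as it is used, so the override can never leak into a
later object.  A tiny standalone sketch of that pattern (names simplified,
not actual backend code):

#include <stdio.h>

typedef unsigned int Oid;
#define InvalidOid      ((Oid) 0)
#define OidIsValid(oid) ((oid) != InvalidOid)

/* simplified stand-ins for the backend globals */
static int  IsBinaryUpgrade = 0;
static Oid  binary_upgrade_next_oid = InvalidOid;
static Oid  oid_counter = 16384;

/* normally assign a fresh OID, but honor a one-shot override during upgrade */
static Oid
assign_oid(void)
{
    if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_oid))
    {
        Oid         oid = binary_upgrade_next_oid;

        binary_upgrade_next_oid = InvalidOid;   /* consume the override */
        return oid;
    }
    return oid_counter++;
}

int
main(void)
{
    printf("normal assignment:  %u\n", assign_oid());   /* 16384 */

    IsBinaryUpgrade = 1;
    binary_upgrade_next_oid = 70001;
    printf("upgrade override:   %u\n", assign_oid());   /* 70001 */
    printf("override consumed:  %u\n", assign_oid());   /* back to the counter */
    return 0;
}
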

Re: Patch for pg_upgrade to turn off autovacuum

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I thought some more about this and I don't want autovacuum to run on the
> > old server.  This is because pg_dumpall --binary-upgrade --schema-only
> > grabs the datfrozenxid for all the databases at the start, then connects
> > to each database to get the relfrozenxids.  I don't want to risk any
> > advancement of either of those during the pg_dumpall run.
> 
> Why?  It doesn't really matter --- if you grab a value that is older
> than the latest, it's still valid.  As Robert said, you're
> over-engineering this, and thereby introducing potential failure modes,
> for no gain.

Uh, I am kind of paranoid about pg_upgrade because it is trying to do
something Postgres was never designed to do.  I am a little worried that
we would be assuming that pg_dumpall always does the datfrozenxid first
and if we ever did it last we would have relfrozenxids before the
datfrozenxid.  I am worried if we don't prevent autovacuum on the old
server that pg_upgrade will be more fragile to changes in other parts of
the system.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Patch for pg_upgrade to turn off autovacuum

From
Jeff Davis
Date:
On Fri, 2011-04-22 at 17:34 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > I thought some more about this and I don't want autovacuum to run on the
> > > old server.  This is because pg_dumpall --binary-upgrade --schema-only
> > > grabs the datfrozenxid for all the databases at the start, then connects
> > > to each database to get the relfrozenxids.  I don't want to risk any
> > > advancement of either of those during the pg_dumpall run.
> > 
> > Why?  It doesn't really matter --- if you grab a value that is older
> > than the latest, it's still valid.  As Robert said, you're
> > over-engineering this, and thereby introducing potential failure modes,
> > for no gain.
> 
> Uh, I am kind of paranoid about pg_upgrade because it is trying to do
> something Postgres was never designed to do.  I am a little worried that
> we would be assuming that pg_dumpall always does the datfrozenxid first
> and if we ever did it last we would have relfrozenxids before the
> datfrozenxid.  I am worried if we don't prevent autovacuum on the old
> server that pg_upgrade will be more fragile to changes in other parts of
> the system.

If we back-patch the "-b" to 8.3, then we can always use it on both the
old and new systems. Upgrading to the latest patch-level on both old and
new should be a prerequisite for pg_upgrade anyway.

That would turn the catalog check from a special case (use "-b"
sometimes, other times don't, which could cause fragility and bugs),
into just another sanity check with an easy workaround ("your postgres
doesn't support '-b', upgrade to the latest patch-level before
upgrading").

One of the things I like about the design of pg_upgrade is that it
doesn't seem to have a lot of special cases for different version
combinations.

What do you think?

Regards,
Jeff Davis
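
A rough sketch of the kind of sanity check that idea boils down to, assuming
-b were back-patched.  Everything here is hypothetical: the path, the 8.3.15
threshold, and the helper are placeholders, not pg_upgrade code or real
release numbers:

#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical sketch of the "just another sanity check" idea: ask the old
 * server binary for its version and complain if it predates the (made-up)
 * minor releases that would carry a back-patched -b flag.
 */
static int
server_version_number(const char *bindir)
{
    char        cmd[1024];
    char        line[256];
    int         major = 0, minor = 0, micro = 0;
    FILE       *fp;

    snprintf(cmd, sizeof(cmd), "\"%s/postgres\" --version", bindir);
    if ((fp = popen(cmd, "r")) == NULL)
        return -1;
    if (fgets(line, sizeof(line), fp) != NULL)
        /* output looks like: postgres (PostgreSQL) 8.3.14 */
        sscanf(line, "%*s %*s %d.%d.%d", &major, &minor, &micro);
    pclose(fp);
    return major * 10000 + minor * 100 + micro;
}

int
main(void)
{
    /* both the bindir and the 8.3.15 threshold are illustrative, not real */
    if (server_version_number("/usr/pgsql-8.3/bin") < 80315)
        fprintf(stderr, "old server does not support -b; install the latest "
                "minor release before running pg_upgrade\n");
    return 0;
}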




Re: Patch for pg_upgrade to turn off autovacuum

From
Bruce Momjian
Date:
Jeff Davis wrote:
> On Fri, 2011-04-22 at 17:34 -0400, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > > > I thought some more about this and I don't want autovacuum to run on the
> > > > old server.  This is because pg_dumpall --binary-upgrade --schema-only
> > > > grabs the datfrozenxid for all the databases at the start, then connects
> > > > to each database to get the relfrozenxids.  I don't want to risk any
> > > > advancement of either of those during the pg_dumpall run.
> > > 
> > > Why?  It doesn't really matter --- if you grab a value that is older
> > > than the latest, it's still valid.  As Robert said, you're
> > > over-engineering this, and thereby introducing potential failure modes,
> > > for no gain.
> > 
> > Uh, I am kind of paranoid about pg_upgrade because it is trying to do
> > something Postgres was never designed to do.  I am a little worried that
> > we would be assuming that pg_dumpall always does the datfrozenxid first
> > and if we ever did it last we would have relfrozenxids before the
> > datfrozenxid.  I am worried if we don't prevent autovacuum on the old
> > server that pg_upgrade will be more fragile to changes in other parts of
> > the system.
> 
> If we back-patch the "-b" to 8.3, then we can always use it on both the
> old and new systems. Upgrading to the latest patch-level on both old and
> new should be a prerequisite for pg_upgrade anyway.
>
> That would turn the catalog check from a special case (use "-b"
> sometimes, other times don't, which could cause fragility and bugs),
> into just another sanity check with an easy workaround ("your postgres
> doesn't support '-b', upgrade to the latest patch-level before
> upgrading").
> 
> One of the things I like about the design of pg_upgrade is that it
> doesn't seem to have a lot of special cases for different version
> combinations.
> 
> What do you think?

Well, I am concerned that there isn't enough testing of the -b flag to
be sure it has zero effect on a running server that is not doing a
binary upgrade, which is why I liked doing it only in 9.1.  And we would
still need code to check if the -b flag is supported.

We can save this for 9.2 if people prefer, but we would still need a PG
version check, rather than a catalog version check.

Of course, if people prefer backpatching, we can do that, but I would need
many more eyes on this patch.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Patch for pg_upgrade to turn off autovacuum

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > I thought some more about this and I don't want autovacuum to run on the
> > > old server.  This is because pg_dumpall --binary-upgrade --schema-only
> > > grabs the datfrozenxid for all the databases at the start, then connects
> > > to each database to get the relfrozenxids.  I don't want to risk any
> > > advancement of either of those during the pg_dumpall run.
> > 
> > Why?  It doesn't really matter --- if you grab a value that is older
> > than the latest, it's still valid.  As Robert said, you're
> > over-engineering this, and thereby introducing potential failure modes,
> > for no gain.
> 
> Uh, I am kind of paranoid about pg_upgrade because it is trying to do
> something Postgres was never designed to do.  I am a little worried that
> we would be assuming that pg_dumpall always does the datfrozenxid first
> and if we ever did it last we would have relfrozenxids before the
> datfrozenxid.  I am worried if we don't prevent autovacuum on the old
> server that pg_upgrade will be more fragile to changes in other parts of
> the system.

Hold on, I overstated the fragility issue above.  I now realize that the
old system is not going to change and that I only need to worry about
future changes, which are handled by the new -b flag.  So maybe we can
get away with only stopping autovacuum on the new server, but I would
need someone to verify that, and this would be a change from the way 9.0
pg_upgrade operated, because it did disable autovacuum on the old and new
servers with 99.9% reliability.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Patch for pg_upgrade to turn off autovacuum

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > > > I thought some more about this and I don't want autovacuum to run on the
> > > > old server.  This is because pg_dumpall --binary-upgrade --schema-only
> > > > grabs the datfrozenxid for all the databases at the start, then connects
> > > > to each database to get the relfrozenxids.  I don't want to risk any
> > > > advancement of either of those during the pg_dumpall run.
> > > 
> > > Why?  It doesn't really matter --- if you grab a value that is older
> > > than the latest, it's still valid.  As Robert said, you're
> > > over-engineering this, and thereby introducing potential failure modes,
> > > for no gain.
> > 
> > Uh, I am kind of paranoid about pg_upgrade because it is trying to do
> > something Postgres was never designed to do.  I am a little worried that
> > we would be assuming that pg_dumpall always does the datfrozenxid first
> > and if we ever did it last we would have relfrozenxids before the
> > datfrozenxid.  I am worried if we don't prevent autovacuum on the old
> > server that pg_upgrade will be more fragile to changes in other parts of
> > the system.
> 
> Hold on, I overstated the fragility issue above.  I now realize that the
> old system is not going to change and that I only need to worry about
> future changes, which are handled by the new -b flag.  So maybe we can
> get away with only stopping autovacuum on the new server, but I would
> need someone to verify that, and this would be a change from the way 9.0
> pg_upgrade operated, because it did disable autovacuum on the old and new
> servers with 99.9% reliability.

Well, having seen no replies, I am going to apply the version of the
patch in a few days that keeps the old vacuum-disable behavior for older
releases, and uses the -b flag for newer ones by testing the catalog
version, e.g.:
    snprintf(cmd, sizeof(cmd),
             SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
             "-o \"-p %d %s\" start >> \"%s\" 2>&1" SYSTEMQUOTE,
             bindir, output_filename, datadir, port,
             (cluster->controldata.cat_ver >=
                BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" :
                "-c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
             log_opts.filename);

I know people like that pg_upgrade doesn't care much about what version
it is running on, but it is really pg_upgrade's ability to ignore changes
made to the server that makes it useful, and this change makes pg_upgrade
even more immune to such changes.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Patch for pg_upgrade to turn off autovacuum

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Well, having seen no replies, I am going to apply the version of the
> patch in a few days that keeps the old vacuum-disable behavior for older
> releases, and uses the -b flag for newer ones by testing the catalog
> version, e.g.:
> 
>     snprintf(cmd, sizeof(cmd),
>              SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
>              "-o \"-p %d %s\" start >> \"%s\" 2>&1" SYSTEMQUOTE,
>              bindir, output_filename, datadir, port,
>              (cluster->controldata.cat_ver >=
>                 BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" :
>                 "-c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
>              log_opts.filename);
> 
> I know people like that pg_upgrade doesn't care much about what version
> it is running on, but it is really pg_upgrade's ability to ignore changes
> made to the server that makes it useful, and this change makes pg_upgrade
> even more immune to such changes.

Applied.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +