Thread: pg_upgrade - link mode and transaction-wraparound data loss

pg_upgrade - link mode and transaction-wraparound data loss

From

jesper@krogh.cc

Date:

18 May 2010, 10:12:02

Hi

I tried running pg_upgrade from the current snapshot of postgresql and
upgrading from 8.4.4 to the snapshot version. Everything seem to look fine
in the process and all that came out was only "ok's" but when I tried a
simple query on the databse it keeps throwing these message out of the back
side.

DETAIL:  You might have already suffered transaction-wraparound data loss.
WARNING:  some databases have not been vacuumed in over 2 billion
transactions


The database was around 600GB and it took a couple of minutes to run
pg_upgrade after I had all the binaries in the correct place.

It is not really an easy task to throw around 600GB of data, so I cannot
gaurantee that the above is reproducible, but I'll see if I can get time
and try to reproduce it.

Jesper

Re: pg_upgrade - link mode and transaction-wraparound data loss

From

Bruce Momjian

Date:

18 May 2010, 13:57:54

jesper@krogh.cc wrote:
> Hi
> 
> I tried running pg_upgrade from the current snapshot of postgresql and
> upgrading from 8.4.4 to the snapshot version. Everything seem to look fine
> in the process and all that came out was only "ok's" but when I tried a
> simple query on the databse it keeps throwing these message out of the back
> side.
> 
> DETAIL:  You might have already suffered transaction-wraparound data loss.
> WARNING:  some databases have not been vacuumed in over 2 billion
> transactions
> 
> 
> The database was around 600GB and it took a couple of minutes to run
> pg_upgrade after I had all the binaries in the correct place.
> 
> It is not really an easy task to throw around 600GB of data, so I cannot
> gaurantee that the above is reproducible, but I'll see if I can get time
> and try to reproduce it.

This certainly should never have happened, so I am guessing it is a bug.
pg_upgrade tries hard to make sure all your datfrozenxid and
relfrozenxid are properly migrated from the old server, and the
transaction id is set properly.  Unfortunately this is the first time I
have heard of such a problem, so I am unclear on its cause.

The warning is issued from vacuum.c::vac_truncate_clog().  Can you run
this query and show us the output:
SELECT datname, datfrozenxid FROM pg_database;

It would be good to see these numbers on both the old and new servers.
I would also like to see:
SELECT txid_current();

on the old and new servers, but if you can only provide these values on
one of the two servers, it is still useful.  Thanks.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

Re: pg_upgrade - link mode and transaction-wraparound data loss

From

jesper@krogh.cc

Date:

18 May 2010, 14:22:28

> jesper@krogh.cc wrote:
>> Hi
>>
>> I tried running pg_upgrade from the current snapshot of postgresql and
>> upgrading from 8.4.4 to the snapshot version. Everything seem to look
>> fine
>> in the process and all that came out was only "ok's" but when I tried a
>> simple query on the databse it keeps throwing these message out of the
>> back
>> side.
>>
>> DETAIL:  You might have already suffered transaction-wraparound data
>> loss.
>> WARNING:  some databases have not been vacuumed in over 2 billion
>> transactions
>>
>>
>> The database was around 600GB and it took a couple of minutes to run
>> pg_upgrade after I had all the binaries in the correct place.
>>
>> It is not really an easy task to throw around 600GB of data, so I cannot
>> gaurantee that the above is reproducible, but I'll see if I can get time
>> and try to reproduce it.
>
> This certainly should never have happened, so I am guessing it is a bug.
> pg_upgrade tries hard to make sure all your datfrozenxid and
> relfrozenxid are properly migrated from the old server, and the
> transaction id is set properly.  Unfortunately this is the first time I
> have heard of such a problem, so I am unclear on its cause.
>
> The warning is issued from vacuum.c::vac_truncate_clog().  Can you run
> this query and show us the output:
>
>     SELECT datname, datfrozenxid FROM pg_database;
>
> It would be good to see these numbers on both the old and new servers.
> I would also like to see:
>
>     SELECT txid_current();
>
> on the old and new servers, but if you can only provide these values on
> one of the two servers, it is still useful.  Thanks.

Hi Bruce, thanks for your prompt response.

First the new one..

jk@pal:~$ psql -p 5433
psql (9.0beta1)
Type "help" for help.

data=# SELECT datname, datfrozenxid FROM pg_database; datname  | datfrozenxid
-----------+--------------template0 |          654postgres  |   2374592801data      |   2023782337jk        |
2023822188template1|   2374592801workqueue |   2023822188
 
(6 rows)

data=#         SELECT txid_current();txid_current
--------------  2375384556
(1 row)

data=# \q

Then the old one.

jk@pal:~$ psql data
psql (9.0beta1, server 8.4.1)
WARNING: psql version 9.0, server version 8.4.        Some psql features might not work.
Type "help" for help.

data# SELECT datname, datfrozenxid FROM pg_database; datname  | datfrozenxid
-----------+--------------template0 |   2073823552postgres  |   2023820521data      |   2023782337jk        |
2023822188template1|   2073823552workqueue |   2023822188
 
(6 rows)

data=# SELECT txid_current();txid_current
--------------  2390524243
(1 row)


The old database has been "copied" over using rsync and
pg_start_backup()/pg_stop_backup() procecures and started up
using a recovery.conf file.

Jesper

Re: pg_upgrade - link mode and transaction-wraparound data loss

From

Bruce Momjian

Date:

18 May 2010, 15:52:27

jesper@krogh.cc wrote:
> > on the old and new servers, but if you can only provide these values on
> > one of the two servers, it is still useful.  Thanks.
> 
> Hi Bruce, thanks for your prompt response.
> 
> First the new one..

Great.

> jk@pal:~$ psql -p 5433
> psql (9.0beta1)
> Type "help" for help.
> 
> data=# SELECT datname, datfrozenxid FROM pg_database;
>   datname  | datfrozenxid
> -----------+--------------
>  template0 |          654
>  postgres  |   2374592801
>  data      |   2023782337
>  jk        |   2023822188
>  template1 |   2374592801
>  workqueue |   2023822188
> (6 rows)
> 
> data=#         SELECT txid_current();
>  txid_current
> --------------
>    2375384556
> (1 row)
> 
> data=# \q
> 
> Then the old one.
> 
> jk@pal:~$ psql data
> psql (9.0beta1, server 8.4.1)
> WARNING: psql version 9.0, server version 8.4.
>          Some psql features might not work.
> Type "help" for help.
> 
> data# SELECT datname, datfrozenxid FROM pg_database;
>   datname  | datfrozenxid
> -----------+--------------
>  template0 |   2073823552

This line above looks very odd because I didn't think the template0
datfrozenxid could be advanced.  Can I see the output of this query:
SELECT datname, datfrozenxid, datallowconn FROM pg_database;

I am wondering if you set datallowconn for template0 to 'true'.

>  postgres  |   2023820521
>  data      |   2023782337
>  jk        |   2023822188
>  template1 |   2073823552
>  workqueue |   2023822188
> (6 rows)
> 
> data=# SELECT txid_current();
>  txid_current
> --------------
>    2390524243
> (1 row)
> 
> 
> The old database has been "copied" over using rsync and
> pg_start_backup()/pg_stop_backup() procecures and started up
> using a recovery.conf file.

My other idea is that somehow recovery touches datallowconn for
template0.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

Re: pg_upgrade - link mode and transaction-wraparound data loss

From

Jesper Krogh

Date:

18 May 2010, 16:22:05

On 2010-05-18 20:52, Bruce Momjian wrote:
> This line above looks very odd because I didn't think the template0
> datfrozenxid could be advanced.  Can I see the output of this query:
>
>     SELECT datname, datfrozenxid, datallowconn FROM pg_database;
>
>    

Only from the "old" database:
data=# SELECT datname, datfrozenxid, datallowconn FROM pg_database;  datname  | datfrozenxid | datallowconn
-----------+--------------+-------------- template0 |   2073823552 | f postgres  |   2023820521 | t data      |
2023782337| t jk        |   2023822188 | t template1 |   2073823552 | t workqueue |   2023822188 | t
 
(6 rows)

> I am wondering if you set datallowconn for template0 to 'true'.
From this database, I cannot give any more results, I ran some other
queries and then restarted postgres, subsequently it seemed
totally broken. I'm in the process of running the test over again, but
it'll take a while before data is in.  I'll report back.

-- 
Jesper

Re: pg_upgrade - link mode and transaction-wraparound data loss

From

Bruce Momjian

Date:

18 May 2010, 16:57:01

Jesper Krogh wrote:
> On 2010-05-18 20:52, Bruce Momjian wrote:
> > This line above looks very odd because I didn't think the template0
> > datfrozenxid could be advanced.  Can I see the output of this query:
> >
> >     SELECT datname, datfrozenxid, datallowconn FROM pg_database;
> >
> >    
> 
> Only from the "old" database:
> data=# SELECT datname, datfrozenxid, datallowconn FROM pg_database;
>    datname  | datfrozenxid | datallowconn
> -----------+--------------+--------------
>   template0 |   2073823552 | f
>   postgres  |   2023820521 | t
>   data      |   2023782337 | t
>   jk        |   2023822188 | t
>   template1 |   2073823552 | t
>   workqueue |   2023822188 | t
> (6 rows)

OK, datallowconn = false is right for template0, but I am still confused
how it got set to that high value.

> > I am wondering if you set datallowconn for template0 to 'true'.
> 
>  From this database, I cannot give any more results, I ran some other
> queries and then restarted postgres, subsequently it seemed
> totally broken. I'm in the process of running the test over again, but
> it'll take a while before data is in.  I'll report back.

OK, thanks.  This does seem odd.  Frankly, having template0's
datfrozenxid be wrong would not cause any kind of instability because
template0 is used only by pg_dump, so I am wondering if something else
is seriously wrong.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

Re: pg_upgrade - link mode and transaction-wraparound data loss

From

Bruce Momjian

Date:

18 May 2010, 17:18:01

jesper@krogh.cc wrote:
> First the new one..
> 
> jk@pal:~$ psql -p 5433
> psql (9.0beta1)
> Type "help" for help.
> 
> data=# SELECT datname, datfrozenxid FROM pg_database;
>   datname  | datfrozenxid
> -----------+--------------
>  template0 |          654
>  postgres  |   2374592801
>  data      |   2023782337
>  jk        |   2023822188
>  template1 |   2374592801
>  workqueue |   2023822188
> (6 rows)
> 
> data=#         SELECT txid_current();
>  txid_current
> --------------
>    2375384556
> (1 row)

I just ran a test and all the datfrozenxids are less than the current
xid, so the only database that could be generating a wraparound warning
is 'template0'.  But, again, I though that template0 was not touched for
wraparound protection --- I am starting to think I am wrong.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

Re: pg_upgrade - link mode and transaction-wraparound data loss

From

Alvaro Herrera

Date:

18 May 2010, 17:20:31

Excerpts from jesper's message of mar may 18 13:22:12 -0400 2010:

> jk@pal:~$ psql data
> psql (9.0beta1, server 8.4.1)
> WARNING: psql version 9.0, server version 8.4.
>          Some psql features might not work.
> Type "help" for help.
> 
> data# SELECT datname, datfrozenxid FROM pg_database;
>   datname  | datfrozenxid
> -----------+--------------
>  template0 |   2073823552
>  postgres  |   2023820521
>  data      |   2023782337
>  jk        |   2023822188
>  template1 |   2073823552
>  workqueue |   2023822188
> (6 rows)

Does the old server have pg_database.datallowconn = true for template0?
--

Re: pg_upgrade - link mode and transaction-wraparound data loss

From

Bruce Momjian

Date:

18 May 2010, 17:32:39

Alvaro Herrera wrote:
> Excerpts from jesper's message of mar may 18 13:22:12 -0400 2010:
> 
> > jk@pal:~$ psql data
> > psql (9.0beta1, server 8.4.1)
> > WARNING: psql version 9.0, server version 8.4.
> >          Some psql features might not work.
> > Type "help" for help.
> > 
> > data# SELECT datname, datfrozenxid FROM pg_database;
> >   datname  | datfrozenxid
> > -----------+--------------
> >  template0 |   2073823552
> >  postgres  |   2023820521
> >  data      |   2023782337
> >  jk        |   2023822188
> >  template1 |   2073823552
> >  workqueue |   2023822188
> > (6 rows)
> 
> Does the old server have pg_database.datallowconn = true for template0?

The user reported back that it did not:
Only from the "old" database:data=# SELECT datname, datfrozenxid, datallowconn FROM pg_database;   datname  |
datfrozenxid| datallowconn-----------+--------------+--------------  template0 |   2073823552 | f  postgres  |
2023820521| t  data      |   2023782337 | t  jk        |   2023822188 | t  template1 |   2073823552 | t  workqueue |
2023822188| t(6 rows)
 

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

Re: pg_upgrade - link mode and transaction-wraparound data loss

From

Jesper Krogh

Date:

18 May 2010, 18:08:16

On 2010-05-18 21:56, Bruce Momjian wrote:
> Jesper Krogh wrote:
>    
>> On 2010-05-18 20:52, Bruce Momjian wrote:
>>      
>>> This line above looks very odd because I didn't think the template0
>>> datfrozenxid could be advanced.  Can I see the output of this query:
>>>
>>>     SELECT datname, datfrozenxid, datallowconn FROM pg_database;
>>>
>>>
>>>        
>> Only from the "old" database:
>> data=# SELECT datname, datfrozenxid, datallowconn FROM pg_database;
>>     datname  | datfrozenxid | datallowconn
>> -----------+--------------+--------------
>>    template0 |   2073823552 | f
>>    postgres  |   2023820521 | t
>>    data      |   2023782337 | t
>>    jk        |   2023822188 | t
>>    template1 |   2073823552 | t
>>    workqueue |   2023822188 | t
>> (6 rows)
>>      
> OK, datallowconn = false is right for template0, but I am still confused
> how it got set to that high value.
>    

This is the "production system". I have absolutely no indications that
anything should be wrong in there. It has run rock-solid since it got
migrated (dump/restore) to 8.4 for about 7 months now. So I am a bit
scared about you telling that it seems wrong. (but that cannot be
attributed to pg_upgrade)

> OK, thanks.  This does seem odd.  Frankly, having template0's
> datfrozenxid be wrong would not cause any kind of instability because
> template0 is used only by pg_dump, so I am wondering if something else
> is seriously wrong.
>    
I also think that something was seriously wrong with the pg_upgrade'd
version. I'll try to reproduce and be a bit more carefull in tracking 
the steps
this time.

-- 
Jesper

Re: pg_upgrade - link mode and transaction-wraparound data loss

From

Bruce Momjian

Date:

18 May 2010, 18:16:38

Jesper Krogh wrote:
> On 2010-05-18 21:56, Bruce Momjian wrote:
> > Jesper Krogh wrote:
> >    
> >> On 2010-05-18 20:52, Bruce Momjian wrote:
> >>      
> >>> This line above looks very odd because I didn't think the template0
> >>> datfrozenxid could be advanced.  Can I see the output of this query:
> >>>
> >>>     SELECT datname, datfrozenxid, datallowconn FROM pg_database;
> >>>
> >>>
> >>>        
> >> Only from the "old" database:
> >> data=# SELECT datname, datfrozenxid, datallowconn FROM pg_database;
> >>     datname  | datfrozenxid | datallowconn
> >> -----------+--------------+--------------
> >>    template0 |   2073823552 | f
> >>    postgres  |   2023820521 | t
> >>    data      |   2023782337 | t
> >>    jk        |   2023822188 | t
> >>    template1 |   2073823552 | t
> >>    workqueue |   2023822188 | t
> >> (6 rows)
> >>      
> > OK, datallowconn = false is right for template0, but I am still confused
> > how it got set to that high value.
> >    
> 
> This is the "production system". I have absolutely no indications that
> anything should be wrong in there. It has run rock-solid since it got
> migrated (dump/restore) to 8.4 for about 7 months now. So I am a bit
> scared about you telling that it seems wrong. (but that cannot be
> attributed to pg_upgrade)

I am on chat with Alvaro now and it seems we do somehow connect to
template0 for transaction id wraparound.  I think Alvaro will post
shortly on this.

> > OK, thanks.  This does seem odd.  Frankly, having template0's
> > datfrozenxid be wrong would not cause any kind of instability because
> > template0 is used only by pg_dump, so I am wondering if something else
> > is seriously wrong.
> >    
> I also think that something was seriously wrong with the pg_upgrade'd
> version. I'll try to reproduce and be a bit more carefull in tracking 
> the steps
> this time.

Thanks, but I think the entire problem might be this template0 xid issue
that Alvaro and I are researching.  I can now see how invalid template0
xids could cause the instability you saw in the new database.  Odd no
one has seen this bug before.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

Re: pg_upgrade - link mode and transaction-wraparound data loss

From

Jesper Krogh

Date:

19 May 2010, 14:29:55

On 2010-05-18 18:57, Bruce Momjian wrote:
> jesper@krogh.cc wrote:
>    
>> Hi
>>
>> I tried running pg_upgrade from the current snapshot of postgresql and
>> upgrading from 8.4.4 to the snapshot version. Everything seem to look fine
>> in the process and all that came out was only "ok's" but when I tried a
>> simple query on the databse it keeps throwing these message out of the back
>> side.
>>
>> DETAIL:  You might have already suffered transaction-wraparound data loss.
>> WARNING:  some databases have not been vacuumed in over 2 billion
>> transactions
>>
>>
>> The database was around 600GB and it took a couple of minutes to run
>> pg_upgrade after I had all the binaries in the correct place.
>>
>> It is not really an easy task to throw around 600GB of data, so I cannot
>> gaurantee that the above is reproducible, but I'll see if I can get time
>> and try to reproduce it.
>>      
> This certainly should never have happened, so I am guessing it is a bug.
> pg_upgrade tries hard to make sure all your datfrozenxid and
> relfrozenxid are properly migrated from the old server, and the
> transaction id is set properly.  Unfortunately this is the first time I
> have heard of such a problem, so I am unclear on its cause.
>    

Other people are typically way faster than I am looking into it.
Depesz has produced a full trace to reproduce the problem here:
http://www.depesz.com/index.php/2010/05/19/waiting-for-9-0-pg_upgrade/

Jesper
-- 
Jesper

Re: pg_upgrade - link mode and transaction-wraparound data loss

From

Bruce Momjian

Date:

19 May 2010, 16:31:06

Bruce Momjian wrote:
> > This is the "production system". I have absolutely no indications that
> > anything should be wrong in there. It has run rock-solid since it got
> > migrated (dump/restore) to 8.4 for about 7 months now. So I am a bit
> > scared about you telling that it seems wrong. (but that cannot be
> > attributed to pg_upgrade)
>
> I am on chat with Alvaro now and it seems we do somehow connect to
> template0 for transaction id wraparound.  I think Alvaro will post
> shortly on this.
>
> > > OK, thanks.  This does seem odd.  Frankly, having template0's
> > > datfrozenxid be wrong would not cause any kind of instability because
> > > template0 is used only by pg_dump, so I am wondering if something else
> > > is seriously wrong.
> > >
> > I also think that something was seriously wrong with the pg_upgrade'd
> > version. I'll try to reproduce and be a bit more carefull in tracking
> > the steps
> > this time.
>
> Thanks, but I think the entire problem might be this template0 xid issue
> that Alvaro and I are researching.  I can now see how invalid template0
> xids could cause the instability you saw in the new database.  Odd no
> one has seen this bug before.

OK, after talking to Alvaro and Heikki, the problem is that while you
cannot connect to template0, it is accessed by autovacuum for vacuum
freeze, even if autovacuum is turned off.  I think the reason you are
seeing this bug is that your xid counter is near 2 billion (50% to
wraparound) and the original template0 xids are the maximum distance
from your counter.

I am attaching the newest patch which fixes this issue.  I did modify
this code yesterday with another patch, and I am unclear exactly if you
need that patch as well.  CVS now has all these changes.

If you could test with this and the earlier patch, I think it will now
work fine.  Thanks for the valuable testing, and quick feedback.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
Index: contrib/pg_upgrade/pg_upgrade.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/pg_upgrade/pg_upgrade.c,v
retrieving revision 1.3
diff -c -c -r1.3 pg_upgrade.c
*** contrib/pg_upgrade/pg_upgrade.c    18 May 2010 18:40:51 -0000    1.3
--- contrib/pg_upgrade/pg_upgrade.c    19 May 2010 18:20:03 -0000
***************
*** 164,170 ****
      check_ok(ctx);

      /*
!      * We do freeze after analyze so pg_statistic is also frozen
       */
      prep_status(ctx, "Freezing all rows on the new cluster");
      exec_prog(ctx, true,
--- 164,173 ----
      check_ok(ctx);

      /*
!      * We do freeze after analyze so pg_statistic is also frozen.
!      * template0 is not frozen here, but data rows were frozen by initdb,
!      * and we set its datfrozenxid and relfrozenxids later to match the
!      * new xid counter later.
       */
      prep_status(ctx, "Freezing all rows on the new cluster");
      exec_prog(ctx, true,
***************
*** 292,339 ****
  set_frozenxids(migratorContext *ctx)
  {
      int            dbnum;
!     PGconn       *conn;
      PGresult   *dbres;
      int            ntups;

      prep_status(ctx, "Setting frozenxid counters in new cluster");

!     conn = connectToServer(ctx, "template1", CLUSTER_NEW);

      /* set pg_database.datfrozenxid */
!     PQclear(executeQueryOrDie(ctx, conn,
                                "UPDATE pg_catalog.pg_database "
!                               "SET    datfrozenxid = '%u' "
!                               "WHERE datallowconn = true",
                                ctx->old.controldata.chkpnt_nxtxid));

      /* get database names */
!     dbres = executeQueryOrDie(ctx, conn,
!                               "SELECT    datname "
!                               "FROM    pg_catalog.pg_database "
!                               "WHERE datallowconn = true");

!     /* free dbres below */
!     PQfinish(conn);

      ntups = PQntuples(dbres);
      for (dbnum = 0; dbnum < ntups; dbnum++)
      {
!         conn = connectToServer(ctx, PQgetvalue(dbres, dbnum, 0), CLUSTER_NEW);

          /* set pg_class.relfrozenxid */
          PQclear(executeQueryOrDie(ctx, conn,
                                    "UPDATE    pg_catalog.pg_class "
                                    "SET    relfrozenxid = '%u' "
          /* only heap and TOAST are vacuumed */
!                                   "WHERE    relkind = 'r' OR "
!                                   "        relkind = 't'",
                                    ctx->old.controldata.chkpnt_nxtxid));
          PQfinish(conn);
      }

      PQclear(dbres);

      check_ok(ctx);
  }

--- 295,366 ----
  set_frozenxids(migratorContext *ctx)
  {
      int            dbnum;
!     PGconn       *conn, *conn_template1;
      PGresult   *dbres;
      int            ntups;
+     int            i_datname;
+     int            i_datallowconn;

      prep_status(ctx, "Setting frozenxid counters in new cluster");

!     conn_template1 = connectToServer(ctx, "template1", CLUSTER_NEW);

      /* set pg_database.datfrozenxid */
!     PQclear(executeQueryOrDie(ctx, conn_template1,
                                "UPDATE pg_catalog.pg_database "
!                               "SET    datfrozenxid = '%u'",
                                ctx->old.controldata.chkpnt_nxtxid));

      /* get database names */
!     dbres = executeQueryOrDie(ctx, conn_template1,
!                               "SELECT    datname, datallowconn "
!                               "FROM    pg_catalog.pg_database");

!     i_datname = PQfnumber(dbres, "datname");
!     i_datallowconn = PQfnumber(dbres, "datallowconn");

      ntups = PQntuples(dbres);
      for (dbnum = 0; dbnum < ntups; dbnum++)
      {
!         char *datname = PQgetvalue(dbres, dbnum, i_datname);
!         char *datallowconn= PQgetvalue(dbres, dbnum, i_datallowconn);
!
!         /*
!          *    We must update databases where datallowconn = false, e.g.
!          *    template0, because autovacuum increments their datfrozenxids and
!          *    relfrozenxids even if autovacuum is turned off, and even though
!          *    all the data rows are already frozen  To enable this, we
!          *    temporarily change datallowconn.
!          */
!         if (strcmp(datallowconn, "f") == 0)
!             PQclear(executeQueryOrDie(ctx, conn_template1,
!                                   "UPDATE pg_catalog.pg_database "
!                                   "SET    datallowconn = true "
!                                   "WHERE datname = '%s'", datname));
!
!         conn = connectToServer(ctx, datname, CLUSTER_NEW);

          /* set pg_class.relfrozenxid */
          PQclear(executeQueryOrDie(ctx, conn,
                                    "UPDATE    pg_catalog.pg_class "
                                    "SET    relfrozenxid = '%u' "
          /* only heap and TOAST are vacuumed */
!                                   "WHERE    relkind IN ('r', 't')",
                                    ctx->old.controldata.chkpnt_nxtxid));
          PQfinish(conn);
+
+         /* Reset datallowconn flag */
+         if (strcmp(datallowconn, "f") == 0)
+             PQclear(executeQueryOrDie(ctx, conn_template1,
+                                   "UPDATE pg_catalog.pg_database "
+                                   "SET    datallowconn = false "
+                                   "WHERE datname = '%s'", datname));
      }

      PQclear(dbres);

+     PQfinish(conn_template1);
+
      check_ok(ctx);
  }

Re: pg_upgrade - link mode and transaction-wraparound data loss

From

Bruce Momjian

Date:

19 May 2010, 16:58:42

Jesper Krogh wrote:
> On 2010-05-18 18:57, Bruce Momjian wrote:
> > jesper@krogh.cc wrote:
> >    
> >> Hi
> >>
> >> I tried running pg_upgrade from the current snapshot of postgresql and
> >> upgrading from 8.4.4 to the snapshot version. Everything seem to look fine
> >> in the process and all that came out was only "ok's" but when I tried a
> >> simple query on the databse it keeps throwing these message out of the back
> >> side.
> >>
> >> DETAIL:  You might have already suffered transaction-wraparound data loss.
> >> WARNING:  some databases have not been vacuumed in over 2 billion
> >> transactions
> >>
> >>
> >> The database was around 600GB and it took a couple of minutes to run
> >> pg_upgrade after I had all the binaries in the correct place.
> >>
> >> It is not really an easy task to throw around 600GB of data, so I cannot
> >> gaurantee that the above is reproducible, but I'll see if I can get time
> >> and try to reproduce it.
> >>      
> > This certainly should never have happened, so I am guessing it is a bug.
> > pg_upgrade tries hard to make sure all your datfrozenxid and
> > relfrozenxid are properly migrated from the old server, and the
> > transaction id is set properly.  Unfortunately this is the first time I
> > have heard of such a problem, so I am unclear on its cause.
> >    
> 
> Other people are typically way faster than I am looking into it.
> Depesz has produced a full trace to reproduce the problem here:
> http://www.depesz.com/index.php/2010/05/19/waiting-for-9-0-pg_upgrade/

Thanks. I have commented on the blog to mention that we have fixed the
bug reported there.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com