Re: 9.2 to 9.5 pg_upgrade losing data - Mailing list pgsql-general

From Pete Fuller
Subject Re: 9.2 to 9.5 pg_upgrade losing data
Date
Msg-id 003C41BF-5732-4685-9FE9-DD0C807EB15C@3sitracking.com
Whole thread Raw
In response to Re: 9.2 to 9.5 pg_upgrade losing data  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: 9.2 to 9.5 pg_upgrade losing data  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
Directories are correct.  We do not utilize tablespaces. 




On Aug 15, 2016, at 10:06 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 08/15/2016 06:20 AM, Pete Fuller wrote:
Hello all,

We are attempting to upgrade a production 9.2 postgres cluster to 9.5. The server is running Centos 7 with the Centos version - currently 9.2.15 We are installing the postgresql provided rpm of postgresql9.5 from the postgresql repo, currently 9.5.4. In testing and on our development cluster, the upgrade worked without issue but something is happening on our production DB that we can not figure out.

The 9.2 cluster is in streaming replication controlled by pacemaker. Steps we have been using are the following.

• verify the replica server is in a good state with up to date data, then put the machine in standby, stopping the 9.2 db
• Removing the recovery.conf file and starting the 9.2 server manually to get some stats to verify later (counts of rows from several key tables)
• Stop the 9.2 server, (/bin/pg_ctl stop -D /var/lib/pgsql/data ) and install the 9.5 binaries.
• init the 9.5 db
• run a pg_upgrade with the link option. The full command we are using is
/usr/pgsql-9.5/bin/pg_upgrade -d /var/lib/pgsql/data -D /var/lib/pgsql/9.5/data -b /bin -B /usr/pgsql-9.5/bin/ --link -r -v

Are you sure you are linking the appropriate directories?

Are there any tablespaces in the mix?

• run the full vacuum analyze script that pg_upgrade generated

What we are finding in testing is we get thru the upgrade process, start the 9.5 server manually to compare the row counts we collected immediately before the upgrade and we are seeing different data, sometimes drastically. This data includes missing user accounts, etc, that is not perishable data.

For example - 9.2 standalone before running pg_upgrade gives this information
tracks=# select count(*) from users;
count
-------
13945
(1 row)

tracks=# select count(*) from sessions;
count
-------
  559
(1 row)

PG 9.5 after the upgrade - same machine, upgraded database

tracks=# select count(*) from sessions;
-------
  155
(1 row)

tracks=# select count(*) from users;
count
-------
13157
(1 row)


We have enabled verbose logging and have reviewed it but are seeing no errors in the migration. The 9.5 server starts without complaint and if we did not do these manual checks we would not see a problem. We have attempted this upgrade on this machine repeatedly in prepping for our maintenance windows, and on occasion our manual checks have shown good numbers.  At this point, though, it has failed silently enough times that I have little confidence in the data even if our rudimentary checks do show identical counts after the upgrade.

Has anyone else seen this issue?

The only 'hack' during the upgrade that isn't straight from the postgres_upgrade man page is to handle a renamed option in postgresql that cents back ported - unix_socket_directories. Using the rename of pg_ctl that is referenced in the top answer here to force a couple variables into the centos provided pg_ctl binary - http://dba.stackexchange.com/questions/50135/pg-upgrade-unrecognized-configuration-parameter-unix-socket-directory


Any information or suggestions would be helpful at this point.








-- 
Adrian Klaver
adrian.klaver@aklaver.com

pgsql-general by date:

Previous
From: Ioana Danes
Date:
Subject: Re: ERROR: MultiXactId XXXXX has not been created yet -- apparent wraparound
Next
From: Adrian Klaver
Date:
Subject: Re: ERROR: MultiXactId XXXXX has not been created yet -- apparent wraparound