Thread: 9.3 to 9.5 upgrade problems

9.3 to 9.5 upgrade problems

From

Andy Colson

Date:

03 July 2016, 18:07:05

Hi all,

I have a master (web1) and two slaves (web2, webserv), one slave is quite far from the master, the db is 112 Gig, so
pg_basebackupis my last resort. 

I followed the page here:
https://www.postgresql.org/docs/9.5/static/pgupgrade.html

including the rsync stuff.  I practiced it _twice_, once in PG 9.5 beta, and again a week ago, on two VM's I created
locally. Both practice sessions worked perfect. 

I just ran it on the live databases.  The master seems ok, its running PG 9.5 now, I can login to it, and no errors in
thelog. 

Neither slave works.  After I'd gotten done with the pgupgrade steps, both slaves gave me this error:

FATAL:  database system identifier differs between the primary and standby

Sure enough pg_controldata show'd their database system id different (all three web1, web2, webserv were different.  no
matchesat all), so I'm assuming the rsync didnt rsync right, or I missed a step and ran it to early, or something ...
I'mnot quite sure. 

I needed to get the live website back up and running again, so I let the master go, ran analyze, and when it was
finished,used the steps here to try and resync: 

https://wiki.postgresql.org/wiki/Binary_Replication_Tutorial

on Master:
select pg_start_backup('clone',true);
rsync -av --exclude pg_xlog --exclude postgresql.conf /pub/pg95/* web2:/pub/pg95/
select pg_stop_backup();
rsync -av /pub/pg95/pg_xlog web2:/pub/pg95/


That ran pretty quick, and pg_controldata shows matching numbers, but when I start the slave I get:

,,2016-07-03 06:06:57.173 CDT,: LOG:  entering standby mode
,,2016-07-03 06:06:57.205 CDT,: LOG:  redo starts at 369/D6002228
,,2016-07-03 06:06:57.984 CDT,: LOG:  consistent recovery state reached at 369/DCC5DB90
,,2016-07-03 06:06:57.984 CDT,: LOG:  database system is ready to accept read only connections
,,2016-07-03 06:06:57.984 CDT,: LOG:  invalid record length at 369/DD038ED0
,,2016-07-03 06:06:58.344 CDT,: LOG:  started streaming WAL from primary at 369/DD000000 on timeline 1
web,[unknown],2016-07-03 06:07:11.176 CDT,[local]: FATAL:  role "andy" does not exist

I can login as myself on the master, but not on the slave.  when I "psql -U postgres" on the slave I get:

psql: FATAL:  cache lookup failed for database 16401

This is only on web2, its close to web1, so I'm hoping I can get it fixed and then rsync it quickly to the far away
slave.

I'm at a loss here, any hints or suggestions would be appreciated.

Thanks,

-Andy

Re: 9.3 to 9.5 upgrade problems

From

Vick Khera

Date:

03 July 2016, 18:11:52

binary replication requires the versions be identical. Also, once you ran pg_upgrade you altered one of the copies so binary replication can no longer work on that either.

On Sun, Jul 3, 2016 at 11:06 AM, Andy Colson <andy@squeakycode.net> wrote:

Hi all,

I have a master (web1) and two slaves (web2, webserv), one slave is quite far from the master, the db is 112 Gig, so pg_basebackup is my last resort.

I followed the page here:
https://www.postgresql.org/docs/9.5/static/pgupgrade.html

including the rsync stuff. I practiced it _twice_, once in PG 9.5 beta, and again a week ago, on two VM's I created locally. Both practice sessions worked perfect.

I just ran it on the live databases. The master seems ok, its running PG 9.5 now, I can login to it, and no errors in the log.

Neither slave works. After I'd gotten done with the pgupgrade steps, both slaves gave me this error:

FATAL: database system identifier differs between the primary and standby

Sure enough pg_controldata show'd their database system id different (all three web1, web2, webserv were different. no matches at all), so I'm assuming the rsync didnt rsync right, or I missed a step and ran it to early, or something ... I'm not quite sure.

I needed to get the live website back up and running again, so I let the master go, ran analyze, and when it was finished, used the steps here to try and resync:

https://wiki.postgresql.org/wiki/Binary_Replication_Tutorial

on Master:
select pg_start_backup('clone',true);
rsync -av --exclude pg_xlog --exclude postgresql.conf /pub/pg95/* web2:/pub/pg95/
select pg_stop_backup();
rsync -av /pub/pg95/pg_xlog web2:/pub/pg95/

That ran pretty quick, and pg_controldata shows matching numbers, but when I start the slave I get:

,,2016-07-03 06:06:57.173 CDT,: LOG: entering standby mode
,,2016-07-03 06:06:57.205 CDT,: LOG: redo starts at 369/D6002228
,,2016-07-03 06:06:57.984 CDT,: LOG: consistent recovery state reached at 369/DCC5DB90
,,2016-07-03 06:06:57.984 CDT,: LOG: database system is ready to accept read only connections
,,2016-07-03 06:06:57.984 CDT,: LOG: invalid record length at 369/DD038ED0
,,2016-07-03 06:06:58.344 CDT,: LOG: started streaming WAL from primary at 369/DD000000 on timeline 1
web,[unknown],2016-07-03 06:07:11.176 CDT,[local]: FATAL: role "andy" does not exist

I can login as myself on the master, but not on the slave. when I "psql -U postgres" on the slave I get:

psql: FATAL: cache lookup failed for database 16401

This is only on web2, its close to web1, so I'm hoping I can get it fixed and then rsync it quickly to the far away slave.

I'm at a loss here, any hints or suggestions would be appreciated.

Thanks,

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: 9.3 to 9.5 upgrade problems

From

Andy Colson

Date:

03 July 2016, 18:18:08

> On Sun, Jul 3, 2016 at 11:06 AM, Andy Colson <andy@squeakycode.net <mailto:andy@squeakycode.net>> wrote:
>
>     Hi all,
>
>     I have a master (web1) and two slaves (web2, webserv), one slave is quite far from the master, the db is 112 Gig,
sopg_basebackup is my last resort. 
>
>     I followed the page here:
>     https://www.postgresql.org/docs/9.5/static/pgupgrade.html
>
>     including the rsync stuff.  I practiced it _twice_, once in PG 9.5 beta, and again a week ago, on two VM's I
createdlocally.  Both practice sessions worked perfect. 
>

On 07/03/2016 10:11 AM, Vick Khera wrote:
> binary replication requires the versions be identical. Also, once you ran pg_upgrade you altered one of the copies so
binaryreplication can no longer work on that either. 
>


Yes, all three boxes are running Pg 9.5.

I've uninstalled the PG 9.3 package, and delete the /pub/pg93 database directory.

My rsync copied the pg95 folder from web1 to web2.

-Andy

Re: 9.3 to 9.5 upgrade problems

From

Adrian Klaver

Date:

03 July 2016, 18:35:27

On 07/03/2016 08:06 AM, Andy Colson wrote:
> Hi all,
>
> I have a master (web1) and two slaves (web2, webserv), one slave is
> quite far from the master, the db is 112 Gig, so pg_basebackup is my
> last resort.
>
> I followed the page here:
> https://www.postgresql.org/docs/9.5/static/pgupgrade.html
>
> including the rsync stuff.  I practiced it _twice_, once in PG 9.5 beta,
> and again a week ago, on two VM's I created locally.  Both practice
> sessions worked perfect.
>
> I just ran it on the live databases.  The master seems ok, its running
> PG 9.5 now, I can login to it, and no errors in the log.
>
> Neither slave works.  After I'd gotten done with the pgupgrade steps,
> both slaves gave me this error:
>
> FATAL:  database system identifier differs between the primary and standby
>
> Sure enough pg_controldata show'd their database system id different
> (all three web1, web2, webserv were different.  no matches at all), so
> I'm assuming the rsync didnt rsync right, or I missed a step and ran it
> to early, or something ... I'm not quite sure.
>
> I needed to get the live website back up and running again, so I let the
> master go, ran analyze, and when it was finished, used the steps here to
> try and resync:
>
> https://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
>
> on Master:
> select pg_start_backup('clone',true);
> rsync -av --exclude pg_xlog --exclude postgresql.conf /pub/pg95/*
> web2:/pub/pg95/
> select pg_stop_backup();
> rsync -av /pub/pg95/pg_xlog web2:/pub/pg95/

Not sure about above rsync, that seems to undo what you did previously.

Also was the remote directory empty when you did this?

>
>
> That ran pretty quick, and pg_controldata shows matching numbers, but
> when I start the slave I get:
>
> ,,2016-07-03 06:06:57.173 CDT,: LOG:  entering standby mode
> ,,2016-07-03 06:06:57.205 CDT,: LOG:  redo starts at 369/D6002228
> ,,2016-07-03 06:06:57.984 CDT,: LOG:  consistent recovery state reached
> at 369/DCC5DB90
> ,,2016-07-03 06:06:57.984 CDT,: LOG:  database system is ready to accept
> read only connections
> ,,2016-07-03 06:06:57.984 CDT,: LOG:  invalid record length at 369/DD038ED0
> ,,2016-07-03 06:06:58.344 CDT,: LOG:  started streaming WAL from primary
> at 369/DD000000 on timeline 1
> web,[unknown],2016-07-03 06:07:11.176 CDT,[local]: FATAL:  role "andy"
> does not exist
>
> I can login as myself on the master, but not on the slave.  when I "psql
> -U postgres" on the slave I get:
>
> psql: FATAL:  cache lookup failed for database 16401
>
> This is only on web2, its close to web1, so I'm hoping I can get it
> fixed and then rsync it quickly to the far away slave.
>
> I'm at a loss here, any hints or suggestions would be appreciated.
>
> Thanks,
>
> -Andy
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: 9.3 to 9.5 upgrade problems

From

Andy Colson

Date:

03 July 2016, 18:49:29

On 07/03/2016 10:35 AM, Adrian Klaver wrote:
> On 07/03/2016 08:06 AM, Andy Colson wrote:
>> Hi all,
>>
>> I have a master (web1) and two slaves (web2, webserv), one slave is
>> quite far from the master, the db is 112 Gig, so pg_basebackup is my
>> last resort.
>>
>> I followed the page here:
>> https://www.postgresql.org/docs/9.5/static/pgupgrade.html
>>
>> including the rsync stuff.  I practiced it _twice_, once in PG 9.5 beta,
>> and again a week ago, on two VM's I created locally.  Both practice
>> sessions worked perfect.
>>
>> I just ran it on the live databases.  The master seems ok, its running
>> PG 9.5 now, I can login to it, and no errors in the log.
>>
>> Neither slave works.  After I'd gotten done with the pgupgrade steps,
>> both slaves gave me this error:
>>
>> FATAL:  database system identifier differs between the primary and standby
>>
>> Sure enough pg_controldata show'd their database system id different
>> (all three web1, web2, webserv were different.  no matches at all), so
>> I'm assuming the rsync didnt rsync right, or I missed a step and ran it
>> to early, or something ... I'm not quite sure.
>>
>> I needed to get the live website back up and running again, so I let the
>> master go, ran analyze, and when it was finished, used the steps here to
>> try and resync:
>>
>> https://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
>>
>> on Master:
>> select pg_start_backup('clone',true);
>> rsync -av --exclude pg_xlog --exclude postgresql.conf /pub/pg95/*
>> web2:/pub/pg95/
>> select pg_stop_backup();
>> rsync -av /pub/pg95/pg_xlog web2:/pub/pg95/
>
> Not sure about above rsync, that seems to undo what you did previously.
>
> Also was the remote directory empty when you did this?
>

Not sure what you mean by undo.  pgupgrade.html page, step 10, has you rsync the master to the slave, so the pg95
directoryis hard linked to the pg93, which save's a ton to bandwidth when your servers are cross county. 

My second rsync did the same thing, but only on the pg95 directory (my db lives in /pub/pg95).

No, the directory was not empty, and I'm really trying to avoid a fresh copy of 112 Gig.

-Andy

Re: 9.3 to 9.5 upgrade problems

From

Adrian Klaver

Date:

03 July 2016, 19:04:55

On 07/03/2016 08:49 AM, Andy Colson wrote:
> On 07/03/2016 10:35 AM, Adrian Klaver wrote:
>> On 07/03/2016 08:06 AM, Andy Colson wrote:
>>> Hi all,
>>>
>>> I have a master (web1) and two slaves (web2, webserv), one slave is
>>> quite far from the master, the db is 112 Gig, so pg_basebackup is my
>>> last resort.
>>>
>>> I followed the page here:
>>> https://www.postgresql.org/docs/9.5/static/pgupgrade.html
>>>
>>> including the rsync stuff.  I practiced it _twice_, once in PG 9.5 beta,
>>> and again a week ago, on two VM's I created locally.  Both practice
>>> sessions worked perfect.
>>>
>>> I just ran it on the live databases.  The master seems ok, its running
>>> PG 9.5 now, I can login to it, and no errors in the log.
>>>
>>> Neither slave works.  After I'd gotten done with the pgupgrade steps,
>>> both slaves gave me this error:
>>>
>>> FATAL:  database system identifier differs between the primary and
>>> standby
>>>
>>> Sure enough pg_controldata show'd their database system id different
>>> (all three web1, web2, webserv were different.  no matches at all), so
>>> I'm assuming the rsync didnt rsync right, or I missed a step and ran it
>>> to early, or something ... I'm not quite sure.
>>>
>>> I needed to get the live website back up and running again, so I let the
>>> master go, ran analyze, and when it was finished, used the steps here to
>>> try and resync:
>>>
>>> https://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
>>>
>>> on Master:
>>> select pg_start_backup('clone',true);
>>> rsync -av --exclude pg_xlog --exclude postgresql.conf /pub/pg95/*
>>> web2:/pub/pg95/
>>> select pg_stop_backup();
>>> rsync -av /pub/pg95/pg_xlog web2:/pub/pg95/
>>
>> Not sure about above rsync, that seems to undo what you did previously.
>>
>> Also was the remote directory empty when you did this?
>>
>
> Not sure what you mean by undo.  pgupgrade.html page, step 10, has you
> rsync the master to the slave, so the pg95 directory is hard linked to
> the pg93, which save's a ton to bandwidth when your servers are cross
> county.

I understand I am just trying to figure out what mixing methods
(pg-upgrade, pg_start_backup) is doing?

In particular the section on pg_start_backup:

https://www.postgresql.org/docs/9.5/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP

starts with:

"Ensure that WAL archiving is enabled and working."

and from I gather that is not the case.

>
> My second rsync did the same thing, but only on the pg95 directory (my
> db lives in /pub/pg95).
>
> No, the directory was not empty, and I'm really trying to avoid a fresh
> copy of 112 Gig.
>
> -Andy
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: 9.3 to 9.5 upgrade problems

From

Andy Colson

Date:

03 July 2016, 19:42:24

On 07/03/2016 11:04 AM, Adrian Klaver wrote:
> On 07/03/2016 08:49 AM, Andy Colson wrote:
>> On 07/03/2016 10:35 AM, Adrian Klaver wrote:
>>> On 07/03/2016 08:06 AM, Andy Colson wrote:
>>>> Hi all,
>>>>
>>>> I have a master (web1) and two slaves (web2, webserv), one slave is
>>>> quite far from the master, the db is 112 Gig, so pg_basebackup is my
>>>> last resort.
>>>>
>>>> I followed the page here:
>>>> https://www.postgresql.org/docs/9.5/static/pgupgrade.html
>>>>
>>>> including the rsync stuff.  I practiced it _twice_, once in PG 9.5 beta,
>>>> and again a week ago, on two VM's I created locally.  Both practice
>>>> sessions worked perfect.
>>>>
>>>> I just ran it on the live databases.  The master seems ok, its running
>>>> PG 9.5 now, I can login to it, and no errors in the log.
>>>>
>>>> Neither slave works.  After I'd gotten done with the pgupgrade steps,
>>>> both slaves gave me this error:
>>>>
>>>> FATAL:  database system identifier differs between the primary and
>>>> standby
>>>>
>>>> Sure enough pg_controldata show'd their database system id different
>>>> (all three web1, web2, webserv were different.  no matches at all), so
>>>> I'm assuming the rsync didnt rsync right, or I missed a step and ran it
>>>> to early, or something ... I'm not quite sure.
>>>>
>>>> I needed to get the live website back up and running again, so I let the
>>>> master go, ran analyze, and when it was finished, used the steps here to
>>>> try and resync:
>>>>
>>>> https://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
>>>>
>>>> on Master:
>>>> select pg_start_backup('clone',true);
>>>> rsync -av --exclude pg_xlog --exclude postgresql.conf /pub/pg95/*
>>>> web2:/pub/pg95/
>>>> select pg_stop_backup();
>>>> rsync -av /pub/pg95/pg_xlog web2:/pub/pg95/
>>>
>>> Not sure about above rsync, that seems to undo what you did previously.
>>>
>>> Also was the remote directory empty when you did this?
>>>
>>
>> Not sure what you mean by undo.  pgupgrade.html page, step 10, has you
>> rsync the master to the slave, so the pg95 directory is hard linked to
>> the pg93, which save's a ton to bandwidth when your servers are cross
>> county.
>
> I understand I am just trying to figure out what mixing methods (pg-upgrade, pg_start_backup) is doing?
>
> In particular the section on pg_start_backup:
>
> https://www.postgresql.org/docs/9.5/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP
>
> starts with:
>
> "Ensure that WAL archiving is enabled and working."
>
> and from I gather that is not the case.
>

The slave log says it reached a consistent state, and is accepting connections, which tells me it should run ok, maybe
notwith the newest data, but at least not get: 

FATAL:  cache lookup failed for database 16401

>
> Also was the remote directory empty when you did this?

Now that I think about this more, I think you're on to something.  I'm trying to get an _exact_ copy of the master db
ontothe slave.  Checking rsync man, it matches only on size and modified time, and I didn't include deletes. 

I'm going to re-try with this:

select pg_start_backup('clone',true);
rsync -av --delete --checksum --exclude pg_xlog --exclude postgresql.conf /pub/pg95/* web2:/pub/pg95/
select pg_stop_backup();
rsync -av --delete --checksum /pub/pg95/pg_xlog web2:/pub/pg95/

That should make sure the copies are exact.


-Andy

Re: 9.3 to 9.5 upgrade problems

From

Andy Colson

Date:

03 July 2016, 22:41:29

On 07/03/2016 11:42 AM, Andy Colson wrote:
>
> Now that I think about this more, I think you're on to something.
> I'm trying to get an _exact_ copy of the master db onto the slave.
> Checking rsync man, it matches only on size and modified time, and I
> didn't include deletes.
>
> I'm going to re-try with this:
>
> select pg_start_backup('clone',true); rsync -av --delete --checksum
> --exclude pg_xlog --exclude postgresql.conf /pub/pg95/*
> web2:/pub/pg95/ select pg_stop_backup(); rsync -av --delete
> --checksum /pub/pg95/pg_xlog web2:/pub/pg95/
>
> That should make sure the copies are exact.
>
>
> -Andy
>
>
>

That seemed to have worked.  Took forever, but once I started the slave on web2 I was able to login and run queries
withoutany errors or problems. 

Thanks Adrian,

-Andy

Re: 9.3 to 9.5 upgrade problems

From

Adrian Klaver

Date:

03 July 2016, 22:47:05

On 07/03/2016 12:41 PM, Andy Colson wrote:
> On 07/03/2016 11:42 AM, Andy Colson wrote:
>>
>> Now that I think about this more, I think you're on to something.
>> I'm trying to get an _exact_ copy of the master db onto the slave.
>> Checking rsync man, it matches only on size and modified time, and I
>> didn't include deletes.
>>
>> I'm going to re-try with this:
>>
>> select pg_start_backup('clone',true); rsync -av --delete --checksum
>> --exclude pg_xlog --exclude postgresql.conf /pub/pg95/*
>> web2:/pub/pg95/ select pg_stop_backup(); rsync -av --delete
>> --checksum /pub/pg95/pg_xlog web2:/pub/pg95/
>>
>> That should make sure the copies are exact.
>>
>>
>> -Andy
>>
>>
>>
>
> That seemed to have worked.  Took forever, but once I started the slave
> on web2 I was able to login and run queries without any errors or problems.

Glad it worked out.

>
> Thanks Adrian,
>
> -Andy
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com