Re: Recovery - New Slave PostgreSQL 9.2 - Mailing list pgsql-admin

From Rajesh Madiwale
Subject Re: Recovery - New Slave PostgreSQL 9.2
Date
Msg-id CALDEMcQRS8sWYpS-bpXa_1bDq+E15z3SMk0BUuJ1XBjd7enhJA@mail.gmail.com
Whole thread Raw
In response to Re: Recovery - New Slave PostgreSQL 9.2  ("drum.lucas@gmail.com" <drum.lucas@gmail.com>)
Responses Re: Recovery - New Slave PostgreSQL 9.2
List pgsql-admin

Hi Lucas,

If a .history file is present in the newstandby/pg_xlog directory, move it out of there; also check for the same file in wal_archive and move it out as well, then try restarting the new standby.
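A minimal sketch of that cleanup, using the directory names from this thread and a hypothetical stash location; it stashes the files rather than deleting them, so they can be restored if needed:

```shell
#!/bin/sh
# Move any stray timeline-history files out of a directory into a stash,
# instead of deleting them outright.
stash_history_files() {
    dir="$1"; stash="$2"
    mkdir -p "$stash"
    for f in "$dir"/*.history; do
        [ -e "$f" ] || continue   # glob stayed literal: nothing to move
        mv "$f" "$stash/" && echo "moved: $f"
    done
}

# Hypothetical usage with the paths mentioned in this thread:
# stash_history_files /var/lib/pgsql/9.2/newstandby/pg_xlog /tmp/history_stash
# stash_history_files /var/lib/pgsql/9.2/wal_archive        /tmp/history_stash
# ...then restart the new standby.
```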

Regards,
Rajesh.


On Sunday, January 10, 2016, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:
Should I point of replication new slave to same DB?

Lucas

On Sunday, 10 January 2016, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:
John, 

I'd recommend that you specify -X s, as just specifying -X or
--xlog gives you the default value of fetch rather than stream.

Sorry... I misunderstood.
So you'd recommend re-running pg_basebackup with --xlog-method=stream?

I'd hoped we could find another way, as pg_basebackup takes 30h to complete :(
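For reference, a stream-mode variant of the command (the host and paths are the thread's; the rest is a sketch, and on 9.2 stream mode needs --format=plain plus a spare replication connection on the upstream):

```shell
# Rebuild the standby with WAL streamed alongside the base backup, so the
# needed segments cannot be recycled away during the long copy.
pg_basebackup \
    --pgdata=/var/lib/pgsql/9.2/newstandby \
    --host=slave1 --port=5432 --username=replicator \
    --xlog-method=stream --format=plain --progress
```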



Lucas Possamai


On 10 January 2016 at 12:21, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:
I'd recommend that you specify -X s, as just specifying -X or
--xlog gives you the default value of fetch rather than stream. Also, the WAL directory listing you just provided indicates that your servers' timelines are quite different.

I don't think it's necessary to use -X. From the pg_basebackup docs:
--xlog
Using this option is equivalent to using -X with method fetch.

-------------------------------

Now, you're saying that one system went down, which is why you're trying to do this, but was it the first slave that failed, or did your primary fail? That would possibly explain why the timelines are different. If your primary failed and this standby assumed command, then its timeline would have incremented. So, if you're trying to put this one back as a slave, that's not a trivial process. You'd have to set the old primary back up as a slave to the current primary, then execute another failover, this time back to your original primary, and then rebuild all the slaves all over again.

PAST SCENARIO:
master1 --> slave1 --> slave2
            slave1 --> db-slave0  (this one went down)

NEW SCENARIO:
master1 --> slave1 --> slave2
            slave1 --> newslave  (this is the one I'm setting up)



Lucas Possamai


On 10 January 2016 at 12:16, John Scalia <jayknowsunix@gmail.com> wrote:
I'd recommend that you specify -X s, as just specifying -X or
--xlog gives you the default value of fetch rather than stream. Also, the WAL directory listing you just provided indicates that your servers' timelines are quite different.

Now, you're saying that one system went down, which is why you're trying to do this, but was it the first slave that failed, or did your primary fail? That would possibly explain why the timelines are different. If your primary failed and this standby assumed command, then its timeline would have incremented. So, if you're trying to put this one back as a slave, that's not a trivial process. You'd have to set the old primary back up as a slave to the current primary, then execute another failover, this time back to your original primary, and then rebuild all the slaves all over again.
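In outline, the procedure described above looks something like this (hostnames are hypothetical and every step is a sketch of the sequence, not a script to paste):

```shell
# 1. On the old primary: stop it and point it at the current primary as a
#    standby.
# pg_ctl -D "$PGDATA" stop -m fast
# cat > "$PGDATA"/recovery.conf <<'EOF'
# standby_mode = on
# primary_conninfo = 'host=current-primary port=5432 user=replicator'
# recovery_target_timeline = 'latest'
# EOF
# pg_ctl -D "$PGDATA" start
#
# 2. Once it has caught up, fail over again, back to the original primary:
# pg_ctl -D "$PGDATA" promote      # run on the original primary
#
# 3. Rebuild every slave from the re-promoted primary with pg_basebackup.
```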

Just saying,
Jay

Sent from my iPad

On Jan 9, 2016, at 3:48 PM, "drum.lucas@gmail.com" <drum.lucas@gmail.com> wrote:

Hi John,

First, when you built the slave server, I'm assuming you used pg_basebackup and if you did, did you specify -X s in your command?

Yep. I ran the pg_basebackup into the new slave from ANOTHER SLAVE... 
ssh postgres@slave1 'pg_basebackup --pgdata=- --format=tar --label=bb_master --progress --host=localhost --port=5432 --username=replicator --xlog | pv --quiet --rate-limit 100M' | tar -x --no-same-owner

-X = --xlog

On my new slave, I've got all the WAL archives. (The master copies the WAL there all the time...)
ls /var/lib/pgsql/9.2/wal_archive:
0000000200000C6A0000002D
0000000200000C6A0000002E

but not these, which the standby is asking for:
../wal_archive/0000000400000C68000000C8 not found
../wal_archive/00000005.history not found

Remember that I'm trying to do cascading replication. (It was working with another slave, but that server went down and I'm trying to set up a new one.)
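Incidentally, the mismatch is visible in the filenames themselves: the first 8 hex digits of a 24-digit WAL segment name are the timeline ID. A small sketch that splits a name into its fields:

```shell
# Split a WAL segment name into timeline / log / segment (all hex fields).
wal_fields() {
    name="$1"
    tl=$(echo "$name" | cut -c1-8)
    log=$(echo "$name" | cut -c9-16)
    seg=$(echo "$name" | cut -c17-24)
    printf 'timeline=%d log=0x%s seg=0x%s\n' "$((16#$tl))" "$log" "$seg"
}

wal_fields 0000000200000C6A0000002D   # a segment the archive has: timeline 2
wal_fields 0000000400000C68000000C8   # the one the standby wants: timeline 4
```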

I would suggest, in spite of the 2TB size, rebuilding the standby servers with a proper pg_basebackup.

I've already run pg_basebackup more than once, and I always get the same error... :(

Is there anything else I can try, guys? Please help hehehe



Lucas Possamai


On 10 January 2016 at 10:33, John Scalia <jayknowsunix@gmail.com> wrote:
Hi,

I'm a little late to this thread, but in looking at the errors you originally posted, two things come to mind:

First, when you built the slave server, I'm assuming you used pg_basebackup and if you did, did you specify -X s in your command?

Second, the missing history file isn't an issue, in case you're unfamiliar with this. However, the missing WAL segment is, as is the bad timeline error. Is that missing segment still on your primary? You could just copy it manually to your standby and start from that. As for the timeline error, that's disturbing to me, as it's claiming the primary is actually a failed-over standby. AFAIK, that's the main if not the only way transaction timelines increment.
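Copying a segment over by hand can be as simple as the sketch below (the source directory is hypothetical, e.g. a mount or rsync'd copy of the primary's archive; swap cp for scp/rsync when it is remote):

```shell
# Fetch one WAL segment into the standby's wal_archive, refusing to clobber
# an existing copy.
fetch_segment() {
    src_dir="$1"; dst_dir="$2"; seg="$3"
    if [ -e "$dst_dir/$seg" ]; then
        echo "already present: $seg"
        return 0
    fi
    cp "$src_dir/$seg" "$dst_dir/" && echo "fetched: $seg"
}

# Hypothetical usage:
# fetch_segment /mnt/primary_archive /var/lib/pgsql/9.2/wal_archive \
#     0000000400000C68000000C8
```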

I would suggest, in spite of the 2TB size, rebuilding the standby servers with a proper pg_basebackup.
--
Jay

Sent from my iPad

On Jan 9, 2016, at 2:19 PM, "drum.lucas@gmail.com" <drum.lucas@gmail.com> wrote:

Hi, thanks for your reply... I've been working on this problem for 20h =(

# cat postgresql.conf | grep synchronous_standby_names
#synchronous_standby_names = ''  (it's commented out)

# cat postgresql.conf |  grep application_name
log_line_prefix = '%m|%p|%q[%c]@%r|%u|%a|%d '
( %a = application name )

I can't resync the whole DB again, because it has 2TB of data :(

Is there anything else I can do?
Thank you



Lucas Possamai


On 10 January 2016 at 04:22, Shreeyansh Dba <shreeyansh2014@gmail.com> wrote:


On Sat, Jan 9, 2016 at 3:28 PM, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:
My recovery.conf was already like that!
I was already doing it that way... and I still have the problem =\

Is there anything I can do?



Lucas Possamai


On 9 January 2016 at 22:53, Shreeyansh Dba <shreeyansh2014@gmail.com> wrote:

Hi Lucas,

Yes, the recovery.conf looks good now.
Hope this solves your problem.


Thanks and regards,
ShreeyanshDBA Team
Shreeyansh Technologies



On Sat, Jan 9, 2016 at 3:07 PM, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:
Hi there!

Yep, it's correct: 
It looks like you have a setup A (Master) ---> B (Replica) ---> C (Replica, base backup from Replica B)

Master (A): 192.168.100.1
Slave1 (B): 192.168.100.2
Slave2 (C): 192.168.100.3

My recovery.conf in slave2(C) is:
restore_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
archive_cleanup_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/pg_archivecleaup_mv.bash -d "../wal_archive" "%r"'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.100.2 port=5432 user=replicator application_name=replication_slave02'
So it seems right to me... Is that what you mean?

Thanks


Lucas Possamai


On 9 January 2016 at 22:25, Shreeyansh Dba <shreeyansh2014@gmail.com> wrote:
On Sat, Jan 9, 2016 at 8:29 AM, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:
* NOTE: I ran the pg_basebackup from another STANDBY SERVER. Not from the MASTER



Lucas Possamai


On 9 January 2016 at 15:28, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:
Still trying to solve the problem...
Anyone can help please?

Lucas


Lucas Possamai


On 9 January 2016 at 14:45, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:
Sure... Here's the total information:

recovery.conf:
restore_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
archive_cleanup_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/pg_archivecleaup_mv.bash -d "../wal_archive" "%r"'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.100.XX port=5432 user=replicator application_name=replication_new_slave'



Lucas Possamai


On 9 January 2016 at 14:37, Ian Barwick <ian@2ndquadrant.com> wrote:
On 16/01/09 9:23, drum.lucas@gmail.com wrote:
> Hi all!
>
> I've done the pg_basebackup from the live to a new slave server...
>
> I've recovery the wal files, but now that I configured to replicate from the master (recovery.conf) I got this error:
>
> ../wal_archive/0000000400000C68000000C8 not found
> ../wal_archive/00000005.history not found
>
> FATAL:  timeline 2 of the primary does not match recovery target timeline 1

Can you post the contents of your recovery.conf file, suitably
anonymised if necessary?

Regards

Ian Barwick


Hi Lucas,

Following your question, I reproduced the same error:

cp: cannot stat `/pgdata/arch/00000003.history': No such file or directory
2016-01-09 14:11:42 IST FATAL:  timeline 1 of the primary does not
match recovery target timeline 2

It looks like you have a setup A (Master) ---> B (Replica) ---> C (Replica, base backup from Replica B)

It seems you used a recovery.conf written to replicate from the master for the new replica C, and there is a high probability that the primary connection info in C's recovery.conf was not changed (it should be Replica B's connection info).

During testing, providing B's connection info in C's recovery.conf resolved the issue.

Please verify the primary_conninfo parameter in C's recovery.conf; that might resolve your problem.



Thanks and regards,
ShreeyanshDBA Team
Shreeyansh Technologies





Hi Lucas,

It looks like the application_name parameter set in recovery.conf may be mismatched.
Please verify that the synchronous_standby_names value set in the postgresql.conf of Replica C matches the application_name used in recovery.conf.

Also, check whether async replication works without using application_name in recovery.conf of Replica C, and check the status in the pg_stat_replication catalog view.
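The pg_stat_replication check can be run on the upstream server (the one the replica's primary_conninfo points at), with a query along these lines:

```sql
-- One row per connected standby; application_name is whatever the standby
-- sent in its primary_conninfo.
SELECT application_name, client_addr, state, sync_state
FROM pg_stat_replication;
```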


Thanks and regards
ShreeyanshDBA Team
Shreeyansh Technologies





--


Lucas Possamai


pgsql-admin by date:

Previous
From: John Scalia
Date:
Subject: Re: Recovery - New Slave PostgreSQL 9.2
Next
From: "drum.lucas@gmail.com"
Date:
Subject: Re: Recovery - New Slave PostgreSQL 9.2