Re: Recovery - New Slave PostgreSQL 9.2 - Mailing list pgsql-admin

From: Rajesh Madiwale
Subject: Re: Recovery - New Slave PostgreSQL 9.2
Msg-id: CALDEMcQRS8sWYpS-bpXa_1bDq+E15z3SMk0BUuJ1XBjd7enhJA@mail.gmail.com
In response to: Re: Recovery - New Slave PostgreSQL 9.2 ("drum.lucas@gmail.com" <drum.lucas@gmail.com>)
Responses: Re: Recovery - New Slave PostgreSQL 9.2
List: pgsql-admin
Hi Lucas,
If a .history file is present in the new standby's pg_xlog directory, move it out of that directory; check for the same file in wal_archive and move it from there as well, then try restarting the new standby.
Regards,
Rajesh.
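Rajesh's suggestion above — moving any stray .history files out of the standby's pg_xlog and wal_archive before restarting — could be sketched as follows. This is a demonstration on scratch directories standing in for the real paths (which, going by the thread, would be something like /var/lib/pgsql/9.2/data/pg_xlog and /var/lib/pgsql/9.2/wal_archive); the quarantine directory is an assumption for illustration.

```shell
# Scratch directories standing in for the standby's pg_xlog and
# wal_archive, pre-seeded with the history file from the thread.
scratch=$(mktemp -d)
mkdir -p "$scratch/pg_xlog" "$scratch/wal_archive" "$scratch/history_quarantine"
touch "$scratch/pg_xlog/00000005.history" "$scratch/wal_archive/00000005.history"

# Move the .history files aside rather than deleting them, so they can
# be put back if removing them turns out to be the wrong call.
mv "$scratch/pg_xlog/"*.history    "$scratch/history_quarantine/pg_xlog.00000005.history"
mv "$scratch/wal_archive/"*.history "$scratch/history_quarantine/wal_archive.00000005.history"

ls "$scratch/history_quarantine"
```

On the real server the same two mv commands would run against the live paths, followed by a restart of the standby (e.g. with pg_ctl restart).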
On Sunday, January 10, 2016, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:

Should I point the new slave's replication at the same DB?

Lucas
On Sunday, 10 January 2016, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:

John,

> I'd recommend that you specify -X s, as just specifying -X or
> --xlog gives you the default value of fetch rather than stream.

Sorry, I understood it wrong. So you'd recommend re-running the pg_basebackup with --xlog-method=stream?

I'd hope we can find another way, as the pg_basebackup takes 30h to complete :(

On 10 January 2016 at 12:21, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:

> I'd recommend that you specify -X s, as just specifying -X or
> --xlog gives you the default value of fetch rather than stream. Also, from
> your current WAL directory listing that you just provided, that's indicating
> that your server's timelines are far different.

I don't think it's necessary to use -X - check HERE:

> --xlog
> Using this option is equivalent to using -X with method fetch.

-------------------------------

> Now, you're saying that one system went down, which is why you're trying to
> do this, but was it the first slave that failed? Or did your primary fail?
> That would possibly explain why the timelines are different. If your primary
> failed and this standby assumed command, then its timeline would have
> incremented. So, if you're trying to put this one back as a slave, that's not
> a really trivial process. You'd have to set the old primary back up as a
> slave to the current primary, and then execute another failover, this time
> back to your original primary, and then rebuild all the slaves all over.

PAST SCENARIO:
master1 --> slave1 --> slave2
        --> slave1 --> db-slave0 (this one went down)

NEW SCENARIO:
master1 --> slave1 --> slave2
        --> slave1 --> newslave (this is the one I'm setting up)

On 10 January 2016 at 12:16, John Scalia <jayknowsunix@gmail.com> wrote:

I'd recommend that you specify -X s, as just specifying -X or --xlog gives you the default value of fetch rather than stream. Also, from your current WAL directory listing that you just provided, that's indicating that your server's timelines are far different.

Now, you're saying that one system went down, which is why you're trying to do this, but was it the first slave that failed? Or did your primary fail? That would possibly explain why the timelines are different. If your primary failed and this standby assumed command, then its timeline would have incremented. So, if you're trying to put this one back as a slave, that's not a really trivial process. You'd have to set the old primary back up as a slave to the current primary, and then execute another failover, this time back to your original primary, and then rebuild all the slaves all over.

Just saying,
Jay
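For what John is recommending, a sketch of a pg_basebackup run with WAL streaming follows. Host, user, and target directory are placeholders adapted from the commands quoted in this thread, not a tested recipe for this cluster.

```shell
# -X stream (long form --xlog-method=stream) opens a second replication
# connection and streams WAL while the backup runs, instead of fetching
# it at the end, so the result does not depend on the WAL archive.
# Note: in 9.2, stream mode requires --format=plain; it is not
# compatible with the tar format used elsewhere in this thread.
pg_basebackup \
  --pgdata=/var/lib/pgsql/9.2/data \
  --format=plain \
  --xlog-method=stream \
  --progress \
  --host=slave1 --port=5432 --username=replicator
```

Because stream mode cannot write a tar stream to stdout, the tar-over-ssh-with-pv pipeline shown later in the thread would need rearranging: pg_basebackup would run on the new slave against the upstream directly, with any bandwidth limiting done at the network level instead of via pv.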
Sent from my iPad

Hi John,

> First, when you built the slave server, I'm assuming you used pg_basebackup
> and if you did, did you specify -X s in your command?

Yep. I ran the pg_basebackup onto the new slave from ANOTHER SLAVE...

ssh postgres@slave1 'pg_basebackup --pgdata=- --format=tar --label=bb_master --progress --host=localhost --port=5432 --username=replicator --xlog | pv --quiet --rate-limit 100M' | tar -x --no-same-owner

(-X = --xlog)

On my new slave, I've got all the WAL archives. (The master copies the WAL there all the time...)

ls /var/lib/pgsql/9.2/wal_archive:
0000000200000C6A0000002D
0000000200000C6A0000002E

and not:

../wal_archive/0000000400000C68000000C8` not found
../wal_archive/00000005.history` not found

Remember that I'm trying to do a cascading replication. (It was working with another slave, but that server went down and I'm trying to set up a new one.)

> I would suggest, in spite of the 2TB size, rebuilding the standby servers
> with a proper pg_basebackup.

I've already run the pg_basebackup more than once, and I always get the same error... :(

Is there anything else, guys? Please, help hehehe

On 10 January 2016 at 10:33, John Scalia <jayknowsunix@gmail.com> wrote:

Hi,

I'm a little late to this thread, but in looking at the errors you originally posted, two things come to mind:

First, when you built the slave server, I'm assuming you used pg_basebackup and if you did, did you specify -X s in your command?

Second, the missing history file isn't an issue, in case you're unfamiliar with this. However, yeah, the missing WAL segment is, as well as the bad timeline error. Is that missing segment still on your primary? You could just copy it manually to your standby and start from that. As far as the timeline error, that's disturbing to me, as it's claiming the primary is actually a failed-over standby. AFAIK, that's the main if not only way transaction timelines increment.

I would suggest, in spite of the 2TB size, rebuilding the standby servers with a proper pg_basebackup.

--
Jay
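A side note that may help interpret the errors: the timeline John mentions is visible directly in the WAL file names quoted in this thread — the first 8 hex digits of a 24-digit segment name are the timeline ID. A small bash sketch:

```shell
# A WAL segment name is 24 hex digits:
# timeline ID (8) + log id (8) + segment number (8).
have=0000000200000C6A0000002D   # a segment present in the new slave's wal_archive
want=0000000400000C68000000C8   # the segment recovery reported as missing
echo "timeline of archived segment:  ${have:0:8}"
echo "timeline of requested segment: ${want:0:8}"
```

The archive holds timeline-2 segments while recovery is asking for a timeline-4 segment and a 00000005.history file, which fits John's reading that failovers upstream have advanced the timeline past what this standby's files describe.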
Sent from my iPad

Hi, thanks for your reply... I've been working on this problem for 20h =(

# cat postgresql.conf | grep synchronous_standby_names
#synchronous_standby_names = ''    (it's commented out)

# cat postgresql.conf | grep application_name
log_line_prefix = '%m|%p|%q[%c]@%r|%u|%a|%d '    ( %a = application name )

I can't resync the whole DB again, because it has 2TB of data :(

Is there anything else I can do?

Thank you

On 10 January 2016 at 04:22, Shreeyansh Dba <shreeyansh2014@gmail.com> wrote:

On Sat, Jan 9, 2016 at 3:28 PM, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:

My recovery was like that! I was already using that way... I still have the problem =\

Is there anything I can do?

On 9 January 2016 at 22:53, Shreeyansh Dba <shreeyansh2014@gmail.com> wrote:

Hi Lucas,

Yes, the recovery.conf looks good now. Hope this solves your problem.

Thanks and regards,
Shreeyansh DBA Team
Shreeyansh Technologies

On Sat, Jan 9, 2016 at 3:07 PM, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:

Hi there!

Yep, it's correct:

> It looks like you have a setup A (Master) ---> B (Replica) ---> C Replica
> (base backup from Replica B)

Master (A): 192.168.100.1
Slave1 (B): 192.168.100.2
Slave2 (C): 192.168.100.3

My recovery.conf in slave2 (C) is:

restore_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
archive_cleanup_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/pg_archivecleaup_mv.bash -d "../wal_archive" "%r"'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.100.2 port=5432 user=replicator application_name=replication_slave02'

So, that seems right to me... Is that what you mean?

Thanks

On 9 January 2016 at 22:25, Shreeyansh Dba <shreeyansh2014@gmail.com> wrote:

On Sat, Jan 9, 2016 at 8:29 AM, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:

* NOTE: I ran the pg_basebackup from another STANDBY SERVER, not from the MASTER.

On 9 January 2016 at 15:28, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:

Still trying to solve the problem... Can anyone help, please?

Lucas

On 9 January 2016 at 14:45, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:

Sure... Here's the full information:

recovery.conf:

restore_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
archive_cleanup_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/pg_archivecleaup_mv.bash -d "../wal_archive" "%r"'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.100.XX port=5432 user=replicator application_name=replication_new_slave'
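One mechanical check for a cascading setup like this: confirm that primary_conninfo in C's recovery.conf names the upstream replica B (192.168.100.2 in this thread), not the master. The sketch below greps a scratch copy of the file; the scratch path is an assumption for illustration — on the real server the target would be the recovery.conf in the data directory.

```shell
# Write a scratch copy of the relevant recovery.conf lines from the
# thread, then extract the host from primary_conninfo. For cascading
# replication from B it should be 192.168.100.2, not the master's IP.
conf=$(mktemp)
cat > "$conf" <<'EOF'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.100.2 port=5432 user=replicator application_name=replication_slave02'
EOF
grep -oE 'host=[0-9.]+' "$conf"
```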
On 9 January 2016 at 14:37, Ian Barwick <ian@2ndquadrant.com> wrote:

On 16/01/09 9:23, drum.lucas@gmail.com wrote:
> Hi all!
>
> I've done the pg_basebackup from the live to a new slave server...
>
> I've recovery the wal files, but now that I configured to replicate from the master (recovery.conf) I got this error:
>
> ../wal_archive/0000000400000C68000000C8` not found
> ../wal_archive/00000005.history` not found
>
> FATAL: timeline 2 of the primary does not match recovery target timeline 1
Can you post the contents of your recovery.conf file, suitably
anonymised if necessary?
Regards
Ian Barwick

Hi Lucas,

I followed your question and reproduced the same error:

cp: cannot stat `/pgdata/arch/00000003.history': No such file or directory
2016-01-09 14:11:42 IST FATAL: timeline 1 of the primary does not match recovery target timeline 2

It looks like you have a setup A (Master) ---> B (Replica) ---> C Replica (base backup from Replica B).

It seems you have used the recovery.conf from the master-to-slave replication for the new replica C, and there is a high probability the primary connection info in C's recovery.conf was not changed to Replica B's connection info. During testing, providing B's connection info in C's recovery.conf resolved the issue.

Please verify the primary connection info parameter in recovery.conf on replica C; that might resolve your problem.

Thanks and regards,
Shreeyansh DBA Team
Shreeyansh Technologies

Hi Lucas,

It looks like the application_name parameter set in recovery.conf may be mismatched.
Please verify that the synchronous_standby_names value set in the postgresql.conf of Replica C matches the value used as application_name in recovery.conf.

Also, check whether async replication works without using application_name in replica C's recovery.conf, and check the status in the pg_stat_replication catalog view.

Thanks and regards,
Shreeyansh DBA Team
Shreeyansh Technologies
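The pg_stat_replication check Shreeyansh mentions would be run on the upstream server — B in this setup, since C replicates from it. A sketch, with the connection details as placeholders rather than a tested command:

```shell
# Each standby connected to this server appears as one row;
# application_name here should match the application_name configured in
# the standby's primary_conninfo in recovery.conf.
psql -h 192.168.100.2 -p 5432 -U replicator -d postgres -x -c \
  "SELECT application_name, client_addr, state, sync_state
     FROM pg_stat_replication;"
```

If the new slave's row is missing entirely, the walreceiver never connected, which points back at primary_conninfo; if the row is there but sync_state is async despite synchronous_standby_names being set, the names likely do not match.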