Home > mailing lists

Re: postgres 9.0.2 replicated database is crashing - Mailing list pgsql-general

From	akp geek
Subject	Re: postgres 9.0.2 replicated database is crashing
Date	March 9, 2013 21:51:08
Msg-id	CACnhOfJ_D66_roOM8k2taMj-MCWmQJomO2go=KP9dmZmBi4Ztw@mail.gmail.com Whole thread Raw
In response to	Re: postgres 9.0.2 replicated database is crashing (Lonni J Friedman <netllama@gmail.com>)
Responses	Re: postgres 9.0.2 replicated database is crashing
List	pgsql-general

Tree view

thank you. As you mentioned, I understood that I am starting the streaming scratch which is not what I wanted to do.

Here is what I am planning to .

Our replication process was down since March5th.

1. Is it Ok to get all wals from March5th till now to standby pg_xlog

2. take pg_basebackup

3. export the data dir

4. stop backup

5. restart the standby.

Based on my understanding it should work, because when standby started, it is not able find the files, as I have deleted from the slave them when I tried to fix the issue

00000 2013-03-09 04:56:08 GMT LOG: entering standby mode

cp: cannot access /backup/9.0.4/archive/0000000100000106000000E1

00000 2013-03-09 04:56:08 GMT LOG: redo starts at 106/E1000020

cp: cannot access /backup/9.0.4/archive/0000000100000106000000E2

cp: cannot access /backup/9.0.4/archive/0000000100000106000000E3

cp: cannot access /backup/9.0.4/archive/0000000100000106000000E4

00000 2013-03-09 04:56:10 GMT LOG: consistent recovery state reached at 106/E45AD4A8

00000 2013-03-09 04:56:10 GMT LOG: database system is ready to accept read only connections

cp: cannot access /backup/9.0.4/archive/0000000100000106000000E5

cp: cannot access /backup/9.0.4/archive/0000000100000106000000E6

cp: cannot access /backup/9.0.4/archive/0000000100000106000000E7

cp: cannot access /backup/9.0.4/archive/0000000100000106000000E8

cp: cannot access /backup/9.0.4/archive/0000000100000106000000E9

cp: cannot access /backup/9.0.4/archive/0000000100000106000000EA

00000 2013-03-09 04:56:12 GMT LOG: invalid record length at 106/EA10B8C0

cp: cannot access /backup/9.0.4/archive/0000000100000106000000EA

00000 2013-03-09 04:56:12 GMT LOG: streaming replication successfully connected to primary

[unknown] 00000 2013-03-09 04:57:00 GMT [unknown]LOG: connection received: host=[local]

[unknown] 00000 2013-03-09 04:57:03 GMT [unknown]LOG: connection received: host=[local]

[unknown] 10.155.253.43(51257) 00000 SELECT 2013-03-09 07:07:18 GMT prodLog: duration: 6316.649 ms

[unknown] 10.155.253.43(51257) 00000 idle 2013-03-09 07:47:53 GMT prodLog: disconnection: session time: 0:41:06.529 user=postgres database=fprod host=

10.155.253.43 port=51257

00000 2013-03-09 07:55:48 GMT LOG: restartpoint starting: time

00000 2013-03-09 08:25:47 GMT LOG: restartpoint complete: wrote 19419 buffers (7.4%); write=1799.792 s, sync=0.066 s, total=1799.867 s

00000 2013-03-09 08:25:47 GMT LOG: recovery restart point at 107/FB01B238

00000 2013-03-09 08:25:47 GMT DETAIL: last completed transaction was at log time 2013-03-09 08:25:41.85776+00

00000 2013-03-09 08:55:48 GMT LOG: restartpoint starting: time

psql [local] 08P01 idle 2013-03-09 09:10:52 GMT prodLog: unexpected EOF on client connection

psql [local] 00000 idle 2013-03-09 09:10:52 GMT prodLog: disconnection: session time: 2:15:06.351 user=postgres database=fprod host=[local]

XX000 2013-03-09 09:23:46 GMT FATAL: failed to add item to index page in 100924/100937/1225845

XX000 2013-03-09 09:23:46 GMT CONTEXT: xlog redo Insert item, node: 100924/100937/1225845 blkno: 72666 offset: 234 nitem: 1 isdata: F isleaf F isdelete

F updateBlkno:36483

00000 2013-03-09 09:23:46 GMT LOG: startup process (PID 3880) exited with exit code 1

00000 2013-03-09 09:23:46 GMT LOG: terminating any other active server processes

thanks

On Sat, Mar 9, 2013 at 11:16 AM, Lonni J Friedman <netllama@gmail.com> wrote:

That process merely sets up a new server, it doesn't start streaming,
unless the server has been configured correctly. You state that the
slave crashed after two hours. How did you make this determination?

All you seem to be doing is setting up the slave from scratch
repeatedly, and assuming that it will magically just work, rather than
understanding & debugging why its not working. Where is the log
output from your servers that shows what is really transpiring?

On Sat, Mar 9, 2013 at 6:51 AM, akp geek <akpgeek@gmail.com> wrote:
> Hi all -
>
> I am in desperate need of your help. The replication/streaming
> stopped working on March5th. I followed the following procedure to restart
> the streaming. After running it for couple of hours , the database is
> crashing on the slave. This is on our production server. Thanks for your
> help.
>
> Steps that I followed
>
> 1. SELECT pg_start_backup('label', true);
> 2. I created tar of data.
> 3. un tar the data on the slave.
> 4. deleted the pid file on the slave
> 5. created recovery.conf
> 5. SELECT pg_stop_backup(); on the primary
> 6. started the postgres on slave
> 7. After 2 hours, the slave server crashes
>
> Please help.
>
> Regards
>

pgsql-general by date:

From: Paul Jungwirth
Date: 09 March 2013, 21:20:33
Subject: Avoiding a deadlock

From: Lonni J Friedman
Date: 09 March 2013, 22:09:46
Subject: Re: postgres 9.0.2 replicated database is crashing

Re: postgres 9.0.2 replicated database is crashing - Mailing list pgsql-general

Previous

Next