Thread: pg_standby replication problem

pg_standby replication problem

From
Khangelani Gama
Date:

 

Please help me with this: my secondary server shows a replication problem. It stopped at the file called 0000000500004BAF000000AF, and from then on the primary server kept sending walfiles until they used up the disc space in the data directory. How do I fix this problem? It's Postgres 9.1.2.

The Postgres log file Postgres-2014-06-08_000000.log has the following details:

2014-06-08 00:15:54 SAST LOG:  restored log file "0000000500004BAF000000AF" from archive
Trigger file:         /tmp/recovery.pgsql.trigger.5432
Waiting for WAL file: 0000000500004BAF000000B0
WAL file path:        /pgsql2/walfiles/0000000500004BAF000000B0
Restoring to:         pg_xlog/RECOVERYXLOG
Sleep interval:       2 seconds
Max wait interval:    0 forever
Command for restore:  cp "/pgsql2/walfiles/0000000500004BAF000000B0" "pg_xlog/RECOVERYXLOG"
Keep archive history: 0000000500004BAE000000F7 and later
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...

CONFIDENTIALITY NOTICE
The contents of and attachments to this e-mail are intended for the addressee only, and may contain the confidential
information of Argility (Proprietary) Limited and/or its subsidiaries. Any review, use or dissemination thereof by anyone
other than the intended addressee is prohibited.If you are not the intended addressee please notify the writer immediately
and destroy the e-mail. Argility (Proprietary) Limited and its subsidiaries distance themselves from and accept no liability
for unauthorised use of their e-mail facilities or e-mails sent other than strictly for business purposes.

Re: pg_standby replication problem

From
Khangelani Gama
Date:

 

 

Filesystem              Size  Used Avail Use% Mounted on
/dev/sda3                57G   15G   39G  28% /
/dev/mapper/vg0-pgsql2  5.4T  5.3T     0 100% /pgsql2
/dev/sda1                99M   12M   83M  13% /boot
tmpfs                    30G     0   30G   0% /dev/shm

Disc space breakdown:

4.0K    ./backup
12K     ./copy
4.9T    ./data
204K    ./test
16K     ./lost+found
361G    ./walfiles
5.3T    .

From: Khangelani Gama [mailto:kgama@argility.com]
Sent: Monday, June 09, 2014 1:42 PM
To: pgsql-general@postgresql.org
Subject: pg_standby replication problem

 

 


Re: pg_standby replication problem

From
Khangelani Gama
Date:

The big question we can't answer: when the replication reached this point (Command for restore:  cp "/pgsql2/walfiles/0000000500004BAF000000B0" "pg_xlog/RECOVERYXLOG"), it started to say "WAL file not present yet". We can't find this 0000000500004BAF000000B0 file anywhere.

 

Command for restore:  cp "/pgsql2/walfiles/0000000500004BAF000000B0" "pg_xlog/RECOVERYXLOG"
Keep archive history: 0000000500004BAE000000F7 and later
WAL file not present yet. Checking for trigger file...

-rw------- 1 postgres postgres 16M Jun  7 21:42 0000000500004BAE000000F7

From: Khangelani Gama [mailto:kgama@argility.com]
Sent: Monday, June 09, 2014 1:45 PM
To: pgsql-general@postgresql.org
Subject: RE: pg_standby replication problem

 

 

 


Re: pg_standby replication problem

From
Khangelani Gama
Date:

Please please help

 


Re: pg_standby replication problem

From
Adrian Klaver
Date:
On 06/09/2014 07:28 AM, Khangelani Gama wrote:
> Please please help

Before anyone can help, you will need to provide more information on what
your archiving and replication setup is. To begin:

1) Are you doing both archiving and streaming replication?

2) What are the settings in the configuration files for those operations?

3) What is the layout for archiving, in other words do the archived
files get copied remotely to a third site or some other arrangement?

4) What caused the trigger file to be set?
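
For question 2, the relevant lines can be pulled straight out of the configuration files. A minimal sketch, assuming you pass in the paths to postgresql.conf on the primary and recovery.conf on the standby yourself; the function name show_wal_settings is made up for illustration:

```shell
#!/bin/sh
# Print the uncommented WAL, archiving and recovery settings found in the
# given configuration files. Commented-out lines are skipped.
show_wal_settings() {
    grep -E '^[[:space:]]*(wal_level|archive_mode|archive_command|max_wal_senders|restore_command|standby_mode|primary_conninfo|trigger_file)[[:space:]]*=' "$@"
}
```

For example, `show_wal_settings /path/to/postgresql.conf` on the primary and `show_wal_settings /path/to/recovery.conf` on the standby; the paths here are placeholders.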




--
Adrian Klaver
adrian.klaver@aklaver.com


Re: pg_standby replication problem

From
Alan Hodgson
Date:
On Monday, June 09, 2014 04:28:53 PM Khangelani Gama wrote:
> Please help me with this, my secondary server shows a replication problem.
> It stopped at the file called *0000000500004BAF000000AF …*then from here
> primary server kept on sending walfiles, until the walfiles used up the
> disc space in the data directory. How do I fix this problem. It’s postgres
> 9.1.2.
>

It looks to me like your archive_command is probably failing on the primary
server. If that fails, the logs will build up and fill up your disk as
described. And they wouldn't be available to the slave to find.
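
One way to check for this on the primary: PostgreSQL marks each completed segment that is still waiting to be archived with a ".ready" file under pg_xlog/archive_status, so a count that keeps growing suggests archive_command is not succeeding. A rough sketch, assuming the data directory is passed as an argument; the function name count_unarchived is made up:

```shell
#!/bin/sh
# Count WAL segments that are still waiting to be archived.
# $1 = data directory of the primary (pass your own path).
count_unarchived() {
    n=0
    for f in "$1"/pg_xlog/archive_status/*.ready; do
        # With no matches the pattern stays literal, so check the file exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Running it a few minutes apart and comparing the two numbers gives a quick idea of whether the backlog is growing or draining.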



Re: pg_standby replication problem

From
Khangelani Gama
Date:
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Alan Hodgson
Sent: Monday, June 09, 2014 4:51 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] pg_standby replication problem

On Monday, June 09, 2014 04:28:53 PM Khangelani Gama wrote:
> Please help me with this, my secondary server shows a replication problem.
> It stopped at the file called *0000000500004BAF000000AF …*then from
> here primary server kept on sending walfiles, until the walfiles used
> up the disc space in the data directory. How do I fix this problem.
> It’s postgres 9.1.2.
>

It looks to me like your archive_command is probably failing on the primary
server. If that fails, the logs will build up and fill up your disk as
described. And they wouldn't be available to the slave to find.


I am sorry, I am still trying to understand all the settings; the person who
set up the servers left the company.

On the primary server, postgresql.conf shows the following:

# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

wal_level = archive
# - Checkpoints -

checkpoint_segments = 128
checkpoint_timeout = 15min
checkpoint_warning = 885s
# - Archiving -

archive_mode = on
#archive_mode = off             # allows archiving to be done
archive_command = '/home/cdbs/bin/run_replication.sh %p %f'

# REPLICATION
#------------------------------------------------------------------------------

# - Master Server -

# These settings are ignored on a standby server

max_wal_senders = 3



The archive_command setting points to a script, which is run with the
variables %p and %f passed to it.

The replication script running on the primary server contains the
following:


while [ $test = "false" ]
do
        rsync -a /pgsql2/data/${src} postgres@10.58.101.10:/pgsql2/walfiles/${dest} >> /tmp/run_replication.sh.out 2>> /tmp/run_replication.sh.out
        test=`ssh AB_CDS3 "if [ -f /pgsql2/walfiles/${dest} ];then echo 'true' ;else echo 'false';fi"`
        if [ ${test} = "false" ]
        then
                echo "Test is false for CDS3, sleeping 10" >> /tmp/run_replication.sh.out
                sleep 10
                cnt=$(( $cnt + 1 ))
                if [ ${cnt} -ge 60 ]
                then
                        message="Replication ERROR: Unable to send WAL file(${desc}) from CDS to CDS3"
                        echo "`date` : ${message}" >> /tmp/run_replication.sh.out
                        sendsms
                fi
        fi
done
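
A note on the loop above. PostgreSQL treats a zero exit status from archive_command as confirmation that the segment was archived and may then recycle it, so the script needs to return non-zero whenever the copy cannot be confirmed. Also, if the ssh check prints nothing (for example on a connection timeout), $test is empty and the unquoted [ $test = "false" ] becomes a syntax error for [, which ends the loop. The excerpt does not show how the script exits, so whether that happened here is only a guess. The sketch below shows the general shape of a bounded retry that reports failure; ARCHIVE_DIR, copy_segment, segment_exists and archive_segment are made-up names, and the copy is a local cp so the sketch runs without ssh, where the real script would use its rsync and ssh commands:

```shell
#!/bin/sh
# Sketch of an archive step that reports failure through its exit status.
# ARCHIVE_DIR is a local directory here (an assumption for illustration).
ARCHIVE_DIR="${ARCHIVE_DIR:-/pgsql2/walfiles}"

copy_segment() {        # $1 = source path (%p), $2 = file name (%f)
    # Copy under a temporary name, then rename, so a partial file is never
    # visible under the final name.
    cp "$1" "$ARCHIVE_DIR/$2.tmp" && mv "$ARCHIVE_DIR/$2.tmp" "$ARCHIVE_DIR/$2"
}

segment_exists() {      # $1 = file name (%f)
    [ -f "$ARCHIVE_DIR/$1" ]
}

archive_segment() {     # $1 = %p, $2 = %f, $3 = max attempts, $4 = delay in seconds
    src="$1"; dest="$2"; max_tries="${3:-60}"; delay="${4:-10}"
    cnt=0
    while [ "$cnt" -lt "$max_tries" ]; do
        # Every expansion is quoted so an empty value cannot break the test.
        if copy_segment "$src" "$dest" && segment_exists "$dest"; then
            return 0    # success: PostgreSQL may now recycle the segment
        fi
        cnt=$((cnt + 1))
        if [ "$cnt" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1            # failure: PostgreSQL keeps the segment and retries later
}
```

A script built this way would end with something like `archive_segment "$1" "$2" || exit 1`, so that a failed transfer leaves the segment in pg_xlog on the primary instead of being lost.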









Re: pg_standby replication problem

From
Khangelani Gama
Date:
I just got this from the primary server (/tmp/run_replication.sh.out); the
secondary server's IP is 10.58.101.10.


replication started: Sun Jun  8 00:05:26 SAST 2014 source: pg_xlog/0000000500004BAF000000AF, dest: 0000000500004BAF000000AF
replication finished: Sun Jun  8 00:05:33 SAST 2014
replication started: Sun Jun  8 00:05:33 SAST 2014 source: pg_xlog/0000000500004BAF000000B0, dest: 0000000500004BAF000000B0
ssh: connect to host 10.58.101.10 port 22: Connection timed out^M
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
replication finished: Sun Jun  8 00:07:41 SAST 2014
replication started: Sun Jun  8 00:07:41 SAST 2014 source: pg_xlog/0000000500004BAF000000B1, dest: 0000000500004BAF000000B1
replication finished: Sun Jun  8 00:07:53 SAST 2014
replication started: Sun Jun  8 00:07:53 SAST 2014 source: pg_xlog/0000000500004BAF000000B2, dest: 0000000500004BAF000000B2
replication finished: Sun Jun  8 00:07:57 SAST 2014
replication started: Sun Jun  8 00:07:58 SAST 2014 source: pg_xlog/0000000500004BAF000000B3, dest: 0000000500004BAF000000B3
replication finished: Sun Jun  8 00:08:06 SAST 2014
replication started: Sun Jun  8 00:08:06 SAST 2014 source: pg_xlog/0000000500004BAF000000B4, dest: 0000000500004BAF000000B4
replication finished: Sun Jun  8 00:08:11 SAST 2014
replication started: Sun Jun  8 00:08:11 SAST 2014 source: pg_xlog/0000000500004BAF000000B5, dest: 0000000500004BAF000000B5
replication finished: Sun Jun  8 00:08:16 SAST 2014
replication started: Sun Jun  8 00:08:16 SAST 2014 source: pg_xlog/0000000500004BAF000000B6, dest: 0000000500004BAF000000B6
replication finished: Sun Jun  8 00:08:22 SAST 2014
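
In this log, the only transfer with ssh/rsync errors between "started" and "finished" is 0000000500004BAF000000B0, the file the standby is waiting for. To look for other such segments in a longer log, a small filter along these lines might help. It assumes the log format shown here (a "dest:" entry naming each segment, with error lines starting with "ssh:" or "rsync"), and failed_segments is a made-up name:

```shell
#!/bin/sh
# List the WAL segments whose transfer logged an ssh or rsync error.
# $1 = log file written by the replication script.
failed_segments() {
    awk '
        # Remember the segment named on the most recent "dest:" line.
        /dest: / { seg = $NF }
        # An ssh or rsync error line belongs to the segment currently in flight.
        /^(ssh|rsync)( error)?:/ && seg != "" { failed[seg] = 1 }
        END { for (s in failed) print s }
    ' "$1" | sort
}
```

For example, `failed_segments /tmp/run_replication.sh.out` prints one segment name per line, sorted.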





Re: pg_standby replication problem

From
Khangelani Gama
Date:
-----Original Message-----
From: Khangelani Gama [mailto:kgama@argility.com]
Sent: Monday, June 09, 2014 5:26 PM
To: 'Alan Hodgson'; 'pgsql-general@postgresql.org'
Subject: RE: [GENERAL] pg_standby replication problem

Since there was a connection timeout problem on the primary server, how can
I free up disc space on the secondary server so that replication can continue
from where it stopped? Do I remove walfiles from the secondary server?

Disc space Breakdown:


4.0K    ./backup
12K     ./copy
4.9T    ./data
204K    ./test
16K     ./lost+found
361G    ./walfiles
5.3T    .

