Thread: pg_standby replication problem

pg_standby replication problem

From
Khangelani Gama
Date:
Hi All

I would like to re-post the problem we have. The secondary server ran out of
disc space due to the replication problem (connection timeout). Since there
was a connection timeout problem on the primary server, how can I free up disc
space on the secondary server so that replication can continue from where it
stopped? Do I remove walfiles from the secondary server?


Below are the details of the log file from the primary server:

replication started: Sun Jun  8 00:05:26 SAST 2014 source: pg_xlog/0000000500004BAF000000AF, dest: 0000000500004BAF000000AF
replication finished: Sun Jun  8 00:05:33 SAST 2014
replication started: Sun Jun  8 00:05:33 SAST 2014 source: pg_xlog/0000000500004BAF000000B0, dest: 0000000500004BAF000000B0
ssh: connect to host 10.58.101.10 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
replication finished: Sun Jun  8 00:07:41 SAST 2014
replication started: Sun Jun  8 00:07:41 SAST 2014 source: pg_xlog/0000000500004BAF000000B1, dest: 0000000500004BAF000000B1
replication finished: Sun Jun  8 00:07:53 SAST 2014
replication started: Sun Jun  8 00:07:53 SAST 2014 source: pg_xlog/0000000500004BAF000000B2, dest: 0000000500004BAF000000B2
replication finished: Sun Jun  8 00:07:57 SAST 2014
replication started: Sun Jun  8 00:07:58 SAST 2014 source: pg_xlog/0000000500004BAF000000B3, dest: 0000000500004BAF000000B3
replication finished: Sun Jun  8 00:08:06 SAST 2014
replication started: Sun Jun  8 00:08:06 SAST 2014 source: pg_xlog/0000000500004BAF000000B4, dest: 0000000500004BAF000000B4
replication finished: Sun Jun  8 00:08:11 SAST 2014
replication started: Sun Jun  8 00:08:11 SAST 2014 source: pg_xlog/0000000500004BAF000000B5, dest: 0000000500004BAF000000B5
replication finished: Sun Jun  8 00:08:16 SAST 2014
replication started: Sun Jun  8 00:08:16 SAST 2014 source: pg_xlog/0000000500004BAF000000B6, dest: 0000000500004BAF000000B6
replication finished: Sun Jun  8 00:08:22 SAST 2014




Disc space Breakdown:


4.0K    ./backup
12K     ./copy
4.9T    ./data
204K    ./test
16K     ./lost+found
361G    ./walfiles
5.3T    .



Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3              57G   15G   39G  28% /
/dev/mapper/vg0-pgsql2
                      5.4T  5.3T     0 100% /pgsql2
/dev/sda1              99M   12M   83M  13% /boot
tmpfs                  30G     0   30G   0% /dev/shm


CONFIDENTIALITY NOTICE
The contents of and attachments to this e-mail are intended for the addressee only, and may contain the confidential
information of Argility (Proprietary) Limited and/or its subsidiaries. Any review, use or dissemination thereof by anyone
other than the intended addressee is prohibited. If you are not the intended addressee please notify the writer immediately
and destroy the e-mail. Argility (Proprietary) Limited and its subsidiaries distance themselves from and accept no liability
for unauthorised use of their e-mail facilities or e-mails sent other than strictly for business purposes.



Re: pg_standby replication problem

From
Khangelani Gama
Date:
This is a standby replication setup, with archive_command sending the
WALs from the master to the standby.



-----Original Message-----
From: Khangelani Gama [mailto:kgama@argility.com]
Sent: Monday, June 09, 2014 8:06 PM
To: pgsql-general@postgresql.org
Subject: pg_standby replication problem




Re: pg_standby replication problem

From
Alan Hodgson
Date:
On Monday, June 09, 2014 08:05:41 PM Khangelani Gama wrote:
> Hi All
>
> I would like to re-post the problem we have. The secondary server ran out
> the disc space due the replication problem (Connection Time out).

The secondary server would not (could not) run out of drive space due to a
problem on the primary. You probably need to figure out why that server is out
of drive space and fix it, and then I expect your replication problem will fix
itself. If you do not have a process cleaning up old archived WAL files, and
those are stored on the secondary, that could be the source of your problem.

If you also have a separate networking issue (for the connection timeout),
then you might need to fix that, too.
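The cleanup described above can be sketched in Python. This is only an illustration under stated assumptions, not the poster's setup: the function name segments_older_than and the archive directory argument are hypothetical, all segments are assumed to be on the same timeline, and the sketch only lists candidates without deleting anything. In practice the bundled tools (pg_standby given the %r restart file name, or pg_archivecleanup) are the safer way to prune an archive.

```python
import os
import re

# A WAL segment file name is 24 uppercase hexadecimal digits
# (timeline, log, segment), e.g. 0000000500004BAF000000B0.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def segments_older_than(archive_dir, oldest_kept):
    """List WAL segment file names in archive_dir that sort before oldest_kept.

    Assumption: every segment is on the same timeline, so comparing the
    fixed-width hex names as strings matches WAL order. Nothing is deleted;
    the caller decides what to do with the returned names.
    """
    if not WAL_SEGMENT_RE.match(oldest_kept):
        raise ValueError("not a WAL segment name: %r" % oldest_kept)
    return sorted(
        name
        for name in os.listdir(archive_dir)
        # Skip anything that is not a plain segment file (.backup, notes, ...).
        if WAL_SEGMENT_RE.match(name) and name < oldest_kept
    )
```

The value passed as oldest_kept has to be the oldest segment the standby still needs in order to restart recovery (what %r expands to in the recovery commands). Removing anything at or after that point would break the standby.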




Re: pg_standby replication problem

From
Adrian Klaver
Date:
On 06/09/2014 11:15 AM, Khangelani Gama wrote:
> This is the standby replication setting with archive_command sending the
> WALs from master to standby.
>
>

What are the conf settings on the standby server?


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: pg_standby replication problem

From
Khangelani Gama
Date:
-----Original Message-----
From: Adrian Klaver [mailto:adrian.klaver@aklaver.com]
Sent: Tuesday, June 10, 2014 1:42 AM
To: Khangelani Gama; pgsql-general@postgresql.org
Subject: Re: [GENERAL] pg_standby replication problem

On 06/09/2014 11:15 AM, Khangelani Gama wrote:
> This is the standby replication setting with archive_command sending
> the WALs from master to standby.
>
>



Thanks for the feedback, everyone. I will try to remove the correct old
walfiles.




> What are the conf settings on the standby server?



Standby server config settings are as follows:


# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

wal_level = archive
#wal_level = minimal                    # minimal, archive, or hot_standby
                                        # (change requires restart)
#fsync = on                             # turns forced synchronization on or off
synchronous_commit = off                # synchronization level; on, off, or local
#wal_sync_method = fsync




# - Checkpoints -

checkpoint_segments = 128
checkpoint_timeout = 15min
checkpoint_warning = 885s
#checkpoint_segments = 3

# - Archiving -

archive_mode = on
#archive_mode = off             # allows archiving to be done
                                # (change requires restart)

# REPLICATION
#------------------------------------------------------------------------------

# - Master Server -

# These settings are ignored on a standby server

max_wal_senders = 3             # max number of walsender processes
                                # (change requires restart)



# This is used when logging to stderr:
logging_collector = on
#logging_collector = off












Re: pg_standby replication problem

From
Adrian Klaver
Date:
On 06/09/2014 10:02 PM, Khangelani Gama wrote:
> -----Original Message-----
> From: Adrian Klaver [mailto:adrian.klaver@aklaver.com]
> Sent: Tuesday, June 10, 2014 1:42 AM
> To: Khangelani Gama; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] pg_standby replication problem
>
> On 06/09/2014 11:15 AM, Khangelani Gama wrote:
>> This is the standby replication setting with archive_command sending
>> the WALs from master to standby.
>>
>>
>
>
>
> Thanks  for feedback from everyone, I will try and remove the correct old
> walfiles.
>
>
>
>
> What are the conf settings on the standby server?
>
>
>
> Standby server config settings are as follows:
>

My mistake, I should have been more specific.

What is in your recovery.conf file on the standby?

Is the standby something you really want to try to rescue at this point
or would it be feasible just to start over?

If you do decide to start over, a little time spent on what you want to
happen would help out.

Options to look at:

1) Streaming replication. WAL files are streamed from master to standby.

2) Hot standby. The standby can process read only queries while in
standby mode.

3) Archiving. WAL files are archived in a location for possible use by
standby. Needs some mechanism to prune files or you could fill a disk again.

All 3 of the above can be combined if desired, which is where the
thinking time part comes in.

Places to start looking for information:

http://www.postgresql.org/docs/9.3/interactive/high-availability.html

https://wiki.postgresql.org/wiki/Binary_Replication_Tutorial



--
Adrian Klaver
adrian.klaver@aklaver.com


Re: pg_standby replication problem

From
Khangelani Gama
Date:
Thank you, I will have a look.

-----Original Message-----
From: Adrian Klaver [mailto:adrian.klaver@aklaver.com]
Sent: Tuesday, June 10, 2014 3:45 PM
To: Khangelani Gama; pgsql-general@postgresql.org
Subject: Re: [GENERAL] pg_standby replication problem
