Thread: Streaming replica refuse to do restore_command

Streaming replica refuse to do restore_command

From
Soni M
Date:
Hello All,

This is how i set up the db :
Slave using streaming replica.
We configure slave to run pg_dump which usually last for about 12 hours.
We have limited pg_xlog on slave.
Once the pg_xlog on slave is full while pg_dump still in progress.

2014-08-11 09:39:23.226 CDT,,,25779,,53d26b30.64b3,2,,2014-07-25 09:35:28 CDT,,0,FATAL,53100,"could not write to file ""pg_xlog/xlogtemp.25779"": No space left on device",,,,,,,,,""

after some restart point, the slave is trying to connect to primary and refuse to do restore_command.

Is this an expected behaviour?

Full log :

2014-08-11 09:39:23.226 CDT,,,25779,,53d26b30.64b3,2,,2014-07-25 09:35:28 CDT,,0,FATAL,53100,"could not write to file ""pg_xlog/xlogtemp.25779"": No space left on device",,,,,,,,,""
2014-08-11 09:46:36.163 CDT,,,14394,,53c582ea.383a,14031,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint starting: time",,,,,,,,,""
2014-08-11 09:49:33.786 CDT,,,14394,,53c582ea.383a,14032,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 21862 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=176.358 s, sync=1.243 s, total=177.623 s; sync files=683, longest=0.159 s, average=0.001 s",,,,,,,,,""
2014-08-11 09:49:33.787 CDT,,,14394,,53c582ea.383a,14033,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC5/36CE52A0","last completed transaction was at log time 2014-08-10 23:01:55.754838-05",,,,,,,,""
2014-08-11 09:49:33.790 CDT,,,14394,,53c582ea.383a,14034,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 09:51:37.256 CDT,,,14394,,53c582ea.383a,14035,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 76725 buffers (14.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=122.170 s, sync=1.284 s, total=123.465 s; sync files=770, longest=0.424 s, average=0.001 s",,,,,,,,,""
2014-08-11 09:51:37.256 CDT,,,14394,,53c582ea.383a,14036,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC5/BE0D9590","last completed transaction was at log time 2014-08-11 01:10:49.220537-05",,,,,,,,""
2014-08-11 09:51:37.258 CDT,,,14394,,53c582ea.383a,14037,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 09:54:23.552 CDT,,,14394,,53c582ea.383a,14038,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 49243 buffers (9.4%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=164.859 s, sync=1.428 s, total=166.293 s; sync files=794, longest=0.157 s, average=0.001 s",,,,,,,,,""
2014-08-11 09:54:23.552 CDT,,,14394,,53c582ea.383a,14039,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC6/571657C8","last completed transaction was at log time 2014-08-11 04:43:16.23604-05",,,,,,,,""
2014-08-11 09:54:23.561 CDT,,,14394,,53c582ea.383a,14040,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 09:57:51.829 CDT,,,14394,,53c582ea.383a,14041,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 215682 buffers (41.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=206.897 s, sync=1.364 s, total=208.267 s; sync files=697, longest=0.110 s, average=0.001 s",,,,,,,,,""
2014-08-11 09:57:51.829 CDT,,,14394,,53c582ea.383a,14042,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC7/FBE0D48","last completed transaction was at log time 2014-08-11 06:42:22.051994-05",,,,,,,,""
2014-08-11 09:57:51.832 CDT,,,14394,,53c582ea.383a,14043,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:00:18.828 CDT,,,14394,,53c582ea.383a,14044,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 127358 buffers (24.3%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=145.721 s, sync=1.263 s, total=146.995 s; sync files=905, longest=0.038 s, average=0.001 s",,,,,,,,,""
2014-08-11 10:00:18.828 CDT,,,14394,,53c582ea.383a,14045,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC7/E80EDE30","last completed transaction was at log time 2014-08-11 07:15:48.30725-05",,,,,,,,""
2014-08-11 10:00:18.833 CDT,,,14394,,53c582ea.383a,14046,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:03:27.063 CDT,,,14394,,53c582ea.383a,14047,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 168653 buffers (32.2%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=186.808 s, sync=1.409 s, total=188.229 s; sync files=886, longest=0.151 s, average=0.001 s",,,,,,,,,""
2014-08-11 10:03:27.063 CDT,,,14394,,53c582ea.383a,14048,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC8/6D003160","last completed transaction was at log time 2014-08-11 07:40:58.390313-05",,,,,,,,""
2014-08-11 10:03:27.065 CDT,,,14394,,53c582ea.383a,14049,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:06:43.148 CDT,,,14394,,53c582ea.383a,14050,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 220017 buffers (42.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=194.483 s, sync=1.590 s, total=196.083 s; sync files=890, longest=0.245 s, average=0.001 s",,,,,,,,,""
2014-08-11 10:06:43.149 CDT,,,14394,,53c582ea.383a,14051,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC9/2F02EEF0","last completed transaction was at log time 2014-08-11 08:09:49.370415-05",,,,,,,,""
2014-08-11 10:06:43.151 CDT,,,14394,,53c582ea.383a,14052,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:09:40.521 CDT,,,14394,,53c582ea.383a,14053,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 174348 buffers (33.3%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=175.910 s, sync=1.448 s, total=177.370 s; sync files=882, longest=0.172 s, average=0.001 s",,,,,,,,,""
2014-08-11 10:09:40.521 CDT,,,14394,,53c582ea.383a,14054,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC9/FB00D4B8","last completed transaction was at log time 2014-08-11 08:40:41.226457-05",,,,,,,,""
2014-08-11 10:09:40.526 CDT,,,14394,,53c582ea.383a,14055,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:12:05.971 CDT,,,14394,,53c582ea.383a,14056,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 172634 buffers (32.9%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=143.857 s, sync=1.578 s, total=145.445 s; sync files=909, longest=0.087 s, average=0.001 s",,,,,,,,,""
2014-08-11 10:12:05.971 CDT,,,14394,,53c582ea.383a,14057,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2ACA/E6895640","last completed transaction was at log time 2014-08-11 08:59:41.72795-05",,,,,,,,""
2014-08-11 10:12:05.979 CDT,,,14394,,53c582ea.383a,14058,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:14:46.943 CDT,,,14394,,53c582ea.383a,14059,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 144919 buffers (27.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=159.633 s, sync=1.323 s, total=160.964 s; sync files=865, longest=0.071 s, average=0.001 s",,,,,,,,,""
2014-08-11 10:14:46.943 CDT,,,14394,,53c582ea.383a,14060,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2ACB/7D006098","last completed transaction was at log time 2014-08-11 09:25:45.324747-05",,,,,,,,""
2014-08-11 10:14:46.970 CDT,,,14394,,53c582ea.383a,14061,,2014-07-15 14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:15:53.298 CDT,,,29141,,53e8de29.71d5,1,,2014-08-11 10:15:53 CDT,,0,LOG,00000,"streaming replication successfully connected to primary",,,,,,,,,""
2014-08-11 10:15:53.298 CDT,,,29141,,53e8de29.71d5,2,,2014-08-11 10:15:53 CDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:  requested WAL segment 0000000300002ACC000000A6 has already been removed
",,,,,,,,,""
2014-08-11 10:15:58.550 CDT,,,29146,,53e8de2e.71da,1,,2014-08-11 10:15:58 CDT,,0,LOG,00000,"streaming replication successfully connected to primary",,,,,,,,,""
2014-08-11 10:15:58.550 CDT,,,29146,,53e8de2e.71da,2,,2014-08-11 10:15:58 CDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:  requested WAL segment 0000000300002ACC000000A6 has already been removed
",,,,,,,,,""
2014-08-11 10:16:04.037 CDT,,,29151,,53e8de34.71df,1,,2014-08-11 10:16:04 CDT,,0,LOG,00000,"streaming replication successfully connected to primary",,,,,,,,,""
2014-08-11 10:16:04.037 CDT,,,29151,,53e8de34.71df,2,,2014-08-11 10:16:04 CDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:  requested WAL segment 0000000300002ACC000000A6 has already been removed
",,,,,,,,,""
2014-08-11 10:16:09.432 CDT,,,29157,,53e8de39.71e5,1,,2014-08-11 10:16:09 CDT,,0,LOG,00000,"streaming replication successfully connected to primary",,,,,,,,,""
2014-08-11 10:16:09.432 CDT,,,29157,,53e8de39.71e5,2,,2014-08-11 10:16:09 CDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:  requested WAL segment 0000000300002ACC000000A6 has already been removed


--
Regards,

Soni Maula Harriz

Re: Streaming replica refuse to do restore_command

From
Michael Paquier
Date:
On Tue, Aug 12, 2014 at 2:10 PM, Soni M <diptatapa@gmail.com> wrote:
> This is how i set up the db :
> Slave using streaming replica.
> We configure slave to run pg_dump which usually last for about 12 hours.
> We have limited pg_xlog on slave.
> Once the pg_xlog on slave is full while pg_dump still in progress.
What is the version of Postgres you are using?

> after some restart point, the slave is trying to connect to primary and
> refuse to do restore_command.
> Is this an expected behaviour?
> 2014-08-11 10:15:53.298 CDT,,,29141,,53e8de29.71d5,2,,2014-08-11 10:15:53
> CDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:
> requested WAL segment 0000000300002ACC000000A6 has already been removed
> ",,,,,,,,,""
> 2014-08-11 10:15:58.550 CDT,,,29146,,53e8de2e.71da,1,,2014-08-11 10:15:58
> CDT,,0,LOG,00000,"streaming replication successfully connected to
> primary",,,,,,,,,""

A restart point is created each time a standby server shuts down
during WAL replay. It seems that you stopped it for a too long time,
and that WAL files that this server expected to find are done because
it was out of sight for a too long time. Then, when a standby starts
and enters in recovery, it tries first to recover necessary WAL files
from the archives with restore_command and then switches to streaming
mode if it cannot find what is necessary. Are you sure that you kept
enough WAL history in your archives and that restore_command is
pointing to the right path/host?
Regards,
--
Michael


Re: Streaming replica refuse to do restore_command

From
Soni M
Date:

On Tue, Aug 12, 2014 at 12:37 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Tue, Aug 12, 2014 at 2:10 PM, Soni M <diptatapa@gmail.com> wrote:
> This is how i set up the db :
> Slave using streaming replica.
> We configure slave to run pg_dump which usually last for about 12 hours.
> We have limited pg_xlog on slave.
> Once the pg_xlog on slave is full while pg_dump still in progress.
What is the version of Postgres you are using?

Pg 9.1 on from RHEL packages 

> after some restart point, the slave is trying to connect to primary and
> refuse to do restore_command.
> Is this an expected behaviour?
> 2014-08-11 10:15:53.298 CDT,,,29141,,53e8de29.71d5,2,,2014-08-11 10:15:53
> CDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:
> requested WAL segment 0000000300002ACC000000A6 has already been removed
> ",,,,,,,,,""
> 2014-08-11 10:15:58.550 CDT,,,29146,,53e8de2e.71da,1,,2014-08-11 10:15:58
> CDT,,0,LOG,00000,"streaming replication successfully connected to
> primary",,,,,,,,,""

A restart point is created each time a standby server shuts down
during WAL replay. It seems that you stopped it for a too long time,
and that WAL files that this server expected to find are done because
it was out of sight for a too long time. Then, when a standby starts
and enters in recovery, it tries first to recover necessary WAL files
from the archives with restore_command and then switches to streaming
mode if it cannot find what is necessary. Are you sure that you kept
enough WAL history in your archives and that restore_command is
pointing to the right path/host?
Regards,
--
Michael

the wal files needed still on the archives and it's on the right path.

I try this and it's solved.
I moved all pg_xlog files to a bigger storage and manually copy needed wal files from archive, and restart the server and then it runs smoothly

--
Regards,

Soni Maula Harriz